date:20140813


[ 
https://issues.apache.org/jira/browse/PDFBOX-2261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14095331#comment-14095331
 ] 

Andreas Lehmkühler edited comment on PDFBOX-2261 at 8/13/14 9:53 AM:
-

Maybe I wasn't specific enough.

I've understood that, according to the spec, everything is fine with the 
structure of the PDF. But the mentioned piece of code seems wrong to me:
{code}
private static boolean isButton(PDAcroForm form, COSDictionary field)
        throws IOException
{
    String fieldType = PDField.findFieldType(field);
    List<COSObjectable> kids = PDField.getKids(form, field);
    if (fieldType == null && kids != null && !kids.isEmpty())
    {
        // sometimes if it is a button the type is only defined by one of the kids entries
        // TODO JH: this is due to inheritance, we need proper support for non-terminal fields

        COSDictionary kid = (COSDictionary) kids.get(0).getCOSObject();
        return isButton(form, kid);
    }
    return "Btn".equals(fieldType);
}
{code}
The question is: does it make sense to search for a button field type among the 
child nodes if the parent node doesn't have a field type? IMHO not, but maybe 
I'm missing something.


 Extremely long hang during getFields() on a few PDF files
 -

 Key: PDFBOX-2261
 URL: https://issues.apache.org/jira/browse/PDFBOX-2261
 Project: PDFBox
  Issue Type: Bug
  Components: AcroForm
Affects Versions: 1.8.6
Reporter: Tim Allison
Priority: Minor
 Attachments: 966679.pdf, screenshot-pdfdebugger.png


 When I run oap.examples.fdf.PrintFields from trunk, the code seems to hang 
 during acroForm.getFields().  This is a heavy load hang.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (PDFBOX-1511) pdfMerger App produces Garbage


[ 
https://issues.apache.org/jira/browse/PDFBOX-1511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14095353#comment-14095353
 ] 

Andreas Lehmkühler commented on PDFBOX-1511:


Identically named resources are problematic if two or more of the PDFs to be 
merged use global resources and if the merger merges the page-related 
resources and the global resources separately, as it did before the patch. By 
using findResources() instead of getResources(), the proposed patch merges 
the global and the page-specific resources _before_ adding them to the page 
itself, so that there aren't any duplicated names anymore. I don't know if that 
was intended in the first place, but it solves the problem :-) OTOH, PDFs using 
global resources will grow after merging, as all resources are multiplied. But 
AFAICT global resources aren't used that often.
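
The idea described above can be sketched roughly as follows. This is a simplified model, not the patch itself: resource dictionaries are plain maps from a resource name to a description, and ResourceMergeSketch, effectiveResources and resourcesForMergedPage are hypothetical names made up for the illustration. Only findResources() and getResources() come from the comment. The sketch relies on the PDF rule that a page without its own resource dictionary inherits one from the page tree.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified model: a resource dictionary is a map from a resource name
// (e.g. "F1") to a description of the object. All names are hypothetical.
final class ResourceMergeSketch
{
    // Resources a page effectively uses: its own dictionary if present,
    // otherwise the one inherited from the page tree ("global" resources).
    static Map<String, String> effectiveResources(Map<String, String> own,
            Map<String, String> inherited)
    {
        return own != null ? own : inherited;
    }

    // Copies the effective resources onto the page before it is appended, so
    // the page no longer depends on a dictionary shared within the target.
    static Map<String, String> resourcesForMergedPage(Map<String, String> own,
            Map<String, String> inherited)
    {
        Map<String, String> effective = effectiveResources(own, inherited);
        return effective == null
                ? new HashMap<String, String>()
                : new HashMap<String, String>(effective);
    }
}
```

With two source documents that both name a font F1 in their global dictionary, each merged page keeps its own F1 in this model. The price is one copy of the dictionary per page, which corresponds to the growth mentioned above.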



 pdfMerger App produces Garbage
 --

 Key: PDFBOX-1511
 URL: https://issues.apache.org/jira/browse/PDFBOX-1511
 Project: PDFBox
  Issue Type: Bug
  Components: Utilities
Affects Versions: 1.7.1
 Environment: Win XP; Windows Server 2008 R2; java version 1.6.0_21, 
Reporter: Michael Huber
 Fix For: 1.8.7, 2.0.0

 Attachments: 1.pdf, 2.pdf, PDFMergerUtility.java, 
 PDFMergerUtility.java.diff, PdfRenderer.java, targetPdfMergeJava.pdf, 
 targetPdfMergeUtilityApp.pdf


 The PDFBox utility pdfMerger produces a merged document containing garbage. All 
 merged PDF files are contained, but strings are destroyed.
 The source PDF files are created with graphviz and are readable without error 
 or disturbance both with Acrobat X and the PDFBox pdfDebug utility.
 Another astounding thing is that a hand-coded merger using the pdfMergerUtility 
 class works fine when run within Eclipse Juno and creates the same garbage when 
 run from the cmd line (please see the attached source PdfRenderer.java).
 I checked everything that comes to mind to find the differences, e.g. Java 
 version, encoding/codepage issues, memory settings, and found nothing.




[jira] [Comment Edited] (PDFBOX-1511) pdfMerger App produces Garbage

[
https://issues.apache.org/jira/browse/PDFBOX-1511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14095353#comment-14095353
]

Andreas Lehmkühler edited comment on PDFBOX-1511 at 8/13/14 11:03 AM:
--

Identically named resources are problematic if two or more of the PDFs to be
merged use global resources and if the merger merges the page-related
resources and the global resources separately, as it did before the patch.
By using findResources() instead of getResources(), the proposed patch
merges the global and the page-specific resources _before_ adding them to the
page itself, so that there aren't any duplicated names anymore. I don't know
if that was intended in the first place, but it solves the problem :-)
OTOH, PDFs using global resources will grow after merging, as all resources
are multiplied. But AFAICT global resources aren't used that often.

[jira] [Updated] (PDFBOX-2261) Extremely long hang during getFields() on a few PDF files


 [ 
https://issues.apache.org/jira/browse/PDFBOX-2261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maruan Sahyoun updated PDFBOX-2261:
---

Attachment: RadioButtons.pdf

Sample form with radio buttons and push buttons to clarify the behavior. As can 
be seen, the parent does not have the field type set. This is fine, as the 
parents in this case act as groups for the contained and nested fields.

[jira] [Comment Edited] (PDFBOX-2261) Extremely long hang during getFields() on a few PDF files


[ 
https://issues.apache.org/jira/browse/PDFBOX-2261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14095419#comment-14095419
 ] 

Maruan Sahyoun edited comment on PDFBOX-2261 at 8/13/14 12:35 PM:
--

If it's a non-terminal field without a field type, there is no need to look up 
the field type for it, IMHO. Maybe change it so that, for a terminal field, 
inheritable attributes such as the field type are retrieved with something like

{code}
field.getInheritableAttribute("FT")
{code}

which would look up the parent hierarchy if the attribute is not part of the 
field's dictionary. WDYT?
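
A minimal sketch of such a lookup, under stated assumptions: FieldNode is a hypothetical stand-in for a field dictionary (a plain map plus a parent reference), not a PDFBox class, and getInheritableAttribute is only the method name proposed above. The lookup walks from the field towards the root and never descends into kids, so its cost is bounded by the depth of the tree.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a field dictionary; not a PDFBox class.
final class FieldNode
{
    private final Map<String, String> entries = new HashMap<String, String>();
    private final FieldNode parent; // null for a top-level field

    FieldNode(FieldNode parent)
    {
        this.parent = parent;
    }

    FieldNode put(String key, String value)
    {
        entries.put(key, value);
        return this;
    }

    // Returns the value from this node or, failing that, from the nearest
    // ancestor that defines it; null if no node in the chain has the key.
    String getInheritableAttribute(String key)
    {
        for (FieldNode node = this; node != null; node = node.parent)
        {
            String value = node.entries.get(key);
            if (value != null)
            {
                return value;
            }
        }
        return null;
    }
}
```

For a kid without its own FT entry below a parent with FT set to Btn, the call returns Btn. For a top-level node without FT it returns null, which would mark the node as a plain group.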


[jira] [Assigned] (PDFBOX-2261) Extremely long hang during getFields() on a few PDF files


 [ 
https://issues.apache.org/jira/browse/PDFBOX-2261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andreas Lehmkühler reassigned PDFBOX-2261:
--

Assignee: Andreas Lehmkühler

[jira] [Updated] (PDFBOX-2261) Extremely long hang during getFields() on a few PDF files


 [ 
https://issues.apache.org/jira/browse/PDFBOX-2261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andreas Lehmkühler updated PDFBOX-2261:
---

Fix Version/s: 2.0.0

[jira] [Commented] (PDFBOX-2261) Extremely long hang during getFields() on a few PDF files


[ 
https://issues.apache.org/jira/browse/PDFBOX-2261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14095639#comment-14095639
 ] 

Maruan Sahyoun commented on PDFBOX-2261:


Wouldn’t it be good to start using enums for field type and flags?
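
One possible shape for such enums, as a sketch only: FieldType and FieldFlag are hypothetical names, not existing PDFBox types. The FT names (Btn, Tx, Ch, Sig) and the bit positions of the three common flags follow the PDF specification.

```java
import java.util.EnumSet;

// Hypothetical enum for the /FT entry; not an existing PDFBox type.
enum FieldType
{
    BUTTON("Btn"), TEXT("Tx"), CHOICE("Ch"), SIGNATURE("Sig");

    private final String pdfName;

    FieldType(String pdfName)
    {
        this.pdfName = pdfName;
    }

    String pdfName()
    {
        return pdfName;
    }

    // Maps an /FT name to a constant; null means "no known field type".
    static FieldType fromPdfName(String name)
    {
        for (FieldType type : values())
        {
            if (type.pdfName.equals(name))
            {
                return type;
            }
        }
        return null;
    }
}

// Hypothetical enum for flags common to all field types (/Ff entry).
enum FieldFlag
{
    READ_ONLY(1), REQUIRED(2), NO_EXPORT(3);

    private final int mask;

    // The spec numbers bit positions starting at 1 for the lowest bit.
    FieldFlag(int bitPosition)
    {
        this.mask = 1 << (bitPosition - 1);
    }

    // Decodes the integer value of /Ff into the set of flags that are set.
    static EnumSet<FieldFlag> decode(int ff)
    {
        EnumSet<FieldFlag> result = EnumSet.noneOf(FieldFlag.class);
        for (FieldFlag flag : values())
        {
            if ((ff & flag.mask) != 0)
            {
                result.add(flag);
            }
        }
        return result;
    }
}
```

A check like "Btn".equals(fieldType) would then become FieldType.fromPdfName(fieldType) == FieldType.BUTTON, and a missing field type stays distinguishable as null.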

[jira] [Commented] (PDFBOX-2261) Extremely long hang during getFields() on a few PDF files


[ 
https://issues.apache.org/jira/browse/PDFBOX-2261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14095648#comment-14095648
 ] 

Andreas Lehmkühler commented on PDFBOX-2261:


As a first step, I've removed the recursion, which didn't make sense. Now 
PrintFields comes up with a result within seconds, but the result is 
incomplete: all top-level fields that are simple dictionaries without a field 
type are discarded. I'm working on a solution.


Build failed in Jenkins: PDFBox-trunk » PDFBox parent #1200

2014-08-13 Thread Apache Jenkins Server

See 
https://builds.apache.org/job/PDFBox-trunk/org.apache.pdfbox$pdfbox-parent/1200/

--
maven3-agent.jar already up to date
maven3-interceptor.jar already up to date
maven3-interceptor-commons.jar already up to date
===[JENKINS REMOTING CAPACITY]===   channel started
log4j:WARN No appenders could be found for logger 
(org.apache.commons.beanutils.converters.BooleanConverter).
log4j:WARN Please initialize the log4j system properly.
Executing Maven:  -B -f 
/home/jenkins/jenkins-slave/workspace/PDFBox-trunk/trunk/pom.xml 
-Dmaven.repo.local=/home/jenkins/jenkins-slave/maven-repositories/0 clean 
deploy -Ppedantic
[INFO] Scanning for projects...
[INFO] 
[INFO] Reactor Build Order:
[INFO] 
[INFO] PDFBox parent
[INFO] Apache FontBox
[INFO] Apache JempBox
[INFO] Apache XmpBox
[INFO] Apache PDFBox
[INFO] Apache Preflight
[INFO] Apache Preflight application
[INFO] Apache PDFBox tools
[INFO] Apache PDFBox application
[INFO] Apache PDFBox examples
[INFO] PDFBox reactor
[INFO] 
[INFO] 
[INFO] Building PDFBox parent 2.0.0-SNAPSHOT
[INFO] 
[INFO] 
[INFO] --- maven-clean-plugin:2.5:clean (default-clean) @ pdfbox-parent ---
[TASKS] Scanning folder 
'https://builds.apache.org/job/PDFBox-trunk/org.apache.pdfbox$pdfbox-parent/ws/'
 for files matching the pattern '**/*.java' - excludes: 
[TASKS] Found 0 files to scan for tasks
Found 0 open tasks.
[TASKS] Computing warning deltas based on reference build #1199
[INFO] 
[INFO] --- maven-remote-resources-plugin:1.5:process (default) @ pdfbox-parent 
---
[INFO] 
[INFO] --- maven-site-plugin:3.3:attach-descriptor (attach-descriptor) @ 
pdfbox-parent ---
[INFO] 
[INFO] --- apache-rat-plugin:0.10:check (default) @ pdfbox-parent ---
[INFO] 51 implicit excludes (use -debug for more details).
[INFO] Exclude: release.properties
[INFO] 1 resources included (use -debug for more details)
[INFO] Rat check: Summary of files. Unapproved: 0 unknown: 0 generated: 0 
approved: 1 licence.
[INFO] 
[INFO] --- maven-install-plugin:2.5.1:install (default-install) @ pdfbox-parent 
---
[INFO] Installing 
https://builds.apache.org/job/PDFBox-trunk/org.apache.pdfbox$pdfbox-parent/ws/pom.xml
 to 
/home/jenkins/jenkins-slave/maven-repositories/0/org/apache/pdfbox/pdfbox-parent/2.0.0-SNAPSHOT/pdfbox-parent-2.0.0-SNAPSHOT.pom
[INFO] 
[INFO] --- maven-deploy-plugin:2.8.1:deploy (default-deploy) @ pdfbox-parent ---
Downloading: 
https://repository.apache.org/content/repositories/snapshots/org/apache/pdfbox/pdfbox-parent/2.0.0-SNAPSHOT/maven-metadata.xml
Downloaded: 
https://repository.apache.org/content/repositories/snapshots/org/apache/pdfbox/pdfbox-parent/2.0.0-SNAPSHOT/maven-metadata.xml
 (611 B at 0.0 KB/sec)
Uploading: 
https://repository.apache.org/content/repositories/snapshots/org/apache/pdfbox/pdfbox-parent/2.0.0-SNAPSHOT/pdfbox-parent-2.0.0-20140813.160457-535.pom
Uploaded: 
https://repository.apache.org/content/repositories/snapshots/org/apache/pdfbox/pdfbox-parent/2.0.0-SNAPSHOT/pdfbox-parent-2.0.0-20140813.160457-535.pom
 (12 KB at 0.2 KB/sec)
Downloading: 
https://repository.apache.org/content/repositories/snapshots/org/apache/pdfbox/pdfbox-parent/maven-metadata.xml
Downloaded: 
https://repository.apache.org/content/repositories/snapshots/org/apache/pdfbox/pdfbox-parent/maven-metadata.xml
 (390 B at 0.0 KB/sec)
Uploading: 
https://repository.apache.org/content/repositories/snapshots/org/apache/pdfbox/pdfbox-parent/2.0.0-SNAPSHOT/maven-metadata.xml
Uploaded: 
https://repository.apache.org/content/repositories/snapshots/org/apache/pdfbox/pdfbox-parent/2.0.0-SNAPSHOT/maven-metadata.xml
 (611 B at 0.2 KB/sec)
Uploading: 
https://repository.apache.org/content/repositories/snapshots/org/apache/pdfbox/pdfbox-parent/maven-metadata.xml

[jira] [Commented] (PDFBOX-2261) Extremely long hang during getFields() on a few PDF files


[ 
https://issues.apache.org/jira/browse/PDFBOX-2261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14095841#comment-14095841
 ] 

John Hewson commented on PDFBOX-2261:
-

I encountered this issue in PDFBOX-2164 but only added a workaround for a 
specific NPE. The recursive approach used by PDField was indeed incorrect; the 
PDF spec explains why:

{quote}
For purposes of definition and naming, the fields can be organized 
hierarchically and can inherit attributes from their ancestors in the field 
hierarchy
{quote}

It seems that the problem with PDFBox's current design is that each node in the 
field tree is represented by a PDField. However, not every node in the field 
tree is really a field; some nodes are just there to organise the tree 
structure. One solution would be to have PDAcroForm read the field tree and 
have it produce a Map<String, PDField> of named fields, with all of the 
inheritance taken into account. Another solution would be to have fields be 
aware of their parent in the field tree and look up appropriate values (this 
would preserve the field tree structure between writes), but the parent node 
should not be a PDField (!!!); it should be a PDNonTerminalField or some 
similar new class. The PDF spec is clear on this:

{quote}
A non-terminal field does not logically have a type of its own; it is merely a 
container for inheritable attributes that are intended for descendant terminal 
fields of any type.
{quote}
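
The first option could look roughly like the sketch below. It is an illustration under assumptions, not a proposal for the actual classes: FormNode and FieldTreeFlattener are hypothetical, the map value is just the effective field type rather than a PDField, and any node without kids is treated as a terminal field (real field trees also hang widget annotations under Kids, which this model ignores). Each node is visited once, with the inherited type passed down from the parent.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical node of the field tree; not a PDFBox class.
final class FormNode
{
    final String partialName; // /T entry, may be null
    final String fieldType;   // /FT entry, null when absent on this node
    final List<FormNode> kids = new ArrayList<FormNode>();

    FormNode(String partialName, String fieldType)
    {
        this.partialName = partialName;
        this.fieldType = fieldType;
    }

    FormNode add(FormNode kid)
    {
        kids.add(kid);
        return this;
    }
}

final class FieldTreeFlattener
{
    // Maps the fully qualified name of each terminal field to its effective
    // field type, taken from the node itself or from its nearest ancestor.
    static Map<String, String> flatten(List<FormNode> roots)
    {
        Map<String, String> result = new LinkedHashMap<String, String>();
        for (FormNode root : roots)
        {
            collect(root, null, null, result);
        }
        return result;
    }

    private static void collect(FormNode node, String parentName,
            String inheritedType, Map<String, String> result)
    {
        String name = node.partialName;
        if (name == null)
        {
            name = parentName;
        }
        else if (parentName != null)
        {
            name = parentName + "." + name;
        }
        String type = node.fieldType != null ? node.fieldType : inheritedType;
        if (node.kids.isEmpty())
        {
            if (name != null)
            {
                result.put(name, type);
            }
            return;
        }
        for (FormNode kid : node.kids)
        {
            collect(kid, name, type, result);
        }
    }
}
```

Non-terminal nodes never appear in the resulting map; they only contribute their partial name and their inheritable type to the descendants.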

[jira] [Comment Edited] (PDFBOX-2261) Extremely long hang during getFields() on a few PDF files


[ 
https://issues.apache.org/jira/browse/PDFBOX-2261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14095841#comment-14095841
 ] 

John Hewson edited comment on PDFBOX-2261 at 8/13/14 6:15 PM:
--

I encountered this issue in PDFBOX-2164 but only added a workaround for a 
specific NPE. The recursive approach used by PDField was indeed incorrect; the 
PDF spec explains why:

{quote}
For purposes of definition and naming, the fields can be organized 
hierarchically and can inherit attributes from their ancestors in the field 
hierarchy
{quote}

It seems that the problem with PDFBox's current design is that each node in the 
field tree is represented by a PDField. However, not every node in the field 
tree is really a field; some nodes are just there to organise the tree 
structure. One solution would be to have PDAcroForm read the field tree and 
have it produce a Map<String, PDField> of named fields, with all of the 
inheritance taken into account. Another solution would be to have fields be 
aware of their parent in the field tree and look up appropriate values (this 
would preserve the field tree structure between writes), but the parent node 
should not be a PDField (!!!); it should be a PDNonTerminalField* or some 
similar new class. The PDF spec is clear on this:

{quote}
A non-terminal field does not logically have a type of its own; it is merely a 
container for inheritable attributes that are intended for descendant terminal 
fields of any type.
{quote}

\* Any new PDNonTerminalField class should not inherit from PDField, either.


[jira] [Comment Edited] (PDFBOX-2261) Extremely long hang during getFields() on a few PDF files


[ 
https://issues.apache.org/jira/browse/PDFBOX-2261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14095841#comment-14095841
 ] 

John Hewson edited comment on PDFBOX-2261 at 8/13/14 6:18 PM:
--

I encountered this issue in PDFBOX-2164 but only added a workaround for a 
specific NPE. The recursive approach used by PDField was indeed incorrect; the 
PDF spec explains why:

{quote}
For purposes of definition and naming, the fields can be organized 
hierarchically and can inherit attributes from their ancestors in the field 
hierarchy
{quote}

It seems that the problem with PDFBox's current design is that each node in the 
field tree is represented by a PDField. However, not every node in the field 
tree is really a field; some nodes are just there to organise the tree 
structure. One solution would be to have PDAcroForm read the field tree and 
have it produce a Map<String, PDField> of named fields, with all of the 
inheritance taken into account. Another solution would be to have fields be 
aware of their parent in the field tree and look up appropriate values (this 
would preserve the field tree structure between writes), but the parent node 
should not be a PDField (!!!); it should be a PDNonTerminalField* or some 
similar new class. The PDF spec is clear on this:

{quote}
A non-terminal field does not logically have a type of its own; it is merely a 
container for inheritable attributes that are intended for descendant terminal 
fields of any type.
{quote}

\* Any new PDNonTerminalField class should probably not inherit from PDField, 
either.


 Extremely long hang during getFields() on a few PDF files
 -

 Key: PDFBOX-2261
 URL: https://issues.apache.org/jira/browse/PDFBOX-2261
 Project: PDFBox
  Issue Type: Bug
  Components: AcroForm
Affects Versions: 1.8.6
Reporter: Tim Allison
Assignee: Andreas Lehmkühler
Priority: Minor
 Fix For: 2.0.0

 Attachments: 966679.pdf, RadioButtons.pdf, screenshot-pdfdebugger.png


 When I run oap.examples.fdf.PrintFields from trunk, the code seems to hang 
 during acroForm.getFields().  This is a heavy load hang.




[jira] [Commented] (PDFBOX-2261) Extremely long hang during getFields() on a few PDF files


[ 
https://issues.apache.org/jira/browse/PDFBOX-2261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14095967#comment-14095967
 ] 

Andreas Lehmkühler commented on PDFBOX-2261:


I agree with John: we have to introduce a new class like PDNonTerminalField 
(mine is called PDFieldDictionary), and yes, it shouldn't inherit from PDField. 
It should be the other way around. PDField contains most of the inheritable 
values, which should be moved to the new class, while the others, such as T, 
TU, TM and AA, should be left in PDField. Kids, Parent and FT should be moved 
too.
That would follow the spec, and each node of the tree would be represented by a 
single object.


[jira] [Commented] (PDFBOX-1511) pdfMerger App produces Garbage


[ 
https://issues.apache.org/jira/browse/PDFBOX-1511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14095996#comment-14095996
 ] 

Tilman Hausherr commented on PDFBOX-1511:
-

To verify this we need to:
- find a file with global resources
- create an uncompressed copy in which the names of the resources are shuffled
- merge it and see what happens.
I did try it and no mayhem followed; there were no longer any global resources 
in the merged file. However, I can't share that file (PDFBOX-2048), so I need 
one that I can attach here so that Michael and Kirk can also have a look. Now 
that GSoC2014 is done and the weather is less warm, I'll run my tests on the 
digitalcorpora site until I hit a file with global resources.



{code}
// Resources attached to the root of the page tree are inherited by all pages
PDResources globalRes = document.getDocumentCatalog().getPages().getResources();
if (globalRes != null)
{
    System.out.println("global resources size: " + globalRes.getXObjects().size());
    for (String key : globalRes.getXObjects().keySet())
    {
        System.out.println("global resource: " + key);
    }
}
else
{
    System.out.println("no global resources");
}
{code}


 pdfMerger App produces Garbage
 --

 Key: PDFBOX-1511
 URL: https://issues.apache.org/jira/browse/PDFBOX-1511
 Project: PDFBox
  Issue Type: Bug
  Components: Utilities
Affects Versions: 1.7.1
 Environment: Win XP; Windows Server 2008 R2; java version 1.6.0_21, 
Reporter: Michael Huber
 Fix For: 1.8.7, 2.0.0

 Attachments: 1.pdf, 2.pdf, PDFMergerUtility.java, 
 PDFMergerUtility.java.diff, PdfRenderer.java, targetPdfMergeJava.pdf, 
 targetPdfMergeUtilityApp.pdf


 The pdfbox utility pdfMerger produces a merged document containing garbage. 
 All merged pdf files are contained but Strings are destroyed.
 The source pdf files are created with graphviz and are readable without error 
 or disturbance both with Acrobat X and the pdfbox pdfDebug utility.
 Another astounding thing is that a handcoded merger using the pdfMergerUtility 
 class works fine when run within Eclipse Juno and creates the same garbage 
 when run from the cmd line (pls. see attached source PdfRenderer.java).
 I checked everything that comes to mind to find the differences, e.g. Java 
 version, encoding/codepage issues, memory settings, and found nothing.




[jira] [Updated] (PDFBOX-1511) pdfMerger App produces Garbage

[
https://issues.apache.org/jira/browse/PDFBOX-1511?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Tilman Hausherr updated PDFBOX-1511:

Attachment: 078117u2.pdf
078117u1.pdf


[jira] [Comment Edited] (PDFBOX-1511) pdfMerger App produces Garbage

[
https://issues.apache.org/jira/browse/PDFBOX-1511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14096029#comment-14096029
]

Tilman Hausherr edited comment on PDFBOX-1511 at 8/13/14 8:12 PM:
--

The two 078117*.pdf files have global resources. The difference between the two
files is that I have swapped /F1 with /F9, and /Im11 with /Im14, everywhere in
the second file. I can't attach the result file after the merge because it is
too large, but it displays fine; try it yourself. The merged file has no global
resources.


[jira] [Comment Edited] (PDFBOX-1511) pdfMerger App produces Garbage


[ 
https://issues.apache.org/jira/browse/PDFBOX-1511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14096071#comment-14096071
 ] 

Tilman Hausherr edited comment on PDFBOX-1511 at 8/13/14 8:33 PM:
--

Another file (078118.pdf), with global xobject resources:
global resource: Im78
global resource: Im79
global resource: Im60
global resource: Im61
global resource: Im75
global resource: Im39
global resource: Im11
global resource: Im12
global resource: Im36
global resource: Im80
global resource: Im53
global resource: Im83
global resource: Im16
global resource: Im33
global resource: Im32
global resource: Im31
global resource: Im57
global resource: Im19
global resource: Im56
global resource: Im68
global resource: Im84
global resource: Im65
global resource: Im62
global resource: Im50
global resource: Im4
global resource: Im49
global resource: Im27
global resource: Im26
global resource: Im3
global resource: Im28
global resource: Im72
global resource: Im71
global resource: Im43
global resource: Im42

The 078117u1.pdf file has these global xobject resources:

global resource: Im96
global resource: Im73
global resource: Im59
global resource: Im14
global resource: Tr22
global resource: Im11
global resource: Im17
global resource: Im18
global resource: Im35
global resource: Im53
global resource: Im15
global resource: Im34
global resource: Im83
global resource: Im16
global resource: Im82
global resource: Im58
global resource: Im57
global resource: Im30
global resource: Im87
global resource: Im84
global resource: Im66
global resource: Im67
global resource: Im88
global resource: Im89
global resource: Im5
global resource: Im6
global resource: Im29
global resource: Im28
global resource: Im23
global resource: Im41
global resource: Im90
global resource: Im72
global resource: Im40
global resource: Im71
global resource: Im24
global resource: Im45
global resource: Im21
global resource: Im46

Merging them works too.



[jira] [Updated] (PDFBOX-1511) pdfMerger App produces Garbage


 [ 
https://issues.apache.org/jira/browse/PDFBOX-1511?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tilman Hausherr updated PDFBOX-1511:


Attachment: 078118.pdf


[jira] [Comment Edited] (PDFBOX-2261) Extremely long hang during getFields() on a few PDF files

[
https://issues.apache.org/jira/browse/PDFBOX-2261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14096104#comment-14096104
]

John Hewson edited comment on PDFBOX-2261 at 8/13/14 8:56 PM:
--

[~lehmi], sounds good. One small thought: given that a PDField already
represents a dictionary, the name PDFieldDictionary is ambiguous, as the
Dictionary is implicit in most existing PDFBox names, e.g. PDFont wraps a
Font Dictionary. I'd avoid the suffix Dictionary: most PDFBox classes don't
use it, especially as a dictionary is a COS-level concept.


[jira] [Comment Edited] (PDFBOX-2261) Extremely long hang during getFields() on a few PDF files


[ 
https://issues.apache.org/jira/browse/PDFBOX-2261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14096143#comment-14096143
 ] 

John Hewson edited comment on PDFBOX-2261 at 8/13/14 9:21 PM:
--

[~msahyoun], I'm not sure that anyone was suggesting that PDField should 
inherit from PDNonTerminalField; as you say, that wouldn't be correct.

Here's what I had in mind: if we want to preserve the existing notion that a 
PDField represents an actual field (i.e. a terminal field) then we could 
use a class hierarchy like that below:

{code}
abstract class PDFieldTreeNode
class PDNonTerminalField extends PDFieldTreeNode
class PDField extends PDFieldTreeNode
{code}

And the following constructors:

{code}
protected PDFieldTreeNode()
protected PDFieldTreeNode(PDNonTerminalField parent)

public PDNonTerminalField()
public PDNonTerminalField(PDNonTerminalField parent)

public PDField()
public PDField(PDNonTerminalField parent)
{code}

The PDFieldTreeNode class would expose only the properties which can be 
inherited. It would also include the field inheritance logic, which would look 
up the given key on its parent when it does not have the value locally. The 
PDNonTerminalField class would contain little code, and exist mostly just to be 
a concrete implementation of PDFieldTreeNode. The PDField class would contain 
just the code for those extra properties supported by terminal fields.
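
To make the lookup rule concrete, here is a minimal sketch of the hierarchy 
described above. The class names and constructors follow the proposal; 
everything else is an assumption made for illustration: entries are held in a 
plain Map instead of a COSDictionary, and the set, getInheritable, 
getFieldType and getPartialName methods are hypothetical helpers, not existing 
PDFBox API.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch only: a plain Map stands in for the underlying COS dictionary.
abstract class PDFieldTreeNode
{
    private final PDNonTerminalField parent;   // null for a root node
    private final Map<String, Object> entries = new HashMap<String, Object>();

    protected PDFieldTreeNode()
    {
        this(null);
    }

    protected PDFieldTreeNode(PDNonTerminalField parent)
    {
        this.parent = parent;
    }

    public void set(String key, Object value)
    {
        entries.put(key, value);
    }

    // Non-inheritable lookup: only the node's own entries are consulted.
    protected Object getLocal(String key)
    {
        return entries.get(key);
    }

    // Inheritance logic: a local value wins, otherwise ask the parent, and so on upwards.
    protected Object getInheritable(String key)
    {
        Object local = entries.get(key);
        if (local != null)
        {
            return local;
        }
        return parent != null ? parent.getInheritable(key) : null;
    }

    // /FT is one of the inheritable entries.
    public String getFieldType()
    {
        return (String) getInheritable("FT");
    }
}

// Little code of its own: mostly a concrete PDFieldTreeNode.
class PDNonTerminalField extends PDFieldTreeNode
{
    public PDNonTerminalField()
    {
        super();
    }

    public PDNonTerminalField(PDNonTerminalField parent)
    {
        super(parent);
    }
}

// Adds the properties that only make sense on a terminal field.
class PDField extends PDFieldTreeNode
{
    public PDField()
    {
        super();
    }

    public PDField(PDNonTerminalField parent)
    {
        super(parent);
    }

    // /T is not inherited, so only the local entry is read.
    public String getPartialName()
    {
        return (String) getLocal("T");
    }
}
```

Because the walk only ever goes from a node towards the root, resolving a 
value costs at most the depth of the tree, and no node has to inspect its kids 
to find out what type it is.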



[jira] [Commented] (PDFBOX-2261) Extremely long hang during getFields() on a few PDF files


[ 
https://issues.apache.org/jira/browse/PDFBOX-2261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14096185#comment-14096185
 ] 

Maruan Sahyoun commented on PDFBOX-2261:


[~jahewson] I like that suggestion. 

One question: if I understand the current model correctly, PDField doesn't 
represent an actual field but its subclasses do, so instead of
{code}
class PDField extends PDFieldTreeNode
{code}
it will be
{code}
abstract class PDField extends PDFieldTreeNode
{code}


[jira] [Commented] (PDFBOX-1511) pdfMerger App produces Garbage

[
https://issues.apache.org/jira/browse/PDFBOX-1511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14096208#comment-14096208
]

Maruan Sahyoun commented on PDFBOX-1511:

If I understand the spec correctly

{quote}
Resources (Required; inheritable) A dictionary containing any resources
required by the page (see 7.8.3, Resource Dictionaries). If the page requires
no resources, the value of this entry shall be an empty dictionary. Omitting
the entry entirely indicates that the resources shall be inherited from an
ancestor node in the page tree.
{quote}

then a specific page either has its own resources, uses its ancestor's
resources, or has none, but there is no mix.
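
Under that reading, resolving the resources of a page is an all-or-nothing
walk up the page tree, roughly as in the sketch below. This is only an
illustration: PageTreeNode and effectiveResources are hypothetical names, not
PDFBox classes, and a resource dictionary is modelled as a plain Map from
resource name to a description.

```java
import java.util.Collections;
import java.util.Map;

// Hypothetical model of a page tree node; not a PDFBox class.
final class PageTreeNode
{
    final PageTreeNode parent;             // null for the root of the page tree
    final Map<String, String> resources;   // null = entry omitted, empty map = no resources

    PageTreeNode(PageTreeNode parent, Map<String, String> resources)
    {
        this.parent = parent;
        this.resources = resources;
    }

    // The nearest node that has a Resources entry supplies the whole dictionary;
    // entries from further up the tree are not merged in.
    Map<String, String> effectiveResources()
    {
        for (PageTreeNode node = this; node != null; node = node.parent)
        {
            if (node.resources != null)
            {
                return node.resources;
            }
        }
        return Collections.<String, String>emptyMap();
    }
}
```

A page that omits the entry sees the root's dictionary, a page with an empty
dictionary sees nothing, and a page with its own dictionary sees only that
dictionary, even when the root defines other names.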


[jira] [Comment Edited] (PDFBOX-2261) Extremely long hang during getFields() on a few PDF files


[ 
https://issues.apache.org/jira/browse/PDFBOX-2261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14096515#comment-14096515
 ] 

John Hewson edited comment on PDFBOX-2261 at 8/14/14 3:43 AM:
--

Yes, PDField's subclasses represent actual fields; PDField should remain 
abstract. I skipped over that part. PDField is the superclass of all terminal 
fields, i.e. actual fields.



[jira] [Comment Edited] (PDFBOX-2261) Extremely long hang during getFields() on a few PDF files