from:"Kirk Haines \(JIRA\)"

[jira] [Updated] (PDFBOX-1511) pdfMerger App produces Garbage

2014-07-16 Thread Kirk Haines (JIRA)

[
https://issues.apache.org/jira/browse/PDFBOX-1511?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Kirk Haines updated PDFBOX-1511:

Attachment: PDFMergerUtility.java

This version of PDFMergerUtility.java (based on 1.7.1 iirc) removes the shared
resources section and instead applies resources on the page level. The cloner
will create references for resources used on multiple pages, so there is not
excessive resource duplication. The previous method assumed resources with the
same name were identical, which is not valid (see prior comment about Font
resource CMaps).
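
The approach described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the attached PDFMergerUtility.java and not PDFBox API: `Resources`, `Page`, and `Cloner` are hypothetical stand-ins. Each merged page gets its own reference to a cloned Resources object, and the cloner remembers clones by source object identity (not by resource name), so a Resources object used on several pages is copied only once.

```java
import java.util.ArrayList;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

class PageLevelResourcesSketch {

    // Hypothetical stand-in for a PDF resource dictionary; not a PDFBox class.
    static final class Resources {
        final String label;
        Resources(String label) { this.label = label; }
    }

    // Hypothetical stand-in for a page that points at its own resources.
    static final class Page {
        final Resources resources;
        Page(Resources resources) { this.resources = resources; }
    }

    // Remembers the copy made for each source object, keyed by identity
    // rather than by name, so same-named resources are never conflated.
    static final class Cloner {
        private final Map<Resources, Resources> clones = new IdentityHashMap<>();
        int copiesMade = 0;

        Resources cloneOnce(Resources source) {
            Resources copy = clones.get(source);
            if (copy == null) {
                copy = new Resources(source.label);
                clones.put(source, copy);
                copiesMade++;
            }
            return copy;
        }
    }

    // Appends every page of every input document, attaching resources
    // at the page level instead of building one shared, name-keyed section.
    static List<Page> merge(List<List<Page>> documents, Cloner cloner) {
        List<Page> merged = new ArrayList<>();
        for (List<Page> document : documents) {
            for (Page page : document) {
                merged.add(new Page(cloner.cloneOnce(page.resources)));
            }
        }
        return merged;
    }
}
```

In this model, two pages from the same input that share one Resources object end up sharing one clone, while a same-named Resources object from a different input gets a separate clone.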

pdfMerger App produces Garbage
--

Key: PDFBOX-1511
URL: https://issues.apache.org/jira/browse/PDFBOX-1511
Project: PDFBox
Issue Type: Bug
Components: Utilities
Affects Versions: 1.7.1
Environment: Win XP; Windows Server 2008 R2; java version 1.6.0_21,
Reporter: Michael Huber
Attachments: 1.pdf, 2.pdf, PDFMergerUtility.java, PdfRenderer.java,
targetPdfMergeJava.pdf, targetPdfMergeUtilityApp.pdf

The pdfbox utility pdfMerger produces a merged document containing garbage. All
merged PDF files are included, but their strings are corrupted.
The source PDF files are created with Graphviz and are readable without error
or disturbance both with Acrobat X and the pdfbox pdfDebug utility.
Another astounding thing is that a hand-coded merger using the PDFMergerUtility
class works fine when run within Eclipse Juno but creates the same garbage when
run from the command line (please see the attached source).
I checked everything that comes to mind to find the differences, e.g. Java
version, encoding/codepage issues, memory settings, and found nothing.

--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Updated] (PDFBOX-1511) pdfMerger App produces Garbage

2014-07-16 Thread Kirk Haines (JIRA)

[
https://issues.apache.org/jira/browse/PDFBOX-1511?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Kirk Haines updated PDFBOX-1511:


Attachment: PDFMergerUtility.java.diff

Diff version of changes.


[jira] [Comment Edited] (PDFBOX-1511) pdfMerger App produces Garbage

2013-07-31 Thread Kirk Haines (JIRA)

[
https://issues.apache.org/jira/browse/PDFBOX-1511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13721174#comment-13721174
]

Kirk Haines edited comment on PDFBOX-1511 at 7/31/13 7:00 PM:
--

I have also experienced this (Windows 7, Java 1.6.0_35-b10 64-bit) in PDFBox
1.7.1 thru the current trunk. I tried Maruan's suggestion and it resolved the
issue, at the expense of creating unnecessary duplicate resources. However, it
did not create extra copies of these resources for each page. Once a Resources
object was cloned the first time, it was reused. Consequently there is only
one copy of the Resources from each input file, with a reference to the
appropriate Resources object on each page. My documents did not have existing
page level Resources, so I am not sure how Maruan's suggestion would work in
those cases. Creating a PageGroup to hold all pages from a given input
document may be a better option to avoid this issue.

I noticed that the corruption affected pages from the second and later input
documents: their formatting was preserved, but many letters in the text were
substituted (all 'd' replaced by 'f', all 'y' replaced by 'd', etc.). I also
found that the degree of corruption depended on how similar the beginning text
content of each input document was. When there was a common header in the
documents being merged, there were only a few substitutions. When a document
was merged with itself, there were no errors. When the document headers were
very different, the resulting text was undecipherable garbage. This made me
suspect a problem with a compression dictionary, with the dictionary from the
first file being used on subsequent files. At first I thought this dictionary
was in the Flate compression applied to the stream, but found that it was in
the CMap of a font resource. Both documents used the same name for the font,
so the PDFMerger retained only the copy from the first PDF in the merged PDF.
When pages from subsequent input documents referenced the font, they used the
CMap from the first input document, resulting in various degrees of garbled
text. Lesson learned: font resources may have content that depends on the
strings they were used to display.
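
To make the name collision concrete, here is a toy model of it. Everything in it is an assumption for illustration: a font is reduced to a code-to-character map standing in for its CMap, codes are handed out in order of first use in the text (one plausible way a generator could build a subset font, not something confirmed for these files), and none of the names are PDFBox API. Both inputs call their font "F1", and the merge keeps only the first entry registered under that name.

```java
import java.util.HashMap;
import java.util.Map;

class FontNameCollisionSketch {

    // Assumed encoding scheme: each new character gets the next free code,
    // so the mapping depends on the text the font was built for.
    static Map<Character, Integer> codesFor(String text) {
        Map<Character, Integer> codes = new HashMap<>();
        for (char ch : text.toCharArray()) {
            if (!codes.containsKey(ch)) {
                codes.put(ch, codes.size() + 1);
            }
        }
        return codes;
    }

    // The inverse map plays the role of the font's CMap (code -> character).
    static Map<Integer, Character> cmapFor(String text) {
        Map<Integer, Character> cmap = new HashMap<>();
        for (Map.Entry<Character, Integer> e : codesFor(text).entrySet()) {
            cmap.put(e.getValue(), e.getKey());
        }
        return cmap;
    }

    // Merges two single-font "documents" into one name-keyed resource map
    // and returns what the second document's text looks like afterwards.
    static String mergeByNameAndDecode(String first, String second) {
        Map<String, Map<Integer, Character>> shared = new HashMap<>();
        shared.putIfAbsent("F1", cmapFor(first));
        shared.putIfAbsent("F1", cmapFor(second)); // ignored: "F1" is already taken

        Map<Character, Integer> codes = codesFor(second); // how the second text was encoded
        Map<Integer, Character> cmapInUse = shared.get("F1"); // the first document's CMap
        StringBuilder shown = new StringBuilder();
        for (char ch : second.toCharArray()) {
            Character decoded = cmapInUse.get(codes.get(ch));
            shown.append(decoded != null ? decoded : '?');
        }
        return shown.toString();
    }
}
```

In this model, `mergeByNameAndDecode("dog", "day")` returns "dog", merging a text with itself returns it unchanged, and "header one" followed by "header two" yields "header on?": the more the beginnings agree, the fewer substitutions appear.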


[jira] [Commented] (PDFBOX-1511) pdfMerger App produces Garbage

2013-07-26 Thread Kirk Haines (JIRA)

[
https://issues.apache.org/jira/browse/PDFBOX-1511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13721174#comment-13721174
]

Kirk Haines commented on PDFBOX-1511:
-

I have also experienced this (Windows 7, Java 1.6.0_35-b10 64-bit) in PDFBox
1.7.1 thru the current trunk. I tried Maruan's suggestion and it resolved the
issue, at the expense of creating unnecessary duplicate resources. I had
noticed that the corruption in subsequent documents resulted in those pages
having their formatting preserved, but the text content had many letters
substituted (all 'd' replaced by 'f', all 'y' replaced by 'd', etc.) I also
found that the degree of corruption depended on how similar the beginning text
content of each input document was. When there was a common header in the
documents being merged, there were only a few substitutions. When it was
merging a document with itself, there were no errors. When the document header
was very different, the resulting text was undecipherable garbage. This made
me suspect that it may be a problem with the deflate compression being applied
to the stream. I thought that it might be using the (compression) dictionary
from the first document and copying the physical bytes from the source document
rather than reading the logical bytes and allowing the deflate filter in
the context of the destination document to re-encode them.
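
The distinction between physical (encoded) and logical (decoded) stream bytes in this hypothesis can be sketched with `java.util.zip`. This is a minimal illustration assuming a plain zlib stream with no preset dictionary; the class and method names are made up for the example and it is not PDFBox code. Under that assumption each encoded stream decodes on its own, without any state from another stream or document.

```java
import java.io.ByteArrayOutputStream;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

class FlateBytesSketch {

    // Logical bytes -> physical bytes (zlib/deflate, no preset dictionary).
    static byte[] deflate(byte[] logical) {
        Deflater deflater = new Deflater();
        deflater.setInput(logical);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[256];
        while (!deflater.finished()) {
            int n = deflater.deflate(buffer);
            out.write(buffer, 0, n);
        }
        deflater.end();
        return out.toByteArray();
    }

    // Physical bytes -> logical bytes, using a fresh Inflater each time,
    // so nothing is carried over from any previously decoded stream.
    static byte[] inflate(byte[] physical) {
        Inflater inflater = new Inflater();
        inflater.setInput(physical);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[256];
        try {
            while (!inflater.finished()) {
                int n = inflater.inflate(buffer);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new IllegalStateException("truncated or dictionary-dependent stream");
                }
                out.write(buffer, 0, n);
            }
        } catch (DataFormatException e) {
            throw new IllegalStateException("not a valid zlib stream", e);
        } finally {
            inflater.end();
        }
        return out.toByteArray();
    }
}
```

With these helpers, the physical bytes of a stream taken from one input and copied verbatim next to a stream from another input still inflate back to their original logical bytes.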

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
