[jira] [Commented] (PDFBOX-3142) PDFMergerUtility with scratch file generates result with blank pages for certain source files.

2015-12-02 Thread Jim deVos (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-3142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15036699#comment-15036699
 ] 

Jim deVos commented on PDFBOX-3142:
---

Quick update: I rewrote the above test using PDFBox 2.0.0-RC2. The test passes 
and the result pdf doesn't have the blank pages I'm seeing when using 1.8.10. 
The API looks pretty straightforward, but please let me know if I'm not 
actually utilizing a scratch file:

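{code}
// imports assumed for PDFBox 2.0.x with JUnit 4 and Hamcrest
import java.io.File;
import java.io.IOException;
import org.apache.pdfbox.io.MemoryUsageSetting;
import org.apache.pdfbox.multipdf.PDFMergerUtility;
import org.junit.Test;
import static org.hamcrest.MatcherAssert.assertThat;
import static org.hamcrest.Matchers.greaterThan;
import static org.hamcrest.Matchers.is;

// ROOT_DIR, coverpage and document are assumed to be File fields of the test class
@Test
public void testMergeWithScratchFiles() throws IOException {
    // buffer everything in temp files under ROOT_DIR instead of main memory
    MemoryUsageSetting settings =
            MemoryUsageSetting.setupTempFileOnly().setTempDir(ROOT_DIR);
    File result = new File(ROOT_DIR, "result.pdf");
    PDFMergerUtility ut = new PDFMergerUtility();
    ut.addSource(coverpage);
    ut.addSource(document);
    ut.setDestinationFileName(result.getCanonicalPath());
    ut.mergeDocuments(settings);
    assertThat(result.length(), is(greaterThan(document.length())));
}
{code}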



> PDFMergerUtility with scratch file generates result with blank pages for 
> certain source files.
> --
>
> Key: PDFBOX-3142
> URL: https://issues.apache.org/jira/browse/PDFBOX-3142
> Project: PDFBox
>  Issue Type: Bug
>  Components: Utilities
>Affects Versions: 1.8.10
> Environment: Ubuntu 14.04.3, java 1.8.0_66
>Reporter: Jim deVos
>
> My team uses PDFMergerUtility to attach cover pages to various pdfs. We 
> recently tried utilizing a scratch file (e.g. 
> PDFMergerUtility.mergeDocumentsNonSeq()) to cut down on the amount of RAM we 
> are using. This approach works for the majority of pdfs in our system, but 
> some files cause the merger utility to generate resultant pdfs with a blank 
> page. Specifically, the result pdf contains a blank page after the coverpage 
> instead of the first page of the second document sent to the merger utility.
>
> Whenever this problem occurs, we see the following line in our logs:
>
> {{org.apache.pdfbox.pdfparser.NonSequentialPDFParser - Can't find the object 
> 52 0 (origin offset 7187557)}}
>
> I'll try to attach/link an example pdf soon, but currently I don't have 
> permission to redistribute any files that exhibit the problem. However, 
> here's a simple snippet that replicates the problem - it's pretty 
> straightforward.
>
> {code}
> @Test
> public void testMergeNonSeq() throws IOException, COSVisitorException {
>     destinationPdf = new File(TMP_FOLDER, "result-nonseq.pdf");
>     PDFMergerUtility ut = new PDFMergerUtility();
>     // scratch file for the merge; RandomAccess and RandomAccessFile are the
>     // org.apache.pdfbox.io types, not java.io
>     RandomAccess ram = new RandomAccessFile(File.createTempFile("mergeram", ".bin"), "rw");
>     ut.addSource(coverpagePdf);
>     ut.addSource(documentPdf);
>     ut.setDestinationFileName(destinationPdf.getCanonicalPath());
>     ut.mergeDocumentsNonSeq(ram);
>
>     // the only automated way we have to tell that something went wrong is
>     // to check the size of the result
>     assertThat("destination pdf should be larger than the original pdf",
>             destinationPdf.length(), is(greaterThan(documentPdf.length())));
> }
> {code}
> Note we only see this problem with PDFMergerUtility.mergeDocumentsNonSeq().  
> Using PDFMergerUtility.mergeDocuments() does not exhibit any problems.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org
For additional commands, e-mail: dev-h...@pdfbox.apache.org



[jira] [Commented] (PDFBOX-3142) PDFMergerUtility with scratch file generates result with blank pages for certain source files.

2015-12-01 Thread Jim deVos (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-3142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15034433#comment-15034433
 ] 

Jim deVos commented on PDFBOX-3142:
---

Andreas - thanks for your reply. I'll run these source documents through a pdf 
validator to see what it finds. Individually they open just fine (i.e. no 
blank pages) in various pdf viewers, but I suspect that these viewers are 
pretty forgiving with non-compliant files. On that note, it would be nice to 
know of a way to anticipate whether a file will cause these issues before 
attempting to merge it with a coverpage. At the moment all I see is the 
aforementioned error message in the log, but I don't see a way to interrogate 
the parser to see if it has issues with the file.

As for v2, that's a good suggestion. I'll rewrite my test for 2.0.0 and report 
the results.
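
One way to approach the pre-check asked about above, sketched in plain Java with no PDFBox dependency: read the startxref pointer near the end of the file and confirm that it lands on something that looks like a cross-reference section. The class and method names (XrefPrecheck, declaredXrefOffset, looksLikeXref, startXrefLooksSane) are hypothetical and not part of any PDFBox API, and the 2048-byte tail window is an arbitrary assumption. This is only a heuristic: the logged error concerns a single object offset (52 0), which this check would not catch unless every entry in the xref table were verified as well.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;

/** Heuristic pre-merge check (hypothetical helper, not a PDFBox class). */
final class XrefPrecheck {

    /** Returns the offset written after the last "startxref" keyword, or -1 if none is found. */
    static long declaredXrefOffset(byte[] tail) {
        String text = new String(tail, StandardCharsets.ISO_8859_1);
        int idx = text.lastIndexOf("startxref");
        if (idx < 0) {
            return -1;
        }
        int i = idx + "startxref".length();
        // skip the whitespace between the keyword and the number
        while (i < text.length() && Character.isWhitespace(text.charAt(i))) {
            i++;
        }
        int start = i;
        while (i < text.length() && text.charAt(i) >= '0' && text.charAt(i) <= '9') {
            i++;
        }
        if (i == start) {
            return -1; // keyword present but no digits after it
        }
        try {
            return Long.parseLong(text.substring(start, i));
        } catch (NumberFormatException e) {
            return -1; // too many digits to be a real offset
        }
    }

    /** True if the bytes start with "xref" (classic table) or "N G obj" (xref stream object). */
    static boolean looksLikeXref(byte[] atOffset) {
        String s = new String(atOffset, StandardCharsets.ISO_8859_1);
        return s.startsWith("xref") || s.matches("(?s)\\d+\\s+\\d+\\s+obj.*");
    }

    /** Reads the file tail, resolves startxref, and inspects the bytes it points at. */
    static boolean startXrefLooksSane(String path) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(path, "r")) {
            long len = raf.length();
            int tailLen = (int) Math.min(len, 2048); // assumed window size
            byte[] tail = new byte[tailLen];
            raf.seek(len - tailLen);
            raf.readFully(tail);
            long offset = declaredXrefOffset(tail);
            if (offset < 0 || offset >= len) {
                return false; // missing pointer, or pointer past end of file
            }
            byte[] head = new byte[(int) Math.min(32, len - offset)];
            raf.seek(offset);
            raf.readFully(head);
            return looksLikeXref(head);
        }
    }
}
```

A file that fails this check is a candidate for the plain mergeDocuments() path or for repair before merging; a file that passes may still contain bad object offsets.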




[jira] [Updated] (PDFBOX-3142) PDFMergerUtility with scratch file generates result with blank pages for certain source files.

2015-11-30 Thread Jim deVos (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-3142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jim deVos updated PDFBOX-3142:
--
Summary: PDFMergerUtility with scratch file generates result with blank 
pages for certain source files.  (was: PDFMergerUtility generates result with 
blank pages for certain source files.)




[jira] [Updated] (PDFBOX-3142) PDFMergerUtility with scratch file generates result with blank pages for certain source files.

2015-11-30 Thread Jim deVos (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-3142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jim deVos updated PDFBOX-3142:
--
Description: 
My team uses PDFMergerUtility to attach cover pages to various pdfs. We 
recently tried utilizing a scratch file (e.g. 
PDFMergerUtility.mergeDocumentsNonSeq()) to cut down on the amount of RAM we 
are using. This approach works for the majority of pdfs in our system, but 
some files cause the merger utility to generate resultant pdfs with a blank 
page. Specifically, the result pdf contains a blank page after the coverpage 
instead of the first page of the second document sent to the merger utility.

Whenever this problem occurs, we see the following line in our logs:

{{org.apache.pdfbox.pdfparser.NonSequentialPDFParser - Can't find the object 52 
0 (origin offset 7187557)}}

I'll try to attach/link an example pdf soon, but currently I don't have 
permission to redistribute any files that exhibit the problem. However, 
here's a simple snippet that replicates the problem - it's pretty 
straightforward.

{code}
@Test
public void testMergeNonSeq() throws IOException, COSVisitorException {
    destinationPdf = new File(TMP_FOLDER, "result-nonseq.pdf");
    PDFMergerUtility ut = new PDFMergerUtility();
    RandomAccess ram = new RandomAccessFile(File.createTempFile("mergeram", ".bin"), "rw");
    ut.addSource(coverpagePdf);
    ut.addSource(documentPdf);
    ut.setDestinationFileName(destinationPdf.getCanonicalPath());

    ut.mergeDocumentsNonSeq(ram);

    // the only automated way we have to tell that something went wrong is
    // to check the size of the result
    assertThat("destination pdf should be larger than the original pdf",
            destinationPdf.length(), is(greaterThan(documentPdf.length())));
}
{code}

Note we only see this problem with PDFMergerUtility.mergeDocumentsNonSeq().  
Using PDFMergerUtility.mergeDocuments() does not exhibit any problems.


  was:
My team uses PDFMergerUtility to attach cover pages to various pdfs. We 
recently tried utilizing a scratch file (e.g. 
PDFMergerUtility.mergeNonSeq()) to cut down on the amount of RAM we are using. 
This approach works for the majority of pdfs in our system, but some files 
cause the merger utility to generate resultant pdfs with a blank page. 
Specifically, the result pdf contains a blank page after the coverpage instead 
of the first page of the second document sent to the merger utility.

Whenever this problem occurs, we see the following line in our logs:

{{org.apache.pdfbox.pdfparser.NonSequentialPDFParser - Can't find the object 52 
0 (origin offset 7187557)}}

I'll try to attach/link an example pdf soon, but currently I don't have 
permission to redistribute any files that exhibit the problem. However, 
here's a simple snippet that replicates the problem - it's pretty 
straightforward.

{code}
@Test
public void testMergeNonSeq() throws IOException, COSVisitorException {
    destinationPdf = new File(TMP_FOLDER, "result-nonseq.pdf");
    PDFMergerUtility ut = new PDFMergerUtility();
    RandomAccess ram = new RandomAccessFile(File.createTempFile("mergeram", ".bin"), "rw");
    ut.addSource(coverpagePdf);
    ut.addSource(documentPdf);
    ut.setDestinationFileName(destinationPdf.getCanonicalPath());
    ut.mergeDocumentsNonSeq(ram);

    // the only automated way we have to tell that something went wrong is
    // to check the size of the result
    assertThat("destination pdf should be larger than the original pdf",
            destinationPdf.length(), is(greaterThan(documentPdf.length())));
}
{code}






[jira] [Created] (PDFBOX-3142) PDFMergerUtility generates result with blank pages for certain source files.

2015-11-30 Thread Jim deVos (JIRA)
Jim deVos created PDFBOX-3142:
-

 Summary: PDFMergerUtility generates result with blank pages for 
certain source files.
 Key: PDFBOX-3142
 URL: https://issues.apache.org/jira/browse/PDFBOX-3142
 Project: PDFBox
  Issue Type: Bug
  Components: Utilities
Affects Versions: 1.8.10
 Environment: Ubuntu 14.04.3, java 1.8.0_66
Reporter: Jim deVos


My team uses PDFMergerUtility to attach cover pages to various pdfs. We 
recently tried utilizing a scratch file (e.g. 
PDFMergerUtility.mergeNonSeq()) to cut down on the amount of RAM we are using. 
This approach works for the majority of pdfs in our system, but some files 
cause the merger utility to generate resultant pdfs with a blank page. 
Specifically, the result pdf contains a blank page after the coverpage instead 
of the first page of the second document sent to the merger utility.

Whenever this problem occurs, we see the following line in our logs:

{{org.apache.pdfbox.pdfparser.NonSequentialPDFParser - Can't find the object 52 
0 (origin offset 7187557)}}

I'll try to attach/link an example pdf soon, but currently I don't have 
permission to redistribute any files that exhibit the problem. However, 
here's a simple snippet that replicates the problem - it's pretty 
straightforward.

{code}
@Test
public void testMergeNonSeq() throws IOException, COSVisitorException {
    destinationPdf = new File(TMP_FOLDER, "result-nonseq.pdf");
    PDFMergerUtility ut = new PDFMergerUtility();
    RandomAccess ram = new RandomAccessFile(File.createTempFile("mergeram", ".bin"), "rw");
    ut.addSource(coverpagePdf);
    ut.addSource(documentPdf);
    ut.setDestinationFileName(destinationPdf.getCanonicalPath());
    ut.mergeDocumentsNonSeq(ram);

    // the only automated way we have to tell that something went wrong is
    // to check the size of the result
    assertThat("destination pdf should be larger than the original pdf",
            destinationPdf.length(), is(greaterThan(documentPdf.length())));
}
{code}









[jira] [Commented] (PDFBOX-2847) mergeDocumentsNonSeq does not utilize scratchFile

2015-06-30 Thread Jim deVos (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-2847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14608756#comment-14608756
 ] 

Jim deVos commented on PDFBOX-2847:
---

Great to hear, thanks so much for the quick response.

> mergeDocumentsNonSeq does not utilize scratchFile
> -------------------------------------------------
>
>                 Key: PDFBOX-2847
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-2847
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Utilities
>    Affects Versions: 1.8.9
>            Reporter: Jim deVos
>            Assignee: Tilman Hausherr
>         Attachments: pdfbox1.8.x.patch
>
>
> I noticed when merging relatively large pdfs (1 GB) that the heap would grow 
> by at least the same amount until complete, even when I call 
> mergeDocumentsNonSeq() and supply a read/write scratch file.
> When I looked at the source for mergeDocuments(boolean, RandomAccess), it looks 
> like the scratch file is never used.
> {code}
> private void mergeDocuments(boolean isNonSeq, RandomAccess scratchFile)
>         throws IOException, COSVisitorException
> {
>     //...snip
>     if (isNonSeq)
>     {
>         // null is passed here instead of the scratchFile parameter
>         source = PDDocument.loadNonSeq(sourceFile, null);
>     }
>     //...snip
> }
> {code}






[jira] [Comment Edited] (PDFBOX-2847) mergeDocumentsNonSeq does not utilize scratchFile

2015-06-29 Thread Jim deVos (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-2847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14607143#comment-14607143
 ] 

Jim deVos edited comment on PDFBOX-2847 at 6/30/15 4:41 AM:


I attached a patch for r1686617 on /branches/1.8  that has a potential fix for 
the bug as well as a tweaked version of the mergeDocumentsNonSeq() test.  
Apologies in advance - this is my first bug report and my first patch so I 
probably screwed up many aspects of the protocol / code style for these types 
of reports.


was (Author: jtdevos):
This is a patch for r1686617 on /branches/1.8  that has a potential fix for the 
bug as well as a tweaked version of the mergeDocumentsNonSeq() test.  Apologies 
in advance - this is my first bug report and my first patch so I probably 
screwed up many aspects of the protocol / code style for these types of reports.




[jira] [Created] (PDFBOX-2847) mergeDocumentsNonSeq does not utilize scratchFile

2015-06-29 Thread Jim deVos (JIRA)
Jim deVos created PDFBOX-2847:
-

 Summary: mergeDocumentsNonSeq does not utilize scratchFile
 Key: PDFBOX-2847
 URL: https://issues.apache.org/jira/browse/PDFBOX-2847
 Project: PDFBox
  Issue Type: Bug
  Components: Utilities
Affects Versions: 1.8.9
Reporter: Jim deVos


I noticed when merging relatively large pdfs (1 GB) that the heap would grow by 
at least the same amount until complete, even when I call 
mergeDocumentsNonSeq() and supply a read/write scratch file.

When I looked at the source for mergeDocuments(boolean, RandomAccess), it looks 
like the scratch file is never used.
{code}

private void mergeDocuments(boolean isNonSeq, RandomAccess scratchFile)
        throws IOException, COSVisitorException
{
    //...snip

    if (isNonSeq)
    {
        // null is passed here instead of the scratchFile parameter
        source = PDDocument.loadNonSeq(sourceFile, null);
    }
    //...snip
}
{code}
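
The heap growth described above can be observed with a small probe around the merge call. This is a minimal sketch using only java.lang.Runtime; HeapProbe and peakHeapGrowth are hypothetical names, the 5 ms sampling interval is an arbitrary assumption, and System.gc() is only a hint to the JVM, so the numbers are rough.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Rough heap probe (hypothetical helper, not part of PDFBox). */
final class HeapProbe {

    /** Bytes of heap currently in use, as reported by the JVM. */
    static long usedHeapBytes() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    /** Runs the task while sampling used heap; returns the sampled peak minus the baseline. */
    static long peakHeapGrowth(Runnable task) {
        System.gc(); // a hint only; the baseline is approximate
        final long baseline = usedHeapBytes();
        final AtomicLong peak = new AtomicLong(baseline);

        // background thread that records the highest heap usage it sees
        Thread sampler = new Thread(() -> {
            while (!Thread.currentThread().isInterrupted()) {
                peak.accumulateAndGet(usedHeapBytes(), Math::max);
                try {
                    Thread.sleep(5); // assumed sampling interval
                } catch (InterruptedException e) {
                    return;
                }
            }
        });
        sampler.setDaemon(true);
        sampler.start();
        try {
            task.run();
        } finally {
            // take one last sample, then stop the sampler
            peak.accumulateAndGet(usedHeapBytes(), Math::max);
            sampler.interrupt();
            try {
                sampler.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return peak.get() - baseline;
    }
}
```

Wrapping the mergeDocumentsNonSeq() call in the Runnable (with its checked exceptions caught inside) should show growth on the order of the input size if the scratch file is being ignored, and noticeably less if it is actually used.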








[jira] [Updated] (PDFBOX-2847) mergeDocumentsNonSeq does not utilize scratchFile

2015-06-29 Thread Jim deVos (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-2847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jim deVos updated PDFBOX-2847:
--
Attachment: pdfbox1.8.x.patch

This is a patch for r1686617 on /branches/1.8  that has a potential fix for the 
bug as well as a tweaked version of the mergeDocumentsNonSeq() test.  Apologies 
in advance - this is my first bug report and my first patch so I probably 
screwed up many aspects of the protocol / code style for these types of reports.
