[jira] [Created] (PDFBOX-3957) Pages lost
Tilman Hausherr created PDFBOX-3957:
---------------------------------------

             Summary: Pages lost
                 Key: PDFBOX-3957
                 URL: https://issues.apache.org/jira/browse/PDFBOX-3957
             Project: PDFBox
          Issue Type: Bug
          Components: Parsing
    Affects Versions: 2.0.8
            Reporter: Tilman Hausherr

The file from PDFBOX-3785 has only 1 page, but should have 11.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org
For additional commands, e-mail: dev-h...@pdfbox.apache.org
[jira] [Updated] (PDFBOX-3956) Truncated pdf can't be repaired anymore
[ https://issues.apache.org/jira/browse/PDFBOX-3956?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andreas Lehmkühler updated PDFBOX-3956: --- Labels: regression (was: ) > Truncated pdf can't be repaired anymore > --- > > Key: PDFBOX-3956 > URL: https://issues.apache.org/jira/browse/PDFBOX-3956 > Project: PDFBox > Issue Type: Bug > Components: Parsing >Reporter: Andreas Lehmkühler >Assignee: Andreas Lehmkühler > Labels: regression > Attachments: N4CWPKUJDQIPQ6YLYKPLOZJBTSYGQEYR.pdf > > > [~talli...@mitre.org]'s last test run reveals another minor regression. The > truncated file attached to the ticket can't be read anymore. The issue is > related to the changes made in PDFBOX-3936. > The following exception is thrown > {code} > java.io.IOException: Error reading stream, expected='endstream' actual='' at > offset 271297 > org.apache.pdfbox.pdfparser.COSParser.parseCOSStream(COSParser.java:1078) > org.apache.pdfbox.pdfparser.COSParser.parseFileObject(COSParser.java:821) > > org.apache.pdfbox.pdfparser.COSParser.parseObjectDynamically(COSParser.java:782) > > org.apache.pdfbox.pdfparser.COSParser.parseObjectDynamically(COSParser.java:713) > org.apache.pdfbox.pdfparser.COSParser.parseDictObjects(COSParser.java:673) > org.apache.pdfbox.pdfparser.PDFParser.initialParse(PDFParser.java:205) > org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:240) > org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1012) > org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:950) > > org.apache.pdfbox.debugger.PDFDebugger.parseDocument(PDFDebugger.java:1312) > org.apache.pdfbox.debugger.PDFDebugger.readPDFFile(PDFDebugger.java:1233) > org.apache.pdfbox.debugger.PDFDebugger.readPDFFile(PDFDebugger.java:1218) > org.apache.pdfbox.debugger.PDFDebugger.main(PDFDebugger.java:1209) > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org For additional commands, e-mail: 
dev-h...@pdfbox.apache.org
[jira] [Updated] (PDFBOX-3956) Truncated pdf can't be repaired anymore
[ https://issues.apache.org/jira/browse/PDFBOX-3956?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andreas Lehmkühler updated PDFBOX-3956: --- Description: [~talli...@mitre.org]'s last test run reveals another minor regression. The truncated file attached to the ticket can't be read anymore. The issue is related to the changes made in PDFBOX-3936. The following exception is thrown {code} java.io.IOException: Error reading stream, expected='endstream' actual='' at offset 271297 org.apache.pdfbox.pdfparser.COSParser.parseCOSStream(COSParser.java:1078) org.apache.pdfbox.pdfparser.COSParser.parseFileObject(COSParser.java:821) org.apache.pdfbox.pdfparser.COSParser.parseObjectDynamically(COSParser.java:782) org.apache.pdfbox.pdfparser.COSParser.parseObjectDynamically(COSParser.java:713) org.apache.pdfbox.pdfparser.COSParser.parseDictObjects(COSParser.java:673) org.apache.pdfbox.pdfparser.PDFParser.initialParse(PDFParser.java:205) org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:240) org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1012) org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:950) org.apache.pdfbox.debugger.PDFDebugger.parseDocument(PDFDebugger.java:1312) org.apache.pdfbox.debugger.PDFDebugger.readPDFFile(PDFDebugger.java:1233) org.apache.pdfbox.debugger.PDFDebugger.readPDFFile(PDFDebugger.java:1218) org.apache.pdfbox.debugger.PDFDebugger.main(PDFDebugger.java:1209) {code} was:[~talli...@mitre.org]'s last test run reveals another minor regression. The truncated file attached to the ticket can't be read anymore. > Truncated pdf can't be repaired anymore > --- > > Key: PDFBOX-3956 > URL: https://issues.apache.org/jira/browse/PDFBOX-3956 > Project: PDFBox > Issue Type: Bug > Components: Parsing >Reporter: Andreas Lehmkühler >Assignee: Andreas Lehmkühler > Attachments: N4CWPKUJDQIPQ6YLYKPLOZJBTSYGQEYR.pdf > > > [~talli...@mitre.org]'s last test run reveals another minor regression. 
The > truncated file attached to the ticket can't be read anymore. The issue is > related to the changes made in PDFBOX-3936. > The following exception is thrown > {code} > java.io.IOException: Error reading stream, expected='endstream' actual='' at > offset 271297 > org.apache.pdfbox.pdfparser.COSParser.parseCOSStream(COSParser.java:1078) > org.apache.pdfbox.pdfparser.COSParser.parseFileObject(COSParser.java:821) > > org.apache.pdfbox.pdfparser.COSParser.parseObjectDynamically(COSParser.java:782) > > org.apache.pdfbox.pdfparser.COSParser.parseObjectDynamically(COSParser.java:713) > org.apache.pdfbox.pdfparser.COSParser.parseDictObjects(COSParser.java:673) > org.apache.pdfbox.pdfparser.PDFParser.initialParse(PDFParser.java:205) > org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:240) > org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1012) > org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:950) > > org.apache.pdfbox.debugger.PDFDebugger.parseDocument(PDFDebugger.java:1312) > org.apache.pdfbox.debugger.PDFDebugger.readPDFFile(PDFDebugger.java:1233) > org.apache.pdfbox.debugger.PDFDebugger.readPDFFile(PDFDebugger.java:1218) > org.apache.pdfbox.debugger.PDFDebugger.main(PDFDebugger.java:1209) > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org For additional commands, e-mail: dev-h...@pdfbox.apache.org
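The exception above is raised when the parser reaches the end of the truncated file before it finds the `endstream` keyword. The following is a minimal sketch of the difference between a strict and a lenient reader in that situation. It is not the PDFBox code and not the fix for this ticket; the class `TruncatedStreamSketch`, the method `readStreamBody` and the `lenient` flag are hypothetical names used only for illustration.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical illustration only; not part of PDFBox.
class TruncatedStreamSketch {
    private static final byte[] ENDSTREAM =
            "endstream".getBytes(StandardCharsets.ISO_8859_1);

    // Returns the bytes between 'start' and the next "endstream" keyword.
    // If the keyword is missing (truncated file), lenient mode returns
    // everything up to the end of the data, strict mode throws.
    // Note: a real parser would also deal with the end-of-line marker
    // before the keyword; this sketch leaves it in the returned body.
    static byte[] readStreamBody(byte[] data, int start, boolean lenient)
            throws IOException {
        int end = indexOf(data, ENDSTREAM, start);
        if (end >= 0) {
            return Arrays.copyOfRange(data, start, end);
        }
        if (lenient) {
            return Arrays.copyOfRange(data, start, data.length);
        }
        throw new IOException(
                "Error reading stream, expected='endstream' actual='' at offset "
                        + data.length);
    }

    // Naive byte-array search; returns -1 if 'needle' does not occur.
    private static int indexOf(byte[] haystack, byte[] needle, int from) {
        outer:
        for (int i = Math.max(from, 0); i <= haystack.length - needle.length; i++) {
            for (int j = 0; j < needle.length; j++) {
                if (haystack[i + j] != needle[j]) {
                    continue outer;
                }
            }
            return i;
        }
        return -1;
    }
}
```

Whether salvaging a partial stream is desirable depends on the caller; the sketch only shows where such a decision could be made.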
[jira] [Created] (PDFBOX-3956) Truncated pdf can't be repaired anymore
Andreas Lehmkühler created PDFBOX-3956:
------------------------------------------

             Summary: Truncated pdf can't be repaired anymore
                 Key: PDFBOX-3956
                 URL: https://issues.apache.org/jira/browse/PDFBOX-3956
             Project: PDFBox
          Issue Type: Bug
          Components: Parsing
            Reporter: Andreas Lehmkühler
            Assignee: Andreas Lehmkühler
         Attachments: N4CWPKUJDQIPQ6YLYKPLOZJBTSYGQEYR.pdf

[~talli...@mitre.org]'s last test run reveals another minor regression. The truncated file attached to the ticket can't be read anymore.
Re: Apache PDFBox October 2017 report due
On 09.10.2017 at 21:30, Tilman Hausherr wrote:
> On 09.10.2017 at 19:48, Andreas Lehmkuehler wrote:
>> ## Releases:
>>
>> - Last release was 2.0.6 on Mon May 15 2017
>
> That one is wrong, last was 2.0.7

Yes, of course. I trusted the automatically generated report template. Thanks for the pointer. I'm going to correct that before posting the report.

Andreas

> Tilman
RE: 2.0.8?
Apologies, but I haven't gotten around to adding the exception columns in the content comparison tables, including the "page count diffs" table. I also haven't had a chance to read/make sense of the reports yet, but I wanted to share asap.

Best,
Tim

-----Original Message-----
From: Allison, Timothy B.
Sent: Monday, October 9, 2017 4:26 PM
To: dev@pdfbox.apache.org
Subject: RE: 2.0.8?

Thank you, Andreas, for fixing the slow parse on the corrupt file so quickly!

Reports are here: http://162.242.228.174/reports/pdfbox_2_0_7_Vs_2_0_8_take3.tar.gz

-----Original Message-----
From: Allison, Timothy B. [mailto:talli...@mitre.org]
Sent: Monday, October 9, 2017 8:02 AM
To: dev@pdfbox.apache.org
Subject: RE: 2.0.8?

Starting process now.

-----Original Message-----
From: Andreas Lehmkuehler [mailto:andr...@lehmi.de]
Sent: Sunday, October 8, 2017 10:12 AM
To: dev@pdfbox.apache.org
Subject: Re: 2.0.8?

On 03.10.2017 at 15:38, Allison, Timothy B. wrote:
>> And yes, we need another regressions run if possible
>
> Sounds good. Will do once I hear that we're good to go. Thank you!

We are good now.

@Tim: Could you please re-run your test to see how good we are?

TIA,
Andreas
RE: 2.0.8?
Thank you, Andreas, for fixing the slow parse on the corrupt file so quickly!

Reports are here: http://162.242.228.174/reports/pdfbox_2_0_7_Vs_2_0_8_take3.tar.gz

-----Original Message-----
From: Allison, Timothy B. [mailto:talli...@mitre.org]
Sent: Monday, October 9, 2017 8:02 AM
To: dev@pdfbox.apache.org
Subject: RE: 2.0.8?

Starting process now.

-----Original Message-----
From: Andreas Lehmkuehler [mailto:andr...@lehmi.de]
Sent: Sunday, October 8, 2017 10:12 AM
To: dev@pdfbox.apache.org
Subject: Re: 2.0.8?

On 03.10.2017 at 15:38, Allison, Timothy B. wrote:
>> And yes, we need another regressions run if possible
>
> Sounds good. Will do once I hear that we're good to go. Thank you!

We are good now.

@Tim: Could you please re-run your test to see how good we are?

TIA,
Andreas
Re: Apache PDFBox October 2017 report due
On 09.10.2017 at 19:48, Andreas Lehmkuehler wrote:
> ## Releases:
>
> - Last release was 2.0.6 on Mon May 15 2017

That one is wrong, last was 2.0.7

Tilman
[jira] [Commented] (PDFBOX-3953) StackOverflowError in org.apache.pdfbox.pdmodel.PDPageTree.getKids
[ https://issues.apache.org/jira/browse/PDFBOX-3953?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16197527#comment-16197527 ]

Tilman Hausherr commented on PDFBOX-3953:
-----------------------------------------

Please retry with a current snapshot
https://repository.apache.org/content/groups/snapshots/org/apache/pdfbox/pdfbox-app/2.0.8-SNAPSHOT/

If it still happens, please attach your PDF file. This may be a recursion/loop in the page tree.

> StackOverflowError in org.apache.pdfbox.pdmodel.PDPageTree.getKids
> ------------------------------------------------------------------
>
>                 Key: PDFBOX-3953
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-3953
>             Project: PDFBox
>          Issue Type: Bug
>          Components: PDModel
>    Affects Versions: 2.0.7
>            Reporter: Jorge Spinsanti
>
> I got a StackOverflowError in
> org.apache.pdfbox.pdmodel.PDPageTree.getKids(PDPageTree.java:135)
> {code}
> java.lang.StackOverflowError
>     at org.apache.pdfbox.pdmodel.PDPageTree.getKids(PDPageTree.java:135)
>     at org.apache.pdfbox.pdmodel.PDPageTree.access$200(PDPageTree.java:38)
>     at org.apache.pdfbox.pdmodel.PDPageTree$PageIterator.enqueueKids(PDPageTree.java:166)
>     at org.apache.pdfbox.pdmodel.PDPageTree$PageIterator.enqueueKids(PDPageTree.java:169)
>     at org.apache.pdfbox.pdmodel.PDPageTree$PageIterator.enqueueKids(PDPageTree.java:169)
>     at org.apache.pdfbox.pdmodel.PDPageTree$PageIterator.enqueueKids(PDPageTree.java:169)
>     at org.apache.pdfbox.pdmodel.PDPageTree$PageIterator.enqueueKids(PDPageTree.java:169)
>     at org.apache.pdfbox.pdmodel.PDPageTree$PageIterator.enqueueKids(PDPageTree.java:169)
>     at org.apache.pdfbox.pdmodel.PDPageTree$PageIterator.enqueueKids(PDPageTree.java:169)
>     at org.apache.pdfbox.pdmodel.PDPageTree$PageIterator.enqueueKids(PDPageTree.java:169)
>     at org.apache.pdfbox.pdmodel.PDPageTree$PageIterator.enqueueKids(PDPageTree.java:169)
>     at org.apache.pdfbox.pdmodel.PDPageTree$PageIterator.enqueueKids(PDPageTree.java:169)
>     at org.apache.pdfbox.pdmodel.PDPageTree$PageIterator.enqueueKids(PDPageTree.java:169)
>     ...
> {code}
[jira] [Commented] (PDFBOX-3954) IllegalArgumentException on PDPageTree
[ https://issues.apache.org/jira/browse/PDFBOX-3954?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16197512#comment-16197512 ]

Tilman Hausherr commented on PDFBOX-3954:
-----------------------------------------

Please retry with a current snapshot
https://repository.apache.org/content/groups/snapshots/org/apache/pdfbox/pdfbox-app/2.0.8-SNAPSHOT/

If it doesn't work, please attach the PDF file.

> IllegalArgumentException on PDPageTree
> --------------------------------------
>
>                 Key: PDFBOX-3954
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-3954
>             Project: PDFBox
>          Issue Type: Bug
>          Components: PDModel
>            Reporter: Jorge Spinsanti
>
> I got the following stacktrace:
> {code}
> java.lang.IllegalArgumentException: root cannot be null
>     at org.apache.pdfbox.pdmodel.PDPageTree.<init>(PDPageTree.java:75)
>     at org.apache.pdfbox.pdmodel.PDDocumentCatalog.getPages(PDDocumentCatalog.java:129)
>     at org.apache.pdfbox.multipdf.PDFMergerUtility.appendDocument(PDFMergerUtility.java:562)
>     at org.apache.pdfbox.multipdf.PDFMergerUtility.mergeDocuments(PDFMergerUtility.java:265)
> {code}
[jira] [Resolved] (PDFBOX-3955) new -- very slow processing on truncated PDF
[ https://issues.apache.org/jira/browse/PDFBOX-3955?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andreas Lehmkühler resolved PDFBOX-3955.
----------------------------------------
    Resolution: Fixed

I've fixed the very slow performance. Object streams were parsed multiple times when rebuilding the trailer dictionary. But my fix doesn't "heal" the truncated PDF. It's corrupt and can't be fixed as the root object is missing.

[~talli...@mitre.org] Thanks for the finding.

> new -- very slow processing on truncated PDF
> ---------------------------------------------
>
>                 Key: PDFBOX-3955
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-3955
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Parsing
>            Reporter: Tim Allison
>            Assignee: Andreas Lehmkühler
>             Fix For: 2.0.8, 3.0.0
>
> In the latest regression run with PDFBox's 2.x branch, we're now getting very
> slow processing on a truncated PDF with PDFBox app's {{ExtractText}}:
> http://162.242.228.174/docs/truncated_pdfs/commoncrawl2_likely_broken/7K/7KK53NK5PVKOUGDSQ4FK6542BNPC4SWB
> Turns out this is not an infinite loop. After 4.5 minutes, {{ExtractText}}
> eventually ended with:
> {noformat}
> Exception in thread "main" java.io.IOException: Missing root object specification in trailer.
>     at org.apache.pdfbox.pdfparser.COSParser.parseTrailerValuesDynamically(COSParser.java:2508)
>     at org.apache.pdfbox.pdfparser.PDFParser.initialParse(PDFParser.java:193)
>     at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:240)
>     at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1012)
>     at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:950)
>     at org.apache.pdfbox.tools.ExtractText.startExtraction(ExtractText.java:192)
>     at org.apache.pdfbox.tools.ExtractText.main(ExtractText.java:82)
>     at org.apache.pdfbox.tools.PDFBox.main(PDFBox.java:60)
> {noformat}
> .
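The resolution above says that object streams were parsed multiple times while the trailer dictionary was rebuilt. The general remedy for that kind of slowdown is to remember what has already been parsed. The sketch below shows the idea only; it is not the change committed for this ticket, and `ObjectStreamCache`, `objectsOf` and the `parser` function are hypothetical names introduced for illustration.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical illustration only; not part of PDFBox.
class ObjectStreamCache {
    // Parsed contents, keyed by the object number of the object stream.
    private final Map<Long, List<String>> parsed = new HashMap<>();
    // Stand-in for the expensive parse step (assumed, supplied by the caller).
    private final Function<Long, List<String>> parser;
    private int parseCalls = 0;

    ObjectStreamCache(Function<Long, List<String>> parser) {
        this.parser = parser;
    }

    // Returns the objects contained in the given object stream,
    // running the parser at most once per stream.
    List<String> objectsOf(long streamObjectNumber) {
        List<String> cached = parsed.get(streamObjectNumber);
        if (cached == null) {
            parseCalls++;
            cached = parser.apply(streamObjectNumber);
            parsed.put(streamObjectNumber, cached);
        }
        return cached;
    }

    // Number of times the expensive parse step actually ran.
    int parseCalls() {
        return parseCalls;
    }
}
```

With such a cache, repeated lookups during a brute-force rebuild cost one parse per object stream instead of one parse per lookup.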
[jira] [Updated] (PDFBOX-3955) new -- very slow processing on truncated PDF
[ https://issues.apache.org/jira/browse/PDFBOX-3955?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andreas Lehmkühler updated PDFBOX-3955: --- Fix Version/s: 3.0.0 2.0.8 > new -- very slow processing on truncated PDF > > > Key: PDFBOX-3955 > URL: https://issues.apache.org/jira/browse/PDFBOX-3955 > Project: PDFBox > Issue Type: Bug > Components: Parsing >Reporter: Tim Allison >Assignee: Andreas Lehmkühler > Fix For: 2.0.8, 3.0.0 > > > In the latest regression run with PDFBox's 2.x branch, we're now getting very > slow processing on a truncated PDF with PDFBox app's {{ExtractText}}: > http://162.242.228.174/docs/truncated_pdfs/commoncrawl2_likely_broken/7K/7KK53NK5PVKOUGDSQ4FK6542BNPC4SWB > Turns out this is not an infinite loop. After 4.5 minutes, {{ExtractText}} > eventually ended with: > {noformat} > Exception in thread "main" java.io.IOException: Missing root object > specification in trailer. > at > org.apache.pdfbox.pdfparser.COSParser.parseTrailerValuesDynamically(COSParser.java:2508) > at > org.apache.pdfbox.pdfparser.PDFParser.initialParse(PDFParser.java:193) > at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:240) > at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1012) > at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:950) > at > org.apache.pdfbox.tools.ExtractText.startExtraction(ExtractText.java:192) > at org.apache.pdfbox.tools.ExtractText.main(ExtractText.java:82) > at org.apache.pdfbox.tools.PDFBox.main(PDFBox.java:60) > {noformat} > . -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org For additional commands, e-mail: dev-h...@pdfbox.apache.org
[jira] [Commented] (PDFBOX-3955) new -- very slow processing on truncated PDF
[ https://issues.apache.org/jira/browse/PDFBOX-3955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16197430#comment-16197430 ] ASF subversion and git services commented on PDFBOX-3955: - Commit 1811590 from [~lehmi] in branch 'pdfbox/trunk' [ https://svn.apache.org/r1811590 ] PDFBOX-3955: don't parse object stream multiple times > new -- very slow processing on truncated PDF > > > Key: PDFBOX-3955 > URL: https://issues.apache.org/jira/browse/PDFBOX-3955 > Project: PDFBox > Issue Type: Bug > Components: Parsing >Reporter: Tim Allison >Assignee: Andreas Lehmkühler > Fix For: 2.0.8, 3.0.0 > > > In the latest regression run with PDFBox's 2.x branch, we're now getting very > slow processing on a truncated PDF with PDFBox app's {{ExtractText}}: > http://162.242.228.174/docs/truncated_pdfs/commoncrawl2_likely_broken/7K/7KK53NK5PVKOUGDSQ4FK6542BNPC4SWB > Turns out this is not an infinite loop. After 4.5 minutes, {{ExtractText}} > eventually ended with: > {noformat} > Exception in thread "main" java.io.IOException: Missing root object > specification in trailer. > at > org.apache.pdfbox.pdfparser.COSParser.parseTrailerValuesDynamically(COSParser.java:2508) > at > org.apache.pdfbox.pdfparser.PDFParser.initialParse(PDFParser.java:193) > at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:240) > at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1012) > at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:950) > at > org.apache.pdfbox.tools.ExtractText.startExtraction(ExtractText.java:192) > at org.apache.pdfbox.tools.ExtractText.main(ExtractText.java:82) > at org.apache.pdfbox.tools.PDFBox.main(PDFBox.java:60) > {noformat} > . -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org For additional commands, e-mail: dev-h...@pdfbox.apache.org
[jira] [Commented] (PDFBOX-3955) new -- very slow processing on truncated PDF
[ https://issues.apache.org/jira/browse/PDFBOX-3955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16197427#comment-16197427 ] ASF subversion and git services commented on PDFBOX-3955: - Commit 1811589 from [~lehmi] in branch 'pdfbox/branches/2.0' [ https://svn.apache.org/r1811589 ] PDFBOX-3955: don't parse object stream multiple times > new -- very slow processing on truncated PDF > > > Key: PDFBOX-3955 > URL: https://issues.apache.org/jira/browse/PDFBOX-3955 > Project: PDFBox > Issue Type: Bug > Components: Parsing >Reporter: Tim Allison >Assignee: Andreas Lehmkühler > > In the latest regression run with PDFBox's 2.x branch, we're now getting very > slow processing on a truncated PDF with PDFBox app's {{ExtractText}}: > http://162.242.228.174/docs/truncated_pdfs/commoncrawl2_likely_broken/7K/7KK53NK5PVKOUGDSQ4FK6542BNPC4SWB > Turns out this is not an infinite loop. After 4.5 minutes, {{ExtractText}} > eventually ended with: > {noformat} > Exception in thread "main" java.io.IOException: Missing root object > specification in trailer. > at > org.apache.pdfbox.pdfparser.COSParser.parseTrailerValuesDynamically(COSParser.java:2508) > at > org.apache.pdfbox.pdfparser.PDFParser.initialParse(PDFParser.java:193) > at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:240) > at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1012) > at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:950) > at > org.apache.pdfbox.tools.ExtractText.startExtraction(ExtractText.java:192) > at org.apache.pdfbox.tools.ExtractText.main(ExtractText.java:82) > at org.apache.pdfbox.tools.PDFBox.main(PDFBox.java:60) > {noformat} > . -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org For additional commands, e-mail: dev-h...@pdfbox.apache.org
[jira] [Updated] (PDFBOX-3955) new infinite loop on truncated PDF
[ https://issues.apache.org/jira/browse/PDFBOX-3955?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison updated PDFBOX-3955: Description: In the latest regression run with PDFBox's 2.x branch, we're now getting very slow processing on a truncated PDF with PDFBox app's {{ExtractText}}: http://162.242.228.174/docs/truncated_pdfs/commoncrawl2_likely_broken/7K/7KK53NK5PVKOUGDSQ4FK6542BNPC4SWB Turns out this is not an infinite loop. After 4.5 minutes, {{ExtractText}} eventually ended with: {noformat} Exception in thread "main" java.io.IOException: Missing root object specification in trailer. at org.apache.pdfbox.pdfparser.COSParser.parseTrailerValuesDynamically(COSParser.java:2508) at org.apache.pdfbox.pdfparser.PDFParser.initialParse(PDFParser.java:193) at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:240) at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1012) at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:950) at org.apache.pdfbox.tools.ExtractText.startExtraction(ExtractText.java:192) at org.apache.pdfbox.tools.ExtractText.main(ExtractText.java:82) at org.apache.pdfbox.tools.PDFBox.main(PDFBox.java:60) {noformat} . was: In the latest regression run with PDFBox's 2.x branch, we're now getting an infinite loop on a truncated PDF with PDFBox app's {{ExtractText}}: http://162.242.228.174/docs/truncated_pdfs/commoncrawl2_likely_broken/7K/7KK53NK5PVKOUGDSQ4FK6542BNPC4SWB . > new infinite loop on truncated PDF > -- > > Key: PDFBOX-3955 > URL: https://issues.apache.org/jira/browse/PDFBOX-3955 > Project: PDFBox > Issue Type: Bug > Components: Parsing >Reporter: Tim Allison >Assignee: Andreas Lehmkühler > > In the latest regression run with PDFBox's 2.x branch, we're now getting very > slow processing on a truncated PDF with PDFBox app's {{ExtractText}}: > http://162.242.228.174/docs/truncated_pdfs/commoncrawl2_likely_broken/7K/7KK53NK5PVKOUGDSQ4FK6542BNPC4SWB > Turns out this is not an infinite loop. 
After 4.5 minutes, {{ExtractText}} > eventually ended with: > {noformat} > Exception in thread "main" java.io.IOException: Missing root object > specification in trailer. > at > org.apache.pdfbox.pdfparser.COSParser.parseTrailerValuesDynamically(COSParser.java:2508) > at > org.apache.pdfbox.pdfparser.PDFParser.initialParse(PDFParser.java:193) > at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:240) > at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1012) > at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:950) > at > org.apache.pdfbox.tools.ExtractText.startExtraction(ExtractText.java:192) > at org.apache.pdfbox.tools.ExtractText.main(ExtractText.java:82) > at org.apache.pdfbox.tools.PDFBox.main(PDFBox.java:60) > {noformat} > . -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org For additional commands, e-mail: dev-h...@pdfbox.apache.org
[jira] [Updated] (PDFBOX-3955) new -- very slow processing on truncated PDF
[ https://issues.apache.org/jira/browse/PDFBOX-3955?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison updated PDFBOX-3955: Summary: new -- very slow processing on truncated PDF (was: new infinite loop on truncated PDF) > new -- very slow processing on truncated PDF > > > Key: PDFBOX-3955 > URL: https://issues.apache.org/jira/browse/PDFBOX-3955 > Project: PDFBox > Issue Type: Bug > Components: Parsing >Reporter: Tim Allison >Assignee: Andreas Lehmkühler > > In the latest regression run with PDFBox's 2.x branch, we're now getting very > slow processing on a truncated PDF with PDFBox app's {{ExtractText}}: > http://162.242.228.174/docs/truncated_pdfs/commoncrawl2_likely_broken/7K/7KK53NK5PVKOUGDSQ4FK6542BNPC4SWB > Turns out this is not an infinite loop. After 4.5 minutes, {{ExtractText}} > eventually ended with: > {noformat} > Exception in thread "main" java.io.IOException: Missing root object > specification in trailer. > at > org.apache.pdfbox.pdfparser.COSParser.parseTrailerValuesDynamically(COSParser.java:2508) > at > org.apache.pdfbox.pdfparser.PDFParser.initialParse(PDFParser.java:193) > at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:240) > at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1012) > at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:950) > at > org.apache.pdfbox.tools.ExtractText.startExtraction(ExtractText.java:192) > at org.apache.pdfbox.tools.ExtractText.main(ExtractText.java:82) > at org.apache.pdfbox.tools.PDFBox.main(PDFBox.java:60) > {noformat} > . -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org For additional commands, e-mail: dev-h...@pdfbox.apache.org
[jira] [Assigned] (PDFBOX-3955) new infinite loop on truncated PDF
[ https://issues.apache.org/jira/browse/PDFBOX-3955?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andreas Lehmkühler reassigned PDFBOX-3955: -- Assignee: Andreas Lehmkühler > new infinite loop on truncated PDF > -- > > Key: PDFBOX-3955 > URL: https://issues.apache.org/jira/browse/PDFBOX-3955 > Project: PDFBox > Issue Type: Bug > Components: Parsing >Reporter: Tim Allison >Assignee: Andreas Lehmkühler > > In the latest regression run with PDFBox's 2.x branch, we're now getting an > infinite loop on a truncated PDF with PDFBox app's {{ExtractText}}: > http://162.242.228.174/docs/truncated_pdfs/commoncrawl2_likely_broken/7K/7KK53NK5PVKOUGDSQ4FK6542BNPC4SWB > . -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org For additional commands, e-mail: dev-h...@pdfbox.apache.org
Apache PDFBox October 2017 report due
Hi,

find attached a quick draft of the board report we're expected to submit this month. It's based upon the report template, which can be found at [1].

Any further comments, objections or additions?

## Description:

- the Apache PDFBox library is an open source Java tool for working with PDF documents.

## Issues:

- there are no issues requiring board attention at this time.

## Activity:

- we are working on fixing bugs in 2.0.x
- there are some small improvements as well
- we already started a vote for 2.0.8, but cancelled it due to a regression
- Levigo donated their JBig2 ImageIO plugin to PDFBox and we are almost done with the integration

## Health report:

- there is a steady stream of contributions, bug reports and questions on the mailing lists

## PMC changes:

- Currently 18 PMC members.
- New PMC members:
  - Jörg Henne was added as a committer on Mon Oct 09 2017

## Committer base changes:

- Currently 18 committers.
- New committer:
  - Jörg Henne was added as a committer on Mon Oct 09 2017

## Releases:

- Last release was 2.0.6 on Mon May 15 2017

## JIRA activity:

- 92 JIRA tickets created in the last 3 months
- 95 JIRA tickets closed/resolved in the last 3 months

Andreas

[1] https://reporter.apache.org/?pdfbox
[jira] [Created] (PDFBOX-3955) new infinite loop on truncated PDF
Tim Allison created PDFBOX-3955:
-----------------------------------

             Summary: new infinite loop on truncated PDF
                 Key: PDFBOX-3955
                 URL: https://issues.apache.org/jira/browse/PDFBOX-3955
             Project: PDFBox
          Issue Type: Bug
          Components: Parsing
            Reporter: Tim Allison

In the latest regression run with PDFBox's 2.x branch, we're now getting an infinite loop on a truncated PDF with PDFBox app's {{ExtractText}}:
http://162.242.228.174/docs/truncated_pdfs/commoncrawl2_likely_broken/7K/7KK53NK5PVKOUGDSQ4FK6542BNPC4SWB
.
New PDFBox Committer
Hi,

I'm happy to announce that the PDFBox PMC has decided to offer committership in Apache PDFBox to Jörg Henne. He has accepted the offer and should have his committer-bits ready by now. Like all other committers, Jörg has joined the PMC as well.

BR
Andreas Lehmkühler
[jira] [Updated] (PDFBOX-3953) StackOverflowError in org.apache.pdfbox.pdmodel.PDPageTree.getKids
[ https://issues.apache.org/jira/browse/PDFBOX-3953?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jorge Spinsanti updated PDFBOX-3953:

    Description:
I got a StackOverflowError in org.apache.pdfbox.pdmodel.PDPageTree.getKids(PDPageTree.java:135)
{code}
java.lang.StackOverflowError
    at org.apache.pdfbox.pdmodel.PDPageTree.getKids(PDPageTree.java:135)
    at org.apache.pdfbox.pdmodel.PDPageTree.access$200(PDPageTree.java:38)
    at org.apache.pdfbox.pdmodel.PDPageTree$PageIterator.enqueueKids(PDPageTree.java:166)
    at org.apache.pdfbox.pdmodel.PDPageTree$PageIterator.enqueueKids(PDPageTree.java:169)
    at org.apache.pdfbox.pdmodel.PDPageTree$PageIterator.enqueueKids(PDPageTree.java:169)
    at org.apache.pdfbox.pdmodel.PDPageTree$PageIterator.enqueueKids(PDPageTree.java:169)
    at org.apache.pdfbox.pdmodel.PDPageTree$PageIterator.enqueueKids(PDPageTree.java:169)
    at org.apache.pdfbox.pdmodel.PDPageTree$PageIterator.enqueueKids(PDPageTree.java:169)
    at org.apache.pdfbox.pdmodel.PDPageTree$PageIterator.enqueueKids(PDPageTree.java:169)
    at org.apache.pdfbox.pdmodel.PDPageTree$PageIterator.enqueueKids(PDPageTree.java:169)
    at org.apache.pdfbox.pdmodel.PDPageTree$PageIterator.enqueueKids(PDPageTree.java:169)
    at org.apache.pdfbox.pdmodel.PDPageTree$PageIterator.enqueueKids(PDPageTree.java:169)
    at org.apache.pdfbox.pdmodel.PDPageTree$PageIterator.enqueueKids(PDPageTree.java:169)
    ...
{code}

  was:
I got a StackOverflowError in org.apache.pdfbox.pdmodel.PDPageTree.getKids(PDPageTree.java:135)
{code}
java.lang.StackOverflowError
    at org.apache.pdfbox.pdmodel.PDPageTree.getKids(PDPageTree.java:135)
    at org.apache.pdfbox.pdmodel.PDPageTree.access$200(PDPageTree.java:38)
    at org.apache.pdfbox.pdmodel.PDPageTree$PageIterator.enqueueKids(PDPageTree.java:166)
    at org.apache.pdfbox.pdmodel.PDPageTree$PageIterator.enqueueKids(PDPageTree.java:169)
    at org.apache.pdfbox.pdmodel.PDPageTree$PageIterator.enqueueKids(PDPageTree.java:169)
    at org.apache.pdfbox.pdmodel.PDPageTree$PageIterator.enqueueKids(PDPageTree.java:169)
    at org.apache.pdfbox.pdmodel.PDPageTree$PageIterator.enqueueKids(PDPageTree.java:169)
    at org.apache.pdfbox.pdmodel.PDPageTree$PageIterator.enqueueKids(PDPageTree.java:169)
    at org.apache.pdfbox.pdmodel.PDPageTree$PageIterator.enqueueKids(PDPageTree.java:169)
    at org.apache.pdfbox.pdmodel.PDPageTree$PageIterator.enqueueKids(PDPageTree.java:169)
    at org.apache.pdfbox.pdmodel.PDPageTree$PageIterator.enqueueKids(PDPageTree.java:169)
    at org.apache.pdfbox.pdmodel.PDPageTree$PageIterator.enqueueKids(PDPageTree.java:169)
    at org.apache.pdfbox.pdmodel.PDPageTree$PageIterator.enqueueKids(PDPageTree.java:169)
{code}

> StackOverflowError in org.apache.pdfbox.pdmodel.PDPageTree.getKids
> ------------------------------------------------------------------
>
>                 Key: PDFBOX-3953
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-3953
>             Project: PDFBox
>          Issue Type: Bug
>          Components: PDModel
>    Affects Versions: 2.0.7
>            Reporter: Jorge Spinsanti

--
This message was sent by Atlassian JIRA (v6.4.14#64029)

To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org
For additional commands, e-mail: dev-h...@pdfbox.apache.org
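The repeated enqueueKids frames in the trace above would be consistent with a page tree whose /Kids entries loop back on themselves or nest very deeply, although the report does not confirm the cause. As an illustration only, the sketch below walks a tree iteratively and remembers visited nodes, so a cycle cannot recurse without bound. PageNode and CycleSafeWalk are hypothetical stand-ins, not PDFBox classes, and this is not presented as the project's fix.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for a page tree node; not a PDFBox class.
final class PageNode {
    final String name;
    final List<PageNode> kids = new ArrayList<>();

    PageNode(String name) {
        this.name = name;
    }

    boolean isLeaf() {
        return kids.isEmpty();
    }
}

// Hypothetical helper: iterative, cycle-safe traversal.
final class CycleSafeWalk {
    // Collects leaf names in document order, visiting each node at most once.
    static List<String> leaves(PageNode root) {
        List<String> out = new ArrayList<>();
        // Identity-based set: two nodes are "the same" only if they are the same object.
        Set<PageNode> seen = Collections.newSetFromMap(new IdentityHashMap<>());
        Deque<PageNode> stack = new ArrayDeque<>();
        stack.push(root);
        while (!stack.isEmpty()) {
            PageNode node = stack.pop();
            if (!seen.add(node)) {
                continue; // already visited: a cycle or a shared node
            }
            if (node.isLeaf()) {
                out.add(node.name);
                continue;
            }
            // Push kids in reverse so they are popped in their original order.
            for (int i = node.kids.size() - 1; i >= 0; i--) {
                stack.push(node.kids.get(i));
            }
        }
        return out;
    }
}
```

With a root whose second kid contains a reference back to the root, CycleSafeWalk.leaves(root) terminates and returns each leaf name once. Using an explicit stack instead of recursion also keeps deep but acyclic trees from exhausting the call stack.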
[jira] [Created] (PDFBOX-3954) IllegalArgumentException on PDPageTree
Jorge Spinsanti created PDFBOX-3954:

             Summary: IllegalArgumentException on PDPageTree
                 Key: PDFBOX-3954
                 URL: https://issues.apache.org/jira/browse/PDFBOX-3954
             Project: PDFBox
          Issue Type: Bug
          Components: PDModel
            Reporter: Jorge Spinsanti

I got the following stacktrace:
{code}
java.lang.IllegalArgumentException: root cannot be null
    at org.apache.pdfbox.pdmodel.PDPageTree.<init>(PDPageTree.java:75)
    at org.apache.pdfbox.pdmodel.PDDocumentCatalog.getPages(PDDocumentCatalog.java:129)
    at org.apache.pdfbox.multipdf.PDFMergerUtility.appendDocument(PDFMergerUtility.java:562)
    at org.apache.pdfbox.multipdf.PDFMergerUtility.mergeDocuments(PDFMergerUtility.java:265)
{code}
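The "root cannot be null" message is raised in the PDPageTree constructor reached through PDDocumentCatalog.getPages, which suggests that one of the documents being merged had no page tree root. That reading is an assumption, since the report attaches no sample file. A caller-side guard could look roughly like the sketch below, which separates inputs that have a page root from those that do not before any merge is attempted. InputDoc and MergeGuard are hypothetical names, not PDFBox API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a parsed input document; not a PDFBox class.
final class InputDoc {
    final String name;
    final Object pagesRoot; // null models a catalog without a page tree root

    InputDoc(String name, Object pagesRoot) {
        this.name = name;
        this.pagesRoot = pagesRoot;
    }
}

// Hypothetical helper: checks inputs up front instead of failing mid-merge.
final class MergeGuard {
    // Returns the names of inputs that have a page root.
    // Names of inputs without one are appended to 'skipped'.
    static List<String> usable(List<InputDoc> inputs, List<String> skipped) {
        List<String> ok = new ArrayList<>();
        for (InputDoc doc : inputs) {
            if (doc.pagesRoot == null) {
                skipped.add(doc.name);
            } else {
                ok.add(doc.name);
            }
        }
        return ok;
    }
}
```

Reporting the skipped names lets the caller say which input was damaged, which the bare IllegalArgumentException in the trace does not.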
[jira] [Created] (PDFBOX-3953) StackOverflowError in org.apache.pdfbox.pdmodel.PDPageTree.getKids
Jorge Spinsanti created PDFBOX-3953:

             Summary: StackOverflowError in org.apache.pdfbox.pdmodel.PDPageTree.getKids
                 Key: PDFBOX-3953
                 URL: https://issues.apache.org/jira/browse/PDFBOX-3953
             Project: PDFBox
          Issue Type: Bug
          Components: PDModel
    Affects Versions: 2.0.7
            Reporter: Jorge Spinsanti

I got a StackOverflowError in org.apache.pdfbox.pdmodel.PDPageTree.getKids(PDPageTree.java:135)
{code}
java.lang.StackOverflowError
    at org.apache.pdfbox.pdmodel.PDPageTree.getKids(PDPageTree.java:135)
    at org.apache.pdfbox.pdmodel.PDPageTree.access$200(PDPageTree.java:38)
    at org.apache.pdfbox.pdmodel.PDPageTree$PageIterator.enqueueKids(PDPageTree.java:166)
    at org.apache.pdfbox.pdmodel.PDPageTree$PageIterator.enqueueKids(PDPageTree.java:169)
    at org.apache.pdfbox.pdmodel.PDPageTree$PageIterator.enqueueKids(PDPageTree.java:169)
    at org.apache.pdfbox.pdmodel.PDPageTree$PageIterator.enqueueKids(PDPageTree.java:169)
    at org.apache.pdfbox.pdmodel.PDPageTree$PageIterator.enqueueKids(PDPageTree.java:169)
    at org.apache.pdfbox.pdmodel.PDPageTree$PageIterator.enqueueKids(PDPageTree.java:169)
    at org.apache.pdfbox.pdmodel.PDPageTree$PageIterator.enqueueKids(PDPageTree.java:169)
    at org.apache.pdfbox.pdmodel.PDPageTree$PageIterator.enqueueKids(PDPageTree.java:169)
    at org.apache.pdfbox.pdmodel.PDPageTree$PageIterator.enqueueKids(PDPageTree.java:169)
    at org.apache.pdfbox.pdmodel.PDPageTree$PageIterator.enqueueKids(PDPageTree.java:169)
    at org.apache.pdfbox.pdmodel.PDPageTree$PageIterator.enqueueKids(PDPageTree.java:169)
{code}
RE: 2.0.8?
Starting process now.

-----Original Message-----
From: Andreas Lehmkuehler [mailto:andr...@lehmi.de]
Sent: Sunday, October 8, 2017 10:12 AM
To: dev@pdfbox.apache.org
Subject: Re: 2.0.8?

On 03.10.2017 at 15:38, Allison, Timothy B. wrote:
>
>> And yes, we need another regressions run if possible
>
> Sounds good. Will do once I hear that we're good to go. Thank you!

We are good now.

@Tim: Could you please re-run your test to see how good we are?

TIA,
Andreas