[ https://issues.apache.org/jira/browse/PDFBOX-546?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
thomas menzel updated PDFBOX-546:
---------------------------------

    Description:
SYMPTOM
this is the full stack trace that i'm observing with the PDF file i attached @
https://issues.apache.org/jira/secure/attachment/12422836/PwC-Tech-Forecast-Spring-2009.pdf

Exception in thread "main" org.apache.pdfbox.exceptions.WrappedIOException
    at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:237)
    at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:860)
    at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:825)
    at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:750)
    at org.apache.pdfbox.ExtractText.main(ExtractText.java:173)
Caused by: java.util.NoSuchElementException
    at java.util.AbstractList$Itr.next(Unknown Source)
    at org.apache.pdfbox.pdfparser.PDFXrefStreamParser.parse(PDFXrefStreamParser.java:115)
    at org.apache.pdfbox.cos.COSDocument.parseXrefStreams(COSDocument.java:538)
    at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:203)
    ... 4 more

STEPS
cmdline: org.apache.pdfbox.ExtractText on the file

i found the exception also @ PDFBOX-533
(https://issues.apache.org/jira/browse/PDFBOX-533?focusedCommentId=12756825&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12756825)
but am not sure if this is the same case or not, as this file is a lot smaller and i have so little clue about the internal structure of PDF that i can't even follow any of the comments. sorry.

see also https://issues.apache.org/jira/browse/PDFBOX-186 how i got to create this issue.

  was:
SYMPTOM
this is the full stack trace that i'm observing with the PDF file @

Exception in thread "main" org.apache.pdfbox.exceptions.WrappedIOException
    at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:237)
    at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:860)
    at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:825)
    at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:750)
    at org.apache.pdfbox.ExtractText.main(ExtractText.java:173)
Caused by: java.util.NoSuchElementException
    at java.util.AbstractList$Itr.next(Unknown Source)
    at org.apache.pdfbox.pdfparser.PDFXrefStreamParser.parse(PDFXrefStreamParser.java:115)
    at org.apache.pdfbox.cos.COSDocument.parseXrefStreams(COSDocument.java:538)
    at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:203)
    ... 4 more

STEPS
cmdline: org.apache.pdfbox.ExtractText on the file

i found the exception also @ PDFBOX-533
(https://issues.apache.org/jira/browse/PDFBOX-533?focusedCommentId=12756825&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12756825)
but am not sure if this is the same case or not.

see also https://issues.apache.org/jira/browse/PDFBOX-186 how i got to create this issue.

> [parser] .PDFXrefStreamParser.parse fails with java.util.NoSuchElementException
> -------------------------------------------------------------------------------
>
>                 Key: PDFBOX-546
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-546
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Parsing, Text extraction
>    Affects Versions: 0.8.0-incubator
>            Reporter: thomas menzel
>
> SYMPTOM
> this is the full stack trace that i'm observing with the PDF file i attached @
> https://issues.apache.org/jira/secure/attachment/12422836/PwC-Tech-Forecast-Spring-2009.pdf
>
> Exception in thread "main" org.apache.pdfbox.exceptions.WrappedIOException
>     at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:237)
>     at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:860)
>     at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:825)
>     at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:750)
>     at org.apache.pdfbox.ExtractText.main(ExtractText.java:173)
> Caused by: java.util.NoSuchElementException
>     at java.util.AbstractList$Itr.next(Unknown Source)
>     at org.apache.pdfbox.pdfparser.PDFXrefStreamParser.parse(PDFXrefStreamParser.java:115)
>     at org.apache.pdfbox.cos.COSDocument.parseXrefStreams(COSDocument.java:538)
>     at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:203)
>     ... 4 more
>
> STEPS
> cmdline: org.apache.pdfbox.ExtractText on the file
>
> i found the exception also @ PDFBOX-533
> (https://issues.apache.org/jira/browse/PDFBOX-533?focusedCommentId=12756825&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12756825)
> but am not sure if this is the same case or not, as this file is a lot smaller and i have so little clue about the internal structure of PDF that i can't even follow any of the comments. sorry.
>
> see also https://issues.apache.org/jira/browse/PDFBOX-186 how i got to create this issue.

--
This message is automatically generated by JIRA.