[ 
https://issues.apache.org/jira/browse/PDFBOX-546?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

thomas menzel updated PDFBOX-546:
---------------------------------

    Description: 
SYMPTOM
this is the full stack trace that i'm observing with the PDF file i attached @ 
https://issues.apache.org/jira/secure/attachment/12422836/PwC-Tech-Forecast-Spring-2009.pdf

Exception in thread "main" org.apache.pdfbox.exceptions.WrappedIOException
        at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:237)
        at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:860)
        at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:825)
        at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:750)
        at org.apache.pdfbox.ExtractText.main(ExtractText.java:173)
Caused by: java.util.NoSuchElementException
        at java.util.AbstractList$Itr.next(Unknown Source)
        at org.apache.pdfbox.pdfparser.PDFXrefStreamParser.parse(PDFXrefStreamParser.java:115)
        at org.apache.pdfbox.cos.COSDocument.parseXrefStreams(COSDocument.java:538)
        at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:203)
        ... 4 more
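
A note on the "Caused by" line: java.util.NoSuchElementException is what a JDK iterator throws when next() is called with no elements left. The sketch below reproduces only that JDK behaviour in isolation. The class name, the method names and the three-values-per-entry layout are hypothetical illustrations, not PDFBox code, and it is only an assumption that PDFXrefStreamParser.parse fails in a similar way.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

public class IteratorExhaustionSketch {

    // Hypothetical reader: assumes the values arrive in groups of three
    // and calls next() for each field without re-checking hasNext().
    static int sumTriplesUnchecked(List<Integer> values) {
        Iterator<Integer> it = values.iterator();
        int sum = 0;
        while (it.hasNext()) {
            sum += it.next(); // first field of the entry
            sum += it.next(); // second field: throws if the list ended early
            sum += it.next(); // third field: throws if the list ended early
        }
        return sum;
    }

    // Defensive check a caller could run first: true only if the list
    // holds a whole number of three-value entries.
    static boolean hasCompleteTriples(List<Integer> values) {
        return values.size() % 3 == 0;
    }

    public static void main(String[] args) {
        // Two complete entries: iteration finishes normally.
        System.out.println(sumTriplesUnchecked(Arrays.asList(1, 2, 3, 4, 5, 6)));

        // One complete entry plus a truncated one: next() runs off the end.
        try {
            sumTriplesUnchecked(Arrays.asList(1, 2, 3, 4));
        } catch (NoSuchElementException e) {
            System.out.println("short list: " + e.getClass().getName());
        }
    }
}
```

If the parser does read fixed-size groups like this, a file whose data is shorter than expected would produce exactly this exception type, but that remains to be confirmed against the attached PDF.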

STEPS
cmdline: org.apache.pdfbox.ExtractText on the file

i found the same exception in PDFBOX-533 
(https://issues.apache.org/jira/browse/PDFBOX-533?focusedCommentId=12756825&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12756825)
  but am not sure whether this is the same case, as this file is a lot smaller 
and i have so little clue about the internal structure of PDF that i cannot 
even follow any of the comments. sorry.

see also https://issues.apache.org/jira/browse/PDFBOX-186 for how i came to 
create this issue.

  was:
SYMPTOM
this is the full stack trace that i'm observing with the PDF file @ 

Exception in thread "main" org.apache.pdfbox.exceptions.WrappedIOException
        at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:237)
        at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:860)
        at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:825)
        at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:750)
        at org.apache.pdfbox.ExtractText.main(ExtractText.java:173)
Caused by: java.util.NoSuchElementException
        at java.util.AbstractList$Itr.next(Unknown Source)
        at org.apache.pdfbox.pdfparser.PDFXrefStreamParser.parse(PDFXrefStreamParser.java:115)
        at org.apache.pdfbox.cos.COSDocument.parseXrefStreams(COSDocument.java:538)
        at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:203)
        ... 4 more

STEPS
cmdline: org.apache.pdfbox.ExtractText on the file

i found the exception also @ PDFBOX-533 
(https://issues.apache.org/jira/browse/PDFBOX-533?focusedCommentId=12756825&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12756825)
  but am not sure if this is the same case or not.

see also https://issues.apache.org/jira/browse/PDFBOX-186 how i got to create 
this issue.


> [parser] .PDFXrefStreamParser.parse fails with java.util.NoSuchElementException
> -------------------------------------------------------------------------------
>
>                 Key: PDFBOX-546
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-546
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Parsing, Text extraction
>    Affects Versions: 0.8.0-incubator
>            Reporter: thomas menzel

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
