Hi!

I'm running Nutch v1.2 and am experiencing problems while trying to index
PDF documents. The error I receive is:

Error parsing: <docname>.pdf: failed(2,0): expected='endstream' actual='' 
org.apache.pdfbox.io.pushbackinputstr...@cbf92

I've inspected the security settings: printing, content copying and page
extraction are all allowed. The document properties show:

        - PDF Producer: Adobe PDF Library 9.9
        - PDF Version: 1.5 (Acrobat 5.x)

What might be the culprit here?

Thanks in advance!
/Peter
