Hi!
I'm running Nutch v1.2 and am experiencing problems while trying to index
PDF documents. The error I receive is:
Error parsing: <docname>.pdf: failed(2,0): expected='endstream' actual=''
org.apache.pdfbox.io.pushbackinputstr...@cbf92
I've inspected the document's security settings: printing, content copying,
and page extraction are all allowed. In the document properties I see:
- PDF Producer: Adobe PDF Library 9.9
- PDF Version: 1.5 (Acrobat 5.x)
What might be the culprit here?
Thanks in advance!
/Peter