Re: Unexpected end of ZLIB input stream when parsing pdf files

Ben Litchfield Wed, 29 Oct 2008 07:02:01 -0700

I have seen this sometimes when the PDF is encrypted as well.


Ben

Quoting Alexander Aristov <[EMAIL PROTECTED]>:

I suspect that Nutch did not download the full PDF. There is a setting in the
Nutch config file that truncates large files. Truncation is efficient for HTML
but can cause errors like this for other formats.

Check this setting and adjust accordingly.
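
The setting is not named above; it is most likely http.content.limit (or
file.content.limit for local files), which is an assumption on my part, so
verify it against your own nutch-default.xml. As a minimal sketch of why a
truncated download would produce this particular exception, the Java below
compresses some bytes with zlib (the compression behind PDF's FlateDecode
filter), cuts the stream at a byte limit, and tries to inflate it. The class
name, helper names, and the limit value are hypothetical and only for
illustration; this is not Nutch or PDFBox code.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;
import java.util.Random;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.InflaterInputStream;

public class TruncatedZlibDemo {

    /** Compresses raw bytes into a zlib (Deflate) stream. */
    static byte[] deflate(byte[] raw) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (DeflaterOutputStream dos = new DeflaterOutputStream(out)) {
            dos.write(raw);
        }
        return out.toByteArray();
    }

    /** Inflates a zlib stream to its end and returns the number of bytes produced. */
    static int inflate(byte[] compressed) throws IOException {
        byte[] buf = new byte[4096];
        int total = 0;
        try (InputStream in = new InflaterInputStream(new ByteArrayInputStream(compressed))) {
            int n;
            while ((n = in.read(buf)) != -1) {
                total += n;
            }
        }
        return total;
    }

    /** Returns true if inflating the data ends in an EOFException. */
    static boolean failsWithEof(byte[] compressed) {
        try {
            inflate(compressed);
            return false;
        } catch (EOFException e) {
            return true;   // the stream ended before the compressed data was complete
        } catch (IOException e) {
            return false;  // some other I/O problem, not the case shown here
        }
    }

    public static void main(String[] args) throws IOException {
        // Random bytes barely compress, so the stream stays larger than the limit.
        byte[] raw = new byte[200_000];
        new Random(42).nextBytes(raw);
        byte[] full = deflate(raw);

        // Hypothetical download limit in bytes, standing in for a crawler's content limit.
        int hypotheticalLimit = 65_536;
        byte[] cut = Arrays.copyOf(full, Math.min(hypotheticalLimit, full.length));

        System.out.println("complete stream fails with EOF:  " + failsWithEof(full));
        System.out.println("truncated stream fails with EOF: " + failsWithEof(cut));
    }
}
```

If the PDFs that fail are larger than the configured limit and smaller ones
parse fine, that points to truncation rather than a damaged or encrypted file.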

Alexander

2008/10/29 olivier_coface <[EMAIL PROTECTED]>


I got the following error when crawling PDF files (it happened on 2 PDF
files):

http://lyra:85/ExternalDocumentation/BusinessComponentApproach_Chapter2.pdf:
failed(2,0): Can't be handled as pdf document. java.io.EOFException: Unexpected end of ZLIB input stream

Any idea?



--
Best Regards
Alexander Aristov
