BTW, the PDF parser can only handle encrypted files if they were encrypted with an empty password.
Alex

2008/10/29 Ben Litchfield <[EMAIL PROTECTED]>

> I have seen this sometimes when the PDF is encrypted as well.
>
> Ben
>
> Quoting Alexander Aristov <[EMAIL PROTECTED]>:
>
>> I suspect that Nutch has not downloaded full pdf. There is a setting in the
>> nutch config file to truncate large files. It's efficient for html but might
>> cause such errors for other formats.
>>
>> Check this setting and adjust accordingly.
>>
>> Alexander
>>
>> 2008/10/29 olivier_coface <[EMAIL PROTECTED]>
>>
>>> I had the following error when crawling on pdf files (it happened on 2 pdf
>>> files):
>>>
>>> http://lyra:85/ExternalDocumentation/BusinessComponentApproach_Chapter2.pdf :
>>> failed(2,0): Can't be handled as pdf document. java.io.EOFException:
>>> Unexpected end of ZLIB input stream
>>>
>>> Any idea?
>>> --
>>> View this message in context:
>>> http://www.nabble.com/Unexpected-end-of-ZLIB-input-stream-when-parsing-pdf-files-tp20223893p20223893.html
>>> Sent from the Nutch - User mailing list archive at Nabble.com.
>>
>> --
>> Best Regards
>> Alexander Aristov

--
Best Regards
Alexander Aristov
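
One rough way to test the truncation theory from the thread is to look at the bytes that were actually fetched. A complete PDF normally ends with an `%%EOF` marker, so a download that was cut off will usually lack it. The sketch below is only a heuristic. The function name `check_pdf_bytes` is made up for illustration, and the 65536-byte default assumes the setting in question is Nutch's `http.content.limit` with its stock value, so check your own nutch-site.xml before relying on that number.

```python
# Assumption: the truncation setting is http.content.limit and it is left
# at a stock value of 65536 bytes. Adjust to match your configuration.
ASSUMED_CONTENT_LIMIT = 65536


def check_pdf_bytes(data: bytes, content_limit: int = ASSUMED_CONTENT_LIMIT) -> dict:
    """Heuristically report whether fetched PDF bytes look truncated.

    Hypothetical helper for illustration; it is not part of Nutch.
    """
    # A PDF file starts with a "%PDF-" header.
    is_pdf = data.startswith(b"%PDF-")
    # The end-of-file marker should appear near the end of a complete file.
    has_eof_marker = b"%%EOF" in data[-1024:]
    # A size exactly equal to the limit hints that the fetcher cut the file.
    at_limit = len(data) == content_limit
    return {
        "is_pdf": is_pdf,
        "has_eof_marker": has_eof_marker,
        "at_limit": at_limit,
        "likely_truncated": is_pdf and not has_eof_marker,
    }


if __name__ == "__main__":
    import sys

    # Usage: python check_pdf.py downloaded.pdf
    with open(sys.argv[1], "rb") as fh:
        print(check_pdf_bytes(fh.read()))
```

If `likely_truncated` and `at_limit` both come back true, raising the content limit and re-fetching is worth trying before looking into encryption.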
