I suspect that Nutch has not downloaded the full PDF. There is a setting in the Nutch config file that truncates large files. Truncation is efficient for HTML, but it can cause errors like this one for other formats.
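As a rough sketch of what that override could look like: I am assuming the setting in question is the http.content.limit property (the thread does not name it) and that the override goes in conf/nutch-site.xml. A negative value should turn truncation off altogether; a larger positive byte count is the more conservative option.

```xml
<!-- conf/nutch-site.xml: overrides values from nutch-default.xml -->
<!-- Assumption: http.content.limit is the truncation setting meant above. -->
<configuration>
  <property>
    <name>http.content.limit</name>
    <!-- Limit in bytes for content fetched over HTTP; -1 is assumed to mean "do not truncate". -->
    <value>-1</value>
  </property>
</configuration>
```

If the PDFs were fetched with the old limit, they would need to be re-fetched after the change, because the stored copies are already truncated.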
Check this setting and adjust accordingly.

Alexander

2008/10/29 olivier_coface <[EMAIL PROTECTED]>

>
> I had the following error when crawling on pdf files (it happened on 2 pdf
> files):
>
> http://lyra:85/ExternalDocumentation/BusinessComponentApproach_Chapter2.pdf
> :
> failed(2,0): Can't be handled as pdf document. java.io.EOFException:
> Unexpected end of ZLIB input stream
>
> Any idea?
> --
> View this message in context:
> http://www.nabble.com/Unexpected-end-of-ZLIB-input-stream-when-parsing-pdf-files-tp20223893p20223893.html
> Sent from the Nutch - User mailing list archive at Nabble.com.
>
>

--
Best Regards
Alexander Aristov
