BTW, the PDF parser can only handle encrypted files if they were encrypted
with an empty password.
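
To tell the two causes discussed in this thread apart (a truncated download versus an encrypted file), a rough check on the fetched bytes can help. The sketch below is only a heuristic, not anything Nutch or the PDF parser does itself: `diagnose_pdf` and its hint strings are made-up names, and the 65536-byte default is an assumption that stands in for whatever content limit your Nutch config actually uses (in the releases I know, the property is `http.content.limit`). It relies on two facts about the format: a complete PDF ends with an `%%EOF` marker, and an encrypted PDF carries an `/Encrypt` entry in its trailer.

```python
import sys


def diagnose_pdf(data: bytes, content_limit: int = 65536) -> list:
    """Return heuristic hints about why a fetched PDF may fail to parse.

    content_limit is an assumed fetch limit in bytes; pass the value
    from your own crawler configuration.
    """
    hints = []
    # A complete PDF ends with an %%EOF marker; look in the last 1 KB.
    if b"%%EOF" not in data[-1024:]:
        hints.append("truncated")
    # A size exactly equal to the fetch limit suggests the crawler cut it off.
    if content_limit >= 0 and len(data) == content_limit:
        hints.append("hit-content-limit")
    # Encrypted PDFs reference an /Encrypt dictionary in the trailer.
    if b"/Encrypt" in data:
        hints.append("encrypted")
    return hints


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python diagnose_pdf.py fetched.pdf
    with open(sys.argv[1], "rb") as f:
        print(diagnose_pdf(f.read()))
```

Run it on the bytes the crawler actually stored, not on a fresh browser download, since only the crawler's copy would show the truncation.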

Alex

2008/10/29 Ben Litchfield <[EMAIL PROTECTED]>

> I have seen this sometimes when the PDF is encrypted as well.
>
> Ben
>
>
> Quoting Alexander Aristov <[EMAIL PROTECTED]>:
>
>> I suspect that Nutch has not downloaded the full PDF. There is a setting in
>> the Nutch config file that truncates large files. That is efficient for HTML
>> but can cause errors like this for other formats.
>>
>> Check this setting and adjust accordingly.
>>
>> Alexander
>>
>> 2008/10/29 olivier_coface <[EMAIL PROTECTED]>
>>
>>
>>> I had the following error when crawling PDF files (it happened on 2 PDF
>>> files):
>>>
>>> http://lyra:85/ExternalDocumentation/BusinessComponentApproach_Chapter2.pdf:
>>> failed(2,0): Can't be handled as pdf document. java.io.EOFException:
>>> Unexpected end of ZLIB input stream
>>>
>>> Any idea?
>>> --
>>> View this message in context:
>>>
>>> http://www.nabble.com/Unexpected-end-of-ZLIB-input-stream-when-parsing-pdf-files-tp20223893p20223893.html
>>> Sent from the Nutch - User mailing list archive at Nabble.com.
>>>
>>>
>>>
>>
>> --
>> Best Regards
>> Alexander Aristov
>>
>>
>
>
>


-- 
Best Regards
Alexander Aristov
