On Wed, Jan 28, 2009 at 8:00 AM, Rolando Bermudez Peña <[email protected]> wrote:
> Hello all,
>
> When crawling my intranet I encounter several errors like the following.
>
>
> fetch of http://intranet/pdf/fund_ admin_fin_2.pdf failed with: 
> java.lang.IllegalArgumentException: Invalid uri
> 'http://intranet/pdf/fund_ admin_fin_2.pdf': escaped absolute path not valid
>

This URL contains a space, and I think Nutch rejects it as an invalid URL; a space would need to be percent-encoded as %20 to form a valid URI.
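
As a rough illustration of the escaping involved, here is a minimal sketch that uses only the standard java.net classes, not any Nutch API. The class name UrlSpaceEscaper and the method escape are hypothetical names chosen for this example. It assumes the input is a raw, not yet escaped URL: the multi-argument URI constructor always quotes '%', so a URL that already contains %20 would be escaped a second time.

```java
import java.net.URI;
import java.net.URL;

// Hypothetical helper, not part of Nutch: percent-encodes illegal
// characters (such as spaces) in a raw, unescaped URL string.
public class UrlSpaceEscaper {

    static String escape(String raw) throws Exception {
        // java.net.URL splits the string into components without
        // rejecting the space in the path.
        URL url = new URL(raw);

        // The multi-argument URI constructor quotes characters that are
        // illegal in each component, so the space becomes %20.
        URI uri = new URI(
                url.getProtocol(),
                url.getUserInfo(),
                url.getHost(),
                url.getPort(),   // -1 when no port is given; URI then omits it
                url.getPath(),
                url.getQuery(),
                url.getRef());

        return uri.toASCIIString();
    }

    public static void main(String[] args) throws Exception {
        // Prints: http://intranet/pdf/fund_%20admin_fin_2.pdf
        System.out.println(escape("http://intranet/pdf/fund_ admin_fin_2.pdf"));
    }
}
```

Whether it is better to fix the links on the intranet pages themselves or to escape them during the crawl depends on your setup; the sketch only shows what a valid form of the rejected URL looks like.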

>
> Error parsing: http://intranet/pdf/infotech.pdf: failed(2,0): Can't be 
> handled as pdf document. java.lang.ClassCastException: 
> org.pdfbox.pdmodel.encryption.PDEncryptionDictionary cannot be cast to 
> org.pdfbox.pdmodel.encryption.PDStandardEncryption
>
> Error parsing: http://intranet/pdf/ronda_pupo.pdf: failed(2,0): Can't be 
> handled as pdf document. java.io.IOException: Error: expected the end of a 
> dictionary.
>
>
> Any ideas what is causing this? Perhaps it is a bad configuration?
>
> Regards,
> Rolando
>
>
>



-- 
Doğacan Güney
