On Wed, Jan 28, 2009 at 8:00 AM, Rolando Bermudez Peña <[email protected]> wrote:
> Hello all,
>
> When crawling my intranet I encounter with several errors like the following.
>
>
> fetch of http://intranet/pdf/fund_ admin_fin_2.pdf failed with:
> java.lang.IllegalArgumentException: Invalid uri
> 'http://intranet/pdf/fund_ admin_fin_2.pdf': escaped absolute path not valid
>
This URL contains a space, and I think Nutch rejects it as an invalid URL for that reason.

> Error parsing: http://intranet/pdf/infotech.pdf: failed(2,0): Can't be
> handled as pdf document. java.lang.ClassCastException:
> org.pdfbox.pdmodel.encryption.PDEncryptionDictionary cannot be cast to
> org.pdfbox.pdmodel.encryption.PDStandardEncryption
>
> Error parsing: http://intranet/pdf/ronda_pupo.pdf: failed(2,0): Can't be
> handled as pdf document. java.io.IOException: Error: expected the end of a
> dictionary.
>
>
> Any ideas what is causing this, perhaps is a bad configuration?
>
> Regards,
> Rolando
>
>
>

-- 
Doğacan Güney
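
To illustrate the point about the space in the first URL, here is a minimal sketch in Python (not Nutch code, and not a Nutch API) that percent-encodes the path of a URL before it is handed to a fetcher. The helper name `escape_url_path` is hypothetical, and whether the same escaping can be done inside the crawler's own URL normalization is an assumption this sketch does not settle.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def escape_url_path(url: str) -> str:
    """Percent-encode unsafe characters (such as spaces) in a URL's path.

    Hypothetical helper for illustration only; it is not part of Nutch.
    """
    parts = urlsplit(url)
    # safe="/%" keeps path separators and already-escaped sequences intact,
    # so a path that is valid to begin with is returned unchanged.
    escaped_path = quote(parts.path, safe="/%")
    return urlunsplit(
        (parts.scheme, parts.netloc, escaped_path, parts.query, parts.fragment)
    )


if __name__ == "__main__":
    # The URL from the fetch error above, with the literal space in the path.
    print(escape_url_path("http://intranet/pdf/fund_ admin_fin_2.pdf"))
    # prints "http://intranet/pdf/fund_%20admin_fin_2.pdf"
```

With the space written as `%20`, the path is a properly escaped one, which is what the "escaped absolute path not valid" message appears to be complaining about. This only concerns the fetch error; the two PDF parsing errors are a separate matter.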
