I believe these errors are due to a parsing bug in PDFBox that has been fixed since the 0.7.2 release. Please try the nightly build from http://www.pdfbox.org/dist (it should be a drop-in replacement) and let me know if you are still having issues.
Ben

On Tue, 28 Feb 2006, Richard Braman wrote:

> I get the following errors regarding pdf:
>
> 060228 160518 fetch okay, but can't parse
> http://taxpros.marylandtaxes.com/publications/revenews/archives/spr05_hi.pdf,
> reason: failed(2,202): Content truncated at 66005 bytes. Parser
> can't handle incomplete pdf file.
>
> 060228 160354 fetch okay, but can't parse
> http://www.mstc.state.ms.us/info/stats/transfer/tran0704.pdf, reason:
> failed(2,0): Can't be handled as pdf document.
> java.lang.NullPointerException
>
> 060228 160518 fetch okay, but can't parse
> http://www.dor.state.nc.us/downloads/corp_archive/03archive/NC478_Instructions.pdf,
> reason: failed(2,0): Can't be handled as pdf document.
> java.io.IOException: You do not have permission to extract text
>
> I have a number of errors like this in my log, mostly the content
> truncated one.
>
> The thing is, these files all open fine in Acrobat.
>
> Richard Braman
> mailto:[EMAIL PROTECTED]
> 561.748.4002 (voice)
>
> http://www.taxcodesoftware.org
> Free Open Source Tax Software
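The first error reports that the stored copy of the file was cut off at 66005 bytes, so the parser never reaches the end of the PDF; Acrobat presumably sees the complete download, which would explain why the same file opens fine there. Below is a minimal Java sketch for checking a saved copy for this kind of truncation. PdfSanityCheck is a hypothetical helper, not part of Nutch or PDFBox. It assumes the common file layout where the data starts with a "%PDF-" header and the trailer ends with a "%%EOF" marker within the last 1024 bytes. It is only a heuristic: it does not validate the cross-reference table, and it says nothing about the encryption/permission problem behind the third error.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/**
 * Hypothetical helper (not part of Nutch or PDFBox): a rough heuristic
 * for telling whether fetched bytes look like a complete PDF file.
 */
public class PdfSanityCheck {

    /** Strict check: true if the bytes begin with the "%PDF-" header. */
    public static boolean hasPdfHeader(byte[] data) {
        byte[] magic = "%PDF-".getBytes(StandardCharsets.US_ASCII);
        if (data.length < magic.length) {
            return false;
        }
        for (int i = 0; i < magic.length; i++) {
            if (data[i] != magic[i]) {
                return false;
            }
        }
        return true;
    }

    /**
     * True if "%%EOF" appears within the last `window` bytes. A PDF cut off
     * by a download size limit normally loses its trailer, so the marker
     * is missing. ISO-8859-1 maps every byte to one char, so binary data
     * does not break the search.
     */
    public static boolean hasEofMarker(byte[] data, int window) {
        int start = Math.max(0, data.length - window);
        String tail = new String(data, start, data.length - start,
                StandardCharsets.ISO_8859_1);
        return tail.contains("%%EOF");
    }

    /** Assumption: header at byte 0 and "%%EOF" in the last 1024 bytes. */
    public static boolean looksComplete(byte[] data) {
        return hasPdfHeader(data) && hasEofMarker(data, 1024);
    }

    public static void main(String[] args) {
        byte[] full = "%PDF-1.4\n...body...\ntrailer\n<<>>\n%%EOF\n"
                .getBytes(StandardCharsets.ISO_8859_1);
        // Simulate a fetch that stopped after the first 20 bytes.
        byte[] truncated = Arrays.copyOf(full, 20);
        System.out.println(looksComplete(full));      // true
        System.out.println(looksComplete(truncated)); // false
    }
}
```

If a saved copy fails this check, the problem is most likely on the fetch side (the content was cut off before parsing) rather than in the PDF parser itself.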
