Hi Ben,

We actually got to the bottom of all of them except for one. The content truncation was due to an inconsistency bug in the Nutch config. The "no permission to extract text" error is genuine: the author, the NC Department of Revenue, puts this restriction on all of its files (I have asked them to remove it, as it hampers public accessibility). The NullPointerException is the only one left to deal with, and it may be due to the parsing bug. Is this the one you are referring to?
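Because the truncation came from a capped download rather than a broken file, it can help to flag such files before they ever reach the PDF parser. The class below is a minimal, hypothetical sketch (not Nutch's or PDFBox's own check): `PdfTruncationCheck`, `looksTruncated`, and the 1024-byte tail window are illustrative assumptions. It treats a fetch as suspect if it hit the configured byte cap (Nutch's `http.content.limit` is one such setting, where a negative value disables truncation) or if the `%%EOF` marker that closes a complete PDF is missing from the end of the data.

```java
import java.nio.charset.StandardCharsets;

public class PdfTruncationCheck {

    // Heuristic only: how many trailing bytes to search for the "%%EOF" marker.
    private static final int TAIL_WINDOW = 1024;

    /**
     * Returns true if the fetched bytes look like an incomplete PDF.
     *
     * @param content    the bytes the crawler actually stored
     * @param fetchLimit the download cap in bytes; a negative value means "no cap"
     */
    public static boolean looksTruncated(byte[] content, int fetchLimit) {
        // A download that reached the cap was probably cut off by the fetcher.
        if (fetchLimit >= 0 && content.length >= fetchLimit) {
            return true;
        }
        // A complete PDF ends with an "%%EOF" marker, possibly followed by
        // line-end bytes, so look for it in the tail of the data.
        int start = Math.max(0, content.length - TAIL_WINDOW);
        // ISO-8859-1 maps every byte to one char, so binary data decodes safely.
        String tail = new String(content, start, content.length - start,
                StandardCharsets.ISO_8859_1);
        return !tail.contains("%%EOF");
    }

    public static void main(String[] args) {
        byte[] complete = "%PDF-1.4\n1 0 obj\n<< >>\nendobj\n%%EOF\n"
                .getBytes(StandardCharsets.ISO_8859_1);
        byte[] cutOff = "%PDF-1.4\n1 0 obj\n<<".getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(looksTruncated(complete, 65536)); // false
        System.out.println(looksTruncated(cutOff, 65536));   // true
    }
}
```

Checking for the marker as well as the byte count also catches downloads cut short for other reasons. The trade-off is that a complete file that happens to be exactly the cap size is flagged too, which is the price of a cheap check that never parses the document.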
-----Original Message-----
From: Ben Litchfield [mailto:[EMAIL PROTECTED]
Sent: Thursday, March 02, 2006 4:07 PM
To: Richard Braman
Cc: [email protected]; [email protected]; [EMAIL PROTECTED]
Subject: Re: [PDFBox-user] PDF Parse Error

I believe these errors are due to a parsing bug in PDFBox that has been fixed since the 0.7.2 release. Please give the nightly build (should be a drop in replacement) a try from http://www.pdfbox.org/dist and let me know if you are still having issues.

Ben

On Tue, 28 Feb 2006, Richard Braman wrote:

> I get the following errors regarding pdf:
>
> 060228 160518 fetch okay, but can't parse http://taxpros.marylandtaxes.com/publications/revenews/archives/spr05_hi.pdf, reason: failed(2,202): Content truncated at 66005 bytes. Parser can't handle incomplete pdf file.
>
> 060228 160354 fetch okay, but can't parse http://www.mstc.state.ms.us/info/stats/transfer/tran0704.pdf, reason: failed(2,0): Can't be handled as pdf document.
> java.lang.NullPointerException
>
> 060228 160518 fetch okay, but can't parse http://www.dor.state.nc.us/downloads/corp_archive/03archive/NC478_Instructions.pdf, reason: failed(2,0): Can't be handled as pdf document.
> java.io.IOException: You do not have permission to extract text
>
> I have a number of errors like this in my log, mostly the content truncated one.
>
> The thing is these files all open fine in acrobat.
>
> Richard Braman
> mailto:[EMAIL PROTECTED]
> 561.748.4002 (voice)
>
> http://www.taxcodesoftware.org <http://www.taxcodesoftware.org/>
> Free Open Source Tax Software
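The failure lines quoted above share a regular shape ("fetch okay, but can't parse <url>, reason: failed(a,b): <message>"), so a log with many of them can be summarized by reason to see which problem dominates. The class below is a hypothetical sketch; `ParseFailureTally`, `tally`, the regular expressions, and the assumption that an exception class name may appear alone on the following line are all illustrative, not part of Nutch. Numbers in the reason are masked so that truncations at different byte counts land in the same bucket.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ParseFailureTally {

    // Matches the parse-failure format quoted above and captures the reason text.
    private static final Pattern FAILURE = Pattern.compile(
            "fetch okay, but can't parse \\S+, reason: failed\\(\\d+,\\d+\\): (.*)");

    // Assumption: an exception class name may start the line after a failure.
    private static final Pattern EXCEPTION = Pattern.compile(
            "^\\s*([\\w.$]+(?:Exception|Error))");

    /** Counts parse failures per reason, with digits masked as "N". */
    public static Map<String, Integer> tally(List<String> lines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (int i = 0; i < lines.size(); i++) {
            Matcher m = FAILURE.matcher(lines.get(i));
            if (!m.find()) {
                continue; // not a parse-failure line
            }
            String reason = m.group(1).replaceAll("\\d+", "N").trim();
            // Attach the exception name, if any, to tell otherwise identical reasons apart.
            if (i + 1 < lines.size()) {
                Matcher ex = EXCEPTION.matcher(lines.get(i + 1));
                if (ex.find()) {
                    reason += " [" + ex.group(1) + "]";
                }
            }
            counts.merge(reason, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // The three failures from the quoted log, one log line per list entry.
        List<String> log = List.of(
            "060228 160518 fetch okay, but can't parse http://taxpros.marylandtaxes.com/publications/revenews/archives/spr05_hi.pdf, reason: failed(2,202): Content truncated at 66005 bytes. Parser can't handle incomplete pdf file.",
            "060228 160354 fetch okay, but can't parse http://www.mstc.state.ms.us/info/stats/transfer/tran0704.pdf, reason: failed(2,0): Can't be handled as pdf document.",
            "java.lang.NullPointerException",
            "060228 160518 fetch okay, but can't parse http://www.dor.state.nc.us/downloads/corp_archive/03archive/NC478_Instructions.pdf, reason: failed(2,0): Can't be handled as pdf document.",
            "java.io.IOException: You do not have permission to extract text");
        tally(log).forEach((reason, n) -> System.out.println(n + "  " + reason));
    }
}
```

Keying on the exception name matters here because the NullPointerException and the permission error report the same "Can't be handled as pdf document." reason but have very different causes.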
_______________________________________________
Nutch-developers mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-developers
