[ http://issues.apache.org/jira/browse/NUTCH-220?page=comments#action_12372277 ]
Ben Litchfield commented on NUTCH-220:
--------------------------------------

Actually, now that I look at the stack trace, the NPE is not happening in PDFBox code; it appears to be in Hadoop code, so I don't think that upgrading PDFBox will help.

Ben

> PDF Box can't parse document: java.lang.NullPointerException
> ------------------------------------------------------------
>
>          Key: NUTCH-220
>          URL: http://issues.apache.org/jira/browse/NUTCH-220
>      Project: Nutch
>         Type: Bug
>  Environment: PDFBox 0.7.2
>     Reporter: Richard Braman
>
> This error was fixed in the latest build of PDFBox, which should be tested
> with Nutch.
>
>> 060228 160354 fetch okay, but can't parse
>> http://www.mstc.state.ms.us/info/stats/transfer/tran0704.pdf, reason:
>> failed(2,0): Can't be handled as pdf document.
>> java.lang.NullPointerException
>
> Yes, the NPE should be fixed.
> Ben
>
> Richard Braman wrote:
> > Hi Ben,
> >
> > We actually got to the bottom of all of them except for one. The
> > content truncation was due to an inconsistency bug in the Nutch config.
> > The "no permission to extract text" error is actually correct: the author,
> > the NC Department of Revenue, put this restriction on all of their files
> > (I have asked them to remove it, as it hampers public accessibility). The
> > NullPointerException is the only one left to deal with, and it may be due
> > to the parsing bug. Is this the one you are referring to?
> >
> > -----Original Message-----
> > From: Ben Litchfield [mailto:[EMAIL PROTECTED]
> > Sent: Thursday, March 02, 2006 4:07 PM
> > To: Richard Braman
> > Cc: [email protected]; [email protected];
> > [EMAIL PROTECTED]
> > Subject: Re: [PDFBox-user] PDF Parse Error
> >
> > I believe these errors are due to a parsing bug in PDFBox that has
> > been fixed since the 0.7.2 release. Please give the nightly
> > build (it should be a drop-in replacement) a try from
> > http://www.pdfbox.org/dist and let me know if you are still having
> > issues.
> >
> > Ben

--
This message is automatically generated by JIRA.
- If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa
- For more information on JIRA, see: http://www.atlassian.com/software/jira

_______________________________________________
Nutch-developers mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-developers
