[ http://issues.apache.org/jira/browse/NUTCH-220?page=comments#action_12372275 ]
Richard Braman commented on NUTCH-220:
--------------------------------------

PDFBox-0.7.3 no longer depends on log4j at all, so you should not be getting
any log4j errors from PDFBox!

Ben

On Sun, 26 Mar 2006, Richard Braman wrote:
> > Hi Ben,
> >
> > I noticed that nutch uses a log4j version of PDFBox.jar. I don't
> > see this as an ant target on 0.7.3. I downloaded pdfbox from CVS Head.
> >
> > When I tried to use the PDFBox nightly it gave me a bunch of log4j
> > errors, so I guess nutch expects the log4j version.
> >
> > I am trying to upgrade my nutch to 0.7.3 to see if I can get rid of the
> > NPE error.
> >
> > The NPE bug I told you about a few weeks ago has a much worse effect in
> > Nutch 0.8, as it seems to cause the fetcher to abort.
> >
> > 060326 142450 fetch of
> > http://www.state.sd.us/drr2/reg/bank/Trust%20Fee%20Calculation.pdf
> > failed with: java.lang.NullPointerException
> > java.lang.NullPointerException
> >   at org.apache.hadoop.io.SequenceFile$Writer.append(SequenceFile.java:180)
> >   at org.apache.hadoop.io.SequenceFile$Writer.append(SequenceFile.java:171)
> >   at org.apache.hadoop.mapred.MapTask$2.collect(MapTask.java:91)
> >   at org.apache.nutch.fetcher.Fetcher$FetcherThread.output(Fetcher.java:245)
> >   at org.apache.nutch.fetcher.Fetcher$FetcherThread.run(Fetcher.java:185)
> > 060326 142450 SEVERE fetcher caught:java.lang.NullPointerException
> >
> > --
> > Richard L Braman, Jr., CPA
> > Tax Code Software Foundation, Inc.
> > Open Source Tax Software
> > http://www.taxcodesoftware.org
> > [EMAIL PROTECTED]

> PDF Box can't parse document: java.lang.NullPointerException
> ------------------------------------------------------------
>
>          Key: NUTCH-220
>          URL: http://issues.apache.org/jira/browse/NUTCH-220
>      Project: Nutch
>         Type: Bug
>  Environment: PDFBox 0.7.2
>     Reporter: Richard Braman
>
> This error was fixed in the latest build of PDFBox, which should be tested
> with nutch.
> >> 060228 160354 fetch okay, but can't parse
> >> http://www.mstc.state.ms.us/info/stats/transfer/tran0704.pdf, reason:
> >> failed(2,0): Can't be handled as pdf document.
> >> java.lang.NullPointerException
> Yes, the NPE should be fixed.
> Ben
> Richard Braman wrote:
> > Hi Ben,
> >
> > We actually got to the bottom of all of them except for one. The
> > content truncation was due to an inconsistency bug in the nutch config.
> > The "no permission to extract text" error is actually true: the author,
> > the NC Department of Revenue, put this restriction on all of their files
> > (I have asked them to remove it as it hampers public accessibility). The
> > null pointer exception is the only one left to deal with that may be due
> > to the parsing bug. Is this the one that you are referring to?
> >
> > -----Original Message-----
> > From: Ben Litchfield [mailto:[EMAIL PROTECTED]
> > Sent: Thursday, March 02, 2006 4:07 PM
> > To: Richard Braman
> > Cc: [email protected]; [email protected];
> > [EMAIL PROTECTED]
> > Subject: Re: [PDFBox-user] PDF Parse Error
> >
> > I believe these errors are due to a parsing bug in PDFBox that has
> > been fixed since the 0.7.2 release. Please give the nightly
> > build (should be a drop-in replacement) a try from
> > http://www.pdfbox.org/dist and let me know if you are still having
> > issues.
> >
> > Ben
