Thanks Ferdy

On Thu, May 31, 2012 at 9:58 PM, George Smith <[email protected]> wrote:

> Hi Ferdy.
>
> That patch is fantastic. I applied the change yesterday morning and
> everything is parsing smoothly. Thanks for your help.
>
> On Wed, May 30, 2012 at 5:30 AM, Ferdy Galema <[email protected]> wrote:
>
>> I already stumbled upon this some time ago. I just created the following
>> issue with a work-around for it. (I already committed it, so you can update
>> your workspace as an alternative to applying the patch.)
>>
>> https://issues.apache.org/jira/browse/NUTCH-1379
>>
>> On Tue, May 29, 2012 at 3:57 PM, Lewis John Mcgibbney <[email protected]> wrote:
>>
>> > Hi George,
>> >
>> > How were you executing parsing on this page?
>> >
>> > The toArgMap method in ToolUtil can throw a runtime exception;
>> > however, this doesn't look like the one you're getting.
>> >
>> > What other kind of logging do you have around here? Specifically
>> > related to when the parse method kicks in? This might give us a bit
>> > more idea where exactly this is happening.
>> >
>> > Thanks
>> >
>> > Lewis
>> >
>> > On Fri, May 25, 2012 at 9:00 PM, George Smith <[email protected]> wrote:
>> > > I've been using the nutchgora branch for a few months so I'm very new to it
>> > > and I've been able to find information on the user or dev list, jira, or
>> > > regular web searches for most of the issues I've encountered except one.
>> > >
>> > > This error occurs frequently when parsing, but not always, and there doesn't
>> > > seem to be a common element among the pages it is erroring out on.
>> > >
>> > > I have tried a number of revisions of the nutchgora branch that built fine
>> > > with ant/ivy and also with Eclipse. I also tried the prebuilt copies from
>> > > Jenkins. I've run them on Debian 6.0.5, Ubuntu 10, and Ubuntu 12 with
>> > > OpenJDK and the Sun JDK and always receive the same error.
>> > >
>> > > Can someone shed some light on this or point me in the right direction?
>> > > Thanks.
>> > >
>> > > The error output to stdout is:
>> > >
>> > > Parsing http://www.site.com/dir/page.html
>> > > Exception in thread "main" java.lang.RuntimeException: job failed: name=parse, jobid=job_local_0001
>> > >     at org.apache.nutch.util.NutchJob.waitForCompletion(NutchJob.java:47)
>> > >     at org.apache.nutch.parse.ParserJob.run(ParserJob.java:242)
>> > >     at org.apache.nutch.parse.ParserJob.parse(ParserJob.java:257)
>> > >     at org.apache.nutch.parse.ParserJob.run(ParserJob.java:300)
>> > >     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>> > >     at org.apache.nutch.parse.ParserJob.main(ParserJob.java:304)
>> > >
>> > > The error output in the hadoop.log is:
>> > >
>> > > 2012-05-25 13:50:29,635 INFO parse.ParserJob - Parsing http://www.site.com/dir/page.html
>> > > 2012-05-25 13:50:29,638 WARN mapred.FileOutputCommitter - Output path is null in cleanup
>> > > 2012-05-25 13:50:29,639 WARN mapred.LocalJobRunner - job_local_0001
>> > > java.lang.NullPointerException
>> > >     at org.apache.avro.util.Utf8.<init>(Utf8.java:37)
>> > >     at org.apache.nutch.parse.ParseUtil.process(ParseUtil.java:212)
>> > >     at org.apache.nutch.parse.ParserJob$ParserMapper.map(ParserJob.java:123)
>> > >     at org.apache.nutch.parse.ParserJob$ParserMapper.map(ParserJob.java:76)
>> > >     at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
>> > >     at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
>> > >     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
>> > >     at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212)
>> >
>> > --
>> > Lewis

--
Lewis
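
For anyone reading the trace in the archive: the NullPointerException is raised inside the Utf8 constructor, called from ParseUtil.process, which suggests that a null value was being wrapped for some pages. The thread does not say which value was null, and the sketch below is not the NUTCH-1379 change. It only illustrates the general null-guard pattern, under the assumption that a missing value should stay null rather than abort the job. Utf8Like and wrapOrNull are hypothetical names; Utf8Like is a stand-in so the example runs without Avro on the classpath.

```java
import java.nio.charset.StandardCharsets;

class NullSafeWrap {

    // Hypothetical stand-in for a UTF-8 wrapper whose constructor
    // dereferences its argument, so a null input throws immediately.
    static final class Utf8Like {
        final byte[] bytes;

        Utf8Like(String s) {
            // Throws NullPointerException when s is null.
            this.bytes = s.getBytes(StandardCharsets.UTF_8);
        }

        @Override
        public String toString() {
            return new String(bytes, StandardCharsets.UTF_8);
        }
    }

    // Guard (assumed policy): keep a missing value as null instead of
    // letting the constructor throw and fail the whole job.
    static Utf8Like wrapOrNull(String s) {
        return s == null ? null : new Utf8Like(s);
    }

    public static void main(String[] args) {
        System.out.println(wrapOrNull("Example title"));
        System.out.println(wrapOrNull(null));
    }
}
```

Whether null, an empty string, or skipping the page is the right fallback depends on how downstream code reads the field, so treat the choice here as a placeholder.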

