I've been using the nutchgora branch for a few months, so I'm still fairly
new to it. I've been able to find information on the user or dev lists, Jira,
or regular web searches for most of the issues I've encountered, except one.



This error occurs frequently when parsing, but not always, and there doesn't
seem to be a common element among the pages it fails on.



I have tried a number of revisions of the nutchgora branch, all of which
built fine with Ant/Ivy and also with Eclipse. I also tried the prebuilt
copies from Jenkins. I've run them on Debian 6.0.5, Ubuntu 10, and Ubuntu 12
with both OpenJDK and the Sun JDK, and I always receive the same error.



Can someone shed some light on this or point me in the right direction?
Thanks.





The error output to stdout is:

Parsing http://www.site.com/dir/page.html
Exception in thread "main" java.lang.RuntimeException: job failed: name=parse, jobid=job_local_0001
        at org.apache.nutch.util.NutchJob.waitForCompletion(NutchJob.java:47)
        at org.apache.nutch.parse.ParserJob.run(ParserJob.java:242)
        at org.apache.nutch.parse.ParserJob.parse(ParserJob.java:257)
        at org.apache.nutch.parse.ParserJob.run(ParserJob.java:300)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
        at org.apache.nutch.parse.ParserJob.main(ParserJob.java:304)



The error output in the hadoop.log is:

2012-05-25 13:50:29,635 INFO  parse.ParserJob - Parsing http://www.site.com/dir/page.html
2012-05-25 13:50:29,638 WARN  mapred.FileOutputCommitter - Output path is null in cleanup
2012-05-25 13:50:29,639 WARN  mapred.LocalJobRunner - job_local_0001
java.lang.NullPointerException
        at org.apache.avro.util.Utf8.<init>(Utf8.java:37)
        at org.apache.nutch.parse.ParseUtil.process(ParseUtil.java:212)
        at org.apache.nutch.parse.ParserJob$ParserMapper.map(ParserJob.java:123)
        at org.apache.nutch.parse.ParserJob$ParserMapper.map(ParserJob.java:76)
        at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
        at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212)
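
Going by the trace, the NullPointerException is thrown inside the Utf8
constructor when it is called from ParseUtil.process. My guess, and it is only
a guess, is that a null String is being handed to that constructor for some
pages. Below is a minimal sketch of that failure mode. Utf8Like and wrapOrNull
are hypothetical stand-ins I made up for illustration; they are not the real
Avro or Nutch code, and I have not confirmed which value is null at
ParseUtil.java:212.

```java
import java.nio.charset.StandardCharsets;

public class Utf8NullSketch {

    // Hypothetical stand-in for a Utf8-style wrapper around a String.
    // This is NOT org.apache.avro.util.Utf8, only an illustration.
    static final class Utf8Like {
        private final byte[] bytes;

        Utf8Like(String s) {
            // Dereferencing a null String here throws NullPointerException,
            // which is the kind of failure the trace shows in Utf8.<init>.
            this.bytes = s.getBytes(StandardCharsets.UTF_8);
        }

        int length() {
            return bytes.length;
        }
    }

    // Hypothetical guard: skip wrapping when the value is missing.
    static Utf8Like wrapOrNull(String s) {
        return s == null ? null : new Utf8Like(s);
    }

    // Returns true if constructing the wrapper from null throws an NPE.
    static boolean throwsOnNull() {
        try {
            new Utf8Like(null);
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("unguarded null throws NPE: " + throwsOnNull());
        System.out.println("guarded null is skipped: " + (wrapOrNull(null) == null));
        System.out.println("non-null length: " + wrapOrNull("title").length());
    }
}
```

If that guess is right, the pages that fail would be the ones where whatever
is being wrapped comes back as null, which might explain why the error is
intermittent.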
