I've only been using the nutchgora branch for a few months, so I'm still fairly new to it. I've been able to find answers on the user and dev lists, in JIRA, or through regular web searches for every issue I've encountered except one.
This error occurs frequently, but not always, when parsing, and there doesn't seem to be a common element among the pages it fails on. I have tried a number of revisions of the nutchgora branch that built fine with ant/ivy and also with Eclipse, as well as the prebuilt copies from Jenkins. I've run them on Debian 6.0.5, Ubuntu 10, and Ubuntu 12 with both OpenJDK and the Sun JDK, and I always get the same error. Can someone shed some light on this or point me in the right direction? Thanks.

The error output to stdout is:

Parsing http://www.site.com/dir/page.html
Exception in thread "main" java.lang.RuntimeException: job failed: name=parse, jobid=job_local_0001
    at org.apache.nutch.util.NutchJob.waitForCompletion(NutchJob.java:47)
    at org.apache.nutch.parse.ParserJob.run(ParserJob.java:242)
    at org.apache.nutch.parse.ParserJob.parse(ParserJob.java:257)
    at org.apache.nutch.parse.ParserJob.run(ParserJob.java:300)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
    at org.apache.nutch.parse.ParserJob.main(ParserJob.java:304)

The error output in hadoop.log is:

2012-05-25 13:50:29,635 INFO parse.ParserJob - Parsing http://www.site.com/dir/page.html
2012-05-25 13:50:29,638 WARN mapred.FileOutputCommitter - Output path is null in cleanup
2012-05-25 13:50:29,639 WARN mapred.LocalJobRunner - job_local_0001
java.lang.NullPointerException
    at org.apache.avro.util.Utf8.<init>(Utf8.java:37)
    at org.apache.nutch.parse.ParseUtil.process(ParseUtil.java:212)
    at org.apache.nutch.parse.ParserJob$ParserMapper.map(ParserJob.java:123)
    at org.apache.nutch.parse.ParserJob$ParserMapper.map(ParserJob.java:76)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212)
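My unconfirmed guess is that ParseUtil.process (line 212 in the trace) passes a null String into the Avro Utf8 constructor, for example a title or text that the parser never set. The trace only shows that the constructor itself dereferences something null. The stdlib-only Java sketch below reproduces that kind of failure and shows a null guard. encodeLikeUtf8 and encodeOrEmpty are hypothetical stand-ins, not Nutch or Avro code, and the null title is an assumption about what the parser might return.

```java
import java.nio.charset.StandardCharsets;

public class Utf8NullDemo {
    // Hypothetical stand-in for a Utf8-style wrapper: it encodes the String
    // eagerly, so passing null throws a NullPointerException.
    static byte[] encodeLikeUtf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // Hypothetical null-safe variant: treat a missing value as empty.
    static byte[] encodeOrEmpty(String s) {
        return s == null ? new byte[0] : s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String title = null; // assumption: a parser that produced no title
        try {
            encodeLikeUtf8(title);
            System.out.println("no exception");
        } catch (NullPointerException e) {
            System.out.println("NullPointerException, as in the hadoop.log trace");
        }
        System.out.println("guarded length: " + encodeOrEmpty(title).length);
    }
}
```

If this guess is right, the fix would be a null check (or an empty default) before the value is wrapped, but I haven't verified which field is null on line 212.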

