Hi band_master,

On Tue, Jul 23, 2013 at 1:20 PM, band_master <[email protected]> wrote:

> I am having trouble, though, getting Nutch to work. I can successfully
> inject urls, but there seems to be an error in the Hadoop log around
> parsing UTF8 characters.

How are you coming to this conclusion? I would say that, from your logs, there is a problem when generating your URLs and batchId for fetching.

> 2013-07-23 13:07:22,273 WARN mapred.LocalJobRunner - job_local117641048_0002
> java.lang.NullPointerException
>     at org.apache.avro.util.Utf8.<init>(Utf8.java:37)
>     at org.apache.nutch.crawl.GeneratorReducer.setup(GeneratorReducer.java:100)
>     at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:174)
>     at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:649)
>     at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:418)
>     at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:398)

GeneratorReducer.java:100 is as follows:

    batchId = new Utf8(conf.get(GeneratorJob.BATCH_ID));

which would indicate that the BATCH_ID has not been assigned yet. Do you have the log of when the batchId should have been generated?

Thanks
Lewis
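
P.S. To illustrate the failure mode, here is a minimal, self-contained sketch. It is not Nutch code. The `toUtf8` helper is only a stand-in for `new Utf8(String)`, on the assumption that the constructor encodes the string it is given, and a plain `Map` stands in for the Hadoop configuration. The key string and the batch id value are made-up placeholders. The sketch shows how a lookup that returns null turns into a NullPointerException, and how a guarded lookup could report the missing key instead.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

public class BatchIdCheck {
    // Assumed placeholder for the key behind GeneratorJob.BATCH_ID.
    static final String BATCH_ID = "generate.batch.id";

    // Stand-in for new Utf8(String): encoding a null string throws
    // NullPointerException.
    static byte[] toUtf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // Guarded lookup: fail with a descriptive message when the key is unset.
    static byte[] requireBatchId(Map<String, String> conf) {
        String value = conf.get(BATCH_ID);
        if (value == null) {
            throw new IllegalStateException(
                BATCH_ID + " is not set in the job configuration");
        }
        return toUtf8(value);
    }

    public static void main(String[] args) {
        // Map.get returns null for a missing key, which mirrors the
        // situation suspected in the reducer's setup().
        Map<String, String> conf = new HashMap<>();
        try {
            toUtf8(conf.get(BATCH_ID));
        } catch (NullPointerException e) {
            System.out.println("unset batch id -> NullPointerException");
        }

        // Once the key is present, the guarded lookup succeeds.
        conf.put(BATCH_ID, "1374581242-12345"); // made-up value
        System.out.println(requireBatchId(conf).length + " bytes encoded");
    }
}
```

If the real job behaves the same way, the stack trace says more about a missing configuration value than about UTF8 parsing as such.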

