Hi band_master,

On Tue, Jul 23, 2013 at 1:20 PM, band_master <[email protected]> wrote:

> I am having trouble, though, getting Nutch to work. I can successfully
> inject urls, but there seems to be an error in the Hadoop log around
> parsing UTF8 characters.
>

How did you come to this conclusion? Judging from your logs, I would say
the problem occurs when generating your URLs and batchId for fetching.


> 2013-07-23 13:07:22,273 WARN  mapred.LocalJobRunner - job_local117641048_0002
> java.lang.NullPointerException
>         at org.apache.avro.util.Utf8.<init>(Utf8.java:37)
>         at org.apache.nutch.crawl.GeneratorReducer.setup(GeneratorReducer.java:100)
>         at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:174)
>         at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:649)
>         at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:418)
>         at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:398)
>

GeneratorReducer.java:100 is as follows:

batchId = new Utf8(conf.get(GeneratorJob.BATCH_ID));

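This would indicate that BATCH_ID has not been set in the job configuration,
so conf.get() returns null and the Utf8 constructor throws the
NullPointerException.
Do you have the log from the point where the batchId should have been generated?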
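
To illustrate the failure mode I suspect, here is a minimal sketch in plain
Java. It does not use the real Nutch, Hadoop or Avro classes: the Map stands
in for the job configuration, toUtf8() stands in for new Utf8(String), and the
class name, key string and batch id value are made up for the example. It only
shows that looking up an unset key yields null, and that encoding null throws
a NullPointerException.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

public class BatchIdNullSketch {
    // Hypothetical key, standing in for GeneratorJob.BATCH_ID.
    static final String BATCH_ID = "example.batch.id";

    // Stand-in for new Utf8(String): encoding a null string throws an NPE.
    static byte[] toUtf8(String value) {
        return value.getBytes(StandardCharsets.UTF_8);
    }

    // Returns true if building the batchId from this configuration throws an NPE.
    static boolean failsWithNpe(Map<String, String> conf) {
        try {
            // Map.get returns null when the key was never set.
            toUtf8(conf.get(BATCH_ID));
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        System.out.println("batch id unset, NPE: " + failsWithNpe(conf));

        conf.put(BATCH_ID, "example-batch-id"); // made-up value
        System.out.println("batch id set, NPE: " + failsWithNpe(conf));
    }
}
```

If this is what is happening in your run, the thing to look for is why the
batch id never made it into the configuration before the reducer started.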

Thanks
Lewis
