>
> It appeared to me that the NPE happens during the serialization into
> the crawldb - but I haven't been able to figure out why this happens.
>

The Fetcher does not write to the crawldb at all - that happens during the
'update' step. As the stack trace shows, the problem arises when the reducer
deserializes its input before processing it.

Maybe you could try running the code on the same data under a debugger to get
a clearer picture of what's happening?
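For what it's worth, an NPE thrown from ConcurrentHashMap.get inside
ReflectionUtils.newInstance usually means a null Class was passed in:
ReflectionUtils caches constructors in a ConcurrentHashMap keyed by Class,
and ConcurrentHashMap rejects null keys. In the MapWritable case that would
suggest the value's class id read from the stream did not resolve to a known
class on the reducer side. This is only a guess at the mechanism, not a
confirmed diagnosis; here is a minimal JDK-only sketch (no Hadoop code) of
why a null key blows up there:

```java
import java.util.concurrent.ConcurrentHashMap;

public class NullKeyNpe {
    // ReflectionUtils keeps a constructor cache shaped roughly like this,
    // keyed by the Class being instantiated.
    static final ConcurrentHashMap<Class<?>, String> CACHE =
            new ConcurrentHashMap<>();

    public static void main(String[] args) {
        CACHE.put(String.class, "ok");
        // A known class resolves normally.
        System.out.println(CACHE.get(String.class)); // prints "ok"
        try {
            // A null key - e.g. a class id that resolved to null during
            // deserialization - throws NPE inside ConcurrentHashMap.get.
            CACHE.get(null);
        } catch (NullPointerException e) {
            System.out.println("NPE on null key");
        }
    }
}
```

So it may be worth checking, under the debugger, what class the MapWritable
is trying to instantiate at that point and whether it is on the reducer's
classpath.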



On 31 May 2011 07:56, Viksit Gaur <[email protected]> wrote:

> [Cross posting on user and dev since this is a possible bug]
>
> Hi all,
>
> Running Nutch 1.2 Fetcher on an Amazon EMR cluster results in an error
> of the sort,
>
>
> 2011-05-31 05:55:52,858 WARN org.apache.hadoop.mapred.TaskTracker
> (main): Error running child
> java.lang.RuntimeException: java.lang.NullPointerException
>        at
> org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:80)
>        at org.apache.hadoop.io.MapWritable.readFields(MapWritable.java:163)
>        at org.apache.nutch.crawl.CrawlDatum.readFields(CrawlDatum.java:278)
>        at
> org.apache.nutch.util.GenericWritableConfigurable.readFields(GenericWritableConfigurable.java:54)
>        at
> org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:67)
>        at
> org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:40)
>        at
> org.apache.hadoop.mapred.Task$ValuesIterator.readNextValue(Task.java:770)
>        at org.apache.hadoop.mapred.Task$ValuesIterator.next(Task.java:710)
>        at
> org.apache.hadoop.mapred.ReduceTask$ReduceValuesIterator.next(ReduceTask.java:228)
>        at
> org.apache.hadoop.mapred.lib.IdentityReducer.reduce(IdentityReducer.java:39)
>        at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:321)
>        at
> org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2216)
> Caused by: java.lang.NullPointerException
>        at
> java.util.concurrent.ConcurrentHashMap.get(ConcurrentHashMap.java:768)
>        at
> org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:72)
>        ... 11 more
>
> and happens during the Fetcher's Reduce step,
>
> 2011-05-31 05:56:38,078 INFO org.apache.hadoop.mapred.JobClient
> (main):  map 100% reduce 66%
> 2011-05-31 05:56:38,078 INFO org.apache.hadoop.mapred.JobClient
> (main): Task Id : attempt_201105310525_0005_r_000000_1, Status :
> FAILED
> 2011-05-31 05:56:48,178 INFO org.apache.hadoop.mapred.JobClient
> (main):  map 100% reduce 90%
> 2011-05-31 05:56:53,230 INFO org.apache.hadoop.mapred.JobClient
> (main):  map 100% reduce 91%
> 2011-05-31 05:56:58,251 INFO org.apache.hadoop.mapred.JobClient
> (main):  map 100% reduce 94%
> 2011-05-31 05:57:03,271 INFO org.apache.hadoop.mapred.JobClient
> (main):  map 100% reduce 96%
> 2011-05-31 05:57:08,307 INFO org.apache.hadoop.mapred.JobClient
> (main):  map 100% reduce 97%
> 2011-05-31 05:57:13,343 INFO org.apache.hadoop.mapred.JobClient
> (main):  map 100% reduce 99%
> 2011-05-31 05:57:17,360 INFO org.apache.hadoop.mapred.JobClient
> (main):  map 100% reduce 66%
> 2011-05-31 05:57:17,360 INFO org.apache.hadoop.mapred.JobClient
> (main): Task Id : attempt_201105310525_0005_r_000000_2, Status :
> FAILED
> 2011-05-31 05:57:27,440 INFO org.apache.hadoop.mapred.JobClient
> (main):  map 100% reduce 90%
> 2011-05-31 05:57:32,460 INFO org.apache.hadoop.mapred.JobClient
> (main):  map 100% reduce 91%
> 2011-05-31 05:57:33,465 INFO org.apache.hadoop.mapred.JobClient
> (main):  map 100% reduce 92%
> 2011-05-31 05:57:37,497 INFO org.apache.hadoop.mapred.JobClient
> (main):  map 100% reduce 94%
> 2011-05-31 05:57:42,517 INFO org.apache.hadoop.mapred.JobClient
> (main):  map 100% reduce 96%
> 2011-05-31 05:57:47,537 INFO org.apache.hadoop.mapred.JobClient
> (main):  map 100% reduce 97%
> 2011-05-31 05:57:48,542 INFO org.apache.hadoop.mapred.JobClient
> (main):  map 100% reduce 98%
> 2011-05-31 05:57:52,558 INFO org.apache.hadoop.mapred.JobClient
> (main):  map 100% reduce 99%
> 2011-05-31 05:57:56,576 ERROR org.apache.nutch.fetcher.Fetcher (main):
> Fetcher: java.io.IOException: Job failed!
>        at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1113)
>        at org.apache.nutch.fetcher.Fetcher.fetch(Fetcher.java:1107)
>        at org.apache.nutch.fetcher.Fetcher.run(Fetcher.java:1145)
>        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>        at org.apache.nutch.fetcher.Fetcher.main(Fetcher.java:1116)
>        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>        at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>        at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>        at java.lang.reflect.Method.invoke(Method.java:597)
>        at org.apache.hadoop.util.RunJar.main(RunJar.java:155)
>        at org.apache.hadoop.mapred.JobShell.run(JobShell.java:54)
>        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
>        at org.apache.hadoop.mapred.JobShell.main(JobShell.java:68)
>
> It appeared to me that the NPE happens during the serialization into
> the crawldb - but I haven't been able to figure out why this happens.
> Would anyone have ideas on this?
>
> Cheers
> Viksit
>



-- 
Open Source Solutions for Text Engineering

http://digitalpebble.blogspot.com/
http://www.digitalpebble.com
