[Cross posting on user and dev since this is a possible bug]
Hi all,
Running the Nutch 1.2 Fetcher on an Amazon EMR cluster fails with an
error like this:
2011-05-31 05:55:52,858 WARN org.apache.hadoop.mapred.TaskTracker (main): Error running child
java.lang.RuntimeException: java.lang.NullPointerException
    at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:80)
    at org.apache.hadoop.io.MapWritable.readFields(MapWritable.java:163)
    at org.apache.nutch.crawl.CrawlDatum.readFields(CrawlDatum.java:278)
    at org.apache.nutch.util.GenericWritableConfigurable.readFields(GenericWritableConfigurable.java:54)
    at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:67)
    at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:40)
    at org.apache.hadoop.mapred.Task$ValuesIterator.readNextValue(Task.java:770)
    at org.apache.hadoop.mapred.Task$ValuesIterator.next(Task.java:710)
    at org.apache.hadoop.mapred.ReduceTask$ReduceValuesIterator.next(ReduceTask.java:228)
    at org.apache.hadoop.mapred.lib.IdentityReducer.reduce(IdentityReducer.java:39)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:321)
    at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2216)
Caused by: java.lang.NullPointerException
    at java.util.concurrent.ConcurrentHashMap.get(ConcurrentHashMap.java:768)
    at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:72)
    ... 11 more
It happens during the Fetcher's reduce step:
2011-05-31 05:56:38,078 INFO org.apache.hadoop.mapred.JobClient (main): map 100% reduce 66%
2011-05-31 05:56:38,078 INFO org.apache.hadoop.mapred.JobClient (main): Task Id : attempt_201105310525_0005_r_000000_1, Status : FAILED
2011-05-31 05:56:48,178 INFO org.apache.hadoop.mapred.JobClient (main): map 100% reduce 90%
2011-05-31 05:56:53,230 INFO org.apache.hadoop.mapred.JobClient (main): map 100% reduce 91%
2011-05-31 05:56:58,251 INFO org.apache.hadoop.mapred.JobClient (main): map 100% reduce 94%
2011-05-31 05:57:03,271 INFO org.apache.hadoop.mapred.JobClient (main): map 100% reduce 96%
2011-05-31 05:57:08,307 INFO org.apache.hadoop.mapred.JobClient (main): map 100% reduce 97%
2011-05-31 05:57:13,343 INFO org.apache.hadoop.mapred.JobClient (main): map 100% reduce 99%
2011-05-31 05:57:17,360 INFO org.apache.hadoop.mapred.JobClient (main): map 100% reduce 66%
2011-05-31 05:57:17,360 INFO org.apache.hadoop.mapred.JobClient (main): Task Id : attempt_201105310525_0005_r_000000_2, Status : FAILED
2011-05-31 05:57:27,440 INFO org.apache.hadoop.mapred.JobClient (main): map 100% reduce 90%
2011-05-31 05:57:32,460 INFO org.apache.hadoop.mapred.JobClient (main): map 100% reduce 91%
2011-05-31 05:57:33,465 INFO org.apache.hadoop.mapred.JobClient (main): map 100% reduce 92%
2011-05-31 05:57:37,497 INFO org.apache.hadoop.mapred.JobClient (main): map 100% reduce 94%
2011-05-31 05:57:42,517 INFO org.apache.hadoop.mapred.JobClient (main): map 100% reduce 96%
2011-05-31 05:57:47,537 INFO org.apache.hadoop.mapred.JobClient (main): map 100% reduce 97%
2011-05-31 05:57:48,542 INFO org.apache.hadoop.mapred.JobClient (main): map 100% reduce 98%
2011-05-31 05:57:52,558 INFO org.apache.hadoop.mapred.JobClient (main): map 100% reduce 99%
2011-05-31 05:57:56,576 ERROR org.apache.nutch.fetcher.Fetcher (main): Fetcher: java.io.IOException: Job failed!
    at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1113)
    at org.apache.nutch.fetcher.Fetcher.fetch(Fetcher.java:1107)
    at org.apache.nutch.fetcher.Fetcher.run(Fetcher.java:1145)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
    at org.apache.nutch.fetcher.Fetcher.main(Fetcher.java:1116)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:155)
    at org.apache.hadoop.mapred.JobShell.run(JobShell.java:54)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
    at org.apache.hadoop.mapred.JobShell.main(JobShell.java:68)
From the trace, the NPE seems to happen while CrawlDatum records are
being deserialized in the reduce step (CrawlDatum.readFields calling
MapWritable.readFields), but I haven't been able to figure out why.
Would anyone have ideas on this?
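One observation on the "Caused by" frames: java.util.concurrent.ConcurrentHashMap
does not accept null keys, so get(null) throws a NullPointerException. That
would be consistent with ReflectionUtils.newInstance being handed a null
Class, though I have not confirmed that this is what happens here. The
snippet below is a minimal sketch in plain Java; the class name
NullClassLookupDemo, the CACHE field and the lookup helper are hypothetical
stand-ins for illustration, not Hadoop code.

```java
import java.lang.reflect.Constructor;
import java.util.concurrent.ConcurrentHashMap;

class NullClassLookupDemo {
    // Hypothetical stand-in for a constructor cache keyed by Class.
    // Assumption: the real cache is a ConcurrentHashMap, as the
    // "Caused by" frames in the trace suggest.
    private static final ConcurrentHashMap<Class<?>, Constructor<?>> CACHE =
        new ConcurrentHashMap<>();

    // Hypothetical helper: performs the cache lookup and reports
    // whether it completed or threw a NullPointerException.
    static String lookup(Class<?> cls) {
        try {
            CACHE.get(cls); // throws NullPointerException when cls is null
            return "ok";
        } catch (NullPointerException e) {
            return "NPE";
        }
    }

    public static void main(String[] args) {
        System.out.println(lookup(String.class)); // non-null key: lookup succeeds
        System.out.println(lookup(null));         // null key: lookup throws
    }
}
```

Running it prints "ok" for a real class and "NPE" for a null one. If the
same thing is happening in the reducer, the open question is why the class
lookup during MapWritable.readFields would yield null on EMR.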
Cheers
Viksit