Hi all,
I'm getting errors similar to the ones reported in NUTCH-937
(https://issues.apache.org/jira/browse/NUTCH-937) when running Nutch
1.2 on Cloudera's CDH3 build of Hadoop 0.20 [1]. The error is shown
in [2]. I invoke the Nutch job as follows:
sudo -u hdfs hadoop jar /opt/nutch-1.2/nutch-1.2.job \
    org.apache.nutch.crawl.Injector \
    /user/root/nutch/crawl.lucene.apache.org/crawldb \
    /user/root/nutch/crawl.lucene.apache.org/urls/ \
    -conf /opt/nutch-1.2/conf/nutch-default.xml \
    -Dplugin.folders=/opt/nutch-1.2/plugins/
From the documentation on the web, this seems to be caused by a change
in the way the job file is unpacked and in the location of the Nutch
plugins directory. I've tried the following steps, and none have
worked for me:
(1) Added the following property to hadoop-site.xml and made sure
that job_201105091841_0010_conf.xml picks it up:
<property>
<name>mapreduce.job.jar.unpack.pattern</name>
<value>(?:classes/|lib/|plugins/).*</value>
</property>
I also tried mapred.job.jar.unpack.pattern, with no luck.
(2) Added the -conf /opt/nutch-1.2/conf/nutch-default.xml and
-Dplugin.folders=/opt/nutch-1.2/plugins/ parameters to the Nutch
invocation. They don't seem to get picked up either (same error).
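For anyone who wants to repeat the check from step (1) offline, here
is a minimal Python sketch. It looks up a property in Hadoop-style
configuration XML and reports which jar entry names the pattern would
select. The helper names, the sample configuration and the sample
entry names are all made up for illustration. It also assumes that
Hadoop requires the pattern to match the entire entry name, which I
have not verified for this CDH build. The pattern itself uses only
regex syntax that Python and Java treat the same way.

```python
import re
import xml.etree.ElementTree as ET


def read_property(conf_xml, name):
    """Return the value of a <property> in Hadoop-style XML text, or None."""
    root = ET.fromstring(conf_xml)
    for prop in root.iter("property"):
        if prop.findtext("name") == name:
            return prop.findtext("value")
    return None


def unpacked_entries(pattern, entry_names):
    """Return the entry names that the pattern matches in full.

    Assumption: the whole entry name must match, not just a prefix.
    """
    regex = re.compile(pattern)
    return [n for n in entry_names if regex.fullmatch(n)]


# Hypothetical stand-in for the job's *_conf.xml file.
SAMPLE_CONF = """<configuration>
  <property>
    <name>mapreduce.job.jar.unpack.pattern</name>
    <value>(?:classes/|lib/|plugins/).*</value>
  </property>
</configuration>"""

# Hypothetical entry names, not a listing of the real nutch-1.2.job.
SAMPLE_ENTRIES = [
    "classes/Example.class",
    "lib/example.jar",
    "plugins/urlnormalizer-basic/plugin.xml",
    "META-INF/MANIFEST.MF",
]

if __name__ == "__main__":
    pattern = read_property(SAMPLE_CONF, "mapreduce.job.jar.unpack.pattern")
    print("pattern:", pattern)
    for entry in unpacked_entries(pattern, SAMPLE_ENTRIES):
        print("would unpack:", entry)
```

To run it against real data, replace SAMPLE_CONF with the contents of
the job's conf XML and SAMPLE_ENTRIES with the names returned by
zipfile.ZipFile("/opt/nutch-1.2/nutch-1.2.job").namelist().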
Has anyone faced this issue and successfully dealt with it? What
would you recommend as the best way to fix this?
Thanks!
Viksit
[1] = CDH3 =
Hadoop 0.20.2-cdh3u0
Subversion -r 81256ad0f2e4ab2bd34b04f53d25a6c23686dd14
[2] = Error message =
11/05/09 20:44:27 INFO mapred.JobClient: Task Id : attempt_201105091841_0010_m_000000_0, Status : FAILED
java.lang.RuntimeException: Error in configuring object
    at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:93)
    at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64)
    at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:386)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:324)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1115)
    at org.apache.hadoop.mapred.Child.main(Child.java:262)
Caused by: java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:88)
    ... 9 more
Caused by: java.lang.RuntimeException: Error in configuring object
    at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:93)
    at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64)
    at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
    at org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:34)
    ... 14 more
Caused by: java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:88)
    ... 17 more
Caused by: java.lang.RuntimeException: x point org.apache.nutch.net.URLNormalizer not found.
    at org.apache.nutch.net.URLNormalizers.<init>(URLNormalizers.java:122)
    at org.apache.nutch.crawl.Injector$InjectMapper.configure(Injector.java:70)
    ... 22 more