> How should I define the Hue job so that it recognizes Nutch's .job jar
> file and/or make the CDH4 Hue consistent with the hadoop/hdfs shell
> commands?

Could you try posting to the Hue and CDH4 user groups? We don't promise
compatibility across the several Hadoop distributions out there. See
https://issues.apache.org/jira/browse/NUTCH-1447
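One quick sanity check on the ClassNotFoundException below: a .job file is an ordinary jar/zip archive, so you can verify that the class Oozie complains about is actually packaged in it before blaming the job definition. A minimal sketch in Python (file name and usage are illustrative, not from the original thread):

```python
# Sketch: check whether a fully-qualified class is packaged in a .job jar.
# A .job file is an ordinary zip archive, so the stdlib zipfile module works.
import zipfile

def has_class(job_path, class_name):
    """Return True if class_name (e.g. org.apache.nutch.crawl.Injector)
    is present as a .class entry in the jar at job_path."""
    entry = class_name.replace(".", "/") + ".class"
    with zipfile.ZipFile(job_path) as jar:
        return entry in jar.namelist()

# Illustrative usage:
# has_class("apache-nutch-1.6.job", "org.apache.nutch.crawl.Injector")
```

If the class is present, the problem is more likely that the Oozie action never got the jar on its classpath (for instance, the jar not being in the workflow's lib/ directory) than that the jar itself is broken.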
On Thu, Jun 13, 2013 at 7:39 AM, Byte Array <[email protected]> wrote:
> Hello!
>
> I am trying to run a simple crawl with Nutch 1.6 on a CDH4.2.1 cluster
> on CentOS 6.2.
>
> First I had problems with
>
> # hadoop jar apache-nutch-1.6.job org.apache.nutch.fetcher.Fetcher /nutch/1.6/crawl/segments/20130613095319
>
> which was returning:
>
> java.lang.RuntimeException: problem advancing post rec#0
>     at org.apache.hadoop.mapred.Task$ValuesIterator.next(Task.java:1183)
>     at org.apache.hadoop.mapred.ReduceTask$ReduceValuesIterator.moveToNext(ReduceTask.java:255)
>     at org.apache.hadoop.mapred.ReduceTask$ReduceValuesIterator.next(ReduceTask.java:251)
>     at org.apache.hadoop.mapred.lib.IdentityReducer.reduce(IdentityReducer.java:40)
>     at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:506)
>     at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:447)
>     at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:447)
> Caused by: java.io.IOException: can't find class: org.apache.nutch.protocol.ProtocolStatus because org.apache.nutch.protocol.ProtocolStatus
>     at org.apache.hadoop.io.AbstractMapWritable.readFields(AbstractMapWritable.java:206)
>     ...
>
> Also, I noticed an inconsistency between the file system shown by
> hdfs dfs -ls and the one shown in the CDH4 Hue GUI. The former seems to
> simply create the folders/files locally and is not aware of the ones I
> create through the Hue GUI.
>
> Therefore, I suspected that the job was not properly running on the CDH4
> cluster, so I used the Hue GUI to create the /user/admin/Nutch-1.6 folder
> and urls/seed.txt and to upload the Nutch 1.6 .job file (previously
> configured and built with ant in Eclipse).
>
> When I submit the job through Hue, it logs a ClassNotFoundException,
> although I properly defined the path to the .job file on HDFS and the
> class name in that file:
>
> ...
> Failing Oozie Launcher, Main class [org.apache.nutch.crawl.Injector],
> exception invoking main(), java.lang.ClassNotFoundException: Class
> org.apache.nutch.crawl.Injector not found
> java.lang.RuntimeException: java.lang.ClassNotFoundException: Class
> org.apache.nutch.crawl.Injector not found
> ...
>
> How should I define the Hue job so that it recognizes Nutch's .job jar
> file and/or make the CDH4 Hue consistent with the hadoop/hdfs shell
> commands?
>
> This thread looks related:
> http://www.mail-archive.com/[email protected]/msg07603.html
>
> Thank you
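A note on the hdfs-vs-Hue inconsistency: the LocalJobRunner frames in the first stack trace suggest the `hadoop jar` command ran in local mode, which would also explain why files appear on the local filesystem instead of in HDFS. That usually means the client's configuration resolves fs.defaultFS to the built-in `file:///` default. A hedged sketch for checking what a given core-site.xml actually sets (the path in the usage comment is illustrative):

```python
# Sketch: read fs.defaultFS (or the older key fs.default.name) from a
# Hadoop core-site.xml. If it resolves to file:/// (Hadoop's built-in
# default), "hadoop jar" talks to the local filesystem and uses
# LocalJobRunner instead of the cluster.
import xml.etree.ElementTree as ET

def default_fs(core_site_path):
    """Return the configured default filesystem URI, or file:/// if unset."""
    root = ET.parse(core_site_path).getroot()
    for prop in root.iter("property"):
        if prop.findtext("name") in ("fs.defaultFS", "fs.default.name"):
            return prop.findtext("value")
    return "file:///"

# Illustrative usage:
# default_fs("/etc/hadoop/conf/core-site.xml")
```

If this returns `file:///` for the configuration directory the shell commands pick up, pointing HADOOP_CONF_DIR at the cluster's client configuration should make `hadoop jar` and Hue see the same HDFS.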

