> How should I define the Hue job so that it recognizes Nutch's .job jar
file and/or make the CDH4 Hue consistent with the hadoop/hdfs shell
commands?
Could you try posting to the Hue and CDH4 user groups? We don't promise
compatibility across the several Hadoop distributions out there.
See https://issues.apache.org/jira/browse/NUTCH-1447
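
One quick sanity check before digging into the Hue/Oozie side is to confirm that the class really is inside the uploaded .job file. A .job file is an ordinary jar (zip) archive, so the sketch below looks for the class entry by name. The helper name job_contains_class is hypothetical, and the check assumes the class sits at the top level of the archive rather than inside a nested jar under lib/. It says nothing about whether Oozie actually puts the archive on the launcher classpath.

```python
import sys
import zipfile


def job_contains_class(job_path, class_name):
    """Return True if the archive has a top-level .class entry for class_name.

    Hypothetical helper for illustration; it only inspects the archive
    listing and does not look inside nested jars (e.g. under lib/).
    """
    entry = class_name.replace(".", "/") + ".class"
    with zipfile.ZipFile(job_path) as archive:
        return entry in archive.namelist()


if __name__ == "__main__" and len(sys.argv) == 3:
    # Example: python check_job.py apache-nutch-1.6.job org.apache.nutch.crawl.Injector
    found = job_contains_class(sys.argv[1], sys.argv[2])
    print("found" if found else "missing")
```

If the class is reported as missing, the build is the first thing to look at. If it is found, the problem is more likely in how the job definition references the archive.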

On Thu, Jun 13, 2013 at 7:39 AM, Byte Array <[email protected]> wrote:

> Hello!
>
> I am trying to run a simple crawl with Nutch 1.6 on CDH4.2.1 on a CentOS 6.2
> cluster.
>
> First I had problems with
> # hadoop jar apache-nutch-1.6.job org.apache.nutch.fetcher.Fetcher
> /nutch/1.6/crawl/segments/20130613095319
> which was returning:
>  java.lang.RuntimeException: problem advancing post rec#0
> at org.apache.hadoop.mapred.Task$ValuesIterator.next(Task.java:1183)
> at org.apache.hadoop.mapred.ReduceTask$ReduceValuesIterator.moveToNext(ReduceTask.java:255)
> at org.apache.hadoop.mapred.ReduceTask$ReduceValuesIterator.next(ReduceTask.java:251)
> at org.apache.hadoop.mapred.lib.IdentityReducer.reduce(IdentityReducer.java:40)
> at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:506)
> at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:447)
> at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:447)
> Caused by: java.io.IOException: can't find class: org.apache.nutch.protocol.ProtocolStatus because org.apache.nutch.protocol.ProtocolStatus
> at org.apache.hadoop.io.AbstractMapWritable.readFields(AbstractMapWritable.java:206)
> . . .
> Also, I noticed an inconsistency between the file system shown by hdfs dfs
> -ls and the one shown in the CDH4 Hue GUI. The former seems to simply create
> the folders/files locally and is not aware of the ones I create through the
> Hue GUI.
> Therefore, I suspected that the job was not running properly on the CDH4
> cluster, so I used the Hue GUI to create the /user/admin/Nutch-1.6 folder and
> urls/seed.txt, and to upload the Nutch 1.6 .job file (previously configured
> and built with ant in Eclipse).
> When I submit the job through Hue, it logs a ClassNotFoundException, although
> I correctly defined the path to the .job file on HDFS and the class name in
> that file:
> ...
> Failing Oozie Launcher, Main class [org.apache.nutch.crawl.Injector],
> exception invoking main(), java.lang.ClassNotFoundException: Class
> org.apache.nutch.crawl.Injector not found
> java.lang.RuntimeException: java.lang.ClassNotFoundException: Class
> org.apache.nutch.crawl.Injector not found
> ...
> How should I define the Hue job so that it recognizes Nutch's .job jar file
> and/or make the CDH4 Hue consistent with the hadoop/hdfs shell commands?
> This thread looks related:
> http://www.mail-archive.com/[email protected]/msg07603.html
>
>
> Thank you
>
