Yes, you are right. I have just added the following lines to the hadoop script in my Nutch installation,
just before it builds the classpath with hadoop-*.jar. It may not be the proper place, but I got my segments merged:
# add the Nutch jars to the classpath, next to the Hadoop jars
for f in $HADOOP_HOME/nutch-*.jar; do
  CLASSPATH=${CLASSPATH}:$f
done
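
One caveat with the loop above: if no nutch-*.jar exists under $HADOOP_HOME, the shell leaves the pattern unexpanded and the literal string ends up on the classpath. A slightly more defensive variant could look like the sketch below. The function name add_nutch_jars is a hypothetical helper for illustration, not something that exists in the hadoop script.

```shell
# Sketch only: append every nutch-*.jar found in a directory to CLASSPATH.
# add_nutch_jars is a hypothetical helper name, not part of the hadoop script.
add_nutch_jars() {
  dir=$1
  for f in "$dir"/nutch-*.jar; do
    # When nothing matches, the pattern stays unexpanded; skip it.
    [ -e "$f" ] || continue
    # Insert the ':' separator only if CLASSPATH is already non-empty.
    CLASSPATH=${CLASSPATH:+$CLASSPATH:}$f
  done
}
```

It would be called as add_nutch_jars "$HADOOP_HOME" at the point where the script assembles the classpath from hadoop-*.jar.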
Thanks Andrzej,
-Corrado
Andrzej Bialecki wrote:
zzcgiacomini wrote:

Hi all,

I have two segments, test/segments/20060511101525 and test/segments/20060523, which I would like to merge into one new segment using "mergesegs", so far without success.

- nutch mergesegs test/segments/test/segments/20060526095530 test/segments/20060511101525 test/segments/20060523095535
- nutch mergesegs test/segments/20060526095530 -dir test/segments

Whatever I try, I always get an exception raised at the same place:

Exception in thread "main" java.io.IOException: Job failed!
    at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:341)
    at org.apache.nutch.segment.SegmentMerger.merge(SegmentMerger.java:596)
    at org.apache.nutch.segment.SegmentMerger.main(SegmentMerger.java:644)

and the job tracker logs the following lines:

060526 105956 job init failed
java.io.IOException: java.lang.ClassNotFoundException: org.apache.nutch.segment.SegmentMerger$ObjectInputFormat
    at org.apache.hadoop.mapred.JobInProgress.initTasks(JobInProgress.java:119)
    at org.apache.hadoop.mapred.JobTracker$JobInitThread.run(JobTracker.java:293)
    at java.lang.Thread.run(Thread.java:595)
060526 105959 Server connection on port 9011 from 10.234.57.38: exiting

Any help?

This looks like an (already known) problem with Hadoop and the classloader: input and output format classes need to be deployed on the tasktracker, and not just submitted in the *.job jar.

Simply speaking, you need to put the nutch*.jar on the classpath of all tasktrackers.
