zzcgiacomini wrote:
Hi all,
I have two segments, test/segments/20060511101525 and
test/segments/20060523,
which I would like to merge into one new segment using "mergesegs", so
far without success. I tried:
- nutch mergesegs test/segments/test/segments/20060526095530
test/segments/20060511101525 test/segments/20060523095535
- nutch mergesegs test/segments/20060526095530 -dir test/segments
Whatever I try, I always get an exception raised at the same place:
Exception in thread "main" java.io.IOException: Job failed!
    at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:341)
    at org.apache.nutch.segment.SegmentMerger.merge(SegmentMerger.java:596)
    at org.apache.nutch.segment.SegmentMerger.main(SegmentMerger.java:644)
and the job tracker logs the following lines:
060526 105956 job init failed
java.io.IOException: java.lang.ClassNotFoundException: org.apache.nutch.segment.SegmentMerger$ObjectInputFormat
    at org.apache.hadoop.mapred.JobInProgress.initTasks(JobInProgress.java:119)
    at org.apache.hadoop.mapred.JobTracker$JobInitThread.run(JobTracker.java:293)
    at java.lang.Thread.run(Thread.java:595)
060526 105959 Server connection on port 9011 from 10.234.57.38: exiting
Any help?
This looks like an (already known) problem with Hadoop and the
classloader - input and output format classes need to be deployed on the
tasktracker, and not just submitted in the *.job jar.
Simply speaking, you need to put the nutch*.jar on the classpath of all
tasktrackers.
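
To confirm whether a node actually sees the class, something like the
following rough sketch may help. ClasspathCheck is a hypothetical helper,
not part of Nutch or Hadoop; it assumes you compile it and run it on the
node with the same classpath the daemon is started with, and it only
reports whether the class named in the log above can be located.

```java
// Hypothetical diagnostic helper (not part of Nutch or Hadoop).
// Assumption: it is run with the same classpath as the daemon being checked.
final class ClasspathCheck {

    /** Returns true if the named class can be located by this JVM's classloader. */
    static boolean isLoadable(String className) {
        try {
            // initialize=false: only locate the class, do not run its static initializers
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // Not on the classpath, or present but missing one of its dependencies
            return false;
        }
    }

    public static void main(String[] args) {
        // Defaults to the class reported missing in the job tracker log
        String name = args.length > 0
                ? args[0]
                : "org.apache.nutch.segment.SegmentMerger$ObjectInputFormat";
        System.out.println(name + (isLoadable(name) ? ": found" : ": NOT found on classpath"));
    }
}
```

If it reports the class as not found, the nutch*.jar is not visible to
that JVM and needs to be added to its classpath.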
--
Best regards,
Andrzej Bialecki <><
___. ___ ___ ___ _ _ __________________________________
[__ || __|__/|__||\/| Information Retrieval, Semantic Web
___|||__|| \| || | Embedded Unix, System Integration
http://www.sigram.com Contact: info at sigram dot com