Vishal Shah wrote:
Hi,
I am trying to use the -dump option of the readseg command to get a
segment's dump. However, I get a ClassNotFoundException for
SegmentReader$InputFormat. Has anyone else run into this? How do I
resolve it?
[EMAIL PROTECTED] search]$ bin/nutch readseg -dump
crawl1/segments/20060908210708 crawl1/segments/20060908210708/gendump
-nocontent -nofetch -noparse -noparsedata -noparsetext
SegmentReader: dump segment: crawl1/segments/20060908210708
Exception in thread "main" java.io.IOException: Job failed!
        at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:363)
        at org.apache.nutch.segment.SegmentReader.dump(SegmentReader.java:196)
        at org.apache.nutch.segment.SegmentReader.main(SegmentReader.java:533)
[EMAIL PROTECTED] search]$ tail logs/nutch.log
2006-09-12 12:50:52,675 WARN  mapred.JobTracker - job init failed
java.io.IOException: java.lang.ClassNotFoundException:
org.apache.nutch.segment.SegmentReader$InputFormat

How are you deploying Hadoop/Nutch? If you run a plain Hadoop cluster without deploying the Nutch jars, and then only submit the Nutch job jar, Hadoop cannot process input files that require custom InputFormats, because at that point the TaskTracker's classloader does not yet have access to the InputFormat defined in the job jar.
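
You can see this for yourself (the jar name and paths below are only illustrative, adjust them to your install): the class ships only inside the Nutch job jar, while the TaskTracker's own classpath on a plain Hadoop install contains no Nutch jar at all:

# the InputFormat class lives inside the submitted job jar
jar tf nutch-*.job | grep -F 'SegmentReader$InputFormat'
# ... but nothing Nutch-related is on the TaskTracker's classpath
ls $HADOOP_HOME/lib | grep -i nutch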

A workaround is to also deploy the nutch-xxxx.jar, in addition to the Hadoop-only jars. I believe this has been fixed in newer versions of Hadoop.
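
Roughly something like this (just a sketch; the jar name, paths and slaves file are assumptions about a typical install, not exact instructions):

# copy the Nutch jar onto every node's Hadoop classpath
for node in $(cat $HADOOP_HOME/conf/slaves); do
  scp nutch-*.jar $node:$HADOOP_HOME/lib/
done
# restart the MapReduce daemons so the TaskTrackers pick it up
$HADOOP_HOME/bin/stop-mapred.sh
$HADOOP_HOME/bin/start-mapred.sh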

--
Best regards,
Andrzej Bialecki     <><
___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com

