Vishal Shah wrote:
Hi,
I am trying to use the -dump option of the readseg command to get a
segment's dump. However, I see a ClassNotFoundException for
SegmentReader$InputFormat. Has anyone else experienced this? How do I
resolve it?
[EMAIL PROTECTED] search]$ bin/nutch readseg -dump crawl1/segments/20060908210708 crawl1/segments/20060908210708/gendump -nocontent -nofetch -noparse -noparsedata -noparsetext
SegmentReader: dump segment: crawl1/segments/20060908210708
Exception in thread "main" java.io.IOException: Job failed!
        at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:363)
        at org.apache.nutch.segment.SegmentReader.dump(SegmentReader.java:196)
        at org.apache.nutch.segment.SegmentReader.main(SegmentReader.java:533)
[EMAIL PROTECTED] search]$ tail logs/nutch.log
2006-09-12 12:50:52,675 WARN mapred.JobTracker - job init failed
java.io.IOException: java.lang.ClassNotFoundException: org.apache.nutch.segment.SegmentReader$InputFormat
How are you deploying Hadoop/Nutch? If you run a plain Hadoop cluster
without deploying the Nutch jars, and only submit the Nutch job jar,
then Hadoop cannot process input files that require custom
InputFormats, because at that point the TaskTracker's classloader
doesn't yet have access to the InputFormat defined in the job jar.

A workaround is to deploy the nutch-xxxx.jar too, in addition to the
Hadoop-only jars. I believe this has been solved in newer versions of
Hadoop.
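As a rough sketch of that workaround (the jar name and the HADOOP_HOME
path below are just placeholders, adjust them to your installation):
copy the Nutch jar into Hadoop's lib/ directory on every node, then
restart the MapReduce daemons from the master so the TaskTrackers pick
up the new classes, something like:

  # on every node: put the Nutch classes on the daemons' classpath
  cp nutch-0.8.jar $HADOOP_HOME/lib/
  # on the master: restart so the new jar is picked up
  $HADOOP_HOME/bin/stop-all.sh
  $HADOOP_HOME/bin/start-all.sh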
--
Best regards,
Andrzej Bialecki <><
 ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  || |   Embedded Unix, System Integration
http://www.sigram.com Contact: info at sigram dot com