Vishal Shah wrote:
Hi,
I am trying to use the -dump option of the readseg command to get a
segment's dump. However, I get a ClassNotFoundException for
SegmentReader$InputFormat. Has anyone else run into this? How do I
resolve it?
[EMAIL PROTECTED] search]$ bin/nutch readseg -dump
crawl1/segments/20060908210708 crawl1/segments/20060908210708/gendump
-nocontent -nofetch -noparse -noparsedata -noparsetext
SegmentReader: dump segment: crawl1/segments/20060908210708
Exception in thread "main" java.io.IOException: Job failed!
        at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:363)
        at org.apache.nutch.segment.SegmentReader.dump(SegmentReader.java:196)
        at org.apache.nutch.segment.SegmentReader.main(SegmentReader.java:533)
[EMAIL PROTECTED] search]$ tail logs/nutch.log
2006-09-12 12:50:52,675 WARN  mapred.JobTracker - job init failed
java.io.IOException: java.lang.ClassNotFoundException:
org.apache.nutch.segment.SegmentReader$InputFormat

How are you deploying Hadoop/Nutch? If you run a plain Hadoop cluster without deploying the Nutch jars, and then only submit the Nutch job jar, Hadoop cannot process input files that require custom InputFormats, because at that point the TaskTracker's classloader does not yet have access to the InputFormat defined in the job jar.
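
You can see this for yourself (the jar name and paths below are only illustrative, adjust them to your install): the class ships only inside the Nutch job jar, while the TaskTracker's own classpath on a plain Hadoop install contains no Nutch jar at all:

# the InputFormat class lives inside the submitted job jar
jar tf nutch-*.job | grep -F 'SegmentReader$InputFormat'
# ... but nothing Nutch-related is on the TaskTracker's classpath
ls $HADOOP_HOME/lib | grep -i nutch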

A workaround is to also deploy the nutch-xxxx.jar, in addition to the Hadoop-only jars. I believe this has been fixed in newer versions of Hadoop.
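
Roughly something like this (just a sketch; the jar name, paths and slaves file are assumptions about a typical install, not exact instructions):

# copy the Nutch jar onto every node's Hadoop classpath
for node in $(cat $HADOOP_HOME/conf/slaves); do
  scp nutch-*.jar $node:$HADOOP_HOME/lib/
done
# restart the MapReduce daemons so the TaskTrackers pick it up
$HADOOP_HOME/bin/stop-mapred.sh
$HADOOP_HOME/bin/start-mapred.sh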

--
Best regards,
Andrzej Bialecki     <><
___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com

