Vishal Shah wrote:
> Hi,
>
> I am trying to use the dump option in the segread command to get a
> segment's dump. However, I see the ClassNotFound exception for
> SegmentReader$InputFormat. Has anyone else experienced this? How do I
> resolve it?
>
> [EMAIL PROTECTED] search]$ bin/nutch readseg -dump
> crawl1/segments/20060908210708 crawl1/segments/20060908210708/gendump
> -nocontent -nofetch -noparse -noparsedata -noparsetext
> SegmentReader: dump segment: crawl1/segments/20060908210708
> Exception in thread "main" java.io.IOException: Job failed!
>     at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:363)
>     at org.apache.nutch.segment.SegmentReader.dump(SegmentReader.java:196)
>     at org.apache.nutch.segment.SegmentReader.main(SegmentReader.java:533)
>
>
> [EMAIL PROTECTED] search]$ tail logs/nutch.log
> 2006-09-12 12:50:52,675 WARN mapred.JobTracker - job init failed
> java.io.IOException: java.lang.ClassNotFoundException:
> org.apache.nutch.segment.SegmentReader$InputFormat
>
How are you deploying Hadoop/Nutch?

If you run a plain Hadoop cluster without deploying the Nutch jars, and
then submit only the Nutch job jar, Hadoop cannot process input files
that require custom InputFormats. At that point the TaskTracker's
classloader does not yet have access to the InputFormat defined in the
job jar.

A workaround is to deploy the nutch-xxxx.jar as well, in addition to the
Hadoop-only jars. I believe this has been solved in newer versions of
Hadoop.

-- 
Best regards,
Andrzej Bialecki
Information Retrieval, Semantic Web
Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com

_______________________________________________
Nutch-general mailing list
https://lists.sourceforge.net/lists/listinfo/nutch-general
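One way to confirm the diagnosis above is to check whether the class from the log resolves on the classpath that the Hadoop daemons are started with. The following is only a minimal sketch: the class name ClassCheck and its isLoadable helper are hypothetical and are not part of Nutch or Hadoop. It uses nothing but the standard Class.forName call.

```java
// Hypothetical diagnostic, not part of Nutch or Hadoop.
public class ClassCheck {

    // Returns true if the named class can be found by this class's
    // classloader. The class is located but not initialized.
    static boolean isLoadable(String className) {
        try {
            Class.forName(className, false, ClassCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Default to the class named in the log above; another name can be
        // passed as the first argument.
        String name = args.length > 0
                ? args[0]
                : "org.apache.nutch.segment.SegmentReader$InputFormat";
        System.out.println(name + (isLoadable(name) ? " found" : " NOT found"));
    }
}
```

Run it with the same classpath the daemons use. If the class is reported as not found there but is found once the nutch-xxxx.jar is added to the classpath, that is consistent with the explanation above. When passing the class name on the command line, single-quote it so the shell does not expand the $ in the inner class name.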
