Hi Andrzej,

  Thanks for the reply. I deployed Hadoop/Nutch following the
instructions in the Hadoop/Nutch tutorial, and I have copied the Nutch
jars into my NUTCH_HOME directory. I also tried copying the
nutch-xxxx.job file into my lib directory, but that doesn't work either.

  Do I need to set the CLASSPATH before I run bin/start-all.sh, or is it
something else? Sorry, I am new to Java development, so I'm not sure
what you mean by deploying something.

Thanks,

-vishal.

-----Original Message-----
From: Andrzej Bialecki [mailto:[EMAIL PROTECTED] 
Sent: Tuesday, September 12, 2006 12:06 PM
To: [email protected]
Subject: Re: ClassNotFoundException while using segread

Vishal Shah wrote:
> Hi,
>  
>    I am trying to use the dump option in the segread command to get a
> segment's dump. However, I see the ClassNotFound exception for
> SegmentReader$InputFormat. Has anyone else experienced this? How do I
> resolve it?
>  
> [EMAIL PROTECTED] search]$ bin/nutch readseg -dump
> crawl1/segments/20060908210708 crawl1/segments/20060908210708/gendump
> -nocontent -nofetch -noparse -noparsedata -noparsetext
> SegmentReader: dump segment: crawl1/segments/20060908210708
> Exception in thread "main" java.io.IOException: Job failed!
>         at
org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:363)
>         at
> org.apache.nutch.segment.SegmentReader.dump(SegmentReader.java:196)
>         at
> org.apache.nutch.segment.SegmentReader.main(SegmentReader.java:533)
>  
>  
> [EMAIL PROTECTED] search]$ tail logs/nutch.log
> 2006-09-12 12:50:52,675 WARN  mapred.JobTracker - job init failed
> java.io.IOException: java.lang.ClassNotFoundException:
> org.apache.nutch.segment.SegmentReader$InputFormat
>   

How are you deploying Hadoop/Nutch? If you run just a plain Hadoop 
cluster, without deploying the Nutch jars, and then only submit the 
Nutch job jar, Hadoop cannot process input files that require custom 
InputFormats, because at that moment the TaskTracker's classloader 
doesn't yet have access to the InputFormat defined in the job jar.

A workaround is to deploy the nutch-xxxx.jar too, in addition to the 
Hadoop-only jars. I believe this has been solved in newer versions 
of Hadoop.
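For this era of Hadoop, the workaround above might look roughly like the
sketch below. Note this is only a hedged illustration: the hostnames,
the jar name, and the use of $HADOOP_HOME/lib as the drop location are
assumptions, not something stated in this thread; adjust them to your
own cluster layout.

```shell
# Hypothetical sketch: hostnames, jar names, and paths are assumptions.
# Copy the Nutch jar onto every node's Hadoop lib directory so the
# TaskTracker classloader can find classes like SegmentReader$InputFormat.
for host in node1 node2 node3; do            # your slave hostnames
  scp "$NUTCH_HOME"/nutch-*.jar "$host:$HADOOP_HOME/lib/"
done

# Restart the cluster so each TaskTracker picks up the new classpath.
bin/stop-all.sh
bin/start-all.sh
```

The key point is that the jar must be on the TaskTrackers' classpath
before the daemons start, which is why copying it after bin/start-all.sh
has already run appears to have no effect.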

-- 
Best regards,
Andrzej Bialecki     <><
 ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com



_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general