Or add $HADOOP/bin to your $PATH and use the nutch script in $NUTCH_HOME/runtime/deploy/bin.
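
Roughly, that could look like the sketch below. The install paths are placeholders, and HADOOP_HOME is only my guess at what $HADOOP stands for on your machine; finalCrawl/crawldb is the crawldb path from your stats output.

```shell
# Assumed install locations (hypothetical defaults); adjust to your environment.
HADOOP_HOME="${HADOOP_HOME:-/usr/local/hadoop}"
NUTCH_HOME="${NUTCH_HOME:-/usr/local/nutch}"

# Put the hadoop launcher on the PATH so the nutch script can find it.
PATH="$HADOOP_HOME/bin:$PATH"
export PATH

# The deploy-mode script lives next to the .job file under runtime/deploy.
NUTCH_BIN="$NUTCH_HOME/runtime/deploy/bin/nutch"

if [ -x "$NUTCH_BIN" ] && command -v hadoop >/dev/null 2>&1; then
    # Print the CrawlDb statistics for the crawl directory from the thread.
    "$NUTCH_BIN" readdb finalCrawl/crawldb -stats
else
    echo "nutch script or hadoop not found; check HADOOP_HOME and NUTCH_HOME" >&2
fi
```

Run this way, the script picks up the job file itself, so the Nutch conf files should not need to be copied into the Hadoop conf directory.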

> Thanks lewis. i figured it out,
> but then i crawled using job file only and it ran successfully.
> bin/hadoop nutch-version.job org.apache.nutch.crawl.Crawl -params
>
> We can run this way right?
>
> Regards,
> Som
>
> On Sat, Jul 7, 2012 at 3:48 PM, Lewis John Mcgibbney <[email protected]> wrote:
>
> > Hi,
> >
> > You need to have the nutch job file (with all configuration in there)
> > on your hadoop classpath. You then run hadoop -jar $nutch.jar
> > $nutch.class -params this should be all that is required.
> >
> > Lewis
> >
> > On Sat, Jul 7, 2012 at 12:06 AM, shekhar sharma <[email protected]> wrote:
> > > Hello All,
> > > I am trying to run nutch (1.5.0 binaries) on hadoop, but when i am seeing
> > > the db stats it shows me the following output:
> > >
> > > CrawlDb statistics start: finalCrawl/crawldb
> > > Statistics for CrawlDb: finalCrawl/crawldb
> > > TOTAL urls: 6
> > > retry 1: 6
> > > min score: 1.0
> > > avg score: 1.0
> > > max score: 1.0
> > > status 1 (db_unfetched): 6
> > > CrawlDb statistics: done
> > >
> > > What i did is copied entire files from Nutch conf directory to Hadoop conf
> > > directory
> > >
> > > I have renamed nutch-default.xml to nutch-site.xml. And also provided
> > > plugins.folder properties, plugins.includes, http.agent.name etc..
> > >
> > > Crawling is successful, but when i am trying to dump the crawldb contents
> > > it shows me nothing and when i do stats on crawldb it tells me that
> > > nothing is fetched..
> > >
> > > But when i do crawl using nutch only, everything is fine..
> > >
> > > Any suggestion what is going wrong?
> > >
> > > Regards,
> > > Som
> > >
> >
> > --
> > Lewis
> >

--
Open Source Solutions for Text Engineering

http://digitalpebble.blogspot.com/
http://www.digitalpebble.com
http://twitter.com/digitalpebble

