Thanks Lewis. I figured it out: I crawled using only the job file and it ran successfully.

bin/hadoop nutch-version.job org.apache.nutch.crawl.Crawl -params

We can run it this way, right?

Regards,
Som

On Sat, Jul 7, 2012 at 3:48 PM, Lewis John Mcgibbney <[email protected]> wrote:

> Hi,
>
> You need to have the nutch job file (with all configuration in there)
> on your hadoop classpath. You then run hadoop -jar $nutch.jar
> $nutch.class -params. This should be all that is required.
>
> Lewis
>
> On Sat, Jul 7, 2012 at 12:06 AM, shekhar sharma <[email protected]> wrote:
> > Hello All,
> > I am trying to run nutch (1.5.0 binaries) on hadoop, but when I look at
> > the db stats it shows me the following output:
> >
> > CrawlDb statistics start: finalCrawl/crawldb
> > Statistics for CrawlDb: finalCrawl/crawldb
> > TOTAL urls: 6
> > retry 1: 6
> > min score: 1.0
> > avg score: 1.0
> > max score: 1.0
> > status 1 (db_unfetched): 6
> > CrawlDb statistics: done
> >
> > What I did was copy all the files from the Nutch conf directory to the
> > Hadoop conf directory.
> >
> > I have renamed nutch-default.xml to nutch-site.xml, and also provided
> > the plugins.folder, plugins.includes, http.agent.name etc. properties.
> >
> > The crawl reports success, but when I try to dump the crawldb contents
> > it shows me nothing, and when I run stats on the crawldb it tells me
> > that nothing was fetched.
> >
> > But when I crawl using nutch only, everything is fine.
> >
> > Any suggestion as to what is going wrong?
> >
> > Regards,
> > Som
>
> --
> Lewis
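For anyone finding this thread later, here is a rough sketch of what the two commands discussed above might look like when written out in full. It only prints the commands (a dry run) rather than executing them. Note that the Hadoop launcher's subcommand is normally spelled `jar`, not `-jar`. Everything else is an assumption for illustration and not taken from the thread: the job file name `apache-nutch-1.5.job`, the seed directory `urls`, and the `-depth` and `-topN` values. Only the `finalCrawl` directory and the `Crawl` class name come from the messages above.

```shell
#!/bin/sh
# Dry run: prints the commands instead of running them.
# Assumed values (hypothetical, adjust to your setup):
NUTCH_JOB="${NUTCH_JOB:-apache-nutch-1.5.job}"   # assumed job file name
CRAWL_DIR="${CRAWL_DIR:-finalCrawl}"             # crawl directory named in the thread

# Crawl via the job file; "urls", depth 3 and topN 50 are placeholder values.
crawl_cmd() {
    echo "bin/hadoop jar $NUTCH_JOB org.apache.nutch.crawl.Crawl urls -dir $CRAWL_DIR -depth 3 -topN 50"
}

# CrawlDb statistics via the same job file, so the same configuration is used.
stats_cmd() {
    echo "bin/hadoop jar $NUTCH_JOB org.apache.nutch.crawl.CrawlDbReader $CRAWL_DIR/crawldb -stats"
}

crawl_cmd
stats_cmd
```

Running the stats step through the same job file is a deliberate choice in this sketch: it means the crawl and the stats read the configuration packaged in the job, rather than whatever copies happen to sit in the Hadoop conf directory.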

