Re: Nothing is fetched when running nutch on Hadoop

Lewis John Mcgibbney Sat, 07 Jul 2012 03:19:11 -0700

Hi,

You need to have the Nutch job file (with all of the configuration in it)
on your Hadoop classpath. You then run hadoop jar $nutch.jar
$nutch.class -params, and that should be all that is required.
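A minimal sketch of that invocation is below, written in Python only for illustration. Several things in it are assumptions. The job-file path runtime/deploy/apache-nutch-1.5.job is hypothetical, so adjust it to your build. The seed directory "urls" and the -depth/-topN values are illustrative. The two entry points, org.apache.nutch.crawl.Crawl and org.apache.nutch.crawl.CrawlDbReader, are the Nutch 1.5 classes behind "crawl" and "readdb". By default the script only prints the commands.

```python
import shlex
import subprocess

# Hypothetical job-file path (an assumption): adjust to wherever your
# Nutch 1.5 build placed the .job file.
NUTCH_JOB = "runtime/deploy/apache-nutch-1.5.job"


def hadoop_jar_cmd(job_file, main_class, *args):
    """Build the argv for: hadoop jar <job_file> <main_class> [args...]."""
    return ["hadoop", "jar", job_file, main_class, *args]


def run(cmd, dry_run=True):
    """Print the command; execute it on the cluster only if dry_run is False."""
    print(shlex.join(cmd))
    if not dry_run:
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Illustrative crawl: seed directory "urls", output under "finalCrawl".
    crawl = hadoop_jar_cmd(NUTCH_JOB, "org.apache.nutch.crawl.Crawl",
                           "urls", "-dir", "finalCrawl",
                           "-depth", "3", "-topN", "5")
    # Print the CrawlDb statistics, like the output quoted in the question.
    stats = hadoop_jar_cmd(NUTCH_JOB, "org.apache.nutch.crawl.CrawlDbReader",
                           "finalCrawl/crawldb", "-stats")
    run(crawl)
    run(stats)
```

Because the job file bundles the configuration, nothing from the Nutch conf directory needs to be copied into Hadoop's conf directory for this to work.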


Lewis

On Sat, Jul 7, 2012 at 12:06 AM, shekhar sharma <[email protected]> wrote:
> Hello All,
> I am trying to run Nutch (1.5.0 binaries) on Hadoop, but when I look at
> the db stats, they show the following output:
>
> CrawlDb statistics start: finalCrawl/crawldb
> Statistics for CrawlDb: finalCrawl/crawldb
> TOTAL urls:    6
> retry 1:    6
> min score:    1.0
> avg score:    1.0
> max score:    1.0
> status 1 (db_unfetched):    6
> CrawlDb statistics: done
>
> What I did was copy all the files from the Nutch conf directory to the
> Hadoop conf directory.
>
> I have renamed nutch-default.xml to nutch-site.xml, and have also set the
> plugins.folder, plugins.includes, http.agent.name etc. properties.
>
> The crawl completes successfully, but when I try to dump the crawldb
> contents it shows nothing, and when I run stats on the crawldb it tells
> me that nothing has been fetched.
>
> But when I crawl using Nutch on its own, everything is fine.
>
> Any suggestions as to what is going wrong?
>
> Regards,
> Som
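The CrawlDb statistics in the quoted message can also be checked programmatically. The Python sketch below is an illustration, not part of Nutch. parse_crawldb_stats is a hypothetical helper. It assumes the "TOTAL urls:" and "status N (name):" line formats shown in the quoted output, and it tolerates leading "> " quote markers. Applied to that output, it confirms that all 6 URLs are still db_unfetched and none reached db_fetched.

```python
import re


def parse_crawldb_stats(text):
    """Hypothetical helper: return (total, {status_name: count}) from
    CrawlDbReader -stats output, assuming the line formats quoted above."""
    total = None
    statuses = {}
    for line in text.splitlines():
        # Drop e-mail quote markers and surrounding whitespace.
        line = line.lstrip("> ").strip()
        m = re.match(r"TOTAL urls:\s*(\d+)$", line)
        if m:
            total = int(m.group(1))
            continue
        m = re.match(r"status \d+ \((\w+)\):\s*(\d+)$", line)
        if m:
            statuses[m.group(1)] = int(m.group(2))
    return total, statuses


# The statistics exactly as quoted in the question.
SAMPLE = """\
CrawlDb statistics start: finalCrawl/crawldb
Statistics for CrawlDb: finalCrawl/crawldb
TOTAL urls:    6
retry 1:    6
min score:    1.0
avg score:    1.0
max score:    1.0
status 1 (db_unfetched):    6
CrawlDb statistics: done
"""

total, statuses = parse_crawldb_stats(SAMPLE)
fetched = statuses.get("db_fetched", 0)
print(total, statuses, fetched)  # prints: 6 {'db_unfetched': 6} 0
```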



-- 
Lewis
