Re: Nothing is fetched when running nutch on Hadoop

shekhar sharma Sat, 07 Jul 2012 11:27:04 -0700

Thanks, Lewis. I figured it out.
I then crawled using only the job file, and it ran successfully:
bin/hadoop jar nutch-version.job org.apache.nutch.crawl.Crawl -params

We can run it this way, right?
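
For anyone following along, here is a minimal sketch of that invocation, assuming the usual "hadoop jar <job file> <class> <args>" form. The seed directory "urls" and the -depth/-topN values are hypothetical, and nutch-version.job stands in for the real job file name. The script only prints the commands so they can be checked before running them on the cluster.

```shell
#!/bin/sh
# Sketch only: the job file name, seed dir, and crawl arguments are assumptions.
HADOOP="bin/hadoop"
NUTCH_JOB="nutch-version.job"   # placeholder; substitute the actual job file

# Print the command that runs a Nutch class from the job file via "hadoop jar".
nutch_cmd() {
    class="$1"; shift
    echo "$HADOOP jar $NUTCH_JOB $class $*"
}

# Crawl: "urls" (seed dir) and the depth/topN values are hypothetical.
nutch_cmd org.apache.nutch.crawl.Crawl urls -dir finalCrawl -depth 3 -topN 50

# Afterwards, check whether anything was fetched.
nutch_cmd org.apache.nutch.crawl.CrawlDbReader finalCrawl/crawldb -stats
```

Piping the output to sh would run both steps. If the stats still show only db_unfetched entries, the configuration inside the job file is the first thing I would check.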

Regards,
Som

On Sat, Jul 7, 2012 at 3:48 PM, Lewis John Mcgibbney <
[email protected]> wrote:

> Hi,
>
> You need to have the Nutch job file (with all configuration in there)
> on your Hadoop classpath. You then run hadoop jar $nutch.jar
> $nutch.class -params. This should be all that is required.
>
> Lewis
>
> On Sat, Jul 7, 2012 at 12:06 AM, shekhar sharma <[email protected]>
> wrote:
> > Hello All,
> > I am trying to run Nutch (1.5.0 binaries) on Hadoop, but when I look at
> > the db stats, I get the following output:
> >
> > CrawlDb statistics start: finalCrawl/crawldb
> > Statistics for CrawlDb: finalCrawl/crawldb
> > TOTAL urls:    6
> > retry 1:    6
> > min score:    1.0
> > avg score:    1.0
> > max score:    1.0
> > status 1 (db_unfetched):    6
> > CrawlDb statistics: done
> >
> > What I did was copy all the files from the Nutch conf directory to the
> > Hadoop conf directory.
> >
> > I renamed nutch-default.xml to nutch-site.xml and also set the
> > plugin.folders, plugin.includes, http.agent.name, etc. properties.
> >
> > The crawl completes successfully, but when I try to dump the crawldb
> > contents it shows nothing, and when I run stats on the crawldb it tells
> > me that nothing was fetched.
> >
> > But when I crawl using Nutch alone, everything is fine.
> >
> > Any suggestions as to what is going wrong?
> >
> > Regards,
> > Som
>
>
>
> --
> Lewis
>
