or add $HADOOP/bin to your $PATH and use the nutch script in
$NUTCH_HOME/runtime/deploy/bin
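
For example, a minimal sketch of that approach, assuming HADOOP_HOME (standing
in for $HADOOP above) and NUTCH_HOME point at your own installs. The default
paths below are placeholders, not real locations, and finalCrawl/crawldb is
the crawldb from the original message:

```shell
#!/bin/sh
# Placeholder install locations (assumptions); override them in your environment.
HADOOP_HOME="${HADOOP_HOME:-/usr/local/hadoop}"
NUTCH_HOME="${NUTCH_HOME:-/usr/local/nutch}"

# Put the hadoop launcher on PATH so the nutch script can find it.
export PATH="$HADOOP_HOME/bin:$PATH"

NUTCH_BIN="$NUTCH_HOME/runtime/deploy/bin/nutch"

if [ -x "$NUTCH_BIN" ]; then
    # Print the crawldb statistics through the deploy-mode nutch script.
    "$NUTCH_BIN" readdb finalCrawl/crawldb -stats
else
    echo "nutch script not found at $NUTCH_BIN" >&2
fi
```

The runtime/deploy directory should also contain the job file, so there is
normally no need to copy the Nutch conf files into the Hadoop conf directory.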

Thanks Lewis, I figured it out,
> but then I crawled using the job file only and it ran successfully:
> bin/hadoop jar nutch-version.job org.apache.nutch.crawl.Crawl -params
>
> We can run it this way, right?
>
> Regards,
> Som
>
> On Sat, Jul 7, 2012 at 3:48 PM, Lewis John Mcgibbney <
> [email protected]> wrote:
>
> > Hi,
> >
> > You need to have the nutch job file (with all configuration in there)
> > on your hadoop classpath. You then run hadoop jar $nutch.jar
> > $nutch.class -params. This should be all that is required.
> >
> > Lewis
> >
> > On Sat, Jul 7, 2012 at 12:06 AM, shekhar sharma <[email protected]>
> > wrote:
> > > Hello All,
> > > I am trying to run Nutch (1.5.0 binaries) on Hadoop, but when I look at
> > > the db stats it shows me the following output:
> > >
> > > CrawlDb statistics start: finalCrawl/crawldb
> > > Statistics for CrawlDb: finalCrawl/crawldb
> > > TOTAL urls:    6
> > > retry 1:    6
> > > min score:    1.0
> > > avg score:    1.0
> > > max score:    1.0
> > > status 1 (db_unfetched):    6
> > > CrawlDb statistics: done
> > >
> > > What I did was copy all the files from the Nutch conf directory to the
> > > Hadoop conf directory.
> > >
> > > I have renamed nutch-default.xml to nutch-site.xml, and also provided
> > > the plugin.folders, plugin.includes, http.agent.name etc. properties.
> > >
> > > Crawling is successful, but when I try to dump the crawldb contents
> > > it shows me nothing, and when I run stats on the crawldb it tells me
> > > that nothing has been fetched.
> > >
> > > But when I crawl using Nutch only, everything is fine.
> > >
> > > Any suggestions as to what is going wrong?
> > >
> > > Regards,
> > > Som
> >
> >
> >
> > --
> > Lewis
> >
>



-- 
Open Source Solutions for Text Engineering

http://digitalpebble.blogspot.com/
http://www.digitalpebble.com
http://twitter.com/digitalpebble
