Re: Nutch - Hadoop Help

Lewis John Mcgibbney Tue, 04 Feb 2014 01:42:34 -0800

https://wiki.apache.org/nutch/NutchTutorial#A3.3._Using_the_crawl_script
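
For reference, here is a minimal sketch of what that section of the tutorial boils down to in deploy mode. It only builds and prints the command so it can be checked before anything is submitted to the cluster. The deploy path, seed directory, crawl ID, Solr URL and round count are all placeholder assumptions, not values confirmed in this thread. The argument order is assumed from the 2.x crawl script, so check the usage message your own copy of the script prints. The script also expects the hadoop command to be on your PATH.

```shell
#!/bin/sh
# Hypothetical values: adjust all of these to your own deployment.
NUTCH_DEPLOY="${NUTCH_DEPLOY:-/usr/local/nutch/runtime/deploy}"  # directory holding the .job file
SEED_DIR="urls"                          # seed directory on HDFS
CRAWL_ID="dmoz"                          # crawl identifier
SOLR_URL="http://localhost:8983/solr/"   # placeholder Solr endpoint
ROUNDS=3                                 # number of crawl rounds

# Argument order assumed from the 2.x script:
#   <seedDir> <crawlID> <solrURL> <numberOfRounds>
CRAWL_CMD="./crawl $SEED_DIR $CRAWL_ID $SOLR_URL $ROUNDS"

# Print the command instead of running it; run it from the bin
# directory of the deploy runtime once it looks right.
echo "cd $NUTCH_DEPLOY/bin && $CRAWL_CMD"
```

Running the script from the bin directory of the deploy runtime matters because the script decides between local and distributed mode by looking for the job file relative to where it is started.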



On Tue, Feb 4, 2014 at 7:04 AM, Manikandan Saravanan <
[email protected]> wrote:

> How do I run the crawl script on hadoop?
> --
> Manikandan Saravanan
> Architect - Technology
> TheSocialPeople <http://thesocialpeople.net>
>
> On 4 February 2014 at 1:28:39 am, Lewis John Mcgibbney (
> [email protected]) wrote:
>
> Hi Manikandan,
>
> On Mon, Feb 3, 2014 at 3:45 PM, <[email protected]>
> wrote:
>
> > And then, I'm running this:
> > $HADOOP_HOME/bin/hadoop jar /usr/local/nutch/nutch.job
> > org.apache.nutch.crawl.Crawler dmoz -dir /user/hduser/crawl -depth 3 -topN 5000
> >
>
> You're using the Crawler class. This is not advised at all and is now
> deprecated. There is no point in downloading the crawl script if you are
> going to use the Crawler class. I would suggest using the crawl
> script instead.
>
>
> >
> > org.apache.gora.memory.store.MemStore as the Gora storage class.
> >
>
> Please don't use MemStore; its implementation in Gora 0.3 is not
> thread-safe and is only used for trivial tests. Please see the 2.x
> tutorial on the Nutch wiki for details of how to configure the supported
> Gora persistent data stores.
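>
> As a rough sketch of what that configuration involves, assuming HBase is
> the chosen backend (other supported Gora stores follow the same pattern
> with their own class names), the store class is named in
> conf/nutch-site.xml:
>
> ```xml
> <property>
>   <name>storage.data.store.class</name>
>   <value>org.apache.gora.hbase.store.HBaseStore</value>
> </property>
> ```
>
> The same class is normally set as gora.datastore.default in
> conf/gora.properties, and the matching gora-hbase dependency has to be
> enabled in ivy/ivy.xml before rebuilding.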
>
>
> Once you've used the crawl script and configured your Nutch deployment
> job file, please get back to us with your results.
> Remember that you will always need to regenerate your Nutch job file if
> you make configuration changes to your Nutch deployment.
> hth
> Thanks
>
>


-- 
*Lewis*
