https://wiki.apache.org/nutch/NutchTutorial#A3.3._Using_the_crawl_script
On Tue, Feb 4, 2014 at 7:04 AM, Manikandan Saravanan <[email protected]> wrote:

> How do I run the crawl script on hadoop?
> --
> Manikandan Saravanan
> Architect - Technology
> TheSocialPeople <http://thesocialpeople.net>
>
> On 4 February 2014 at 1:28:39 am, Lewis John Mcgibbney ([email protected]) wrote:
>
> Hi Manikandan,
>
> On Mon, Feb 3, 2014 at 3:45 PM, <[email protected]> wrote:
>
> > And then, I'm running this:
> > $HADOOP_HOME/bin/hadoop jar /usr/local/nutch/nutch.job
> > org.apache.nutch.crawl.Crawler dmoz -dir /user/hduser/crawl -depth 3 -topN 5000
>
> You're using the Crawler class. This is not advised at all and is now
> deprecated. There is no point in downloading the crawl script if you are
> going to use the Crawler class. I would suggest you use the crawl script.
>
> > org.apache.gora.memory.store.MemStore as the Gora storage class.
>
> Please don't use MemStore. Its implementation in Gora 0.3 is not thread
> safe and is only used for trivial tests. Please see the 2.x tutorial on the
> Nutch wiki for details of how to configure the supported Gora persistent
> data stores.
>
> Once you've used the crawl script and configured your Nutch deployment job
> file, please get back to us with your results.
> Remember you will always need to regenerate your Nutch job file if you make
> configuration changes to your Nutch deployment.
> hth
> Thanks

--
*Lewis*
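For anyone following along, the two steps mentioned in the thread (regenerate the job file, then use the crawl script rather than the Crawler class) might look roughly like the sketch below. It is not taken from the thread. The source location, crawl ID, Solr URL and number of rounds are all assumed placeholder values, and the four positional arguments follow the crawl script usage as I understand it for releases of that period, so check the usage message printed by running `bin/crawl` with no arguments on your version. The sketch only prints the commands unless `DRY_RUN=0` is set.

```shell
#!/bin/sh
# Sketch only: every value below is an assumption, adjust to your setup.
NUTCH_SRC="${NUTCH_SRC:-/usr/local/nutch}"          # assumed Nutch source checkout
SEED_DIR="${SEED_DIR:-dmoz}"                        # seed directory (on HDFS when run on Hadoop)
CRAWL_ARG="${CRAWL_ARG:-crawl}"                     # hypothetical crawl dir (1.x) or crawl ID (2.x)
SOLR_URL="${SOLR_URL:-http://localhost:8983/solr}"  # hypothetical Solr URL
ROUNDS="${ROUNDS:-3}"                               # number of rounds, mirrors "-depth 3"

# Print each command instead of executing it unless DRY_RUN=0.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# 1. Rebuild the runtime (including the .job file) after any configuration change.
run ant -f "$NUTCH_SRC/build.xml" runtime

# 2. Run the crawl script from runtime/deploy/bin so that the jobs are
#    submitted to Hadoop (the hadoop command needs to be on the PATH).
run cd "$NUTCH_SRC/runtime/deploy/bin"
run ./crawl "$SEED_DIR" "$CRAWL_ARG" "$SOLR_URL" "$ROUNDS"
```

Running it as is prints the three commands so they can be reviewed first; `DRY_RUN=0 sh sketch.sh` (a hypothetical file name) would execute them.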

