https://wiki.apache.org/nutch/NutchTutorial#A3.3._Using_the_crawl_script
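
For what it's worth, here is a rough dry-run sketch of what a crawl-script invocation might look like when run against Hadoop. It is not taken from the wiki page: the deploy directory path, the crawl ID, the Solr URL, and the argument order are all assumptions for illustration, so check the usage message printed by your own copy of bin/crawl before relying on it. The seed directory name (dmoz) and the three rounds mirror the command quoted below. By default it only prints the command; set DRY_RUN=0 to actually run it.

```shell
#!/usr/bin/env bash
# Dry-run sketch: print (or run) a crawl-script invocation for a Hadoop deployment.
# Every value below is an assumption for illustration; adjust to your own setup.
set -eu

NUTCH_DEPLOY="${NUTCH_DEPLOY:-/usr/local/nutch/runtime/deploy}"  # assumed location of the deploy build (job file + bin/)
SEED_DIR="${SEED_DIR:-dmoz}"                        # seed directory name used in the thread
CRAWL_ID="${CRAWL_ID:-crawl}"                       # hypothetical crawl identifier
SOLR_URL="${SOLR_URL:-http://localhost:8983/solr}"  # hypothetical Solr endpoint
ROUNDS="${ROUNDS:-3}"                               # number of rounds, mirroring "-depth 3"

# Assemble the command line as text.
# ASSUMPTION: argument order <seedDir> <crawlID> <solrURL> <numberOfRounds>;
# this differs between Nutch versions, so verify it against your script's usage output.
build_crawl_cmd() {
  printf '%s %s %s %s %s\n' "bin/crawl" "$SEED_DIR" "$CRAWL_ID" "$SOLR_URL" "$ROUNDS"
}

if [ "${DRY_RUN:-1}" = "1" ]; then
  # Only show what would be run; nothing is executed.
  echo "cd $NUTCH_DEPLOY && $(build_crawl_cmd)"
else
  # Run from the deploy directory so the script can find the Nutch job file.
  cd "$NUTCH_DEPLOY"
  bin/crawl "$SEED_DIR" "$CRAWL_ID" "$SOLR_URL" "$ROUNDS"
fi
```

In a Hadoop run the seed directory would normally need to exist on HDFS rather than on the local filesystem, and the hadoop command would need to be on your PATH.
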
On Tue, Feb 4, 2014 at 7:04 AM, Manikandan Saravanan <manikan...@thesocialpeople.net> wrote:

> How do I run the crawl script on hadoop?
> --
> Manikandan Saravanan
> Architect - Technology
> TheSocialPeople <http://thesocialpeople.net>
>
> On 4 February 2014 at 1:28:39 am, Lewis John Mcgibbney (lewis.mcgibb...@gmail.com) wrote:
>
> Hi Manikandan,
>
> On Mon, Feb 3, 2014 at 3:45 PM, <user-digest-h...@nutch.apache.org> wrote:
>
> > And then, I'm running this:
> > $HADOOP_HOME/bin/hadoop jar /usr/local/nutch/nutch.job org.apache.nutch.crawl.Crawler dmoz -dir /user/hduser/crawl -depth 3 -topN 5000
> >
>
> You're using the Crawler class. This is not advised at all and is now
> deprecated. There is no point in downloading the crawl script if you are
> going to use the Crawler class. I would suggest using the crawl script.
>
> > org.apache.gora.memory.store.MemStore as the Gora storage class.
> >
>
> Please don't use MemStore. Its implementation in Gora 0.3 is not thread
> safe and is only used for trivial tests. Please see the 2.x tutorial on the
> Nutch wiki for details of how to configure the supported Gora persistent
> data stores.
>
> Once you've used the crawl script and configured your Nutch deployment job
> file, please get back to us with your results.
> Remember that you will always need to regenerate your Nutch job file if you
> make configuration changes to your Nutch deployment.
> hth
> Thanks
>
> --
> *Lewis*