I have some additional questions about setting up a cluster: If I want continuous crawling, should I create a Nutch script with an endless loop? Should I run the Nutch instances and the HBase database on different Hadoop clusters? If I want to run several Nutch jobs simultaneously, should I start the Nutch script several times?

2013/11/14 A Laxmi <[email protected]>

> Hi Julien-
>
> From the link you provided
> (http://wiki.apache.org/nutch/NutchHadoopSingleNodeTutorial) for Nutch 1.x -
> how and where is the crawled data stored?
>
> Thanks!
>
> On Wed, Nov 13, 2013 at 4:58 AM, Julien Nioche <[email protected]> wrote:
>
> > Just to add to what Markus said : see
> > http://wiki.apache.org/nutch/NutchHadoopSingleNodeTutorial
> > The approach is the same for 2.x. Nutch is just a Hadoop application with a
> > few scripts to make your life easier
> >
> > Julien
> >
> > On 13 November 2013 09:45, Markus Jelsma <[email protected]> wrote:
> >
> > > You can just install Hadoop on the cluster as you would have otherwise.
> > > Then you can run the Nutch job file via the bin/nutch script on any Hadoop
> > > client such as the jobtracker for example.
> > >
> > > -----Original message-----
> > > > From:flo @ <[email protected]>
> > > > Sent: Wednesday 13th November 2013 10:20
> > > > To: [email protected]
> > > > Subject: Nutch cluster
> > > >
> > > > Which is the best approach to setup a nutch cluster with multiple nutch
> > > > instances running on different machines. Is there some kind of scheduler
> > > > for nutch?
> > > >
> > > > I already configured a single nutch instance with HBase for storing the
> > > > index in the background.
> > > >
> > > > Thanks
> > > >
> > > > flo
> >
> > --
> > Open Source Solutions for Text Engineering
> >
> > http://digitalpebble.blogspot.com/
> > http://www.digitalpebble.com
> > http://twitter.com/digitalpebble

