Hi Julien-

Regarding the link you provided
(http://wiki.apache.org/nutch/NutchHadoopSingleNodeTutorial) for Nutch 1.x:
how and where is the crawled data stored?

Thanks!


On Wed, Nov 13, 2013 at 4:58 AM, Julien Nioche <
[email protected]> wrote:

> Just to add to what Markus said: see
> http://wiki.apache.org/nutch/NutchHadoopSingleNodeTutorial
> The approach is the same for 2.x. Nutch is just a Hadoop application with a
> few scripts to make your life easier
>
> Julien
>
>
> On 13 November 2013 09:45, Markus Jelsma <[email protected]>
> wrote:
>
> > You can just install Hadoop on the cluster as you would have otherwise.
> > Then you can run the Nutch job file via the bin/nutch script on any Hadoop
> > client such as the jobtracker for example.
> >
> >
> >
> > -----Original message-----
> > > From:flo @ <[email protected]>
> > > Sent: Wednesday 13th November 2013 10:20
> > > To: [email protected]
> > > Subject: Nutch cluster
> > >
> > > Which is the best approach to set up a Nutch cluster with multiple Nutch
> > > instances running on different machines? Is there some kind of scheduler
> > > for Nutch?
> > >
> > > I already configured a single nutch instance with HBase for storing the
> > > index in the background.
> > >
> > > Thanks
> > >
> > > flo
> > >
> >
>
>
>
> --
>
> Open Source Solutions for Text Engineering
>
> http://digitalpebble.blogspot.com/
> http://www.digitalpebble.com
> http://twitter.com/digitalpebble
>
