I am looking to do the same thing. If anyone finds a way, please post here.

Thanks,
Jordan

On Sun, Aug 10, 2008 at 11:31 AM, soila <[EMAIL PROTECTED]> wrote:

>
> Hi Andrzej,
>
> I am experiencing similar problems distributing the fetch across multiple
> nodes. I am crawling a single host in an intranet and I would like to know
> how I can modify Nutch's behavior so that it distributes the fetch over
> multiple nodes.
>
> Soila
>
> Andrzej Bialecki wrote:
> >
> > brainstorm wrote:
> >> Sure, I tried mapred.map.tasks and mapred.reduce.tasks with
> >> values 2 and 1 respectively *in the past*, same results. Right now, I
> >> have 32 for both: same results, as those settings are just a hint for
> >> Nutch.
> >>
> >> Regarding number of threads *per host* I tried with 10 and 20 in the
> >> past, same results.
> >
> > Indeed, the default number of maps and reduces can be changed for any
> > particular job - the number of maps is adjusted according to the number
> > of input splits (InputFormat.getSplits()), and the number of reduces can
> > be adjusted programmatically in the application.
> >
> > Back to your issue: I suspect that your fetchlist is highly homogeneous,
> > i.e. contains URLs from a single host. Nutch makes sure that all URLs
> > from a single host end up in a single map task, to enforce the politeness
> > settings, so that's probably why you see only a single map task fetching
> > all URLs.
> >
> >
> > --
> > Best regards,
> > Andrzej Bialecki     <><
> >   ___. ___ ___ ___ _ _   __________________________________
> > [__ || __|__/|__||\/|  Information Retrieval, Semantic Web
> > ___|||__||  \|  ||  |  Embedded Unix, System Integration
> > http://www.sigram.com  Contact: info at sigram dot com
> >
> >
> >
>
>
>
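
To illustrate the behaviour Andrzej describes in the quoted message, here is a minimal sketch of host-based partitioning. It is not Nutch's actual code: the class name HostPartitionSketch, the methods partitionFor and assign, and the intranet.example host are all hypothetical, and the hash function is only an assumption chosen for the illustration. It shows why a fetchlist drawn from a single host lands in a single partition, no matter how many map tasks are requested.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Illustration only: a hypothetical stand-in for host-based partitioning,
// not the partitioner that Nutch itself ships.
public class HostPartitionSketch {

    // Pick a partition from the URL's host alone, so that every URL of a
    // given host maps to the same partition (and hence the same task).
    public static int partitionFor(String url, int numPartitions) {
        String host = URI.create(url).getHost();
        if (host == null) {
            host = url; // fall back to the raw string when no host is present
        }
        // Mask the sign bit so the modulo result is never negative.
        int hash = host.toLowerCase(Locale.ROOT).hashCode() & Integer.MAX_VALUE;
        return hash % numPartitions;
    }

    // Group a list of URLs by the partition each one is assigned to.
    public static Map<Integer, List<String>> assign(List<String> urls, int numPartitions) {
        Map<Integer, List<String>> partitions = new TreeMap<>();
        for (String url : urls) {
            int p = partitionFor(url, numPartitions);
            partitions.computeIfAbsent(p, k -> new ArrayList<>()).add(url);
        }
        return partitions;
    }

    public static void main(String[] args) {
        // A homogeneous fetchlist: every URL comes from the same (made-up) host.
        List<String> urls = List.of(
                "http://intranet.example/index.html",
                "http://intranet.example/docs/a.html",
                "http://intranet.example/docs/b.html?page=2");
        Map<Integer, List<String>> partitions = assign(urls, 32);
        System.out.println("non-empty partitions: " + partitions.size() + " of 32");
    }
}
```

With 32 partitions requested, all three URLs still fall into one partition because only the host is hashed. Under this scheme, raising mapred.map.tasks does not by itself spread a single-host crawl across nodes.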
