I am looking to do the same thing. If anyone finds a way, please post here.
Thanks,
Jordan

On Sun, Aug 10, 2008 at 11:31 AM, soila <[EMAIL PROTECTED]> wrote:
>
> Hi Andrzej,
>
> I am experiencing similar problems distributing the fetch across multiple
> nodes. I am crawling a single host in an intranet and I would like to know
> how I can modify nutch's behavior so that it distributes the search over
> multiple nodes.
>
> Soila
>
> Andrzej Bialecki wrote:
> >
> > brainstorm wrote:
> >> Sure, I tried with mapred.map.tasks and mapred.reduce.tasks with
> >> values 2 and 1 respectively *in the past*, same results. Right now, I
> >> have 32 for both: same results as those settings are just a hint for
> >> nutch.
> >>
> >> Regarding number of threads *per host* I tried with 10 and 20 in the
> >> past, same results.
> >
> > Indeed, the default number of maps and reduces can be changed for any
> > particular job - the number of maps is adjusted according to the number
> > of input splits (InputFormat.getSplits()), and the number of reduces can
> > be adjusted programmatically in the application.
> >
> > Back to your issue: I suspect that your fetchlist is highly homogenous,
> > i.e. contains urls from a single host. Nutch makes sure that all urls
> > from a single host end up in a single map task, to ensure the politeness
> > settings, so that's probably why you see only a single map task fetching
> > all urls.
> >
> >
> > --
> > Best regards,
> > Andrzej Bialecki <><
> > ___. ___ ___ ___ _ _ __________________________________
> > [__ || __|__/|__||\/| Information Retrieval, Semantic Web
> > ___|||__|| \| || | Embedded Unix, System Integration
> > http://www.sigram.com Contact: info at sigram dot com
> >
>
> --
> View this message in context:
> http://www.nabble.com/Distributed-fetching-only-happening-in-one-node---tp18429531p18915705.html
> Sent from the Nutch - User mailing list archive at Nabble.com.
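
For anyone else following the thread, here is a rough sketch of why a single-host fetchlist ends up in one map task, as Andrzej describes. It is only an illustration of hash partitioning by host, not Nutch's actual partitioner: the function names, the CRC32 hash, the example hostnames, and the partition count of 32 are all assumptions made for the demo.

```python
from collections import Counter
from urllib.parse import urlparse
import zlib


def partition_by_host(url: str, num_partitions: int) -> int:
    """Hypothetical partitioner: every URL from the same host maps to the same partition."""
    host = urlparse(url).hostname or ""
    # CRC32 is used only because it is deterministic across runs;
    # it is an assumption for this sketch, not what Nutch uses.
    return zlib.crc32(host.encode("utf-8")) % num_partitions


def partition_sizes(urls, num_partitions):
    """Count how many URLs land in each partition (think: each fetcher map task)."""
    return Counter(partition_by_host(u, num_partitions) for u in urls)


# A fetchlist drawn from one intranet host versus one drawn from many hosts.
single_host = [f"http://intranet.example/page{i}" for i in range(1000)]
many_hosts = [f"http://host{i}.example/index.html" for i in range(1000)]

single_counts = partition_sizes(single_host, 32)
many_counts = partition_sizes(many_hosts, 32)

print("partitions used, single host:", len(single_counts))
print("partitions used, many hosts: ", len(many_counts))
```

With one host, all 1000 URLs fall into a single partition no matter how many partitions are requested, which matches the behaviour reported above: raising mapred.map.tasks cannot spread the work while the grouping key is the host.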
