Running generate with -numFetchers n will create n reduce tasks for the generator output. That output is the input for the fetcher, so when the fetcher runs it will break the job into n tasks. That doesn't mean they will *all* run in parallel. That depends on the maximum number of tasks per server and the total number of servers running the job in the Hadoop cluster.

But say you have 10 machines in your cluster and you run generate -numFetchers 10; then they should all run in parallel.
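
As a rough illustration of the arithmetic above, the number of fetch tasks running at once is min(n, servers × max tasks per server). This is a simplified sketch: it assumes every machine has the same number of task slots, no other jobs compete for those slots, and fetch tasks take roughly equal time. The class and method names are hypothetical and are not part of Nutch or Hadoop. In Hadoop of this era the per-machine limit is typically set by mapred.tasktracker.map.tasks.maximum.

```java
// Hypothetical sketch (not Nutch/Hadoop API): estimates fetch-task parallelism,
// assuming uniform task slots per machine and no competing jobs.
public class FetchParallelism {

    /** Total concurrent task slots in the cluster; rejects empty clusters. */
    static int totalSlots(int servers, int maxTasksPerServer) {
        int slots = servers * maxTasksPerServer;
        if (slots <= 0) {
            throw new IllegalArgumentException("cluster must have at least one task slot");
        }
        return slots;
    }

    /** How many of the numFetchers fetch tasks can run at the same time. */
    static int concurrentFetchTasks(int numFetchers, int servers, int maxTasksPerServer) {
        return Math.min(numFetchers, totalSlots(servers, maxTasksPerServer));
    }

    /** Scheduling "waves" needed to finish all fetch tasks (assumes equal task durations). */
    static int waves(int numFetchers, int servers, int maxTasksPerServer) {
        int slots = totalSlots(servers, maxTasksPerServer);
        return (numFetchers + slots - 1) / slots; // ceiling division
    }

    public static void main(String[] args) {
        // The example from the mail: 10 machines, -numFetchers 10, at least 1 slot each.
        System.out.println(concurrentFetchTasks(10, 10, 1)); // 10
        System.out.println(waves(10, 10, 1));                // 1
        // 20 fetch tasks on 5 machines with 2 slots each: 10 at a time, in 2 waves.
        System.out.println(concurrentFetchTasks(20, 5, 2));  // 10
        System.out.println(waves(20, 5, 2));                 // 2
    }
}
```

In practice, choosing n equal to (or a multiple of) the total slot count avoids a final wave where most slots sit idle.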

Dennis

[EMAIL PROTECTED] wrote:
Hi,

I was able to dig out a related message/threads from "only" 3 years ago:

http://markmail.org/message/dp6a6isdboz46wez#query:+page:1+mid:o7p2iqqp66zumwcs+state:results

Is the story with running generate with -numFetchers N and running N parallel 
fetch jobs still true?

Thanks,
Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

----- Original Message ----
From: Tomislav Poljak <[EMAIL PROTECTED]>
To: [email protected]
Sent: Thursday, April 10, 2008 2:57:04 PM
Subject: Parallel operations in fetch

Hi,
is there a way to do some of these operations in parallel safely:
generate, fetch, parse and updatedb (and if so, how)?

thanks,
         Tomislav
