Running generate with -numFetchers n creates n reduce tasks for the
generator output. That output is the input to the fetcher, so when the
fetcher runs it splits the job into n tasks. That doesn't mean they will
*all* run in parallel; that depends on the maximum number of tasks per
server and the total number of servers running the job in the Hadoop
cluster.
But say you have 10 machines in your cluster and you run generate
-numFetchers 10; then they should all run in parallel.
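
To make the arithmetic concrete, here is a rough sketch in Python. It is not Nutch or Hadoop code; the function and parameter names are made up for illustration, and it assumes every server offers the same number of task slots and that tasks run in simple "waves", which is a simplification of how the scheduler really hands out tasks as slots free up.

```python
import math


def fetch_parallelism(num_fetchers, servers, max_tasks_per_server):
    """Estimate how many fetch tasks run at once and in how many waves.

    All three arguments are hypothetical inputs: the value passed to
    -numFetchers, the number of servers in the cluster, and the maximum
    number of tasks each server may run at the same time.
    """
    if min(num_fetchers, servers, max_tasks_per_server) < 1:
        raise ValueError("all arguments must be positive integers")

    # Total task slots available across the cluster.
    slots = servers * max_tasks_per_server
    # Cannot run more tasks at once than exist, or than there are slots.
    concurrent = min(num_fetchers, slots)
    # Simplified model: tasks run in rounds of at most `slots` tasks.
    waves = math.ceil(num_fetchers / slots)
    return concurrent, waves


# 10 fetch tasks, 10 machines, 1 task per machine: all run together.
print(fetch_parallelism(10, 10, 1))  # (10, 1)
# 10 fetch tasks, 4 machines, 2 tasks per machine: 8 at once, 2 rounds.
print(fetch_parallelism(10, 4, 2))   # (8, 2)
```

So with fewer slots than fetch tasks, the remaining tasks simply wait until a slot becomes free.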
Dennis
[EMAIL PROTECTED] wrote:
Hi,
I was able to dig out a related message/threads from "only" 3 years ago:
http://markmail.org/message/dp6a6isdboz46wez#query:+page:1+mid:o7p2iqqp66zumwcs+state:results
Is the story with running generate with -numFetchers N and running N parallel
fetch jobs still true?
Thanks,
Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
----- Original Message ----
From: Tomislav Poljak <[EMAIL PROTECTED]>
To: [email protected]
Sent: Thursday, April 10, 2008 2:57:04 PM
Subject: Parallel operations in fetch
Hi,
is there a way to do some of these operations in parallel safely:
generate, fetch, parse and updatedb (and if so, how)?
thanks,
Tomislav