Re: -numFetchers in generate command

Andrzej Bialecki Fri, 25 Aug 2006 05:17:44 -0700

Vishal Shah wrote:

Hi Andrei,


   I am running some experiments to figure out what numThreads param to
use while fetching on my machine. I made the mistake of putting the # of
map/reduce tasks in hadoop-site.xml and not in mapred-default.xml,
however I can clearly see a change in performance for different numbers
of threads (I tested using 5 different options, ranging from 10 to
2000).

  I was wondering why I am seeing these performance changes even though
the number of reduce parts is only 2 for all the experiments. Also, how
is the number of fetcher threads param used during generate related to
the numthreads param used during fetch?

Well, you will always run as many fetching (map) tasks as the number of
parts you created when running Generator's reduce phase. Now, each
fetching task can run multiple fetching threads in parallel ... so, as
you increase the number of threads your fetching performance will likely
increase (unless you face some other limits, like the blocked addresses
and your bandwidth limits).
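
As a rough illustration of the arithmetic above, here is a minimal sketch
in Python. It is not Nutch code: the function name is hypothetical, and it
assumes that all fetching tasks run at the same time, which depends on how
many task slots the cluster has. It only shows why changing the thread
count changes throughput even when the number of parts stays at 2.

```python
def max_concurrent_fetch_threads(num_parts: int, threads_per_task: int) -> int:
    """Upper bound on fetch threads running at once across the whole job.

    num_parts: number of parts produced by Generator's reduce phase,
               which is also the number of fetching (map) tasks.
    threads_per_task: number of fetcher threads each task starts.
    Assumption: every task is scheduled concurrently.
    """
    if num_parts < 1 or threads_per_task < 1:
        raise ValueError("both values must be at least 1")
    return num_parts * threads_per_task


if __name__ == "__main__":
    # 2 parts, as in the experiments described; 10 and 2000 are the
    # endpoints of the thread range that was tested.
    for threads in (10, 2000):
        total = max_concurrent_fetch_threads(2, threads)
        print(f"{threads} threads/task x 2 tasks = {total} threads at most")
```

The real gain will usually be smaller than this upper bound, because of
the limits mentioned above (blocked addresses, bandwidth).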


--
Best regards,
Andrzej Bialecki     <><
___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com
