Vishal Shah wrote:
Hi Andrei,
I am running some experiments to figure out what numThreads param to
use while fetching on my machine. I made the mistake of putting the # of
map/reduce tasks in hadoop-site.xml and not in mapred-default.xml,
however I can clearly see a change in performance for different numbers
of threads (I tested using 5 different options, ranging from 10 to
2000).
I was wondering why I am seeing these performance changes even though
the number of reduce parts is only 2 for all the experiments. Also, how
is the number of fetcher threads param used during generate related to
the numThreads param used during fetch?
Well, you will always run as many fetching (map) tasks as the number of
parts you created when running Generator's reduce phase. Each fetching
task can run multiple fetching threads in parallel, so as you increase
the number of threads your fetching performance will likely increase
(unless you hit some other limit, such as blocked addresses or your
bandwidth).
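
To make the relationship concrete, here is a rough sketch in Python (not Nutch code) of how the two numbers combine. The names fetch_part, max_concurrent_fetches and fetch_one are hypothetical, and the sketch assumes that all fetching tasks run at the same time, which depends on how many map slots your cluster has:

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_part(urls, num_threads, fetch_one):
    """Model one fetching (map) task: process a single part of the
    fetch list with num_threads worker threads.

    fetch_one is a hypothetical callable that fetches one URL.
    """
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        # pool.map keeps results in the same order as the input URLs.
        return list(pool.map(fetch_one, urls))


def max_concurrent_fetches(num_parts, threads_per_task):
    """Upper bound on simultaneous fetches, assuming one task per part
    and all tasks running at once."""
    return num_parts * threads_per_task
```

With 2 parts, max_concurrent_fetches(2, 10) gives 20 and max_concurrent_fetches(2, 2000) gives 4000, so the thread count can change throughput a lot even though the number of parts stays fixed.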
--
Best regards,
Andrzej Bialecki <><
___. ___ ___ ___ _ _ __________________________________
[__ || __|__/|__||\/| Information Retrieval, Semantic Web
___|||__|| \| || | Embedded Unix, System Integration
http://www.sigram.com Contact: info at sigram dot com