The mapred.map.tasks and mapred.reduce.tasks properties define the
approximate number of map and reduce tasks per job; the actual number
also depends heavily on the amount of data being processed. The
mapred.tasktracker.map.tasks.maximum and
mapred.tasktracker.reduce.tasks.maximum properties define the maximum
number of map and reduce tasks that can run concurrently on a single
tasktracker.
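As a minimal sketch, something like the following in hadoop-site.xml
would cap each tasktracker at two concurrent map tasks and two
concurrent reduce tasks (the values here are illustrative for a small
2-node cluster, not recommendations; tune them to your hardware):

<property>
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <value>2</value>
  <description>Maximum number of map tasks run concurrently by a
  single tasktracker.</description>
</property>
<property>
  <name>mapred.tasktracker.reduce.tasks.maximum</name>
  <value>2</value>
  <description>Maximum number of reduce tasks run concurrently by a
  single tasktracker.</description>
</property>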
When you say 20 jobs, I am assuming you mean tasks. Also, what type of
hardware are you running this on, what are your memory settings, and are
you running in local or DFS mode?
Dennis
Alexander Aristov wrote:
Hi all
Can someone suggest how to restrict the number of jobs Nutch launches in
Hadoop when it starts the segment merger?
When I run the generate, fetch, and updatedb tasks, Nutch starts about
6-10 MapReduce jobs (on a cluster of 2 datanodes); the actual value varies
from task to task. But when the script starts merging segments, it
launches about 20 jobs, and the servers get overloaded and crash. The
Nutch settings are primarily the defaults.
How can I control the number of jobs?
Best regards,
Alexander