The mapred.map.tasks and mapred.reduce.tasks properties define the approximate number of tasks per job; the actual number also depends heavily on the amount of data being processed. The mapred.tasktracker.map.tasks.maximum and mapred.tasktracker.reduce.tasks.maximum properties define the maximum number of map and reduce tasks that can run concurrently on a single tasktracker.
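
For example, to cap each tasktracker at two concurrent tasks of each type, you could put something like the following in hadoop-site.xml (the value of 2 is just illustrative, not a recommendation; pick a value that fits your hardware). Note that mapred.map.tasks is only a hint to the framework, while the tasktracker maximums are hard limits per node:

  <!-- hadoop-site.xml: per-tasktracker concurrency limits (example values) -->
  <property>
    <name>mapred.tasktracker.map.tasks.maximum</name>
    <value>2</value>
    <description>Max map tasks run simultaneously by one tasktracker.</description>
  </property>
  <property>
    <name>mapred.tasktracker.reduce.tasks.maximum</name>
    <value>2</value>
    <description>Max reduce tasks run simultaneously by one tasktracker.</description>
  </property>

The tasktrackers must be restarted for these settings to take effect.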

When you say 20 jobs, I am assuming you mean tasks. Also, what type of hardware are you running this on, what are your memory settings, and are you running in local or DFS mode?

Dennis

Alexander Aristov wrote:
Hi all

Can someone suggest how I can restrict the number of jobs Nutch launches in
Hadoop when it starts the segment merger?

When I run the generate, fetch, and updatedb tasks, Nutch starts about 6-10 MapReduce
jobs (cluster of 2 datanodes); the actual value varies from task to task. But
when the script starts merging segments, it launches about 20 jobs and the servers
get overloaded and crash. The Nutch settings are mostly the defaults.

How can I control the number of jobs?

Best regards
Alexander

