How many URLs are you fetching, and does each machine have the same
settings as below?
Remember that the number of fetchers is the number of fetcher threads
per task per machine. So you would be running 2 tasks per machine * 12
threads * 3 machines = 72 fetchers.
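
As a quick check of that arithmetic, here is a minimal sketch in Python. The function and parameter names are made up for illustration and are not Nutch or Hadoop settings; the figures (2 tasks per machine, 12 threads per task, 3 machines) are the ones from this thread.

```python
def total_fetcher_threads(tasks_per_machine, threads_per_task, machines):
    """Return the number of fetcher threads running across the cluster.

    Hypothetical helper: each concurrently running fetch task starts its
    own pool of fetcher threads, so the totals multiply.
    """
    return tasks_per_machine * threads_per_task * machines


# Figures from this thread: 2 concurrent tasks per machine,
# 12 fetcher threads per task, 3 machines.
total = total_fetcher_threads(tasks_per_machine=2, threads_per_task=12, machines=3)
print(total)
```

With 4 concurrent tasks per machine instead of 2, the same formula would give twice as many threads, so it is worth checking the thread count before raising the task limit.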
Dennis
Vishal Shah wrote:
Hi,
I am using Nutch 0.9 for crawling. I recollect that
mapred.tasktracker.tasks.maximum can be used to control the max # of
tasks executed in parallel by a tasktracker.
I am running a fetch with the following config:
3 machines
My mapred-default.xml contains:
mapred.map.tasks=13
mapred.reduce.tasks=7
mapred.tasktracker.tasks.maximum=4
I ran generate using -numFetchers=12; however, while fetching I see that
only 2 tasks are running at a time on each machine (instead of 4).
Any pointers?
-vishal.