Hi Dennis, I am trying to fetch 1M URLs at a time. Each machine has similar settings. I am pretty sure the problems are happening during the parse phase. I tried using the -noParsing option during fetch, and then parsing with the parse command. The fetch works fine, but the parse sometimes stalls and fails.
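
In case it helps, the two-step run looks roughly like this on my end (the segment path below is just an example, assuming the stock bin/nutch scripts):

  # fetch without parsing, then parse the same segment in a separate step
  bin/nutch fetch crawl/segments/20060908123456 -noParsing
  bin/nutch parse crawl/segments/20060908123456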
-vishal.

-----Original Message-----
From: Dennis Kubes [mailto:[EMAIL PROTECTED]
Sent: Friday, September 08, 2006 11:24 PM
To: [email protected]
Subject: Re: # of tasks executed in parallel

How many urls are you fetching, and does each machine have the same settings
as below? Remember that the number of fetchers is the number of fetcher
threads per task per machine. So you would be running 2 tasks per machine
* 12 threads * 3 machines = 72 fetchers.

Dennis

Vishal Shah wrote:
> Hi,
>
> I am using Nutch 0.9 for crawling. I recollect that
> mapred.tasktracker.tasks.maximum can be used to control the max # of
> tasks executed in parallel by a tasktracker.
>
> I am running a fetch with the following config:
>
> 3 machines
>
> My mapred-default.xml contains:
>
> mapred.map.tasks=13
> mapred.reduce.tasks=7
> mapred.tasktracker.tasks.maximum=4
>
> I ran generate using -numFetchers=12, however while fetching I see that
> only 2 tasks are running at a time on each machine (instead of 4).
>
> Any pointers?
>
> -vishal.
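
P.S. For reference, the settings quoted above sit in mapred-default.xml as
standard Hadoop properties, roughly as sketched below (the values are just
the ones from my config, not a recommendation):

  <configuration>
    <property>
      <name>mapred.map.tasks</name>
      <value>13</value>
    </property>
    <property>
      <name>mapred.reduce.tasks</name>
      <value>7</value>
    </property>
    <property>
      <name>mapred.tasktracker.tasks.maximum</name>
      <value>4</value>
    </property>
  </configuration>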
