Thanks, Steve.

I'm using Nutch 1.1, which I installed by following this tutorial:
http://wiki.apache.org/nutch/NutchHadoopTutorial

But I did not see any hadoop-site.xml file. I used grep to look for anything related to 'task' (see below). Also, the "crawldb crawl/crawldb" job uses more MapReduce tasks, usually 4, while the other jobs use only 2. Any idea?

b...@nutch03:~/nutch/search$ grep task conf/*
conf/capacity-scheduler.xml: <!-- The default configuration settings for the capacity task scheduler -->
conf/domain-suffixes.xml: <!-- ke : http://www.kenic.or.ke/index.php?option=com_content&task=view&id=117&Itemid=145-->
conf/domain-suffixes.xml: <!-- TASK geographical domains (www.task.gda.pl/uslugi/dns)-->
conf/hadoop-policy.xml: <description>ACL for InterTrackerProtocol, used by the tasktrackers to
conf/hadoop-policy.xml: <name>security.task.umbilical.protocol.acl</name>
conf/hadoop-policy.xml: tasks to communicate with the parent tasktracker.
conf/mapred-site.xml: reduce task.
conf/mapred-site.xml: <name>mapred.map.tasks</name>
conf/mapred-site.xml: define mapred.map tasks to be number of slave hosts
conf/mapred-site.xml: <name>mapred.reduce.tasks</name>
conf/mapred-site.xml: define mapred.reduce tasks to be number of slave hosts

Dennis

--- On Tue, 10/5/10, Steve Cohen <[email protected]> wrote:

From: Steve Cohen <[email protected]>
Subject: Re: need a larger map task number
To: [email protected]
Date: Tuesday, October 5, 2010, 9:40 PM

For nutch, I found that updating the values in hadoop-site.xml was enough, though I also set values for mapred.tasktracker.map.tasks.maximum and mapred.tasktracker.reduce.tasks.maximum.

On Tue, Oct 5, 2010 at 9:24 AM, Dennis <[email protected]> wrote:

> Hi, all
> My "fetch" job uses only 2 map tasks and 2 reduce tasks although I
> configured "mapred.map.tasks" and "mapred.reduce.tasks" in "mapreduce.xml"
> to "32", while I need it run faster. How can I make nutch to use more map and
> reduce tasks when it's fetching?
> Dennis
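
For reference, the four properties named in this thread could be set together roughly as follows. This is only a sketch: it assumes a Hadoop 0.20-style layout, where the old hadoop-site.xml is split into core-site.xml, hdfs-site.xml and mapred-site.xml (which would match the conf/mapred-site.xml hits in the grep output above). The 32 is the value Dennis mentioned; the two per-tasktracker maximums are illustrative placeholders, not recommendations.

```xml
<?xml version="1.0"?>
<!-- conf/mapred-site.xml (sketch; values are placeholders, adjust per cluster) -->
<configuration>

  <!-- Hint for the default number of map tasks per job. -->
  <property>
    <name>mapred.map.tasks</name>
    <value>32</value>
  </property>

  <!-- Default number of reduce tasks per job. -->
  <property>
    <name>mapred.reduce.tasks</name>
    <value>32</value>
  </property>

  <!-- Maximum map tasks one tasktracker runs at the same time.
       The Hadoop default is 2; the 4 here is an assumed example value. -->
  <property>
    <name>mapred.tasktracker.map.tasks.maximum</name>
    <value>4</value>
  </property>

  <!-- Maximum reduce tasks one tasktracker runs at the same time.
       The Hadoop default is 2; the 4 here is an assumed example value. -->
  <property>
    <name>mapred.tasktracker.reduce.tasks.maximum</name>
    <value>4</value>
  </property>

</configuration>
```

The two mapred.tasktracker.* limits are read by each tasktracker when it starts, so the file would likely need to be present on every slave node and the tasktrackers restarted before a change shows up. Whether this alone raises the task count of the fetch job is not confirmed anywhere in this thread.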

