mapred.map.tasks vs mapred.tasktracker.map.tasks.maximum
What's the difference between mapred.tasktracker.reduce.tasks.maximum and mapred.map.tasks? I want my data to be split across only 10 mappers in the entire cluster. Can I do that using one of the above parameters?
Re: mapred.map.tasks vs mapred.tasktracker.map.tasks.maximum
Hi Mohit,

mapred.tasktracker.reduce(map).tasks.maximum is the number of reduce (map) slots you have on each TaskTracker, i.e. how many reduce (map) tasks can run on that node at the same time. mapred.reduce(map).tasks is the default number of reduce (map) tasks your job will have.

To set the number of mappers in your application, you can write this on your JobConf:

jobConf.setNumMapTasks(the number you want);

Chen

Actually, you can also just use configuration.set() to set mapred.map.tasks directly.

On Fri, Mar 9, 2012 at 6:42 PM, Mohit Anchlia <mohitanch...@gmail.com> wrote:
> What's the difference between mapred.tasktracker.reduce.tasks.maximum and mapred.map.tasks?
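To make the slots-versus-tasks distinction concrete, here is a small arithmetic model in plain Java. It is not Hadoop code: the class and method names are hypothetical, and it assumes a single job has every map slot in the cluster to itself, ignoring scheduler details. The per-TaskTracker maximum bounds how many map tasks run at once, while the job's map-task count is how many tasks there are in total; together they give the number of "waves".

```java
// Hypothetical model of per-node map slots vs. a job's total map tasks (not Hadoop code).
// Assumes one job owns every map slot in the cluster.
public class MapSlotsSketch {

    /** Cluster-wide map slots: TaskTrackers x mapred.tasktracker.map.tasks.maximum. */
    static int clusterMapSlots(int taskTrackers, int mapSlotsPerTracker) {
        return taskTrackers * mapSlotsPerTracker;
    }

    /** Number of waves needed to run all map tasks (ceiling division). */
    static int mapWaves(int mapTasks, int clusterSlots) {
        return (mapTasks + clusterSlots - 1) / clusterSlots;
    }

    public static void main(String[] args) {
        // 5 TaskTrackers with 4 map slots each.
        int slots = clusterMapSlots(5, 4);
        System.out.println("cluster map slots = " + slots);                // 20
        System.out.println("waves for 10 maps = " + mapWaves(10, slots));  // 1
        System.out.println("waves for 50 maps = " + mapWaves(50, slots));  // 3
    }
}
```

Under this model, lowering mapred.tasktracker.map.tasks.maximum changes how many mappers run concurrently, not how many mappers a job gets in total.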
Re: mapred.map.tasks vs mapred.tasktracker.map.tasks.maximum
What's the difference between setNumMapTasks and mapred.map.tasks?

On Fri, Mar 9, 2012 at 5:00 PM, Chen He <airb...@gmail.com> wrote:
> To set the number of mappers in your application, you can write like this: configuration.setNumMapTasks(the number you want);
Re: mapred.map.tasks vs mapred.tasktracker.map.tasks.maximum
If you do not call setNumMapTasks, the system uses, by default, the value you configured for mapred.map.tasks in the conf/mapred-site.xml file.

On Fri, Mar 9, 2012 at 7:19 PM, Mohit Anchlia <mohitanch...@gmail.com> wrote:
> What's the difference between setNumMapTasks and mapred.map.tasks?
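As far as I know, in Hadoop 1.x JobConf.setNumMapTasks(n) is a convenience that sets mapred.map.tasks on the job's own configuration, so the two are the same property set in different places, and the job-level value takes precedence over the site-wide default. The sketch below models that precedence with plain maps. MapTasksPrecedenceSketch is a hypothetical class, not Hadoop's Configuration, and it ignores properties marked final in the site file.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of layered configuration lookup (not Hadoop's Configuration class).
// Ignores properties marked final in the site file.
public class MapTasksPrecedenceSketch {

    // Layers in load order; a later layer overrides an earlier one.
    private final List<Map<String, String>> layers = new ArrayList<>();

    void addLayer(Map<String, String> layer) {
        layers.add(layer);
    }

    /** Returns the value from the most recently added layer that defines key, else fallback. */
    String get(String key, String fallback) {
        for (int i = layers.size() - 1; i >= 0; i--) {
            String value = layers.get(i).get(key);
            if (value != null) {
                return value;
            }
        }
        return fallback;
    }

    public static void main(String[] args) {
        MapTasksPrecedenceSketch conf = new MapTasksPrecedenceSketch();

        // Stands in for mapred.map.tasks set in conf/mapred-site.xml.
        Map<String, String> site = new HashMap<>();
        site.put("mapred.map.tasks", "4");
        conf.addLayer(site);
        System.out.println(conf.get("mapred.map.tasks", "unset")); // 4

        // Stands in for the job calling setNumMapTasks(10).
        Map<String, String> job = new HashMap<>();
        job.put("mapred.map.tasks", "10");
        conf.addLayer(job);
        System.out.println(conf.get("mapred.map.tasks", "unset")); // 10
    }
}
```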
Re: mapred.map.tasks vs mapred.tasktracker.map.tasks.maximum
Is this a system parameter too? Or can I specify it as mapred.map.tasks? I am using Pig.

On Fri, Mar 9, 2012 at 6:19 PM, Chen He <airb...@gmail.com> wrote:
> If you do not specify setNumMapTasks, by default, the system will use the number you configured for mapred.map.tasks in the conf/mapred-site.xml file.
Re: mapred.map.tasks vs mapred.tasktracker.map.tasks.maximum
Mohit,

It is a job-level config parameter. For plain MapReduce jobs you can set it from the CLI as

hadoop jar ... -D mapred.map.tasks=n

You should be able to do it in Pig as well.

However, the number of map tasks for a job is governed by the input splits and the InputFormat you are using, so setting this parameter doesn't guarantee that your job will have the specified number of map tasks. This is normally how you set the number of reduce tasks for your job: mapred.reduce.tasks=n

Hope it helps.

Regards,
Bejoy K S

From handheld, please excuse typos.

-----Original Message-----
From: Mohit Anchlia <mohitanch...@gmail.com>
Date: Fri, 9 Mar 2012 20:34:33
To: common-user@hadoop.apache.org
Reply-To: common-user@hadoop.apache.org
Subject: Re: mapred.map.tasks vs mapred.tasktracker.map.tasks.maximum

> Is this system parameter too? Or can I specify as mapred.map.tasks? I am using pig.
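Bejoy's point that mapred.map.tasks is only a hint shows up in how the old-API FileInputFormat computes splits. The sketch below is my approximation of that logic in Hadoop 1.x, not its source: the requested map count only sets a goal size of totalSize / requestedMaps, the split size is max(minSize, min(goalSize, blockSize)), and a file's last split may be up to 10% larger than the rest (a 1.1 "slop" factor). It assumes every input file is splittable and non-empty and ignores block locations; SplitCountSketch and its methods are hypothetical names, not Hadoop API.

```java
import java.util.Collections;
import java.util.List;

// Approximation (not Hadoop source) of how the old-API FileInputFormat turns the
// requested map count into input splits. Assumes every file is splittable and
// non-empty; the real code also handles non-splittable files and block locations.
public class SplitCountSketch {

    // The last split of a file may be up to 10% larger than splitSize.
    private static final double SPLIT_SLOP = 1.1;

    /** splitSize = max(minSize, min(goalSize, blockSize)) */
    static long computeSplitSize(long goalSize, long minSize, long blockSize) {
        return Math.max(minSize, Math.min(goalSize, blockSize));
    }

    /** Counts splits (one map task each); lengths in bytes, minSize should be >= 1. */
    static int countSplits(List<Long> fileLengths, int requestedMaps,
                           long minSize, long blockSize) {
        long totalSize = 0;
        for (long len : fileLengths) {
            totalSize += len;
        }
        // The requested map count only determines the goal size.
        long goalSize = totalSize / (requestedMaps == 0 ? 1 : requestedMaps);
        long splitSize = computeSplitSize(goalSize, minSize, blockSize);

        int splits = 0;
        for (long len : fileLengths) {
            long bytesRemaining = len;
            // Cut full-size splits while the remainder exceeds 1.1 x splitSize.
            while ((double) bytesRemaining / splitSize > SPLIT_SLOP) {
                splits++;
                bytesRemaining -= splitSize;
            }
            if (bytesRemaining != 0) {
                splits++; // tail split for whatever is left of this file
            }
        }
        return splits;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long block = 64 * mb;

        // One 1 GB file, 10 maps requested: goal size is capped at the block size.
        System.out.println(countSplits(List.of(1024 * mb), 10, 1, block));          // 16
        // Twenty 1 MB files, 10 maps requested: each file gets its own split.
        System.out.println(countSplits(Collections.nCopies(20, mb), 10, 1, block)); // 20
        // One 1 GB file with a 128 MB minimum split size.
        System.out.println(countSplits(List.of(1024 * mb), 10, 128 * mb, block));   // 8
    }
}
```

In this model, asking for 10 maps yields 16 for a single 1 GB file and 20 for twenty small files. Raising the minimum split size (mapred.min.split.size in Hadoop 1.x, as far as I know) is what brings the count down, here to 8.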