mapred.map.tasks vs mapred.tasktracker.map.tasks.maximum

2012-03-09 Thread Mohit Anchlia
What's the difference between mapred.tasktracker.reduce.tasks.maximum and
mapred.map.tasks?

I want my data to be split across only 10 mappers in the entire cluster.
Can I do that using one of the above parameters?


Re: mapred.map.tasks vs mapred.tasktracker.map.tasks.maximum

2012-03-09 Thread Chen He
Hi Mohit

mapred.tasktracker.map.tasks.maximum and
mapred.tasktracker.reduce.tasks.maximum set how many map and reduce
slots you can have on each tasktracker.

mapred.map.tasks and mapred.reduce.tasks set the default number of map and
reduce tasks your job will have.
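
To make the distinction concrete, here is a minimal Java sketch. It is a
hypothetical helper, not part of Hadoop: the class name, the tracker count and
the slot values are all made up for illustration, and it assumes the job has
the cluster to itself. The per-tasktracker maximum limits how many map tasks
run at the same time; the job-level task count only decides how many tasks
there are to run.

```java
// Hypothetical helper, not a Hadoop API: models how per-tasktracker slot
// limits cap concurrency, independently of how many map tasks a job has.
final class MapSlotMath {
    // Cluster-wide number of map tasks that can run at the same time.
    static int concurrentMapCapacity(int taskTrackers, int mapSlotsPerTracker) {
        return taskTrackers * mapSlotsPerTracker;
    }

    // Number of "waves" needed to run all of a job's map tasks.
    static int waves(int jobMapTasks, int capacity) {
        return (jobMapTasks + capacity - 1) / capacity; // ceiling division
    }

    public static void main(String[] args) {
        // Assumed cluster: 5 tasktrackers with 2 map slots each.
        int capacity = concurrentMapCapacity(5, 2);
        System.out.println("concurrent map capacity = " + capacity);
        System.out.println("waves for 25 map tasks = " + waves(25, capacity));
    }
}
```

So lowering the slot maximum does not reduce the number of map tasks; it only
makes the same tasks run in more waves.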

To set the number of mappers in your application, you can write something like
this, where configuration is your JobConf object:

configuration.setNumMapTasks(the number you want);

Chen

Actually, you can also just use configuration.set() with the property name
and value.


Re: mapred.map.tasks vs mapred.tasktracker.map.tasks.maximum

2012-03-09 Thread Mohit Anchlia
What's the difference between setNumMapTasks and mapred.map.tasks?


Re: mapred.map.tasks vs mapred.tasktracker.map.tasks.maximum

2012-03-09 Thread Chen He
If you do not call setNumMapTasks, the system will by default use the
number you configured for mapred.map.tasks in the conf/mapred-site.xml
file.
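
In other words, the setter and the property are two ways of writing the same
value. The Java sketch below uses a stand-in class (SketchJobConf is
hypothetical, not the real Hadoop JobConf) to show the idea: the site file
supplies a default, and a job-level call overwrites it. The assumption that
the real setter simply stores mapred.map.tasks, and the fallback of 1 when
nothing is set, should be checked against your Hadoop version.

```java
import java.util.HashMap;
import java.util.Map;

// Stand-in for a job configuration; NOT the real Hadoop JobConf class.
final class SketchJobConf {
    private final Map<String, String> props = new HashMap<>();

    // Simulates values loaded from conf/mapred-site.xml.
    SketchJobConf(Map<String, String> siteDefaults) {
        props.putAll(siteDefaults);
    }

    // Generic property setter, like calling set() with a name and a value.
    void set(String name, String value) {
        props.put(name, value);
    }

    // Assumed to mirror the real setter: it only writes the property.
    void setNumMapTasks(int n) {
        set("mapred.map.tasks", Integer.toString(n));
    }

    // Falls back to 1 when neither the job nor the site file sets the property.
    int getNumMapTasks() {
        return Integer.parseInt(props.getOrDefault("mapred.map.tasks", "1"));
    }

    public static void main(String[] args) {
        Map<String, String> site = new HashMap<>();
        site.put("mapred.map.tasks", "2"); // hypothetical site-file value
        SketchJobConf conf = new SketchJobConf(site);
        System.out.println(conf.getNumMapTasks()); // site default applies
        conf.setNumMapTasks(10);
        System.out.println(conf.getNumMapTasks()); // job-level value wins
    }
}
```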


Re: mapred.map.tasks vs mapred.tasktracker.map.tasks.maximum

2012-03-09 Thread Mohit Anchlia
Is this a system parameter too? Or can I specify it as mapred.map.tasks? I am
using Pig.


Re: mapred.map.tasks vs mapred.tasktracker.map.tasks.maximum

2012-03-09 Thread bejoy . hadoop
Mohit
It is a job-level config parameter. For plain MapReduce jobs you can set
it through the CLI as
hadoop jar ... -D mapred.map.tasks=n
You should be able to do it in Pig as well.

However, the number of map tasks for a job is governed by the input splits and
the InputFormat you are using, so setting this config parameter doesn't
guarantee that your job will have the specified number of map tasks.
Normally you set the number of reduce tasks this way for your job:
mapred.reduce.tasks=n
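
A rough Java sketch of why the requested map count is only a hint. This is a
simplified model of how a file-based InputFormat may size its splits, not
Hadoop source code: the class and method names are hypothetical, the minimum
split size parameter is an assumption about what your InputFormat honours, and
the real split count can differ slightly. Pig may also build splits in its own
way.

```java
// Simplified model of split sizing; an approximation, not Hadoop source code.
final class SplitEstimate {
    // The requested map count only sets a goal size; the block size and the
    // minimum split size bound the split size that is actually used.
    static long splitSize(long totalBytes, int requestedMaps, long minSize, long blockSize) {
        long goalSize = totalBytes / Math.max(requestedMaps, 1);
        return Math.max(minSize, Math.min(goalSize, blockSize));
    }

    // Approximate number of map tasks for one splittable input file.
    static long estimatedMaps(long totalBytes, int requestedMaps, long minSize, long blockSize) {
        long size = splitSize(totalBytes, requestedMaps, minSize, blockSize);
        return (totalBytes + size - 1) / size; // ceiling division
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long input = 10L * 1024L * mb; // assumed 10 GB input file
        long block = 64L * mb;         // assumed 64 MB block size

        // Asking for 10 maps does not help: the block size caps the split size.
        System.out.println(estimatedMaps(input, 10, 1L, block));
        // Raising the minimum split size to 1 GB is what changes the count.
        System.out.println(estimatedMaps(input, 10, 1024L * mb, block));
    }
}
```

In this model, asking for fewer maps than there are blocks has no effect,
while asking for more maps than blocks does shrink the splits.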

Hope it helps
Regards
Bejoy K S

From handheld, Please excuse typos.
