Re: setNumTasks

2012-03-22 Thread Mohit Anchlia
Could someone please help me answer this question?

On Wed, Mar 14, 2012 at 8:06 AM, Mohit Anchlia mohitanch...@gmail.com wrote:

 What is the corresponding system property for setNumTasks? Can it be set
 explicitly as a system property, like mapred.tasks.?


Re: setNumTasks

2012-03-22 Thread Mohit Anchlia
Sorry, I meant *setNumMapTasks*. What is mapred.map.tasks for? Its
purpose is confusing. I tried setting it for my job, but I still see
more map tasks running than the value of *mapred.map.tasks*.

On Thu, Mar 22, 2012 at 7:53 AM, Harsh J ha...@cloudera.com wrote:

 There is no such API as setNumTasks. There is, however,
 setNumReduceTasks, which sets mapred.reduce.tasks.

 Does this answer your question?

 --
 Harsh J



Re: setNumTasks

2012-03-22 Thread Bejoy Ks
Hi Mohit
  The number of map tasks is determined by the number of input splits,
which in turn depends on the InputFormat used by your MR job. Setting
mapred.map.tasks won't let you control that number directly. AFAIK it only
takes effect if the value in mapred.map.tasks is greater than the number of
tasks the job calculates from the splits and the InputFormat.
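
To illustrate why a larger value can take effect while a smaller one is ignored, here is a rough sketch. It assumes the old org.apache.hadoop.mapred FileInputFormat, which treats mapred.map.tasks as a hint when choosing a split size; other input formats, or the newer API, may ignore the hint entirely. The class name, the method estimateMapTasks, and the file and block sizes are made up for illustration, and the sketch ignores per-file boundaries and the slack allowed on the last split.

```java
// Simplified, standalone sketch (no Hadoop dependency) of how the old-API
// FileInputFormat turns the mapred.map.tasks hint into a split size.
class MapSplitEstimate {

    // Same shape as the split-size rule in the old mapred FileInputFormat.
    static long computeSplitSize(long goalSize, long minSize, long blockSize) {
        return Math.max(minSize, Math.min(goalSize, blockSize));
    }

    // Hypothetical helper: estimated map tasks for one file of totalSize bytes.
    static long estimateMapTasks(long totalSize, int hint, long minSize, long blockSize) {
        long goalSize = totalSize / Math.max(hint, 1);
        long splitSize = computeSplitSize(goalSize, minSize, blockSize);
        return (totalSize + splitSize - 1) / splitSize; // ceiling division
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long totalSize = 1024 * mb; // assumed 1 GB input file
        long blockSize = 64 * mb;   // assumed HDFS block size
        long minSize = 1;           // assumed minimum split size

        // Hint below the block-based count: split size stays at one block.
        System.out.println(estimateMapTasks(totalSize, 4, minSize, blockSize));
        // Hint above the block-based count: split size shrinks, more maps.
        System.out.println(estimateMapTasks(totalSize, 32, minSize, blockSize));
    }
}
```

With these assumed sizes, a hint of 4 still yields 16 splits (one per block), while a hint of 32 yields 32 splits.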

Regards
Bejoy KS




Re: setNumTasks

2012-03-22 Thread Shi Yu
If you want fine-grained control over the number of input splits, you
could customize NLineInputFormat. You need to decide the number of
lines per split, so you first need to know the number of lines in your
input data, for instance using


hadoop fs -text /input/dir/* | wc -l

This gives you a number; let's call it N.

If you have K nodes, each with C cores, you can run roughly K*C mapper
tasks at once. If you further assume each mapper processes 2 splits (in
case some tasks finish earlier than others), the optimal number of lines
per split for NLineInputFormat is around


N/(2*K*C)

This might give you a well-balanced job. Remember that
NLineInputFormat usually takes longer to initialize than other input
formats, and that the line split considers only the number of lines, not
the length of each line. Thus, in sequence data analysis, if some lines
are significantly longer than others, the mappers assigned the longer
lines will be much slower than those assigned the shorter lines. So
randomly mixing short and long lines before splitting is preferable.
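
As a quick worked example of the N/(2*K*C) estimate above, here is a minimal sketch. The class and method names and the sample numbers are made up; the result is the value you would give NLineInputFormat as its lines-per-split setting.

```java
// Standalone sketch of the lines-per-split estimate N / (S * K * C),
// where S is the number of splits each mapper slot should process.
class LinesPerSplit {

    // Rounds up so the number of splits never exceeds the target,
    // and never returns fewer than 1 line per split.
    static long linesPerSplit(long totalLines, int nodes, int coresPerNode, int splitsPerMapper) {
        long targetSplits = (long) nodes * coresPerNode * splitsPerMapper;
        return Math.max(1, (totalLines + targetSplits - 1) / targetSplits);
    }

    public static void main(String[] args) {
        long n = 1_000_000L; // assumed line count from the wc -l step
        int k = 5;           // assumed number of nodes
        int c = 4;           // assumed cores per node

        // 2 splits per mapper slot, as suggested above: 1,000,000 / 40.
        System.out.println(linesPerSplit(n, k, c, 2));
    }
}
```

With these assumed numbers the estimate is 25000 lines per split, i.e. 40 splits for 20 mapper slots.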



Shi

