[jira] [Commented] (YARN-685) Capacity Scheduler is not distributing the reducers tasks across the cluster

2013-07-22 Thread Omkar Vinit Joshi (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13715534#comment-13715534
 ] 

Omkar Vinit Joshi commented on YARN-685:


Thanks, [~raviprak]. Judging by your numbers, the reducers are spread fairly 
evenly and randomly. ReduceTaskAttemptImpl does not seem to do anything special 
for reduce tasks: it only requests containers of the requested memory size on 
any node manager. The MR job may then be given its container on any node 
manager that satisfies the request, as decided by the resource manager's 
scheduler. Even though the placement is fairly random, I don't see why we 
should do that.
{code}
  public ReduceTaskAttemptImpl(TaskId id, int attempt,
      EventHandler eventHandler, Path jobFile, int partition,
      int numMapTasks, JobConf conf,
      TaskAttemptListener taskAttemptListener,
      Token<JobTokenIdentifier> jobToken,
      Credentials credentials, Clock clock,
      AppContext appContext) {
    // The empty String[] below is the host list handed to the parent class,
    // so a reduce attempt names no preferred hosts.
    super(id, attempt, eventHandler, taskAttemptListener, jobFile, partition,
        conf, new String[] {}, jobToken, credentials, clock,
        appContext);
    this.numMapTasks = numMapTasks;
  }
{code}
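
To make the point about the empty host list concrete, here is a minimal, self-contained sketch. It is not the MapReduce code: `LocalityAskSketch` and `resourceNames` are hypothetical names, racks are left out, and the only assumption taken from YARN is that the wildcard resource name for "any node" is "*". It shows that an attempt which names no hosts ends up asking for a container anywhere.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class LocalityAskSketch {
    // Assumption: "*" stands for "any node", as YARN's wildcard resource name does.
    public static final String ANY = "*";

    // Hypothetical helper: the locations one container ask is registered under.
    // Preferred hosts come first, and the wildcard is always added as a fallback.
    public static List<String> resourceNames(String[] preferredHosts) {
        List<String> names = new ArrayList<>(Arrays.asList(preferredHosts));
        names.add(ANY);
        return names;
    }

    public static void main(String[] args) {
        // A map-style ask that prefers the hosts holding its input split.
        System.out.println(resourceNames(new String[] {"host1", "host2"}));
        // A reduce-style ask built from an empty host list: only the wildcard is left.
        System.out.println(resourceNames(new String[] {}));
    }
}
```

Under this model the second ask can be satisfied by whichever node manager the scheduler picks, which is consistent with the near-random placement reported in this thread.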

[~devaraj.k] The reducers are clearly placed randomly and fairly evenly across 
the cluster. However, do we really need that? Why can't we try to place 
reducers close to the mappers? Thoughts?

 Capacity Scheduler is not distributing the reducers tasks across the cluster
 

 Key: YARN-685
 URL: https://issues.apache.org/jira/browse/YARN-685
 Project: Hadoop YARN
  Issue Type: Bug
  Components: capacityscheduler
Affects Versions: 2.0.4-alpha
Reporter: Devaraj K

 If we have reducers whose total memory requirement is less than the total 
 cluster memory, they are not assigned to all the nodes uniformly (or even 
 approximately uniformly). At that time no other jobs or job tasks are 
 running in the cluster.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-685) Capacity Scheduler is not distributing the reducers tasks across the cluster

2013-07-19 Thread Ravi Prakash (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13713976#comment-13713976
 ] 

Ravi Prakash commented on YARN-685:
---

The first column is the number of nodes. The second column is the number of 
tasks that ran on each of those nodes. So:
2 1 - There were 2 nodes which each ran 1 task.
32 2 - There were 32 nodes which each ran 2 tasks.
1 4 - There was 1 node which ran 4 tasks.
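
As a quick sanity check of this reading, the rows should add up to the 35 nodes and 70 reduces mentioned elsewhere in this thread. The sketch below is only illustrative; the class and method names are made up for this example.

```java
import java.util.Arrays;

public class ReducerHistogram {
    // Each row is {number of nodes, tasks that ran on each of those nodes}.
    public static int totalNodes(int[][] rows) {
        return Arrays.stream(rows).mapToInt(r -> r[0]).sum();
    }

    // Total tasks = sum over rows of (nodes * tasks per node).
    public static int totalTasks(int[][] rows) {
        return Arrays.stream(rows).mapToInt(r -> r[0] * r[1]).sum();
    }

    public static void main(String[] args) {
        // The three rows explained above.
        int[][] rows = { {2, 1}, {32, 2}, {1, 4} };
        // prints "35 nodes, 70 tasks"
        System.out.println(totalNodes(rows) + " nodes, " + totalTasks(rows) + " tasks");
    }
}
```

2 + 32 + 1 gives 35 nodes and 2*1 + 32*2 + 1*4 gives 70 tasks, so every node ran at least one reducer.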




[jira] [Commented] (YARN-685) Capacity Scheduler is not distributing the reducers tasks across the cluster

2013-07-18 Thread Omkar Vinit Joshi (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13713285#comment-13713285
 ] 

Omkar Vinit Joshi commented on YARN-685:


[~raviprak] Can you please tell me what value-1 and value-2 are? I think the 
first one is the number of nodes; what is the second? 
Also, what do you mean here?
{code}
For 23, Reduce: 
2 1
32 2
1 4
{code}



[jira] [Commented] (YARN-685) Capacity Scheduler is not distributing the reducers tasks across the cluster

2013-07-12 Thread Ravi Prakash (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13707290#comment-13707290
 ] 

Ravi Prakash commented on YARN-685:
---

This is not the behavior I am seeing in 0.23 / 2.2. On a 35-node cluster with 
14*1.5 of memory, I first ran a randomtextwriter with 490 maps and 70 reduces, 
then a sorter on the produced output. The distribution of tasks was:

For 23, Map: 
 35 14
For 23, Reduce: 
  2 1
 32 2
  1 4
2.2 Map:
 35 14
2.2 Reduce:
  1 1
 33 2
  1 3

Did you mean it's not exactly uniform?
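
For anyone who wants to produce the same kind of summary for their own job, here is a rough sketch of one way to do it. It assumes you already have the host each task ran on (for example collected from the job history); the class name, method name and host names are made up for this example.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;
import java.util.stream.Collectors;

public class TaskDistribution {
    // Returns a map from "tasks per node" to "number of nodes that ran that many tasks".
    // Nodes that ran no task at all do not appear in the input, so they are not counted.
    public static Map<Long, Long> histogram(List<String> hostPerTask) {
        // Step 1: count the tasks that ran on each host.
        Map<String, Long> tasksPerHost = hostPerTask.stream()
            .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));
        // Step 2: count how many hosts share each task count, sorted by task count.
        return tasksPerHost.values().stream()
            .collect(Collectors.groupingBy(Function.identity(), TreeMap::new,
                Collectors.counting()));
    }

    public static void main(String[] args) {
        // Hypothetical input: one entry per task, naming the host it ran on.
        List<String> hosts = Arrays.asList(
            "node1", "node2", "node2", "node3", "node3", "node3", "node3");
        // Print one "nodes tasks" pair per line, in the same layout as the numbers above.
        histogram(hosts).forEach((tasks, nodes) -> System.out.println(nodes + " " + tasks));
    }
}
```

A perfectly uniform placement would collapse to a single line, as the map numbers above do.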



[jira] [Commented] (YARN-685) Capacity Scheduler is not distributing the reducers tasks across the cluster

2013-07-12 Thread Ravi Prakash (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13707292#comment-13707292
 ] 

Ravi Prakash commented on YARN-685:
---

I meant a randomwriter with 490 maps and 0 reduces. The sorter also had 490 
maps, but 70 reduces.



[jira] [Commented] (YARN-685) Capacity Scheduler is not distributing the reducers tasks across the cluster

2013-07-09 Thread Omkar Vinit Joshi (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13704102#comment-13704102
 ] 

Omkar Vinit Joshi commented on YARN-685:


I would like to take this up. Is there any way we can reproduce or verify this? 
If you have any test or sample code to verify it, that would be really helpful.



[jira] [Commented] (YARN-685) Capacity Scheduler is not distributing the reducers tasks across the cluster

2013-07-09 Thread Devaraj K (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13704116#comment-13704116
 ] 

Devaraj K commented on YARN-685:


We can reproduce this. In a cluster with multiple NMs, submit a job (any job) 
with some number of reducers whose total memory (i.e. resources) requirement 
is less than the available memory (i.e. resources) in the cluster. These 
reducers are not distributed across the cluster; instead they are assigned to 
only a few nodes.
