[jira] [Commented] (YARN-1662) Capacity Scheduler reservation issue cause Job Hang

2015-05-04 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14526473#comment-14526473
 ] 

Sunil G commented on YARN-1662:
---

Yes [~jianhe], we can close this issue. After YARN-1769, we also have better
reservation handling.

I checked this, and it is no longer happening.

 Capacity Scheduler reservation issue cause Job Hang
 ---

 Key: YARN-1662
 URL: https://issues.apache.org/jira/browse/YARN-1662
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager, scheduler
Affects Versions: 2.2.0
 Environment: Suse 11 SP1 + Linux
Reporter: Sunil G

 There are 2 node managers in my cluster:
 NM1 with 8GB
 NM2 with 8GB
 I am submitting a job with the details below:
 AM with 2GB
 Map needs 5GB
 Reducer needs 3GB
 slowstart is enabled with 0.5
 10 maps and 50 reducers are assigned.
 5 maps are completed. Now a few reducers have been scheduled.
 Now NM1 has the 2GB AM and the 3GB Reducer_1 [Used 5GB]
 NM2 has the 3GB Reducer_2 [Used 3GB]
 A Map has now reserved (5GB) on NM1, which has only 3GB free.
 It hangs forever.
 The potential issue is that the reservation on NM1 is now blocked for a Map
 which needs 5GB, while Reducer_1 hangs waiting for a few map outputs.
 Reducer-side preemption also did not happen, as some headroom is still available.
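
 The memory arithmetic in the scenario above can be checked with a small
 sketch. This is only an illustrative model: the class and method names are
 hypothetical and are not part of YARN, and it assumes a container fits on a
 node whenever the request is no larger than the node's free memory.

```java
// Hypothetical model of the two-node scenario; the class and method names
// are illustrative and are not part of YARN.
final class ReservationScenario {

    /** Free memory on a node, in GB, given its capacity and current usage. */
    static int freeGb(int capacityGb, int usedGb) {
        return capacityGb - usedGb;
    }

    /** True if a container of the requested size fits in the free memory. */
    static boolean fits(int requestGb, int capacityGb, int usedGb) {
        return requestGb <= freeGb(capacityGb, usedGb);
    }

    public static void main(String[] args) {
        int nodeCapacityGb = 8;
        int nm1UsedGb = 2 + 3; // AM (2GB) + Reducer_1 (3GB)
        int nm2UsedGb = 3;     // Reducer_2 (3GB)
        int mapGb = 5;

        System.out.println("NM1 free: " + freeGb(nodeCapacityGb, nm1UsedGb) + "GB");
        System.out.println("NM2 free: " + freeGb(nodeCapacityGb, nm2UsedGb) + "GB");
        System.out.println("Map fits on NM1: " + fits(mapGb, nodeCapacityGb, nm1UsedGb));
        System.out.println("Map fits on NM2: " + fits(mapGb, nodeCapacityGb, nm2UsedGb));
    }
}
```

 With the numbers from the description, the 5GB Map does not fit in the 3GB
 free on NM1, so the reservation there cannot be satisfied until Reducer_1
 finishes, and Reducer_1 is itself waiting on map outputs.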



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1662) Capacity Scheduler reservation issue cause Job Hang

2015-05-01 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14523998#comment-14523998
 ] 

Jian He commented on YARN-1662:
---

Hi [~sunilg], YARN-1198 fixed a number of headroom issues so that the reported
headroom is correct and reducer preemption kicks in properly. Given that, is
this problem perhaps already resolved?



[jira] [Commented] (YARN-1662) Capacity Scheduler reservation issue cause Job Hang

2014-01-30 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13887471#comment-13887471
 ] 

Sunil G commented on YARN-1662:
---

If we can implement timed reservation logic here, it will be safer for the
fresh allocation to be tried on some other node.
I have reviewed the scheduler code and found that this can be achieved without
a separate timer thread.

addReReservation() is invoked when the same node tries to re-reserve the same
application's request on that node.
The re-reservations are kept in a multiset, so the internal count increments
every time addReReservation() is called.
The count is incremented only once per node heartbeat interval (1 sec).

I would like to add code like the following to the LeafQueue::assignContainer()
method. If the limit is exceeded, I will try to unreserve the container from
the node.
This code is hit when the same application tries to re-reserve on the same node
again.

} else {
  // Reserve by 'charging' in advance...
  reserve(application, priority, node, rmContainer, container);

  // Check the re-reservation limit. If it is exceeded, unreserve so that
  // a fresh allocation can be tried.
  if (RESERVATION_TIME_LIMIT != 0
      && application.getReReservations(priority) > RESERVATION_TIME_LIMIT) {
    unreserve(application, priority, node, rmContainer);
    return Resources.none();
  }

So on the next node update from some other node, CS can try to allocate
resources to this application.

NB: Reservation ensures that a task sticks to the node where it is better for
it to run.
A larger configurable limit, chosen based on the nature of the running tasks,
can still preserve that behavior.

Please share your thoughts.
