[ https://issues.apache.org/jira/browse/YARN-624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13658637#comment-13658637 ]
Alejandro Abdelnur commented on YARN-624:
-----------------------------------------

As pointed out, supporting gang scheduling at the RM/scheduler level will allow detection and avoidance of deadlocks. This would be neither trivial nor efficient to do if gang scheduling is done at the AM level.

Examples of gang request capabilities could be:

* Express a set of containers on any nodes, e.g. 10 containers on any nodes of the cluster.
* Express a set of containers on a specified set of nodes, e.g. 10 containers in rack1, or 10 containers, one on each of n1...n10.
* Express different sets of possible gangs that would satisfy the request, e.g. 10 containers in rack1 or in rack2; 10 containers in n1...n10 or in n11...n20.
* Indicate a timeout/fallback-to-normal for gang requests.

We should decide which gang capabilities we want/need to address in the short term.

> Support gang scheduling in the AM RM protocol
> ---------------------------------------------
>
>                 Key: YARN-624
>                 URL: https://issues.apache.org/jira/browse/YARN-624
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: api, scheduler
>    Affects Versions: 2.0.4-alpha
>            Reporter: Sandy Ryza
>            Assignee: Sandy Ryza
>
> Per discussion on YARN-392 and elsewhere, gang scheduling, in which a
> scheduler runs a set of tasks when they can all be run at the same time,
> would be a useful feature for YARN schedulers to support.
> Currently, AMs can approximate this by holding on to containers until they
> get all the ones they need. However, this lends itself to deadlocks when
> different AMs are waiting on the same containers.
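
To make the gang request capabilities listed in the comment more concrete, here is a rough Python sketch of how such a request might be modeled and satisfied all-or-nothing on the scheduler side. Nothing here is YARN API: the names Gang, GangRequest, try_place and allocate are hypothetical, a rack is assumed to be just a list of node names, each node is assumed to expose a simple count of free container slots, and placement is a naive first-fit. The timeout is only carried on the request; falling back to normal scheduling when it expires is not implemented.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Gang:
    """One candidate gang (hypothetical model, not a YARN type)."""
    count: int                         # containers that must start together
    nodes: Optional[List[str]] = None  # None means any node in the cluster
    one_per_node: bool = False         # e.g. one container on each of n1...n10


@dataclass
class GangRequest:
    """Alternative gangs, any one of which satisfies the request."""
    alternatives: List[Gang]
    timeout_s: Optional[float] = None  # fallback-to-normal deadline (not enforced here)


def try_place(gang: Gang, free: Dict[str, int]) -> Optional[Dict[str, int]]:
    """Return a full placement for the gang, or None if it does not fit entirely."""
    if gang.nodes is None:
        candidates = list(free)
    else:
        # Drop duplicates and unknown nodes while keeping the requested order.
        candidates = [n for n in dict.fromkeys(gang.nodes) if n in free]
    placement: Dict[str, int] = {}
    remaining = gang.count
    for node in candidates:
        if remaining == 0:
            break
        limit = 1 if gang.one_per_node else remaining
        take = min(free[node], limit)
        if take > 0:
            placement[node] = take
            remaining -= take
    return placement if remaining == 0 else None


def allocate(request: GangRequest, free: Dict[str, int]) -> Optional[Dict[str, int]]:
    """Grant the first alternative that fits completely; otherwise grant nothing."""
    for gang in request.alternatives:
        placement = try_place(gang, free)
        if placement is not None:
            # Capacity is only deducted once the whole gang is known to fit.
            for node, n in placement.items():
                free[node] -= n
            return placement
    return None


if __name__ == "__main__":
    rack1, rack2 = ["n1", "n2"], ["n3"]  # hypothetical rack membership
    free_slots = {"n1": 2, "n2": 1, "n3": 4}
    request = GangRequest(
        alternatives=[Gang(count=4, nodes=rack1), Gang(count=4, nodes=rack2)],
        timeout_s=60.0,
    )
    print(allocate(request, free_slots))
```

Because no capacity is deducted unless a whole gang fits, a request that cannot be satisfied holds nothing, which is the property that avoids the deadlock described in the issue, where different AMs each hold part of what the other needs.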