[ https://issues.apache.org/jira/browse/YARN-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13767242#comment-13767242 ]
Wangda Tan commented on YARN-1197:
----------------------------------

Hi Bikas,

Thanks for the reply. It helps me understand the YARN mechanism, but I think there is some misunderstanding.

In some HPC cases, the number of processes to launch on each node is not determined before we submit the job; we just give the job enough total resource in the cluster (like 100G). So we have the following problems:

1) We launch exactly one daemon process on each node, and this daemon process launches the other local processes. This is the root cause of why we want this feature.
2) We don't know how much resource to request in this case:
# Large requests may waste resources, and they are hard to get from the RM.
# Small requests may not be enough. When the cluster is busy, we cannot "regret" a small allocation we already hold on a node: we can only return it and ask for a larger one, but once we have returned it, the room may be occupied by another app and we cannot take it back.

With such an API, we can implement our AM more easily: we can iteratively send requests to the RM that depend on what we already have, and finally merge the containers into a few big containers and give them to the real app (like PBS/TORQUE/MPI). We can make a "small cluster" in YARN and support HPC workloads very well.
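To make the idea concrete, here is a rough sketch in plain Python (not YARN code) of what the AM-side bookkeeping might look like. The names `Grant`, `merge_by_node` and `next_request_mb` are made up for illustration, and memory in MB stands in for the full resource description; no such merge API exists in YARN today.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Grant:
    """Hypothetical record of one container granted by the RM."""
    node: str
    memory_mb: int


def merge_by_node(grants):
    """Sum the memory of all grants on the same node (the proposed 'merge')."""
    merged = defaultdict(int)
    for g in grants:
        merged[g.node] += g.memory_mb
    return dict(merged)


def next_request_mb(grants, target_mb, max_step_mb):
    """Size of the next request, derived from what the AM already holds."""
    held = sum(g.memory_mb for g in grants)
    remaining = max(target_mb - held, 0)
    return min(remaining, max_step_mb)


# Example: three small grants, two of them on the same node.
grants = [Grant("node1", 2048), Grant("node2", 4096), Grant("node1", 1024)]
merged = merge_by_node(grants)
step = next_request_mb(grants, target_mb=102400, max_step_mb=8192)
```

In this sketch the AM keeps asking in small steps until the total target is reached, and only then merges the grants per node, so one daemon per node can be launched with the combined size.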
(This is a little similar to Mesos: resources are aggregated to a slave daemon, and the slave daemon manages them. But we don't need to make it dynamic, i.e. increase a container's size while it is running; merging the containers before we start the processes would be good enough.) :)

> Add container merge support in YARN
> -----------------------------------
>
>                 Key: YARN-1197
>                 URL: https://issues.apache.org/jira/browse/YARN-1197
>             Project: Hadoop YARN
>          Issue Type: Task
>          Components: api, nodemanager, resourcemanager
>    Affects Versions: 2.1.0-beta
>            Reporter: Wangda Tan
>
> Currently, YARN does not support merging several containers on one node into
> one big container, which would let us incrementally ask for resources, merge
> them into a bigger one, and launch our processes. The user scenario is
> described in the comments.