[ 
https://issues.apache.org/jira/browse/YARN-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13767242#comment-13767242
 ] 

Wangda Tan commented on YARN-1197:
----------------------------------

Hi Bikas,
Thanks for your reply. It helps me understand the YARN mechanism, but I think 
there is some misunderstanding. 

In some HPC cases, the number of processes to launch on each node is not 
determined before we submit the job; we just give the job enough total resource 
(like 100G) in the cluster. So we have the following problems:
1) We launch exactly one daemon process on each node, and this daemon 
process launches the other local processes. This is the root cause of why we 
want this feature.
2) We don't know how much resource to request in this case:
   # Large requests may waste resources, and they are hard to get from the RM.
   # Small requests may not be enough. When the cluster is busy, we cannot 
"regret" if we already hold a small room on a node: we can only return it and 
ask for a larger one, but once we have returned it, the room may be occupied by 
another app, and we cannot take it back.

With such an API, we can implement our AM more easily: we can iteratively 
send requests to the RM, each depending on what we already have. Finally, we 
can merge the containers into different big containers and give them to the 
real app (like PBS/TORQUE/MPI). We can make a "small cluster" in YARN and 
support HPC workloads very well. (It's a little similar to Mesos, which 
aggregates resources to a slave daemon that manages them, but we don't need to 
make it dynamic, i.e. increase a container's size while it is running; just 
merging the containers before we start the processes will be good enough.) :)
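
To make the idea above concrete, here is a minimal sketch in Python. It is only 
a simulation and not YARN code: Container, request_until_satisfied and 
merge_by_node are made-up names for illustration, and the dict of free memory 
per node is an assumed stand-in for the RM's view of the cluster.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Container:
    """A hypothetical granted container: a node name and a memory size in GB."""
    node: str
    mem_gb: int


def request_until_satisfied(free_gb_by_node, total_gb, chunk_gb):
    """Simulate an AM asking for small containers until it holds total_gb.

    free_gb_by_node stands in for the RM's view of free memory per node.
    Each loop iteration models one request whose size depends on what the
    AM already holds.
    """
    free = dict(free_gb_by_node)  # copy so the caller's dict is untouched
    granted = []
    held = 0
    while held < total_gb:
        ask = min(chunk_gb, total_gb - held)
        # Pick the first node (in name order) that can still fit the request.
        node = next((n for n, gb in sorted(free.items()) if gb >= ask), None)
        if node is None:
            break  # cluster is busy: keep what we hold instead of returning it
        free[node] -= ask
        granted.append(Container(node, ask))
        held += ask
    return granted


def merge_by_node(containers):
    """Merge all containers on the same node into one big container per node."""
    totals = {}
    for c in containers:
        totals[c.node] = totals.get(c.node, 0) + c.mem_gb
    return [Container(node, gb) for node, gb in sorted(totals.items())]


if __name__ == "__main__":
    small = request_until_satisfied({"node1": 5, "node2": 3}, total_gb=6, chunk_gb=2)
    for big in merge_by_node(small):
        print(big.node, big.mem_gb)
```

In this sketch the AM never has to give a small container back in order to ask 
for a larger one; it keeps what it holds and merges once, before the daemon on 
each node starts its local processes.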
                
> Add container merge support in YARN
> -----------------------------------
>
>                 Key: YARN-1197
>                 URL: https://issues.apache.org/jira/browse/YARN-1197
>             Project: Hadoop YARN
>          Issue Type: Task
>          Components: api, nodemanager, resourcemanager
>    Affects Versions: 2.1.0-beta
>            Reporter: Wangda Tan
>
> Currently, YARN cannot merge several containers on one node into one big 
> container. Such a merge would let us incrementally ask for resources, merge 
> them into a bigger container, and launch our processes. The user scenario is 
> described in the comments.

