[ https://issues.apache.org/jira/browse/YARN-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13767242#comment-13767242 ]
Wangda Tan commented on YARN-1197:
----------------------------------
Hi Bikas,
Thanks for the reply. It helps me understand the YARN mechanism, but I think
there is some misunderstanding.
In some HPC cases, the number of processes launched on each node is not
determined before we submit the job; we just give the job enough total resources
in the cluster (e.g. 100G). So we have the following problems:
1) We launch exactly one daemon process on each node, and this daemon
process launches the other local processes. This is the root cause of why we
want this feature.
2) We don't know how many resources to request in this case:
# Large requests may waste resources, and they are hard to get from the RM.
# Small requests may not be enough. When the cluster is busy, we cannot "regret"
a small allocation we already hold on a node: we can only return it and ask for a
larger one, but once we return it, that space may be taken by another app, and we
cannot take it back.
With such an API, we could implement our AM more easily: we could iteratively
send requests to the RM based on what we already have, and finally merge them
into a few big containers and hand them to the real app (like PBS/TORQUE/MPI).
In effect, we could build a "small cluster" inside YARN and support HPC
workloads very well. (It is a little similar to Mesos, which aggregates
resources to a slave daemon that then manages them. But we don't need it to be
dynamic, i.e. growing a container while it is running; merging containers
before we start the processes would be good enough.) :)
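To make the AM-side idea concrete, here is a minimal sketch, assuming a merge API like the one proposed here existed. It does not use any real YARN classes: `MergePlanner` and `Allocation` are hypothetical names, and memory is the only resource tracked. The AM keeps every container it is granted instead of returning it, uses `remainingMb()` to size the next request round, and uses `mergedPerNode()` as the plan for the one big container per node that the node's daemon would receive.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/**
 * Hypothetical AM-side bookkeeping for "request incrementally, merge before launch".
 * Not a YARN API: Allocation and MergePlanner are illustrative names only.
 */
public class MergePlanner {

    /** One granted container: the node it landed on and its memory in MB (hypothetical type). */
    record Allocation(String node, int memoryMb) {}

    private final int targetTotalMb;
    private final List<Allocation> granted = new ArrayList<>();

    public MergePlanner(int targetTotalMb) {
        this.targetTotalMb = targetTotalMb;
    }

    /** Keep every container the RM grants instead of returning small ones. */
    public void onAllocated(Allocation a) {
        granted.add(a);
    }

    /** Memory still missing; the AM would size its next request round from this. */
    public int remainingMb() {
        int total = 0;
        for (Allocation a : granted) {
            total += a.memoryMb();
        }
        return Math.max(0, targetTotalMb - total);
    }

    /** Merge plan: total memory per node, i.e. the big container each node's daemon would get. */
    public Map<String, Integer> mergedPerNode() {
        Map<String, Integer> perNode = new TreeMap<>();
        for (Allocation a : granted) {
            perNode.merge(a.node(), a.memoryMb(), Integer::sum);
        }
        return perNode;
    }

    public static void main(String[] args) {
        MergePlanner p = new MergePlanner(100 * 1024); // e.g. 100G in total for the job
        p.onAllocated(new Allocation("node1", 8 * 1024));
        p.onAllocated(new Allocation("node2", 4 * 1024));
        p.onAllocated(new Allocation("node1", 16 * 1024));
        System.out.println(p.mergedPerNode()); // {node1=24576, node2=4096}
        System.out.println(p.remainingMb());   // 73728
    }
}
```

The point of keeping small grants rather than returning them is exactly problem 2) above: a returned slot may be taken by another app, so the sketch only ever accumulates and then merges per node before any process starts.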
> Add container merge support in YARN
> -----------------------------------
>
> Key: YARN-1197
> URL: https://issues.apache.org/jira/browse/YARN-1197
> Project: Hadoop YARN
> Issue Type: Task
> Components: api, nodemanager, resourcemanager
> Affects Versions: 2.1.0-beta
> Reporter: Wangda Tan
>
> Currently, YARN cannot merge several containers on one node into a big
> container. Such support would let us incrementally ask for resources, merge them
> into a bigger container, and launch our processes. The user scenario is described
> in the comments.
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira