[ https://issues.apache.org/jira/browse/YARN-1368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13986213#comment-13986213 ]
Wangda Tan commented on YARN-1368:
----------------------------------
Thanks [~jianhe] for this proposal. I think recovering containers from the NM
heartbeat is a reasonable approach; +1 for the general idea.
Some minor comments:
bq. Noticed that FiCaSchedulerNode and FSSchedulerNode are almost the same. Any
reason for keeping both? Thinking of merging the common methods into
SchedulerNode.
IMO we'd better keep both for now. To avoid touching too many parts in this
JIRA, we can move the merging of their common logic into a separate task.
bq. ContainerStatus sent in NM registration doesn’t capture enough information
for re-constructing the containers. We may replace that with a new object or
just add more fields to encapsulate all the necessary information for
re-constructing the container.
Personally, I think creating a new type specialized for container recovery is
better. ContainerStatus is also used in the node heartbeat, and including too
many fields in each heartbeat isn't safe or efficient.
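To illustrate the separation suggested above, here is a minimal sketch in plain Java. Every name in it (ContainerRecoverySketch, HeartbeatContainerStatus, RecoveredContainerReport, SchedulerNodeSketch, and all of their fields) is hypothetical and made up for illustration. None of it is the actual YARN API or the content of the attached patch. The assumption is that the heartbeat type stays small, while a richer report is sent only once at NM registration and carries what a scheduler node would need to rebuild its allocation state.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch only: these are NOT the real YARN classes.
final class ContainerRecoverySketch {

    /** Small status sent on every heartbeat (assumed fields). */
    public static final class HeartbeatContainerStatus {
        public final String containerId;
        public final String state;   // e.g. "RUNNING" or "COMPLETE"
        public final int exitCode;

        public HeartbeatContainerStatus(String containerId, String state, int exitCode) {
            this.containerId = containerId;
            this.state = state;
            this.exitCode = exitCode;
        }
    }

    /** Richer report sent once at NM registration (assumed fields). */
    public static final class RecoveredContainerReport {
        public final String containerId;
        public final String nodeId;
        public final int memoryMb;
        public final int vcores;
        public final int priority;
        public final String state;

        public RecoveredContainerReport(String containerId, String nodeId, int memoryMb,
                                        int vcores, int priority, String state) {
            this.containerId = containerId;
            this.nodeId = nodeId;
            this.memoryMb = memoryMb;
            this.vcores = vcores;
            this.priority = priority;
            this.state = state;
        }
    }

    /** Toy scheduler-side node that re-populates its state from reports. */
    public static final class SchedulerNodeSketch {
        private final String nodeId;
        private final int totalMemoryMb;
        private int usedMemoryMb = 0;
        private final List<String> runningContainers = new ArrayList<>();

        public SchedulerNodeSketch(String nodeId, int totalMemoryMb) {
            this.nodeId = nodeId;
            this.totalMemoryMb = totalMemoryMb;
        }

        /** Re-adds a running container's resources; ignores finished ones. */
        public void recoverContainer(RecoveredContainerReport report) {
            if (!nodeId.equals(report.nodeId)) {
                throw new IllegalArgumentException("report belongs to node " + report.nodeId);
            }
            if (!"RUNNING".equals(report.state)) {
                return; // nothing to re-allocate for a completed container
            }
            usedMemoryMb += report.memoryMb;
            runningContainers.add(report.containerId);
        }

        public int availableMemoryMb() {
            return totalMemoryMb - usedMemoryMb;
        }

        public List<String> runningContainers() {
            return Collections.unmodifiableList(runningContainers);
        }
    }

    public static void main(String[] args) {
        SchedulerNodeSketch node = new SchedulerNodeSketch("nm-1", 8192);
        node.recoverContainer(
            new RecoveredContainerReport("container_1", "nm-1", 2048, 1, 0, "RUNNING"));
        System.out.println(node.runningContainers() + " available=" + node.availableMemoryMb());
    }
}
```

With this split, the per-heartbeat payload does not grow when more recovery fields are added, since only the registration-time report carries the resource, priority, and node details.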
> Common work to re-populate containers’ state into scheduler
> -----------------------------------------------------------
>
> Key: YARN-1368
> URL: https://issues.apache.org/jira/browse/YARN-1368
> Project: Hadoop YARN
> Issue Type: Sub-task
> Reporter: Bikas Saha
> Assignee: Anubhav Dhoot
> Attachments: YARN-1368.preliminary.patch
>
>
> YARN-1367 adds support for the NM to tell the RM about all currently running
> containers upon registration. The RM needs to send this information to the
> schedulers along with the NODE_ADDED_EVENT so that the schedulers can recover
> the current allocation state of the cluster.