[
https://issues.apache.org/jira/browse/YARN-5651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15654347#comment-15654347
]
Arun Suresh commented on YARN-5651:
-----------------------------------
[~jianhe], I'm wondering what the right approach for this is.
Currently, in the normal container startup flow, if NM recovery happens
*after* the container start request comes in (RecoveredContainerStatus ==
REQUESTED) but *before* the container is launched (at which point
RecoveredContainerStatus == LAUNCHED), the container is simply reported back as
killed. If the container has been launched and is still active, the
ContainerImpl's internal state is regenerated from the stored
StartContainerRequest.
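
To make the current behaviour concrete, here is a minimal sketch of the recovery decision described above. It is only an illustration: CurrentRecoverySketch, Status, Action and decide() are hypothetical names, not the actual NodeManager classes, and the "launched but no longer active" case is left unspecified because it is not discussed here.

```java
// Hypothetical, simplified model of the recovery decision described above.
// Status mirrors the RecoveredContainerStatus values named in this comment;
// Action and decide() are illustrative names, not NodeManager APIs.
final class CurrentRecoverySketch {
  enum Status { REQUESTED, LAUNCHED }

  enum Action { REPORT_KILLED, REBUILD_FROM_START_REQUEST, UNSPECIFIED }

  static Action decide(Status status, boolean containerActive) {
    // The start request was stored, but the NM went down before the launch
    // was recorded: the container is reported back as killed.
    if (status == Status.REQUESTED) {
      return Action.REPORT_KILLED;
    }
    // Launched and still running: ContainerImpl state is regenerated from
    // the stored StartContainerRequest.
    if (status == Status.LAUNCHED && containerActive) {
      return Action.REBUILD_FROM_START_REQUEST;
    }
    // Launched but no longer active: not covered by this comment.
    return Action.UNSPECIFIED;
  }

  public static void main(String[] args) {
    System.out.println(decide(Status.REQUESTED, false));
    System.out.println(decide(Status.LAUNCHED, true));
  }
}
```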
I was thinking that, similarly, when a re-initialization request (re-init /
restart or rollback) arrives for a container, we just mark it in the state
store with RecoveredContainerStatus == RE_INITIALIZING.
If the NM restarts and recovers before the container has finished
re-initializing, then we just report the container as killed.
If the container has completed the relaunch, I propose that we:
# replace the ContainerImpl's internal state (launchContext, ResourceSet
etc.). We already do this now.
# also replace the StartContainerRequest object stored in the db
with a new StartContainerRequest created from the ContainerImpl's
internal state.
This way, there is no real need to actually store the
ReInitializeContainerRequest object anywhere. Thoughts?
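
A rough in-memory sketch of the proposal, purely for discussion. None of the types below are real NodeManager classes: ReInitSketch, StartRequest and StoredContainer are hypothetical stand-ins for the state store entry and the stored StartContainerRequest. The sketch also assumes that the stored status goes back to LAUNCHED once the relaunch completes, which is not spelled out above.

```java
// Hypothetical, in-memory model of the proposal; not NodeManager code.
final class ReInitSketch {
  enum Status { REQUESTED, LAUNCHED, RE_INITIALIZING }

  // Stand-in for the stored StartContainerRequest (only a launch context is modeled).
  static final class StartRequest {
    final String launchContext;
    StartRequest(String launchContext) { this.launchContext = launchContext; }
  }

  // Stand-in for the per-container record kept in the NM state store.
  static final class StoredContainer {
    Status status = Status.REQUESTED;
    StartRequest startRequest;
    StoredContainer(StartRequest startRequest) { this.startRequest = startRequest; }
  }

  // A re-init / restart or rollback request arrives: only the status marker
  // is persisted, the ReInitializeContainerRequest itself is never stored.
  static void onReInitRequested(StoredContainer c) {
    c.status = Status.RE_INITIALIZING;
  }

  // The relaunch finished: overwrite the stored request with one built from
  // the container's current internal state.
  // Assumption: the status is reset to LAUNCHED at this point.
  static void onRelaunchCompleted(StoredContainer c, String currentLaunchContext) {
    c.startRequest = new StartRequest(currentLaunchContext);
    c.status = Status.LAUNCHED;
  }

  // On NM recovery, a container that was not (re)launched yet is reported as killed.
  static boolean reportKilledOnRecovery(StoredContainer c) {
    return c.status == Status.REQUESTED || c.status == Status.RE_INITIALIZING;
  }
}
```

With this shape, recovery after a completed re-init looks exactly like recovery of a freshly launched container, since the db only ever holds a single StartContainerRequest that reflects the container's latest state.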
> Changes to NMStateStore to persist reinitialization and rollback state
> ----------------------------------------------------------------------
>
> Key: YARN-5651
> URL: https://issues.apache.org/jira/browse/YARN-5651
> Project: Hadoop YARN
> Issue Type: Sub-task
> Reporter: Arun Suresh
> Assignee: Arun Suresh
>
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)