[jira] [Commented] (YARN-6127) Add support for work preserving NM restart when AMRMProxy is enabled

Arun Suresh (JIRA) Tue, 20 Jun 2017 10:39:23 -0700

    [ 
https://issues.apache.org/jira/browse/YARN-6127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16056153#comment-16056153
 ]


Arun Suresh commented on YARN-6127:
-----------------------------------

Thanks for the patch [~botong]

A couple of comments:
* It looks like when an interceptor needs to persist state, it has to 
explicitly call 
{{nmContext.getNMStateStore().storeAMRMProxyAppContextEntry()}}, while after 
recovery it must explicitly invoke {{getRecoveredDataMap()}} to access that 
state. I feel it might be better to expose an {{InterceptorState}} API/class 
that is available to the interceptor via the context. This state object can 
then expose {{get(key)}} and {{put(key, value)}}, which would, under the hood, 
talk to the state store to persist the state and to retrieve all existing keys 
and values on recovery. 
* We should increment the major version in the version info. Also, I think we 
need to do something similar to YARN-5547 to handle the 
{{AMRMPROXY_KEY_PREFIX}} keys, so that a rollback does not fail on them.
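
To make the first suggestion concrete, here is a rough sketch of what such a 
state object could look like. All names below ({{ProxyStateStore}}, 
{{InMemoryProxyStateStore}}, and the shape of {{InterceptorState}} itself) are 
hypothetical and are not taken from the patch or from the real NM state store 
API. A real version would presumably also scope the entries per application 
attempt.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the NM state store. This is NOT the real
// NMStateStoreService API, only the minimum needed for the sketch.
interface ProxyStateStore {
  void store(String key, byte[] value);

  Map<String, byte[]> loadAll();
}

// In-memory store used purely for illustration. A real store would
// persist to disk so that the data survives an NM restart.
class InMemoryProxyStateStore implements ProxyStateStore {
  private final Map<String, byte[]> data = new HashMap<>();

  @Override
  public void store(String key, byte[] value) {
    data.put(key, value.clone());
  }

  @Override
  public Map<String, byte[]> loadAll() {
    return new HashMap<>(data);
  }
}

// Hypothetical InterceptorState: interceptors only see get/put, and the
// interaction with the state store happens behind this class.
class InterceptorState {
  private final ProxyStateStore store;
  private final Map<String, byte[]> cache;

  InterceptorState(ProxyStateStore store) {
    this.store = store;
    // On construction (for example after a restart), load every
    // previously stored key and value back from the store.
    this.cache = store.loadAll();
  }

  // Returns the stored value, or null if the key was never stored.
  byte[] get(String key) {
    return cache.get(key);
  }

  // Persists to the store first, then updates the in-memory view.
  void put(String key, byte[] value) {
    store.store(key, value);
    cache.put(key, value);
  }
}
```

With something like this, an interceptor would call {{put}} whenever its state 
changes, and a fresh {{InterceptorState}} built over the same store after a 
restart would return the same values from {{get}}, without the interceptor 
touching the state store directly.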

 

> Add support for work preserving NM restart when AMRMProxy is enabled
> --------------------------------------------------------------------
>
>                 Key: YARN-6127
>                 URL: https://issues.apache.org/jira/browse/YARN-6127
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: amrmproxy, nodemanager
>            Reporter: Subru Krishnan
>            Assignee: Botong Huang
>         Attachments: YARN-6127.v1.patch, YARN-6127.v2.patch
>
>
> YARN-1336 added the ability to restart NM without losing any running 
> containers. In a Federated YARN environment, there's additional state in the 
> {{AMRMProxy}} to allow for spanning across multiple sub-clusters, so we need 
> to enhance {{AMRMProxy}} to support work-preserving restart.



