[jira] [Commented] (YARN-5447) Consider including allocationRequestId in NMContainerStatus to allow recovery in case of RM failover
[ https://issues.apache.org/jira/browse/YARN-5447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16239034#comment-16239034 ]

Arun Suresh commented on YARN-5447:
---

Looks like YARN-7371 is already handling this.

> Consider including allocationRequestId in NMContainerStatus to allow recovery
> in case of RM failover
>
> Key: YARN-5447
> URL: https://issues.apache.org/jira/browse/YARN-5447
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: applications, resourcemanager
> Reporter: Subru Krishnan
> Assignee: Subru Krishnan
> Priority: Major
>
> We have added a mapping of the allocated container to the original request
> through YARN-4887/YARN-4888. There is a corner case in which the mapping will
> be lost, i.e. if RM fails over before notifying the AM about newly allocated
> container(s). This JIRA tracks the changes required to include the
> allocationRequestId in NMContainerStatus to allow recovery in case of RM
> failover.

--
This message was sent by Atlassian JIRA (v6.4.14#64029)

To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-5447) Consider including allocationRequestId in NMContainerStatus to allow recovery in case of RM failover
[ https://issues.apache.org/jira/browse/YARN-5447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16239033#comment-16239033 ]

Arun Suresh commented on YARN-5447:
---

Got it. To enable this, though, we would also need to update the ContainerTokenIdentifier to include the allocationRequestId, since the NM currently has no knowledge of the allocateReqId.
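The dependency described in this comment can be illustrated with a minimal sketch. It is only a toy model, not Hadoop code: `TokenIdentifier`, `ContainerStatusReport`, and `nm_build_report` are hypothetical stand-ins for ContainerTokenIdentifier, NMContainerStatus, and the NM-side reporting logic. The sketch assumes the NM can report only what the token gives it.

```python
from dataclasses import dataclass
from typing import Optional


# Hypothetical stand-in for the container token; NOT the Hadoop class.
@dataclass(frozen=True)
class TokenIdentifier:
    container_id: str
    # None models a token that does not carry the allocation request ID.
    allocation_request_id: Optional[int] = None


# Hypothetical stand-in for the status the NM reports to a recovering RM.
@dataclass(frozen=True)
class ContainerStatusReport:
    container_id: str
    allocation_request_id: Optional[int]


def nm_build_report(token: TokenIdentifier) -> ContainerStatusReport:
    """Build the NM's report using only information found in the token."""
    return ContainerStatusReport(token.container_id, token.allocation_request_id)


# Token without the ID: the NM has nothing to report back.
report_without_id = nm_build_report(TokenIdentifier("container_01"))

# Token extended with the ID: the NM can echo it to the RM.
report_with_id = nm_build_report(
    TokenIdentifier("container_01", allocation_request_id=7)
)
```

In this model, adding a field to the status report alone would not help: the value has to reach the NM first, and the token is the channel the comment proposes.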
[jira] [Commented] (YARN-5447) Consider including allocationRequestId in NMContainerStatus to allow recovery in case of RM failover
[ https://issues.apache.org/jira/browse/YARN-5447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16227295#comment-16227295 ]

Jian He commented on YARN-5447:
---

Yeah, we ran into case 2: the AM does not persist the mapping, and the mapping was lost after the AM and RM restarted.
[jira] [Commented] (YARN-5447) Consider including allocationRequestId in NMContainerStatus to allow recovery in case of RM failover
[ https://issues.apache.org/jira/browse/YARN-5447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16221447#comment-16221447 ]

Arun Suresh commented on YARN-5447:
---

Just noticed your comment [~jianhe]. I don't think I understand your concern. Let us try to distinguish between two failover scenarios:
# If the RM fails over and the AM is still running when the new RM comes up, the AM will re-register, and in the response the RM will notify it of all its currently running containers. The RM recreates this list from the NMContainerStatus it receives from the NM heartbeats. Since the AM keeps the mapping between allocationReqId and containerType/role in memory, I am guessing we are fine.
# If the AM fails over, we will get a new app attempt, and this new app attempt will receive all of the previous attempt's running containers on registration. In this case, the mapping might be lost if the AM had not persisted it somewhere.

This JIRA was to track case 1; we can expand the scope to solving case 2.
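Case 2 above can be sketched as follows. This is a toy model, not YARN code. It assumes the new app attempt can re-derive its table of request IDs to components from its own configuration, so the only missing piece is the ID on each recovered container. The function `rebuild_component_map` and the component names are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# (container id, allocation request id or None if it was not recovered)
RecoveredContainer = Tuple[str, Optional[int]]


def rebuild_component_map(
    recovered: List[RecoveredContainer],
    request_to_component: Dict[int, str],
) -> Tuple[Dict[str, str], List[str]]:
    """Map recovered containers back to components via their request IDs.

    Returns the containers that could be assigned and the IDs of those
    that could not, because the request ID was missing or unknown.
    """
    assigned: Dict[str, str] = {}
    unknown: List[str] = []
    for container_id, request_id in recovered:
        component = (
            request_to_component.get(request_id)
            if request_id is not None
            else None
        )
        if component is None:
            unknown.append(container_id)
        else:
            assigned[container_id] = component
    return assigned, unknown


# Hypothetical table the new attempt rebuilds from its configuration.
request_to_component = {1: "web", 2: "db"}

# Recovered containers that carry the request ID can be re-assigned.
with_ids = rebuild_component_map([("c1", 1), ("c2", 2)], request_to_component)

# Without the ID, the new attempt cannot tell which component owns them.
without_ids = rebuild_component_map(
    [("c1", None), ("c2", None)], request_to_component
)
```

With the IDs present, both containers map back to their components. Without them, both end up in the unknown list, which is the loss described in case 2.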
[jira] [Commented] (YARN-5447) Consider including allocationRequestId in NMContainerStatus to allow recovery in case of RM failover
[ https://issues.apache.org/jira/browse/YARN-5447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16217855#comment-16217855 ]

Jian He commented on YARN-5447:
---

[~asuresh], in YARN service we rely on allocateRequestId as a mapping from allocateId -> component. However, this fails during recovery because the allocateRequestId is not recovered, which breaks the client logic; see YARN-7371. Do we need to reconsider this?