[ https://issues.apache.org/jira/browse/YARN-2052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14029539#comment-14029539 ]
Tsuyoshi OZAWA commented on YARN-2052:
--------------------------------------

[~jianhe], I think it's OK after the fencing operation, but one problem is that {{recover()}} is invoked before the fencing. My idea for dealing with the problem is as follows:

1. The active RM stores the current epoch value.
2. After the failover, the new active RM recovers the epoch and treats its value as {{epoch + 1}}.
3. The new active RM issues {{fence()}} on ZKRMStateStore and increments the epoch.

> ContainerId creation after work preserving restart is broken
> ------------------------------------------------------------
>
>                 Key: YARN-2052
>                 URL: https://issues.apache.org/jira/browse/YARN-2052
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: resourcemanager
>            Reporter: Tsuyoshi OZAWA
>
> Container ids are made unique by using the app identifier and appending a
> monotonically increasing sequence number to it. Since container creation is a
> high-churn activity, the RM does not store the sequence number per app, so
> after a restart it does not know what the next sequence number should be for
> new allocations.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
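The epoch steps proposed in the comment can be sketched roughly as follows. This is a minimal, hypothetical Java model, not the real ResourceManager or ZKRMStateStore code: InMemoryEpochStore stands in for the ZooKeeper-backed store, and the class names, the fenceAndStoreEpoch method and the "<appId>_e<epoch>_<seq>" id format are all invented for illustration. It assumes the epoch is combined with the (unpersisted) per-app sequence number when building container ids, so a restarted RM can reset its counters without reusing an id.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch of epoch-based fencing for container id uniqueness.
 * None of these classes exist in Hadoop; they only model the idea.
 */
public class EpochFencingSketch {

    /** Hypothetical stand-in for a persistent RM state store (e.g. ZKRMStateStore). */
    static final class InMemoryEpochStore {
        private long epoch = 0;          // last epoch persisted by an active RM
        private long fencedEpoch = -1;   // epoch of the RM currently holding the fence

        long loadEpoch() {
            return epoch;
        }

        /** Fences out older RMs and persists the new epoch in one step.
         *  Rejects any epoch that is not strictly greater, loosely modelling
         *  a conditional write in the real store (an assumption). */
        void fenceAndStoreEpoch(long newEpoch) {
            if (newEpoch <= epoch) {
                throw new IllegalStateException("stale epoch " + newEpoch);
            }
            fencedEpoch = newEpoch;
            epoch = newEpoch;
        }

        boolean isFencedOut(long rmEpoch) {
            return rmEpoch != fencedEpoch;
        }
    }

    /** Hypothetical RM that tags every container id with its epoch. */
    static final class SketchResourceManager {
        private final InMemoryEpochStore store;
        private final Map<String, Long> seqPerApp = new HashMap<>(); // not persisted
        private long epoch;

        SketchResourceManager(InMemoryEpochStore store) {
            this.store = store;
        }

        /** Step 2: recover the stored epoch and use epoch + 1.
         *  Step 3: fence the store and persist the incremented epoch. */
        void becomeActive() {
            long recovered = store.loadEpoch();
            epoch = recovered + 1;
            store.fenceAndStoreEpoch(epoch);
        }

        /** Builds an id from the app id, this RM's epoch and a per-app counter. */
        String newContainerId(String appId) {
            long seq = seqPerApp.merge(appId, 1L, Long::sum);
            return String.format("%s_e%d_%d", appId, epoch, seq);
        }

        long epoch() {
            return epoch;
        }
    }

    public static void main(String[] args) {
        InMemoryEpochStore store = new InMemoryEpochStore();

        SketchResourceManager rm1 = new SketchResourceManager(store);
        rm1.becomeActive();                                  // epoch 1
        System.out.println(rm1.newContainerId("app_1"));     // app_1_e1_1
        System.out.println(rm1.newContainerId("app_1"));     // app_1_e1_2

        // Failover: rm2 starts with empty counters, so its sequence restarts
        // at 1, but the higher epoch keeps its ids distinct from rm1's.
        SketchResourceManager rm2 = new SketchResourceManager(store);
        rm2.becomeActive();                                  // epoch 2
        System.out.println(rm2.newContainerId("app_1"));     // app_1_e2_1
        System.out.println(store.isFencedOut(rm1.epoch()));  // true
    }
}
```

The point of the sketch is that recovery reads the old epoch before fencing (the ordering problem raised above), but because the new RM only ever uses epoch + 1 and the store rejects non-increasing epochs, ids minted after the failover cannot collide with ids from the previous RM even though the per-app sequence number is not stored.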