[ https://issues.apache.org/jira/browse/YARN-5601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15453991#comment-15453991 ]
Subru Krishnan edited comment on YARN-5601 at 9/1/16 1:43 AM:
--------------------------------------------------------------

[~jianhe], to answer your question, let me start with why we need the epoch in a federated cluster: currently only a single RM generates containerIDs (applicationID + a sequence number), but in a federated cluster multiple RMs generate them concurrently. So there will be conflicts if an application spans multiple sub-clusters. To avoid these conflicts, we use the epoch in a federated cluster, similar to how it is used to prevent conflicts in work-preserving restarts.

The idea is to set the epoch to 0 for the first sub-cluster RM, 10000 for the second sub-cluster RM, 20000 for the third sub-cluster RM, etc. This should be sufficient, as epochs are represented as a 20-bit integer, which gives us about 1M of them. With this, containerIDs will conflict only if *all* of the conditions below are satisfied:
# The RM of sub-cluster 1 is rebooted over 10000 times
# There is an app that is still running (through over 10k reboots of one of the RMs)
# The app runs across sub-cluster 1 and sub-cluster 2
# The app is still holding onto containers from sub-cluster 2 issued from the first reboot of that sub-cluster
# The containers have IDs low enough that the newly issued containers from RM1 clash

Makes sense?

> Make the RM epoch base value configurable
> -----------------------------------------
>
>                 Key: YARN-5601
>                 URL: https://issues.apache.org/jira/browse/YARN-5601
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: nodemanager, resourcemanager
>            Reporter: Subru Krishnan
>            Assignee: Subru Krishnan
>         Attachments: YARN-5601-YARN-2915-v1.patch
>
>
> Currently the epoch always starts from zero. This can cause container ids to
> conflict for an application under Federation that spans multiple RMs
> concurrently. This JIRA proposes to make the RM epoch base value configurable
> which will allow us to avoid conflicts by setting different values for each
> RM.
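
For readers who want to check the arithmetic behind the per-RM epoch bases discussed in this issue, here is a minimal sketch. It is only a toy model, not YARN code: the names `EPOCH_STRIDE`, `base_epoch`, `current_epoch` and `epochs_collide` are hypothetical, the 10000 spacing and the 20-bit width are taken from the comment as stated, and it assumes the epoch grows by one on every RM restart.

```python
# Toy model of the epoch scheme from the comment. All names and constants
# are illustrative assumptions, not YARN APIs or configuration keys.
EPOCH_STRIDE = 10_000        # spacing between sub-cluster base epochs (from the comment)
EPOCH_BITS = 20              # epoch width as stated in the comment
MAX_EPOCHS = 1 << EPOCH_BITS


def base_epoch(subcluster_index: int) -> int:
    """Base epoch for a sub-cluster, using a 0-based index (0 = first RM)."""
    base = subcluster_index * EPOCH_STRIDE
    if base >= MAX_EPOCHS:
        raise ValueError("base epoch does not fit in the epoch field")
    return base


def current_epoch(subcluster_index: int, restarts: int) -> int:
    """Epoch after `restarts` RM restarts, assuming +1 per restart."""
    return base_epoch(subcluster_index) + restarts


def epochs_collide(idx_a: int, restarts_a: int, idx_b: int, restarts_b: int) -> bool:
    """True when two sub-cluster RMs would be using the same epoch value."""
    return current_epoch(idx_a, restarts_a) == current_epoch(idx_b, restarts_b)


if __name__ == "__main__":
    # The first RM must restart 10000 times to reach the second RM's initial epoch.
    print(epochs_collide(0, 9_999, 1, 0))
    print(epochs_collide(0, 10_000, 1, 0))
```

Matching epochs are only the first of the listed conditions; a real clash would still need the same application to hold containers with overlapping sequence numbers from both RMs.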