[jira] [Commented] (YARN-2993) Several fixes (missing acl check, error log msg ...) and some refinement in AdminService
[ https://issues.apache.org/jira/browse/YARN-2993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14258990#comment-14258990 ]

Junping Du commented on YARN-2993:
----------------------------------

The test failure and findbugs warning should be unrelated. +1. The patch looks good to me. Will commit it soon.

Several fixes (missing acl check, error log msg ...) and some refinement in AdminService

Key: YARN-2993
URL: https://issues.apache.org/jira/browse/YARN-2993
Project: Hadoop YARN
Issue Type: Bug
Components: resourcemanager
Reporter: Yi Liu
Assignee: Yi Liu
Attachments: YARN-2993.001.patch

This JIRA is to resolve the following issues in {{org.apache.hadoop.yarn.server.resourcemanager.AdminService}}:

*1.* There is no ACL check for {{refreshServiceAcls}}.

*2.* The log message in {{refreshAdminAcls}} is incorrect: it should be "... Can not refresh Admin ACLs." instead of "... Can not refresh user-groups.".

*3.* Some unnecessary header imports.

*4.*
{code}
if (!isRMActive()) {
  RMAuditLogger.logFailure(user.getShortUserName(), argName,
      adminAcl.toString(), "AdminService",
      "ResourceManager is not active. Can not remove labels.");
  throwStandbyException();
}
{code}
This block is repeated in lots of methods; only the message differs. We should refactor it into one common method.

*5.*
{code}
LOG.info("Exception remove labels", ioe);
RMAuditLogger.logFailure(user.getShortUserName(), argName,
    adminAcl.toString(), "AdminService", "Exception remove label");
throw RPCUtil.getRemoteException(ioe);
{code}
This block is also repeated in lots of methods; only the message differs. We should refactor it into one common method.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
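The refactoring proposed in item 4 above (hoisting the repeated "RM not active" audit-and-throw block into one shared helper) can be sketched as follows. This is a minimal, self-contained sketch using stub stand-ins: `AdminServiceSketch`, `checkRMStatus`, `logFailure`, and the `StandbyException` class here are illustrative names, not the actual YARN classes.

```java
// Sketch of item 4: one common helper replaces the per-method
// "RM not active" audit-and-throw block. All types are stand-in stubs.
public class AdminServiceSketch {

    // Stand-in for the real StandbyException thrown by throwStandbyException().
    static class StandbyException extends Exception {
        StandbyException(String msg) { super(msg); }
    }

    private boolean rmActive = false; // pretend this RM is in standby

    // Stand-in for RMAuditLogger.logFailure.
    private void logFailure(String user, String op, String msg) {
        System.out.println("AUDIT FAILURE: " + user + " " + op + " " + msg);
    }

    // The proposed common method: every mutating admin RPC calls this first,
    // passing only the per-operation part of the message.
    private void checkRMStatus(String user, String op, String msg)
            throws StandbyException {
        if (!rmActive) {
            logFailure(user, op, "ResourceManager is not active. " + msg);
            throw new StandbyException("ResourceManager is not active.");
        }
    }

    public void removeLabels(String user) throws StandbyException {
        checkRMStatus(user, "removeFromClusterNodeLabels", "Can not remove labels.");
        // ... the actual label removal would go here ...
    }

    public static void main(String[] args) {
        AdminServiceSketch svc = new AdminServiceSketch();
        try {
            svc.removeLabels("alice");
        } catch (StandbyException e) {
            System.out.println("caught: " + e.getMessage());
        }
    }
}
```

With this shape, each RPC method shrinks to one `checkRMStatus` call, and the audit message stays consistent across operations.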
[jira] [Updated] (YARN-2993) Several fixes (missing acl check, error log msg ...) and some refinement in AdminService
[ https://issues.apache.org/jira/browse/YARN-2993?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Junping Du updated YARN-2993:
-----------------------------
    Hadoop Flags: Reviewed
[jira] [Commented] (YARN-2993) Several fixes (missing acl check, error log msg ...) and some refinement in AdminService
[ https://issues.apache.org/jira/browse/YARN-2993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14259105#comment-14259105 ]

Hudson commented on YARN-2993:
------------------------------

FAILURE: Integrated in Hadoop-trunk-Commit #6791 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/6791/])
YARN-2993. Several fixes (missing acl check, error log msg ...) and some refinement in AdminService. (Contributed by Yi Liu) (junping_du: rev 40ee4bff65b2bfdabfd16ee7d9be3382a0476565)
* hadoop-yarn-project/CHANGES.txt
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/AdminService.java
[jira] [Commented] (YARN-2958) RMStateStore seems to unnecessarily and wronly store sequence number separately
[ https://issues.apache.org/jira/browse/YARN-2958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14259191#comment-14259191 ]

Jian He commented on YARN-2958:
-------------------------------

[~varun_saxena], thanks for working on this! I think we can remove the latestSequenceNumber arg from {{RMStateStore#updateRMDelegationTokenAndSequenceNumber}} and {{RMStateStore#updateRMDelegationTokenAndSequenceNumberInternal}}, and also fix all underlying stores to not update the seq number. And rename the method to updateRMDelegationToken only.

RMStateStore seems to unnecessarily and wronly store sequence number separately

Key: YARN-2958
URL: https://issues.apache.org/jira/browse/YARN-2958
Project: Hadoop YARN
Issue Type: Bug
Components: resourcemanager
Reporter: Zhijie Shen
Assignee: Varun Saxena
Priority: Blocker
Attachments: YARN-2958.001.patch, YARN-2958.002.patch

It seems that RMStateStore updates the last sequence number when storing or updating each individual DT, in order to recover the latest sequence number when the RM restarts.

First, the current logic seems to be problematic:
{code}
public synchronized void updateRMDelegationTokenAndSequenceNumber(
    RMDelegationTokenIdentifier rmDTIdentifier, Long renewDate,
    int latestSequenceNumber) {
  if (isFencedState()) {
    LOG.info("State store is in Fenced state. Can't update RM Delegation Token.");
    return;
  }
  try {
    updateRMDelegationTokenAndSequenceNumberInternal(rmDTIdentifier,
        renewDate, latestSequenceNumber);
  } catch (Exception e) {
    notifyStoreOperationFailed(e);
  }
}
{code}
{code}
@Override
protected void updateStoredToken(RMDelegationTokenIdentifier id,
    long renewDate) {
  try {
    LOG.info("updating RMDelegation token with sequence number: "
        + id.getSequenceNumber());
    rmContext.getStateStore().updateRMDelegationTokenAndSequenceNumber(id,
        renewDate, id.getSequenceNumber());
  } catch (Exception e) {
    LOG.error("Error in updating persisted RMDelegationToken with sequence number: "
        + id.getSequenceNumber());
    ExitUtil.terminate(1, e);
  }
}
{code}
According to the code above, even when renewing a DT, the last sequence number is updated in the store, which is wrong. For example, take the following sequence:
1. Get DT 1 (seq = 1)
2. Get DT 2 (seq = 2)
3. Renew DT 1 (seq = 1)
4. Restart RM
The stored, and then recovered, last sequence number is 1. As a result, the next DT created after the RM restart will conflict with DT 2 on sequence number.

Second, the aforementioned bug doesn't actually happen, because the recovered last sequence number gets overwritten by the correct one:
{code}
public void recover(RMState rmState) throws Exception {
  LOG.info("recovering RMDelegationTokenSecretManager.");
  // recover RMDTMasterKeys
  for (DelegationKey dtKey : rmState.getRMDTSecretManagerState()
      .getMasterKeyState()) {
    addKey(dtKey);
  }
  // recover RMDelegationTokens
  Map<RMDelegationTokenIdentifier, Long> rmDelegationTokens =
      rmState.getRMDTSecretManagerState().getTokenState();
  this.delegationTokenSequenceNumber =
      rmState.getRMDTSecretManagerState().getDTSequenceNumber();
  for (Map.Entry<RMDelegationTokenIdentifier, Long> entry : rmDelegationTokens
      .entrySet()) {
    addPersistedDelegationToken(entry.getKey(), entry.getValue());
  }
}
{code}
The code above recovers delegationTokenSequenceNumber by reading the last sequence number in the store, which could be wrong. Fortunately, addPersistedDelegationToken then updates it to the right number:
{code}
if (identifier.getSequenceNumber() > getDelegationTokenSeqNum()) {
  setDelegationTokenSeqNum(identifier.getSequenceNumber());
}
{code}
All the stored identifiers are gone through, and delegationTokenSequenceNumber is set to the largest sequence number among them. Therefore, a new DT will always be assigned a sequence number larger than that of all the recovered DTs. To sum up, two negatives make a positive, but it's good to fix the issue. Please let me know if I've missed something here.
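The "two negatives make a positive" recovery behavior described above can be demonstrated with a tiny self-contained sketch. The class and fields here are stand-ins for the secret manager, not the actual YARN code: the stale stored counter is loaded first, then the per-token loop raises it to the largest recovered identifier.

```java
import java.util.Arrays;

// Stand-in for the secret manager's sequence-number recovery logic.
public class SeqRecoverySketch {

    private int delegationTokenSequenceNumber;

    // Mirrors the quoted guard: only ever raise the counter, never lower it.
    void addPersistedDelegationToken(int idSequenceNumber) {
        if (idSequenceNumber > delegationTokenSequenceNumber) {
            delegationTokenSequenceNumber = idSequenceNumber;
        }
    }

    public static void main(String[] args) {
        SeqRecoverySketch mgr = new SeqRecoverySketch();
        // Scenario from the description: DT 1 (seq 1), DT 2 (seq 2), then
        // renewing DT 1 wrongly stored 1 as the "last" sequence number.
        mgr.delegationTokenSequenceNumber = 1; // stale recovered value
        // recover() then walks every stored identifier:
        for (int seq : Arrays.asList(1, 2)) {
            mgr.addPersistedDelegationToken(seq);
        }
        // The counter ends at the true maximum, so the next DT gets seq 3,
        // not a conflicting seq 2.
        System.out.println("next seq = " + (mgr.delegationTokenSequenceNumber + 1));
    }
}
```

This is why the stale stored counter never causes a conflict in practice, even though storing it separately is redundant.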
[jira] [Created] (YARN-2994) Document work-preserving RM restart
Jian He created YARN-2994:
--------------------------
    Summary: Document work-preserving RM restart
    Key: YARN-2994
    URL: https://issues.apache.org/jira/browse/YARN-2994
    Project: Hadoop YARN
    Issue Type: Sub-task
    Reporter: Jian He
    Assignee: Jian He
[jira] [Updated] (YARN-2994) Document work-preserving RM restart
[ https://issues.apache.org/jira/browse/YARN-2994?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jian He updated YARN-2994:
--------------------------
    Attachment: YARN-2994.1.patch

Updated the doc to include work-preserving RM restart
[jira] [Commented] (YARN-2994) Document work-preserving RM restart
[ https://issues.apache.org/jira/browse/YARN-2994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14259213#comment-14259213 ]

Hadoop QA commented on YARN-2994:
---------------------------------

{color:red}-1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12689195/YARN-2994.1.patch
against trunk revision 40ee4bf.

{color:green}+1 @author{color}. The patch does not contain any @author tags.

{color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.

{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.

{color:green}+1 javadoc{color}. There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.

{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.

{color:green}+1 core tests{color}. The patch passed unit tests in .

Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6194//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6194//console

This message is automatically generated.
[jira] [Commented] (YARN-2992) ZKRMStateStore crashes due to session expiry
[ https://issues.apache.org/jira/browse/YARN-2992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14259223#comment-14259223 ]

Jian He commented on YARN-2992:
-------------------------------

lgtm.
bq. I have observed that RM exits while starting if ZK is not available
I think we have retry built in for this scenario?

ZKRMStateStore crashes due to session expiry

Key: YARN-2992
URL: https://issues.apache.org/jira/browse/YARN-2992
Project: Hadoop YARN
Issue Type: Bug
Components: resourcemanager
Affects Versions: 2.6.0
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
Priority: Blocker
Attachments: yarn-2992-1.patch

We recently saw the RM crash with the following stacktrace. On session expiry, we should gracefully transition to standby.
{noformat}
2014-12-18 06:28:42,689 FATAL org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Received a org.apache.hadoop.yarn.server.resourcemanager.RMFatalEvent of type STATE_STORE_OP_FAILED. Cause:
org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired
        at org.apache.zookeeper.KeeperException.create(KeeperException.java:127)
        at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:931)
        at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:911)
        at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:930)
        at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:927)
        at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithCheck(ZKRMStateStore.java:1069)
        at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithRetries(ZKRMStateStore.java:1088)
        at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doMultiWithRetries(ZKRMStateStore.java:927)
        at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doMultiWithRetries(ZKRMStateStore.java:941)
        at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.setDataWithRetries(ZKRMStateStore.java:958)
        at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.updateApplicationAttemptStateInternal(ZKRMStateStore.java:687)
{noformat}
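The proposed behavior (transition to standby on session expiry instead of raising a fatal event) can be sketched as follows. This is a hypothetical, self-contained sketch: the `HAState` enum, the `SessionExpiredException` class, and the method names here are stand-ins, not the actual YARN or ZooKeeper classes.

```java
// Hypothetical sketch: on a ZooKeeper session expiry, go standby gracefully
// instead of firing RMFatalEvent and crashing. All types are stand-ins.
public class SessionExpirySketch {

    enum HAState { ACTIVE, STANDBY }

    static class SessionExpiredException extends Exception {}

    private HAState state = HAState.ACTIVE;

    // Stand-in for a ZKRMStateStore write that hits an expired session.
    private void setDataWithRetries() throws SessionExpiredException {
        throw new SessionExpiredException();
    }

    void updateApplicationAttemptState() {
        try {
            setDataWithRetries();
        } catch (SessionExpiredException e) {
            // Proposed behavior: no fatal event, no process exit. Transition
            // to standby; a leader elector could re-acquire leadership once a
            // new ZK session is established.
            state = HAState.STANDBY;
        }
    }

    public static void main(String[] args) {
        SessionExpirySketch rm = new SessionExpirySketch();
        rm.updateApplicationAttemptState();
        System.out.println("state after expiry = " + rm.state);
    }
}
```

The key design point is that session expiry is a recoverable condition for an HA RM: losing the ZK session means losing leadership, not correctness, so standby is the graceful response.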
[jira] [Commented] (YARN-2992) ZKRMStateStore crashes due to session expiry
[ https://issues.apache.org/jira/browse/YARN-2992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14259243#comment-14259243 ]

Jian He commented on YARN-2992:
-------------------------------

One question: do we need to create a new zkClient object by calling createConnection, or is it OK to re-use the old one?
[jira] [Commented] (YARN-2936) YARNDelegationTokenIdentifier doesn't set proto.builder now
[ https://issues.apache.org/jira/browse/YARN-2936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14259245#comment-14259245 ]

Jian He commented on YARN-2936:
-------------------------------

[~varun_saxena], thanks for taking this on! Maybe a simple way is to do this:
{code}
public YARNDelegationTokenIdentifierProto getProto() {
  builder.setOwner(getOwner().toString());
  builder.setRenewer(getRenewer().toString());
  builder.setRealUser(getRealUser().toString());
  builder.setIssueDate(getIssueDate());
  builder.setMaxDate(getMaxDate());
  builder.setSequenceNumber(getSequenceNumber());
  builder.setMasterKeyId(getMasterKeyId());
  return builder.build();
}
{code}
and create a common method for these setters.

YARNDelegationTokenIdentifier doesn't set proto.builder now

Key: YARN-2936
URL: https://issues.apache.org/jira/browse/YARN-2936
Project: Hadoop YARN
Issue Type: Bug
Reporter: Zhijie Shen
Assignee: Varun Saxena
Attachments: YARN-2936.001.patch

After YARN-2743, the setters were removed from YARNDelegationTokenIdentifier, such that when constructing an object which extends YARNDelegationTokenIdentifier, proto.builder is not set at all. Later on, when we call getProto() on it, we will just get an empty proto object. It seems to do no harm on the production code path, as we always call getBytes() before using the proto to persist the DT in the state store, when generating the password. I think the setters were removed to avoid setting the fields twice when getBytes() is called. However, YARNDelegationTokenIdentifier doesn't work properly alone: it is tightly coupled with the logic in the secretManager, and is vulnerable if something changes in the secretManager. For example, in the test case of YARN-2837, I spent time figuring out that we need to execute getBytes() first to make sure the testing DTs can be properly put into the state store.
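The suggestion above, copying every field into the proto builder once inside getProto() via a common method, can be sketched in a self-contained form. The `Builder` class and `LazyProtoSketch` here are stand-ins for the generated protobuf builder and the identifier, not the real classes.

```java
// Sketch of the suggested fix: with the setters gone, sync the identifier's
// fields into the builder lazily, inside getProto(). Stand-in types only.
public class LazyProtoSketch {

    // Stand-in for YARNDelegationTokenIdentifierProto.Builder.
    static class Builder {
        private String owner;
        private int sequenceNumber;
        Builder setOwner(String o) { owner = o; return this; }
        Builder setSequenceNumber(int n) { sequenceNumber = n; return this; }
        String build() { return owner + "#" + sequenceNumber; }
    }

    private final Builder builder = new Builder();

    // Fields normally held by the identifier itself (fixed here for the demo).
    private String getOwner() { return "alice"; }
    private int getSequenceNumber() { return 7; }

    // The proposed common method: copy every field right before building,
    // so getProto() never returns an empty proto.
    private void syncFieldsToBuilder() {
        builder.setOwner(getOwner()).setSequenceNumber(getSequenceNumber());
    }

    public String getProto() {
        syncFieldsToBuilder();
        return builder.build();
    }

    public static void main(String[] args) {
        System.out.println(new LazyProtoSketch().getProto());
    }
}
```

Because the sync happens at build time, the fields are written exactly once per getProto() call, avoiding the double-set problem that motivated removing the setters in YARN-2743.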
[jira] [Updated] (YARN-2958) RMStateStore seems to unnecessarily and wronly store sequence number separately
[ https://issues.apache.org/jira/browse/YARN-2958?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Varun Saxena updated YARN-2958:
-------------------------------
    Attachment: YARN-2958.003.patch
[jira] [Updated] (YARN-2958) RMStateStore seems to unnecessarily and wronly store sequence number separately
[ https://issues.apache.org/jira/browse/YARN-2958?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Varun Saxena updated YARN-2958:
-------------------------------
    Attachment: (was: YARN-2958.003.patch)
[jira] [Commented] (YARN-2958) RMStateStore seems to unnecessarily and wronly store sequence number separately
[ https://issues.apache.org/jira/browse/YARN-2958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14259266#comment-14259266 ]

Varun Saxena commented on YARN-2958:
------------------------------------

Thanks [~jianhe] for the review. I think latestSequenceNumber is not required even in the storeRMDT operation. Will change and upload a new patch.
[jira] [Updated] (YARN-2958) RMStateStore seems to unnecessarily and wronly store sequence number separately
[ https://issues.apache.org/jira/browse/YARN-2958?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Saxena updated YARN-2958: ---
Attachment: YARN-2958.003.patch

RMStateStore seems to unnecessarily and wrongly store sequence number separately
---
Key: YARN-2958
URL: https://issues.apache.org/jira/browse/YARN-2958
Project: Hadoop YARN
Issue Type: Bug
Components: resourcemanager
Reporter: Zhijie Shen
Assignee: Varun Saxena
Priority: Blocker
Attachments: YARN-2958.001.patch, YARN-2958.002.patch, YARN-2958.003.patch

It seems that RMStateStore updates the last sequence number when storing or updating each individual DT, so that the latest sequence number can be recovered when the RM restarts. First, the current logic seems to be problematic:

{code}
public synchronized void updateRMDelegationTokenAndSequenceNumber(
    RMDelegationTokenIdentifier rmDTIdentifier, Long renewDate,
    int latestSequenceNumber) {
  if (isFencedState()) {
    LOG.info("State store is in Fenced state. Can't update RM Delegation Token.");
    return;
  }
  try {
    updateRMDelegationTokenAndSequenceNumberInternal(rmDTIdentifier,
        renewDate, latestSequenceNumber);
  } catch (Exception e) {
    notifyStoreOperationFailed(e);
  }
}
{code}

{code}
@Override
protected void updateStoredToken(RMDelegationTokenIdentifier id, long renewDate) {
  try {
    LOG.info("updating RMDelegation token with sequence number: "
        + id.getSequenceNumber());
    rmContext.getStateStore().updateRMDelegationTokenAndSequenceNumber(id,
        renewDate, id.getSequenceNumber());
  } catch (Exception e) {
    LOG.error("Error in updating persisted RMDelegationToken with sequence number: "
        + id.getSequenceNumber());
    ExitUtil.terminate(1, e);
  }
}
{code}

According to the code above, the last sequence number is updated in the store even when merely renewing a DT, which is wrong. For example, consider the following sequence:
1. Get DT 1 (seq = 1)
2. Get DT 2 (seq = 2)
3. Renew DT 1 (seq = 1)
4. Restart RM
The stored, and then recovered, last sequence number is 1, so the next DT created after the RM restarts will conflict with DT 2 on sequence number.

Second, the aforementioned bug doesn't actually happen, because the recovered last sequence number is overwritten by the correct one:

{code}
public void recover(RMState rmState) throws Exception {
  LOG.info("recovering RMDelegationTokenSecretManager.");
  // recover RMDTMasterKeys
  for (DelegationKey dtKey : rmState.getRMDTSecretManagerState()
      .getMasterKeyState()) {
    addKey(dtKey);
  }
  // recover RMDelegationTokens
  Map<RMDelegationTokenIdentifier, Long> rmDelegationTokens =
      rmState.getRMDTSecretManagerState().getTokenState();
  this.delegationTokenSequenceNumber =
      rmState.getRMDTSecretManagerState().getDTSequenceNumber();
  for (Map.Entry<RMDelegationTokenIdentifier, Long> entry : rmDelegationTokens
      .entrySet()) {
    addPersistedDelegationToken(entry.getKey(), entry.getValue());
  }
}
{code}

The code above recovers delegationTokenSequenceNumber by reading the last sequence number in the store, which could be wrong. Fortunately, recovering each persisted token then updates it to the right number:

{code}
if (identifier.getSequenceNumber() > getDelegationTokenSeqNum()) {
  setDelegationTokenSeqNum(identifier.getSequenceNumber());
}
{code}

All the stored identifiers are gone through, and delegationTokenSequenceNumber ends up as the largest sequence number among them. Therefore, a new DT will always be assigned a sequence number larger than that of every recovered DT. To sum up, two negatives make a positive, but it's still good to fix the issue. Please let me know if I've missed something here.
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
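The interplay described in the report, a possibly stale stored sequence number that recovery then corrects by scanning every persisted token and keeping the maximum, can be modeled in a few lines. This is an illustrative standalone sketch with hypothetical names (SeqRecoverySketch, recoverSeqNum), not the actual RMDelegationTokenSecretManager code:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Minimal model of the recovery behavior discussed above: the stored
// "last sequence number" may be stale (e.g. 1 after renewing DT 1), but
// scanning every recovered token and keeping the maximum restores the
// correct counter, so the stored value is effectively redundant.
public class SeqRecoverySketch {
    public static int recoverSeqNum(int storedLastSeq, Map<Integer, Long> tokensBySeq) {
        int seq = storedLastSeq;          // possibly stale starting point
        for (int tokenSeq : tokensBySeq.keySet()) {
            if (tokenSeq > seq) {         // same check as the quoted snippet
                seq = tokenSeq;
            }
        }
        return seq;
    }

    public static void main(String[] args) {
        // Scenario from the description: DT 1 (seq=1), DT 2 (seq=2),
        // renewing DT 1 stores last seq = 1, then the RM restarts.
        Map<Integer, Long> tokens = new LinkedHashMap<>();
        tokens.put(1, 1000L);  // DT 1 -> renew date
        tokens.put(2, 2000L);  // DT 2 -> renew date
        // Recovery corrects the stale 1 to 2, so the next DT gets seq 3.
        System.out.println(recoverSeqNum(1, tokens));
    }
}
```

This is why the thread concludes "two negatives make a positive": the per-token maximum pass masks the stale stored value, which is also why the separately persisted number can be dropped.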
[jira] [Commented] (YARN-2958) RMStateStore seems to unnecessarily and wrongly store sequence number separately
[ https://issues.apache.org/jira/browse/YARN-2958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14259269#comment-14259269 ] Jian He commented on YARN-2958: ---
Thanks [~varun_saxena],
bq. latestSequenceNumber is not required even in storeRMDT operation
I think it is required, as we want to persist the seq number separately.
[jira] [Commented] (YARN-2958) RMStateStore seems to unnecessarily and wrongly store sequence number separately
[ https://issues.apache.org/jira/browse/YARN-2958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14259272#comment-14259272 ] Varun Saxena commented on YARN-2958:
[~jianhe], I mean we can take it from RMDelegationTokenIdentifier#getSequenceNumber and remove the latestSequenceNumber parameter from RMStateStore#storeRMDelegationTokenAndSequenceNumber. We can persist the sequence number by taking it from RMDelegationTokenIdentifier.
[jira] [Commented] (YARN-2958) RMStateStore seems to unnecessarily and wrongly store sequence number separately
[ https://issues.apache.org/jira/browse/YARN-2958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14259273#comment-14259273 ] Jian He commented on YARN-2958: ---
I see, makes sense, thanks for your explanation!
[jira] [Commented] (YARN-2992) ZKRMStateStore crashes due to session expiry
[ https://issues.apache.org/jira/browse/YARN-2992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14259281#comment-14259281 ] Karthik Kambatla commented on YARN-2992:
bq. one question: do we need to create a new zkClient object by calling createConnection, or is it OK to re-use the old one?
I thought about this while working on the patch. We probably don't need the call to createConnection, as the watcher would likely fire before the next retry or the one after. However, given the frequency of session expiries and lost connections, I felt it should be okay to explicitly createConnection. I don't think it will add significant overhead or lead to inaccuracies.

ZKRMStateStore crashes due to session expiry
---
Key: YARN-2992
URL: https://issues.apache.org/jira/browse/YARN-2992
Project: Hadoop YARN
Issue Type: Bug
Components: resourcemanager
Affects Versions: 2.6.0
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
Priority: Blocker
Attachments: yarn-2992-1.patch

We recently saw the RM crash with the following stack trace. On session expiry, we should gracefully transition to standby.

{noformat}
2014-12-18 06:28:42,689 FATAL org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Received a org.apache.hadoop.yarn.server.resourcemanager.RMFatalEvent of type STATE_STORE_OP_FAILED. Cause:
org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired
	at org.apache.zookeeper.KeeperException.create(KeeperException.java:127)
	at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:931)
	at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:911)
	at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:930)
	at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:927)
	at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithCheck(ZKRMStateStore.java:1069)
	at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithRetries(ZKRMStateStore.java:1088)
	at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doMultiWithRetries(ZKRMStateStore.java:927)
	at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doMultiWithRetries(ZKRMStateStore.java:941)
	at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.setDataWithRetries(ZKRMStateStore.java:958)
	at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.updateApplicationAttemptStateInternal(ZKRMStateStore.java:687)
{noformat}
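The fix direction discussed in this thread, stepping down to standby on session expiry rather than surfacing a fatal state-store event, can be sketched abstractly. All types below (ZkActionSketch, ZkOp, Outcome, SessionExpired) are hypothetical stand-ins for illustration, not ZKRMStateStore's real API:

```java
// Sketch of the intended behavior: a ZooKeeper operation that hits session
// expiry should result in a graceful transition to standby, not an RM crash.
public class ZkActionSketch {
    // Stand-in for org.apache.zookeeper.KeeperException.SessionExpiredException.
    public static class SessionExpired extends Exception {}

    // Stand-in for a retriable ZK state-store operation.
    public interface ZkOp<T> { T run() throws SessionExpired; }

    public enum Outcome { SUCCEEDED, TRANSITION_TO_STANDBY }

    public static <T> Outcome runWithExpiryHandling(ZkOp<T> op) {
        try {
            op.run();
            return Outcome.SUCCEEDED;
        } catch (SessionExpired e) {
            // Previously this bubbled up as a fatal STATE_STORE_OP_FAILED
            // event and terminated the RM; the fix steps down instead.
            return Outcome.TRANSITION_TO_STANDBY;
        }
    }

    public static void main(String[] args) {
        Outcome o = runWithExpiryHandling(() -> { throw new SessionExpired(); });
        System.out.println(o);  // the expired session triggers a standby transition
    }
}
```

Whether the standby transition also recreates the ZK client (the createConnection question above) is orthogonal to this control flow; the sketch only shows the crash-versus-step-down decision.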
[jira] [Commented] (YARN-2992) ZKRMStateStore crashes due to session expiry
[ https://issues.apache.org/jira/browse/YARN-2992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14259283#comment-14259283 ] Jian He commented on YARN-2992: ---
Sounds good. Committing.
[jira] [Commented] (YARN-2992) ZKRMStateStore crashes due to session expiry
[ https://issues.apache.org/jira/browse/YARN-2992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14259286#comment-14259286 ] Hudson commented on YARN-2992:
FAILURE: Integrated in Hadoop-trunk-Commit #6792 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/6792/])
YARN-2992. ZKRMStateStore crashes due to session expiry. Contributed by Karthik Kambatla (jianhe: rev 1454efe5d4fe4214ec5ef9142d55dbeca7dab953)
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/ZKRMStateStore.java
* hadoop-yarn-project/CHANGES.txt
Fix For: 2.7.0
[jira] [Updated] (YARN-2987) ClientRMService#getQueueInfo doesn't check app ACLs
[ https://issues.apache.org/jira/browse/YARN-2987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Saxena updated YARN-2987: ---
Attachment: YARN-2987.001.patch

ClientRMService#getQueueInfo doesn't check app ACLs
---
Key: YARN-2987
URL: https://issues.apache.org/jira/browse/YARN-2987
Project: Hadoop YARN
Issue Type: Bug
Reporter: Jian He
Assignee: Varun Saxena
Attachments: YARN-2987.001.patch

ClientRMService#getQueueInfo can return a list of applications belonging to the queue, but doesn't actually check whether the user has permission to view those applications.
[jira] [Updated] (YARN-2987) ClientRMService#getQueueInfo doesn't check app ACLs
[ https://issues.apache.org/jira/browse/YARN-2987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Saxena updated YARN-2987: ---
Attachment: (was: YARN-2987.001.patch)
[jira] [Updated] (YARN-2987) ClientRMService#getQueueInfo doesn't check app ACLs
[ https://issues.apache.org/jira/browse/YARN-2987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Saxena updated YARN-2987: ---
Attachment: YARN-2987.001.patch
[jira] [Commented] (YARN-2958) RMStateStore seems to unnecessarily and wrongly store sequence number separately
[ https://issues.apache.org/jira/browse/YARN-2958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14259288#comment-14259288 ] Hadoop QA commented on YARN-2958:
{color:red}-1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12689205/YARN-2958.003.patch
against trunk revision 40ee4bf.
{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. There were no new javadoc warning messages.
{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.
{color:red}-1 findbugs{color}. The patch appears to introduce 15 new Findbugs (version 2.0.3) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.
Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6195//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/6195//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6195//console
This message is automatically generated.
[jira] [Commented] (YARN-2958) RMStateStore seems to unnecessarily and wrongly store sequence number separately
[ https://issues.apache.org/jira/browse/YARN-2958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14259289#comment-14259289 ] Hadoop QA commented on YARN-2958: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12689206/YARN-2958.003.patch against trunk revision 40ee4bf. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 15 new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6196//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/6196//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6196//console This message is automatically generated. 
RMStateStore seems to unnecessarily and wrongly store sequence number separately
--------------------------------------------------------------------------------

Key: YARN-2958
URL: https://issues.apache.org/jira/browse/YARN-2958
Project: Hadoop YARN
Issue Type: Bug
Components: resourcemanager
Reporter: Zhijie Shen
Assignee: Varun Saxena
Priority: Blocker
Attachments: YARN-2958.001.patch, YARN-2958.002.patch, YARN-2958.003.patch

It seems that RMStateStore updates the last sequence number when storing or updating each individual DT, so that the latest sequence number can be recovered when the RM restarts.

First, the current logic seems to be problematic:

{code}
public synchronized void updateRMDelegationTokenAndSequenceNumber(
    RMDelegationTokenIdentifier rmDTIdentifier, Long renewDate,
    int latestSequenceNumber) {
  if (isFencedState()) {
    LOG.info("State store is in Fenced state. Can't update RM Delegation Token.");
    return;
  }
  try {
    updateRMDelegationTokenAndSequenceNumberInternal(rmDTIdentifier,
        renewDate, latestSequenceNumber);
  } catch (Exception e) {
    notifyStoreOperationFailed(e);
  }
}
{code}

{code}
@Override
protected void updateStoredToken(RMDelegationTokenIdentifier id,
    long renewDate) {
  try {
    LOG.info("updating RMDelegation token with sequence number: "
        + id.getSequenceNumber());
    rmContext.getStateStore().updateRMDelegationTokenAndSequenceNumber(id,
        renewDate, id.getSequenceNumber());
  } catch (Exception e) {
    LOG.error("Error in updating persisted RMDelegationToken with sequence number: "
        + id.getSequenceNumber());
    ExitUtil.terminate(1, e);
  }
}
{code}

According to the code above, even when renewing a DT the last sequence number is updated in the store, which is wrong. For example, consider the following sequence:

1. Get DT 1 (seq = 1)
2. Get DT 2 (seq = 2)
3. Renew DT 1 (seq = 1)
4. Restart RM

The stored, and then recovered, last sequence number is 1, so the next DT created after the RM restarts will conflict with DT 2 on its sequence number.
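The four-step failure sequence above can be sketched with a toy model. Note this is a minimal illustration with hypothetical class and field names, not the actual RMStateStore API:

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of the reported behavior: the store persists the "last"
// sequence number on every store *and* update call, so renewing an
// old token overwrites it with a stale value.
public class SeqDemo {
    static class ToyStore {
        Map<Integer, Long> tokens = new HashMap<>(); // seq -> renewDate
        int storedLastSeq = 0;

        void storeToken(int seq, long renewDate) {
            tokens.put(seq, renewDate);
            storedLastSeq = seq; // persists the seq alongside the token
        }

        void updateToken(int seq, long renewDate) {
            tokens.put(seq, renewDate);
            storedLastSeq = seq; // BUG: a renew also overwrites the last seq
        }
    }

    public static void main(String[] args) {
        ToyStore store = new ToyStore();
        store.storeToken(1, 1000L);  // 1. Get DT 1 (seq = 1)
        store.storeToken(2, 1000L);  // 2. Get DT 2 (seq = 2)
        store.updateToken(1, 2000L); // 3. Renew DT 1 (seq = 1)

        // 4. "Restart RM": recover the last sequence number from the store.
        int recoveredSeq = store.storedLastSeq;
        System.out.println("recovered seq = " + recoveredSeq); // 1, not 2
        int nextSeq = recoveredSeq + 1;
        System.out.println("next DT would reuse seq " + nextSeq
            + "; collides with an existing token: "
            + store.tokens.containsKey(nextSeq)); // true: DT 2 already has seq 2
    }
}
```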
Second, the aforementioned bug doesn't actually surface, because the recovered last sequence number is later overwritten by the correct one:

{code}
public void recover(RMState rmState) throws Exception {
  LOG.info("recovering RMDelegationTokenSecretManager.");
  // recover RMDTMasterKeys
  for (DelegationKey dtKey : rmState.getRMDTSecretManagerState()
      .getMasterKeyState()) {
    addKey(dtKey);
  }
  // recover RMDelegationTokens
  Map<RMDelegationTokenIdentifier, Long> rmDelegationTokens =
      rmState.getRMDTSecretManagerState().getTokenState();
  this.delegationTokenSequenceNumber =
      rmState.getRMDTSecretManagerState().getDTSequenceNumber();
  for (Map.Entry<RMDelegationTokenIdentifier, Long> entry : rmDelegationTokens
      .entrySet()) {
    addPersistedDelegationToken(entry.getKey(), entry.getValue());
  }
}
{code}

The code above recovers delegationTokenSequenceNumber by reading the last sequence number in the store. It could be wrong. Fortunately,
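Since the separately stored value can be stale, one natural direction (a sketch of the idea only, not the committed YARN-2958 patch) is to derive the counter from the recovered tokens themselves, taking the maximum sequence number seen during recovery:

```java
import java.util.Arrays;
import java.util.List;

public class RecoverSeqDemo {
    // Derive the sequence counter from the recovered tokens rather than
    // trusting a separately persisted (possibly stale) value. The stored
    // value is only used as a floor.
    static int recoverMaxSeq(List<Integer> recoveredTokenSeqs, int storedSeq) {
        int max = storedSeq;
        for (int seq : recoveredTokenSeqs) {
            max = Math.max(max, seq);
        }
        return max;
    }

    public static void main(String[] args) {
        // After the renew in the example above the stored value is 1,
        // but the recovered token state still contains seq 2.
        System.out.println(recoverMaxSeq(Arrays.asList(1, 2), 1)); // prints 2
    }
}
```

This mirrors what per-token recovery effectively does when each recovered identifier bumps the in-memory counter, which is why the report notes the stale stored value gets overwritten.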
[jira] [Commented] (YARN-2987) ClientRMService#getQueueInfo doesn't check app ACLs
[ https://issues.apache.org/jira/browse/YARN-2987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14259297#comment-14259297 ]

Hadoop QA commented on YARN-2987:
---------------------------------

{color:red}-1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12689209/YARN-2987.001.patch
against trunk revision 1454efe.

{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. There were no new javadoc warning messages.
{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.
{color:red}-1 findbugs{color}. The patch appears to introduce 15 new Findbugs (version 2.0.3) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6197//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/6197//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6197//console

This message is automatically generated.

ClientRMService#getQueueInfo doesn't check app ACLs
---------------------------------------------------

Key: YARN-2987
URL: https://issues.apache.org/jira/browse/YARN-2987
Project: Hadoop YARN
Issue Type: Bug
Reporter: Jian He
Assignee: Varun Saxena
Attachments: YARN-2987.001.patch

ClientRMService#getQueueInfo can return a list of applications belonging to the queue, but doesn't actually check whether the user has permission to view those applications.
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
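The missing check described in YARN-2987 amounts to filtering the per-queue application list through a per-application view ACL. The sketch below uses hypothetical types to show the shape of such a filter; the real fix would go through YARN's actual ACL manager APIs rather than this toy interface:

```java
import java.util.ArrayList;
import java.util.List;

public class QueueInfoAclDemo {
    // Stand-in for an ACL check such as "may this user view this app?".
    interface AclChecker {
        boolean canView(String user, String appId);
    }

    // Return only the queue's applications the caller is allowed to see,
    // instead of returning the full list unchecked.
    static List<String> visibleApps(String user, List<String> queueApps,
            AclChecker acls) {
        List<String> visible = new ArrayList<>();
        for (String appId : queueApps) {
            if (acls.canView(user, appId)) {
                visible.add(appId);
            }
        }
        return visible;
    }

    public static void main(String[] args) {
        // Toy policy: a user may view apps whose id ends with their name.
        AclChecker acls = (user, appId) -> appId.endsWith(user);
        System.out.println(visibleApps("alice",
            List.of("app_1_alice", "app_2_bob"), acls)); // [app_1_alice]
    }
}
```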
[jira] [Created] (YARN-2995) Enhance UI to show cluster resource utilization of various container types
Sriram Rao created YARN-2995:
---------------------------------

Summary: Enhance UI to show cluster resource utilization of various container types
Key: YARN-2995
URL: https://issues.apache.org/jira/browse/YARN-2995
Project: Hadoop YARN
Issue Type: Task
Components: resourcemanager
Reporter: Sriram Rao

This JIRA proposes to extend the ResourceManager UI to show how cluster resources are being used to run *guaranteed start* and *queueable* containers. For example, a graph that shows, over time, the fraction of running containers that are *guaranteed start* and the fraction that are *queueable*.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2995) Enhance UI to show cluster resource utilization of various container types
[ https://issues.apache.org/jira/browse/YARN-2995?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sriram Rao updated YARN-2995:
---------------------------------
Issue Type: Sub-task (was: Task)
Parent: YARN-2877

Enhance UI to show cluster resource utilization of various container types
--------------------------------------------------------------------------

Key: YARN-2995
URL: https://issues.apache.org/jira/browse/YARN-2995
Project: Hadoop YARN
Issue Type: Sub-task
Components: resourcemanager
Reporter: Sriram Rao

This JIRA proposes to extend the ResourceManager UI to show how cluster resources are being used to run *guaranteed start* and *queueable* containers. For example, a graph that shows, over time, the fraction of running containers that are *guaranteed start* and the fraction that are *queueable*.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)