[ https://issues.apache.org/jira/browse/YARN-2958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14258876#comment-14258876 ]
Hadoop QA commented on YARN-2958: --------------------------------- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12689133/YARN-2958.002.patch against trunk revision a164ce2. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 15 new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6191//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/6191//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6191//console This message is automatically generated. > RMStateStore seems to unnecessarily and wronly store sequence number > separately > ------------------------------------------------------------------------------- > > Key: YARN-2958 > URL: https://issues.apache.org/jira/browse/YARN-2958 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager > Reporter: Zhijie Shen > Assignee: Varun Saxena > Priority: Blocker > Attachments: YARN-2958.001.patch, YARN-2958.002.patch > > > It seems that RMStateStore updates last sequence number when storing or > updating each individual DT, to recover the latest sequence number when RM > restarting. > First, the current logic seems to be problematic: > {code} > public synchronized void updateRMDelegationTokenAndSequenceNumber( > RMDelegationTokenIdentifier rmDTIdentifier, Long renewDate, > int latestSequenceNumber) { > if(isFencedState()) { > LOG.info("State store is in Fenced state. Can't update RM Delegation > Token."); > return; > } > try { > updateRMDelegationTokenAndSequenceNumberInternal(rmDTIdentifier, > renewDate, > latestSequenceNumber); > } catch (Exception e) { > notifyStoreOperationFailed(e); > } > } > {code} > {code} > @Override > protected void updateStoredToken(RMDelegationTokenIdentifier id, > long renewDate) { > try { > LOG.info("updating RMDelegation token with sequence number: " > + id.getSequenceNumber()); > rmContext.getStateStore().updateRMDelegationTokenAndSequenceNumber(id, > renewDate, id.getSequenceNumber()); > } catch (Exception e) { > LOG.error("Error in updating persisted RMDelegationToken with sequence > number: " > + id.getSequenceNumber()); > ExitUtil.terminate(1, e); > } > } > {code} > According to code above, even when renewing a DT, the last sequence number is > updated in the store, which is wrong. For example, we have the following > sequence: > 1. Get DT 1 (seq = 1) > 2. Get DT 2( seq = 2) > 3. Renew DT 1 (seq = 1) > 4. Restart RM > The stored and then recovered last sequence number is 1. It makes the next > created DT after RM restarting will conflict with DT 2 on sequence num. > Second, the aforementioned bug doesn't happen actually, because the recovered > last sequence num has been overwritten at by the correctly one. > {code} > public void recover(RMState rmState) throws Exception { > LOG.info("recovering RMDelegationTokenSecretManager."); > // recover RMDTMasterKeys > for (DelegationKey dtKey : rmState.getRMDTSecretManagerState() > .getMasterKeyState()) { > addKey(dtKey); > } > // recover RMDelegationTokens > Map<RMDelegationTokenIdentifier, Long> rmDelegationTokens = > rmState.getRMDTSecretManagerState().getTokenState(); > this.delegationTokenSequenceNumber = > rmState.getRMDTSecretManagerState().getDTSequenceNumber(); > for (Map.Entry<RMDelegationTokenIdentifier, Long> entry : > rmDelegationTokens > .entrySet()) { > addPersistedDelegationToken(entry.getKey(), entry.getValue()); > } > } > {code} > The code above recovers delegationTokenSequenceNumber by reading the last > sequence number in the store. It could be wrong. Fortunately, > delegationTokenSequenceNumber updates it to the right number. > {code} > if (identifier.getSequenceNumber() > getDelegationTokenSeqNum()) { > setDelegationTokenSeqNum(identifier.getSequenceNumber()); > } > {code} > All the stored identifiers will be gone through, and > delegationTokenSequenceNumber will be set to the largest sequence number > among these identifiers. Therefore, new DT will be assigned a sequence number > which is always larger than that of all the recovered DT. > To sum up, two negatives make a positive, but it's good to fix the issue. > Please let me know if I've missed something here. -- This message was sent by Atlassian JIRA (v6.3.4#6332)