Zhijie Shen created YARN-2958:
---------------------------------
Summary: RMStateStore seems to unnecessarily and wronly store
sequence number separately
Key: YARN-2958
URL: https://issues.apache.org/jira/browse/YARN-2958
Project: Hadoop YARN
Issue Type: Bug
Components: resourcemanager
Reporter: Zhijie Shen
It seems that RMStateStore updates last sequence number when storing or
updating each individual DT, to recover the latest sequence number when RM
restarting.
First, the current logic seems to be problematic:
{code}
public synchronized void updateRMDelegationTokenAndSequenceNumber(
RMDelegationTokenIdentifier rmDTIdentifier, Long renewDate,
int latestSequenceNumber) {
if(isFencedState()) {
LOG.info("State store is in Fenced state. Can't update RM Delegation
Token.");
return;
}
try {
updateRMDelegationTokenAndSequenceNumberInternal(rmDTIdentifier,
renewDate,
latestSequenceNumber);
} catch (Exception e) {
notifyStoreOperationFailed(e);
}
}
{code}
{code}
@Override
protected void updateStoredToken(RMDelegationTokenIdentifier id,
long renewDate) {
try {
LOG.info("updating RMDelegation token with sequence number: "
+ id.getSequenceNumber());
rmContext.getStateStore().updateRMDelegationTokenAndSequenceNumber(id,
renewDate, id.getSequenceNumber());
} catch (Exception e) {
LOG.error("Error in updating persisted RMDelegationToken with sequence
number: "
+ id.getSequenceNumber());
ExitUtil.terminate(1, e);
}
}
{code}
According to code above, even when renewing a DT, the last sequence number is
updated in the store, which is wrong. For example, we have the following
sequence:
1. Get DT 1 (seq = 1)
2. Get DT 2( seq = 2)
3. Renew DT 1 (seq = 1)
4. Restart RM
The stored and then recovered last sequence number is 1. It makes the next
created DT after RM restarting will conflict with DT 2 on sequence num.
Second, the aforementioned bug doesn't happen actually, because the recovered
last sequence num has been overwritten at by the correctly one.
{code}
public void recover(RMState rmState) throws Exception {
LOG.info("recovering RMDelegationTokenSecretManager.");
// recover RMDTMasterKeys
for (DelegationKey dtKey : rmState.getRMDTSecretManagerState()
.getMasterKeyState()) {
addKey(dtKey);
}
// recover RMDelegationTokens
Map<RMDelegationTokenIdentifier, Long> rmDelegationTokens =
rmState.getRMDTSecretManagerState().getTokenState();
this.delegationTokenSequenceNumber =
rmState.getRMDTSecretManagerState().getDTSequenceNumber();
for (Map.Entry<RMDelegationTokenIdentifier, Long> entry : rmDelegationTokens
.entrySet()) {
addPersistedDelegationToken(entry.getKey(), entry.getValue());
}
}
{code}
The code above recovers delegationTokenSequenceNumber by reading the last
sequence number in the store. It could be wrong. Fortunately,
delegationTokenSequenceNumber updates it to the right number.
{code}
if (identifier.getSequenceNumber() > getDelegationTokenSeqNum()) {
setDelegationTokenSeqNum(identifier.getSequenceNumber());
}
{code}
All the stored identifiers will be gone through, and
delegationTokenSequenceNumber will be set to the largest sequence number among
these identifiers. Therefore, new DT will be assigned a sequence number which
is always larger than that of all the recovered DT.
To sum up, two negatives make a positive, but it's good to fix the issue.
Please let me know if I've missed something here.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)