Jason Lowe updated YARN-1341:
Thanks for reviewing, Junping!
bq. The change in BaseContainerTokenSecretManager.java is not necessary and I
believe that belongs to YARN-1342.
Good catch, removed.
bq. Can we consolidate the code in a separated method together with
NMContainerTokenSecretManager as we will do similar thing to recover
ContainerToken staff which make code have duplicated things?
I'm not sure I understand what you're requesting. Recovering the NM tokens is
one line of code (3 if we count the "if canRecover" part), and recovering the
container tokens in YARN-1342 will add one more line for that (inside the same
"if canRecover" block). I went ahead and factored this into a separate method,
however I'm not sure it matches what you were expecting as I don't see where
we're saving duplicated code. If what's in the updated patch isn't what you
expected, please provide some sample pseudo-code to demonstrate how we can
avoid duplication of code.
bq. Does log error here is just enough in case of failure in store? If Master
key is updated but not persistent, then it could cause some inconsistency when
recover it. I think we should throw some exception here if store get failed and
rollback the key just set.
The problem with throwing an exception is what to do with the exception -- do
we take down the NM? That seems like a drastic answer since the NM will likely
chug along just fine without the key stored. It only becomes a problem when
the NM restarts and restores an old key. However if we rollback the old key
here then we take that only-breaks-if-we-happened-to-restart case and make it
an always-breaks scenario. Eventually the old key will no longer be valid to
the RM, and none of the AMs will be able to authenticate to the NM. Therefore
I thought it would be better to log the error, press onward, and hope we don't
restart before we store a valid key again (maybe store error was transient)
rather than either take down the NM or have things start failing even without a
> Recover NMTokens upon nodemanager restart
> Key: YARN-1341
> URL: https://issues.apache.org/jira/browse/YARN-1341
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: nodemanager
> Affects Versions: 2.3.0
> Reporter: Jason Lowe
> Assignee: Jason Lowe
> Attachments: YARN-1341.patch, YARN-1341v2.patch, YARN-1341v3.patch,
> YARN-1341v4-and-YARN-1987.patch, YARN-1341v5.patch, YARN-1341v6.patch
This message was sent by Atlassian JIRA