[
https://issues.apache.org/jira/browse/YARN-503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13630619#comment-13630619
]
Daryn Sharp commented on YARN-503:
----------------------------------
bq. There's likely a race between the RenewalTask and AbortTask. [...] I think
it's possible for a schedulesTask to be executing - in which case the
abortScheduleTask() may have no affect and can result in the wrong task being
cancelled / scheduled.
It's ok if another task is executing, because it's just trying to abort any
pending task. Since there's only one possible pending task per token at any
given time, the wrong task can't be cancelled. Did I miss an edge case?
bq. ManagedApp.add - instead of adding to the app and the token here, this
composite add can be kept outside of ManagedApp/ManagedToken
Not sure I follow. Are you suggesting to move {{managedToken.add(appId)}} into
the loop in {{addApplication}}? I was trying to encapsulate the implementation
details of adding/removing the appId within ManagedApp. Is it ok to leave it
as-is?
bq. ManagedApp.expunge() - is synchronization on 'appTokens' required ?
Strictly speaking, probably not. It's a throwback to earlier implementation
that was doing trickier stuff. It was to avoid concurrent modification
exceptions while iterating, but appTokens isn't mutating in multiple threads.
And the {{remove}} is essentially guarding it too. For that matter, I don't
think {{appTokens}} needs to be a synch-ed set. I'll change it.
bq. MangedToken.expunge() - tokenApps.clear() required ?
Probably not. Seemed like good housekeeping, but I'll remove it.
bq. In the unit test - the 1 second sleep seems rather low. Instead of the
sleep, this can be changed to a timed wait on one of the fields being verified.
I don't like sleeps either. 1s is an eternity in this case because the initial
renew and cancel timer tasks fire immediately on mocked objects, so it should
run in a few ms. I assume you are suggesting using notify in a mock'ed answer
method? Multiple timers are expected to fire in some cases, so it would
probably require something like a CountdownLatch, which will get tricky to keep
swapping in a new one by re-adding mocked responses with the new latch. Let me
know if you feel it's worth it to change it.
> DelegationTokens will be renewed forever if multiple jobs share tokens and
> the first one sets JOB_CANCEL_DELEGATION_TOKEN to false
> ----------------------------------------------------------------------------------------------------------------------------------
>
> Key: YARN-503
> URL: https://issues.apache.org/jira/browse/YARN-503
> Project: Hadoop YARN
> Issue Type: Bug
> Components: resourcemanager
> Affects Versions: 0.23.3, 3.0.0, 2.0.0-alpha
> Reporter: Siddharth Seth
> Assignee: Daryn Sharp
> Attachments: YARN-503.patch, YARN-503.patch
>
>
> The first Job/App to register a token is the one which DelegationTokenRenewer
> associates with a a specific Token. An attempt to remove/cancel these shared
> tokens by subsequent jobs doesn't work - since the JobId will not match.
> As a result, Even if subsequent jobs have
> MRJobConfig.JOB_CANCEL_DELEGATION_TOKEN set to true - tokens will not be
> cancelled when those jobs complete.
> Tokens will eventually be removed from the RM / JT when the service that
> issued them considers them to have expired or via an explicit
> cancelDelegationTokens call (not implemented yet in 23).
> A side affect of this is that the same delegation token will end up being
> renewed multiple times (a separate TimerTask for each job which uses the
> token).
> DelegationTokenRenewer could maintain a reference count/list of jobIds for
> shared tokens.
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira