[
https://issues.apache.org/jira/browse/YARN-5302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15363156#comment-15363156
]
Jian He commented on YARN-5302:
-------------------------------
Hi [~xinxianyin], [~Naganarasimha]
I think that if the NM restarts, it can still get the updated HDFS token from
the NM heartbeat. The problem is that 'initAppAggregator' runs before the first
node heartbeat, so the NM has not yet received the new token when it calls
initAppAggregator.
We could persist the token, either per app or per user. Persisting per user
seems more efficient because fewer store operations are needed, but it is not
easy to implement because we currently don't have a per-user concept in the NM.
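To make the store-operation trade-off concrete, here is a minimal sketch. It is
not Hadoop code: TokenPersistenceSketch, its key layout, and the string tokens
are all hypothetical, and it assumes that one renewed token can serve every app
of the same user, which is what would make per-user persistence cheaper.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a state store; not the real NMStateStore API.
class TokenPersistenceSketch {
    private final Map<String, String> store = new HashMap<>();
    // Counts store writes so the two strategies can be compared.
    int writes = 0;

    // Per-app persistence: one write for each app on every token update.
    void persistPerApp(String appId, String token) {
        put("app/" + appId, token);
    }

    // Per-user persistence: one write for each user on every token update
    // (assumes all of the user's apps can share the token).
    void persistPerUser(String user, String token) {
        put("user/" + user, token);
    }

    // On recovery, prefer an app-specific token, else fall back to the user's.
    String recoverForApp(String appId, String user) {
        String token = store.get("app/" + appId);
        return token != null ? token : store.get("user/" + user);
    }

    private void put(String key, String token) {
        store.put(key, token);
        writes++;
    }
}
```

With three apps owned by one user, a token update costs three writes per app
but a single write per user under that assumption.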
Another way is to postpone 'initAppAggregator' until the NM has received the
new token, though this also requires a fair number of changes in the recovery
flow. Which approach do you think is more doable? I'm somewhat in favor of the
second.
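A rough sketch of the second approach, to show the ordering only. All names
here (DeferredAggregatorInit, onAppRecovered, onTokenReceived) are hypothetical
and not part of the NM code; tokens are plain strings, and the private
initAppAggregator is a stand-in for the real method.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: defers aggregator init until a fresh token arrives.
class DeferredAggregatorInit {
    // Recovered apps whose aggregator init is on hold, keyed by app id.
    private final Map<String, String> pendingUserByApp = new HashMap<>();
    // Records each init as "appId:user:token", for illustration only.
    final List<String> initialized = new ArrayList<>();

    // Called during recovery instead of initializing immediately.
    synchronized void onAppRecovered(String appId, String user) {
        pendingUserByApp.put(appId, user);
    }

    // Called when a heartbeat response delivers a fresh token for the app.
    synchronized void onTokenReceived(String appId, String freshToken) {
        String user = pendingUserByApp.remove(appId);
        if (user != null) {
            // Init runs only now, so it never sees the stale stored token.
            initAppAggregator(appId, user, freshToken);
        }
    }

    synchronized boolean isPending(String appId) {
        return pendingUserByApp.containsKey(appId);
    }

    // Stand-in for the real initAppAggregator.
    private void initAppAggregator(String appId, String user, String token) {
        initialized.add(appId + ":" + user + ":" + token);
    }
}
```

The sketch leaves open what to do with apps that never receive a token (for
example, a timeout), which the real recovery flow would need to handle.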
> Yarn Application log Aggregation fails due to NM can not get correct HDFS
> delegation token II
> ----------------------------------------------------------------------------------------------
>
> Key: YARN-5302
> URL: https://issues.apache.org/jira/browse/YARN-5302
> Project: Hadoop YARN
> Issue Type: Bug
> Components: yarn
> Reporter: Xianyin Xin
> Assignee: Xianyin Xin
> Attachments: YARN-5032.001.patch, YARN-5032.002.patch,
> YARN-5302.003.patch, YARN-5302.004.patch
>
>
> Unlike YARN-5098, this happens on the NM side. When the NM recovers,
> credentials are read from the NMStateStore. When the app aggregators are
> initialized, an exception is thrown because the tokens have expired. The app
> is a long-running service.
> {code:title=LogAggregationService.java}
> protected void initAppAggregator(final ApplicationId appId, String user,
>     Credentials credentials, ContainerLogsRetentionPolicy logRetentionPolicy,
>     Map<ApplicationAccessType, String> appAcls,
>     LogAggregationContext logAggregationContext) {
>   // Get user's FileSystem credentials
>   final UserGroupInformation userUgi =
>       UserGroupInformation.createRemoteUser(user);
>   if (credentials != null) {
>     userUgi.addCredentials(credentials);
>   }
>   ...
>   try {
>     // Create the app dir
>     createAppDir(user, appId, userUgi);
>   } catch (Exception e) {
>     appLogAggregator.disableLogAggregation();
>     if (!(e instanceof YarnRuntimeException)) {
>       appDirException = new YarnRuntimeException(e);
>     } else {
>       appDirException = (YarnRuntimeException)e;
>     }
>     appLogAggregators.remove(appId);
>     closeFileSystems(userUgi);
>     throw appDirException;
>   }
> {code}