[ 
https://issues.apache.org/jira/browse/YARN-5302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15363156#comment-15363156
 ] 

Jian He commented on YARN-5302:
-------------------------------

Hi [~xinxianyin], [~Naganarasimha]
I think if NM restarts, it can still get the updated hdfs token from the NM 
heartbeat. Now the problem is that the 'initAppAggregator' happens before the 
node heartbeat. Thus, NM has not received the new token when it calls 
initAppAggregator.  
We could persist the token, either per app or per user. Persisting per user 
seems more efficient because fewer store operations are needed, but it is not 
easy to implement because the NM currently has no per-user concept.
Another way is to postpone 'initAppAggregator' until the NM has received the 
new token, though this also requires a number of changes in the recovery flow.  
Which approach do you think is more doable? I'm kinda in favor of the second.
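
To make the second option concrete, here is a minimal sketch of deferring the 
aggregator init until a heartbeat delivers a fresh token. It is not taken from 
the NM code base: the class and method names (DeferredAggregatorInit, 
onAppRecovered, onTokenReceived) are hypothetical, tokens are plain strings, 
and the real 'initAppAggregator' call is replaced by a stand-in that only 
records what would have been initialized.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: none of these names exist in the NodeManager code base.
class DeferredAggregatorInit {
    // Apps recovered from the state store that still wait for a fresh token.
    private final Set<String> pendingApps = new LinkedHashSet<>();
    // Records "appId:token" for every aggregator that was initialized.
    private final List<String> initialized = new ArrayList<>();

    // Called from the recovery flow instead of initializing right away.
    synchronized void onAppRecovered(String appId) {
        pendingApps.add(appId);
    }

    // Called when a heartbeat response delivers a renewed token for appId.
    // Returns true if this call triggered the deferred initialization.
    synchronized boolean onTokenReceived(String appId, String freshToken) {
        if (!pendingApps.remove(appId)) {
            return false; // not waiting: already initialized or never recovered
        }
        // Stand-in for the real initAppAggregator(appId, user, credentials, ...).
        initialized.add(appId + ":" + freshToken);
        return true;
    }

    synchronized int pendingCount() {
        return pendingApps.size();
    }

    synchronized List<String> initializedApps() {
        return new ArrayList<>(initialized);
    }
}
```

In this sketch the recovery flow only calls onAppRecovered, and the heartbeat 
handler calls onTokenReceived, so the stale token read from the state store is 
never used to create the app dir. A real change would also have to decide what 
happens to apps that never receive a new token.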

> Yarn Application log Aggreagation fails due to NM can not get correct HDFS 
> delegation token II
> ----------------------------------------------------------------------------------------------
>
>                 Key: YARN-5302
>                 URL: https://issues.apache.org/jira/browse/YARN-5302
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: yarn
>            Reporter: Xianyin Xin
>            Assignee: Xianyin Xin
>         Attachments: YARN-5032.001.patch, YARN-5032.002.patch, 
> YARN-5302.003.patch, YARN-5302.004.patch
>
>
> Unlike YARN-5098, this happens on the NM side. When the NM recovers, 
> credentials are read from NMStateStore. When the app aggregators are 
> initialized, an exception is thrown because the tokens have expired. The app 
> is a long-running service.
> {code:title=LogAggregationService.java}
>   protected void initAppAggregator(final ApplicationId appId, String user,
>       Credentials credentials,
>       ContainerLogsRetentionPolicy logRetentionPolicy,
>       Map<ApplicationAccessType, String> appAcls,
>       LogAggregationContext logAggregationContext) {
>     // Get user's FileSystem credentials
>     final UserGroupInformation userUgi =
>         UserGroupInformation.createRemoteUser(user);
>     if (credentials != null) {
>       userUgi.addCredentials(credentials);
>     }
>    ...
>     try {
>       // Create the app dir
>       createAppDir(user, appId, userUgi);
>     } catch (Exception e) {
>       appLogAggregator.disableLogAggregation();
>       if (!(e instanceof YarnRuntimeException)) {
>         appDirException = new YarnRuntimeException(e);
>       } else {
>         appDirException = (YarnRuntimeException)e;
>       }
>       appLogAggregators.remove(appId);
>       closeFileSystems(userUgi);
>       throw appDirException;
>     }
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
