[jira] [Comment Edited] (YARN-5302) Yarn Application log Aggreagation fails due to NM can not get correct HDFS delegation token II

2016-07-06 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15364953#comment-15364953
 ] 

Varun Saxena edited comment on YARN-5302 at 7/6/16 7:39 PM:


I think changes here will be required irrespective of YARN-5175. IMO, delaying
folder creation (done inside initAppAggregator) is a better solution, because
it also takes care of the case where the NM is shut down while updating the
token in the state store.
Maybe we can store the apps whose initialization failed due to an invalid
token somewhere (maybe in NMContext) and process them on the next heartbeat (HB).
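A minimal sketch of the deferral idea, under stated assumptions: `PendingAggregatorInits` is a hypothetical holder (NMContext is only the suggested home, not where this class lives), application IDs are plain strings rather than `ApplicationId`, and the `initializer` callback stands in for re-running `initAppAggregator` with whatever credentials the latest heartbeat delivered. None of these names exist in Hadoop.

```java
import java.util.Iterator;
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.function.Predicate;

// Hypothetical holder for apps whose log-aggregator init failed because of an
// invalid token. Not a Hadoop class; names are illustrative only.
class PendingAggregatorInits {
  // LinkedHashSet: ignores duplicate deferrals and retries in failure order.
  private final Set<String> pendingAppIds = new LinkedHashSet<>();

  // Call when initAppAggregator fails with an invalid-token error.
  synchronized void defer(String appId) {
    pendingAppIds.add(appId);
  }

  // Call on each NM heartbeat. The initializer re-attempts aggregator init for
  // one app and returns true on success; successful apps leave the queue.
  synchronized int retryOnHeartbeat(Predicate<String> initializer) {
    int succeeded = 0;
    Iterator<String> it = pendingAppIds.iterator();
    while (it.hasNext()) {
      String appId = it.next();
      if (initializer.test(appId)) {
        it.remove();
        succeeded++;
      }
    }
    return succeeded;
  }

  // Snapshot of apps still waiting for a usable token.
  synchronized Set<String> pending() {
    return new LinkedHashSet<>(pendingAppIds);
  }

  public static void main(String[] args) {
    PendingAggregatorInits pending = new PendingAggregatorInits();
    pending.defer("application_1_0001");
    pending.defer("application_1_0002");
    // Pretend only the first app got a valid token in this heartbeat.
    int ok = pending.retryOnHeartbeat(id -> id.endsWith("0001"));
    System.out.println(ok + " " + pending.pending()); // prints "1 [application_1_0002]"
  }
}
```

Apps whose retry still fails simply stay queued for the following heartbeat, so no per-app timer is needed; a fuller version would also have to drop entries for apps that finish while still pending.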



> Yarn Application log Aggreagation fails due to NM can not get correct HDFS 
> delegation token II
> --
>
> Key: YARN-5302
> URL: https://issues.apache.org/jira/browse/YARN-5302
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Reporter: Xianyin Xin
>Assignee: Xianyin Xin
> Attachments: YARN-5032.001.patch, YARN-5032.002.patch, 
> YARN-5302.003.patch, YARN-5302.004.patch
>
>
> Unlike YARN-5098, this happens on the NM side. When the NM recovers,
> credentials are read from the NMStateStore. When the app aggregators are
> initialized, an exception is thrown because the tokens have expired. The app
> is a long-running service.
> {code:title=LogAggregationService.java}
>   protected void initAppAggregator(final ApplicationId appId, String user,
>       Credentials credentials, ContainerLogsRetentionPolicy logRetentionPolicy,
>       Map<ApplicationAccessType, String> appAcls,
>       LogAggregationContext logAggregationContext) {
>     // Get user's FileSystem credentials
>     final UserGroupInformation userUgi =
>         UserGroupInformation.createRemoteUser(user);
>     if (credentials != null) {
>       userUgi.addCredentials(credentials);
>     }
>     ...
>     try {
>       // Create the app dir
>       createAppDir(user, appId, userUgi);
>     } catch (Exception e) {
>       appLogAggregator.disableLogAggregation();
>       if (!(e instanceof YarnRuntimeException)) {
>         appDirException = new YarnRuntimeException(e);
>       } else {
>         appDirException = (YarnRuntimeException) e;
>       }
>       appLogAggregators.remove(appId);
>       closeFileSystems(userUgi);
>       throw appDirException;
>     }
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-5302) Yarn Application log Aggreagation fails due to NM can not get correct HDFS delegation token II

2016-07-03 Thread Xianyin Xin (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15360805#comment-15360805
 ] 

Xianyin Xin edited comment on YARN-5302 at 7/4/16 3:51 AM:
---

Thanks [~Naganarasimha]. YARN-2704 provides the basic ability to renew an HDFS
delegation token; however, it doesn't cover all token-caused failures, such as
this JIRA and YARN-5305. In this JIRA, the exception happens during the recovery
stage, where the token read from the NMStateStore has already expired.

I believe the newly requested HDFS token should be persisted to the NMStateStore,
or, if we don't want to do that, we should try the {{systemCredentials}} when the
original tokens in the NMStateStore expire.
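A minimal sketch of the second option (fall back to the system credentials when the recovered token has expired). Assumptions: tokens are modelled as an ID plus an expiry timestamp, which is a simplification (a real HDFS delegation token is validated by the NameNode, not by a local clock check), and `CredentialFallback`, `Token`, and `choose` are hypothetical names, not Hadoop APIs.

```java
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch: prefer the token recovered from the state store, and
// fall back to the token most recently pushed by the RM (system credentials).
class CredentialFallback {
  // Minimal stand-in for a delegation token: an identifier plus an expiry time.
  static final class Token {
    final String id;
    final long expiresAtMillis;

    Token(String id, long expiresAtMillis) {
      this.id = id;
      this.expiresAtMillis = expiresAtMillis;
    }

    boolean isExpired(long nowMillis) {
      return nowMillis >= expiresAtMillis;
    }
  }

  // Returns the recovered token if still valid, otherwise the system token for
  // the same app if that one is valid, otherwise empty.
  static Optional<Token> choose(String appId, Map<String, Token> recovered,
      Map<String, Token> system, long nowMillis) {
    Token fromStore = recovered.get(appId);
    if (fromStore != null && !fromStore.isExpired(nowMillis)) {
      return Optional.of(fromStore);
    }
    Token fromSystem = system.get(appId);
    if (fromSystem != null && !fromSystem.isExpired(nowMillis)) {
      return Optional.of(fromSystem);
    }
    return Optional.empty();
  }

  public static void main(String[] args) {
    long now = 1_000L;
    Map<String, Token> recovered = Map.of("app1", new Token("old", 500L));
    Map<String, Token> system = Map.of("app1", new Token("new", 5_000L));
    // The recovered token expired at t=500, so the system token is chosen.
    System.out.println(choose("app1", recovered, system, now).map(t -> t.id).orElse("none"));
  }
}
```

Returning an empty `Optional` when neither token is usable leaves the caller free to defer the app rather than fail it outright.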


was (Author: xinxianyin):
Thanks [~Naganarasimha]. YARN-2704 provides the basic ability to renew an HDFS
delegation token; however, it doesn't cover all token-caused failures, such as
this JIRA and YARN-5035. In this JIRA, the exception happens during the recovery
stage, where the token read from the NMStateStore has already expired.

I believe the newly requested HDFS token should be persisted to the NMStateStore,
or, if we don't want to do that, we should try the {{systemCredentials}} when the
original tokens in the NMStateStore expire.

> Yarn Application log Aggreagation fails due to NM can not get correct HDFS 
> delegation token II
> --
>
> Key: YARN-5302
> URL: https://issues.apache.org/jira/browse/YARN-5302
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Reporter: Xianyin Xin
>Assignee: Xianyin Xin
> Attachments: YARN-5032.001.patch
>
>






[jira] [Comment Edited] (YARN-5302) Yarn Application log Aggreagation fails due to NM can not get correct HDFS delegation token II

2016-07-01 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15358553#comment-15358553
 ] 

Varun Saxena edited comment on YARN-5302 at 7/1/16 7:30 AM:


OK. So this happens when the RM has renewed the token but has not yet passed it
on to the NM in a heartbeat, because the NM restarted.
Do we not update the token in the NM state store when it changes?
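A minimal sketch of what updating the state store on token change could look like. `CredentialStore` and `InMemoryStore` are hypothetical stand-ins for NMStateStore (its real API is not shown in this thread), and tokens are plain strings. The point is the ordering: persist the new token before updating in-memory state, so a restart right after the heartbeat recovers the new token rather than the stale one.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: persist tokens delivered in heartbeats so that NM
// recovery reads the latest token instead of the one stored at app start.
class HeartbeatCredentialUpdater {
  // Stand-in for the NM state store: only the operations this sketch needs.
  interface CredentialStore {
    void storeCredentials(String appId, String token);
    Map<String, String> loadAll();
  }

  static final class InMemoryStore implements CredentialStore {
    private final Map<String, String> data = new HashMap<>();

    @Override
    public void storeCredentials(String appId, String token) {
      data.put(appId, token);
    }

    @Override
    public Map<String, String> loadAll() {
      return new HashMap<>(data);
    }
  }

  private final CredentialStore store;
  private final Map<String, String> live = new HashMap<>();

  HeartbeatCredentialUpdater(CredentialStore store) {
    this.store = store;
  }

  // Apply tokens from a heartbeat response: persist first, then update memory.
  void onHeartbeat(Map<String, String> updatedTokens) {
    for (Map.Entry<String, String> e : updatedTokens.entrySet()) {
      store.storeCredentials(e.getKey(), e.getValue());
      live.put(e.getKey(), e.getValue());
    }
  }

  // Simulates NM recovery: rebuild in-memory state from the store.
  static HeartbeatCredentialUpdater recover(CredentialStore store) {
    HeartbeatCredentialUpdater u = new HeartbeatCredentialUpdater(store);
    u.live.putAll(store.loadAll());
    return u;
  }

  String tokenFor(String appId) {
    return live.get(appId);
  }

  public static void main(String[] args) {
    CredentialStore store = new InMemoryStore();
    store.storeCredentials("app1", "token-v1"); // persisted at app start
    HeartbeatCredentialUpdater nm = new HeartbeatCredentialUpdater(store);
    nm.onHeartbeat(Map.of("app1", "token-v2")); // RM pushes a new token
    // After a restart, recovery sees token-v2 instead of the stale token-v1.
    System.out.println(recover(store).tokenFor("app1"));
  }
}
```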


was (Author: varun_saxena):
OK. So this happens when the RM has renewed the token but has not yet passed it
on to the NM in a heartbeat, because the NM restarted.

> Yarn Application log Aggreagation fails due to NM can not get correct HDFS 
> delegation token II
> --
>
> Key: YARN-5302
> URL: https://issues.apache.org/jira/browse/YARN-5302
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Reporter: Xianyin Xin
>






[jira] [Comment Edited] (YARN-5302) Yarn Application log Aggreagation fails due to NM can not get correct HDFS delegation token II

2016-07-01 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15358553#comment-15358553
 ] 

Varun Saxena edited comment on YARN-5302 at 7/1/16 7:21 AM:


OK. So this happens when the RM has renewed the token but has not yet passed it
on to the NM in a heartbeat, because the NM restarted.


was (Author: varun_saxena):
OK. So this happens when the RM has renewed the token but has not yet passed it
on to the NM in a heartbeat.

> Yarn Application log Aggreagation fails due to NM can not get correct HDFS 
> delegation token II
> --
>
> Key: YARN-5302
> URL: https://issues.apache.org/jira/browse/YARN-5302
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Reporter: Xianyin Xin
>


