Jason Lowe commented on YARN-3449:

I believe the apps in the appTokenKeepAliveMap will be recovered per my first 
comment, but yes, the relative delays stored in that map will not match what was 
there before.  However, I'm not sure it matters that we have the exact times in 
there.  Again, when the NM re-registers it will report all active applications, 
and the RM will attempt to correct this on the next heartbeat.  The NM will 
then add all apps that are still aggregating to the appTokenKeepAliveMap and 
report that to the RM, and the RM will delay the token removal accordingly.  I 
don't think this changes when the token is renewed on the RM, just when the 
token may be cancelled.
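
As a rough illustration of the point above, here is a minimal sketch of a 
keep-alive map that is rebuilt after a restart with fresh delays instead of 
being restored from a state store.  The class and method names are 
hypothetical and the delay handling is simplified; this is not the actual 
NodeStatusUpdaterImpl code.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified model of the NM-side keep-alive bookkeeping.
public class KeepAliveSketch {
    // appId -> next time (ms) at which the app should be reported for keep-alive
    private final Map<String, Long> keepAliveMap = new HashMap<>();
    private final long delayMs; // assumed fixed delay between keep-alive reports

    public KeepAliveSketch(long delayMs) {
        this.delayMs = delayMs;
    }

    // After a restart the old deadlines are gone; every app that is still
    // aggregating logs is simply re-tracked with a fresh deadline.
    public void rebuildAfterRestart(Collection<String> stillAggregating, long nowMs) {
        keepAliveMap.clear();
        for (String appId : stillAggregating) {
            keepAliveMap.put(appId, nowMs + delayMs);
        }
    }

    // Apps whose deadline has passed are returned for the next heartbeat
    // and rescheduled, so the RM keeps delaying token removal for them.
    public List<String> appsToKeepAlive(long nowMs) {
        List<String> due = new ArrayList<>();
        for (Map.Entry<String, Long> e : keepAliveMap.entrySet()) {
            if (nowMs >= e.getValue()) {
                due.add(e.getKey());
                e.setValue(nowMs + delayMs);
            }
        }
        return due;
    }

    public boolean isTracked(String appId) {
        return keepAliveMap.containsKey(appId);
    }
}
```

In this sketch the deadlines after a restart differ from the ones before it, 
but every app that is still aggregating is tracked again and is still 
reported, which is the behavior described above.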

Is this JIRA tracking an actual failure that occurred or a theoretical 

> Recover appTokenKeepAliveMap upon nodemanager restart
> -----------------------------------------------------
>                 Key: YARN-3449
>                 URL: https://issues.apache.org/jira/browse/YARN-3449
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: nodemanager
>    Affects Versions: 2.6.0, 2.7.0
>            Reporter: Junping Du
>            Assignee: Junping Du
> appTokenKeepAliveMap in NodeStatusUpdaterImpl is used to keep an application 
> alive after it has finished while the NM still needs the app token to do log 
> aggregation (when security and log aggregation are enabled). 
> Applications are only inserted into this map when the NM receives 
> getApplicationsToCleanup() from the RM heartbeat response, and the RM only 
> sends this info once, in RMNodeImpl.updateNodeHeartbeatResponseForCleanup(). 
> Work-preserving NM restart should put appTokenKeepAliveMap into the 
> NMStateStore so that it is recovered after restart. Without this, the RM could 
> terminate the application earlier, so log aggregation could fail if security 
> is enabled.

This message was sent by Atlassian JIRA
