Jason Lowe commented on YARN-3449:

While the NM is aggregating logs, the application is still present in the state 
store, and the application should be recovered as still active after an NM 
restart.  The NM will then register with those applications listed as still 
active.  When the RM later tells the NM that those applications should be 
cleaned up, the applications should be added to the keep alive list as normal.  
Thus I think the appTokenKeepAliveMap state should already be recovered 
properly without explicitly persisting it -- or am I missing something?
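
The recovery flow described above can be sketched as a toy model. This is not the actual NodeStatusUpdaterImpl code: the class SketchNodeManager, its methods, and the use of plain strings as application IDs are all hypothetical stand-ins. It also assumes, as the comment does, that the RM sends the cleanup list after the restarted NM re-registers with its still-active applications.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class KeepAliveRecoverySketch {

    /** Hypothetical stand-in for the NM; not the real NodeStatusUpdaterImpl. */
    static class SketchNodeManager {
        // Applications recovered from the state store as still active.
        private final Set<String> activeApps = new LinkedHashSet<>();
        // Rebuilt from heartbeat responses, never persisted in this sketch.
        // The value is a hypothetical timestamp of when the entry was added.
        private final Map<String, Long> appTokenKeepAliveMap = new LinkedHashMap<>();

        SketchNodeManager(Set<String> appsInStateStore) {
            // An app that is still aggregating logs is still in the state
            // store, so it comes back as active after the restart.
            activeApps.addAll(appsInStateStore);
        }

        /** Applications the NM lists as active when it registers with the RM. */
        List<String> appsToReportOnRegistration() {
            return new ArrayList<>(activeApps);
        }

        /** Handles the cleanup list carried by an RM heartbeat response. */
        void onHeartbeatResponse(List<String> appsToCleanup, long nowMillis) {
            for (String appId : appsToCleanup) {
                if (activeApps.contains(appId)) {
                    // Same path as before the restart: the app enters the
                    // keep-alive map when the RM asks for cleanup.
                    appTokenKeepAliveMap.put(appId, nowMillis);
                }
            }
        }

        Set<String> keepAliveApps() {
            return appTokenKeepAliveMap.keySet();
        }
    }

    public static void main(String[] args) {
        Set<String> stateStore = new LinkedHashSet<>();
        stateStore.add("application_1_0001"); // finished app, logs still aggregating

        // Simulated restart: the NM is rebuilt purely from the state store.
        SketchNodeManager restarted = new SketchNodeManager(stateStore);
        List<String> registered = restarted.appsToReportOnRegistration();

        // Assumption: the RM answers with a cleanup list for the finished app.
        restarted.onHeartbeatResponse(registered, System.currentTimeMillis());
        System.out.println(restarted.keepAliveApps());
    }
}
```

If the RM did not resend the cleanup list after re-registration, the map would stay empty in this model, which is the case the issue description is concerned about.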

> Recover appTokenKeepAliveMap upon nodemanager restart
> -----------------------------------------------------
>                 Key: YARN-3449
>                 URL: https://issues.apache.org/jira/browse/YARN-3449
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: nodemanager
>    Affects Versions: 2.6.0, 2.7.0
>            Reporter: Junping Du
>            Assignee: Junping Du
> appTokenKeepAliveMap in NodeStatusUpdaterImpl is used to keep an application 
> alive after it has finished while the NM still needs the app token for log 
> aggregation (when security and log aggregation are enabled). 
> Applications are inserted into this map only when the NM receives 
> getApplicationsToCleanup() in the RM heartbeat response, and the RM sends this 
> info only once, in RMNodeImpl.updateNodeHeartbeatResponseForCleanup(). 
> Work-preserving NM restart should store appTokenKeepAliveMap in the 
> NMStateStore and recover it after restart. Otherwise the RM could terminate 
> the application early, so log aggregation could fail if security is 
> enabled.
