[ 
https://issues.apache.org/jira/browse/YARN-3449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14481606#comment-14481606
 ] 

Junping Du commented on YARN-3449:
----------------------------------

bq. Again when the NM re-registers it will report all active applications, and 
the RM will attempt to correct this on the next heartbeat. 
You are right, [~jlowe]. I missed that CLEANUP_APP would be resent on node 
reconnection (I totally forgot about it for some reason), so that shouldn't 
be a problem. BTW, I haven't seen any actual failure from this, so I will 
resolve it as invalid.

> Recover appTokenKeepAliveMap upon nodemanager restart
> -----------------------------------------------------
>
>                 Key: YARN-3449
>                 URL: https://issues.apache.org/jira/browse/YARN-3449
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: nodemanager
>    Affects Versions: 2.6.0, 2.7.0
>            Reporter: Junping Du
>            Assignee: Junping Du
>
> appTokenKeepAliveMap in NodeStatusUpdaterImpl is used to keep an application 
> alive after it has finished while the NM still needs the app token for log 
> aggregation (when security and log aggregation are enabled). 
> Applications are inserted into this map only when the NM receives 
> getApplicationsToCleanup() in the RM heartbeat response, and the RM sends 
> this info only once, in RMNodeImpl.updateNodeHeartbeatResponseForCleanup(). 
> Work-preserving NM restart should store appTokenKeepAliveMap in the 
> NMStateStore and recover it after restart. Without this, the RM could 
> terminate the application earlier, so log aggregation could fail if security 
> is enabled.
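
The recovery proposed in the description can be illustrated with a minimal, 
self-contained Java sketch. It is not the actual NodeStatusUpdaterImpl or 
NMStateStore code: KeepAliveStore, KeepAliveTracker, and all of their methods 
are hypothetical names, and the "state store" is just an in-memory map that 
outlives a simulated restart. As the comment above notes, the RM resends the 
cleanup on node reconnection, so this only shows what the proposal would have 
looked like.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a persistent NM state store.
// A real store would write to disk; this one only keeps a map in memory.
class KeepAliveStore {
    private final Map<String, Long> persisted = new HashMap<>();

    void store(String appId, long nextKeepAliveMillis) {
        persisted.put(appId, nextKeepAliveMillis);
    }

    void remove(String appId) {
        persisted.remove(appId);
    }

    Map<String, Long> load() {
        return new HashMap<>(persisted); // defensive copy
    }
}

// Hypothetical, simplified model of the keep-alive bookkeeping.
class KeepAliveTracker {
    private final Map<String, Long> appTokenKeepAliveMap = new HashMap<>();
    private final KeepAliveStore store;
    private final long keepAliveIntervalMillis;

    KeepAliveTracker(KeepAliveStore store, long keepAliveIntervalMillis) {
        this.store = store;
        this.keepAliveIntervalMillis = keepAliveIntervalMillis;
        // Recovery step: reload whatever was persisted before the restart.
        appTokenKeepAliveMap.putAll(store.load());
    }

    // Called when a heartbeat response lists applications to clean up.
    void onAppsToCleanup(List<String> appIds, long nowMillis) {
        for (String appId : appIds) {
            long next = nowMillis + keepAliveIntervalMillis;
            appTokenKeepAliveMap.put(appId, next);
            store.store(appId, next); // persist so a restart does not lose it
        }
    }

    // Called once log aggregation for the application has finished.
    void onLogAggregationFinished(String appId) {
        appTokenKeepAliveMap.remove(appId);
        store.remove(appId);
    }

    boolean isTracked(String appId) {
        return appTokenKeepAliveMap.containsKey(appId);
    }

    long nextKeepAlive(String appId) {
        return appTokenKeepAliveMap.getOrDefault(appId, -1L);
    }
}

class KeepAliveRecoveryDemo {
    public static void main(String[] args) {
        KeepAliveStore store = new KeepAliveStore();

        KeepAliveTracker beforeRestart = new KeepAliveTracker(store, 1000L);
        beforeRestart.onAppsToCleanup(Arrays.asList("app_1", "app_2"), 5000L);
        beforeRestart.onLogAggregationFinished("app_2");

        // Simulated restart: a new tracker is built from the same store.
        KeepAliveTracker afterRestart = new KeepAliveTracker(store, 1000L);
        System.out.println("app_1 tracked: " + afterRestart.isTracked("app_1"));
        System.out.println("app_2 tracked: " + afterRestart.isTracked("app_2"));
    }
}
```

In this sketch every change to the map is written through to the store, so 
an entry that is still awaiting log aggregation survives the restart while a 
completed one does not.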



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
