[
https://issues.apache.org/jira/browse/YARN-3449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14481606#comment-14481606
]
Junping Du commented on YARN-3449:
----------------------------------
bq. Again when the NM re-registers it will report all active applications, and
the RM will attempt to correct this on the next heartbeat.
You are right, [~jlowe]. I missed that CLEANUP_APP would be resent on node
reconnection (I had completely forgotten about it), so that shouldn't be a
problem. BTW, I haven't seen any actual failure caused by this, so I will
resolve it as invalid.
> Recover appTokenKeepAliveMap upon nodemanager restart
> -----------------------------------------------------
>
> Key: YARN-3449
> URL: https://issues.apache.org/jira/browse/YARN-3449
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: nodemanager
> Affects Versions: 2.6.0, 2.7.0
> Reporter: Junping Du
> Assignee: Junping Du
>
> appTokenKeepAliveMap in NodeStatusUpdaterImpl is used to keep an application
> alive after it has finished while the NM still needs the app token for log
> aggregation (when security and log aggregation are enabled).
> Applications are only inserted into this map when the NM receives
> getApplicationsToCleanup() in the RM heartbeat response, and the RM only sends
> this info once, in RMNodeImpl.updateNodeHeartbeatResponseForCleanup().
> Work-preserving NM restart should store appTokenKeepAliveMap in the
> NMStateStore and recover it after restart. Otherwise, the RM could terminate
> the application early, and log aggregation could fail if security is
> enabled.
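
For illustration only, the recovery idea in the description above could be sketched roughly as follows. This is a simplified model, not the NodeManager code: StateStore, InMemoryStore, Tracker and all of their methods are hypothetical names invented for this sketch, and the real NMStateStore interface is not shown here. Note that, per the comment above, the RM resends the cleanup information when the NM re-registers, so this persistence turned out not to be needed.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified model; none of these types are real Hadoop classes.
final class KeepAliveSketch {

    /** Hypothetical stand-in for a persistent NM state store. */
    interface StateStore {
        void storeKeepAlive(String appId, long nextKeepAliveTimeMs);
        void removeKeepAlive(String appId);
        Map<String, Long> loadKeepAlives();
    }

    /** In-memory store that plays the role of state surviving a restart. */
    static final class InMemoryStore implements StateStore {
        private final Map<String, Long> data = new HashMap<>();
        public void storeKeepAlive(String appId, long t) { data.put(appId, t); }
        public void removeKeepAlive(String appId) { data.remove(appId); }
        public Map<String, Long> loadKeepAlives() { return new HashMap<>(data); }
    }

    /** Tracks finished apps whose tokens must stay alive for log aggregation. */
    static final class Tracker {
        private final Map<String, Long> appTokenKeepAliveMap = new HashMap<>();
        private final StateStore store;
        private final long keepAliveIntervalMs;

        Tracker(StateStore store, long keepAliveIntervalMs) {
            this.store = store;
            this.keepAliveIntervalMs = keepAliveIntervalMs;
            // Recovery step: reload entries written before the restart.
            appTokenKeepAliveMap.putAll(store.loadKeepAlives());
        }

        /** Called with the apps-to-cleanup list from one heartbeat response. */
        void onHeartbeatResponse(List<String> appsToCleanup, long nowMs) {
            for (String appId : appsToCleanup) {
                long next = nowMs + keepAliveIntervalMs;
                appTokenKeepAliveMap.put(appId, next);
                store.storeKeepAlive(appId, next); // persist alongside the in-memory map
            }
        }

        /** Called once the app's logs are aggregated and the token is no longer needed. */
        void onLogAggregationFinished(String appId) {
            appTokenKeepAliveMap.remove(appId);
            store.removeKeepAlive(appId);
        }

        boolean isTracked(String appId) {
            return appTokenKeepAliveMap.containsKey(appId);
        }
    }

    public static void main(String[] args) {
        StateStore store = new InMemoryStore();
        Tracker beforeRestart = new Tracker(store, 60_000L);
        beforeRestart.onHeartbeatResponse(Arrays.asList("application_1"), 0L);

        // Simulate an NM restart: a new tracker is built from the same store.
        Tracker afterRestart = new Tracker(store, 60_000L);
        System.out.println(afterRestart.isTracked("application_1")); // prints "true"
    }
}
```

The sketch writes each entry to the store at the same time it is added to the map, so a tracker constructed after a restart starts with the same set of applications without waiting for the RM to send the list again.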