[
https://issues.apache.org/jira/browse/YARN-3359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15485847#comment-15485847
]
Li Lu commented on YARN-3359:
-----------------------------
Thanks [~jdu]! Actually, keeping the copies in the NMs will do the work. The
only challenge is when two or more collectors for the same application
get launched (because of a cluster partition, for example). Therefore the RM
needs to keep a version number for collectors, so that when rebuilding the
app-to-collector mappings, it knows which collectors are stale and which one
is active.
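A rough sketch of the versioning idea, only to illustrate the rebuild step. The
class and method names (CollectorRegistry, report, activeAddress) are
hypothetical and are not existing YARN APIs. The sketch also assumes the RM
hands out a monotonically increasing version per collector and that NMs report
the version along with the collector address.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch: rebuilds app-to-collector mappings from NM reports. */
final class CollectorRegistry {

    /** Collector address plus the version assumed to be assigned by the RM. */
    static final class CollectorInfo {
        final String address;
        final long version;

        CollectorInfo(String address, long version) {
            this.address = address;
            this.version = version;
        }
    }

    // Keyed by application id; holds the newest collector seen so far.
    private final Map<String, CollectorInfo> active = new HashMap<>();

    /**
     * Processes one collector report from an NM.
     * Returns true if the report replaced the current mapping, false if it
     * was stale (older version) or carried a version that is already known.
     */
    synchronized boolean report(String appId, String address, long version) {
        CollectorInfo current = active.get(appId);
        if (current != null && current.version >= version) {
            return false; // not newer than what we already have: ignore it
        }
        active.put(appId, new CollectorInfo(address, version));
        return true;
    }

    /** Returns the active collector address, or null if none is known. */
    synchronized String activeAddress(String appId) {
        CollectorInfo info = active.get(appId);
        return info == null ? null : info.address;
    }
}
```

With this shape, the order in which NMs re-register after a failover does not
matter: whichever report carries the highest version ends up as the active
collector for the app, and the others are dropped as stale.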
bq. btw, app's attempt id shouldn't be used here as collector is designed to be
independent of AM lifecycle - it also means AM failed doesn't hint collector
need to be killed/restarted. Do I miss anything?
You're right. We should not do it...
> Recover collector list in RM failed over
> ----------------------------------------
>
> Key: YARN-3359
> URL: https://issues.apache.org/jira/browse/YARN-3359
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: resourcemanager
> Reporter: Junping Du
> Assignee: Li Lu
> Labels: YARN-5355
>
> Per discussion in YARN-3039, split the recover work from RMStateStore in a
> separated JIRA.