[jira] [Commented] (YARN-3359) Recover collector list in RM failed over

Li Lu (JIRA) Mon, 12 Sep 2016 18:06:38 -0700

    [ 
https://issues.apache.org/jira/browse/YARN-3359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15485847#comment-15485847
 ]


Li Lu commented on YARN-3359:
-----------------------------

Thanks [~jdu]! Actually, keeping the copies in the NMs will do the job. The 
only challenge is when two or more collectors for the same application 
get launched (because of a cluster partition, for example). Therefore the RM 
needs to keep a version number for collectors, so that when rebuilding the 
app-to-collector mappings, it knows which collectors are stale and which one is 
active. 
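
The version check described above can be sketched roughly as follows. This is 
only an illustration under assumptions, not the actual patch: the class 
CollectorRecovery, the type CollectorReport, its fields, and the method rebuild 
are all hypothetical names, and the sketch assumes the RM assigns each newly 
launched collector a larger version than any earlier one for the same app.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; none of these names come from the YARN code base.
final class CollectorRecovery {

  /** One collector record as an NM might report it after an RM failover. */
  static final class CollectorReport {
    final String appId;
    final String address;
    // Assumption: assigned by the RM and increasing with every launch per app.
    final long version;

    CollectorReport(String appId, String address, long version) {
      this.appId = appId;
      this.address = address;
      this.version = version;
    }
  }

  /**
   * Rebuilds the app-to-collector mapping from the reports gathered from NMs.
   * For each application only the report with the highest version is kept;
   * every other collector for that application is treated as stale.
   */
  static Map<String, CollectorReport> rebuild(List<CollectorReport> reports) {
    Map<String, CollectorReport> active = new HashMap<>();
    for (CollectorReport report : reports) {
      CollectorReport current = active.get(report.appId);
      if (current == null || report.version > current.version) {
        active.put(report.appId, report);
      }
    }
    return active;
  }
}
```

With two reports for the same app, say version 1 from one NM and version 2 from 
another, rebuild keeps only the version 2 entry, so the partitioned-away 
collector is recognized as stale.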

bq. btw, app's attempt id shouldn't be used here as collector is designed to be 
independent of AM lifecycle - it also means AM failed doesn't hint collector 
need to be killed/restarted. Do I miss anything?

You're right. We should not do it... 

> Recover collector list in RM failed over
> ----------------------------------------
>
>                 Key: YARN-3359
>                 URL: https://issues.apache.org/jira/browse/YARN-3359
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: resourcemanager
>            Reporter: Junping Du
>            Assignee: Li Lu
>              Labels: YARN-5355
>
> Per discussion in YARN-3039, split the recover work from RMStateStore in a 
> separated JIRA.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
