[ 
https://issues.apache.org/jira/browse/YARN-5638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15494688#comment-15494688
 ] 

Li Lu commented on YARN-5638:
-----------------------------

I have updated the description of this JIRA. What we need here is not a new 
type of "collector id", but timestamp data stored in the RMs and NMs for each 
collector. This addresses the problem of rebuilding collector status on a 
newly active RM, as discussed in YARN-3359: 

bq. when one application has two different attempts running (due to some 
network problems, for example) and the RM is trying to rebuild collector 
status, the RM needs to know which collector is for the latest app attempt and 
which one is for the stale attempt.

We do not necessarily need to associate collectors with application attempts. 
In fact, according to the timeline server v2 design, app collectors should be 
associated only with applications. However, when maintaining collector data in 
the RMs and NMs, we can store a timestamp for each collector. Then, when the RM 
needs to rebuild collector status, it can gather all known collector data from 
the NMs, use the timestamps to decide which state is the most recent for each 
collector, and rebuild the full state from that. 
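
To make the rebuild step concrete, here is a minimal sketch, not the actual 
patch. It assumes each NM reports one record per collector and that ordering 
uses the <rm_timestamp, logical_version_number> pair from the JIRA description. 
The class, field, and method names (CollectorRebuildSketch, CollectorRecord, 
rebuild) are hypothetical and do not correspond to existing YARN classes. 

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class CollectorRebuildSketch {

    /** Hypothetical record of one collector as reported by an NM. */
    public static final class CollectorRecord {
        public final String appId;
        public final String collectorAddr;
        public final long rmTimestamp; // assumed: timestamp of the RM that registered it
        public final long version;     // assumed: logical version number within that RM

        public CollectorRecord(String appId, String collectorAddr,
                               long rmTimestamp, long version) {
            this.appId = appId;
            this.collectorAddr = collectorAddr;
            this.rmTimestamp = rmTimestamp;
            this.version = version;
        }

        /** Orders by rmTimestamp first, then by version. */
        public boolean isNewerThan(CollectorRecord other) {
            if (rmTimestamp != other.rmTimestamp) {
                return rmTimestamp > other.rmTimestamp;
            }
            return version > other.version;
        }
    }

    /** Keeps, for each application, only the most recent collector record. */
    public static Map<String, CollectorRecord> rebuild(List<CollectorRecord> reported) {
        Map<String, CollectorRecord> latest = new HashMap<>();
        for (CollectorRecord r : reported) {
            CollectorRecord current = latest.get(r.appId);
            if (current == null || r.isNewerThan(current)) {
                latest.put(r.appId, r);
            }
        }
        return latest;
    }

    public static void main(String[] args) {
        List<CollectorRecord> reported = new ArrayList<>();
        // Two collectors for the same app: a stale one and a more recent one.
        reported.add(new CollectorRecord("app_1", "nm1:8000", 100L, 1L));
        reported.add(new CollectorRecord("app_1", "nm2:8000", 100L, 2L));
        System.out.println(rebuild(reported).get("app_1").collectorAddr);
    }
}
```

With this ordering the RM never needs to know which app attempt a collector 
belongs to; the stale collector simply loses the comparison. 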

> Introduce a collector timestamp to uniquely identify collectors creation 
> order in collector discovery
> -----------------------------------------------------------------------------------------------------
>
>                 Key: YARN-5638
>                 URL: https://issues.apache.org/jira/browse/YARN-5638
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: timelineserver
>            Reporter: Li Lu
>            Assignee: Li Lu
>
> As discussed in YARN-3359, we need to further identify timeline collectors' 
> creation order to rebuild collector discovery data in the RM. This JIRA 
> proposes to use <rm_timestamp, logical_version_number> to order collectors 
> for each application in the RM. This timestamp can then be used when a 
> standby RM becomes active and rebuilds collector discovery data. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
