Li Lu commented on YARN-3040:

Hi [~zjshen], some quick thoughts...

bq. It sounds like each NM will need to have multiple timeline clients (one for 
each application).
bq. That's correct.
bq. The RM will have its own collector, and it does not go through the 
TimelineClient API. How would that work?
bq. RM will have all the above context info. When constructing and starting RM 
collector, we should make sure it be setup.

Both the RM and the NMs post predefined "application history info" rather 
than "generic" data (I'm trying to use the ATS v1 wording, but correct me if 
I'm wrong). I'm wondering whether it's possible to have another client 
implementation, based on our existing one, that can handle multiple 
applications within the same client. It does not sound very scalable to have 
one client for each app in the RM...
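
To make the idea concrete, here is a rough sketch of a single client that 
keeps a per-application context and routes every put through it. Everything 
in it is hypothetical: AppContext, MultiAppTimelineClient, registerApp and 
putEntity are names I made up for illustration, not the existing 
TimelineClient API, and the in-memory list only stands in for the real write 
path.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical per-application context (flow id and flow run id).
final class AppContext {
    final String flowId;
    final long flowRunId;

    AppContext(String flowId, long flowRunId) {
        this.flowId = flowId;
        this.flowRunId = flowRunId;
    }
}

// Hypothetical client shared by all applications on one RM or NM.
final class MultiAppTimelineClient {
    // Application id -> context; concurrent because apps start and finish
    // independently of each other.
    private final Map<String, AppContext> contexts = new ConcurrentHashMap<>();
    // Stand-in for the actual write to a collector.
    private final List<String> sent = new ArrayList<>();

    void registerApp(String appId, AppContext context) {
        contexts.put(appId, context);
    }

    void unregisterApp(String appId) {
        contexts.remove(appId);
    }

    // Attaches the registered context of the given app to the entity.
    synchronized void putEntity(String appId, String entity) {
        AppContext context = contexts.get(appId);
        if (context == null) {
            throw new IllegalStateException("unknown application: " + appId);
        }
        sent.add(context.flowId + "/" + context.flowRunId + "/" + appId
            + "/" + entity);
    }

    // Returns a copy of everything "written" so far.
    synchronized List<String> sentRecords() {
        return new ArrayList<>(sent);
    }
}
```

With something like this, the RM would hold one client object and register 
or unregister applications as they come and go, instead of creating one 
client per app.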

bq. I thought version is part of flow id. I think we can revisit it once the 
schema is done, and we finalized the generic description about the flow 
structure and the notation. So far I'd like to keep it as what it is now. 

One significant advantage of having run ids as integers is that we can easily 
sort all existing runs for one flow in ascending or descending order. This 
might be a solid use case in general? 
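
A small sketch of the difference, assuming run ids are stored as long values 
(the FlowRunOrdering class and its method names are made up for illustration 
only). Numeric ids sort in the natural run order, while the same values kept 
as strings sort lexicographically.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper comparing numeric and string ordering of run ids.
final class FlowRunOrdering {
    // Numeric ascending order: 9, 10, 100.
    static List<Long> ascending(List<Long> runIds) {
        List<Long> copy = new ArrayList<>(runIds);
        Collections.sort(copy);
        return copy;
    }

    // Numeric descending order: 100, 10, 9.
    static List<Long> descending(List<Long> runIds) {
        List<Long> copy = new ArrayList<>(runIds);
        Collections.sort(copy, Collections.reverseOrder());
        return copy;
    }

    // String order for comparison: "10", "100", "9".
    static List<String> lexicographic(List<String> runIds) {
        List<String> copy = new ArrayList<>(runIds);
        Collections.sort(copy);
        return copy;
    }
}
```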

bq. It makes sense, but when RM restarts we use the new start time of RM to 
identify the app instead of the one before. In current way, cluster_xyz will 
contain the application_xyz_123. This was my rationale before. And this default 
cluster id construction is only used in the case the user didn't specify the 
cluster id in config file. In production, user should specify one. I'll thought 
about the question again.

Mostly fine, but I have some concerns about rolling upgrades. With rolling 
upgrades, if we're not specifying cluster ids explicitly, applications that 
live across an upgrade will have two different primary keys. Even though we may 
merge them in our reader (which still sounds suboptimal), this may pose a 
challenge to our aggregators (data will be aggregated into two different 
entities over time). Any suggestions on this? 
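
To illustrate the concern, here is a minimal sketch that assumes the default 
cluster id is derived from the RM start time (as in the cluster_xyz example 
quoted above) and that the primary key starts with the cluster id. The class, 
the method names and the "!" separator are my own assumptions for 
illustration, not the actual schema.

```java
// Hypothetical sketch of how a default cluster id affects the primary key.
final class ClusterIdSketch {
    // Assumed default: cluster id built from the RM start time.
    static String defaultClusterId(long rmStartTime) {
        return "cluster_" + rmStartTime;
    }

    // Uses the configured id when present, else falls back to the default.
    static String resolveClusterId(String configured, long rmStartTime) {
        if (configured != null && !configured.isEmpty()) {
            return configured;
        }
        return defaultClusterId(rmStartTime);
    }

    // Assumed key layout: cluster id, separator, application id.
    static String primaryKey(String clusterId, String appId) {
        return clusterId + "!" + appId;
    }
}
```

Without a configured id, the same application gets one key before the RM 
restart and another one after it. With an explicit id, the key stays the same 
across the restart.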

> [Data Model] Make putEntities operation be aware of the app's context
> ---------------------------------------------------------------------
>                 Key: YARN-3040
>                 URL: https://issues.apache.org/jira/browse/YARN-3040
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: timelineserver
>            Reporter: Sangjin Lee
>            Assignee: Zhijie Shen
>         Attachments: YARN-3040.1.patch, YARN-3040.2.patch
> Per design in YARN-2928, implement client-side API for handling *flows*. 
> Frameworks should be able to define and pass in all attributes of flows and 
> flow runs to YARN, and they should be passed into ATS writers.
> YARN tags were discussed as a way to handle this piece of information.

This message was sent by Atlassian JIRA
