Zhijie Shen updated YARN-3041:
    Attachment: YARN-3041.5.patch

Thanks for the feedback, Sangjin, Vrushali and Joep! We had an offline 
discussion. I updated the patch according to it. Here's the summary of the 
major changes:

1. It is not necessary to have both Flow and FlowRun in the taxonomy, as the 
two concepts are mostly the same. FlowRun models an individual flow instance 
made up of a number of applications, while Flow sounds like the generic 
perspective of application organization, which may nest multiple FlowRun 
instances. Hence, we only need FlowRun, and we rename FlowRun to Flow for 
simplicity.

2. To address the aggregation interval, which means we may want to query the 
aggregated information for a particular time window, I changed TimelineMetric 
to have starttime and endtime attributes.

3. The types of the first-class entities are defined centrally as enums, and 
the parent-child relationships are defined there too.

4. In the write path, queue is a string attribute of the application and user 
is a string attribute of the flow, but we still have entities for both to hold 
the aggregated data at the reader side. One additional implication is that all 
the applications are going to be run by the same user as the parent flow.

5. The flow id is a composite, user@flow_name(or id)/version/run, which will 
uniquely identify a flow in the storage.
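
To make points 2, 3 and 5 more concrete, here is a rough sketch in Java. It is 
not the code in the patch: every type and method name (EntityType, isParentOf, 
FlowId, MetricWindow) is hypothetical, and the parent-child rules, the 
string-typed run field and the half-open time window are assumptions made only 
for illustration.

```java
import java.util.Objects;

// Sketch only: all names below are hypothetical, not the classes in the patch.

/** Entity types defined in one place, with the parent-child rules beside them. */
enum EntityType {
    CLUSTER, FLOW, APPLICATION;

    /** Returns true if an entity of this type may be the parent of {@code child}. */
    boolean isParentOf(EntityType child) {
        switch (this) {
            case CLUSTER:
                return child == FLOW;
            case FLOW:
                // Assumption: a flow may nest other flows as well as applications.
                return child == FLOW || child == APPLICATION;
            default:
                return false;
        }
    }
}

/** Composite flow id of the form user@flow_name/version/run. */
final class FlowId {
    final String user;
    final String name;
    final String version;
    final String run; // kept as a string; the format of "run" is an assumption

    FlowId(String user, String name, String version, String run) {
        this.user = Objects.requireNonNull(user);
        this.name = Objects.requireNonNull(name);
        this.version = Objects.requireNonNull(version);
        this.run = Objects.requireNonNull(run);
    }

    /** Builds the composite string form. */
    String compose() {
        return user + "@" + name + "/" + version + "/" + run;
    }

    /** Parses the composite form; run and version are taken from the right. */
    static FlowId parse(String id) {
        int at = id.indexOf('@');
        int runSep = id.lastIndexOf('/');
        int verSep = id.lastIndexOf('/', runSep - 1);
        if (at <= 0 || verSep <= at + 1 || runSep <= verSep + 1
                || runSep == id.length() - 1) {
            throw new IllegalArgumentException("Malformed flow id: " + id);
        }
        return new FlowId(id.substring(0, at), id.substring(at + 1, verSep),
                id.substring(verSep + 1, runSep), id.substring(runSep + 1));
    }
}

/** Time window of an aggregated metric, assumed to be [startTime, endTime). */
final class MetricWindow {
    final long startTime;
    final long endTime;

    MetricWindow(long startTime, long endTime) {
        if (endTime < startTime) {
            throw new IllegalArgumentException("endTime before startTime");
        }
        this.startTime = startTime;
        this.endTime = endTime;
    }

    /** Returns true if this window intersects the query window [from, to). */
    boolean overlaps(long from, long to) {
        return startTime < to && from < endTime;
    }
}
```

For example, FlowId.parse("alice@wordcount/v1/3").compose() gives back the same 
string, and EntityType.FLOW.isParentOf(EntityType.APPLICATION) is true while 
EntityType.APPLICATION.isParentOf(EntityType.FLOW) is false.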

Joep has raised a great point about keeping the type generic so that the data 
model can extend beyond YARN, for example to Mesos. I think we can discuss it 
further, but let's file a separate Jira to tackle this direction.

Here, as mentioned above, let's try to get the first draft of the data model 
in ASAP to unblock the aggregator and the reader work. Hopefully it makes sense 
to the folks here.

> [Data Model] create overall data objects of TS next gen
> -------------------------------------------------------
>                 Key: YARN-3041
>                 URL: https://issues.apache.org/jira/browse/YARN-3041
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: timelineserver
>            Reporter: Sangjin Lee
>            Assignee: Zhijie Shen
>         Attachments: Data_model_proposal_v2.pdf, YARN-3041.2.patch, 
> YARN-3041.3.patch, YARN-3041.4.patch, YARN-3041.5.patch, 
> YARN-3041.preliminary.001.patch
> Per design in YARN-2928, create the ATS entity and events API.
> Also, as part of this JIRA, create YARN system entities (e.g. cluster, user, 
> flow, flow run, YARN app, ...).

This message was sent by Atlassian JIRA
