[
https://issues.apache.org/jira/browse/YARN-3041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Zhijie Shen updated YARN-3041:
------------------------------
Attachment: YARN-3041.5.patch
Thanks for the feedback, Sangjin, Vrushali and Joep! We had an offline
discussion. I updated the patch according to it. Here's the summary of the
major changes:
1. It is not necessary to have both Flow and FlowRun in the taxonomy, as the
two concepts are mostly the same. FlowRun is more likely to model an individual
flow instance made up of a number of applications, while Flow sounds like the
generic perspective of application organization, which may nest multiple
FlowRun instances. Hence, we only need FlowRun, but rename FlowRun to Flow for
simplicity.
2. To address the aggregation interval, meaning that we may want to query the
aggregated information for a particular time window, I changed TimelineMetric
to have starttime and endtime attributes.
3. The types of the first-class citizen entities are defined centrally as
enums, and the parent-child relationships are defined there too.
4. In the write path, queue is a string attribute of the application and user
is a string attribute of the flow, but we still have entities for both to hold
the aggregated data on the reader side. One additional implication is that all
the applications are going to be run by the same user as their parent flow.
5. The flow id is a composite: user@flow_name(or id)/version/run, which
uniquely identifies a flow in the storage.
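
To make points 3 and 5 concrete, here is a minimal sketch in Java. It is not
taken from the patch: the class name DataModelSketch, the enum EntityType, its
constants and parent links, and the method composeFlowId are all hypothetical
names chosen for illustration, and the hierarchy shown (cluster, flow,
application) is an assumption rather than the actual set of entity types.

```java
import java.util.Objects;

/** Hypothetical sketch; none of these names come from the YARN-3041 patch. */
final class DataModelSketch {

  /**
   * Entity types defined centrally as an enum. Each constant records its
   * parent type, so the parent-child relationship lives in one place.
   * The three-level hierarchy here is an assumed example.
   */
  enum EntityType {
    CLUSTER(null),
    FLOW(CLUSTER),
    APPLICATION(FLOW);

    private final EntityType parent;

    EntityType(EntityType parent) {
      this.parent = parent;
    }

    /** True if the given type is the direct parent of this type. */
    boolean isChildOf(EntityType candidate) {
      return parent != null && parent == candidate;
    }

    /** True if the given type is a direct child of this type. */
    boolean isParentOf(EntityType candidate) {
      return candidate.parent == this;
    }
  }

  /**
   * Builds the composite flow id in the form user@flow_name/version/run.
   * All parts are kept as strings because the message does not say what
   * type the version or run component has.
   */
  static String composeFlowId(String user, String flowName,
      String version, String run) {
    Objects.requireNonNull(user, "user");
    Objects.requireNonNull(flowName, "flowName");
    Objects.requireNonNull(version, "version");
    Objects.requireNonNull(run, "run");
    return user + "@" + flowName + "/" + version + "/" + run;
  }

  private DataModelSketch() {
    // Static helpers only.
  }
}
```

For example, composeFlowId("alice", "wordcount", "v1", "42") yields
alice@wordcount/v1/42. The sketch does not escape '@' or '/' inside the
individual parts; a real implementation would need to decide how to handle
that.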
Joep has raised a great point about keeping the type generic so that the data
model can extend beyond YARN, for example to Mesos. I think we can discuss it
further, but let's file a separate Jira to tackle this direction.
Here, as mentioned above, let's try to get the first draft of the data model in
ASAP to unblock the aggregator and the reader work. Hopefully it makes sense to
the folks here.
> [Data Model] create overall data objects of TS next gen
> -------------------------------------------------------
>
> Key: YARN-3041
> URL: https://issues.apache.org/jira/browse/YARN-3041
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: timelineserver
> Reporter: Sangjin Lee
> Assignee: Zhijie Shen
> Attachments: Data_model_proposal_v2.pdf, YARN-3041.2.patch,
> YARN-3041.3.patch, YARN-3041.4.patch, YARN-3041.5.patch,
> YARN-3041.preliminary.001.patch
>
>
> Per design in YARN-2928, create the ATS entity and events API.
> Also, as part of this JIRA, create YARN system entities (e.g. cluster, user,
> flow, flow run, YARN app, ...).