[
https://issues.apache.org/jira/browse/YARN-3040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14368064#comment-14368064
]
Zhijie Shen commented on YARN-3040:
-----------------------------------
I've just uploaded a patch. It's an e2e modification that allows the context
information to be passed from the client to the backend storage. The context
information includes *clusterId*, *userId*, *flowId*, *flowRunId* and *appId*.
According to YARN-3240, a new TimelineClient is constructed per application, and
in the context of one application, we can reasonably assume this context
information stays unchanged. Therefore, it only needs to be specified when
the client is constructed. The context information should be gathered by or
passed to the AM and NM so that they can construct the timeline client properly.
For example, for the AM, this information can be passed via env inside the CLC.
Anyway, that's out of the scope of this Jira; we will cover that integration
once we make some particular framework AM use the new timeline client.
Back to the context information, some of it can be null, and some of it
doesn't need to be specified explicitly:
* *clusterId*: The application should specify a unique cluster ID; otherwise, by
default the cluster ID will be cluster_<start timestamp of RM>.
* *userId*: The user doesn't need to specify this information. Instead, it will
be obtained from the current ugi of the client.
* *flowId*: The user either passes in a flowId or, if it is an orphan
application, the flowId will be the appId with its prefix replaced by "flow".
* *flowRunId*: If it is an orphan application, it's 0. The reason why it should
be 0 instead of the current timestamp when creating the timeline client is that
there may be multiple clients in the AM and NMs, constructed at different
times. They need to be synced on the same flowRunId.
* *appId*: It's the only mandatory context information as we defined before.
The client is constructed to only work with one application.
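To make the defaulting rules above concrete, here is a minimal sketch in plain Java. It is not code from the patch: the class name {{TimelineContextSketch}}, the {{resolve}} method and its parameters are hypothetical, the RM start timestamp and the current user are passed in explicitly so the sketch has no Hadoop dependency, and the "application" prefix is assumed from the usual YARN application ID format.

```java
import java.util.Objects;

/** Hypothetical holder for the five context fields; not a class from the patch. */
final class TimelineContextSketch {
    final String clusterId;
    final String userId;
    final String flowId;
    final long flowRunId;
    final String appId;

    private TimelineContextSketch(String clusterId, String userId, String flowId,
                                  long flowRunId, String appId) {
        this.clusterId = clusterId;
        this.userId = userId;
        this.flowId = flowId;
        this.flowRunId = flowRunId;
        this.appId = appId;
    }

    /**
     * Applies the defaults described in the list above. A null argument means
     * "not specified by the caller". currentUser stands in for whatever the
     * client's ugi reports (assumption: the caller looks it up).
     */
    static TimelineContextSketch resolve(String clusterId, String flowId, Long flowRunId,
                                         String appId, long rmStartTime, String currentUser) {
        // appId is the only mandatory piece of context.
        Objects.requireNonNull(appId, "appId is mandatory");
        // Default cluster ID: cluster_<start timestamp of RM>.
        String cluster = (clusterId != null) ? clusterId : "cluster_" + rmStartTime;
        // Orphan application: derive the flowId from the appId by swapping the prefix.
        String flow = (flowId != null) ? flowId : appId.replaceFirst("^application", "flow");
        // Orphan application: fixed 0 rather than a timestamp, so that clients
        // constructed at different times agree on the same flowRunId.
        long run = (flowRunId != null) ? flowRunId : 0L;
        return new TimelineContextSketch(cluster, currentUser, flow, run, appId);
    }
}
```

With only the appId supplied, every client in the AM and NMs would resolve to the same flowId and flowRunId, which is the point of using 0 instead of a timestamp.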
I changed the web service endpoint accordingly to make it RESTful, and changed
the writer interface accordingly to pass in the context information when
putting the entity. In addition, I've modified the FS-based writer
implementation to reflect the change. The entity file will be put in the dir
{{root/entities/<clusterId>/<userId>/<flowId>/<flowRunId>/<appId>/<entityType>/<entityId>.thist}}.
The patch has been verified by TestDistributedShell and
TestFileSystemTimelineWriterImpl.
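As an illustration of the directory layout above, the following sketch assembles the entity file path from the context fields. The class and method names ({{EntityPathSketch}}, {{entityFile}}) are hypothetical and the sample values are made up; this is not the FS-based writer from the patch, only the path pattern quoted above.

```java
/** Hypothetical helper that mirrors the entity file layout; not the patch's writer. */
final class EntityPathSketch {
    private static final String EXTENSION = ".thist";

    /**
     * Builds
     * root/entities/<clusterId>/<userId>/<flowId>/<flowRunId>/<appId>/<entityType>/<entityId>.thist
     * using "/" as the separator regardless of platform.
     */
    static String entityFile(String root, String clusterId, String userId, String flowId,
                             long flowRunId, String appId, String entityType,
                             String entityId) {
        return String.join("/",
            root, "entities", clusterId, userId, flowId,
            Long.toString(flowRunId), appId, entityType, entityId) + EXTENSION;
    }
}
```

For an orphan application the flowRunId segment is simply {{0}}, so all entities written for that application by different clients land under the same directory.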
> [Data Model] Implement client-side API for handling flows
> ---------------------------------------------------------
>
> Key: YARN-3040
> URL: https://issues.apache.org/jira/browse/YARN-3040
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: timelineserver
> Reporter: Sangjin Lee
> Assignee: Zhijie Shen
> Attachments: YARN-3040.1.patch
>
>
> Per design in YARN-2928, implement client-side API for handling *flows*.
> Frameworks should be able to define and pass in all attributes of flows and
> flow runs to YARN, and they should be passed into ATS writers.
> YARN tags were discussed as a way to handle this piece of information.