Li Lu commented on YARN-4224:

I've had some offline discussions with [~leftnoteasy] and [~vinodkv]. In 
general, the challenges come from two sides:
- For flows, flowruns, and timeline entities, the id is a tuple rather than 
a single "id". For example, a flow with name {{foo}} is actually identified 
by the tuple <cluster_name, user_name, {{foo}}>. 
- Ember and other front end frameworks discourage a hierarchical object 
model. That is to say, the {{/cluster_id/user_id/flow_id}} style is 
discouraged in front end designs. 

Therefore, one possible solution is to "flatten" the object model for the 
REST APIs. That is to say, on a flow endpoint /flow/flow_id, what we really 
want in the flow id part is a flattened tuple, or a "UID" of the flow. This 
way we can avoid introducing very long URLs. For example, to get the data of 
a specific flow (such as its list of flow runs) with id myWorkflow, we can 
GET on endpoint /flow/clusterId_userId_myWorkflow (underscores inside those 
ids could perhaps be encoded?). The entity we get back contains the flow 
information as well as a list of flow runs within this flow. The tricky part 
is that we need to return the UIDs of those flow runs directly, so that the 
front end can directly make the next REST API calls. Therefore, under this 
hierarchy, front end users can directly find the UIDs of all elements that 
are immediately below the current element's level 
(cluster->user->flow->flowrun). 
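
As a rough illustration of the flattening idea above, the following Python sketch joins a context tuple into a single UID and builds a flow response that carries the UIDs of its flow runs. Everything in it is an assumption made for illustration: the function names, the JSON field names, and in particular the choice to percent-encode underscores inside ids (the question of how to encode them is still open in this discussion).

```python
from urllib.parse import quote, unquote

SEPARATOR = "_"  # separator taken from the /flow/clusterId_userId_myWorkflow example


def encode_uid(*parts: str) -> str:
    """Flatten a tuple of ids into one UID string (hypothetical scheme)."""
    # quote() leaves "_" untouched, so escape it explicitly to keep the
    # separator unambiguous; a literal "%" in an id becomes "%25".
    return SEPARATOR.join(
        quote(part, safe="").replace("_", "%5F") for part in parts
    )


def decode_uid(uid: str) -> tuple:
    """Recover the original tuple from a UID produced by encode_uid()."""
    return tuple(unquote(part) for part in uid.split(SEPARATOR))


def flow_response(cluster: str, user: str, flow_name: str, run_ids) -> dict:
    """Build a flow entity that lists the UIDs of its flow runs."""
    return {
        "uid": encode_uid(cluster, user, flow_name),
        "flowName": flow_name,
        # Child UIDs are returned directly so the client can issue the
        # next request without knowing how UIDs are formed.
        "flowRuns": [
            {"uid": encode_uid(cluster, user, flow_name, str(run_id))}
            for run_id in run_ids
        ],
    }
```

A client would still need to URL-encode such a UID when placing it in a request path, since the escaped form contains a "%" character.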

The remaining part of the problem is putting new objects through this 
interface, where the user only knows the tuple but not the UID. We do not 
want to expose the logic that forms the "UID" to front end users. At the 
least, we need to keep control of this logic on the server side so that we 
can easily change it in the future. Therefore, we may want to provide another 
endpoint (like "uid") which translates parameters into a UID, so that front 
end users can post to the right place. This way the UID is a pure front end 
concept. Everything in our backend can still use the existing context object 
model. We will not expose this concept to the backend storage. 
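
One way to picture the proposed "uid" endpoint is as a server-side function that takes an entity type plus the context parameters and returns the UID. The Python sketch below is only that: the field table, the parameter names, and the encoding are all hypothetical, chosen to show how the UID-forming logic could stay in one place on the server.

```python
from urllib.parse import quote

# Hypothetical table: which context fields identify each entity type, in order.
UID_FIELDS = {
    "cluster": ("cluster",),
    "user": ("cluster", "user"),
    "flow": ("cluster", "user", "flow_name"),
    "flowrun": ("cluster", "user", "flow_name", "run_id"),
}


def resolve_uid(entity_type: str, params: dict) -> str:
    """Translate context parameters into a UID, as a "uid" endpoint might."""
    fields = UID_FIELDS.get(entity_type)
    if fields is None:
        raise ValueError(f"unknown entity type: {entity_type}")
    missing = [name for name in fields if name not in params]
    if missing:
        raise ValueError(f"missing parameters: {', '.join(missing)}")
    # The encoding is an assumption; clients never depend on it because
    # they only ever receive finished UIDs from the server.
    return "_".join(
        quote(str(params[name]), safe="").replace("_", "%5F")
        for name in fields
    )
```

Because callers only pass the tuple and receive an opaque string, the encoding can change later without touching front end code.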

So the action items for now seem to be:
1. Define the concrete endpoints for this model: we need clusters, users, 
flows, flowruns, applications, and entities in our model. Each of them should 
become a separate endpoint, and each takes a UID as its identifier. Right now 
we can only support GET for the web UI, but in the future we can PUT data. 
2. Batch query APIs for apps, as [~leftnoteasy] proposed. 
3. I just noticed we're using the term "flowId" in our codebase, which is 
actually not an identifier of a flow. I'll open a JIRA to change it to flowName 
to avoid confusion with the UIDs of flows. Although this is not confusing us in 
our codebase, I suspect this may confuse our API users. 

Any comments, folks? [~leftnoteasy], [~vinodkv], anything I'm missing from 
our discussion? Thanks! 

> Change the ATSv2 reader side REST interface to conform to current REST APIs' 
> in YARN
> ------------------------------------------------------------------------------------
>                 Key: YARN-4224
>                 URL: https://issues.apache.org/jira/browse/YARN-4224
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: timelineserver
>    Affects Versions: YARN-2928
>            Reporter: Varun Saxena
>            Assignee: Varun Saxena
>         Attachments: YARN-4224-YARN-2928.01.patch

This message was sent by Atlassian JIRA
