Sangjin Lee commented on YARN-2928:

Thanks [~zjshen] for putting it together! It looks good mostly. Some high level 

(1) Are "relates to" and "is related to" meant to capture the parent-child 

(Flow) run and application definitely have a parent-child relationship.

The relationship between the flow and the flow run is less clear. One scenario that is 
definitely worth considering is a flow of flows, and that brings some 
complications to this.

Suppose you have an oozie flow that starts a pig script which in turn spawns 
multiple MR jobs. If the flow is an entity and the parent of the flow run, 
modeling this situation becomes more challenging. One idea might be

oozie flow -> oozie flow run -> pig flow -> pig flow run -> MR job

However, the oozie flow run is not really the parent of the pig flow. Rather, 
the oozie flow run is the parent of the pig flow run.

Another idea is to have the flow not as a separate entity but as metadata of 
the flow run entities. And that's actually what the design doc indicates (see 
sections 3.1.1 and 3.1.2).
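
To make the second idea concrete, here is a rough sketch of the oozie/pig/MR example with only flow runs and applications as entities, and the flow carried as metadata on each flow run. The class name, the type labels ("FLOW_RUN", "APPLICATION"), the "flow" metadata key and all ids are made up for illustration; none of them come from the proposal or the design doc.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass(eq=False)
class Entity:
    """Hypothetical timeline entity, not an actual ATS class."""
    entity_id: str
    entity_type: str  # illustrative labels: "FLOW_RUN" or "APPLICATION"
    metadata: Dict[str, str] = field(default_factory=dict)
    parent: Optional["Entity"] = None
    children: List["Entity"] = field(default_factory=list)

    def add_child(self, child: "Entity") -> "Entity":
        """Attach child under this entity and return the child."""
        child.parent = self
        self.children.append(child)
        return child


def ancestry(entity: Entity) -> List[str]:
    """Return the entity ids from the root down to the given entity."""
    chain = []
    node = entity
    while node is not None:
        chain.append(node.entity_id)
        node = node.parent
    return list(reversed(chain))


# The flow is only metadata on the flow run; it is not an entity itself.
oozie_run = Entity("oozie_run_1", "FLOW_RUN", {"flow": "oozie_daily"})
pig_run = oozie_run.add_child(Entity("pig_run_1", "FLOW_RUN", {"flow": "pig_etl"}))
mr_job = pig_run.add_child(Entity("mr_job_1", "APPLICATION"))

print(" -> ".join(ancestry(mr_job)))
```

With this shape the parent of the pig flow run is the oozie flow run, and no flow run ever has to be the parent of a flow.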

Now one issue with not having the flow as an entity is that it might complicate 
the aggregation scenario. More on that later...

(3) Could we stick with the same terminology as in the design doc? Those are 
"flow" and "flow run". Thoughts? Better suggestions?

The part about the metrics would need to be further expanded with the metrics 
API JIRA, but I definitely see at least two types of metrics: one that requires 
a time series and another that doesn't. The former may be something like CPU, 
and the latter would be something like HDFS bytes written for example.

For the latter type, the only value that matters for a given metric is the 
latest value. Depending on the type, the storage implementation could be 
very different.
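
As a rough illustration of why the storage differs, here is a minimal sketch of the two kinds of metrics. The class names and the metric names are hypothetical and are not taken from any existing metrics API; the only point is that one type keeps a single cell while the other keeps every point.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class LatestValueMetric:
    """Hypothetical metric where only the most recent value matters."""
    name: str
    timestamp: Optional[int] = None
    value: Optional[float] = None

    def update(self, timestamp: int, value: float) -> None:
        # Overwrite in place; ignore updates older than what is stored.
        if self.timestamp is None or timestamp >= self.timestamp:
            self.timestamp = timestamp
            self.value = value


@dataclass
class TimeSeriesMetric:
    """Hypothetical metric that keeps every (timestamp, value) point."""
    name: str
    points: Dict[int, float] = field(default_factory=dict)

    def update(self, timestamp: int, value: float) -> None:
        self.points[timestamp] = value

    def series(self) -> List[Tuple[int, float]]:
        return sorted(self.points.items())


hdfs = LatestValueMetric("hdfs_bytes_written")
cpu = TimeSeriesMetric("cpu")
for ts, value in [(1, 10), (2, 25), (3, 40)]:
    hdfs.update(ts, value)
    cpu.update(ts, value)

print(hdfs.value)        # a single stored value
print(len(cpu.points))   # one stored point per update
```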

I think we need to come up with a well-defined set of metric types that cover 
most useful cases. Initially we said we were going to look at the existing 
hadoop metrics types, but we might need to come up with our own here.

The parent-child relationship (and therefore the necessity of making things 
entities) is tightly related to *aggregation* (rolling up the values from 
children to parent). The idea was that for parent-child entities aggregation 
would be done generically as part of creating/updating those entities (what we 
called "primary aggregation" in some discussion).
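
A minimal sketch of what such generic roll-up could look like, assuming each entity knows its parent and that the metric is of the latest-value kind. The Node class, its fields and the metric name are hypothetical; this is one possible way to do it, not the proposed implementation.

```python
from collections import defaultdict


class Node:
    """Hypothetical entity that rolls metric updates up to its ancestors."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.own = {}                      # metric -> latest value reported by this node
        self.rolled_up = defaultdict(int)  # metric -> total for this node and its descendants

    def report(self, metric, value):
        # Apply only the change since the last report so that repeated
        # updates of the same metric are not double counted upstream.
        delta = value - self.own.get(metric, 0)
        self.own[metric] = value
        node = self
        while node is not None:
            node.rolled_up[metric] += delta
            node = node.parent


flow_run = Node("flow_run_1")
app1 = Node("app_1", parent=flow_run)
app2 = Node("app_2", parent=flow_run)

app1.report("hdfs_bytes_written", 100)
app2.report("hdfs_bytes_written", 50)
app1.report("hdfs_bytes_written", 120)  # app_1 moves from 100 to 120

print(flow_run.rolled_up["hdfs_bytes_written"])
```

The aggregation happens as a side effect of updating the child, which is the sense in which it is tied to the parent-child relationship.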

If cluster or user is not an entity, then there is no parent-child 
relationship, and aggregation from flows to user or cluster would have to be 
done explicitly outside the context of the parent-child relationship.

Of course that is doable; we could just do it as specific aggregation. Maybe 
that's what we need to do (and the queue-level aggregation which Robert 
mentioned could be treated in the same manner).
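
For contrast, an explicit aggregation of this kind could be as simple as grouping flow run totals by an attribute rather than walking a parent chain. The record layout, the field names and the numbers below are invented for illustration only.

```python
from collections import defaultdict


def aggregate_by(records, key, metric):
    """Sum a metric over records grouped by the given attribute."""
    totals = defaultdict(int)
    for record in records:
        totals[record[key]] += record[metric]
    return dict(totals)


# Hypothetical per-flow-run totals; user, queue and cluster are plain
# attributes here, not parent entities.
flow_runs = [
    {"user": "alice", "queue": "etl", "cluster": "c1", "hdfs_bytes_written": 100},
    {"user": "alice", "queue": "adhoc", "cluster": "c1", "hdfs_bytes_written": 30},
    {"user": "bob", "queue": "etl", "cluster": "c1", "hdfs_bytes_written": 50},
]

print(aggregate_by(flow_runs, "user", "hdfs_bytes_written"))
print(aggregate_by(flow_runs, "queue", "hdfs_bytes_written"))
print(aggregate_by(flow_runs, "cluster", "hdfs_bytes_written"))
```

The same function covers user, queue and cluster, which is why the queue-level case fits the same treatment.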

Either way, I think we should mention how the run/flow/user/cluster/queue 
aggregation would be done.

> Application Timeline Server (ATS) next gen: phase 1
> ---------------------------------------------------
>                 Key: YARN-2928
>                 URL: https://issues.apache.org/jira/browse/YARN-2928
>             Project: Hadoop YARN
>          Issue Type: New Feature
>          Components: timelineserver
>            Reporter: Sangjin Lee
>            Assignee: Vinod Kumar Vavilapalli
>            Priority: Critical
>         Attachments: ATSv2.rev1.pdf, ATSv2.rev2.pdf, Data model proposal 
> v1.pdf
> We have the application timeline server implemented in yarn per YARN-1530 and 
> YARN-321. Although it is a great feature, we have recognized several critical 
> issues and features that need to be addressed.
> This JIRA proposes the design and implementation changes to address those. 
> This is phase 1 of this effort.

