In the past several months, the timeline service v.2 team has made
tremendous progress. We have a working storage implementation based on
HBase, timeline collectors, timeline readers with some of the more
important queries and filters, integration with RM and NM, distributed
shell and mapreduce, and some basic UI to boot. We're getting really close to
a complete end-to-end flow (no pun intended). Kudos to the team (cc'ed
here)!

I think it is time to define our first milestone: a merge to trunk of an
alpha-quality release, so that a wider audience has a chance to try it out.
This doesn't replace the timeline service (ATS) v.1 yet, but it would be a
great chance to get feedback.

I think the theme is essentially a basic but complete end-to-end flow that
includes the write path, the read path, and some UI. These are the major
items we may want to complete before we consider merging the first
milestone:
- application aggregation (YARN-3816
<https://issues.apache.org/jira/browse/YARN-3816>)
- flow run compaction work (YARN-4062
<https://issues.apache.org/jira/browse/YARN-4062>)
- finalize the metrics storage (YARN-4053
<https://issues.apache.org/jira/browse/YARN-4053>)
- improve queries and filters (YARN-3863
<https://issues.apache.org/jira/browse/YARN-3863>)
- UI POC based on the new YARN UI framework (YARN-4097
<https://issues.apache.org/jira/browse/YARN-4097>, YARN-4239
<https://issues.apache.org/jira/browse/YARN-4239>?)

In addition to these, we would like to close a few more JIRAs that we're
currently working on. Also, in terms of the app integration, we can debate
whether we stick to the distributed shell for now or spend some more effort
to round out the mapreduce support.

I also think the following major things are probably out of scope for this
first drop:
- time-based (offline) user/queue aggregation based on Phoenix (YARN-3817
<https://issues.apache.org/jira/browse/YARN-3817>)
- fault-tolerant storage (YARN-4061
<https://issues.apache.org/jira/browse/YARN-4061>)
- timeline collector as a separate daemon (YARN-3033
<https://issues.apache.org/jira/browse/YARN-3033>)
- timeline collector containerization
- compatibility with v.1 (YARN-3196
<https://issues.apache.org/jira/browse/YARN-3196>, YARN-3865
<https://issues.apache.org/jira/browse/YARN-3865>)
- support for off-cluster timeline clients (YARN-3981
<https://issues.apache.org/jira/browse/YARN-3981>)

We should discuss whether we agree on the theme of the first milestone
(mentioned above). Once we agree on that, we should discuss what makes the
cut and what doesn't (basically the two lists above).

We should also discuss the rough time frame to complete this. This email is
to open the discussion. Your thoughts are welcome. Thanks!

Sangjin
