Hi all,

I’d like to open a discussion on merging the Timeline Service v.2 feature
to trunk (YARN-2928 and MAPREDUCE-6331) [1][2]. We have been developing the
feature in a feature branch (YARN-2928 [3]) for a while, and we are
reasonably confident that the feature is in a state that meets the criteria
for merging onto trunk. We'd love folks to get their hands on it and provide
feedback so that we can make it production-ready.

In a nutshell, Timeline Service v.2 delivers significant scalability and
usability improvements based on a new architecture. You can browse the
requirements/design doc, the storage schema doc, the new entity/data model,
the YARN documentation, as well as discussions of subsequent milestones, on
YARN-2928 [1].

What we would like to merge to trunk is termed "alpha 1" (milestone 1). The
feature has a complete end-to-end read/write flow, and you should be able
to start setting it up and testing it. At a high level, the following are
the key features that have been implemented:

- distributed writers (collectors) as NM aux services
- HBase storage
- new entity model that includes flows
- setting the flow context via YARN app tags
- real-time metrics aggregation to the application level and the flow level
- rich REST API that supports filters, complex conditionals, limits,
content selection, etc.
- YARN generic events and system metrics
- integration with Distributed Shell and MapReduce

A total of 139 subtasks were completed as part of this effort.

We paid close attention to ensuring that Timeline Service v.2 does not
impact existing functionality when it is disabled (the default).

In particular, I'd like to call out two things for discussion.

*First*, if the merge vote is approved, to which branch should this be
merged, and what would the release version be? My preference is that *it
be merged to branch "trunk" and be part of 3.0.0-alpha1*. Since
3.0.0-alpha1 is in active progress, I wanted to get your thoughts on
this.

*Second*, Timeline Service v.2 introduces a dependency on HBase from YARN.
It is not a cyclical dependency (as HBase does not really depend on YARN).
However, the version of Hadoop that HBase currently supports lags behind
the Hadoop version that Timeline Service is based on, so there is a
potential for subtle dependency conflicts. We have made efforts to isolate
the issue (see [4] and [5]). The HBase folks have also been responsive in
keeping up with trunk as much as they can. Nonetheless, this is
something to keep in mind.

I would love to get your thoughts on these and more before we open a real
voting thread. Thanks!

Regards,
Sangjin

[1] YARN-2928: https://issues.apache.org/jira/browse/YARN-2928
[2] MAPREDUCE-6331: https://issues.apache.org/jira/browse/MAPREDUCE-6331
[3] YARN-2928 commits: https://github.com/apache/hadoop/commits/YARN-2928
[4] YARN-5045: https://issues.apache.org/jira/browse/YARN-5045
[5] YARN-5071: https://issues.apache.org/jira/browse/YARN-5071
