In the past several months, the timeline service v.2 team has made tremendous progress. We have a working storage implementation based on HBase, timeline collectors, timeline readers with some of the more important queries and filters, integration with RM and NM, distributed shell and mapreduce, and some basic UI to boot. We're getting real close to a complete end-to-end flow (no pun intended). Kudos to the team (cc'ed here)!

I think it is time to discuss defining a merge to trunk of an alpha-quality release as our first milestone so that a wider audience has a chance to try it out. This doesn't replace the timeline service (ATS) v.1 yet, but it would be a great chance to get feedback. I think the theme is essentially a basic but complete end-to-end flow that includes the write path and the read path and some UI.

These are the key major things we may want to complete before we consider merging the first milestone:
- application aggregation (YARN-3816 <https://issues.apache.org/jira/browse/YARN-3816>)
- flow run compaction work (YARN-4062 <https://issues.apache.org/jira/browse/YARN-4062>)
- finalize the metrics storage (YARN-4053 <https://issues.apache.org/jira/browse/YARN-4053>)
- improve queries and filters (YARN-3863 <https://issues.apache.org/jira/browse/YARN-3863>)
- UI POC based on the new YARN UI framework (YARN-4097 <https://issues.apache.org/jira/browse/YARN-4097>, YARN-4239 <https://issues.apache.org/jira/browse/YARN-4239>?)

In addition to these, we would like to close a few more JIRAs that we're currently working on. Also, in terms of the app integration, we can debate whether we stick to the distributed shell for now or spend some more effort to round out the mapreduce support.

I also think the following major things are probably out of scope for this first drop:
- time-based (offline) user/queue aggregation based on Phoenix (YARN-3817 <https://issues.apache.org/jira/browse/YARN-3817>)
- fault-tolerant storage (YARN-4061 <https://issues.apache.org/jira/browse/YARN-4061>)
- timeline collector as a separate daemon (YARN-3033 <https://issues.apache.org/jira/browse/YARN-3033>)
- timeline collector containerization
- compatibility with v.1 (YARN-3196 <https://issues.apache.org/jira/browse/YARN-3196>, YARN-3865 <https://issues.apache.org/jira/browse/YARN-3865>)
- support for off-cluster timeline clients (YARN-3981 <https://issues.apache.org/jira/browse/YARN-3981>)

We should discuss whether we agree on the theme of the first milestone (mentioned above). Given that, then, we should discuss what makes it and what doesn't (basically the above 2 lists). We also should discuss the rough time frame to complete this.

This email is to open the discussion. Your thoughts are welcome.

Thanks!
Sangjin
