Thanks Karthik and Tsuyoshi. Regarding alternate implementations, I'd like to get a better sense of what you're thinking of. Are you interested in strictly a test implementation (e.g. perfectly fine in a single node setup) or a more substantial implementation (may not scale but needs to work in a more realistic setup)?
Regards,
Sangjin

On Tue, Jun 21, 2016 at 2:51 PM, J. Rottinghuis <[email protected]> wrote:

> Thanks Karthik and Tsuyoshi for bringing up good points.
>
> I've opened https://issues.apache.org/jira/browse/YARN-5281 to track this
> discussion and capture all the merits and challenges in one single place.
>
> Thanks,
>
> Joep
>
> On Tue, Jun 21, 2016 at 8:21 AM, Tsuyoshi Ozawa <[email protected]> wrote:
>
> > Thanks Sangjin for starting the discussion.
> >
> > >> *First*, if the merge vote is approved, to which branch should this be
> > >> merged and what would be the release version?
> >
> > As you mentioned, I think it's reasonable for us to target trunk and
> > 3.0.0-alpha.
> >
> > >> Slightly unrelated to the merge, do we plan to support any other simpler
> > >> backend for users to try out, in addition to HBase? LevelDB?
> > > We can however, potentially change the Local File System based
> > > implementation to a HDFS based implementation and have it as an alternate
> > > for non-production use,
> >
> > In Apache Big Data 2016 NA, some users also mentioned that they need an
> > HDFS implementation. Currently it's pending, but Varun and I tried to work
> > to support an HDFS backend (YARN-3874). As Karthik mentioned, it's useful
> > for early users to try v2.0 APIs though it doesn't scale. IMHO, it's useful
> > for small clusters (e.g. smaller than 10 machines). After merging the
> > current implementation into trunk, I'm interested in resuming YARN-3874
> > work (maybe Varun is also interested).
> >
> > Regards,
> > - Tsuyoshi
> >
> > On Tue, Jun 21, 2016 at 5:07 PM, Varun saxena <[email protected]>
> > wrote:
> > > Thanks Karthik for sharing your views.
> > >
> > > With regards to merging, it would help to have clear documentation on how
> > > to setup and use ATS.
> > > --> We do have documentation on this. You and others who are interested
> > > can check out YARN-5174 which is the latest documentation related JIRA for
> > > ATSv2.
> > >
> > > Slightly unrelated to the merge, do we plan to support any other simpler
> > > backend for users to try out, in addition to HBase? LevelDB?
> > > --> We do have a File System based implementation but it is strictly for
> > > test purposes (as we write data into a local file). It does not support all
> > > the features of Timeline Service v.2 as well.
> > > Regarding LevelDB, Timeline Service v.2 has distributed writers and Level
> > > DB writes data (log files or SSTable files) to local file system. This
> > > means there will be no easy way to have a LevelDB based implementation
> > > because we would not know where to read the data from, especially while
> > > fetching flow level information.
> > > We can however, potentially change the Local File System based
> > > implementation to a HDFS based implementation and have it as an alternate
> > > for non-production use, if there is a potential need for it, based on
> > > community feedback. This however, would have to be further discussed with
> > > the team.
> > >
> > > Regards,
> > > Varun Saxena.
> > >
> > > -----Original Message-----
> > > From: Karthik Kambatla [mailto:[email protected]]
> > > Sent: 21 June 2016 10:29
> > > To: Sangjin Lee
> > > Cc: [email protected]
> > > Subject: Re: [DISCUSS] merging YARN-2928 (Timeline Service v.2) to trunk
> > >
> > > Firstly, thanks Sangjin and others for driving this major feature.
> > >
> > > Merging to trunk and including in 3.0.0-alpha1 seems reasonable, as it
> > > will give early access to downstream users.
> > >
> > > With regards to merging, it would help to have clear documentation on how
> > > to setup and use ATS.
> > >
> > > Slightly unrelated to the merge, do we plan to support any other simpler
> > > backend for users to try out, in addition to HBase? LevelDB? I understand
> > > this wouldn't scale, but would it help with initial adoption and feedback
> > > from early users?
> > >
> > > On Mon, Jun 20, 2016 at 10:26 AM, Sangjin Lee <[email protected]> wrote:
> > >
> > >> Hi all,
> > >>
> > >> I’d like to open a discussion on merging the Timeline Service v.2
> > >> feature to trunk (YARN-2928 and MAPREDUCE-6331) [1][2]. We have been
> > >> developing the feature in a feature branch (YARN-2928 [3]) for a
> > >> while, and we are reasonably confident that the state of the feature
> > >> meets the criteria to be merged onto trunk and we'd love folks to get
> > >> their hands on it and provide valuable feedback so that we can make it
> > >> production-ready.
> > >>
> > >> In a nutshell, Timeline Service v.2 delivers significant scalability
> > >> and usability improvements based on a new architecture. You can browse
> > >> the requirements/design doc, the storage schema doc, the new
> > >> entity/data model, the YARN documentation, and also discussions on
> > >> subsequent milestones on
> > >> YARN-2928 [1].
> > >>
> > >> What we would like to merge to trunk is termed "alpha 1" (milestone
> > >> 1). The feature has a complete end-to-end read/write flow, and you
> > >> should be able to start setting it up and testing it. At a high level,
> > >> the following are the key features that have been implemented:
> > >>
> > >> - distributed writers (collectors) as NM aux services
> > >> - HBase storage
> > >> - new entity model that includes flows
> > >> - setting the flow context via YARN app tags
> > >> - real time metrics aggregation to the application level and the flow
> > >>   level
> > >> - rich REST API that supports filters, complex conditionals, limits,
> > >>   content selection, etc.
> > >> - YARN generic events and system metrics
> > >> - integration with Distributed Shell and MapReduce
> > >>
> > >> There are a total of 139 subtasks that were completed as part of this
> > >> effort.
> > >>
> > >> We paid close attention to ensure that Timeline Service v.2 does not
> > >> impact existing functionality when disabled (by default).
> > >>
> > >> I'd like to call out a couple of things to discuss in particular.
> > >>
> > >> *First*, if the merge vote is approved, to which branch should this be
> > >> merged and what would be the release version? My preference is that
> > >> *it would be merged to branch "trunk" and be part of 3.0.0-alpha1* if
> > >> approved.
> > >> Since 3.0.0-alpha1 is in active progress, I wanted to get your
> > >> thoughts on this.
> > >>
> > >> *Second*, Timeline Service v.2 introduces a dependency on HBase from
> > >> YARN.
> > >> It is not a cyclical dependency (as HBase does not really depend on
> > >> YARN).
> > >> However, the version of Hadoop that HBase currently supports lags
> > >> behind the Hadoop version that Timeline Service is based on, so there
> > >> is a potential for subtle dependency conflicts. We made some efforts
> > >> to isolate the issue (see [4] and [5]). The HBase folks have also been
> > >> responsive in keeping up with the trunk as much as they can.
> > >> Nonetheless, this is something to keep in mind.
> > >>
> > >> I would love to get your thoughts on these and more before we open a
> > >> real voting thread. Thanks!
> > >>
> > >> Regards,
> > >> Sangjin
> > >>
> > >> [1] YARN-2928: https://issues.apache.org/jira/browse/YARN-2928
> > >> [2] MAPREDUCE-6331: https://issues.apache.org/jira/browse/MAPREDUCE-6331
> > >> [3] YARN-2928 commits: https://github.com/apache/hadoop/commits/YARN-2928
> > >> [4] YARN-5045: https://issues.apache.org/jira/browse/YARN-5045
> > >> [5] YARN-5071: https://issues.apache.org/jira/browse/YARN-5071
