Re: [DISCUSS] merging YARN-2928 (Timeline Service v.2) to trunk

J. Rottinghuis Tue, 21 Jun 2016 14:52:07 -0700

Thanks Karthik and Tsuyoshi for bringing up good points.

I've opened https://issues.apache.org/jira/browse/YARN-5281 to track this
discussion and capture all the merits and challenges in one single place.


Thanks,

Joep

On Tue, Jun 21, 2016 at 8:21 AM, Tsuyoshi Ozawa <[email protected]> wrote:

> Thanks Sangjin for starting the discussion.
>
> >> *First*, if the merge vote is approved, to which branch should this be
> >> merged and what would be the release version?
>
> As you mentioned, I think it's reasonable for us to target trunk and
> 3.0.0-alpha.
>
> >> Slightly unrelated to the merge, do we plan to support any other simpler
> >> backend for users to try out, in addition to HBase? LevelDB?
> > We can, however, potentially change the Local File System based
> > implementation to an HDFS based implementation and have it as an alternative
> > for non-production use,
>
> At Apache Big Data 2016 NA, some users also mentioned that they need an HDFS
> implementation. It is currently pending, but Varun and I have tried to work on
> supporting an HDFS backend (YARN-3874). As Karthik mentioned, it's useful for
> early users to try the v2.0 APIs even though it doesn't scale. IMHO, it's
> useful for small clusters (e.g. fewer than 10 machines). After merging the
> current implementation into trunk, I'm interested in resuming the YARN-3874
> work (maybe Varun is also interested).
>
> Regards,
> - Tsuyoshi
>
> On Tue, Jun 21, 2016 at 5:07 PM, Varun Saxena <[email protected]>
> wrote:
> > Thanks Karthik for sharing your views.
> >
> > With regards to merging, it would help to have clear documentation on how
> > to set up and use ATS.
> > --> We do have documentation on this. You and others who are interested
> > can check out YARN-5174, which is the latest documentation-related JIRA for
> > ATSv2.
> >
> > Slightly unrelated to the merge, do we plan to support any other simpler
> > backend for users to try out, in addition to HBase? LevelDB?
> > --> We do have a File System based implementation, but it is strictly for
> > test purposes (as we write data into a local file). It also does not
> > support all the features of Timeline Service v.2.
> > Regarding LevelDB, Timeline Service v.2 has distributed writers, and
> > LevelDB writes data (log files or SSTable files) to the local file system.
> > This means there will be no easy way to have a LevelDB based implementation
> > because we would not know where to read the data from, especially while
> > fetching flow level information.
> > We can, however, potentially change the Local File System based
> > implementation to an HDFS based implementation and have it as an alternative
> > for non-production use, if there is a potential need for it, based on
> > community feedback. This, however, would have to be further discussed with
> > the team.
> >
> > Regards,
> > Varun Saxena.
> >
> > -----Original Message-----
> > From: Karthik Kambatla [mailto:[email protected]]
> > Sent: 21 June 2016 10:29
> > To: Sangjin Lee
> > Cc: [email protected]
> > Subject: Re: [DISCUSS] merging YARN-2928 (Timeline Service v.2) to trunk
> >
> > Firstly, thanks Sangjin and others for driving this major feature.
> >
> > Merging to trunk and including in 3.0.0-alpha1 seems reasonable, as it
> > will give early access to downstream users.
> >
> > With regards to merging, it would help to have clear documentation on how
> > to set up and use ATS.
> >
> > Slightly unrelated to the merge, do we plan to support any other simpler
> > backend for users to try out, in addition to HBase? LevelDB? I understand
> > this wouldn't scale, but would it help with initial adoption and feedback
> > from early users?
> >
> > On Mon, Jun 20, 2016 at 10:26 AM, Sangjin Lee <[email protected]> wrote:
> >
> >> Hi all,
> >>
> >> I’d like to open a discussion on merging the Timeline Service v.2
> >> feature to trunk (YARN-2928 and MAPREDUCE-6331) [1][2]. We have been
> >> developing the feature in a feature branch (YARN-2928 [3]) for a
> >> while, and we are reasonably confident that the state of the feature
> >> meets the criteria to be merged onto trunk and we'd love folks to get
> >> their hands on it and provide valuable feedback so that we can make it
> >> production-ready.
> >>
> >> In a nutshell, Timeline Service v.2 delivers significant scalability
> >> and usability improvements based on a new architecture. You can browse
> >> the requirements/design doc, the storage schema doc, the new
> >> entity/data model, the YARN documentation, and also discussions on
> >> subsequent milestones on
> >> YARN-2928 [1].
> >>
> >> What we would like to merge to trunk is termed "alpha 1" (milestone
> >> 1). The feature has a complete end-to-end read/write flow, and you
> >> should be able to start setting it up and testing it. At a high level,
> >> the following are the key features that have been implemented:
> >>
> >> - distributed writers (collectors) as NM aux services
> >> - HBase storage
> >> - new entity model that includes flows
> >> - setting the flow context via YARN app tags
> >> - real time metrics aggregation to the application level and the flow
> >> level
> >> - rich REST API that supports filters, complex conditionals, limits,
> >> content selection, etc.
> >> - YARN generic events and system metrics
> >> - integration with Distributed Shell and MapReduce
> >>
> >> There are a total of 139 subtasks that were completed as part of this
> >> effort.
> >>
> >> We paid close attention to ensuring that Timeline Service v.2 does not
> >> impact existing functionality when it is disabled (the default).
> >>
> >> I'd like to call out a couple of things to discuss in particular.
> >>
> >> *First*, if the merge vote is approved, to which branch should this be
> >> merged and what would be the release version? My preference is that
> >> *it would be merged to branch "trunk" and be part of 3.0.0-alpha1* if
> >> approved.
> >> Since the 3.0.0-alpha1 is in active progress, I wanted to get your
> >> thoughts on this.
> >>
> >> *Second*, Timeline Service v.2 introduces a dependency on HBase from
> >> YARN. It is not a cyclical dependency (as HBase does not really depend on
> >> YARN).
> >> However, the version of Hadoop that HBase currently supports lags
> >> behind the Hadoop version that Timeline Service is based on, so there
> >> is a potential for subtle dependency conflicts. We made some efforts
> >> to isolate the issue (see [4] and [5]). The HBase folks have also been
> >> responsive in keeping up with the trunk as much as they can.
> >> Nonetheless, this is something to keep in mind.
> >>
> >> I would love to get your thoughts on these and more before we open a
> >> real voting thread. Thanks!
> >>
> >> Regards,
> >> Sangjin
> >>
> >> [1] YARN-2928: https://issues.apache.org/jira/browse/YARN-2928
> >> [2] MAPREDUCE-6331: https://issues.apache.org/jira/browse/MAPREDUCE-6331
> >> [3] YARN-2928 commits: https://github.com/apache/hadoop/commits/YARN-2928
> >> [4] YARN-5045: https://issues.apache.org/jira/browse/YARN-5045
> >> [5] YARN-5071: https://issues.apache.org/jira/browse/YARN-5071
> >>
> >
>
