Re: [DISCUSS] merging YARN-2928 (Timeline Service v.2) to trunk

Sangjin Lee Tue, 21 Jun 2016 15:52:05 -0700

Thanks Karthik and Tsuyoshi. Regarding alternate implementations, I'd like
to get a better sense of what you're thinking of. Are you interested in
strictly a test implementation (e.g. perfectly fine in a single node setup)
or a more substantial implementation (may not scale but needs to work in a
more realistic setup)?


Regards,
Sangjin

On Tue, Jun 21, 2016 at 2:51 PM, J. Rottinghuis <[email protected]>
wrote:

> Thanks Karthik and Tsuyoshi for bringing up good points.
>
> I've opened https://issues.apache.org/jira/browse/YARN-5281 to track this
> discussion and capture all the merits and challenges in one single place.
>
> Thanks,
>
> Joep
>
> On Tue, Jun 21, 2016 at 8:21 AM, Tsuyoshi Ozawa <[email protected]> wrote:
>
> > Thanks Sangjin for starting the discussion.
> >
> > >> *First*, if the merge vote is approved, to which branch should this be
> > merged and what would be the release version?
> >
> > As you mentioned, I think it's reasonable for us to target trunk and
> > 3.0.0-alpha.
> >
> > >> Slightly unrelated to the merge, do we plan to support any other
> simpler
> > backend for users to try out, in addition to HBase? LevelDB?
> > > We can however, potentially change the Local File System based
> > implementation to a HDFS based implementation and have it as an alternate
> > for non-production use,
> >
> > At Apache Big Data 2016 NA, some users also mentioned that they need an HDFS
> > implementation. It's currently pending, but Varun and I tried to work on
> > supporting an HDFS backend (YARN-3874). As Karthik mentioned, it's useful for
> > early users to try the v2.0 APIs even though it doesn't scale. IMHO, it's useful
> > for small clusters (e.g. fewer than 10 machines). After merging the current
> > implementation into trunk, I'm interested in resuming the YARN-3874 work (maybe
> > Varun is also interested).
> >
> > Regards,
> > - Tsuyoshi
> >
> > On Tue, Jun 21, 2016 at 5:07 PM, Varun saxena <[email protected]>
> > wrote:
> > > Thanks Karthik for sharing your views.
> > >
> > > With regards to merging, it would help to have clear documentation on
> > > how to set up and use ATS.
> > > --> We do have documentation on this. You and others who are interested
> > > can check out YARN-5174, which is the latest documentation-related JIRA
> > > for ATSv2.
> > >
> > > Slightly unrelated to the merge, do we plan to support any other
> simpler
> > backend for users to try out, in addition to HBase? LevelDB?
> > > --> We do have a File System based implementation but it is strictly for
> > > test purposes (as we write data into a local file). It also does not support
> > > all the features of Timeline Service v.2.
> > > Regarding LevelDB, Timeline Service v.2 has distributed writers and
> > > LevelDB writes data (log files or SSTable files) to the local file system. This
> > > means there will be no easy way to have a LevelDB based implementation
> > > because we would not know where to read the data from, especially while
> > > fetching flow level information.
> > > We can, however, potentially change the Local File System based
> > > implementation to an HDFS based implementation and have it as an alternative
> > > for non-production use, if there is a potential need for it, based on
> > > community feedback. This, however, would have to be further discussed with
> > > the team.
> > >
> > > Regards,
> > > Varun Saxena.
> > >
> > > -----Original Message-----
> > > From: Karthik Kambatla [mailto:[email protected]]
> > > Sent: 21 June 2016 10:29
> > > To: Sangjin Lee
> > > Cc: [email protected]
> > > Subject: Re: [DISCUSS] merging YARN-2928 (Timeline Service v.2) to
> trunk
> > >
> > > Firstly, thanks Sangjin and others for driving this major feature.
> > >
> > > Merging to trunk and including in 3.0.0-alpha1 seems reasonable, as it
> > will give early access to downstream users.
> > >
> > > With regards to merging, it would help to have clear documentation on
> > > how to set up and use ATS.
> > >
> > > Slightly unrelated to the merge, do we plan to support any other
> simpler
> > backend for users to try out, in addition to HBase? LevelDB? I understand
> > this wouldn't scale, but would it help with initial adoption and feedback
> > from early users?
> > >
> > >
> > >
> > >
> > >
> > > On Mon, Jun 20, 2016 at 10:26 AM, Sangjin Lee <[email protected]>
> wrote:
> > >
> > >> Hi all,
> > >>
> > >> I’d like to open a discussion on merging the Timeline Service v.2
> > >> feature to trunk (YARN-2928 and MAPREDUCE-6331) [1][2]. We have been
> > >> developing the feature in a feature branch (YARN-2928 [3]) for a
> > >> while, and we are reasonably confident that the state of the feature
> > >> meets the criteria to be merged onto trunk. We'd love folks to get
> > >> their hands on it and provide valuable feedback so that we can make it
> > >> production-ready.
> > >>
> > >> In a nutshell, Timeline Service v.2 delivers significant scalability
> > >> and usability improvements based on a new architecture. You can browse
> > >> the requirements/design doc, the storage schema doc, the new
> > >> entity/data model, the YARN documentation, and also discussions on
> > >> subsequent milestones on
> > >> YARN-2928 [1].
> > >>
> > >> What we would like to merge to trunk is termed "alpha 1" (milestone
> > >> 1). The feature has a complete end-to-end read/write flow, and you
> > >> should be able to start setting it up and testing it. At a high level,
> > >> the following are the key features that have been implemented:
> > >>
> > >> - distributed writers (collectors) as NM aux services
> > >> - HBase storage
> > >> - new entity model that includes flows
> > >> - setting the flow context via YARN app tags
> > >> - real time metrics aggregation to the application level and the flow
> > >> level
> > >> - rich REST API that supports filters, complex conditionals, limits,
> > >> content selection, etc.
> > >> - YARN generic events and system metrics
> > >> - integration with Distributed Shell and MapReduce
> > >>
> > >> There are a total of 139 subtasks that were completed as part of this
> > >> effort.
> > >>
> > >> We paid close attention to ensure that Timeline Service v.2 does not
> > >> impact existing functionality when it is disabled (the default).
> > >>
> > >> I'd like to call out a couple of things to discuss in particular.
> > >>
> > >> *First*, if the merge vote is approved, to which branch should this be
> > >> merged and what would be the release version? My preference is that
> > >> *it would be merged to branch "trunk" and be part of 3.0.0-alpha1* if
> > >> approved.
> > >> Since the 3.0.0-alpha1 is in active progress, I wanted to get your
> > >> thoughts on this.
> > >>
> > >> *Second*, Timeline Service v.2 introduces a dependency on HBase from
> > >> YARN. It is not a cyclical dependency (as HBase does not really depend on
> > >> YARN).
> > >> However, the version of Hadoop that HBase currently supports lags
> > >> behind the Hadoop version that Timeline Service is based on, so there
> > >> is a potential for subtle dependency conflicts. We made some efforts
> > >> to isolate the issue (see [4] and [5]). The HBase folks have also been
> > >> responsive in keeping up with the trunk as much as they can.
> > >> Nonetheless, this is something to keep in mind.
> > >>
> > >> I would love to get your thoughts on these and more before we open a
> > >> real voting thread. Thanks!
> > >>
> > >> Regards,
> > >> Sangjin
> > >>
> > >> [1] YARN-2928: https://issues.apache.org/jira/browse/YARN-2928
> > >> [2] MAPREDUCE-6331:
> > >> https://issues.apache.org/jira/browse/MAPREDUCE-6331
> > >> [3] YARN-2928 commits:
> > >> https://github.com/apache/hadoop/commits/YARN-2928
> > >> [4] YARN-5045: https://issues.apache.org/jira/browse/YARN-5045
> > >> [5] YARN-5071: https://issues.apache.org/jira/browse/YARN-5071
> > >>
> > >
> > >
> >
>
