Hi Junping,

Thanks for the good suggestion.
> However, my concern to release it in 3.0.0-alpha (even as an alpha
> feature) is we haven't provided any security support in ATS v2 yet.
> Enabling this feature without understanding the risk here could be a
> disaster to end users (even in a test cluster).

You're right. Can we document and clarify that this is still "alpha 1" and
that it has no security features yet? I also think ATS 1.5 does support
security, so it is suitable for production use - we should document that
officially as well.
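For anyone who wants to experiment safely, here is a minimal sketch of the
switches involved in turning the feature on. The property and class names
below are assumptions taken from the YARN-2928 branch documentation, so
please verify them against the merged code:

  // Minimal sketch: enabling Timeline Service v.2 on a test cluster.
  // ATSv2 stays off unless explicitly enabled, so an unmodified cluster
  // is unaffected by the merge. Property/class names are assumptions
  // from the branch docs; verify before use.
  import org.apache.hadoop.yarn.conf.YarnConfiguration;

  public class AtsV2EnableSketch {
    public static void main(String[] args) {
      YarnConfiguration conf = new YarnConfiguration();
      conf.setBoolean("yarn.timeline-service.enabled", true);
      // Selects the v.2 code path ("2.0f" in the branch docs).
      conf.set("yarn.timeline-service.version", "2.0f");
      // The per-node collectors (distributed writers) run as an NM
      // auxiliary service:
      conf.set("yarn.nodemanager.aux-services",
          "mapreduce_shuffle,timeline_collector");
      conf.set("yarn.nodemanager.aux-services.timeline_collector.class",
          "org.apache.hadoop.yarn.server.timelineservice.collector"
              + ".PerNodeTimelineCollectorsAuxService");
      System.out.println("ATSv2 enabled: "
          + conf.getBoolean("yarn.timeline-service.enabled", false));
    }
  }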
Thanks,
- Tsuyoshi

On Thu, Jun 23, 2016 at 5:45 PM, 俊平堵 <[email protected]> wrote:
> Big +1 on merging ATS-v2 to trunk. However, my concern to release it in
> 3.0.0-alpha (even as an alpha feature) is we haven't provided any
> security support in ATS v2 yet. Enabling this feature without
> understanding the risk here could be a disaster to end users (even in a
> test cluster).
>
> Kudos to everyone who contributed patches, including Sangjin, Li,
> Vrushali, Naga, Varun, Joep, and Zhijie.
>
> Thanks,
>
> Junping
>
> 2016-06-23 13:32 GMT-07:00 Sangjin Lee <[email protected]>:
>>
>> Thanks folks for the good discussion!
>>
>> I'm going to keep it open for a few more days as I'd love to get
>> feedback from more people. I am thinking of opening a voting thread
>> right after the Hadoop Summit next week if there are no objections.
>> Thanks!
>>
>> Regards,
>> Sangjin
>>
>> On Tue, Jun 21, 2016 at 9:51 PM, Li Lu <[email protected]> wrote:
>>
>> > I agree that having non-HBase impls may attract more potential users
>> > to ATS. Actually, I remember we do have some JIRAs for HDFS
>> > implementations. With regard to aggregation, yes, if there are more
>> > options for storage implementations we really need to find some ways
>> > to describe their implications for different kinds of aggregations.
>> >
>> > +1 for the idea of some group chats! The break after the ATS talk may
>> > be a good candidate?
>> >
>> > Li Lu
>> >
>> > On Jun 21, 2016, at 21:28, Karthik Kambatla <[email protected]> wrote:
>> >
>> > The reasons for my asking about alternate implementations: (1) ease
>> > of trying it out for YARN devs, and of iterating on bug fixes and
>> > improvements; and (2) ease of trying it out for app writers/users to
>> > figure out whether they should use the ATS. Again, personally, I
>> > don't see this as necessary for the merge itself, but more so for
>> > adoption.
>> >
>> > A test implementation would be enough for #1, and would partially
>> > address #2. A more substantial implementation would be nice, but I
>> > guess we need to look at the ROI to decide whether adding that is a
>> > good idea.
>> >
>> > On completeness, I agree. Further, for some backend implementations,
>> > it is possible that a particular aggregation/query is supported but
>> > too expensive to turn on. What are your thoughts on provisions for
>> > the admin to turn off some queries/aggregations?
>> >
>> > Orthogonal: is there interest here in catching up on ATS specifically
>> > on one of the days? Maybe during the breaks or after the sessions?
>> >
>> > On Tue, Jun 21, 2016 at 6:15 PM, Li Lu <[email protected]> wrote:
>> >
>> >> HDFS or other non-HBase implementations are very helpful. We didn't
>> >> focus on those implementations in the first milestone because we
>> >> wanted to have one working version as a starting point. We can
>> >> certainly add more implementations when the feature gets more
>> >> mature.
>> >>
>> >> That said, one of my concerns when building these storage
>> >> implementations is "completeness". We have added a lot of support
>> >> for data aggregation. As of today, part of the aggregation (flow run
>> >> aggregation) may be performed by HBase coprocessors. When
>> >> implementing comparable storage impls, it is worth noting that one
>> >> may want to provide equivalent mechanisms to perform those
>> >> aggregations (to make an implementation "complete enough", or
>> >> "interchangeable" with the existing HBase impl).
>> >>
>> >> Li Lu
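To make the "equivalent mechanisms" point above concrete, here is a rough,
illustrative sketch of the kind of flow-run rollup the HBase coprocessor
performs and that a non-HBase implementation would need to reproduce. This
is not the actual coprocessor code; the names and SUM-only semantics are
simplifications for illustration:

  import java.util.HashMap;
  import java.util.Map;

  // Illustration only: roll application-level metrics up to the
  // flow-run level. The real aggregation supports more than plain sums.
  public class FlowRunRollupSketch {
    static Map<String, Long> aggregate(Iterable<Map<String, Long>> apps) {
      Map<String, Long> flowRunMetrics = new HashMap<>();
      for (Map<String, Long> appMetrics : apps) {
        for (Map.Entry<String, Long> e : appMetrics.entrySet()) {
          // Sum each metric across all applications in the flow run.
          flowRunMetrics.merge(e.getKey(), e.getValue(), Long::sum);
        }
      }
      return flowRunMetrics;
    }
  }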
>> >> > On Jun 21, 2016, at 15:51, Sangjin Lee <[email protected]> wrote:
>> >> >
>> >> > Thanks Karthik and Tsuyoshi. Regarding alternate implementations,
>> >> > I'd like to get a better sense of what you're thinking of. Are you
>> >> > interested in strictly a test implementation (e.g., perfectly fine
>> >> > in a single-node setup) or a more substantial implementation (one
>> >> > that may not scale but needs to work in a more realistic setup)?
>> >> >
>> >> > Regards,
>> >> > Sangjin
>> >> >
>> >> > On Tue, Jun 21, 2016 at 2:51 PM, J. Rottinghuis
>> >> > <[email protected]> wrote:
>> >> >
>> >> >> Thanks Karthik and Tsuyoshi for bringing up good points.
>> >> >>
>> >> >> I've opened https://issues.apache.org/jira/browse/YARN-5281 to
>> >> >> track this discussion and capture all the merits and challenges
>> >> >> in a single place.
>> >> >>
>> >> >> Thanks,
>> >> >>
>> >> >> Joep
>> >> >>
>> >> >> On Tue, Jun 21, 2016 at 8:21 AM, Tsuyoshi Ozawa <[email protected]>
>> >> >> wrote:
>> >> >>
>> >> >>> Thanks Sangjin for starting the discussion.
>> >> >>>
>> >> >>>>> *First*, if the merge vote is approved, to which branch should
>> >> >>>>> this be merged and what would be the release version?
>> >> >>>
>> >> >>> As you mentioned, I think it's reasonable for us to target trunk
>> >> >>> and 3.0.0-alpha.
>> >> >>>
>> >> >>>>> Slightly unrelated to the merge, do we plan to support any
>> >> >>>>> other simpler backend for users to try out, in addition to
>> >> >>>>> HBase? LevelDB?
>> >> >>>> We could, however, potentially change the local-file-system-based
>> >> >>>> implementation into an HDFS-based implementation and offer it
>> >> >>>> as an alternative for non-production use,
>> >> >>>
>> >> >>> At Apache Big Data 2016 NA, some users also mentioned that they
>> >> >>> need an HDFS implementation. It is currently pending, but Varun
>> >> >>> and I have worked on supporting an HDFS backend (YARN-3874). As
>> >> >>> Karthik mentioned, it's useful for early users to try the v.2
>> >> >>> APIs even though it doesn't scale. IMHO, it's useful for small
>> >> >>> clusters (e.g., fewer than 10 machines). After merging the
>> >> >>> current implementation into trunk, I'm interested in resuming
>> >> >>> the YARN-3874 work (maybe Varun is interested as well).
>> >> >>>
>> >> >>> Regards,
>> >> >>> - Tsuyoshi
>> >> >>>
>> >> >>> On Tue, Jun 21, 2016 at 5:07 PM, Varun saxena
>> >> >>> <[email protected]> wrote:
>> >> >>>> Thanks Karthik for sharing your views.
>> >> >>>>
>> >> >>>> With regards to merging, it would help to have clear
>> >> >>>> documentation on how to set up and use ATS.
>> >> >>>> --> We do have documentation on this. You and others who are
>> >> >>>> interested can check out YARN-5174, which is the latest
>> >> >>>> documentation-related JIRA for ATSv2.
>> >> >>>>
>> >> >>>> Slightly unrelated to the merge, do we plan to support any
>> >> >>>> other simpler backend for users to try out, in addition to
>> >> >>>> HBase? LevelDB?
>> >> >>>> --> We do have a File System based implementation, but it is
>> >> >>>> strictly for test purposes (as we write data into a local
>> >> >>>> file), and it does not support all the features of Timeline
>> >> >>>> Service v.2.
>> >> >>>> Regarding LevelDB, Timeline Service v.2 has distributed
>> >> >>>> writers, and LevelDB writes data (log files or SSTable files)
>> >> >>>> to the local file system. This means there is no easy way to
>> >> >>>> build a LevelDB-based implementation, because we would not know
>> >> >>>> where to read the data from, especially while fetching
>> >> >>>> flow-level information.
>> >> >>>> We could, however, potentially change the local-file-system-based
>> >> >>>> implementation into an HDFS-based implementation and offer it
>> >> >>>> as an alternative for non-production use, if community feedback
>> >> >>>> shows a need for it. That, however, would have to be discussed
>> >> >>>> further with the team.
>> >> >>>>
>> >> >>>> Regards,
>> >> >>>> Varun Saxena.
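As an aside, swapping storage implementations of the kind Varun describes
comes down to the pluggable writer/reader classes. A minimal sketch
follows; the property and class names are taken from the YARN-2928 branch
and may change, so treat them as assumptions to verify:

  import org.apache.hadoop.yarn.conf.YarnConfiguration;

  // Sketch: selecting the timeline storage implementation.
  public class StorageSelectionSketch {
    public static void main(String[] args) {
      YarnConfiguration conf = new YarnConfiguration();
      // Default: HBase-backed storage.
      conf.set("yarn.timeline-service.writer.class",
          "org.apache.hadoop.yarn.server.timelineservice.storage"
              + ".HBaseTimelineWriterImpl");
      // Test-only alternative that writes to the local file system; an
      // HDFS-backed writer (YARN-3874) would plug in the same way.
      // conf.set("yarn.timeline-service.writer.class",
      //     "org.apache.hadoop.yarn.server.timelineservice.storage"
      //         + ".FileSystemTimelineWriterImpl");
    }
  }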
>> >> >>>>
>> >> >>>> -----Original Message-----
>> >> >>>> From: Karthik Kambatla [mailto:[email protected]]
>> >> >>>> Sent: 21 June 2016 10:29
>> >> >>>> To: Sangjin Lee
>> >> >>>> Cc: [email protected]
>> >> >>>> Subject: Re: [DISCUSS] merging YARN-2928 (Timeline Service v.2)
>> >> >>>> to trunk
>> >> >>>>
>> >> >>>> Firstly, thanks Sangjin and others for driving this major
>> >> >>>> feature.
>> >> >>>>
>> >> >>>> Merging to trunk and including it in 3.0.0-alpha1 seems
>> >> >>>> reasonable, as it will give early access to downstream users.
>> >> >>>>
>> >> >>>> With regards to merging, it would help to have clear
>> >> >>>> documentation on how to set up and use ATS.
>> >> >>>>
>> >> >>>> Slightly unrelated to the merge, do we plan to support any
>> >> >>>> other simpler backend for users to try out, in addition to
>> >> >>>> HBase? LevelDB? I understand this wouldn't scale, but would it
>> >> >>>> help with initial adoption and feedback from early users?
>> >> >>>>
>> >> >>>> On Mon, Jun 20, 2016 at 10:26 AM, Sangjin Lee <[email protected]>
>> >> >>>> wrote:
>> >> >>>>
>> >> >>>>> Hi all,
>> >> >>>>>
>> >> >>>>> I'd like to open a discussion on merging the Timeline Service
>> >> >>>>> v.2 feature to trunk (YARN-2928 and MAPREDUCE-6331) [1][2]. We
>> >> >>>>> have been developing the feature in a feature branch
>> >> >>>>> (YARN-2928 [3]) for a while, and we are reasonably confident
>> >> >>>>> that the state of the feature meets the criteria to be merged
>> >> >>>>> onto trunk. We'd love folks to get their hands on it and
>> >> >>>>> provide valuable feedback so that we can make it
>> >> >>>>> production-ready.
>> >> >>>>>
>> >> >>>>> In a nutshell, Timeline Service v.2 delivers significant
>> >> >>>>> scalability and usability improvements based on a new
>> >> >>>>> architecture. You can browse the requirements/design doc, the
>> >> >>>>> storage schema doc, the new entity/data model, the YARN
>> >> >>>>> documentation, and also discussions on subsequent milestones
>> >> >>>>> on YARN-2928 [1].
>> >> >>>>>
>> >> >>>>> What we would like to merge to trunk is termed "alpha 1"
>> >> >>>>> (milestone 1). The feature has a complete end-to-end
>> >> >>>>> read/write flow, and you should be able to start setting it up
>> >> >>>>> and testing it. At a high level, the following are the key
>> >> >>>>> features that have been implemented:
>> >> >>>>>
>> >> >>>>> - distributed writers (collectors) as NM aux services
>> >> >>>>> - HBase storage
>> >> >>>>> - new entity model that includes flows
>> >> >>>>> - setting the flow context via YARN app tags (sketched below)
>> >> >>>>> - real-time metrics aggregation to the application level and
>> >> >>>>> the flow level
>> >> >>>>> - rich REST API that supports filters, complex conditionals,
>> >> >>>>> limits, content selection, etc.
>> >> >>>>> - YARN generic events and system metrics
>> >> >>>>> - integration with Distributed Shell and MapReduce
>> >> >>>>>
>> >> >>>>> There are a total of 139 subtasks that were completed as part
>> >> >>>>> of this effort.
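To illustrate the flow-context and REST items above, here is a minimal
sketch of how a client might attach a flow context when submitting an
application. The TIMELINE_FLOW_*_TAG prefixes and the reader endpoint
shape are assumptions taken from the branch documentation; verify them
against the merged code:

  import java.util.HashSet;
  import java.util.Set;
  import org.apache.hadoop.yarn.api.records.ApplicationSubmissionContext;
  import org.apache.hadoop.yarn.util.Records;

  public class FlowContextSketch {
    public static void main(String[] args) {
      ApplicationSubmissionContext ctx =
          Records.newRecord(ApplicationSubmissionContext.class);
      // The collector derives the flow context from these app tags.
      Set<String> tags = new HashSet<>();
      tags.add("TIMELINE_FLOW_NAME_TAG:frequent_hive_query");
      tags.add("TIMELINE_FLOW_VERSION_TAG:1.0");
      tags.add("TIMELINE_FLOW_RUN_ID_TAG:1466540641000");
      ctx.setApplicationTags(tags);
      // A reader query against the v.2 REST API might then look like:
      //   GET http://<reader-host>:8188/ws/v2/timeline/users/<user>/flows
    }
  }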
>> >> >>>>> We paid close attention to ensuring that Timeline Service v.2
>> >> >>>>> does not impact existing functionality when disabled (which is
>> >> >>>>> the default).
>> >> >>>>>
>> >> >>>>> I'd like to call out a couple of things to discuss in
>> >> >>>>> particular.
>> >> >>>>>
>> >> >>>>> *First*, if the merge vote is approved, to which branch should
>> >> >>>>> this be merged and what would be the release version? My
>> >> >>>>> preference is that *it would be merged to branch "trunk" and
>> >> >>>>> be part of 3.0.0-alpha1* if approved. Since 3.0.0-alpha1 is
>> >> >>>>> actively in progress, I wanted to get your thoughts on this.
>> >> >>>>>
>> >> >>>>> *Second*, Timeline Service v.2 introduces a dependency on
>> >> >>>>> HBase from YARN. It is not a circular dependency (as HBase
>> >> >>>>> does not really depend on YARN). However, the version of
>> >> >>>>> Hadoop that HBase currently supports lags behind the Hadoop
>> >> >>>>> version that Timeline Service is based on, so there is a
>> >> >>>>> potential for subtle dependency conflicts. We made some
>> >> >>>>> efforts to isolate the issue (see [4] and [5]). The HBase
>> >> >>>>> folks have also been responsive in keeping up with trunk as
>> >> >>>>> much as they can. Nonetheless, this is something to keep in
>> >> >>>>> mind.
>> >> >>>>>
>> >> >>>>> I would love to get your thoughts on these and more before we
>> >> >>>>> open a real voting thread. Thanks!
>> >> >>>>>
>> >> >>>>> Regards,
>> >> >>>>> Sangjin
>> >> >>>>>
>> >> >>>>> [1] YARN-2928: https://issues.apache.org/jira/browse/YARN-2928
>> >> >>>>> [2] MAPREDUCE-6331:
>> >> >>>>> https://issues.apache.org/jira/browse/MAPREDUCE-6331
>> >> >>>>> [3] YARN-2928 commits:
>> >> >>>>> https://github.com/apache/hadoop/commits/YARN-2928
>> >> >>>>> [4] YARN-5045: https://issues.apache.org/jira/browse/YARN-5045
>> >> >>>>> [5] YARN-5071: https://issues.apache.org/jira/browse/YARN-5071

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
