RE: Apache Hadoop 2.8.3 Release Plan
Thanks, Andrew, for the comments.

Yes, if we're "strictly" following the "maintenance release" practice, that'd be great, and it was never my intent to overload it and cause a mess.

>> If we're struggling with being able to deliver new features in a safe and
>> timely fashion, let's try to address that...

This is interesting. Are you aware of any means to do that? Thanks!

Regards,
Kai

-----Original Message-----
From: Andrew Wang [mailto:andrew.w...@cloudera.com]
Sent: Tuesday, November 21, 2017 2:22 PM
To: Zheng, Kai <kai.zh...@intel.com>
Cc: Junping Du <j...@hortonworks.com>; common-dev@hadoop.apache.org; hdfs-...@hadoop.apache.org; mapreduce-...@hadoop.apache.org; yarn-...@hadoop.apache.org
Subject: Re: Apache Hadoop 2.8.3 Release Plan

I'm against including new features in maintenance releases, since they're meant to be bug-fix only.

If we're struggling with being able to deliver new features in a safe and timely fashion, let's try to address that, not overload the meaning of "maintenance release".

Best,
Andrew

On Mon, Nov 20, 2017 at 5:20 PM, Zheng, Kai <kai.zh...@intel.com> wrote:

> Hi Junping,
>
> Thank you for making 2.8.2 happen and now planning the 2.8.3 release.
>
> I have an ask, is it convenient to include the back port work for OSS
> connector module? We have some Hadoop users that wish to have it by
> default for convenience, though in the past they used it by back
> porting themselves. I have raised this and got thoughts from Chris and
> Steve. Looks like this is more wanted for 2.9 but I wanted to ask
> again here for broad feedback and thoughts by this chance. The back
> port patch is available for 2.8 and the one for branch-2 was already
> in. IMO, 2.8.x is promising as we can see some shift from 2.7.x, hence
> it's worth more important features and efforts. How would you think?
> Thanks!
> > https://issues.apache.org/jira/browse/HADOOP-14964 > > Regards, > Kai > > -Original Message- > From: Junping Du [mailto:j...@hortonworks.com] > Sent: Tuesday, November 14, 2017 9:02 AM > To: common-dev@hadoop.apache.org; hdfs-...@hadoop.apache.org; > mapreduce-...@hadoop.apache.org; yarn-...@hadoop.apache.org > Subject: Apache Hadoop 2.8.3 Release Plan > > Hi, > We have several important fixes get landed on branch-2.8 and I > would like to cut off branch-2.8.3 now to start 2.8.3 release work. > So far, I don't see any pending blockers on 2.8.3, so my current > plan is to cut off 1st RC of 2.8.3 in next several days: > - For all coming commits to land on branch-2.8, please mark > the fix version as 2.8.4. > - If there is a really important fix for 2.8.3 and getting > closed, please notify me ahead before landing it on branch-2.8.3. > Please let me know if you have any thoughts or comments on the plan. > > Thanks, > > Junping > > From: dujunp...@gmail.com <dujunp...@gmail.com> on behalf of 俊平堵 < > junping...@apache.org> > Sent: Friday, October 27, 2017 3:33 PM > To: gene...@hadoop.apache.org > Subject: [ANNOUNCE] Apache Hadoop 2.8.2 Release. > > Hi all, > > It gives me great pleasure to announce that the Apache Hadoop > community has voted to release Apache Hadoop 2.8.2, which is now > available for download from Apache mirrors[1]. For download > instructions please refer to the Apache Hadoop Release page [2]. > > Apache Hadoop 2.8.2 is the first GA release of Apache Hadoop 2.8 line > and our newest stable release for entire Apache Hadoop project. For > major changes incuded in Hadoop 2.8 line, please refer Hadoop 2.8.2 main > page[3]. > > This release has 315 resolved issues since previous 2.8.1 release with > following > breakdown: >- 91 in Hadoop Common >- 99 in HDFS >- 105 in YARN >- 20 in MapReduce > Please read the log of CHANGES[4] and RELEASENOTES[5] for more details. 
> > The release news is posted on the Hadoop website too, you can go to > the downloads section directly [6]. > > Thank you all for contributing to the Apache Hadoop release! > > > Cheers, > > Junping > > > [1] http://www.apache.org/dyn/closer.cgi/hadoop/common > > [2] http://hadoop.apache.org/releases.html > > [3] http://hadoop.apache.org/docs/r2.8.2/index.html > > [4] > http://hadoop.apache.org/docs/r2.8.2/hadoop-project-dist/ > hadoop-common/release/2.8.2/CHANGES.2.8.2.html > > [5] > http://hadoop.apache.org/docs/r2.8.2/hadoop-project-dist/ > hadoop-common/release/2.8.2/RELEASENOTES.2.8.2.html > > [6] http://hadoop.apache.org/releases.html#Download > > > - > To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org > For additional commands, e-mail: common-dev-h...@hadoop.apache.org > > > - > To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org > For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org > >
RE: Apache Hadoop 2.8.3 Release Plan
Hi Junping, Thank you for making 2.8.2 happen and now planning the 2.8.3 release. I have an ask, is it convenient to include the back port work for OSS connector module? We have some Hadoop users that wish to have it by default for convenience, though in the past they used it by back porting themselves. I have raised this and got thoughts from Chris and Steve. Looks like this is more wanted for 2.9 but I wanted to ask again here for broad feedback and thoughts by this chance. The back port patch is available for 2.8 and the one for branch-2 was already in. IMO, 2.8.x is promising as we can see some shift from 2.7.x, hence it's worth more important features and efforts. How would you think? Thanks! https://issues.apache.org/jira/browse/HADOOP-14964 Regards, Kai -Original Message- From: Junping Du [mailto:j...@hortonworks.com] Sent: Tuesday, November 14, 2017 9:02 AM To: common-dev@hadoop.apache.org; hdfs-...@hadoop.apache.org; mapreduce-...@hadoop.apache.org; yarn-...@hadoop.apache.org Subject: Apache Hadoop 2.8.3 Release Plan Hi, We have several important fixes get landed on branch-2.8 and I would like to cut off branch-2.8.3 now to start 2.8.3 release work. So far, I don't see any pending blockers on 2.8.3, so my current plan is to cut off 1st RC of 2.8.3 in next several days: - For all coming commits to land on branch-2.8, please mark the fix version as 2.8.4. - If there is a really important fix for 2.8.3 and getting closed, please notify me ahead before landing it on branch-2.8.3. Please let me know if you have any thoughts or comments on the plan. Thanks, Junping From: dujunp...@gmail.comon behalf of 俊平堵 Sent: Friday, October 27, 2017 3:33 PM To: gene...@hadoop.apache.org Subject: [ANNOUNCE] Apache Hadoop 2.8.2 Release. Hi all, It gives me great pleasure to announce that the Apache Hadoop community has voted to release Apache Hadoop 2.8.2, which is now available for download from Apache mirrors[1]. 
For download instructions, please refer to the Apache Hadoop Release page [2].

Apache Hadoop 2.8.2 is the first GA release of the Apache Hadoop 2.8 line and our newest stable release for the entire Apache Hadoop project. For major changes included in the Hadoop 2.8 line, please refer to the Hadoop 2.8.2 main page [3].

This release has 315 resolved issues since the previous 2.8.1 release, with the following breakdown:
- 91 in Hadoop Common
- 99 in HDFS
- 105 in YARN
- 20 in MapReduce

Please read the log of CHANGES [4] and RELEASENOTES [5] for more details.

The release news is posted on the Hadoop website too; you can go to the downloads section directly [6].

Thank you all for contributing to the Apache Hadoop release!

Cheers,
Junping

[1] http://www.apache.org/dyn/closer.cgi/hadoop/common
[2] http://hadoop.apache.org/releases.html
[3] http://hadoop.apache.org/docs/r2.8.2/index.html
[4] http://hadoop.apache.org/docs/r2.8.2/hadoop-project-dist/hadoop-common/release/2.8.2/CHANGES.2.8.2.html
[5] http://hadoop.apache.org/docs/r2.8.2/hadoop-project-dist/hadoop-common/release/2.8.2/RELEASENOTES.2.8.2.html
[6] http://hadoop.apache.org/releases.html#Download

To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-dev-h...@hadoop.apache.org
RE: Backporting OSS module to branch 2.x
>> We did not allow a backport of ADLS to branch-2.7 when it was released in
>> 2.8.0. There were technical reasons-...

OK, it's clear to me now that branch-2.7 is already in maintenance mode and does not accept any new features.

>> Moreover, one should be able to use a jar compiled for 2.9 in a 2.7 cluster,
>> so the value of releasing this module with 2.7.5 or 2.8.3 is questionable.

This sounds like a good suggestion as a workaround for 2.7. For 2.8, I'm still wondering whether 2.8.3 is the last 2.8 release or not. If it is, I agree; otherwise, putting it in branch-2.8 and releasing it along with some other nice things in 2.8.4 would still be desirable. I think the 2.8 releases will be the next popular line after 2.7, alongside 3.x. It could be too early to stop it, though of course that also depends on potential interest and on who is willing to take on the work, as you said in previous emails. Very likely I'm missing the full picture, but I want to catch up so that I can help in the future.

>> Did anyone raise the Aliyun OSS backport during the 2.9.0 release
>> discussion? I don't recall seeing it in the wiki or in any thread on the
>> topic, but I may well have missed it. Since the vote on RC3 closes on Friday
>> and looks likely to pass, this is very late to propose a new feature. Please
>> raise this on the 2.9 release thread, so we can figure out how to handle it.

Indeed, not yet. Yes, it looks rather late, as RC3 is being voted on and the vote is going well. My idea is to put the work in branch-2.9 first and expect some new release after 2.9.0. Sure, let me raise it on the 2.9 release thread when the time is right.

Thanks again, Chris, for the education and the thoughts.

Regards,
Kai

-----Original Message-----
From: Zheng, Kai [mailto:kai.zh...@intel.com]
Sent: Thursday, November 16, 2017 10:18 AM
To: common-dev@hadoop.apache.org
Subject: Backporting OSS module to branch 2.x

There was some discussion about backporting OSS module to branch 2.x and per Chris's suggestion we should do it in the dev list.
-Original Message- From: Chris Douglas [mailto:cdoug...@apache.org] Sent: Thursday, November 16, 2017 1:20 AM To: Zheng, Kai <kai.zh...@intel.com<mailto:kai.zh...@intel.com>> Cc: Junping Du <j...@hortonworks.com<mailto:j...@hortonworks.com>>; Konstantin Shvachko <shv.had...@gmail.com<mailto:shv.had...@gmail.com>>; s...@apache.org<mailto:s...@apache.org>; Jason Lowe <jl...@oath.com<mailto:jl...@oath.com>>; Steve Loughran <steve.lough...@gmail.com<mailto:steve.lough...@gmail.com>>; Jonathan Hung <jyhung2...@gmail.com<mailto:jyhung2...@gmail.com>>; Arun Suresh <asur...@apache.org<mailto:asur...@apache.org>>; Vinod Kumar Vavilapalli <vino...@apache.org<mailto:vino...@apache.org>>; secur...@hadoop.apache.org<mailto:secur...@hadoop.apache.org> Subject: Re: Potential security issue of XXE in Hadoop We should move this part of the thread back to the dev list. On Wed, Nov 15, 2017 at 2:33 AM, Zheng, Kai <kai.zh...@intel.com<mailto:kai.zh...@intel.com>> wrote: > We have some wish to backport Ali OSS support for some releases based on > 2.7/2.8/2.9. So per the discussion 2.9 should be fine; for 2.7 and 2.8, as we > haven't cut the 2.7.5 and 2.8.3 yet, I'm hoping we could still be able to do > that. We Intel folks would like to do some taking like the testing and > verifying. The backport work is tracked in [1] and currently Steve has some > concerns for 2.7 and 2.8, we're working the best to solve the concerns, > basically we'd avoid any package change (like httpclient) and make the > changes self-contained just in the Hadoop oss connector module. The backport > patches will be available soon. We did not allow a backport of ADLS to branch-2.7 when it was released in 2.8.0. There were technical reasons- new dependencies could conflict with existing 2.7 client code, patch releases would release at a slower cadence, etc.- but popularity of an older release is not a sufficient reason to change our version policy on features. 
We tried to get away with that in 0.16 (and a few other times) and it's never gone well. Moreover, one should be able to use a jar compiled for 2.9 in a 2.7 cluster, so the value of releasing this module with 2.7.5 or 2.8.3 is questionable. Did anyone raise the Aliyun OSS backport during the 2.9.0 release discussion? I don't recall seeing it in the wiki or in any thread on the topic, but I may well have missed it. Since the vote on RC3 closes on Friday and looks likely to pass, this is very late to propose a new feature. Please raise this on the 2.9 release thread, so we can figure out how to handle it. Version numbers are cheap, but cutting 2.10 only to include this module will create an annoying maintenance burden for a low payoff. Correspondi
Backporting OSS module to branch 2.x
There was some discussion about backporting OSS module to branch 2.x and per Chris's suggestion we should do it in the dev list. -Original Message- From: Chris Douglas [mailto:cdoug...@apache.org] Sent: Thursday, November 16, 2017 1:20 AM To: Zheng, Kai <kai.zh...@intel.com<mailto:kai.zh...@intel.com>> Cc: Junping Du <j...@hortonworks.com<mailto:j...@hortonworks.com>>; Konstantin Shvachko <shv.had...@gmail.com<mailto:shv.had...@gmail.com>>; s...@apache.org<mailto:s...@apache.org>; Jason Lowe <jl...@oath.com<mailto:jl...@oath.com>>; Steve Loughran <steve.lough...@gmail.com<mailto:steve.lough...@gmail.com>>; Jonathan Hung <jyhung2...@gmail.com<mailto:jyhung2...@gmail.com>>; Arun Suresh <asur...@apache.org<mailto:asur...@apache.org>>; Vinod Kumar Vavilapalli <vino...@apache.org<mailto:vino...@apache.org>>; secur...@hadoop.apache.org<mailto:secur...@hadoop.apache.org> Subject: Re: Potential security issue of XXE in Hadoop We should move this part of the thread back to the dev list. On Wed, Nov 15, 2017 at 2:33 AM, Zheng, Kai <kai.zh...@intel.com<mailto:kai.zh...@intel.com>> wrote: > We have some wish to backport Ali OSS support for some releases based on > 2.7/2.8/2.9. So per the discussion 2.9 should be fine; for 2.7 and 2.8, as we > haven't cut the 2.7.5 and 2.8.3 yet, I'm hoping we could still be able to do > that. We Intel folks would like to do some taking like the testing and > verifying. The backport work is tracked in [1] and currently Steve has some > concerns for 2.7 and 2.8, we're working the best to solve the concerns, > basically we'd avoid any package change (like httpclient) and make the > changes self-contained just in the Hadoop oss connector module. The backport > patches will be available soon. We did not allow a backport of ADLS to branch-2.7 when it was released in 2.8.0. 
There were technical reasons- new dependencies could conflict with existing 2.7 client code, patch releases would release at a slower cadence, etc.- but popularity of an older release is not a sufficient reason to change our version policy on features. We tried to get away with that in 0.16 (and a few other times) and it's never gone well. Moreover, one should be able to use a jar compiled for 2.9 in a 2.7 cluster, so the value of releasing this module with 2.7.5 or 2.8.3 is questionable. Did anyone raise the Aliyun OSS backport during the 2.9.0 release discussion? I don't recall seeing it in the wiki or in any thread on the topic, but I may well have missed it. Since the vote on RC3 closes on Friday and looks likely to pass, this is very late to propose a new feature. Please raise this on the 2.9 release thread, so we can figure out how to handle it. Version numbers are cheap, but cutting 2.10 only to include this module will create an annoying maintenance burden for a low payoff. Correspondingly, a 2.9.1 release with "only a few" new features is a repeat of history we should avoid. -C > @Konstantin, would you let me know when you'd cut the 2.7.5 release? Sounds > good to have the oss backport work? Note the module has been in trunk for > quite some time and the codes have been production exercised. Is there > anything we could take and help with? Our pleasure to do. Thanks! > > @Junping, for 2.8.3, my similar ask and we would also help. > > [1] https://issues.apache.org/jira/browse/HADOOP-14964 > > Regards, > Kai >
RE: [VOTE] Merge yarn-native-services branch into trunk
Cool to have this feature! Thanks Jian and all.

Regards,
Kai

-----Original Message-----
From: Vinod Kumar Vavilapalli [mailto:vino...@apache.org]
Sent: Tuesday, November 07, 2017 7:20 AM
To: Jian He
Cc: yarn-...@hadoop.apache.org; common-dev@hadoop.apache.org; Hdfs-dev; mapreduce-...@hadoop.apache.org
Subject: Re: [VOTE] Merge yarn-native-services branch into trunk

Congratulations to all the contributors involved, this is a great step forward!

+Vinod

> On Nov 6, 2017, at 2:40 PM, Jian He wrote:
>
> Okay, I just merged the branch to trunk (108 commits in total!)
> Again, thanks to all who contributed to this feature!
>
> Jian
>
> On Nov 6, 2017, at 1:26 PM, Jian He wrote:
>
> Here’s +1 from myself.
> The vote passes with 7 (+1) bindings and 2 (+1) non-bindings.
>
> Thanks to all who voted. I’ll merge to trunk by the end of today.
>
> Jian
>
> On Nov 6, 2017, at 8:38 AM, Billie Rinaldi wrote:
>
> +1 (binding)
>
> On Mon, Oct 30, 2017 at 1:06 PM, Jian He wrote:
> Hi All,
>
> I would like to restart the vote for merging yarn-native-services to trunk.
> Since the last vote, we have been working on several issues in documentation,
> DNS, CLI modifications etc. We believe the feature is now in much better
> shape.
>
> Some background:
> At a high level, the following are the key features implemented.
> - YARN-5079[1]. A native YARN framework (ApplicationMaster) to orchestrate
>   existing services to YARN either docker or non-docker based.
> - YARN-4793[2]. A REST API service embedded in RM (optional) for users
>   to deploy a service via a simple JSON spec
> - YARN-4757[3]. Extending today's service registry with a simple DNS
>   service to enable users to discover services deployed on YARN via
>   standard DNS lookup
> - YARN-6419[4]. UI support for native-services on the new YARN UI
>
> All these new services are optional and are sitting outside of the existing
> system, and have no impact on the existing system if disabled.
> > Special thanks to a team of folks who worked hard towards this: Billie > Rinaldi, Gour Saha, Vinod Kumar Vavilapalli, Jonathan Maron, Rohith Sharma K > S, Sunil G, Akhil PB, Eric Yang. This effort could not be possible without > their ideas and hard work. > Also thanks Allen for some review and verifications. > > Thanks, > Jian > > [1] https://issues.apache.org/jira/browse/YARN-5079 > [2] https://issues.apache.org/jira/browse/YARN-4793 > [3] https://issues.apache.org/jira/browse/YARN-4757 > [4] https://issues.apache.org/jira/browse/YARN-6419 > > > - To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org - To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-dev-h...@hadoop.apache.org
RE: [DISCUSS] A final minor release off branch-2?
Thanks Vinod.

>> Off the top of my head, one of the biggest areas is application
>> compatibility. When folks move from 2.x to 3.x, are their apps binary
>> compatible? Source compatible? Or need changes?

I think these are good concerns from an overall perspective. On the other hand, I've talked with quite a few potential 3.0 users, and it looks like most of them are interested in the erasure coding feature; a major scenario for it is backing up their large volumes of data at lower storage cost. They might run analytics workloads using Hive, Spark, Impala and Kylin on the new cluster, but that isn't a must at first. They understand there might be some gaps, so they'd migrate their workloads incrementally. For the major analytics workloads, we've performed lots of benchmark and integration tests, as I believe others have as well; we did find some issues, but they should be fixed in downstream projects. I think the GA release will accelerate progress and expose the issues, if any. We can't wait for it to fully mature; nothing is perfect.

>> The main goal of the bridging release is to ease transition on stuff that is
>> guaranteed to be broken.

This sounds like a good consideration. But if I were a Hadoop user on, for example, 2.7.4 or 2.8.2 or whatever 2.x version, would I first upgrade to this bridging release and then use the bridge support to upgrade to a 3.x version? I'm not sure. I might instead look for guides or support in the 3.x docs on how to upgrade from 2.7 to 3.x. Frankly speaking, working on a bridging release that doesn't target any feature isn't so attractive to me as a contributor.

Overall, a final minor release off branch-2 is good. We should also give 3.x more time to evolve and mature, so it looks to me like we will have to work on two release lines in parallel for some time. I'd like option C), and suggest we focus on the recent releases. Just some thoughts.
Regards,
Kai

-----Original Message-----
From: Vinod Kumar Vavilapalli [mailto:vino...@apache.org]
Sent: Tuesday, November 07, 2017 9:43 AM
To: Andrew Wang
Cc: Arun Suresh; common-dev@hadoop.apache.org; yarn-...@hadoop.apache.org; Hdfs-dev; mapreduce-...@hadoop.apache.org
Subject: Re: [DISCUSS] A final minor release off branch-2?

The main goal of the bridging release is to ease transition on stuff that is guaranteed to be broken.

Off the top of my head, one of the biggest areas is application compatibility. When folks move from 2.x to 3.x, are their apps binary compatible? Source compatible? Or need changes?

In the 1.x -> 2.x upgrade, we did a bunch of work to at least make old apps source compatible. This means relooking at API compatibility in 3.x and its impact on migrating applications. We will have to revisit and un-deprecate old APIs, un-delete old APIs and write documentation on how apps can be migrated.

Most of this work will be in the 3.x line. The bridging release, on the other hand, will have deprecations for APIs that cannot be undeleted. This may already have been done in many places, but we need to make sure and fill gaps if any.

Other areas that I can recall from the old days:
- Config migration: Many configs are deprecated or deleted. We need documentation to help folks move. We also need deprecations in the bridging release for configs that cannot be undeleted.
- You mentioned rolling-upgrades: It will be good to outline exactly the type of testing. For example, the rolling-upgrade orchestration order has direct implications for the testing done.
- Story for downgrades?
- Copying data between 2.x clusters and 3.x clusters: Does this work already? Is it broken anywhere that we cannot fix? Do we need bridging features for this to work?

+Vinod

> On Nov 6, 2017, at 12:49 PM, Andrew Wang wrote:
>
> What are the known gaps that need bridging between 2.x and 3.x?
>
> From an HDFS perspective, we've tested wire compat, rolling upgrade,
> and rollback.
> > From a YARN perspective, we've tested wire compat and rolling upgrade. > Arun just mentioned an NM rollback issue that I'm not familiar with. > > Anything else? External to this discussion, these should be documented > as known issues for 3.0. > > Best. > Andrew > > On Sun, Nov 5, 2017 at 1:46 PM, Arun Suresh wrote: > >> Thanks for starting this discussion VInod. >> >> I agree (C) is a bad idea. >> I would prefer (A) given that ATM, branch-2 is still very close to >> branch-2.9 - and it is a good time to make a collective decision to >> lock down commits to branch-2. >> >> I think we should also clearly define what the 'bridging' release >> should be. >> I assume it means the following: >> * Any 2.x user wanting to move to 3.x must first upgrade to the >> bridging release first and then upgrade to the 3.x release. >> * With
RE: [VOTE] HADOOP-12756 - Aliyun OSS Support branch merge
I'd like to conclude this vote. +1 bindings: Lei Xu, Kai Zheng, Gangumalla, Uma +1: Hao Cheng No 0 and -1 votes. The VOTE passed. I will merge the branch into trunk accordingly. Thanks for the time reviewing the work and casting your votes. Also thanks Steve for providing the review comments in HADOOP-12756 and Genmao Yu for addressing them, doing the complete "hadoop fs" tests. Regards, Kai -Original Message----- From: Zheng, Kai [mailto:kai.zh...@intel.com] Sent: Wednesday, September 28, 2016 10:35 AM To: common-dev@hadoop.apache.org Subject: [VOTE] HADOOP-12756 - Aliyun OSS Support branch merge Hi all, I would like to propose a merge vote for HADOOP-12756 branch to trunk. This branch develops support for Aliyun OSS (another cloud storage) in Hadoop. The voting starts now and will run for 7 days till Oct 5, 2016 07:00 PM PDT. Aliyun OSS is widely used among China's cloud users, and currently it is not easy to access data in Aliyun OSS from Hadoop. The branch develops a new module hadoop-aliyun and provides support for accessing data in Aliyun OSS cloud storage, which will enable more use cases and bring better use experience for Hadoop users. Like the existing s3a support, AliyunOSSFileSystem a new implementation of FileSystem backed by Aliyun OSS is provided. During the implementation, the contributors refer to the s3a support, keeping the same coding style and project structure. . The updated architecture document is here. [https://issues.apache.org/jira/secure/attachment/12829541/Aliyun-OSS-integration-v2.pdf] . The merge patch that is a diff against trunk is posted here, which builds cleanly with manual testing results posted in HADOOP-13584. [https://issues.apache.org/jira/secure/attachment/12829738/HADOOP-13584.004.patch] . The user documentation is also provided as part of the module. HADOOP-12756 has a set of sub-tasks and they are ordered in the same sequence as they were committed to HADOOP-12756. Hopefully this will make it easier for reviewing. 
What I want to emphasize is: this is a fundamental implementation aiming at guaranteeing functionality and stability. The major functionality has been running in production environments for some time. There are definitely performance optimizations that we can do, as the community has done for the existing s3a and azure support. Merging this to trunk would serve as a very good starting point for those follow-up optimizations, aligning them with the related efforts.

The new hadoop-aliyun module is made possible owing to many people. Thanks to the contributors Mingfei Shi, Genmao Yu and Ling Zhou; thanks to Cheng Hao, Steve Loughran, Chris Nauroth, Yi Liu, Lei (Eddy) Xu, Uma Maheswara Rao G and Allen Wittenauer for their kind reviewing and guidance. Also thanks to Arpit Agarwal, Andrew Wang and Anu Engineer for the great process discussions to bring this up.

Please kindly vote. Thanks in advance!

Regards,
Kai

To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-dev-h...@hadoop.apache.org
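The tally reported in the result message above can be sketched as a small function. This is only an illustrative sketch, not project tooling: the `tally` helper, the vote tuples, and the passing rule (at least three binding +1 votes and no -1 votes) are assumptions made for the example, and the project bylaws remain the authority on the actual rule.

```python
from collections import Counter


def tally(votes, min_binding=3):
    """Count votes and apply an ASSUMED passing rule.

    votes: iterable of (voter, value, binding), value in {+1, 0, -1}.
    Assumed rule: >= min_binding binding +1 votes and no -1 votes at all.
    """
    counts = Counter()
    for _voter, value, binding in votes:
        counts[(value, binding)] += 1

    binding_plus_one = counts[(1, True)]
    non_binding_plus_one = counts[(1, False)]
    minus_one = counts[(-1, True)] + counts[(-1, False)]

    return {
        "binding_plus_one": binding_plus_one,
        "non_binding_plus_one": non_binding_plus_one,
        "minus_one": minus_one,
        "passed": binding_plus_one >= min_binding and minus_one == 0,
    }


# Votes as listed in the result message for HADOOP-12756.
votes = [
    ("Lei Xu", 1, True),
    ("Kai Zheng", 1, True),
    ("Gangumalla, Uma", 1, True),
    ("Hao Cheng", 1, False),
]

result = tally(votes)
print(result)
```

Keeping the threshold as a parameter makes it easy to adjust the sketch if a different vote type uses a different minimum.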
RE: [VOTE] HADOOP-12756 - Aliyun OSS Support branch merge
Could I extend this a bit longer considering the PRC holiday (during Oct 1 and Oct 7)? If sounds good I'd like to have another week (next Wednesday) for this. Please advise if you'd like to think otherwise. Thanks. Regards, Kai -Original Message- From: Zheng, Kai [mailto:kai.zh...@intel.com] Sent: Wednesday, September 28, 2016 10:35 AM To: common-dev@hadoop.apache.org Subject: [VOTE] HADOOP-12756 - Aliyun OSS Support branch merge Hi all, I would like to propose a merge vote for HADOOP-12756 branch to trunk. This branch develops support for Aliyun OSS (another cloud storage) in Hadoop. The voting starts now and will run for 7 days till Oct 5, 2016 07:00 PM PDT. Aliyun OSS is widely used among China's cloud users, and currently it is not easy to access data in Aliyun OSS from Hadoop. The branch develops a new module hadoop-aliyun and provides support for accessing data in Aliyun OSS cloud storage, which will enable more use cases and bring better use experience for Hadoop users. Like the existing s3a support, AliyunOSSFileSystem a new implementation of FileSystem backed by Aliyun OSS is provided. During the implementation, the contributors refer to the s3a support, keeping the same coding style and project structure. . The updated architecture document is here. [https://issues.apache.org/jira/secure/attachment/12829541/Aliyun-OSS-integration-v2.pdf] . The merge patch that is a diff against trunk is posted here, which builds cleanly with manual testing results posted in HADOOP-13584. [https://issues.apache.org/jira/secure/attachment/12829738/HADOOP-13584.004.patch] . The user documentation is also provided as part of the module. HADOOP-12756 has a set of sub-tasks and they are ordered in the same sequence as they were committed to HADOOP-12756. Hopefully this will make it easier for reviewing. What I want to emphasize is: this is a fundamental implementation aiming at guaranteeing functionality and stability. 
The major functionality has been running in production environments for some time. There are definitely performance optimizations that we can do, as the community has done for the existing s3a and azure support. Merging this to trunk would serve as a very good starting point for those follow-up optimizations, aligning them with the related efforts.

The new hadoop-aliyun module is made possible owing to many people. Thanks to the contributors Mingfei Shi, Genmao Yu and Ling Zhou; thanks to Cheng Hao, Steve Loughran, Chris Nauroth, Yi Liu, Lei (Eddy) Xu, Uma Maheswara Rao G and Allen Wittenauer for their kind reviewing and guidance. Also thanks to Arpit Agarwal, Andrew Wang and Anu Engineer for the great process discussions to bring this up.

Please kindly vote. Thanks in advance!

Regards,
Kai

To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-dev-h...@hadoop.apache.org
RE: desc error on official site http://hadoop.apache.org/
Thank you for the catch. This should go to the common-dev mailing list. Would you file an issue to fix this?

Regards,
Kai

-----Original Message-----
From: 444...@qq.com [mailto:444...@qq.com]
Sent: Wednesday, September 28, 2016 9:10 AM
To: general
Subject: desc error on official site http://hadoop.apache.org/

It is designed to scale up from single servers to thousands of machines, each offering local computation and storage.
===>
It is designed to scale up from single server to thousands of machines, each offering local computation and storage.

To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-dev-h...@hadoop.apache.org
[VOTE] HADOOP-12756 - Aliyun OSS Support branch merge
Hi all,

I would like to propose a merge vote for the HADOOP-12756 branch to trunk. This branch develops support for Aliyun OSS (another cloud storage) in Hadoop.

The voting starts now and will run for 7 days till Oct 5, 2016 07:00 PM PDT.

Aliyun OSS is widely used among China's cloud users, and currently it is not easy to access data in Aliyun OSS from Hadoop. The branch develops a new module, hadoop-aliyun, that provides support for accessing data in Aliyun OSS cloud storage, which will enable more use cases and bring a better user experience for Hadoop users. Like the existing s3a support, it provides AliyunOSSFileSystem, a new implementation of FileSystem backed by Aliyun OSS. During the implementation, the contributors referred to the s3a support, keeping the same coding style and project structure.

- The updated architecture document is here.
  [https://issues.apache.org/jira/secure/attachment/12829541/Aliyun-OSS-integration-v2.pdf]
- The merge patch, a diff against trunk, is posted here; it builds cleanly, with manual testing results posted in HADOOP-13584.
  [https://issues.apache.org/jira/secure/attachment/12829738/HADOOP-13584.004.patch]
- The user documentation is also provided as part of the module.

HADOOP-12756 has a set of sub-tasks, and they are ordered in the same sequence as they were committed to HADOOP-12756. Hopefully this will make reviewing easier.

What I want to emphasize is: this is a fundamental implementation aiming at guaranteeing functionality and stability. The major functionality has been running in production environments for some time. There are definitely performance optimizations that we can do, as the community has done for the existing s3a and azure support. Merging this to trunk would serve as a very good starting point for those follow-up optimizations, aligning them with the related efforts.

The new hadoop-aliyun module is made possible owing to many people.
Thanks to the contributors Mingfei Shi, Genmao Yu and Ling Zhou; thanks to Cheng Hao, Steve Loughran, Chris Nauroth, Yi Liu, Lei (Eddy) Xu, Uma Maheswara Rao G and Allen Wittenauer for their kind reviewing and guidance. Also thanks Arpit Agarwal, Andrew Wang and Anu Engineer for the great process discussions to bring this up. Please kindly vote. Thanks in advance! Regards, Kai
RE: [VOTE] Release Apache Hadoop 3.0.0-alpha1 RC0
Thanks Sammi. My non-binding +1 to make the release candidate. Regards, Kai -Original Message- From: Chen, Sammi Sent: Friday, September 02, 2016 4:59 PM To: Zheng, Kai <kai.zh...@intel.com>; Andrew Wang <andrew.w...@cloudera.com>; Arun Suresh <asur...@apache.org> Cc: common-dev@hadoop.apache.org; hdfs-...@hadoop.apache.org; mapreduce-...@hadoop.apache.org; yarn-...@hadoop.apache.org; Chen, Sammi <sammi.c...@intel.com> Subject: RE: [VOTE] Release Apache Hadoop 3.0.0-alpha1 RC0 +1 (non-binding). Thanks for driving this Andrew! * Download and built from source. * Setup a 10 node cluster (1 name node + 9 data nodes) * Verified normal HDFS file put/get operation with 3x replication * With 2 data nodes failure, verified HDFS file put/get operation with 3x replication, file integrity is OK * Enable Erasure Code policy "RS-DEFAULT-6-3-64k", verified HDFS file put/get operation * Enable Erasure Code policy "RS-DEFAULT-6-3-64k", with 3 data nodes failure, verified HDFS file put/get operation, file integrity is OK Cheers -Sammi -Original Message- From: Zheng, Kai Sent: Friday, September 02, 2016 3:25 PM To: Chen, Sammi Subject: FW: [VOTE] Release Apache Hadoop 3.0.0-alpha1 RC0 Hi Sammi, Could you help provide our feedback? I know you did lots of tests. Thanks! Regards, Kai -Original Message- From: Arun Suresh [mailto:asur...@apache.org] Sent: Friday, September 02, 2016 11:33 AM To: Andrew Wang <andrew.w...@cloudera.com> Cc: common-dev@hadoop.apache.org; hdfs-...@hadoop.apache.org; mapreduce-...@hadoop.apache.org; yarn-...@hadoop.apache.org Subject: Re: [VOTE] Release Apache Hadoop 3.0.0-alpha1 RC0 +1 (binding). Thanks for driving this Andrew.. * Download and built from source. * Setup a 5 mode cluster. * Verified that MR works with opportunistic containers * Verified that the AMRMClient supports 'allocationRequestId' Cheers -Arun On Thu, Sep 1, 2016 at 4:31 PM, Aaron Fabbri <fab...@cloudera.com> wrote: > +1, non-binding. 
> > I built everything on OS X and ran the s3a contract tests successfully: > > mvn test -Dtest=org.apache.hadoop.fs.contract.s3a.\* > > ... > > Results : > > > Tests run: 78, Failures: 0, Errors: 0, Skipped: 1 > > > [INFO] > -- > -- > > [INFO] BUILD SUCCESS > > [INFO] > -- > -- > > On Thu, Sep 1, 2016 at 3:39 PM, Andrew Wang <andrew.w...@cloudera.com> > wrote: > > > Good point Allen, I forgot about `hadoop version`. Since it's > > populated > by > > a version-info.properties file, people can always cat that file. > > > > On Thu, Sep 1, 2016 at 3:21 PM, Allen Wittenauer < > a...@effectivemachines.com > > > > > wrote: > > > > > > > > > On Sep 1, 2016, at 3:18 PM, Allen Wittenauer < > a...@effectivemachines.com > > > > > > wrote: > > > > > > > > > > > >> On Sep 1, 2016, at 2:57 PM, Andrew Wang > > > >> <andrew.w...@cloudera.com> > > > wrote: > > > >> > > > >> Steve requested a git hash for this release. This led us into a > brief > > > >> discussion of our use of git tags, wherein we realized that > > > >> although release tags are immutable (start with "rel/"), RC tags are > > > >> not. > This > > is > > > >> based on the HowToRelease instructions. > > > > > > > > We should probably embed the git hash in one of the files > > > > that > > > gets gpg signed. That's an easy change to create-release. > > > > > > > > > (Well, one more easily accessible than 'hadoop version') > > > - To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-dev-h...@hadoop.apache.org
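A note on the failure counts in Sammi's checks above (2 lost data nodes under 3x replication, 3 lost data nodes under "RS-DEFAULT-6-3-64k"): they match what each scheme is expected to tolerate. The sketch below is only back-of-the-envelope arithmetic, assuming the policy name means 6 data units plus 3 parity units per block group; the `Redundancy` class is a hypothetical helper, not part of any Hadoop API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Redundancy:
    """A redundancy scheme described by its data units and extra units."""
    name: str
    data_units: int
    redundant_units: int  # extra replicas, or parity units for erasure coding

    @property
    def tolerated_failures(self):
        # Each redundant unit can stand in for one lost unit.
        return self.redundant_units

    @property
    def storage_overhead(self):
        # Total stored units divided by the units of real data.
        return (self.data_units + self.redundant_units) / self.data_units


# 3x replication: one original block plus two extra copies.
replication_3x = Redundancy("3x replication", data_units=1, redundant_units=2)
# Assumed reading of RS-DEFAULT-6-3-64k: 6 data units and 3 parity units.
rs_6_3 = Redundancy("RS(6,3)", data_units=6, redundant_units=3)

for scheme in (replication_3x, rs_6_3):
    print(f"{scheme.name}: survives {scheme.tolerated_failures} failures, "
          f"stores {scheme.storage_overhead:.1f}x the raw data")
```

Under these assumptions erasure coding tolerates one more failure than 3x replication while storing half as many bytes, which is why it is worth evaluating in an alpha release.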
RE: [DISCUSS] The order of classpath isolation work and updating/shading dependencies on trunk
For the leveldb thing, wouldn't we have an alternative option in Java for the platforms where leveldb isn't supported yet due to whatever reasons. IMO, native library would be best to be used for optimization and production for performance. For development and pure Java platform, by default pure Java approach should still be provided and used. That is to say, if no Hadoop native is used, all the functionalities should still work and not break. HDFS erasure coding goes in the way. For that, we spent much effort in developing an ISA-L compatible erasure coder in pure Java that's used by default, though for performance the ISA-L native one is recommended in production deployment. Regards, Kai -Original Message- From: Allen Wittenauer [mailto:a...@effectivemachines.com] Sent: Saturday, July 23, 2016 8:16 AM To: Sangjin LeeCc: Sean Busbey ; common-dev@hadoop.apache.org; yarn-...@hadoop.apache.org; hdfs-...@hadoop.apache.org; mapreduce-...@hadoop.apache.org Subject: Re: [DISCUSS] The order of classpath isolation work and updating/shading dependencies on trunk But if I don't use ApplicationClassLoader, my java app is basically screwed then, right? Also: right now, the non-Linux and/or non-x86 platforms have to supply their own leveldbjni jar (or at least the C level library?) in order to make YARN even functional. How is that going to work with the class path manipulation? > On Jul 22, 2016, at 9:57 AM, Sangjin Lee wrote: > > The work on HADOOP-13070 and the ApplicationClassLoader are generic and go > beyond YARN. It can be used in any JVM that uses hadoop. The current use > cases are MR containers, hadoop's RunJar (as in "hadoop jar"), and the YARN > node manager auxiliary services. I'm not sure if that's what you were asking, > but I hope it helps. > > Regards, > Sangjin > > On Fri, Jul 22, 2016 at 9:16 AM, Sean Busbey wrote: > My work on HADOOP-11804 *only* helps processes that sit outside of > YARN. 
:) > > On Fri, Jul 22, 2016 at 10:48 AM, Allen Wittenauer > wrote: > > > > Does any of this work actually help processes that sit outside of YARN? > > > >> On Jul 21, 2016, at 12:29 PM, Sean Busbey wrote: > >> > >> thanks for bringing this up! big +1 on upgrading dependencies for 3.0. > >> > >> I have an updated patch for HADOOP-11804 ready to post this week. > >> I've been updating HBase's master branch to try to make use of it, > >> but could use some other reviews. > >> > >> On Thu, Jul 21, 2016 at 4:30 AM, Tsuyoshi Ozawa wrote: > >>> Hi developers, > >>> > >>> I'd like to discuss how to make an advance towards dependency > >>> management in Apache Hadoop trunk code since there has been lots > >>> work about updating dependencies in parallel. Summarizing recent > >>> works and activities as follows: > >>> > >>> 0) Currently, we have merged minimum update dependencies for > >>> making Hadoop JDK-8 compatible(compilable and runnable on JDK-8). > >>> 1) After that, some people suggest that we should update the other > >>> dependencies on trunk(e.g. protobuf, netty, jackthon etc.). > >>> 2) In parallel, Sangjin and Sean are working on classpath isolation: > >>> HADOOP-13070, HADOOP-11804 and HADOOP-11656. > >>> > >>> Main problems we try to solve in the activities above is as follows: > >>> > >>> * 1) tries to solve dependency hell between user-level jar and > >>> system(Hadoop)-level jar. > >>> * 2) tries to solve updating old libraries. > >>> > >>> IIUC, 1) and 2) looks not related, but it's related in fact. 2) > >>> tries to separate class loader between client-side dependencies > >>> and server-side dependencies in Hadoop, so we can the change > >>> policy of updating libraries after doing 2). We can also decide > >>> which libraries can be shaded after 2). > >>> > >>> Hence, IMHO, a straight way we should go to is doing 2 at first. 
> >>> After that, we can update both client-side and server-side > >>> dependencies based on new policy(maybe we should discuss what kind > >>> of incompatibility is acceptable, and the others are not). > >>> > >>> Thoughts? > >>> > >>> Thanks, > >>> - Tsuyoshi > >>> > >>> -- > >>> --- To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org > >>> For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org > >>> > >> > >> > >> > >> -- > >> busbey > >> > >> --- > >> -- To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org > >> For additional commands, e-mail: common-dev-h...@hadoop.apache.org > >> > > > > > > > > - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org > > For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org > > > > > > --
RE: Setting JIRA fix versions for 3.0.0 releases
My humble feeling is almost the same regarding the urgent need of a 3.0 alpha release. Considering EC, shell-script rewriting and etc. are significant changes and there are interested users that want to evaluate EC storage method, an alpha 3.0 release will definitely help a lot allowing users to try the new features and then expose critical bugs or gaps. This may take quite some time, and should be very important to build confidence preparing for a solid 3.0 release. I understand Vinod's concern and the need of lining up the release efforts, so if it's to work on multiple 2.x releases it should be avoided. Mentioning 3.0 alpha, it's different and the best would be to allow parallel going to speed up EC and the like, meanwhile any 2.x release won't be in a hurry pushed by 3.0 release. Thanks for any consideration. Regards, Kai -Original Message- From: Andrew Wang [mailto:andrew.w...@cloudera.com] Sent: Friday, July 22, 2016 3:33 AM To: Vinod Kumar VavilapalliCc: common-dev@hadoop.apache.org; mapreduce-...@hadoop.apache.org; yarn-...@hadoop.apache.org Subject: Re: Setting JIRA fix versions for 3.0.0 releases I really, really want a 3.0.0-alpha1 ASAP, since it's basically impossible for downstreams to test incompat changes and new features without a release artifact. I've been doing test builds, and branch-3.0.0-alpha1 is ready for an RC besides possibly this fix version issue. I'm not too worried about splitting community bandwidth, for the following reasons: * 3.0.0-alpha1 is very explicitly an alpha, which means no quality or compatibility guarantees. It needs less vetting than a 2.x release. * Given that 3.0.0 is still in alpha, there aren't many true showstopper bugs. Most blockers I see are also apply to both 2.x as well as 3.0.0. * Community bandwidth isn't zero-sum. This particularly applies to people working on features that are only present in trunk, like EC, shell script rewrite, etc. Longer-term, I assume the 2.x line is not ending with 2.8. 
So we'd still have the issue of things committed for 2.9.0 that will be appearing for the first time in 3.0.0-alpha1. Assuming a script exists to fix up 2.9 JIRAs, it's only incrementally more work to also fix up 2.8 and other unreleased versions too. Best, Andrew On Thu, Jul 21, 2016 at 11:53 AM, Vinod Kumar Vavilapalli < vino...@apache.org> wrote: > The L & N fixes just went out, I’m working to push out 2.7.3 - running > into a Nexus issue. Once that goes out, I’ll immediately do a 2.8.0. > > Like I requested before in one of the 3.x threads, can we just line up > 3.0.0-alpha1 right behind 2.8.0? > > That simplifies most of this confusion, we can avoid splitting the > bandwidth from the community on fixing blockers / vetting these > concurrent releases. Waiting a little more for 3.0.0 alpha to avoid > most of this is worth it, IMO. > > Thanks > +Vinod > > > On Jul 21, 2016, at 11:34 AM, Andrew Wang > wrote: > > > > Hi all, > > > > Since we're planning to spin releases off of both branch-2 and > > trunk, the changelog for 3.0.0-alpha1 based on JIRA information > > isn't accurate. This is because historically, we've only set 2.x fix > > versions, and 2.8.0 and > > 2.9.0 and etc have not been released. So there's a whole bunch of > > changes which will show up for the first time in 3.0.0-alpha1. > > > > I think I can write a script to (carefully) add 3.0.0-alpha1 to > > these JIRAs, but I figured I'd give a heads up here in case anyone > > felt differently. I can also update the HowToCommit page to match. > > > > Thanks, > > Andrew > > - To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-dev-h...@hadoop.apache.org
RE: Not being able to add HDFS contributors
Thanks Andrew for the work around!! It works great ... Regards, Kai -Original Message- From: Andrew Wang [mailto:andrew.w...@cloudera.com] Sent: Wednesday, July 20, 2016 8:10 AM To: Chris Douglas <chris.doug...@gmail.com> Cc: Zheng, Kai <kai.zh...@intel.com>; common-dev@hadoop.apache.org Subject: Re: Not being able to add HDFS contributors What works for me is pasting in the JIRA userID, checking "ignore popups from this page" to quash the browser alerts, and then hitting the "update" button. What's broken is the username auto-complete, actually saving works fine. On Tue, Jul 19, 2016 at 5:08 PM, Chris Douglas <chris.doug...@gmail.com> wrote: > I had the same problem. Infra was able to add them, but I kept getting > an error. -C > > On Tue, Jul 19, 2016 at 2:29 PM, Zheng, Kai <kai.zh...@intel.com> wrote: > > Hi, > > > > I tried many times in the week at different time but just found it's > > not > possible to add more HDFS contributors. I can add some Hadoop ones, though. > It becomes an issue because without adding someone and assigning > issues to him first, he won't be able to work on it and upload patches ... > > > > Could anyone help look at this? Thx a lot! > > > > Regards, > > Kai > > > > - > To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org > For additional commands, e-mail: common-dev-h...@hadoop.apache.org > >
Not being able to add HDFS contributors
Hi, I tried many times this week, at different times, but found it's not possible to add more HDFS contributors. I can add some Hadoop ones, though. This is an issue because a new contributor can't work on an issue and upload patches until they have been added and the issue has been assigned to them. Could anyone help look at this? Thanks a lot! Regards, Kai
Clean up target/fix versions
Hi, I noticed it's pretty hard to pick a version (say 3.0-alpha1) in the fix/target version box in the JIRA system, since the list is pretty long and not well sorted. Could we clean it up, or re-sort it so that the most likely used versions are displayed first? Regards, Kai
RE: Different JIRA permissions for HADOOP and HDFS
Yeah, this would be great, so some guys like me won't need to trouble you asking the question again and again :). Thanks a lot. Regards, Kai -Original Message- From: Akira AJISAKA [mailto:ajisa...@oss.nttdata.co.jp] Sent: Monday, June 20, 2016 3:17 PM To: Zheng, Kai <kai.zh...@intel.com>; common-dev@hadoop.apache.org Subject: Re: Different JIRA permissions for HADOOP and HDFS There is no doc. 1. Login to ASF JIRA 2. Go to the project page (e.g. https://issues.apache.org/jira/browse/HADOOP ) 3. Hit "Administration" tab 4. Hit "Roles" tab in left side 5. Add administrators/committers/contributors I'll document this in https://wiki.apache.org/hadoop/HowToCommit Regards, Akira On 6/20/16 16:08, Zheng, Kai wrote: > Thanks Akira for the nice info. So where is the link to do it or any how to > doc? Sorry I browsed the existing wiki doc but didn't find how to add > contributors. > > Regards, > Kai > > -Original Message- > From: Akira AJISAKA [mailto:ajisa...@oss.nttdata.co.jp] > Sent: Monday, June 20, 2016 12:22 PM > To: Zheng, Kai <kai.zh...@intel.com>; common-dev@hadoop.apache.org > Subject: Re: Different JIRA permissions for HADOOP and HDFS > > Yes, the role allows committers to add/remove all the roles. > > Now about 400 accounts have contributors roles in Hadoop common, and about > 1000 contributors in history. > > Regards, > Akira > > On 6/19/16 19:43, Zheng, Kai wrote: >> Thanks Akira for the work. >> >> What the committer role can do in addition to the committing codes? Can the >> role allow to add/remove a contributor? As I said in my last email, I want >> to have some contributor(s) back and may add more in some time later. >> >> Not sure if we need to clean up long time no active contributors. It may be >> nice to know how many contributors the project has in its history. If the >> list is too long, maybe we can put them in another list, like >> OLD_CONTRIBUTORS. 
>> >> Regards, >> Kai >> >> -Original Message- >> From: Akira AJISAKA [mailto:ajisa...@oss.nttdata.co.jp] >> Sent: Saturday, June 18, 2016 12:56 PM >> To: Zheng, Kai <kai.zh...@intel.com>; common-dev@hadoop.apache.org >> Subject: Re: Different JIRA permissions for HADOOP and HDFS >> >> I'm doing the following steps to reduce the number of contributors: >> >> 1. Find committers who have only contributor role 2. Add them into >> committer role 3. Remove them from contributor role >> >> However, this is a temporary solution. >> Probably we need to do one of the followings in the near future. >> >> * Create contributor2 role to increase the limit >> * Remove contributors who have not been active for a long time >> >> Regards, >> Akira >> >> On 6/18/16 10:24, Zheng, Kai wrote: >>> Hi Akira, >>> >>> Some contributors (not committer) I know were found lost and we can't >>> assign tasks to. Any way I can add them or have to trouble others for that >>> each time when there is a new one? Thanks! >>> >>> Regards, >>> Kai >>> >>> -Original Message- >>> From: Akira AJISAKA [mailto:ajisa...@oss.nttdata.co.jp] >>> Sent: Monday, June 06, 2016 12:47 AM >>> To: common-dev@hadoop.apache.org >>> Subject: Re: Different JIRA permissions for HADOOP and HDFS >>> >>> Now I can't add any more contributors in HADOOP Common, so I'll remove the >>> contributors who have committers role to make the group smaller. >>> Please tell me if you have lost your roles by mistake. >>> >>> Regards, >>> Akira >>> >>> On 5/18/16 13:48, Akira AJISAKA wrote: >>>> In HADOOP/HDFS/MAPREDUCE/YARN, I removed the administrators from >>>> contributor group. After that, added Varun into contributor roles. >>>> # Ray is already added into contributor roles. >>>> >>>> Hi contributors/committers, please tell me if you have lost your >>>> roles by mistake. 
>>>> >>>>> just remove a big chunk of the committers from all the lists >>>> In Apache Hadoop Project bylaws, "A Committer is considered >>>> emeritus by their own declaration or by not contributing in any >>>> form to the project for over six months." Therefore we can remove >>>> them from the list, but I'm thinking this is the last option. >>>> >>>> Regards, >>>> Akira >>>> >>>> On 5/18/16 09:07, Allen Wittenauer wrote: >>>>> >>>>
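The temporary cleanup Akira describes in this thread (find committers who hold only the contributor role, add them to the committer role, then remove them from the contributor role) is plain set arithmetic. Here is a minimal sketch of that bookkeeping. It does not talk to JIRA at all; the function name and the user names are hypothetical.

```python
def plan_role_cleanup(contributors, committers, known_committers):
    """Plan the three cleanup steps without touching JIRA itself."""
    # Step 1: find committers who currently hold only the contributor role.
    misplaced = (known_committers & contributors) - committers
    # Step 2: add them to the committer role.
    new_committers = committers | misplaced
    # Step 3: remove them from the contributor role, freeing up slots there.
    new_contributors = contributors - misplaced
    return misplaced, new_contributors, new_committers


# Hypothetical role membership before the cleanup.
contributors = {"alice", "bob", "carol"}
committers = {"dave"}
known_committers = {"bob", "dave"}  # people who are committers on the project

moved, contributors_after, committers_after = plan_role_cleanup(
    contributors, committers, known_committers
)
print("moved:", sorted(moved))
print("contributor role:", sorted(contributors_after))
print("committer role:", sorted(committers_after))
```

Nobody loses access in this plan: every moved account ends up in the committer role, and the contributor role shrinks by exactly the number of accounts moved.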
RE: Different JIRA permissions for HADOOP and HDFS
Thanks Akira for the nice info. So where is the link to do it or any how to doc? Sorry I browsed the existing wiki doc but didn't find how to add contributors. Regards, Kai -Original Message- From: Akira AJISAKA [mailto:ajisa...@oss.nttdata.co.jp] Sent: Monday, June 20, 2016 12:22 PM To: Zheng, Kai <kai.zh...@intel.com>; common-dev@hadoop.apache.org Subject: Re: Different JIRA permissions for HADOOP and HDFS Yes, the role allows committers to add/remove all the roles. Now about 400 accounts have contributors roles in Hadoop common, and about 1000 contributors in history. Regards, Akira On 6/19/16 19:43, Zheng, Kai wrote: > Thanks Akira for the work. > > What the committer role can do in addition to the committing codes? Can the > role allow to add/remove a contributor? As I said in my last email, I want to > have some contributor(s) back and may add more in some time later. > > Not sure if we need to clean up long time no active contributors. It may be > nice to know how many contributors the project has in its history. If the > list is too long, maybe we can put them in another list, like > OLD_CONTRIBUTORS. > > Regards, > Kai > > -Original Message- > From: Akira AJISAKA [mailto:ajisa...@oss.nttdata.co.jp] > Sent: Saturday, June 18, 2016 12:56 PM > To: Zheng, Kai <kai.zh...@intel.com>; common-dev@hadoop.apache.org > Subject: Re: Different JIRA permissions for HADOOP and HDFS > > I'm doing the following steps to reduce the number of contributors: > > 1. Find committers who have only contributor role 2. Add them into > committer role 3. Remove them from contributor role > > However, this is a temporary solution. > Probably we need to do one of the followings in the near future. > > * Create contributor2 role to increase the limit > * Remove contributors who have not been active for a long time > > Regards, > Akira > > On 6/18/16 10:24, Zheng, Kai wrote: >> Hi Akira, >> >> Some contributors (not committer) I know were found lost and we can't assign >> tasks to. 
Any way I can add them or have to trouble others for that each >> time when there is a new one? Thanks! >> >> Regards, >> Kai >> >> -Original Message- >> From: Akira AJISAKA [mailto:ajisa...@oss.nttdata.co.jp] >> Sent: Monday, June 06, 2016 12:47 AM >> To: common-dev@hadoop.apache.org >> Subject: Re: Different JIRA permissions for HADOOP and HDFS >> >> Now I can't add any more contributors in HADOOP Common, so I'll remove the >> contributors who have committers role to make the group smaller. >> Please tell me if you have lost your roles by mistake. >> >> Regards, >> Akira >> >> On 5/18/16 13:48, Akira AJISAKA wrote: >>> In HADOOP/HDFS/MAPREDUCE/YARN, I removed the administrators from >>> contributor group. After that, added Varun into contributor roles. >>> # Ray is already added into contributor roles. >>> >>> Hi contributors/committers, please tell me if you have lost your >>> roles by mistake. >>> >>>> just remove a big chunk of the committers from all the lists >>> In Apache Hadoop Project bylaws, "A Committer is considered emeritus >>> by their own declaration or by not contributing in any form to the >>> project for over six months." Therefore we can remove them from the >>> list, but I'm thinking this is the last option. >>> >>> Regards, >>> Akira >>> >>> On 5/18/16 09:07, Allen Wittenauer wrote: >>>> >>>> We should probably just remove a big chunk of the committers from >>>> all the lists. Most of them have disappeared from Hadoop anyway. >>>> (The 55% growth in JIRA issues in patch available state in the past >>>> year alone a pretty good testament to that fact.) >>>> >>>>> On May 17, 2016, at 4:40 PM, Akira Ajisaka <aajis...@apache.org> wrote: >>>>> >>>>>> Is there some way for us to add a "Contributors2" group with the >>>>>> same permissions as a workaround? Or we could try to clean out >>>>>> contributors who are no longer active, but that might be hard to figure >>>>>> out. >>>>> >>>>> Contributors2 seems fine. 
AFAIK, committers sometimes cleaned out >>>>> contributors who are no longer active. >>>>> http://search-hadoop.com/m/uOzYt77s6mnzcRu1/v=threaded >>>>> >>>>> Another option: Can we remove committers from contributor group to >>>>> reduce the number of contributor
RE: A top container module like hadoop-cloud for cloud integration modules
Thanks Steve for the feedback and thoughts. Looks like people don't want to move around the related modules as it may not add much real value. It's fine. I may provide better thoughts later when learn the aspect deeper. Regards, Kai -Original Message- From: Steve Loughran [mailto:ste...@hortonworks.com] Sent: Wednesday, June 15, 2016 6:16 PM To: Zheng, Kai <kai.zh...@intel.com> Cc: common-dev@hadoop.apache.org Subject: Re: A top container module like hadoop-cloud for cloud integration modules > On 13 Jun 2016, at 14:02, Zheng, Kai <kai.zh...@intel.com> wrote: > > Hi, > > Noticed it's an obvious trend Hadoop is supporting more and more cloud > platforms, I suggest we have a top container module to hold such integration > modules, like the ones for aws, openstack, azure and upcoming one aliyun. The > rational is simple besides the trend: I'm kind of =0 right now > > 1. Existing modules are mixed in Hadoop-tools that becomes a little big > being of 18 modules now. Cloud specific ones can be grouped together and > separated out, making more sense; the reason for having separate hadoop-aws, hadoop-openstack modules was always to permit the modules to use APIs exclusive to cloud infrastructures, structure the downstream dependencies, *and* allow people like the EMR team to swap in their own closed-source version. I don't think anyone does that though. It also lets us completely isolate testing: each module's tests only run if you have the credentials. > > 2. Future abstraction and common specs & codes sharing could be easier > or thereafter allowed; Right now hadoop-common is where cross FS work and tests go. (Hint, reviewers for HADOOP-12807 needed.). I think we could start there with org.apache.hadoop.cloud package and only split it out if compilation ordering merits it —or it adds any dependencies to hadoop-common. > > 3. 
Common testing approach could be defined together, for example, some > mechanisms as discussed by Chris, Steve and Allen in HADOOP-12756; > In SPARK-7481 I've added downstream tests for S3a and azure in spark; this shows up that S3a in Hadoop 2.6 gets its blocksize wrong (0) in listings, so the splits are all 1 byte wrong; work dies. I think downstream tests in: Spark, Hive, etc would really round out cloud infra testing, but we can't put those into Hadoop as the build DAG prevents it. (Reviews for SPARK-7481 needed too, BTW). System tests of Aliyun and perhaps GFS connectors would need to go in there or in bigtop —which is the other place I've discussed having cloud integration tests. > 4. Documentation for "Hadoop on Cloud"? Not sure it's needed, as we > already have a section for "Hadoop compatible File Systems". Again, we can stick this in common > > If sounds good, the change would be a good fit for Hadoop 3.0, even though > the change should not involve big impact, as it can avoid affecting the > artifacts. It may cause some inconveniences for the current development > efforts, though. > I think it would make sense if other features went in. A good committer against object stores would be an example here: it depends on the MR libraries, so can't go into common.Today it'd have to go into hadoop-mapreduce. This isn't too bad, as long as the APIs it uses are all in hadoop-common. It's only as things get more complex that it matters. - To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-dev-h...@hadoop.apache.org
RE: Different JIRA permissions for HADOOP and HDFS
Thanks Akira for the work. What the committer role can do in addition to the committing codes? Can the role allow to add/remove a contributor? As I said in my last email, I want to have some contributor(s) back and may add more in some time later. Not sure if we need to clean up long time no active contributors. It may be nice to know how many contributors the project has in its history. If the list is too long, maybe we can put them in another list, like OLD_CONTRIBUTORS. Regards, Kai -Original Message- From: Akira AJISAKA [mailto:ajisa...@oss.nttdata.co.jp] Sent: Saturday, June 18, 2016 12:56 PM To: Zheng, Kai <kai.zh...@intel.com>; common-dev@hadoop.apache.org Subject: Re: Different JIRA permissions for HADOOP and HDFS I'm doing the following steps to reduce the number of contributors: 1. Find committers who have only contributor role 2. Add them into committer role 3. Remove them from contributor role However, this is a temporary solution. Probably we need to do one of the followings in the near future. * Create contributor2 role to increase the limit * Remove contributors who have not been active for a long time Regards, Akira On 6/18/16 10:24, Zheng, Kai wrote: > Hi Akira, > > Some contributors (not committer) I know were found lost and we can't assign > tasks to. Any way I can add them or have to trouble others for that each time > when there is a new one? Thanks! > > Regards, > Kai > > -Original Message- > From: Akira AJISAKA [mailto:ajisa...@oss.nttdata.co.jp] > Sent: Monday, June 06, 2016 12:47 AM > To: common-dev@hadoop.apache.org > Subject: Re: Different JIRA permissions for HADOOP and HDFS > > Now I can't add any more contributors in HADOOP Common, so I'll remove the > contributors who have committers role to make the group smaller. > Please tell me if you have lost your roles by mistake. > > Regards, > Akira > > On 5/18/16 13:48, Akira AJISAKA wrote: >> In HADOOP/HDFS/MAPREDUCE/YARN, I removed the administrators from >> contributor group. 
After that, added Varun into contributor roles. >> # Ray is already added into contributor roles. >> >> Hi contributors/committers, please tell me if you have lost your >> roles by mistake. >> >>> just remove a big chunk of the committers from all the lists >> In Apache Hadoop Project bylaws, "A Committer is considered emeritus >> by their own declaration or by not contributing in any form to the >> project for over six months." Therefore we can remove them from the >> list, but I'm thinking this is the last option. >> >> Regards, >> Akira >> >> On 5/18/16 09:07, Allen Wittenauer wrote: >>> >>> We should probably just remove a big chunk of the committers from >>> all the lists. Most of them have disappeared from Hadoop anyway. >>> (The 55% growth in JIRA issues in patch available state in the past >>> year alone a pretty good testament to that fact.) >>> >>>> On May 17, 2016, at 4:40 PM, Akira Ajisaka <aajis...@apache.org> wrote: >>>> >>>>> Is there some way for us to add a "Contributors2" group with the >>>>> same permissions as a workaround? Or we could try to clean out >>>>> contributors who are no longer active, but that might be hard to figure >>>>> out. >>>> >>>> Contributors2 seems fine. AFAIK, committers sometimes cleaned out >>>> contributors who are no longer active. >>>> http://search-hadoop.com/m/uOzYt77s6mnzcRu1/v=threaded >>>> >>>> Another option: Can we remove committers from contributor group to >>>> reduce the number of contributors? I've already removed myself from >>>> contributor group and it works well. >>>> >>>> Regards, >>>> Akira >>>> >>>> On 5/18/16 03:16, Robert Kanter wrote: >>>>> We've also had a related long-standing issue (or at least I have) >>>>> where I can't add any more contributors to HADOOP or HDFS because >>>>> JIRA times out on looking up their username. I'm guessing we have >>>>> too many contributors for those projects. I bet YARN and MAPREDUCE are >>>>> close. 
>>>>> Is there some way for us to add a "Contributors2" group with the >>>>> same permissions as a workaround? Or we could try to clean out >>>>> contributors who are no longer active, but that might be hard to figure >>>>> out. >>>>> >>>>> - Robert >>>>> >>>>> On Tue, May 17, 2016
RE: Different JIRA permissions for HADOOP and HDFS
HADOOP/MAPREDUCE/YARN. > > Regards, > Akira > > On 5/17/16 13:45, Vinayakumar B wrote: >> Hi Junping, >> >> It looks like, I too dont have permissions in projects except HDFS. >> >> Please grant me also to the group. >> >> Thanks in advance, >> -Vinay >> >> On 17 May 2016 6:10 a.m., "Sangjin Lee" <sj...@apache.org> wrote: >>> Thanks Junping! It seems to work now. >>> >>> On Mon, May 16, 2016 at 5:22 PM, Junping Du <j...@hortonworks.com> wrote: >>>> Someone fix the permission issue so that Administrator, committer and reporter can edit the issue now. >>>> >>>> Sangjin, it sounds like you were not in JIRA's committer list before and I just add you into committer roles for 4 projects. Hope it works for you now. >>>> >>>> Thanks, >>>> >>>> Junping >>>> -- >>>> *From:* sjl...@gmail.com <sjl...@gmail.com> on behalf of Sangjin Lee <sj...@apache.org> >>>> *Sent:* Monday, May 16, 2016 11:43 PM >>>> *To:* Zhihai Xu >>>> *Cc:* Junping Du; Arun Suresh; Zheng, Kai; Andrew Wang; common-dev@hadoop.apache.org; yarn-...@hadoop.apache.org >>>> *Subject:* Re: Different JIRA permissions for HADOOP and HDFS >>>> >>>> I also find myself unable to edit most of the JIRA fields, and that is across projects (HADOOP, YARN, MAPREDUCE, and HDFS). Commenting and the workflow buttons seem to work, however. >>>> >>>> On Mon, May 16, 2016 at 8:14 AM, Zhihai Xu <zhi...@uber.com> wrote: >>>>> Great, Thanks Junping! Yes, the JIRA assignment works for me now. >>>>> >>>>> zhihai >>>>> >>>>> On Mon, May 16, 2016 at 5:29 AM, Junping Du <j...@hortonworks.com> wrote: >>>>>> Zhihai, I just set you with committer permissions on MAPREDUCE JIRA. Would you try if the JIRA assignment works now? I cannot help on Hive
A top container module like hadoop-cloud for cloud integration modules
Hi,

Noticing the clear trend that Hadoop is supporting more and more cloud platforms, I suggest we have a top container module to hold such integration modules, like the ones for aws, openstack, azure and the upcoming one for aliyun. The rationale is simple, besides the trend:

1. Existing modules are mixed into Hadoop-tools, which has become a little big at 18 modules now. Cloud-specific ones can be grouped together and separated out, which makes more sense;
2. Future abstraction and sharing of common specs and code could be easier, or become possible at all;
3. A common testing approach could be defined together, for example, some mechanisms as discussed by Chris, Steve and Allen in HADOOP-12756;
4. Documentation for "Hadoop on Cloud"? Not sure it's needed, as we already have a section for "Hadoop compatible File Systems".

If this sounds good, the change would be a good fit for Hadoop 3.0, even though it should not have a big impact, as it can avoid affecting the artifacts. It may cause some inconvenience for the current development efforts, though.

Comments are welcome. Thanks!

Regards,
Kai
RE: About fix versions
Thanks a lot, Allen. This is comprehensive.

>> So if a patch has been committed to branch-2.8, branch-2, and trunk, then
>> the fix version should be 2.8.0 and only 2.8.0.

This sounds like exactly the rule I needed to know, though I guess it may change around the 3.0 release.

Regards,
Kai

-Original Message-
From: Allen Wittenauer [mailto:allenwittena...@yahoo.com]
Sent: Sunday, May 29, 2016 12:17 PM
To: Zheng, Kai <kai.zh...@intel.com>
Cc: common-dev@hadoop.apache.org
Subject: Re: About fix versions

> On May 28, 2016, at 5:13 PM, Zheng, Kai <kai.zh...@intel.com> wrote:
>
> Hi,
>
> This may be a stupid question but I want to make sure. What fix versions would we fill with when a committer just wants to commit a patch to trunk or branch-2 branch?

This is covered on the https://wiki.apache.org/hadoop/HowToCommit page:

== snip ==
Resolve the issue as fixed, thanking the contributor. Always set the "Fix Version" at this point, but please only set a single fix version, the earliest release in which the change will appear. Special case- when committing to a non-mainline branch (such as branch-0.22 or branch-0.23 ATM), please set fix-version to either 2.x.x or 3.x.x appropriately too.
== snip ==

So if a patch has been committed to branch-2.8, branch-2, and trunk, then the fix version should be 2.8.0 and only 2.8.0.

Bear in mind that this field determines what changes appear in the changelog and release notes. If this field is filled out incorrectly, then this commit will effectively be missing for end users or appear in the wrong version as ‘new’.

> I remembered it's a release manager's role to decide which jira/patch to include when working on a release. Would anyone help clarify a bit about this, thanks!

This is when cutting the release and has no bearing on what committers should be putting in JIRA. If an RM decides that a commit shouldn’t be in a release, they are responsible for reverting the commit and changing the fix version to whatever is appropriate.
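The "single fix version, the earliest release in which the change will appear" rule quoted above can be sketched in a few lines of Java. This is only an illustration under stated assumptions: the class `FixVersionPicker` and its method are hypothetical and are not part of JIRA or any Hadoop tooling, and it only understands plain dotted numeric versions.

```java
import java.util.Comparator;
import java.util.List;

/** Hypothetical helper: picks the single fix version for a commit. */
final class FixVersionPicker {

  /** Compares dotted numeric versions such as "2.8.0" and "3.0.0". */
  static final Comparator<String> BY_VERSION = (a, b) -> {
    String[] x = a.split("\\.");
    String[] y = b.split("\\.");
    int n = Math.max(x.length, y.length);
    for (int i = 0; i < n; i++) {
      // Missing components count as 0, so "2.8" equals "2.8.0".
      int xi = i < x.length ? Integer.parseInt(x[i]) : 0;
      int yi = i < y.length ? Integer.parseInt(y[i]) : 0;
      if (xi != yi) {
        return Integer.compare(xi, yi);
      }
    }
    return 0;
  };

  /** Returns the earliest release among those the commit will appear in. */
  static String earliest(List<String> releases) {
    return releases.stream()
        .min(BY_VERSION)
        .orElseThrow(() -> new IllegalArgumentException("no releases given"));
  }
}
```

For a patch committed to branch-2.8, branch-2 and trunk, and assuming (for illustration only) that those branches will next release as 2.8.0, 2.9.0 and 3.0.0, `FixVersionPicker.earliest(...)` picks 2.8.0, matching Allen's example.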
About fix versions
Hi, This may be a stupid question, but I want to make sure. Which fix versions should we fill in when a committer just wants to commit a patch to trunk or branch-2? I remember it's the release manager's role to decide which JIRA/patch to include when working on a release. Could anyone help clarify this a bit? Thanks! Regards, Kai
RE: Release numbering for 3.x leading up to GA
Thanks for thinking about this, Andrew and Zhe. I updated the patch for HADOOP-13010 today (it is rather large), and it would be great if we could get it in this week.

Regards,
Kai

-Original Message-
From: Zhe Zhang [mailto:zhe.zhang.resea...@gmail.com]
Sent: Friday, May 20, 2016 2:38 PM
To: Andrew Wang
Cc: Gangumalla, Uma ; Roman Shaposhnik ; Karthik Kambatla ; common-dev@hadoop.apache.org
Subject: Re: Release numbering for 3.x leading up to GA

Thanks Andrew. I also had a talk with Kai offline. Agreed that we should try our best to finalize coder config changes for alpha1.

On Tue, May 17, 2016 at 5:34 PM Andrew Wang wrote:
> The sooner the better for incompatible changes, but at this point we are explicitly not guaranteeing any compatibility between alpha releases.
>
> For EC, my understanding is that we're still working on the coder configuration. Given that we're still working on L changes, I think it's possible that the coder configuration will be finished in time.
>
> On Tue, May 17, 2016 at 4:42 PM, Zhe Zhang wrote:
> > Naming convention looks good. Thanks Andrew for driving this!
> >
> > Could you explain a little more on the criteria of cutting alpha1 / alpha2? What are the goals we want to achieve for alpha1? From EC's perspective, maybe we should target on having all compatibility-related changes in alpha1, like new config keys and fsimage format?
> >
> > Thanks,
> >
> > On Thu, May 12, 2016 at 11:35 AM Andrew Wang wrote:
> >> Hi folks,
> >>
> >> I think it's working, though it takes some time for the rename to propagate in JIRA. JIRA is also currently being hammered by spammers, which might be related.
> >>
> >> Anyway, the new "3.0.0-alpha1" version should be live for all four subprojects, so have at it!
> >>
> >> Best,
> >> Andrew
> >>
> >> On Thu, May 12, 2016 at 11:01 AM, Gangumalla, Uma <uma.ganguma...@intel.com> wrote:
> >> > Thanks Andrew for driving. Sounds good. Go ahead please.
> >> >
> >> > Good luck :-)
> >> >
> >> > Regards,
> >> > Uma
> >> >
> >> > On 5/12/16, 10:52 AM, "Andrew Wang" wrote:
> >> > >Hi all,
> >> > >
> >> > >Sounds like we have general agreement on this release numbering scheme for 3.x.
> >> > >
> >> > >I'm going to attempt some mvn and JIRA invocations to get the version numbers lined up for alpha1, wish me luck.
> >> > >
> >> > >Best,
> >> > >Andrew
> >> > >
> >> > >On Tue, May 3, 2016 at 9:52 AM, Roman Shaposhnik <ro...@shaposhnik.org> wrote:
> >> > >> On Tue, May 3, 2016 at 8:18 AM, Karthik Kambatla <ka...@cloudera.com> wrote:
> >> > >> > The naming scheme sounds good. Since we want to start out sooner, I am assuming we are not limiting ourselves to two alphas as the email might indicate.
> >> > >> >
> >> > >> > Also, as the release manager, can you elaborate on your definitions of alpha and beta? Specifically, when do we expect downstream projects to try and integrate and when we expect Hadoop users to try out the bits?
> >> > >>
> >> > >> Not to speak of all the downstream PMCs, but Bigtop project will jump on the first alpha the same way we jumped on the first alpha back in the 1 -> 2 transition period.
> >> > >>
> >> > >> Given that Bigtop currently integrates quite a bit of Hadoop ecosystem that work is going to produce valuable feedback that we plan to communicate to the individual PMCs. What PMCs do with that feedback, of course, will be up to them (obviously Bigtop can't take the ownership of issues that go outside of integration work between projects in the Hadoop ecosystem).
> >> > >>
> >> > >> Thanks,
> >> > >> Roman.
> >
> > --
> > Zhe Zhang
> > Apache Hadoop Committer
> > http://zhe-thoughts.github.io/about/ | @oldcap

--
Zhe Zhang
Apache Hadoop Committer
http://zhe-thoughts.github.io/about/ | @oldcap
RE: Different JIRA permissions for HADOOP and HDFS
It works for me now, thanks Andrew! Regards, Kai -Original Message- From: Andrew Wang [mailto:andrew.w...@cloudera.com] Sent: Monday, May 16, 2016 12:14 AM To: Zheng, Kai <kai.zh...@intel.com> Cc: common-dev@hadoop.apache.org Subject: Re: Different JIRA permissions for HADOOP and HDFS I just gave you committer permissions on JIRA, try now? On Mon, May 16, 2016 at 12:03 AM, Zheng, Kai <kai.zh...@intel.com> wrote: > I just ran into the bad situation that I committed HDFS-8449 but can't > resolve the issue due to lacking the required permission to me. Am not > sure if it's caused by my setup or environment change (temporally > working in a new time zone). Would anyone help resolve the issue for > me to avoid bad state? Thanks! > > -----Original Message- > From: Zheng, Kai [mailto:kai.zh...@intel.com] > Sent: Sunday, May 15, 2016 3:20 PM > To: Allen Wittenauer <allenwittena...@yahoo.com.INVALID> > Cc: common-dev@hadoop.apache.org > Subject: RE: Different JIRA permissions for HADOOP and HDFS > > Thanks Allen for illustrating this in details. I understand. The left > question is, is it intended only JIRA owner (not sure about admin > users) can do the operations like updating a patch? > > Regards, > Kai > > -Original Message- > From: Allen Wittenauer [mailto:allenwittena...@yahoo.com.INVALID] > Sent: Saturday, May 14, 2016 9:38 AM > To: Zheng, Kai <kai.zh...@intel.com> > Cc: common-dev@hadoop.apache.org > Subject: Re: Different JIRA permissions for HADOOP and HDFS > > > > On May 14, 2016, at 7:07 AM, Zheng, Kai <kai.zh...@intel.com> wrote: > > > > Hi, > > > > Noticed this difference but not sure if it’s intended. YARN is > > similar > with HDFS. It’s not convenient. Any clarifying? > > > Under JIRA, different projects (e.g., HADOOP, YARN, MAPREDUCE, > HDFS, YETUS, HBASE, ACCUMULO, etc) may have different settings. At > one point in time, all of the Hadoop subprojects were under one JIRA > project (HADOOP). 
But then a bunch of folks decided they didn’t want > to see the other sub projects issues so they split them up…. and thus > setting the stage for duplicate code and operational divergence in the source. > > Since people don’t realize or care that they are separate, > people will file INFRA tickets or whatever to change “their project” > and not the rest. This leads to the JIRA projects also diverging… > which ultimately drives those of us who actually look at the project as a > whole bonkers.
RE: Different JIRA permissions for HADOOP and HDFS
I just ran into the bad situation that I committed HDFS-8449 but can't resolve the issue due to lacking the required permission to me. Am not sure if it's caused by my setup or environment change (temporally working in a new time zone). Would anyone help resolve the issue for me to avoid bad state? Thanks! -Original Message- From: Zheng, Kai [mailto:kai.zh...@intel.com] Sent: Sunday, May 15, 2016 3:20 PM To: Allen Wittenauer <allenwittena...@yahoo.com.INVALID> Cc: common-dev@hadoop.apache.org Subject: RE: Different JIRA permissions for HADOOP and HDFS Thanks Allen for illustrating this in details. I understand. The left question is, is it intended only JIRA owner (not sure about admin users) can do the operations like updating a patch? Regards, Kai -Original Message- From: Allen Wittenauer [mailto:allenwittena...@yahoo.com.INVALID] Sent: Saturday, May 14, 2016 9:38 AM To: Zheng, Kai <kai.zh...@intel.com> Cc: common-dev@hadoop.apache.org Subject: Re: Different JIRA permissions for HADOOP and HDFS > On May 14, 2016, at 7:07 AM, Zheng, Kai <kai.zh...@intel.com> wrote: > > Hi, > > Noticed this difference but not sure if it’s intended. YARN is similar with > HDFS. It’s not convenient. Any clarifying? Under JIRA, different projects (e.g., HADOOP, YARN, MAPREDUCE, HDFS, YETUS, HBASE, ACCUMULO, etc) may have different settings. At one point in time, all of the Hadoop subprojects were under one JIRA project (HADOOP). But then a bunch of folks decided they didn’t want to see the other sub projects issues so they split them up…. and thus setting the stage for duplicate code and operational divergence in the source. Since people don’t realize or care that they are separate, people will file INFRA tickets or whatever to change “their project” and not the rest. This leads to the JIRA projects also diverging… which ultimately drives those of us who actually look at the project as a whole bonkers. 
RE: Different JIRA permissions for HADOOP and HDFS
Thanks Allen for illustrating this in details. I understand. The left question is, is it intended only JIRA owner (not sure about admin users) can do the operations like updating a patch? Regards, Kai -Original Message- From: Allen Wittenauer [mailto:allenwittena...@yahoo.com.INVALID] Sent: Saturday, May 14, 2016 9:38 AM To: Zheng, Kai <kai.zh...@intel.com> Cc: common-dev@hadoop.apache.org Subject: Re: Different JIRA permissions for HADOOP and HDFS > On May 14, 2016, at 7:07 AM, Zheng, Kai <kai.zh...@intel.com> wrote: > > Hi, > > Noticed this difference but not sure if it’s intended. YARN is similar with > HDFS. It’s not convenient. Any clarifying? Under JIRA, different projects (e.g., HADOOP, YARN, MAPREDUCE, HDFS, YETUS, HBASE, ACCUMULO, etc) may have different settings. At one point in time, all of the Hadoop subprojects were under one JIRA project (HADOOP). But then a bunch of folks decided they didn’t want to see the other sub projects issues so they split them up…. and thus setting the stage for duplicate code and operational divergence in the source. Since people don’t realize or care that they are separate, people will file INFRA tickets or whatever to change “their project” and not the rest. This leads to the JIRA projects also diverging… which ultimately drives those of us who actually look at the project as a whole bonkers.
RE: Different JIRA permissions for HADOOP and HDFS
Yeah, kind of embarrassing. Thanks Ted.

Or simply: when logged in and viewing HADOOP or HDFS JIRAs like the ones below, note that the allowed operations are quite different, and the most useful operations, such as attaching, are not shown for HDFS JIRAs. I wonder whether they were disabled recently for some reason?

https://issues.apache.org/jira/browse/HADOOP-12782
https://issues.apache.org/jira/browse/HDFS-10285

Regards,
Kai

From: Ted Yu [mailto:yuzhih...@gmail.com]
Sent: Saturday, May 14, 2016 7:28 AM
To: Zheng, Kai <kai.zh...@intel.com>
Cc: common-dev@hadoop.apache.org
Subject: Re: Different JIRA permissions for HADOOP and HDFS

Looks like you attached some images which didn't go through. Consider using 3rd party image site.

Cheers

On Sat, May 14, 2016 at 7:07 AM, Zheng, Kai <kai.zh...@intel.com> wrote:
> Hi,
>
> Noticed this difference but not sure if it’s intended. YARN is similar with HDFS. It’s not convenient. Any clarifying?
>
> Thanks.
>
> -kai
Different JIRA permissions for HADOOP and HDFS
Hi, Noticed this difference but not sure if it's intended. YARN is similar with HDFS. It's not convenient. Any clarifying? Thanks. -kai
RE: Release numbering for 3.x leading up to GA
Ok, got it. Thanks for the explanation.

Regards,
Kai

From: Andrew Wang [mailto:andrew.w...@cloudera.com]
Sent: Tuesday, May 03, 2016 5:41 AM
To: Zheng, Kai <kai.zh...@intel.com>
Cc: common-dev@hadoop.apache.org
Subject: Re: Release numbering for 3.x leading up to GA

> >> but I'm going to spend time on our first RC this week.
> Sorry what does this mean? Did you mean the first RC version or 3.0.0-alpha1 will be cut out this week? Anyway will try to get some tasks done sooner.

First RC for whatever we name the first 3.0 alpha release. There's no need to rush things to make this first alpha, since there are more alphas planned. That said, if you have changes that affect compatibility, the sooner the better :)
RE: Release numbering for 3.x leading up to GA
Thanks for driving this, Andrew. Sounds great.

>> but I'm going to spend time on our first RC this week.

Sorry, what does this mean? Do you mean the first RC version, or that 3.0.0-alpha1 will be cut this week? Anyway, I will try to get some tasks done sooner.

Regards,
Kai

-Original Message-
From: Andrew Wang [mailto:andrew.w...@cloudera.com]
Sent: Tuesday, May 03, 2016 5:07 AM
To: Roman Shaposhnik
Cc: common-dev@hadoop.apache.org
Subject: Re: Release numbering for 3.x leading up to GA

On Mon, May 2, 2016 at 2:00 PM, Roman Shaposhnik wrote:
> On Mon, May 2, 2016 at 1:50 PM, Andrew Wang wrote:
> > Hi all,
> >
> > I wanted to confirm my version numbering plan for Hadoop 3.x. We had a related thread on this topic about a year ago, mostly focusing on the branch-2 maintenance releases:
> >
> > http://mail-archives.apache.org/mod_mbox/hadoop-common-dev/201504.mbox/%3CFAA30A53-C2BD-4380-9245-C8DBEC7BF386%40hortonworks.com%3E
> >
> > For Hadoop 3, I wanted to do something like scheme (2) in Vinod's original email from the above thread. e.g. leading up to GA, we'd have:
> >
> > 3.0.0-alpha1
> > 3.0.0-alpha2
> > 3.0.0-beta1
> > 3.0.0
>
> +1 on the naming scheme. Also (and I know this is an impossible question to answer, but still) what are the rough timing expectations on these?

Thanks Roman. Regarding release timing, this is what we discussed on another thread:

> For exit criteria, how about we time box it? My plan was to do monthly alphas through the summer, leading up to beta in late August / early Sep. At that point we freeze and stabilize for GA in Nov/Dec.

As you say, release plans need to be flexible, but I'm going to spend time on our first RC this week.
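The intended ordering of the scheme quoted above (3.0.0-alpha1, 3.0.0-alpha2, 3.0.0-beta1, then 3.0.0 as GA) can be made concrete with a small comparator. This is a minimal sketch, not anything used by the Hadoop build or by JIRA: the class `PreReleaseOrder` is hypothetical, and it assumes qualifiers are a stage name followed by an optional number, with stage names compared alphabetically (which happens to put "alpha" before "beta").

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical comparator for versions such as "3.0.0-alpha1" and "3.0.0". */
final class PreReleaseOrder implements Comparator<String> {

  @Override
  public int compare(String a, String b) {
    String[] pa = a.split("-", 2);
    String[] pb = b.split("-", 2);
    int base = compareNumeric(pa[0], pb[0]);
    if (base != 0) {
      return base;
    }
    boolean gaA = pa.length == 1;
    boolean gaB = pb.length == 1;
    if (gaA || gaB) {
      // A GA version has no qualifier and sorts after any alpha/beta
      // that shares the same base version.
      return Boolean.compare(gaA, gaB);
    }
    return compareQualifier(pa[1], pb[1]);
  }

  private static int compareNumeric(String x, String y) {
    String[] xs = x.split("\\.");
    String[] ys = y.split("\\.");
    for (int i = 0; i < Math.max(xs.length, ys.length); i++) {
      int xi = i < xs.length ? Integer.parseInt(xs[i]) : 0;
      int yi = i < ys.length ? Integer.parseInt(ys[i]) : 0;
      if (xi != yi) {
        return Integer.compare(xi, yi);
      }
    }
    return 0;
  }

  private static int compareQualifier(String x, String y) {
    int stage = stageOf(x).compareTo(stageOf(y));
    return stage != 0 ? stage : Integer.compare(numberOf(x), numberOf(y));
  }

  /** "alpha2" -> "alpha". */
  private static String stageOf(String qualifier) {
    return qualifier.replaceAll("[0-9]+$", "");
  }

  /** "alpha2" -> 2; a qualifier without digits counts as 0. */
  private static int numberOf(String qualifier) {
    String digits = qualifier.substring(stageOf(qualifier).length());
    return digits.isEmpty() ? 0 : Integer.parseInt(digits);
  }

  /** Returns a sorted copy, earliest release first. */
  static List<String> sorted(List<String> versions) {
    List<String> copy = new ArrayList<>(versions);
    copy.sort(new PreReleaseOrder());
    return copy;
  }
}
```

The design point is the GA special case: a plain string sort would put "3.0.0" before "3.0.0-alpha1", which is the opposite of the release order proposed in the thread.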
RE: hadoop 2.7.2 build failure with error "plugin descriptor"
Looks like you missed some Maven plugin. It may be helpful to run `mvn install` before building.

-Original Message-
From: ? ? [mailto:yu20...@hotmail.com]
Sent: Tuesday, March 29, 2016 3:39 PM
To: common-dev@hadoop.apache.org
Subject: hadoop 2.7.2 build failure with error "plugin descriptor"

Hi guys,

When trying to build hadoop 2.7.2 on Ubuntu 5.10, I met following problem.

[INFO]
[INFO] Reactor Summary:
[INFO]
[INFO] Apache Hadoop Main . SUCCESS [ 0.281 s]
[INFO] Apache Hadoop Project POM .. SUCCESS [ 0.422 s]
[INFO] Apache Hadoop Annotations .. SUCCESS [ 0.388 s]
[INFO] Apache Hadoop Project Dist POM . SUCCESS [ 0.056 s]
[INFO] Apache Hadoop Assemblies ... SUCCESS [ 0.078 s]
[INFO] Apache Hadoop Maven Plugins SUCCESS [ 0.179 s]
[INFO] Apache Hadoop MiniKDC .. SUCCESS [ 0.388 s]
[INFO] Apache Hadoop Auth . SUCCESS [ 0.150 s]
[INFO] Apache Hadoop Auth Examples SUCCESS [ 0.074 s]
[INFO] Apache Hadoop Common ... FAILURE [ 0.002 s]
[INFO] Apache Hadoop NFS .. SKIPPED
[INFO] Apache Hadoop KMS .. SKIPPED
[INFO] Apache Hadoop Common Project ... SKIPPED
[INFO] Apache Hadoop HDFS . SKIPPED
[INFO] Apache Hadoop HttpFS ... SKIPPED
[INFO] Apache Hadoop HDFS BookKeeper Journal .. SKIPPED
[INFO] Apache Hadoop HDFS-NFS . SKIPPED
[INFO] Apache Hadoop HDFS Project . SKIPPED
[INFO]
[INFO] BUILD FAILURE
[INFO]
[INFO] Total time: 4.249 s
[INFO] Finished at: 2016-03-29T15:35:05+08:00
[INFO] Final Memory: 39M/215M
[INFO]
[ERROR] Failed to parse plugin descriptor for org.apache.hadoop:hadoop-maven-plugins:2.7.2 (/home/opensrc/hadoop-2.7.2-src/hadoop-maven-plugins/target/classes): No plugin descriptor found at META-INF/maven/plugin.xml -> [Help 1]
org.apache.maven.plugin.PluginDescriptorParsingException: Failed to parse plugin descriptor for org.apache.hadoop:hadoop-maven-plugins:2.7.2 (/home/opensrc/hadoop-2.7.2-src/hadoop-maven-plugins/target/classes): No plugin descriptor found at META-INF/maven/plugin.xml
 at org.apache.maven.plugin.internal.DefaultMavenPluginManager.extractPluginDescriptor(DefaultMavenPluginManager.java:249)
 at org.apache.maven.plugin.internal.DefaultMavenPluginManager.getPluginDescriptor(DefaultMavenPluginManager.java:184)
 at org.apache.maven.plugin.internal.DefaultMavenPluginManager.getMojoDescriptor(DefaultMavenPluginManager.java:298)
 at org.apache.maven.plugin.DefaultBuildPluginManager.getMojoDescriptor(DefaultBuildPluginManager.java:241)
 at org.apache.maven.lifecycle.internal.DefaultLifecycleExecutionPlanCalculator.setupMojoExecution(DefaultLifecycleExecutionPlanCalculator.java:169)
 at org.apache.maven.lifecycle.internal.DefaultLifecycleExecutionPlanCalculator.setupMojoExecutions(DefaultLifecycleExecutionPlanCalculator.java:155)
 at org.apache.maven.lifecycle.internal.DefaultLifecycleExecutionPlanCalculator.calculateExecutionPlan(DefaultLifecycleExecutionPlanCalculator.java:131)
 at org.apache.maven.lifecycle.internal.DefaultLifecycleExecutionPlanCalculator.calculateExecutionPlan(DefaultLifecycleExecutionPlanCalculator.java:145)
 at org.apache.maven.lifecycle.internal.builder.BuilderCommon.resolveBuildPlan(BuilderCommon.java:96)
 at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:109)
 at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:80)
 at org.apache.maven.lifecycle.internal.builder.singlethreaded.SingleThreadedBuilder.build(SingleThreadedBuilder.java:51)
 at org.apache.maven.lifecycle.internal.LifecycleStarter.execute(LifecycleStarter.java:128)
 at org.apache.maven.DefaultMaven.doExecute(DefaultMaven.java:307)
 at org.apache.maven.DefaultMaven.doExecute(DefaultMaven.java:193)
 at org.apache.maven.DefaultMaven.execute(DefaultMaven.java:106)
 at org.apache.maven.cli.MavenCli.execute(MavenCli.java:862)
 at org.apache.maven.cli.MavenCli.doMain(MavenCli.java:286)
 at org.apache.maven.cli.MavenCli.main(MavenCli.java:197)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
 at
RE: [VOTE] Accept Chimera as new Apache Commons Component
Nice proposal. It is nice for the high-performance crypto offering to be broadly accessible. Thanks. Non-binding +1.

Regards,
Kai

-Original Message-
From: Chris Nauroth [mailto:cnaur...@hortonworks.com]
Sent: Wednesday, March 23, 2016 5:47 AM
To: common-dev@hadoop.apache.org; Commons Developers List
Subject: Re: [VOTE] Accept Chimera as new Apache Commons Component

+1 (non-binding)

--Chris Nauroth

On 3/21/16, 1:45 AM, "Benedikt Ritter" wrote:
>Hi all,
>
>after long discussions I think we have gathered enough information to decide whether we want to accept the Chimera project as a new Apache Commons component.
>
>Proposed name: Apache Commons Crypto
>Proposal text: https://github.com/intel-hadoop/chimera/blob/master/PROPOSAL.html
>Initial Code Base: https://github.com/intel-hadoop/chimera/
>Initial Committers (Names in alphabetical order):
>- Aaron T. Myers (a...@apache.org, Apache Hadoop PMC, one of the original Crypto dev team in Apache Hadoop)
>- Andrew Wang (w...@apache.org, Apache Hadoop PMC, one of the original Crypto dev team in Apache Hadoop)
>- Chris Nauroth (cnaur...@apache.org, Apache Hadoop PMC and active reviewer)
>- Colin P. McCabe (cmcc...@apache.org, Apache Hadoop PMC, one of the original Crypto dev team in Apache Hadoop)
>- Dapeng Sun (s...@apache.org, Apache Sentry Committer, Chimera contributor)
>- Dian Fu (dia...@apache.org, Apache Sqoop Committer, Chimera contributor)
>- Dong Chen (do...@apache.org, Apache Hive Committer,interested on Chimera)
>- Ferdinand Xu (x...@apache.org, Apache Hive Committer, Chimera contributor)
>- Haifeng Chen (haifengc...@apache.org, Chimera lead and code contributor)
>- Marcelo Vanzin (Apache Spark Committer, Chimera contributor)
>- Uma Maheswara Rao G (umamah...@apache.org, Apache Hadoop PMC, One of the original Crypto dev/review team in Apache Hadoop)
>- Yi Liu (y...@apache.org, Apache Hadoop PMC, One of the original Crypto dev/review team in Apache Hadoop)
>
>Please review the proposal and vote.
>This vote will close no sooner than 72 hours from now, i.e. after 0900 GMT 24-Mar 2016
>
> [ ] +1 Accept Chimera as new Apache Commons Component
> [ ] +0 OK, but...
> [ ] -0 OK, but really should fix...
> [ ] -1 I oppose this because...
>
>Thank you!
>Benedikt
RE: Style checking related to getters
Thanks Uma. HADOOP-12859 was created for this. To correct myself, it's actually related to class setters.

Regards,
Kai

-Original Message-
From: Gangumalla, Uma [mailto:uma.ganguma...@intel.com]
Sent: Tuesday, March 01, 2016 8:22 AM
To: common-dev@hadoop.apache.org
Subject: Re: Style checking related to getters

+1 for disabling them.

Regards,
Uma

On 2/29/16, 11:16 AM, "Andrew Wang" <andrew.w...@cloudera.com> wrote:
>Hi Kai,
>
>Could you file a JIRA and post patch to disable that checkstyle rule? You can look at HADOOP-12713 for an example. Ping me and I'll review.
>
>Best,
>Andrew
>
>On Sun, Feb 28, 2016 at 11:28 PM, Zheng, Kai <kai.zh...@intel.com> wrote:
>> Hi,
>>
>> I'm wondering if we could get rid of the style checking in getters like the following (from HDFS-9733). It's annoying because it's a common Java practice and widely used in the project.
>>
>> void setBlockLocations(LocatedBlocks blockLocations) {:42: 'blockLocations' hides a field.
>> void setTimeout(int timeout) {:25: 'timeout' hides a field.
>> void setLocatedBlocks(List locatedBlocks) {:46: 'locatedBlocks' hides a field.
>> void setRemaining(long remaining) {:28: 'remaining' hides a field.
>> void setBytesPerCRC(int bytesPerCRC) {:29: 'bytesPerCRC' hides a field.
>> void setCrcType(DataChecksum.Type crcType) {:39: 'crcType' hides a field.
>> void setCrcPerBlock(long crcPerBlock) {:30: 'crcPerBlock' hides a field.
>> void setRefetchBlocks(boolean refetchBlocks) {:35: 'refetchBlocks' hides a field.
>> void setLastRetriedIndex(int lastRetriedIndex) {:34: 'lastRetriedIndex' hides a field.
>>
>> Regards,
>> Kai
RE: Style checking related to getters
Thanks Andrew for confirming. Yes, I will do that.

Regards,
Kai

-Original Message-
From: Andrew Wang [mailto:andrew.w...@cloudera.com]
Sent: Tuesday, March 01, 2016 3:16 AM
To: common-dev@hadoop.apache.org
Subject: Re: Style checking related to getters

Hi Kai,

Could you file a JIRA and post patch to disable that checkstyle rule? You can look at HADOOP-12713 for an example. Ping me and I'll review.

Best,
Andrew

On Sun, Feb 28, 2016 at 11:28 PM, Zheng, Kai <kai.zh...@intel.com> wrote:
> Hi,
>
> I'm wondering if we could get rid of the style checking in getters like the following (from HDFS-9733). It's annoying because it's a common Java practice and widely used in the project.
>
> void setBlockLocations(LocatedBlocks blockLocations) {:42: 'blockLocations' hides a field.
> void setTimeout(int timeout) {:25: 'timeout' hides a field.
> void setLocatedBlocks(List locatedBlocks) {:46: 'locatedBlocks' hides a field.
> void setRemaining(long remaining) {:28: 'remaining' hides a field.
> void setBytesPerCRC(int bytesPerCRC) {:29: 'bytesPerCRC' hides a field.
> void setCrcType(DataChecksum.Type crcType) {:39: 'crcType' hides a field.
> void setCrcPerBlock(long crcPerBlock) {:30: 'crcPerBlock' hides a field.
> void setRefetchBlocks(boolean refetchBlocks) {:35: 'refetchBlocks' hides a field.
> void setLastRetriedIndex(int lastRetriedIndex) {:34: 'lastRetriedIndex' hides a field.
>
> Regards,
> Kai
Style checking related to getters
Hi,

I'm wondering if we could get rid of the style checking in getters like the following (from HDFS-9733). It's annoying because it's a common Java practice and widely used in the project.

void setBlockLocations(LocatedBlocks blockLocations) {:42: 'blockLocations' hides a field.
void setTimeout(int timeout) {:25: 'timeout' hides a field.
void setLocatedBlocks(List locatedBlocks) {:46: 'locatedBlocks' hides a field.
void setRemaining(long remaining) {:28: 'remaining' hides a field.
void setBytesPerCRC(int bytesPerCRC) {:29: 'bytesPerCRC' hides a field.
void setCrcType(DataChecksum.Type crcType) {:39: 'crcType' hides a field.
void setCrcPerBlock(long crcPerBlock) {:30: 'crcPerBlock' hides a field.
void setRefetchBlocks(boolean refetchBlocks) {:35: 'refetchBlocks' hides a field.
void setLastRetriedIndex(int lastRetriedIndex) {:34: 'lastRetriedIndex' hides a field.

Regards,
Kai
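For readers unfamiliar with the warning, the "hides a field" messages above look like the output of Checkstyle's HiddenField check, which reports a parameter or local variable that has the same name as a field of the class. A minimal sketch of the pattern follows; the class `ChecksumSettings` is made up for illustration and is not the HDFS-9733 code.

```java
/** Hypothetical class showing the setter style that triggers the warning. */
final class ChecksumSettings {
  private int bytesPerCRC;
  private long crcPerBlock;

  // The parameter has the same name as the field, so a hidden-field
  // check reports this line. The code is still correct, because
  // 'this.' selects the field explicitly.
  void setBytesPerCRC(int bytesPerCRC) {
    this.bytesPerCRC = bytesPerCRC;
  }

  // Renaming the parameter silences the warning, at the cost of a
  // less conventional setter signature.
  void setCrcPerBlock(long newCrcPerBlock) {
    crcPerBlock = newCrcPerBlock;
  }

  int getBytesPerCRC() {
    return bytesPerCRC;
  }

  long getCrcPerBlock() {
    return crcPerBlock;
  }
}
```

Checkstyle documents an `ignoreSetter` property on that check for exactly this idiom; whether the patch eventually posted for this thread used that property or disabled the rule outright is not stated here.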
RE: Introduce Apache Kerby to Hadoop
Hi Haohui, I'm glad to know GRPC and it sounds cool. I think it's a good proposal to suggest Hadoop IPC/RPC upgrading to GRPC. We haven't evaluated GRPC for the question of RPC encryption optimization because it's another story. It's not an overlap for the optimization work because even if we use GRPC, the RPC protocol messages still need to go through the stack of SASL/GSSAPI/Kerberos. What's desired here is not to re-implement any RPC layer, or the stack, but is to optimize the stack, by possibly implementing and plugin-ing new SASL or GSSAPI mechanism. Hope this clarifying helps. Thanks. Regards, Kai -Original Message- From: Haohui Mai [mailto:ricet...@gmail.com] Sent: Sunday, February 28, 2016 3:02 AM To: common-dev@hadoop.apache.org Subject: Re: Introduce Apache Kerby to Hadoop Have we evaluated GRPC? A robust RPC requires significant effort. Migrating to GRPC can save ourselves a lot of headache. Haohui On Sat, Feb 27, 2016 at 1:35 AM Andrew Purtell <andrew.purt...@gmail.com> wrote: > I get a excited thinking about the prospect of better performance with > auth-conf QoP. HBase RPC is an increasingly distant fork but still > close enough to Hadoop in that respect. Our bulk data transfer > protocol isn't a separate thing like in HDFS, which avoids a SASL > wrapped implementation, so we really suffer when auth-conf is > negotiated. You'll see the same impact where there might be a high > frequency of NameNode RPC calls or similar still. Throughput drops 3-4x, or > worse. > > > On Feb 22, 2016, at 4:56 PM, Zheng, Kai <kai.zh...@intel.com> wrote: > > > > Thanks for the confirm and further inputs, Steve. > > > >>> the latter would dramatically reduce the cost of wire-encrypting IPC. > > Yes to optimize Hadoop IPC/RPC encryption is another opportunity > > Kerby > can help with, it's possible because we may hook Chimera or AES-NI > thing into the Kerberos layer by leveraging the Kerberos library. As > it may be noted, HADOOP-12725 is on the going for this aspect. 
There > may be good result and further update on this recently. > > > >>> For now, I'd like to see basic steps -upgrading minkdc to krypto, > >>> see > how it works. > > Yes, starting with this initial steps upgrading MiniKDC to use Kerby > > is > the right thing we could do. After some interactions with Kerby > project, we may have more ideas how to proceed on the followings. > > > >>> Long term, I'd like Hadoop 3 to be Kerby-ized > > This sounds great! With necessary support from the community like > feedback and patch reviewing, we can speed up the related work. > > > > Regards, > > Kai > > > > -Original Message- > > From: Steve Loughran [mailto:ste...@hortonworks.com] > > Sent: Monday, February 22, 2016 6:51 PM > > To: common-dev@hadoop.apache.org > > Subject: Re: Introduce Apache Kerby to Hadoop > > > > > > > > I've discussed this offline with Kai, as part of the "let's fix > kerberos" project. Not only is it a better Kerberos engine, we can do > more diagnostics, get better algorithms and ultimately get better APIs > for doing Kerberos and SASL —the latter would dramatically reduce the > cost of wire-encrypting IPC. > > > > For now, I'd like to see basic steps -upgrading minkdc to krypto, > > see > how it works. > > > > Long term, I'd like Hadoop 3 to be Kerby-ized > > > > > >> On 22 Feb 2016, at 06:41, Zheng, Kai <kai.zh...@intel.com> wrote: > >> > >> Hi folks, > >> > >> I'd like to mention Apache Kerby [1] here to the community and > >> propose > to introduce the project to Hadoop, a sub project of Apache Directory > project. > >> > >> Apache Kerby is a Kerberos centric project and aims to provide a > >> first > Java Kerberos library that contains both client and server supports. 
> The relevant features include: > >> It supports full Kerberos encryption types aligned with both MIT > >> KDC and MS AD; Client APIs to allow to login via password, > >> credential cache, keytab file and etc.; Utilities for generate, > >> operate and inspect keytab and credential cache files; A simple KDC > >> server that borrows some ideas from Hadoop-MiniKDC and can be used > >> in tests but with minimal overhead in external dependencies; A > >> brand new token > mechanism is provided, can be experimentally used, using it a JWT > token can be used to exchange a TGT or service ticket; Anonymous > PKINIT support, can be experientially used, as the first Java library > that supports the Kerberos major ext
RE: Introduce Apache Kerby to Hadoop
Thanks Andrew for the update on HBase side! >> Throughput drops 3-4x, or worse. Hopefully we can avoid much of the encryption overhead. We're prototyping a solution working on that. Regards, Kai -Original Message- From: Andrew Purtell [mailto:andrew.purt...@gmail.com] Sent: Saturday, February 27, 2016 5:35 PM To: common-dev@hadoop.apache.org Subject: Re: Introduce Apache Kerby to Hadoop I get a excited thinking about the prospect of better performance with auth-conf QoP. HBase RPC is an increasingly distant fork but still close enough to Hadoop in that respect. Our bulk data transfer protocol isn't a separate thing like in HDFS, which avoids a SASL wrapped implementation, so we really suffer when auth-conf is negotiated. You'll see the same impact where there might be a high frequency of NameNode RPC calls or similar still. Throughput drops 3-4x, or worse. > On Feb 22, 2016, at 4:56 PM, Zheng, Kai <kai.zh...@intel.com> wrote: > > Thanks for the confirm and further inputs, Steve. > >>> the latter would dramatically reduce the cost of wire-encrypting IPC. > Yes to optimize Hadoop IPC/RPC encryption is another opportunity Kerby can > help with, it's possible because we may hook Chimera or AES-NI thing into the > Kerberos layer by leveraging the Kerberos library. As it may be noted, > HADOOP-12725 is on the going for this aspect. There may be good result and > further update on this recently. > >>> For now, I'd like to see basic steps -upgrading minkdc to krypto, see how >>> it works. > Yes, starting with this initial steps upgrading MiniKDC to use Kerby is the > right thing we could do. After some interactions with Kerby project, we may > have more ideas how to proceed on the followings. > >>> Long term, I'd like Hadoop 3 to be Kerby-ized > This sounds great! With necessary support from the community like feedback > and patch reviewing, we can speed up the related work. 
> > Regards, > Kai > > -Original Message- > From: Steve Loughran [mailto:ste...@hortonworks.com] > Sent: Monday, February 22, 2016 6:51 PM > To: common-dev@hadoop.apache.org > Subject: Re: Introduce Apache Kerby to Hadoop > > > > I've discussed this offline with Kai, as part of the "let's fix kerberos" > project. Not only is it a better Kerberos engine, we can do more diagnostics, > get better algorithms and ultimately get better APIs for doing Kerberos and > SASL —the latter would dramatically reduce the cost of wire-encrypting IPC. > > For now, I'd like to see basic steps -upgrading minkdc to krypto, see how it > works. > > Long term, I'd like Hadoop 3 to be Kerby-ized > > >> On 22 Feb 2016, at 06:41, Zheng, Kai <kai.zh...@intel.com> wrote: >> >> Hi folks, >> >> I'd like to mention Apache Kerby [1] here to the community and propose to >> introduce the project to Hadoop, a sub project of Apache Directory project. >> >> Apache Kerby is a Kerberos centric project and aims to provide a first Java >> Kerberos library that contains both client and server supports. The relevant >> features include: >> It supports full Kerberos encryption types aligned with both MIT KDC >> and MS AD; Client APIs to allow to login via password, credential >> cache, keytab file and etc.; Utilities for generate, operate and >> inspect keytab and credential cache files; A simple KDC server that >> borrows some ideas from Hadoop-MiniKDC and can be used in tests but >> with minimal overhead in external dependencies; A brand new token mechanism >> is provided, can be experimentally used, using it a JWT token can be used to >> exchange a TGT or service ticket; Anonymous PKINIT support, can be >> experientially used, as the first Java library that supports the Kerberos >> major extension. >> >> The project stands alone and is ensured to only depend on JRE for easier >> usage. It has made the first release (1.0.0-RC1) and 2nd release (RC2) is >> upcoming. 
>> >> >> As an initial step, this proposal suggests using Apache Kerby to upgrade the >> existing codes related to ApacheDS for the Kerberos support. The >> advantageous: >> >> 1. The kerby-kerb library is all the need, which is purely in Java, >> SLF4J is the only dependency, the whole is rather small; >> >> 2. There is a SimpleKDC in the library for test usage, which borrowed >> the MiniKDC idea and implemented all the support existing in MiniKDC. >> We had a POC that rewrote MiniKDC using Kerby SimpleKDC and it works >> fine; >> >> 3. Full Kerberos encryption types (many of them are not available in >> JRE but supported
RE: Introduce Apache Kerby to Hadoop
Thanks Larry for your thoughts and inputs. >> Replacing MiniKDC with kerby certainly makes sense. Thanks. >> Kerby-izing Hadoop 3 needs to be defined carefully. Fully agree. We're still working to make the relevant Kerberos support come to the ideal state, either in Kerby project or outside of it. When appropriate and sounds good, we can think about what's next steps, come up design and discuss this then. Maybe we can discuss about these inputs separately after the initial things done? Regards, Kai -Original Message- From: larry mccay [mailto:lmc...@apache.org] Sent: Monday, February 22, 2016 9:05 PM To: common-dev@hadoop.apache.org Subject: Re: Introduce Apache Kerby to Hadoop Replacing MiniKDC with kerby certainly makes sense. Kerby-izing Hadoop 3 needs to be defined carefully. As much as a JWT proponent that I am, I don't know that that taking up non-standard features such as the JWT token would necessarily serve us well. If we are talking about client side only uptake in Hadoop 3 as a better diagnosable client library that completely makes sense. Better algorithms and APIs would require server side compliance as well - no? These decisions would need to align deployment usecases that want to go directly to AD/MIT. Perhaps, it just means careful configuration of algorithms to match the server side in those cases. +1 on the baby step of replacing MiniKDC - as this is really just +alignment with the directory project roadmap anyway. On Mon, Feb 22, 2016 at 5:51 AM, Steve Loughran <ste...@hortonworks.com> wrote: > > > I've discussed this offline with Kai, as part of the "let's fix kerberos" > project. Not only is it a better Kerberos engine, we can do more > diagnostics, get better algorithms and ultimately get better APIs for > doing Kerberos and SASL —the latter would dramatically reduce the cost > of wire-encrypting IPC. > > For now, I'd like to see basic steps -upgrading minkdc to krypto, see > how it works. 
> > Long term, I'd like Hadoop 3 to be Kerby-ized > > > > On 22 Feb 2016, at 06:41, Zheng, Kai <kai.zh...@intel.com> wrote: > > > > Hi folks, > > > > I'd like to mention Apache Kerby [1] here to the community and > > propose > to introduce the project to Hadoop, a sub project of Apache Directory > project. > > > > Apache Kerby is a Kerberos centric project and aims to provide a > > first > Java Kerberos library that contains both client and server supports. > The relevant features include: > > It supports full Kerberos encryption types aligned with both MIT KDC > > and > MS AD; > > Client APIs to allow to login via password, credential cache, keytab > file and etc.; > > Utilities for generate, operate and inspect keytab and credential > > cache > files; > > A simple KDC server that borrows some ideas from Hadoop-MiniKDC and > > can > be used in tests but with minimal overhead in external dependencies; > > A brand new token mechanism is provided, can be experimentally used, > using it a JWT token can be used to exchange a TGT or service ticket; > > Anonymous PKINIT support, can be experientially used, as the first > > Java > library that supports the Kerberos major extension. > > > > The project stands alone and is ensured to only depend on JRE for > > easier > usage. It has made the first release (1.0.0-RC1) and 2nd release (RC2) > is upcoming. > > > > > > As an initial step, this proposal suggests using Apache Kerby to > > upgrade > the existing codes related to ApacheDS for the Kerberos support. The > advantageous: > > > > 1. The kerby-kerb library is all the need, which is purely in Java, > SLF4J is the only dependency, the whole is rather small; > > > > 2. There is a SimpleKDC in the library for test usage, which > > borrowed > the MiniKDC idea and implemented all the support existing in MiniKDC. > We had a POC that rewrote MiniKDC using Kerby SimpleKDC and it works > fine; > > > > 3. 
Full Kerberos encryption types (many of them are not available in > > JRE > but supported by major Kerberos vendors) and more functionalities like > credential cache support; > > > > 4. Perhaps the most concerned, Hadoop MiniKDC and etc. depend on the > > old > Kerberos implementation in Directory Server project, but the > implementation is stopped being maintained. Directory project has a > plan to replace the implementation using Kerby. MiniKDC can use Kerby > directly to simplify the deps; > > > > 5. Extensively tested with all kinds of unit tests, already being > > used > for some time (like PSU), even in production environment; > > > > 6. Actively developed, an
Introduce Apache Kerby to Hadoop
Hi folks,

I'd like to introduce Apache Kerby [1], a subproject of the Apache Directory project, to the community and propose bringing it into Hadoop.

Apache Kerby is a Kerberos-centric project that aims to provide the first Java Kerberos library containing both client and server support. The relevant features include:
  Full support for Kerberos encryption types, aligned with both MIT KDC and MS AD;
  Client APIs for logging in via password, credential cache, keytab file, etc.;
  Utilities to generate, operate on and inspect keytab and credential cache files;
  A simple KDC server that borrows some ideas from Hadoop MiniKDC and can be used in tests with minimal overhead in external dependencies;
  A brand new token mechanism, available for experimental use, with which a JWT token can be exchanged for a TGT or service ticket;
  Anonymous PKINIT support, available for experimental use, making Kerby the first Java library to support this major Kerberos extension.

The project stands alone and is kept dependent only on the JRE for easier usage. It has made its first release (1.0.0-RC1), and the second release (RC2) is upcoming.

As an initial step, this proposal suggests using Apache Kerby to upgrade the existing code that relies on ApacheDS for Kerberos support. The advantages:

1. The kerby-kerb library is all that is needed; it is pure Java, SLF4J is the only dependency, and the whole is rather small;

2. There is a SimpleKDC in the library for test usage, which borrowed the MiniKDC idea and implements all the support existing in MiniKDC. We had a POC that rewrote MiniKDC using Kerby SimpleKDC and it works fine;

3. Full Kerberos encryption types (many of them are not available in the JRE but are supported by major Kerberos vendors) and more functionality, such as credential cache support;

4. Perhaps the biggest concern: Hadoop MiniKDC and others depend on the old Kerberos implementation in the Directory Server project, but that implementation is no longer maintained. The Directory project plans to replace it with Kerby. MiniKDC can use Kerby directly to simplify the dependencies;

5. Extensively tested with all kinds of unit tests, and already in use for some time (like PSU), even in production environments;

6. Actively developed, and can be fixed and released in time if necessary, separately and independently from other components in the Apache Directory project.

By actively developing Apache Kerby and now applying it to Hadoop, we hope to make Kerberos deployment, troubleshooting and further enhancement much easier and more feasible. We hope this is a good beginning, and that Apache Kerby can eventually benefit other projects in the ecosystem as well.

This Kerberos-related work is actually a long-running effort led by Weihua Jiang at Intel, and has been kindly encouraged by Andrew Purtell, Steve Loughran, Gangumalla Uma, Andrew Wang and others. Thanks a lot for their great discussions and input in the past.

Your feedback is very welcome. Thanks in advance.

[1] https://github.com/apache/directory-kerby

Regards,
Kai
RE: Looking to a Hadoop 3 release
Thanks Andrew for driving this. Wonder if it's a good chance for HADOOP-12579 (Deprecate and remove WriteableRPCEngine) to be in. Note it's not an incompatible change, but feel better to be done in the major release. Regards, Kai -Original Message- From: Andrew Wang [mailto:andrew.w...@cloudera.com] Sent: Friday, February 19, 2016 7:04 AM To: hdfs-...@hadoop.apache.org; Kihwal LeeCc: mapreduce-...@hadoop.apache.org; common-dev@hadoop.apache.org; yarn-...@hadoop.apache.org Subject: Re: Looking to a Hadoop 3 release Hi Kihwal, I think there's still value in continuing the 2.x releases. 3.x comes with the incompatible bump to a JDK8 runtime, and also the fact that 3.x won't be beta or GA for some number of months. In the meanwhile, it'd be good to keep putting out regular, stable 2.x releases. Best, Andrew On Thu, Feb 18, 2016 at 2:50 PM, Kihwal Lee wrote: > Moving Hadoop 3 forward sounds fine. If EC is one of the main > motivations, are we getting rid of branch-2.8? > > Kihwal > > From: Andrew Wang > To: "common-dev@hadoop.apache.org" > Cc: "yarn-...@hadoop.apache.org" ; " > mapreduce-...@hadoop.apache.org" ; > hdfs-dev > Sent: Thursday, February 18, 2016 4:35 PM > Subject: Re: Looking to a Hadoop 3 release > > Hi all, > > Reviving this thread. I've seen renewed interest in a trunk release > since HDFS erasure coding has not yet made it to branch-2. Along with > JDK8, the shell script rewrite, and many other improvements, I think > it's time to revisit Hadoop 3.0 release plans. > > My overall plan is still the same as in my original email: a series of > regular alpha releases leading up to beta and GA. Alpha releases make > it easier for downstreams to integrate with our code, and making them > regular means features can be included when they are ready. > > I know there are some incompatible changes waiting in the wings (i.e. > HDFS-6984 making FileStatus a PB rather than Writable, some of > HADOOP-9991 bumping dependency versions) that would be good to get in. 
> If you have changes like this, please set the target version to 3.0.0 > and mark them "Incompatible". We can use this JIRA query to track: > > > https://issues.apache.org/jira/issues/?jql=project%20in%20(HADOOP%2C%2 > 0HDFS%2C%20YARN%2C%20MAPREDUCE)%20and%20%22Target%20Version%2Fs%22%20% > 3D%20%223.0.0%22%20and%20resolution%3D%22unresolved%22%20and%20%22Hado > op%20Flags%22%3D%22Incompatible%20change%22%20order%20by%20priority > > There's some release-related stuff that needs to be sorted out > (namely, the new CHANGES.txt and release note generation from Yetus), > but I'd tentatively like to roll the first alpha a month out, so third > week of March. > > Best, > Andrew > > On Mon, Mar 9, 2015 at 7:23 PM, Raymie Stata wrote: > > > Avoiding the use of JDK8 language features (and, presumably, APIs) > > means you've abandoned #1, i.e., you haven't (really) bumped the JDK > > source version to JDK8. > > > > Also, note that releasing from trunk is a way of achieving #3, it's > > not a way of abandoning it. > > > > > > > > On Mon, Mar 9, 2015 at 7:10 PM, Andrew Wang > > > > wrote: > > > Hi Raymie, > > > > > > Konst proposed just releasing off of trunk rather than cutting a > > branch-2, > > > and there was general agreement there. So, consider #3 abandoned. > > > 1&2 > can > > > be achieved at the same time, we just need to avoid using JDK8 > > > language features in trunk so things can be backported. > > > > > > Best, > > > Andrew > > > > > > On Mon, Mar 9, 2015 at 7:01 PM, Raymie Stata > > > > > wrote: > > > > > >> In this (and the related threads), I see the following three > > requirements: > > >> > > >> 1. "Bump the source JDK version to JDK8" (ie, drop JDK7 support). > > >> > > >> 2. "We'll still be releasing 2.x releases for a while, with > > >> similar feature sets as 3.x." > > >> > > >> 3. Avoid the "risk of split-brain behavior" by "minimize > > >> backporting headaches. Pulling trunk > branch-2 > branch-2.x is already > > >> tedious. 
> > >> Adding a branch-3, branch-3.x would be obnoxious." > > >> > > >> These three cannot be achieved at the same time. Which do we abandon? > > >> > > >> > > >> On Mon, Mar 9, 2015 at 12:45 PM, sanjay Radia > > >> > > >> wrote: > > >> > > > >> >> On Mar 5, 2015, at 3:21 PM, Siddharth Seth > wrote: > > >> >> > > >> >> 2) Simplification of configs - potentially separating client > > >> >> side > > >> configs > > >> >> and those used by daemons. This is another source of perpetual > > confusion > > >> >> for users. > > >> > + 1 on this. > > >> > > > >> > sanjay > > >> > > > > >
Some questions about DistCp
Hi,

I recently did some investigation into DistCp and have some questions. I thought it would be good to discuss them here first, before diving into JIRA.

I read the doc at the following link and regard it as the latest revision, corresponding to the trunk codebase.
http://hadoop.apache.org/docs/current/hadoop-distcp/DistCp.html

If that's right, then we may need to complement it with the following important features, because I don't see them mentioned in the doc:
1. The -diff option, which uses the snapshot diff report to identify the differences between source and target when computing the copy list.
2. The -numListstatusThreads option, the number of threads used to compute the copy list concurrently.
3. -p t, to preserve timestamps.

These features are valuable for users who want to speed up time-consuming inter- or intra-cluster syncs, so besides adding the options to the table of command line options, it would be better to document them well, as we did for other functions.

A main use case is copying from a source HDFS cluster to a target HDFS cluster. It was mentioned that each NodeManager can reach and communicate with both the source and destination file systems. In this case, where is it recommended to run the DistCp command: in the source cluster or the target? It might be better to run it on the source side so that the copy mappers can read locally via short circuit (but would then write remotely). Is there any consideration in this respect?

In the same case (both source and target are HDFS clusters), there was a consideration for replicated files: if the block size and checksum opt are not preserved (via -pb), then after the copy is done we may skip comparing the file checksums and avoid computing them, because in that situation the block size and checksum type may differ, and then the file checksums surely differ. In most cases, though, the source and target clusters use the same settings, so even when not preserved, I guess the block size and checksum type may still be the same, particularly with default values. So, to be safer, maybe we can improve this as follows: compare the block size and checksum opt first; if they're the same, compare the file checksums; otherwise don't. Does that make sense? Note this is partly raised in HDFS-9613.

For striped files, we'll need to update the command as well, and probably handle them specially. This is currently under discussion in HDFS-8430.

Thanks for the discussion.

Regards,
Kai
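
The comparison order suggested in the message above (block size and checksum type first, file checksums only when those agree) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not DistCp's actual code: `ChecksumCompareSketch`, `FileAttrs` and `Result` are hypothetical names, and the checksum bytes are made-up sample values.

```java
import java.util.Arrays;

/** Illustrative only: these names are hypothetical and are not DistCp classes. */
final class ChecksumCompareSketch {

    /** Stand-in for the per-file attributes discussed in the message. */
    static final class FileAttrs {
        final long blockSize;
        final String checksumType;
        final byte[] fileChecksum;

        FileAttrs(long blockSize, String checksumType, byte[] fileChecksum) {
            this.blockSize = blockSize;
            this.checksumType = checksumType;
            this.fileChecksum = fileChecksum;
        }
    }

    enum Result { MATCH, MISMATCH, NOT_COMPARABLE }

    /**
     * File checksums are only meaningful to compare when both files share the
     * same block size and checksum type, so those are checked first.
     */
    static Result compare(FileAttrs src, FileAttrs dst) {
        boolean sameLayout = src.blockSize == dst.blockSize
                && src.checksumType.equals(dst.checksumType);
        if (!sameLayout) {
            // Checksums would differ even for identical data: skip the comparison.
            return Result.NOT_COMPARABLE;
        }
        return Arrays.equals(src.fileChecksum, dst.fileChecksum)
                ? Result.MATCH
                : Result.MISMATCH;
    }

    public static void main(String[] args) {
        FileAttrs source = new FileAttrs(128L << 20, "CRC32C", new byte[] {1, 2, 3});
        FileAttrs sameSettings = new FileAttrs(128L << 20, "CRC32C", new byte[] {1, 2, 3});
        FileAttrs otherBlockSize = new FileAttrs(64L << 20, "CRC32C", new byte[] {9, 9, 9});

        System.out.println(compare(source, sameSettings));
        System.out.println(compare(source, otherBlockSize));
    }
}
```

Returning a third state instead of a boolean keeps "could not be verified" distinct from "verified and different", which is the distinction the proposal relies on.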
RE: Question about subtype exceptions in the thrown list in addition to a more general one
Thanks Chris for the thoughts and details. I agree it's not easy to change those IOException subclasses even not relevant to I/O. Regards, Kai -Original Message- From: Chris Nauroth [mailto:cnaur...@hortonworks.com] Sent: Wednesday, December 30, 2015 2:52 AM To: common-dev@hadoop.apache.org Subject: Re: Question about subtype exceptions in the thrown list in addition to a more general one Hello Kai, I'm not aware of a specific coding standard we have on this topic, and I don't have a strong opinion either way. I think it can be valuable sometimes for public APIs to document each subclass in a separate @throws in the JavaDocs if we expect the caller might want to handle each case differently. As you said, this is less relevant for the actual throws clause in the code. Regarding IOException, there is a great tendency in the Hadoop codebase to subclass IOException, even if the failure is not obviously related to I/O. This is partly a consequence of the way error handling is implemented in the RPC framework. The RemoteException class is tightly coupled to IOException for its "unwrap" logic to pull out the root cause of the exception. As a result, any exception that needs to cross a process boundary over RPC generally ends up needing to subclass IOException. I don't think this is something that can be changed easily. --Chris Nauroth On 12/28/15, 7:46 PM, "Zheng, Kai" <kai.zh...@intel.com> wrote: > >Hi, > >Would it be good to add to throw a subtype exception in addition to a >more general exception already there in the thrown list? Is it some >coding style that's required to follow or developers can do it as they >like? > >It's often seen that only the general exception is in the thrown list, >like in ReadableByteChannel#read in Oracle Java, where in fact the >method may throw 4 subtype exceptions. 
>http://docs.oracle.com/javase/7/docs/api/java/nio/channels/ReadableByte >Cha >nnel.html >int read(ByteBuffer dst) throws IOException > >In some of Hadoop codes, it goes otherwise. For example, in Hdfs.java, >ref. the following codes, note that in the thrown list, all the former >3 exceptions extends IOException. >=== > @Override > public RemoteIterator listStatusIterator(final Path f) >throws AccessControlException, FileNotFoundException, >UnresolvedLinkException, IOException { >return new DirListingIterator(f, false) { > > @Override > public FileStatus next() throws IOException { >return getNext().makeQualified(getUri(), f); > } >}; > } >=== > >Doing this way, I thought the benefit could be the caller can see >clearly what kinds of exceptions could be thrown, however in this case >we can achieve it by adding the Javadoc. Sure there is no hurt but it >looks kinds of dummy in some IDE, hinting some information like "There >is a more general exception ... in the thrown list already". Given we >would list all the possible exceptions, it's a little hard to maintain >the codes. > >By the way, in the above example, it looks a little weird that >AccessControlException and UnresolvedLinkException both extend >IOException. There is a little reason to do that for the latter, but >for the former, it looks rather like an issue. > >Please help clarify if I missed something. Thanks. > >Regards, >Kai
Question about subtype exceptions in the thrown list in addition to a more general one
Hi,

Would it be good to add a subtype exception to the thrown list in addition to a more general exception that is already there? Is there some coding style that must be followed here, or can developers do as they like?

It's often seen that only the general exception is in the thrown list, as in ReadableByteChannel#read in Oracle Java, where in fact the method may throw 4 subtype exceptions.
http://docs.oracle.com/javase/7/docs/api/java/nio/channels/ReadableByteChannel.html
int read(ByteBuffer dst) throws IOException

In some of the Hadoop code it goes the other way. For example, see the following code in Hdfs.java; note that in the thrown list, the first three exceptions all extend IOException.
===
  @Override
  public RemoteIterator<FileStatus> listStatusIterator(final Path f)
      throws AccessControlException, FileNotFoundException,
      UnresolvedLinkException, IOException {
    return new DirListingIterator<FileStatus>(f, false) {

      @Override
      public FileStatus next() throws IOException {
        return getNext().makeQualified(getUri(), f);
      }
    };
  }
===

I thought the benefit of doing it this way is that the caller can clearly see what kinds of exceptions could be thrown; however, we can achieve that by adding Javadoc. There is no harm in it, but it looks somewhat redundant in some IDEs, which hint something like "There is a more general exception ... in the thrown list already". And if we are to list all the possible exceptions, the code becomes a little hard to maintain.

By the way, in the above example, it looks a little weird that AccessControlException and UnresolvedLinkException both extend IOException. There is some reason to do that for the latter, but for the former it looks rather like an issue.

Please help clarify if I missed something. Thanks.

Regards,
Kai
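
The two styles contrasted in the message above can be put side by side in a small, self-contained sketch. It uses only `java.io` exceptions; the class and method names (`ThrowsClauseSketch`, `open`, `openVerbose`, `describe`) are hypothetical and unrelated to Hadoop's `Hdfs.java`.

```java
import java.io.FileNotFoundException;
import java.io.IOException;

/** Illustrative only: hypothetical names, not Hadoop code. */
final class ThrowsClauseSketch {

    /**
     * Style A: only the general exception is declared; the subtype is
     * documented in Javadoc instead.
     *
     * @throws FileNotFoundException if the path is empty
     * @throws IOException for any other I/O failure
     */
    static String open(String path) throws IOException {
        if (path.isEmpty()) {
            throw new FileNotFoundException("no such path");
        }
        if (path.startsWith("bad")) {
            throw new IOException("simulated I/O failure");
        }
        return "opened " + path;
    }

    /**
     * Style B: the subtype is listed next to its supertype. This compiles,
     * but the subtype is redundant for the compiler, which is why some IDEs
     * flag it.
     */
    static String openVerbose(String path) throws FileNotFoundException, IOException {
        return open(path);
    }

    /** Callers can catch the subtype first with either declaration style. */
    static String describe(String path) {
        try {
            return open(path);
        } catch (FileNotFoundException e) {
            return "missing";
        } catch (IOException e) {
            return "io-error";
        }
    }

    public static void main(String[] args) {
        System.out.println(describe(""));
        System.out.println(describe("bad-disk"));
        System.out.println(describe("data.txt"));
    }
}
```

Either way the caller's `catch` blocks look the same; the difference is only in where the subtype is advertised.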
RE: [VOTE] Merge HDFS-7285 (erasure coding) branch to trunk
Non-binding +1 According to our extensive performance tests, striping + ISA-L coder based erasure coding not only can save storage, but also can increase the throughput of a client or a cluster. It will be a great addition to HDFS and its users. Based on the latest branch codes, we also observed it's very reliable in the concurrent tests. We'll provide the perf test report after it's sorted out and hope it helps. Thanks! Regards, Kai -Original Message- From: Gangumalla, Uma [mailto:uma.ganguma...@intel.com] Sent: Wednesday, September 23, 2015 8:50 AM To: hdfs-...@hadoop.apache.org; common-dev@hadoop.apache.org Subject: Re: [VOTE] Merge HDFS-7285 (erasure coding) branch to trunk +1 Great addition to HDFS. Thanks all contributors for the nice work. Regards, Uma On 9/22/15, 3:40 PM, "Zhe Zhang"wrote: >Hi, > >I'd like to propose a vote to merge the HDFS-7285 feature branch back >to trunk. Since November 2014 we have been designing and developing >this feature under the umbrella JIRAs HDFS-7285 and HADOOP-11264, and >have committed approximately 210 patches. > >The HDFS-7285 feature branch was created to support the first phase of >HDFS erasure coding (HDFS-EC). The objective of HDFS-EC is to >significantly reduce storage space usage in HDFS clusters. Instead of >always creating 3 replicas of each block with 200% storage space >overhead, HDFS-EC provides data durability through parity data blocks. >With most EC configurations, the storage overhead is no more than 50%. >Based on profiling results of production clusters, we decided to >support EC with the striped block layout in the first phase, so that >small files can be better handled. This means dividing each logical >HDFS file block into smaller units (striping cells) and spreading them >on a set of DataNodes in round-robin fashion. Parity cells are >generated for each stripe of original data cells. 
We have made changes >to NameNode, client, and DataNode to generalize the block concept and >handle the mapping between a logical file block and its internal >storage blocks. For further details please see the design doc on >HDFS-7285. >HADOOP-11264 focuses on providing flexible and high-performance codec >calculation support. > >The nightly Jenkins job of the branch has reported several successful >runs, and doesn't show new flaky tests compared with trunk. We have >posted several versions of the test plan including both unit testing >and cluster testing, and have executed most tests in the plan. The most >basic functionalities have been extensively tested and verified in >several real clusters with different hardware configurations; results >have been very stable. We have created follow-on tasks for more >advanced error handling and optimization under the umbrella HDFS-8031. >We also plan to implement or harden the integration of EC with existing >features such as WebHDFS, snapshot, append, truncate, hflush, hsync, >and so forth. > >Development of this feature has been a collaboration across many >companies and institutions. I'd like to thank J. Andreina, Takanobu >Asanuma, Vinayakumar B, Li Bo, Takuya Fukudome, Uma Maheswara Rao G, >Rui Li, Yi Liu, Colin McCabe, Xinwei Qin, Rakesh R, Gao Rui, Kai >Sasaki, Walter Su, Tsz Wo Nicholas Sze, Andrew Wang, Yong Zhang, Jing >Zhao, Hui Zheng and Kai Zheng for their code contributions and reviews. >Andrew and Kai Zheng also made fundamental contributions to the initial >design. Rui Li, Gao Rui, Kai Sasaki, Kai Zheng and many other >contributors have made great efforts in system testing. Many thanks go >to Weihua Jiang for proposing the JIRA, and ATM, Todd Lipcon, Silvius >Rus, Suresh, as well as many others for providing helpful feedbacks. > >Following the community convention, this vote will last for 7 days >(ending September 29th). Votes from Hadoop committers are binding but >non-binding votes are very welcome as well. 
And here's my non-binding +1. > >Thanks, >--- >Zhe Zhang
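
The storage figures quoted in the vote message (200% overhead for 3 replicas, no more than 50% for most EC configurations) and the round-robin placement of striping cells follow from simple arithmetic, sketched below. The Reed-Solomon shapes used here, 6 data + 3 parity and 10 data + 4 parity, are assumed as illustrative examples; the message itself does not name a specific schema, and all class and method names are hypothetical.

```java
/** Illustrative arithmetic only: hypothetical names, not HDFS code. */
final class ErasureCodingOverheadSketch {

    /** Extra storage, as a percentage of the raw data, for n-way replication. */
    static double replicationOverheadPercent(int replicas) {
        return (replicas - 1) * 100.0;
    }

    /** Extra storage for an EC layout with the given data and parity unit counts. */
    static double ecOverheadPercent(int dataUnits, int parityUnits) {
        return 100.0 * parityUnits / dataUnits;
    }

    /**
     * Round-robin striping: consecutive cells of a logical block are spread
     * over the data units in turn, so cell i lands on unit (i mod dataUnits).
     */
    static int dataUnitForCell(long cellIndex, int dataUnits) {
        return (int) (cellIndex % dataUnits);
    }

    public static void main(String[] args) {
        System.out.println("3 replicas: " + replicationOverheadPercent(3) + "%");
        System.out.println("6 data + 3 parity: " + ecOverheadPercent(6, 3) + "%");
        System.out.println("10 data + 4 parity: " + ecOverheadPercent(10, 4) + "%");
        for (long cell = 0; cell < 8; cell++) {
            System.out.println("cell " + cell + " -> data unit " + dataUnitForCell(cell, 6));
        }
    }
}
```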
RE: Jira down :(
It's OK now.

Regards,
Kai

-Original Message-
From: Vinayakumar B [mailto:vinayakum...@apache.org]
Sent: Friday, June 05, 2015 3:12 PM
To: common-dev@hadoop.apache.org
Subject: Re: Jira down :(

Thanks Tsuyoshi.

Regards,
Vinay

On Fri, Jun 5, 2015 at 12:38 PM, Tsuyoshi Ozawa <oz...@apache.org> wrote:
> Hi Vinay,
>
> status.apache.org told us that JIRA is down.
> http://status.apache.org/
>
> I sent an email to the infra team a few minutes ago.
>
> Regards,
> - Tsuyoshi
>
> On Fri, Jun 5, 2015 at 3:08 PM, Vinayakumar B <vinayakum...@apache.org> wrote:
> > I am getting a 502 error from Jira.
> > Does anybody know whom to contact to resolve it?
> >
> > Regards,
> > Vinay
RE: IMPORTANT: testing patches for branches
Thanks Allen for the great work. I tried it in HADOOP-11847 (branch HDFS-7285) and it went well. Very helpful!

Regards,
Kai

-Original Message-
From: Allen Wittenauer [mailto:a...@altiscale.com]
Sent: Thursday, April 23, 2015 7:22 PM
To: common-dev@hadoop.apache.org
Cc: hdfs-...@hadoop.apache.org; mapreduce-...@hadoop.apache.org; yarn-...@hadoop.apache.org
Subject: Re: IMPORTANT: testing patches for branches

On Apr 22, 2015, at 11:34 PM, Zheng, Kai <kai.zh...@intel.com> wrote:

> Hi Allen,
>
> This sounds great.
>
>> Naming a patch foo-HDFS-7285.00.patch should get tested on the HDFS-7285 branch.
> Does this happen locally on the developer's machine when running test-patch.sh, or does it also apply to the Hadoop Jenkins build when a JIRA becomes Patch Available? Thanks.

Both, now that a fix has been committed last night (there was a bug in the Jenkins handling).

Given a patch name or URL, Jenkins and even running locally will try a few different methods to figure out which branch to use. Note that a branch name of 'gitX' where X is a valid git reference also works to force a patch to start at a particular commit.

For local use, you'll want to use a 'spare' copy of the source tree via the -basedir option and use the -resetrepo flag. That will enable Jenkins-like behavior and gives it permission to make modifications and effectively nuke any changes in the source tree you point it at. (Basically the opposite of the -dirty-workspace flag). If you want to force a branch (for whatever reason, including where the branch can't be figured out), you can use the -branch option. If you don't use -resetrepo, test-patch.sh will warn that it thinks the wrong branch is being used but will push on anyway.

In any case, the result of what it thinks the branch is/should be will be in the summary output at the bottom along with the git ref that it specifically used for the test.
RE: IMPORTANT: testing patches for branches
Hi Allen,

This sounds great.

>> Naming a patch foo-HDFS-7285.00.patch should get tested on the HDFS-7285 branch.
Does this happen locally on the developer's machine when running test-patch.sh, or does it also apply to the Hadoop Jenkins build when a JIRA becomes Patch Available? Thanks.

Regards,
Kai

-Original Message-
From: Allen Wittenauer [mailto:a...@altiscale.com]
Sent: Thursday, April 23, 2015 3:35 AM
To: common-dev@hadoop.apache.org
Cc: yarn-...@hadoop.apache.org; mapreduce-...@hadoop.apache.org; hdfs-...@hadoop.apache.org
Subject: IMPORTANT: testing patches for branches

Hey gang,

Just so everyone is aware, if you are working on a patch for either a feature branch or a major branch, if you name the patch with the branch name following the spec in HowToContribute (and a few other ways... test-patch tries to figure it out!), test-patch.sh *should* be switching the repo over to that branch for testing.

For example, naming a patch foo-branch-2.01.patch should get tested on branch-2. Naming a patch foo-HDFS-7285.00.patch should get tested on the HDFS-7285 branch.

This hopefully means that there should really be no more 'blind' +1's to patches that go to branches. The "we only test against trunk" argument is no longer valid. :)
RE: [RFE] Support MIT Kerberos localauth plugin API
Hello Leo/Liou, > And the plugin interface can be as simple as this function (error handling ignored here) ... I think it would be good to have a pluggable interface that allows customizing how the mapping is performed. You could open a JIRA for this. If you'd like to work on it and need help, please feel free to ask (me or the community), or discuss it in the JIRA, as the community does. With the pluggable interface, you could provide a native implementation leveraging the MIT localauth plugin via JNI, just as is done for the user groups mapping provider. If you're looking for something in pure Java, as Allen said, localauth plugin support isn't available in the JRE, since Java is not quick to catch up with the latest Kerberos features. One possibility would be to leverage Apache Kerby; you can file an issue request there and we can see how it works out. https://issues.apache.org/jira/browse/DIRKRB-102 Regards, Kai -Original Message- From: Sunny Cheung [mailto:sunny.che...@centrify.com] Sent: Thursday, March 05, 2015 3:42 PM To: common-dev@hadoop.apache.org Cc: Leo Liou Subject: RE: [RFE] Support MIT Kerberos localauth plugin API Sorry I was not clear enough about the problem. Let me explain more here. Our problem is that normal user principal names can be very different from their Unix login. Some customers simply have arbitrary mapping between their Kerberos principals and Unix user accounts. For example, one customer has over 200K users on AD with Kerberos principals in format <first name>.<last name>@REALM (e.g. john@example.com). But their Unix names are in format user<ID> or just <ID> (e.g. user123456, 123456). So, when Kerberos security is enabled on Hadoop clusters, how should we configure Hadoop to authenticate these users from Hadoop clients? The current way is to use the hadoop.security.auth_to_local setting, e.g. from core-site.xml:
<property> <name>hadoop.security.auth_to_local</name> <value> RULE:[2:$1@$0]([jt]t@.*EXAMPLE.COM)s/.*/mapred/ RULE:[2:$1@$0]([nd]n@.*EXAMPLE.COM)s/.*/hdfs/ RULE:[2:$1@$0](hm@.*EXAMPLE.COM)s/.*/hbase/ RULE:[2:$1@$0](rs@.*EXAMPLE.COM)s/.*/hbase/ DEFAULT </value> <description>The mapping from kerberos principal names to local OS user names.</description> </property> These name translation rules can handle cases like mapping service accounts' principals (e.g. nn/host@REALM or dn/host@REALM to hdfs). But that is not scalable for normal users. There are just too many users to handle (as compared to the finite number of service accounts). Therefore, we would like to ask if an alternative name resolution plugin interface can be supported by Hadoop. It could be similar to the way an alternative authentication plugin is supported for HTTP web-consoles [1]: <property> <name>hadoop.http.authentication.type</name> <value>org.my.subclass.of.AltKerberosAuthenticationHandler</value> </property> And the plugin interface can be as simple as this function (error handling ignored here): String auth_to_local (String krb5Principal) { ... return unixName; } If this plugin interface is supported by Hadoop, then everyone can provide a plugin to support arbitrary mapping. This will be extremely useful when administrators need to tighten security on Hadoop with existing Kerberos infrastructure. References: [1] Authentication for Hadoop HTTP web-consoles http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/HttpAuthentication.html -Original Message- From: Allen Wittenauer [mailto:a...@altiscale.com] Sent: Tuesday, February 24, 2015 12:47 AM To: common-dev@hadoop.apache.org Subject: Re: [RFE] Support MIT Kerberos localauth plugin API The big question is whether or not Java's implementation of Kerberos supports it. If so, which JDK release. Java's implementation tends to run a bit behind MIT.
Additionally, there is a general reluctance to move Hadoop's baseline Java version to something even supported until user outcry demands it. So I'd expect support to be a long way off. It's worth noting that trunk exposes the hadoop kerbname command to help out with auth_to_local mapping, BTW. On Feb 23, 2015, at 2:12 AM, Sunny Cheung sunny.che...@centrify.com wrote: Hi Hadoop Common developers, I am writing to seek your opinion about a feature request: support MIT Kerberos localauth plugin API [1]. Hadoop currently provides the hadoop.security.auth_to_local setting to map Kerberos principal to OS user account [2][3]. However, the regex-based mappings (which mimics krb5.conf auth_to_local) could be difficult to use in complex scenarios. Therefore, MIT Kerberos 1.12 added a plugin interface to control krb5_aname_to_localname and krb5_kuserok behavior. And system daemon SSSD (RHEL/Fedora) has already implemented a plugin to leverage this feature [4]. Is that possible for Hadoop to
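For illustration, the plugin idea raised in this thread (a single function from Kerberos principal to Unix name) could look roughly like the Java sketch below. All type and method names here are hypothetical and do not exist in Hadoop; the lookup table stands in for whatever backend a real plugin would consult (for example a directory service or a localauth module reached via JNI), and the fallback rule is an assumption chosen only to keep the example self-contained.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch; none of these types exist in Hadoop.
final class AuthToLocalPluginSketch {

    /** Plugin contract mirroring the auth_to_local(String) function proposed in the thread. */
    interface PrincipalMapper {
        String authToLocal(String krb5Principal);
    }

    /** Explicit lookup table with a fallback to the principal's primary component. */
    static final class TableMapper implements PrincipalMapper {
        private final Map<String, String> table = new HashMap<>();

        TableMapper add(String principal, String unixName) {
            table.put(principal, unixName);
            return this;
        }

        @Override
        public String authToLocal(String krb5Principal) {
            String mapped = table.get(krb5Principal);
            if (mapped != null) {
                return mapped;
            }
            // Assumed fallback: drop the instance and realm ("nn/host@REALM" -> "nn").
            return krb5Principal.split("[/@]", 2)[0];
        }
    }

    public static void main(String[] args) {
        PrincipalMapper mapper = new TableMapper()
            .add("jane.doe@EXAMPLE.COM", "user123456");
        System.out.println(mapper.authToLocal("jane.doe@EXAMPLE.COM"));
        System.out.println(mapper.authToLocal("nn/host1@EXAMPLE.COM"));
    }
}
```

The point of the interface is that arbitrary per-user mappings, which the regex rules in hadoop.security.auth_to_local handle poorly at scale, become an implementation detail of the plugin.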
RE: 2.7 status
Thanks Vinod for the hints. I have updated both patches to align with the latest code and added more unit tests. The build results look reasonable. Thanks in advance to anyone who can review them further; I will update them in a timely manner. Regards, Kai -Original Message- From: Vinod Kumar Vavilapalli [mailto:vino...@hortonworks.com] Sent: Tuesday, March 03, 2015 11:31 AM To: Zheng, Kai Cc: mapreduce-...@hadoop.apache.org; hdfs-...@hadoop.apache.org; Hadoop Common; yarn-...@hadoop.apache.org Subject: Re: 2.7 status Kai, please ping the reviewers that were already looking at your patches before. If the patches go in by end of this week, we can include them. Thanks, +Vinod On Mar 2, 2015, at 7:04 PM, Zheng, Kai <kai.zh...@intel.com> wrote: Would there be interest in getting the following issues into the release? Thanks! HADOOP-10670 HADOOP-10671 Regards, Kai -Original Message- From: Yongjun Zhang [mailto:yzh...@cloudera.com] Sent: Monday, March 02, 2015 4:46 AM To: hdfs-...@hadoop.apache.org Cc: Vinod Kumar Vavilapalli; Hadoop Common; mapreduce-...@hadoop.apache.org; yarn-...@hadoop.apache.org Subject: Re: 2.7 status Hi, Thanks for working on 2.7 release. Currently the fallback from KerberosAuthenticator to PseudoAuthenticator is enabled by default in a hardcoded way. HADOOP-10895 changes the default and requires applications (such as oozie) to set a config property or call an API to enable the fallback. This jira has been reviewed, and is almost ready to get in. However, there is a concern that we have to change the relevant applications. Please see my comment here: https://issues.apache.org/jira/browse/HADOOP-10895?focusedCommentId=14321823&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14321823 Any of your comments will be highly appreciated. This jira was postponed from 2.6. I think it should be no problem to skip 2.7. But your comments would help us to decide what to do with this jira for future releases. Thanks.
--Yongjun On Sun, Mar 1, 2015 at 11:58 AM, Arun Murthy <a...@hortonworks.com> wrote: Sounds good, thanks for the help Vinod! Arun From: Vinod Kumar Vavilapalli Sent: Sunday, March 01, 2015 11:43 AM To: Hadoop Common; Jason Lowe; Arun Murthy Subject: Re: 2.7 status Agreed. How about we roll an RC end of this week? As a Java 7+ release with features, patches that already got in? Here's a filter tracking blocker tickets - https://issues.apache.org/jira/issues/?filter=12330598. Nine open now. +Arun Arun, I'd like to help get 2.7 out without further delay. Do you mind me taking over release duties? Thanks, +Vinod From: Jason Lowe <jl...@yahoo-inc.com.INVALID> Sent: Friday, February 13, 2015 8:11 AM To: common-dev@hadoop.apache.org Subject: Re: 2.7 status I'd like to see a 2.7 release sooner than later. It has been almost 3 months since Hadoop 2.6 was released, and there have already been 634 JIRAs committed to 2.7. That's a lot of changes waiting for an official release. https://issues.apache.org/jira/issues/?jql=project%20in%20%28hadoop%2Chdfs%2Cyarn%2Cmapreduce%29%20AND%20fixversion%3D2.7.0%20AND%20resolution%3DFixed Jason From: Sangjin Lee <sj...@apache.org> To: common-dev@hadoop.apache.org <common-dev@hadoop.apache.org> Sent: Tuesday, February 10, 2015 1:30 PM Subject: 2.7 status Folks, What is the current status of the 2.7 release? I know initially it started out as a java-7 only release, but looking at the JIRAs that is very much not the case. Do we have a certain timeframe for 2.7 or is it time to discuss it? Thanks, Sangjin
RE: Looking to a Hadoop 3 release
May I add some comments on this, just to share my thoughts. Thanks. > If we start now, it might make it out by 2016. If we start now, downstreamers can start aligning themselves to land versions that suit at about the same time. This helps not only downstreamers align with the long-term release, but also contributors like me align our future efforts. In addition to the JDK8 support and classpath isolation, could we consider more candidates? How about this one, HADOOP-9797 "Pluggable and compatible UGI change"? https://issues.apache.org/jira/browse/HADOOP-9797 The benefits: 1) allow multiple login sessions/contexts and authentication methods to be used in the same Java application/process without conflicts, providing good isolation by getting rid of globals and statics. 2) allow new authentication methods to be plugged into UGI in a modular, manageable and maintainable manner. In addition, we would also push for the first release of Apache Kerby, aiming for a strong, dedicated and clean Kerberos library in Java for both the client and KDC sides, and then leverage the library to update Hadoop-MiniKDC and perform more security tests. https://issues.apache.org/jira/browse/DIRKRB-102 Hope this makes sense. Thanks. Regards, Kai -Original Message- From: saint@gmail.com [mailto:saint@gmail.com] On Behalf Of Stack Sent: Thursday, March 05, 2015 2:47 AM To: common-dev@hadoop.apache.org Cc: mapreduce-...@hadoop.apache.org; hdfs-...@hadoop.apache.org; yarn-...@hadoop.apache.org Subject: Re: Looking to a Hadoop 3 release In general +1 on 3.0.0. It's time. If we start now, it might make it out by 2016. If we start now, downstreamers can start aligning themselves to land versions that suit at about the same time. While two big items have been called out as possible incompatible changes, and there is ongoing discussion as to whether they are or not*, is there any chance of getting a longer list of big differences between the branches?
In particular I'd be interested in improvements that are 'off' by default that would be better defaulted 'on'. Thanks, St.Ack * Let me note that 'compatible' around these parts is a trampled concept seemingly open to interpretation with a definition that is other than prevails elsewhere in software. See Allen's list above, and in our downstream project, the recent HBASE-13149 HBase server MR tools are broken on Hadoop 2.5+ Yarn, among others. Let 3.x be incompatible with 2.x if only so we can leave behind all current notions of 'compatibility' and just start over (as per Allen). On Mon, Mar 2, 2015 at 3:19 PM, Andrew Wang andrew.w...@cloudera.com wrote: Hi devs, It's been a year and a half since 2.x went GA, and I think we're about due for a 3.x release. Notably, there are two incompatible changes I'd like to call out, that will have a tremendous positive impact for our users. First, classpath isolation being done at HADOOP-11656, which has been a long-standing request from many downstreams and Hadoop users. Second, bumping the source and target JDK version to JDK8 (related to HADOOP-11090), which is important since JDK7 is EOL in April 2015 (two months from now). In the past, we've had issues with our dependencies discontinuing support for old JDKs, so this will future-proof us. Between the two, we'll also have quite an opportunity to clean up and upgrade our dependencies, another common user and developer request. I'd like to propose that we start rolling a series of monthly-ish series of 3.0 alpha releases ASAP, with myself volunteering to take on the RM and other cat herding responsibilities. There are already quite a few changes slated for 3.0 besides the above (for instance the shell script rewrite) so there's already value in a 3.0 alpha, and the more time we give downstreams to integrate, the better. 
This opens up discussion about inclusion of other changes, but I'm hoping to freeze incompatible changes after maybe two alphas, do a beta (with no further incompat changes allowed), and then finally a 3.x GA. For those keeping track, that means a 3.x GA in about four months. I would also like to stress though that this is not intended to be a big bang release. For instance, it would be great if we could maintain wire compatibility between 2.x and 3.x, so rolling upgrades work. Keeping branch-2 and branch-3 similar also makes backports easier, since we're likely maintaining 2.x for a while yet. Please let me know any comments / concerns related to the above. If people are friendly to the idea, I'd like to cut a branch-3 and start working on the first alpha. Best, Andrew
RE: Looking to a Hadoop 3 release
JDK8 support is in the consideration, looks like many issues were reported and resolved already. https://issues.apache.org/jira/browse/HADOOP-11090 -Original Message- From: Andrew Wang [mailto:andrew.w...@cloudera.com] Sent: Tuesday, March 03, 2015 7:20 AM To: common-dev@hadoop.apache.org; mapreduce-...@hadoop.apache.org; hdfs-...@hadoop.apache.org; yarn-...@hadoop.apache.org Subject: Looking to a Hadoop 3 release Hi devs, It's been a year and a half since 2.x went GA, and I think we're about due for a 3.x release. Notably, there are two incompatible changes I'd like to call out, that will have a tremendous positive impact for our users. First, classpath isolation being done at HADOOP-11656, which has been a long-standing request from many downstreams and Hadoop users. Second, bumping the source and target JDK version to JDK8 (related to HADOOP-11090), which is important since JDK7 is EOL in April 2015 (two months from now). In the past, we've had issues with our dependencies discontinuing support for old JDKs, so this will future-proof us. Between the two, we'll also have quite an opportunity to clean up and upgrade our dependencies, another common user and developer request. I'd like to propose that we start rolling a series of monthly-ish series of 3.0 alpha releases ASAP, with myself volunteering to take on the RM and other cat herding responsibilities. There are already quite a few changes slated for 3.0 besides the above (for instance the shell script rewrite) so there's already value in a 3.0 alpha, and the more time we give downstreams to integrate, the better. This opens up discussion about inclusion of other changes, but I'm hoping to freeze incompatible changes after maybe two alphas, do a beta (with no further incompat changes allowed), and then finally a 3.x GA. For those keeping track, that means a 3.x GA in about four months. I would also like to stress though that this is not intended to be a big bang release. 
For instance, it would be great if we could maintain wire compatibility between 2.x and 3.x, so rolling upgrades work. Keeping branch-2 and branch-3 similar also makes backports easier, since we're likely maintaining 2.x for a while yet. Please let me know any comments / concerns related to the above. If people are friendly to the idea, I'd like to cut a branch-3 and start working on the first alpha. Best, Andrew
RE: 2.7 status
Would there be interest in getting the following issues into the release? Thanks! HADOOP-10670 HADOOP-10671 Regards, Kai -Original Message- From: Yongjun Zhang [mailto:yzh...@cloudera.com] Sent: Monday, March 02, 2015 4:46 AM To: hdfs-...@hadoop.apache.org Cc: Vinod Kumar Vavilapalli; Hadoop Common; mapreduce-...@hadoop.apache.org; yarn-...@hadoop.apache.org Subject: Re: 2.7 status Hi, Thanks for working on 2.7 release. Currently the fallback from KerberosAuthenticator to PseudoAuthenticator is enabled by default in a hardcoded way. HADOOP-10895 changes the default and requires applications (such as oozie) to set a config property or call an API to enable the fallback. This jira has been reviewed, and is almost ready to get in. However, there is a concern that we have to change the relevant applications. Please see my comment here: https://issues.apache.org/jira/browse/HADOOP-10895?focusedCommentId=14321823&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14321823 Any of your comments will be highly appreciated. This jira was postponed from 2.6. I think it should be no problem to skip 2.7. But your comments would help us to decide what to do with this jira for future releases. Thanks. --Yongjun On Sun, Mar 1, 2015 at 11:58 AM, Arun Murthy <a...@hortonworks.com> wrote: Sounds good, thanks for the help Vinod! Arun From: Vinod Kumar Vavilapalli Sent: Sunday, March 01, 2015 11:43 AM To: Hadoop Common; Jason Lowe; Arun Murthy Subject: Re: 2.7 status Agreed. How about we roll an RC end of this week? As a Java 7+ release with features, patches that already got in? Here's a filter tracking blocker tickets - https://issues.apache.org/jira/issues/?filter=12330598. Nine open now. +Arun Arun, I'd like to help get 2.7 out without further delay. Do you mind me taking over release duties?
Thanks, +Vinod From: Jason Lowe <jl...@yahoo-inc.com.INVALID> Sent: Friday, February 13, 2015 8:11 AM To: common-dev@hadoop.apache.org Subject: Re: 2.7 status I'd like to see a 2.7 release sooner than later. It has been almost 3 months since Hadoop 2.6 was released, and there have already been 634 JIRAs committed to 2.7. That's a lot of changes waiting for an official release. https://issues.apache.org/jira/issues/?jql=project%20in%20%28hadoop%2Chdfs%2Cyarn%2Cmapreduce%29%20AND%20fixversion%3D2.7.0%20AND%20resolution%3DFixed Jason From: Sangjin Lee <sj...@apache.org> To: common-dev@hadoop.apache.org <common-dev@hadoop.apache.org> Sent: Tuesday, February 10, 2015 1:30 PM Subject: 2.7 status Folks, What is the current status of the 2.7 release? I know initially it started out as a java-7 only release, but looking at the JIRAs that is very much not the case. Do we have a certain timeframe for 2.7 or is it time to discuss it? Thanks, Sangjin
RE: Looking to a Hadoop 3 release
Sorry for the bad. I thought it was sending to my colleagues. By the way, for the JDK8 support, we (Intel) would like to investigate further and help, thanks. Regards, Kai -Original Message- From: Zheng, Kai Sent: Tuesday, March 03, 2015 8:49 AM To: common-dev@hadoop.apache.org; mapreduce-...@hadoop.apache.org; hdfs-...@hadoop.apache.org; yarn-...@hadoop.apache.org Subject: RE: Looking to a Hadoop 3 release JDK8 support is in the consideration, looks like many issues were reported and resolved already. https://issues.apache.org/jira/browse/HADOOP-11090 -Original Message- From: Andrew Wang [mailto:andrew.w...@cloudera.com] Sent: Tuesday, March 03, 2015 7:20 AM To: common-dev@hadoop.apache.org; mapreduce-...@hadoop.apache.org; hdfs-...@hadoop.apache.org; yarn-...@hadoop.apache.org Subject: Looking to a Hadoop 3 release Hi devs, It's been a year and a half since 2.x went GA, and I think we're about due for a 3.x release. Notably, there are two incompatible changes I'd like to call out, that will have a tremendous positive impact for our users. First, classpath isolation being done at HADOOP-11656, which has been a long-standing request from many downstreams and Hadoop users. Second, bumping the source and target JDK version to JDK8 (related to HADOOP-11090), which is important since JDK7 is EOL in April 2015 (two months from now). In the past, we've had issues with our dependencies discontinuing support for old JDKs, so this will future-proof us. Between the two, we'll also have quite an opportunity to clean up and upgrade our dependencies, another common user and developer request. I'd like to propose that we start rolling a series of monthly-ish series of 3.0 alpha releases ASAP, with myself volunteering to take on the RM and other cat herding responsibilities. 
There are already quite a few changes slated for 3.0 besides the above (for instance the shell script rewrite) so there's already value in a 3.0 alpha, and the more time we give downstreams to integrate, the better. This opens up discussion about inclusion of other changes, but I'm hoping to freeze incompatible changes after maybe two alphas, do a beta (with no further incompat changes allowed), and then finally a 3.x GA. For those keeping track, that means a 3.x GA in about four months. I would also like to stress though that this is not intended to be a big bang release. For instance, it would be great if we could maintain wire compatibility between 2.x and 3.x, so rolling upgrades work. Keeping branch-2 and branch-3 similar also makes backports easier, since we're likely maintaining 2.x for a while yet. Please let me know any comments / concerns related to the above. If people are friendly to the idea, I'd like to cut a branch-3 and start working on the first alpha. Best, Andrew
RE: Anyone know how to mock a secured hdfs for unit test?
Hi Chris, Thanks for your great info. I would paste it in the JIRA for future reference if I or somebody else get the chance to work on it. Regards, Kai -Original Message- From: Chris Nauroth [mailto:cnaur...@hortonworks.com] Sent: Saturday, June 28, 2014 4:27 AM To: secur...@hadoop.apache.org Cc: yarn-...@hadoop.apache.org; hdfs-...@hadoop.apache.org; hdfs-iss...@hadoop.apache.org; yarn-iss...@hadoop.apache.org; mapreduce-...@hadoop.apache.org Subject: Re: Anyone know how to mock a secured hdfs for unit test? Hi David and Kai, There are a couple of challenges with this, but I just figured out a pretty decent setup while working on HDFS-2856. That code isn't committed yet, but if you open patch version 5 attached to that issue and look for the TestSaslDataTransfer class, then you'll see how it works. Most of the logic for bootstrapping a MiniKDC and setting up the right HDFS configuration properties is in an abstract base class named SaslDataTransferTestCase. I hope this helps. There are a few other open issues out there related to tests in secure mode. I know of HDFS-4312 and HDFS-5410. It would be great to get more regular test coverage with something that more closely approximates a secured deployment. Chris Nauroth Hortonworks http://hortonworks.com/ On Thu, Jun 26, 2014 at 7:27 AM, Zheng, Kai kai.zh...@intel.com wrote: Hi David, Quite some time ago I opened HADOOP-9952 and planned to create secured MiniClusters by making use of MiniKDC. Unfortunately since then I didn't get the chance to work on it yet. If you need something like that and would contribute, please let me know and see if anything I can help with. Thanks. 
Regards, Kai -Original Message- From: Liu, David [mailto:liujion...@gmail.com] Sent: Thursday, June 26, 2014 10:12 PM To: hdfs-...@hadoop.apache.org; hdfs-iss...@hadoop.apache.org; yarn-...@hadoop.apache.org; yarn-iss...@hadoop.apache.org; mapreduce-...@hadoop.apache.org; secur...@hadoop.apache.org Subject: Anyone know how to mock a secured hdfs for unit test? Hi all, I need to test my code which read data from secured hdfs, is there any library to mock secured hdfs, can minihdfscluster do the work? Any suggestion is appreciated. Thanks -- CONFIDENTIALITY NOTICE NOTICE: This message is intended for the use of the individual or entity to which it is addressed and may contain information that is confidential, privileged and exempt from disclosure under applicable law. If the reader of this message is not the intended recipient, you are hereby notified that any printing, copying, dissemination, distribution, disclosure or forwarding of this communication is strictly prohibited. If you have received this communication in error, please contact the sender immediately and delete it from your system. Thank You.
RE: [DISCUSS] Security Efforts and Branching
think it works for us to have HAS as a community effort as the TokenAuth framework and we both contribute on the implementation? To proceed, I would try to align between us, complementing your proposal and addressing your concerns as follows. = Iteration Endstate = Besides what you mentioned from user view, how about adding this consideration: Additionally, the initial iteration would also lay down the ground TokenAuth framework with fine defined APIs, protocols, flows and core facilities for implementations. The framework should avoid rework and big change for future implementations. = Terminology and Naming = It would be great if we can unify the related terminologies in this effort, at least in the framework level. This could be probably achieved in the process of defining relevant APIs for the TokenAuth framework. = Project scope = It's great we have the common list in scope for the first iteration as you mentioned as follows: Usecases: client types: REST, CLI, UI authentication types: Simple, Kerberos, authentication/LDAP, federation/SAML We might also consider OAuth 2.0 support. Anyway please note by defining this in-scope list we know what's required as must-have in the iteration as enforcement of our consensus, however it should not limit any relevant parties to contribute more meanwhile unless it does not be appropriate at the time. = Branch = As you mentioned we may have different branches for different features considering merge. Another approach is just having one branch with relevant security features, the review and merge work can still be JIRA based. 1. Based on your proposal, how about the following as the branch(es) scope: 1) Pluggable Authentication and Token based SSO 2) CryptoFS for volume level encryption (HCFS) 3) Pluggable UGI change 4) Key management system 5) Unified authorization 2. With the above scope in mind, a candidate branch name could be like 'security-branch' instead of 'tokenauth-branch'. 
How about creating the branch now if we don't have other concerns? 3. Check-in philosophy. I agree with your proposal, with slight concerns: > In terms of check-in philosophy, we should take a review then check-in approach to the branch with lazy consensus - wherein we do not need to explicitly +1 every check-in to the branch but we will honor any -1's with discussion to resolve before checking in. This will provide us each with the opportunity to track the work being done and ensure that we understand it and find that it meets the intended goals. We might need explicit +1s; otherwise we would need to define a time window to wait before checking in. One issue we would like to clarify: does voting also include the security branch committers? = JIRA = We might not need an additional umbrella JIRA for now since we already have HADOOP-9392 and HADOOP-9533. By the way, I would suggest we use the existing feature JIRAs to discuss relevant and specific issues as we go. By leveraging these JIRAs we can avoid too many details in the common-dev thread, and it's also easier to track the relevant discussions. I agree it's a good idea to start with an inventory of the existing JIRAs. We can do that if there are no other concerns. We would provide the full list of breakdown JIRAs and attach it to HADOOP-9392 for further collaboration. Regards, Kai From: larry mccay [mailto:larry.mc...@gmail.com] Sent: Wednesday, September 18, 2013 6:27 AM To: Zheng, Kai; Chen, Haifeng; common-dev@hadoop.apache.org Subject: Re: [DISCUSS] Security Efforts and Branching All - I apologize for not following up sooner. I have been heads down on some other matters that required my attention. It seems that it may be easier to move forward by gaining consensus a little bit at a time rather than trying to hit the ground running where the other thread left off.
Would it be agreeable to everyone to start with an inventory of the existing Jiras that have patches available or nearly available so that we can determine what concrete bits we have to start with? Once we get that done, we can try and frame a set of goals to make up the initial iteration and determine what from the inventory will be leveraged in that iteration. Does this sound reasonable to everyone? Would anyone like to propose another starting point? thanks, --larry On Wed, Sep 4, 2013 at 4:26 PM, larry mccay <larry.mc...@gmail.com> wrote: It doesn't look like the PDF made it all the way through to the archives and maybe even to recipients - so the following is the text version of the iteration-1 draft: Iteration 1: Pluggable User Authentication and Federation Introduction The intent of this effort is to bootstrap the development of pluggable token-based authentication mechanisms to support certain goals of enterprise authentication integrations. By restricting the scope of this effort, we hope to provide immediate benefit to the community while keeping the initial contribution
RE: [DISCUSS] Security Efforts and Branching
central servers like HAS server and HSSO server? If not, do you think it works for us to have HAS as a community effort as the TokenAuth framework and we both contribute on the implementation? To proceed, I would try to align between us, complementing your proposal and addressing your concerns as follows. = Iteration Endstate = Besides what you mentioned from user view, how about adding this consideration: Additionally, the initial iteration would also lay down the ground TokenAuth framework with fine defined APIs, protocols, flows and core facilities for implementations. The framework should avoid rework and big change for future implementations. = Terminology and Naming = It would be great if we can unify the related terminologies in this effort, at least in the framework level. This could be probably achieved in the process of defining relevant APIs for the TokenAuth framework. = Project scope = It's great we have the common list in scope for the first iteration as you mentioned as follows: Usecases: client types: REST, CLI, UI authentication types: Simple, Kerberos, authentication/LDAP, federation/SAML We might also consider OAuth 2.0 support. Anyway please note by defining this in-scope list we know what's required as must-have in the iteration as enforcement of our consensus, however it should not limit any relevant parties to contribute more meanwhile unless it does not be appropriate at the time. = Branch = As you mentioned we may have different branches for different features considering merge. Another approach is just having one branch with relevant security features, the review and merge work can still be JIRA based. 1. Based on your proposal, how about the following as the branch(es) scope: 1) Pluggable Authentication and Token based SSO 2) CryptoFS for volume level encryption (HCFS) 3) Pluggable UGI change 4) Key management system 5) Unified authorization 2. 
With the above scope in mind, a candidate branch name could be like 'security-branch' instead of 'tokenauth-branch'. How about creating the branch now if we don't have other concerns? 3. Check-in philosophy. Agree with your proposal with slightly concerns: In terms of check-in philosophy, we should take a review then check-in approach to the branch with lazy consensus - wherein we do not need to explicitly +1 every check-in to the branch but we will honor any -1's with discussion to resolve before checking in. This will provide us each with the opportunity to track the work being done and ensure that we understand it and find that it meets the intended goals. We might need explicit +1 otherwise we would need define a time window pending to wait when to check-in. One issue we would like to clarify, does voting also include the security branch committers. = JIRA = We might not need additional umbrella JIRA for now since we already have HADOOP-9392 and HADOOP-9533. By the way I would suggest we use existing feature JIRAs to discuss relevant and specific issues on the going. Leveraging these JIRAs we might avoid too much details in the common-dev thread and it's also easy to track relevant discussions. I agree it's a good point to start with an inventory of the existing JIRAs. We can do that if there're no other concerns. We would provide the full list of breakdown JIRAs and attach it in HADOOP-9392 then for further collaboration. Regards, Kai From: larry mccay [mailto:larry.mc...@gmail.com] Sent: Wednesday, September 18, 2013 6:27 AM To: Zheng, Kai; Chen, Haifeng; common-dev@hadoop.apache.org Subject: Re: [DISCUSS] Security Efforts and Branching All - I apologize for not following up sooner. I have been heads down on some other matters that required my attention. It seems that it may be easier to move forward by gaining consensus a little bit at a time rather than trying to hit the ground running where the other thread left off. 
Would it be agreeable to everyone to start with an inventory of the existing Jiras that have patches available or nearly available so that we can determine what concrete bits we have to start with? Once we get that done, we can try and frame a set of goals to make up the initial iteration and determine what from the inventory will be leveraged in that iteration. Does this sound reasonable to everyone? Would anyone like to propose another starting point? thanks, --larry On Wed, Sep 4, 2013 at 4:26 PM, larry mccay larry.mc...@gmail.com wrote: It doesn't look like the PDF made it all the way through to the archives and maybe even to recipients - so the following is the text version of the iteration-1 draft: Iteration 1: Pluggable User Authentication and Federation Introduction The intent of this effort is to bootstrap the development of pluggable token-based authentication mechanisms to support certain goals of enterprise authentication integrations. By restricting the scope of this effort, we hope to provide immediate
RE: [DISCUSS] Hadoop SSO/Token Server Components
Got it Suresh. So I guess HADOOP-9797 (and the family) for the UGI change would be a fit to this rule right. The refactoring is improving and cleaning UGI, also preparing for TokenAuth feature. According to this rule the changes would be in trunk first. Thanks for your guidance. Regards, Kai -Original Message- From: Suresh Srinivas [mailto:sur...@hortonworks.com] Sent: Thursday, September 05, 2013 2:42 PM To: common-dev@hadoop.apache.org Subject: Re: [DISCUSS] Hadoop SSO/Token Server Components One aside: if you come across a bug, please try to fix it upstream and then merge into the feature branch rather than cherry-picking patches or only fixing it on the branch. It becomes very awkward to track. -C Related to this, when refactoring the code, generally required for large feature development, consider first refactoring in trunk and then make additional changes for the feature in the feature branch. This helps a lot in being able to merge the trunk to feature branch periodically. This will also help in keeping the change for merging feature to trunk small and easier reviews. Regards, Suresh -- CONFIDENTIALITY NOTICE NOTICE: This message is intended for the use of the individual or entity to which it is addressed and may contain information that is confidential, privileged and exempt from disclosure under applicable law. If the reader of this message is not the intended recipient, you are hereby notified that any printing, copying, dissemination, distribution, disclosure or forwarding of this communication is strictly prohibited. If you have received this communication in error, please contact the sender immediately and delete it from your system. Thank You.
RE: Need help for building haddop source on windows 7
Perhaps you could try the 'branch-trunk-win' branch and look for build instructions in it. I'm not sure whether this works, though. -Original Message- From: Ranjan Dutta [mailto:rdbmsdata.ran...@gmail.com] Sent: Thursday, September 05, 2013 3:43 PM To: common-dev@hadoop.apache.org Subject: Need help for building haddop source on windows 7 Hi, I want to build the Hadoop source on Windows 7. Can anybody share a document related to the source build? Thanks Ranjan
RE: [DISCUSS] Hadoop SSO/Token Server Components
approach or an HSSO vs TAS discussion. Your latest design revision actually makes it clear that you are now targeting exactly what was described as HSSO - so comparing and contrasting is not going to add any value. What we need you to do at this point, is to look at those high-level components described on this thread and comment on whether we need additional components or any that are listed that don't seem necessary to you and why. In other words, we need to define and agree on the work that has to be done. We also need to determine those components that need to be done before anything else can be started. I happen to agree with Brian that #4 Hadoop SSO Tokens are central to all the other components and should probably be defined and POC'd in short order. Personally, I think that continuing the separation of 9533 and 9392 will do this effort a disservice. There doesn't seem to be enough differences between the two to justify separate jiras anymore. It may be best to file a new one that reflects a single vision without the extra cruft that has built up in either of the existing ones. We would certainly reference the existing ones within the new one. This approach would align with the spirit of the discussions up to this point. I am prepared to start a discussion around the shape of the two Hadoop SSO tokens: identity and access. If this is what others feel the next topic should be. If we can identify a jira home for it, we can do it there - otherwise we can create another DISCUSS thread for it. thanks, --larry On Jul 3, 2013, at 2:39 PM, Zheng, Kai kai.zh...@intel.com wrote: Hi Larry, Thanks for the update. Good to see that with this update we are now aligned on most points. I have also updated our TokenAuth design in HADOOP-9392. The new revision incorporates feedback and suggestions in related discussion with the community, particularly from Microsoft and others attending the Security design lounge session at the Hadoop summit. 
Summary of the changes: 1.Revised the approach to now use two tokens, Identity Token plus Access Token, particularly considering our authorization framework and compatibility with HSSO; 2.Introduced Authorization Server (AS) from our authorization framework into the flow that issues access tokens for clients with identity tokens to access services; 3.Refined proxy access token and the proxy/impersonation flow; 4.Refined the browser web SSO flow regarding access to Hadoop web services; 5.Added Hadoop RPC access flow regard -- Best regards, - Andy Problems worthy of attack prove their worth by hitting back. - Piet Hein (via Tom White)
RE: [DISCUSS] Hadoop SSO/Token Server Components
for sure, there were some key differences that didn't just disappear. Subsequent discussion will make that clear. I also disagree with your characterization that we have simply endorsed all of the design decisions of the so-called HSSO, this is taking a mile from an inch. We are here to engage in a collaborative process as peers. I've been encouraged by the spirit of the discussions up to this point and hope that can continue beyond one design summit. On Wed, Jul 3, 2013 at 1:10 PM, Larry McCay lmc...@hortonworks.com wrote: Hi Kai - I think that I need to clarify something... This is not an update for 9533 but a continuation of the discussions that are focused on a fresh look at a SSO for Hadoop. We've agreed to leave our previous designs behind and therefore we aren't really seeing it as an HSSO layered on top of TAS approach or an HSSO vs TAS discussion. Your latest design revision actually makes it clear that you are now targeting exactly what was described as HSSO - so comparing and contrasting is not going to add any value. What we need you to do at this point, is to look at those high-level components described on this thread and comment on whether we need additional components or any that are listed that don't seem necessary to you and why. In other words, we need to define and agree on the work that has to be done. We also need to determine those components that need to be done before anything else can be started. I happen to agree with Brian that #4 Hadoop SSO Tokens are central to all the other components and should probably be defined and POC'd in short order. Personally, I think that continuing the separation of 9533 and 9392 will do this effort a disservice. There doesn't seem to be enough differences between the two to justify separate jiras anymore. It may be best to file a new one that reflects a single vision without the extra cruft that has built up in either of the existing ones. We would certainly reference the existing ones within the new one. 
This approach would align with the spirit of the discussions up to this point. I am prepared to start a discussion around the shape of the two Hadoop SSO tokens: identity and access. If this is what others feel the next topic should be. If we can identify a jira home for it, we can do it there - otherwise we can create another DISCUSS thread for it. thanks, --larry On Jul 3, 2013, at 2:39 PM, Zheng, Kai kai.zh...@intel.com wrote: Hi Larry, Thanks for the update. Good to see that with this update we are now aligned on most points. I have also updated our TokenAuth design in HADOOP-9392. The new revision incorporates feedback and suggestions in related discussion with the community, particularly from Microsoft and others attending the Security design lounge session at the Hadoop summit. Summary of the changes: 1.Revised the approach to now use two tokens, Identity Token plus Access Token, particularly considering our authorization framework and compatibility with HSSO; 2.Introduced Authorization Server (AS) from our authorization framework into the flow that issues access tokens for clients with identity tokens to access services; 3.Refined proxy access token and the proxy/impersonation flow; 4.Refined the browser web SSO flow regarding access to Hadoop web services; 5.Added Hadoop RPC access flow regard -- Best regards, - Andy Problems worthy of attack prove their worth by hitting back. - Piet Hein (via Tom White) -- Alejandro
RE: [DISCUSS] Hadoop SSO/Token Server Components
Hi Larry, Our design, from its first revision, focuses on and provides comprehensive support for pluggable authentication mechanisms based on a common token, trying to address single sign on issues across the ecosystem to support access to Hadoop services via RPC, REST, and web browser SSO flow. The updated design doc adds more text and flows to explain or illustrate these existing items in detail, as requested by some on the JIRA. In addition to the identity token we had proposed, we adopted an access token and adapted the approach, not only for the sake of making TokenAuth compatible with HSSO, but also for better support of fine grained access control and seamless integration with our authorization framework and even 3rd party authorization services like an OAuth Authorization Server. We regard these as important because Hadoop is evolving into an enterprise and cloud platform that needs a complete authN and authZ solution, and without this support we would need future rework to complete the solution. Since you asked about the differences between TokenAuth and HSSO, here are some key ones: TokenAuth supports TAS federation to allow clients to access multiple clusters without a centralized SSO server, while HSSO provides a centralized SSO server for multiple clusters. TokenAuth integrates an authorization framework with auditing support in order to provide a complete solution for enterprise data access security. This allows administrators to administer security policies centrally and have the policies enforced consistently across components in the ecosystem, in a pluggable way that supports different authorization models like RBAC, ABAC and even XACML standards. TokenAuth targets support for domain based authN and authZ to allow multi-tenant deployments. Authentication and authorization rules can be configured and enforced per domain, which allows organizations to manage their individual policies separately while sharing a common large pool of resources.
TokenAuth addresses proxy/impersonation case with flow as Tianyou mentioned, where a service can proxy client to access another service in a secured and constrained way. Regarding token based authentication plus SSO and unified authorization framework, HADOOP-9392 and HADOOP-9466 let's continue to use these as umbrella JIRAs for these efforts. HSSO targets support for centralized SSO server for multiple clusters and as we have pointed out before is a nice subset of the work proposed on HADOOP-9392. Let's align these two JIRAs and address the question Kevin raised multiple times in 9392/9533 JIRAs, How can HSSO and TAS work together? What is the relationship?. The design update I provided was meant to provide the necessary details so we can nail down that relationship and collaborate on the implementation of these JIRAs. As you have also confirmed, this design aligns with related community discussions, so let's continue our collaborative effort to contribute code to these JIRAs. Regards, Kai -Original Message- From: Larry McCay [mailto:lmc...@hortonworks.com] Sent: Thursday, July 04, 2013 4:10 AM To: Zheng, Kai Cc: common-dev@hadoop.apache.org Subject: Re: [DISCUSS] Hadoop SSO/Token Server Components Hi Kai - I think that I need to clarify something... This is not an update for 9533 but a continuation of the discussions that are focused on a fresh look at a SSO for Hadoop. We've agreed to leave our previous designs behind and therefore we aren't really seeing it as an HSSO layered on top of TAS approach or an HSSO vs TAS discussion. Your latest design revision actually makes it clear that you are now targeting exactly what was described as HSSO - so comparing and contrasting is not going to add any value. What we need you to do at this point, is to look at those high-level components described on this thread and comment on whether we need additional components or any that are listed that don't seem necessary to you and why. 
In other words, we need to define and agree on the work that has to be done. We also need to determine those components that need to be done before anything else can be started. I happen to agree with Brian that #4 Hadoop SSO Tokens are central to all the other components and should probably be defined and POC'd in short order. Personally, I think that continuing the separation of 9533 and 9392 will do this effort a disservice. There doesn't seem to be enough differences between the two to justify separate jiras anymore. It may be best to file a new one that reflects a single vision without the extra cruft that has built up in either of the existing ones. We would certainly reference the existing ones within the new one. This approach would align with the spirit of the discussions up to this point. I am prepared to start a discussion around the shape of the two Hadoop SSO tokens: identity and access. If this is what others feel the next topic should
RE: [DISCUSS] Hadoop SSO/Token Server Components
Hi Larry, Thanks for the update. Good to see that with this update we are now aligned on most points. I have also updated our TokenAuth design in HADOOP-9392. The new revision incorporates feedback and suggestions in related discussion with the community, particularly from Microsoft and others attending the Security design lounge session at the Hadoop summit. Summary of the changes: 1.Revised the approach to now use two tokens, Identity Token plus Access Token, particularly considering our authorization framework and compatibility with HSSO; 2.Introduced Authorization Server (AS) from our authorization framework into the flow that issues access tokens for clients with identity tokens to access services; 3.Refined proxy access token and the proxy/impersonation flow; 4.Refined the browser web SSO flow regarding access to Hadoop web services; 5.Added Hadoop RPC access flow regarding CLI clients accessing Hadoop services via RPC/SASL; 6.Added client authentication integration flow to illustrate how desktop logins can be integrated into the authentication process to TAS to exchange identity token; 7.Introduced fine grained access control flow from authorization framework, I have put it in appendices section for the reference; 8.Added a detailed flow to illustrate Hadoop Simple authentication over TokenAuth, in the appendices section; 9.Added secured task launcher in appendices as possible solutions for Windows platform; 10.Removed low level contents, and not so relevant parts into appendices section from the main body. As we all think about how to layer HSSO on TAS in TokenAuth framework, please take some time to look at the doc and then let's discuss the gaps we might have. I would like to discuss these gaps with focus on the implementations details so we are all moving towards getting code done. Let's continue this part of the discussion in HADOOP-9392 to allow for better tracking on the JIRA itself. 
For discussions related to Centralized SSO server, suggest we continue to use HADOOP-9533 to consolidate all discussion related to that JIRA. That way we don't need extra umbrella JIRAs. I agree we should speed up these discussions, agree on some of the implementation specifics so both us can get moving on the code while not stepping on each other in our work. Look forward to your comments and comments from others in the community. Thanks. Regards, Kai -Original Message- From: Larry McCay [mailto:lmc...@hortonworks.com] Sent: Wednesday, July 03, 2013 4:04 AM To: common-dev@hadoop.apache.org Subject: [DISCUSS] Hadoop SSO/Token Server Components All - As a follow up to the discussions that were had during Hadoop Summit, I would like to introduce the discussion topic around the moving parts of a Hadoop SSO/Token Service. There are a couple of related Jira's that can be referenced and may or may not be updated as a result of this discuss thread. https://issues.apache.org/jira/browse/HADOOP-9533 https://issues.apache.org/jira/browse/HADOOP-9392 As the first aspect of the discussion, we should probably state the overall goals and scoping for this effort: * An alternative authentication mechanism to Kerberos for user authentication * A broader capability for integration into enterprise identity and SSO solutions * Possibly the advertisement/negotiation of available authentication mechanisms * Backward compatibility for the existing use of Kerberos * No (or minimal) changes to existing Hadoop tokens (delegation, job, block access, etc) * Pluggable authentication mechanisms across: RPC, REST and webui enforcement points * Continued support for existing authorization policy/ACLs, etc * Keeping more fine grained authorization policies in mind - like attribute based access control - fine grained access control is a separate but related effort that we must not preclude with this effort * Cross cluster SSO In order to tease out the moving parts here are a couple high level and 
simplified descriptions of SSO interaction flow:

                            +------+
    +------+ credentials  1 | SSO  |
    |CLIENT|<-------------->|SERVER|
    +------+    :tokens     +------+
      2 |
        | access token
        V :requested resource
    +-------+
    |HADOOP |
    |SERVICE|
    +-------+

The above diagram represents the simplest interaction model for an SSO service in Hadoop.
1. client authenticates to SSO service and acquires an access token
   a. client presents credentials to an authentication service endpoint exposed by the SSO server (AS) and receives a token representing the authentication event and verified identity
   b. client then presents the identity token from 1.a. to the token endpoint exposed by the SSO server (TGS) to request an access token to a particular Hadoop service and receives an access token
2. client presents the Hadoop access
RE: Fostering a Hadoop security dev community
In my view it should be for the whole ecosystem. One motivation for this is to ease collaboration and discussion on the ongoing work on token based authentication and SSO, which absolutely targets the ecosystem, although the upcoming libraries and facilities might reside under the hadoop common umbrella. -Original Message- From: Alejandro Abdelnur [mailto:t...@cloudera.com] Sent: Friday, June 21, 2013 1:32 AM To: common-dev@hadoop.apache.org Subject: Re: Fostering a Hadoop security dev community This sounds great. Is this restricted to the Hadoop project itself, or is the intention to cover the whole Hadoop ecosystem? If the latter, how are you planning to engage and sync up with the different projects? Thanks. On Thu, Jun 20, 2013 at 9:45 AM, Larry McCay lmc...@hortonworks.com wrote: It would be great to have dedicated resources like these. One thing missing for cross cutting concerns like security is a source of truth for a holistic view of the entire model. A dedicated wiki space would allow for this view and facilitate the filing of Jiras that align with the big picture. On Thu, Jun 20, 2013 at 12:31 PM, Kevin Minder kevin.min...@hortonworks.com wrote: Hi PMCs and Everyone, There are a number of significant, complex and overlapping efforts underway to improve the Hadoop security model. Many involved are struggling to form this into a cohesive whole across the numerous Jiras and within the traffic of common-dev. There has been a suggestion made that having two additional pieces of infrastructure might help. 1) Establish a security-dev mailing list similar to hdfs-dev, yarn-dev, mapreduce-dev, etc. that would help us have more focused interaction on non-vulnerability security topics. I understand that this might devalue common-dev somewhat but the benefits might outweigh that. 2) Establish a corner of the wiki where cross cutting security design could be worked out more collaboratively than a doc rev upload mechanism.
I fear if we don't have this we will end up collaborating outside Apache infrastructure which seems inappropriate. I understand the risk of losing context in the individual Jiras but again my sense is that the cohesiveness provided will outweigh the risk. I'm open to and interested in other suggestions for how others have solved these types of cross cutting collaboration challenges. Thanks. Kevin. -- Alejandro
RE: [jira] [Updated] (HADOOP-9477) posixGroups support for LDAP groups mapping service
Hi, Can anyone help look into why submitting the patch doesn't trigger the HADOOP-QA check? Thanks. Regards, Kai -Original Message- From: Dapeng Sun (JIRA) [mailto:j...@apache.org] Sent: Sunday, May 05, 2013 11:00 PM To: Zheng, Kai Subject: [jira] [Updated] (HADOOP-9477) posixGroups support for LDAP groups mapping service

[ https://issues.apache.org/jira/browse/HADOOP-9477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dapeng Sun updated HADOOP-9477:
Target Version/s: 2.0.4-alpha
Affects Version/s: 2.0.4-alpha
Status: Patch Available (was: Open)

posixGroups support for LDAP groups mapping service
Key: HADOOP-9477
URL: https://issues.apache.org/jira/browse/HADOOP-9477
Project: Hadoop Common
Issue Type: Improvement
Affects Versions: 2.0.4-alpha
Reporter: Kai Zheng
Assignee: Kai Zheng
Fix For: 2.0.5-beta
Attachments: HADOOP-9477.patch
Original Estimate: 168h
Remaining Estimate: 168h

It would be nice to support posixGroups for LdapGroupsMapping service. Below is from current description for the provider: hadoop.security.group.mapping.ldap.search.filter.group: An additional filter to use when searching for LDAP groups. This should be changed when resolving groups against a non-Active Directory installation. posixGroups are currently not a supported group class.
RE: [jira] [Updated] (HADOOP-9477) posixGroups support for LDAP groups mapping service
Thank you Tianhong. Dapeng, you can resubmit your patch as Tianhong suggested. Thanks. Regards, Kai -Original Message- From: Wang Tianhong [mailto:wangt...@linux.vnet.ibm.com] Sent: Monday, May 06, 2013 3:47 PM To: common-dev@hadoop.apache.org Subject: RE: [jira] [Updated] (HADOOP-9477) posixGroups support for LDAP groups mapping service Hi Zheng, You can resubmit the patch. Sometimes Jenkins does not work well.
About FileBasedGroupMapping provider and Virtual Groups
Hi everyone,

Before I open a JIRA, I'd like to know what you think of a file based group mapping provider. The idea is as follows.

1. Have a new user group mapping provider, such as FileBasedGroupMapping, which consumes a mapping file like the one below.

$HADOOP_CONF/groupsMapping.txt:
group1:user1,user2
group2:user3,user4
groupX:user5 group1
groupY:user6 group2
...

According to this file, the provider will get the groups list for the users as:
user1-group1,groupX #same for user2
user3-group2,groupY #same for user4
user5-groupX
user6-groupY

Note that user1 gets group1 directly from the mapping file above; then, since group1 belongs to groupX, user1 must also belong to groupX, so groupX is also one of user1's groups.

2. So what are the benefits?

1) It opens a door to role based access control for Hadoop. As you can see, in the mapping file we can define virtual groups (or roles) like groupX and groupY to hold users and other groups. Such virtual groups can be used just like real groups: for example, assigned to an HDFS file as the owner group, added to an MR queue level ACL list, or, in HBase/Hive, granted privileges on databases and tables.

2) It makes it possible in HDFS to allow users from more than one group to read/write a file or folder while disallowing everyone else. For example, if we want to allow only user1 plus the users in group1 and group2 to read/write /data/secure, we can define a virtual group in the mapping file as secureGroup:user1 group1,group2, then chgrp the folder to secureGroup and chmod the folder with g+rw.

3) As described above, this is broadly useful and not just a fix for a corner case. As you may know, Hive supports HDFS as backend storage, and role based access control. Using Hive, one can create a database and then grant some users/groups/roles the CREATE privilege on it. After that, a granted user (granted directly or via a granted group or role) runs a command to create a table in that database. It can pass the access control check in Hive but may still be rejected by HDFS when Hive tries to create a file for the table in the database folder on behalf of the user, simply because the user has no write permission on the folder. Such issues can be resolved easily using this provider.

4) Minor but very convenient: we can use this mapping file and provider to define some users and groups for test purposes, when we don't want to involve ShellBasedUnixGroupsMapping or LdapGroupsMapping.

Thanks for your feedback! Kai
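To make the proposed resolution rule concrete, here is a minimal Java sketch of how such a provider might compute a user's groups, including groups inherited through nested virtual groups. It is only an illustration under stated assumptions: the class name `FileBasedGroupMappingSketch` is hypothetical, the line format `group:user,user subgroup,subgroup` is inferred from the example file in the message, and the sketch deliberately does not implement Hadoop's real provider interface so that it compiles on its own.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch of the proposed file based mapping; not Hadoop code. */
final class FileBasedGroupMappingSketch {
    // group -> users listed directly in that group (file order preserved)
    private final Map<String, Set<String>> directUsers = new LinkedHashMap<>();
    // group -> virtual groups that contain it
    private final Map<String, Set<String>> containingGroups = new LinkedHashMap<>();

    /**
     * Each line is assumed to look like "group:user1,user2 subgroup1,subgroup2";
     * the users field is assumed to be present, the subgroup field is optional.
     */
    FileBasedGroupMappingSketch(List<String> lines) {
        for (String raw : lines) {
            String line = raw.trim();
            int colon = line.indexOf(':');
            if (line.isEmpty() || line.startsWith("#") || colon < 0) {
                continue; // skip blanks, comments and malformed lines
            }
            String group = line.substring(0, colon).trim();
            String[] fields = line.substring(colon + 1).trim().split("\\s+");
            directUsers.computeIfAbsent(group, k -> new LinkedHashSet<>())
                       .addAll(splitCsv(fields[0]));
            if (fields.length > 1) {
                for (String sub : splitCsv(fields[1])) {
                    containingGroups.computeIfAbsent(sub, k -> new LinkedHashSet<>())
                                    .add(group);
                }
            }
        }
    }

    /** Direct groups first, then every virtual group reachable from them. */
    List<String> getGroups(String user) {
        Set<String> result = new LinkedHashSet<>();
        Deque<String> pending = new ArrayDeque<>();
        for (Map.Entry<String, Set<String>> e : directUsers.entrySet()) {
            if (e.getValue().contains(user)) {
                pending.add(e.getKey());
            }
        }
        while (!pending.isEmpty()) {
            String group = pending.poll();
            if (result.add(group)) { // the visited check also guards against cycles
                pending.addAll(containingGroups.getOrDefault(group, Collections.emptySet()));
            }
        }
        return new ArrayList<>(result);
    }

    private static List<String> splitCsv(String s) {
        List<String> out = new ArrayList<>();
        for (String item : s.split(",")) {
            String trimmed = item.trim();
            if (!trimmed.isEmpty()) {
                out.add(trimmed);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        FileBasedGroupMappingSketch mapping = new FileBasedGroupMappingSketch(Arrays.asList(
                "group1:user1,user2",
                "group2:user3,user4",
                "groupX:user5 group1",
                "groupY:user6 group2"));
        for (String user : Arrays.asList("user1", "user3", "user5", "user6")) {
            System.out.println(user + " -> " + mapping.getGroups(user));
        }
    }
}
```

Run against the example file, the sketch resolves user1 to group1 and groupX, which is the listing given in the message. A real provider would also need to decide how to reload the file and how to treat a line that names only subgroups; the sketch leaves both open.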
Questions and possible improvements for LdapGroupsMapping
Hi All, Regarding LdapGroupsMapping, I have the following questions: 1. Is it possible to use ShellBasedUnixGroupsMapping for Hadoop service principals/users, and LdapGroupsMapping for end user accounts? In our environment, normal end users (along with their group info) for the Hadoop cluster come from AD, and for them we prefer to use the LDAP mapping; but for the hdfs/mapred service principals, the default shell based one is enough, and we don't want to create user/group entries in AD just for that. It seems that in the current implementation only one user group mapping provider can be configured. 2. Can we support multiple ADs? Hadoop users might come from more than one AD in a big organization. 3. Is there any technical reason not to support LDAP servers like OpenLDAP? In my understanding, one possible difficulty might be that it's not easy to extract a common group lookup mechanism, with common filters/configurations, that applies to both AD and OpenLDAP, right? I'm wondering whether these are just limits of the current implementation, and if so whether we need to improve it. Has the community already been working on that?
RE: Questions and possible improvements for LdapGroupsMapping
I just got a reply from Natty on the user mailing list, as follows, and I'd like to discuss it further here since it's more appropriate.

Hi Natty, 1. It's a great idea to write a customized group mapping service to handle different mappings for AD users and service principals; 2. OK, I'd like to improve it to support multiple ADs; 3. Great to know. I will try the group mapping with OpenLDAP using the current configuration properties. Further, to support different mappings for different users/principals, and to support multiple ADs, we also need extra properties to configure which kind of user/principal (using the domain/realm is one option) should use which group mapping mechanism. To improve these things, I'm going to file a JIRA. It would be great if you could continue to comment on it. Thanks and regards, Kai

From: Jonathan Natkins [mailto:na...@cloudera.com] Sent: Friday, October 19, 2012 8:58 AM To: u...@hadoop.apache.org Subject: Re: Secure hadoop and group permission on HDFS

Hi Kai, 1. To the best of my knowledge, you can only use one group mapping service at a time. In order to do what you're suggesting, you'd have to write a customized group mapping service. 2. Currently multiple ADs are not supported, but it's certainly an improvement that could be made. 3. The LdapGroupsMapping already supports OpenLDAP. It's pretty heavily configurable for the purpose of supporting multiple types of LDAP implementations. The defaults just happen to be geared towards Active Directory. Thanks, Natty
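The "customized group mapping service" discussed in this exchange could take roughly the following shape: a wrapper that sends service principals to one lookup and everyone else to another. This is a hedged sketch only. The `GroupLookup` interface and `DispatchingGroupLookup` class are hypothetical stand-ins invented for illustration; a real implementation would plug into Hadoop's `GroupMappingServiceProvider` interface and delegate to the shell based and LDAP based providers, none of which is shown here.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical stand-in for a group mapping provider; not a Hadoop interface. */
interface GroupLookup {
    List<String> getGroups(String user);
}

/** Routes service accounts to one lookup and all other users to another. */
final class DispatchingGroupLookup implements GroupLookup {
    private final Set<String> serviceUsers;   // e.g. "hdfs", "mapred"
    private final GroupLookup serviceLookup;  // would wrap the shell based mapping
    private final GroupLookup endUserLookup;  // would wrap the LDAP based mapping

    DispatchingGroupLookup(Set<String> serviceUsers,
                           GroupLookup serviceLookup,
                           GroupLookup endUserLookup) {
        this.serviceUsers = serviceUsers;
        this.serviceLookup = serviceLookup;
        this.endUserLookup = endUserLookup;
    }

    @Override
    public List<String> getGroups(String user) {
        // Reduce "hdfs/host@REALM" or "alice@REALM" to the short name.
        String shortName = user.split("[/@]", 2)[0];
        return serviceUsers.contains(shortName)
                ? serviceLookup.getGroups(shortName)
                : endUserLookup.getGroups(shortName);
    }
}

final class DispatchingGroupLookupDemo {
    public static void main(String[] args) {
        // Both lookups are stubs with made-up results, standing in for real providers.
        GroupLookup shellStub = user -> Arrays.asList("hadoop");
        GroupLookup ldapStub = user -> Arrays.asList("analysts");
        GroupLookup lookup = new DispatchingGroupLookup(
                new HashSet<>(Arrays.asList("hdfs", "mapred")), shellStub, ldapStub);

        System.out.println(lookup.getGroups("hdfs/nn1.example.com@EXAMPLE.COM"));
        System.out.println(lookup.getGroups("alice@EXAMPLE.COM"));
    }
}
```

Routing on a fixed set of short names is the simplest possible rule. The message itself suggests the domain or realm as another routing key, which would also be the natural hook for supporting more than one AD.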
RE: Questions and possible improvements for LdapGroupsMapping
JIRA is opened for this: https://issues.apache.org/jira/browse/HADOOP-8943

-Original Message-
From: Zheng, Kai [mailto:kai.zh...@intel.com]
Sent: Friday, October 19, 2012 10:17 AM
To: common-dev@hadoop.apache.org; na...@cloudera.com
Subject: RE: Questions and possible improvements for LdapGroupsMapping