from:"Zheng, Kai"

RE: Apache Hadoop 2.8.3 Release Plan

2017-11-20 Thread Zheng, Kai

Thanks Andrew for the comments.

Yes, if we're "strictly" following the "maintenance release" practice, that'd 
be great, and it was never my intent to overload it and cause a mess.

>> If we're struggling with being able to deliver new features in a safe and 
>> timely fashion, let's try to address that...

This is interesting. Are you aware of any means to do that? Thanks!

Regards,
Kai

-----Original Message-----
From: Andrew Wang [mailto:andrew.w...@cloudera.com] 
Sent: Tuesday, November 21, 2017 2:22 PM
To: Zheng, Kai <kai.zh...@intel.com>
Cc: Junping Du <j...@hortonworks.com>; common-dev@hadoop.apache.org; 
hdfs-...@hadoop.apache.org; mapreduce-...@hadoop.apache.org; 
yarn-...@hadoop.apache.org
Subject: Re: Apache Hadoop 2.8.3 Release Plan

I'm against including new features in maintenance releases, since they're meant 
to be bug-fix only.

If we're struggling with being able to deliver new features in a safe and 
timely fashion, let's try to address that, not overload the meaning of 
"maintenance release".

Best,
Andrew

On Mon, Nov 20, 2017 at 5:20 PM, Zheng, Kai <kai.zh...@intel.com> wrote:

> Hi Junping,
>
> Thank you for making 2.8.2 happen and now planning the 2.8.3 release.
>
> I have a request: would it be feasible to include the backport of the 
> OSS connector module? Some Hadoop users would like to have it by 
> default for convenience; in the past they had to backport it 
> themselves. I raised this earlier and got feedback from Chris and 
> Steve. It looks like this is wanted more for 2.9, but I wanted to take 
> this chance to ask again here for broader feedback. The backport 
> patch for 2.8 is available, and the one for branch-2 is already 
> committed. IMO, 2.8.x is promising, as we can see some shift from 
> 2.7.x, so it deserves more important features and effort. What do 
> you think? Thanks!
>
> https://issues.apache.org/jira/browse/HADOOP-14964
>
> Regards,
> Kai
>
> -----Original Message-----
> From: Junping Du [mailto:j...@hortonworks.com]
> Sent: Tuesday, November 14, 2017 9:02 AM
> To: common-dev@hadoop.apache.org; hdfs-...@hadoop.apache.org; 
> mapreduce-...@hadoop.apache.org; yarn-...@hadoop.apache.org
> Subject: Apache Hadoop 2.8.3 Release Plan
>
> Hi,
> We have had several important fixes land on branch-2.8, and I would 
> like to cut branch-2.8.3 now to start the 2.8.3 release work.
> So far, I don't see any pending blockers for 2.8.3, so my current 
> plan is to cut the first RC of 2.8.3 in the next several days:
>  -  For all upcoming commits to branch-2.8, please mark the fix 
> version as 2.8.4.
>  -  If there is a really important fix for 2.8.3 that is close to 
> being done, please notify me before landing it on branch-2.8.3.
> Please let me know if you have any thoughts or comments on the plan.
>
> Thanks,
>
> Junping
> 
> From: dujunp...@gmail.com <dujunp...@gmail.com> on behalf of 俊平堵 < 
> junping...@apache.org>
> Sent: Friday, October 27, 2017 3:33 PM
> To: gene...@hadoop.apache.org
> Subject: [ANNOUNCE] Apache Hadoop 2.8.2 Release.
>
> Hi all,
>
> It gives me great pleasure to announce that the Apache Hadoop 
> community has voted to release Apache Hadoop 2.8.2, which is now 
> available for download from Apache mirrors[1]. For download 
> instructions please refer to the Apache Hadoop Release page [2].
>
> Apache Hadoop 2.8.2 is the first GA release of the Apache Hadoop 2.8 
> line and our newest stable release for the entire Apache Hadoop 
> project. For major changes included in the Hadoop 2.8 line, please 
> refer to the Hadoop 2.8.2 main page [3].
>
> This release has 315 resolved issues since the previous 2.8.1 release, 
> with the following breakdown:
>    - 91 in Hadoop Common
>    - 99 in HDFS
>    - 105 in YARN
>    - 20 in MapReduce
> Please read the log of CHANGES[4] and RELEASENOTES[5] for more details.
>
> The release news is posted on the Hadoop website too; you can go to 
> the downloads section directly [6].
>
> Thank you all for contributing to the Apache Hadoop release!
>
>
> Cheers,
>
> Junping
>
>
> [1] http://www.apache.org/dyn/closer.cgi/hadoop/common
>
> [2] http://hadoop.apache.org/releases.html
>
> [3] http://hadoop.apache.org/docs/r2.8.2/index.html
>
> [4] http://hadoop.apache.org/docs/r2.8.2/hadoop-project-dist/hadoop-common/release/2.8.2/CHANGES.2.8.2.html
>
> [5] http://hadoop.apache.org/docs/r2.8.2/hadoop-project-dist/hadoop-common/release/2.8.2/RELEASENOTES.2.8.2.html
>
> [6] http://hadoop.apache.org/releases.html#Download
>
>
> -
> To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
> For additional commands, e-mail: common-dev-h...@hadoop.apache.org
>
>
> -
> To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
> For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
>
>

RE: Apache Hadoop 2.8.3 Release Plan

2017-11-20 Thread Zheng, Kai

Hi Junping,

Thank you for making 2.8.2 happen and now planning the 2.8.3 release. 

I have a request: would it be feasible to include the backport of the OSS 
connector module? Some Hadoop users would like to have it by default for 
convenience; in the past they had to backport it themselves. I raised this 
earlier and got feedback from Chris and Steve. It looks like this is wanted 
more for 2.9, but I wanted to take this chance to ask again here for broader 
feedback. The backport patch for 2.8 is available, and the one for branch-2 is 
already committed. IMO, 2.8.x is promising, as we can see some shift from 
2.7.x, so it deserves more important features and effort. What do you think? 
Thanks!

https://issues.apache.org/jira/browse/HADOOP-14964

Regards,
Kai

-----Original Message-----
From: Junping Du [mailto:j...@hortonworks.com] 
Sent: Tuesday, November 14, 2017 9:02 AM
To: common-dev@hadoop.apache.org; hdfs-...@hadoop.apache.org; 
mapreduce-...@hadoop.apache.org; yarn-...@hadoop.apache.org
Subject: Apache Hadoop 2.8.3 Release Plan

Hi,
We have had several important fixes land on branch-2.8, and I would like to 
cut branch-2.8.3 now to start the 2.8.3 release work. 
So far, I don't see any pending blockers for 2.8.3, so my current plan is to 
cut the first RC of 2.8.3 in the next several days: 
 -  For all upcoming commits to branch-2.8, please mark the fix version as 
2.8.4.
 -  If there is a really important fix for 2.8.3 that is close to being done, 
please notify me before landing it on branch-2.8.3.
Please let me know if you have any thoughts or comments on the plan.
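
As an illustration of the fix-version rule in the two bullets above, here is a 
minimal sketch in Python. The helper name fix_version_for and the mapping table 
are hypothetical, made up for this example; they are not part of any Hadoop or 
JIRA tooling.

```python
# Hypothetical mapping from target branch to JIRA fix version, following the
# release plan above: after the branch cut, branch-2.8 feeds 2.8.4, while
# branch-2.8.3 only takes important fixes agreed with the release manager.
FIX_VERSION_BY_BRANCH = {
    "branch-2.8.3": "2.8.3",
    "branch-2.8": "2.8.4",
}


def fix_version_for(branch: str) -> str:
    """Return the fix version to set for a commit landing on `branch`."""
    try:
        return FIX_VERSION_BY_BRANCH[branch]
    except KeyError:
        # Branches not covered by the plan are rejected rather than guessed.
        raise ValueError(f"no fix-version rule known for {branch!r}") from None


print(fix_version_for("branch-2.8"))    # 2.8.4
print(fix_version_for("branch-2.8.3"))  # 2.8.3
```

Rejecting unknown branches instead of guessing keeps the sketch limited to what 
the plan actually states.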

Thanks,

Junping

From: dujunp...@gmail.com <dujunp...@gmail.com> on behalf of 俊平堵 
<junping...@apache.org>
Sent: Friday, October 27, 2017 3:33 PM
To: gene...@hadoop.apache.org
Subject: [ANNOUNCE] Apache Hadoop 2.8.2 Release.

Hi all,

It gives me great pleasure to announce that the Apache Hadoop community has 
voted to release Apache Hadoop 2.8.2, which is now available for download from 
Apache mirrors[1]. For download instructions please refer to the Apache Hadoop 
Release page [2].

Apache Hadoop 2.8.2 is the first GA release of the Apache Hadoop 2.8 line and 
our newest stable release for the entire Apache Hadoop project. For major 
changes included in the Hadoop 2.8 line, please refer to the Hadoop 2.8.2 main 
page [3].

This release has 315 resolved issues since the previous 2.8.1 release, with the 
following breakdown:
   - 91 in Hadoop Common
   - 99 in HDFS
   - 105 in YARN
   - 20 in MapReduce
Please read the log of CHANGES[4] and RELEASENOTES[5] for more details.
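
The per-component counts above can be checked against the stated total with a 
few lines of Python. This is only a quick arithmetic sketch; the dictionary 
name is an arbitrary choice for the example.

```python
# Issue counts per component, copied from the announcement above.
resolved = {"Hadoop Common": 91, "HDFS": 99, "YARN": 105, "MapReduce": 20}

total = sum(resolved.values())
print(total)  # 315, matching the stated total

# Share of each component in the release, as a percentage.
for component, count in resolved.items():
    print(f"{component}: {count / total:.1%}")
```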

The release news is posted on the Hadoop website too; you can go to the 
downloads section directly [6].

Thank you all for contributing to the Apache Hadoop release!


Cheers,

Junping


[1] http://www.apache.org/dyn/closer.cgi/hadoop/common

[2] http://hadoop.apache.org/releases.html

[3] http://hadoop.apache.org/docs/r2.8.2/index.html

[4]
http://hadoop.apache.org/docs/r2.8.2/hadoop-project-dist/hadoop-common/release/2.8.2/CHANGES.2.8.2.html

[5]
http://hadoop.apache.org/docs/r2.8.2/hadoop-project-dist/hadoop-common/release/2.8.2/RELEASENOTES.2.8.2.html

[6] http://hadoop.apache.org/releases.html#Download



RE: Backporting OSS module to branch 2.x

2017-11-15 Thread Zheng, Kai

>> We did not allow a backport of ADLS to branch-2.7 when it was released in 
>> 2.8.0. There were technical reasons-...

OK, it's clear to me now that branch-2.7 is already in maintenance mode and no 
new features can be included. 

>> Moreover, one should be able to use a jar compiled for 2.9 in a 2.7 cluster, 
>> so the value of releasing this module with 2.7.5 or 2.8.3 is questionable.

This sounds like a good workaround for 2.7. For 2.8, I'm still wondering 
whether 2.8.3 is the last 2.8 release. If it is, I agree; otherwise, putting it 
in branch-2.8 and releasing it along with some other nice things in 2.8.4 would 
still be desirable. I think the 2.8 releases will be the next popular line 
after 2.7, alongside 3.x. It could be too early to stop it, though that also 
depends on potential interest and on people taking up the work, as you said in 
previous emails. Very likely I'm missing part of the full picture, but I want 
to catch up so that I can help in the future.

>> Did anyone raise the Aliyun OSS backport during the 2.9.0 release 
>> discussion? I don't recall seeing it in the wiki or in any thread on the 
>> topic, but I may well have missed it. Since the vote on RC3 closes on Friday 
>> and looks likely to pass, this is very late to propose a new feature. Please 
>> raise this on the 2.9 release thread, so we can figure out how to handle it.

Indeed, not yet. Yes, it looks rather late, as RC3 is being voted on and the 
vote is going fine. My idea is to put the work in branch-2.9 first and target a 
release after 2.9.0. Sure, let me raise it on the 2.9 release thread when the 
time is right. 

Thanks Chris again for the education and the thoughts.

Regards,
Kai


Backporting OSS module to branch 2.x

2017-11-15 Thread Zheng, Kai

There was some discussion about backporting the OSS module to branch 2.x, and 
per Chris's suggestion we should have it on the dev list.

-----Original Message-----
From: Chris Douglas [mailto:cdoug...@apache.org]
Sent: Thursday, November 16, 2017 1:20 AM
To: Zheng, Kai <kai.zh...@intel.com>
Cc: Junping Du <j...@hortonworks.com>; Konstantin Shvachko 
<shv.had...@gmail.com>; s...@apache.org; Jason Lowe <jl...@oath.com>; 
Steve Loughran <steve.lough...@gmail.com>; Jonathan Hung 
<jyhung2...@gmail.com>; Arun Suresh <asur...@apache.org>; Vinod Kumar 
Vavilapalli <vino...@apache.org>; secur...@hadoop.apache.org
Subject: Re: Potential security issue of XXE in Hadoop

We should move this part of the thread back to the dev list.

On Wed, Nov 15, 2017 at 2:33 AM, Zheng, Kai <kai.zh...@intel.com> wrote:

> We would like to backport Ali OSS support to some releases based on 
> 2.7/2.8/2.9. Per the discussion, 2.9 should be fine; for 2.7 and 2.8, as we 
> haven't cut 2.7.5 and 2.8.3 yet, I'm hoping we can still do that. We Intel 
> folks would like to take on some of the work, such as testing and verifying. 
> The backport work is tracked in [1]. Currently Steve has some concerns about 
> 2.7 and 2.8, and we're doing our best to address them; basically, we'd avoid 
> any package change (like httpclient) and keep the changes self-contained in 
> the Hadoop OSS connector module. The backport patches will be available soon.

We did not allow a backport of ADLS to branch-2.7 when it was released in 
2.8.0. There were technical reasons- new dependencies could conflict with 
existing 2.7 client code, patch releases would release at a slower cadence, 
etc.- but popularity of an older release is not a sufficient reason to change 
our version policy on features. We tried to get away with that in 0.16 (and a 
few other times) and it's never gone well. Moreover, one should be able to use 
a jar compiled for 2.9 in a 2.7 cluster, so the value of releasing this module 
with 2.7.5 or 2.8.3 is questionable.

Did anyone raise the Aliyun OSS backport during the 2.9.0 release discussion? I 
don't recall seeing it in the wiki or in any thread on the topic, but I may 
well have missed it. Since the vote on RC3 closes on Friday and looks likely to 
pass, this is very late to propose a new feature. Please raise this on the 2.9 
release thread, so we can figure out how to handle it. Version numbers are 
cheap, but cutting 2.10 only to include this module will create an annoying 
maintenance burden for a low payoff. Correspondingly, a 2.9.1 release with 
"only a few" new features is a repeat of history we should avoid. -C

> @Konstantin, would you let me know when you plan to cut the 2.7.5 release? 
> Would it be OK to include the OSS backport work? Note that the module has 
> been in trunk for quite some time and the code has been exercised in 
> production. Is there anything we could take on and help with? It would be 
> our pleasure. Thanks!
>
> @Junping, I have a similar request for 2.8.3, and we would also help.
>
> [1]  https://issues.apache.org/jira/browse/HADOOP-14964
>
> Regards,
> Kai
>

RE: [VOTE] Merge yarn-native-services branch into trunk

2017-11-09 Thread Zheng, Kai

Cool to have this feature! Thanks Jian and all.

Regards,
Kai

-----Original Message-----
From: Vinod Kumar Vavilapalli [mailto:vino...@apache.org] 
Sent: Tuesday, November 07, 2017 7:20 AM
To: Jian He
Cc: yarn-...@hadoop.apache.org; common-dev@hadoop.apache.org; Hdfs-dev; 
mapreduce-...@hadoop.apache.org
Subject: Re: [VOTE] Merge yarn-native-services branch into trunk

Congratulations to all the contributors involved, this is a great step forward!

+Vinod

> On Nov 6, 2017, at 2:40 PM, Jian He wrote:
> 
> Okay, I just merged the branch to trunk (108 commits in total!). 
> Again, thanks to all who contributed to this feature!
> 
> Jian
> 
> On Nov 6, 2017, at 1:26 PM, Jian He wrote:
> 
> Here’s +1 from myself.
> The vote passes with 7 binding +1s and 2 non-binding +1s.
> 
> Thanks to all who voted. I’ll merge to trunk by the end of today.
> 
> Jian
> 
> On Nov 6, 2017, at 8:38 AM, Billie Rinaldi wrote:
> 
> +1 (binding)
> 
> On Mon, Oct 30, 2017 at 1:06 PM, Jian He wrote:
> Hi All,
> 
> I would like to restart the vote for merging yarn-native-services to trunk.
> Since the last vote, we have been working on several issues in documentation, 
> DNS, CLI modifications etc. We believe the feature is now in much better 
> shape.
> 
> Some background:
> At a high level, the following are the key features implemented.
> - YARN-5079[1]. A native YARN framework (ApplicationMaster) to orchestrate 
> existing services on YARN, either docker or non-docker based.
> - YARN-4793[2]. A REST API service embedded in the RM (optional) for users 
> to deploy a service via a simple JSON spec.
> - YARN-4757[3]. Extending today's service registry with a simple DNS 
> service to enable users to discover services deployed on YARN via 
> standard DNS lookup.
> - YARN-6419[4]. UI support for native-services on the new YARN UI.
> 
> All these new services are optional and sit outside of the existing 
> system, and have no impact on the existing system if disabled.
> 
> Special thanks to a team of folks who worked hard towards this: Billie 
> Rinaldi, Gour Saha, Vinod Kumar Vavilapalli, Jonathan Maron, Rohith Sharma K 
> S, Sunil G, Akhil PB, Eric Yang. This effort would not have been possible 
> without their ideas and hard work.
> Also, thanks to Allen for some reviews and verification.
> 
> Thanks,
> Jian
> 
> [1] https://issues.apache.org/jira/browse/YARN-5079
> [2] https://issues.apache.org/jira/browse/YARN-4793
> [3] https://issues.apache.org/jira/browse/YARN-4757
> [4] https://issues.apache.org/jira/browse/YARN-6419
> 
> 
> 


-
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org



RE: [DISCUSS] A final minor release off branch-2?

2017-11-06 Thread Zheng, Kai

Thanks Vinod.

>> Off the top of my head, one of the biggest areas is application 
>> compatibility. When folks move from 2.x to 3.x, are their apps binary 
>> compatible? Source compatible? Or need changes?
I think these are good concerns from an overall perspective. On the other hand, 
I've talked with quite a few potential 3.0 users, and most of them seem to be 
interested in the erasure coding feature; a major scenario for them is backing 
up large volumes of data at lower storage cost. They might run analytics 
workloads using Hive, Spark, Impala and Kylin on the new cluster based on that 
version, but it's not a must at first. They understand there might be some 
gaps, so they'd migrate their workloads incrementally. For the major analytics 
workloads, we've performed lots of benchmark and integration tests, as I 
believe others have as well; we did find some issues, but they should be fixed 
in the downstream projects. I think the GA release will accelerate progress and 
expose the issues, if any. We can't wait for it to mature first; nothing is 
perfect.

>> The main goal of the bridging release is to ease transition on stuff that is 
>> guaranteed to be broken.
This sounds like a good consideration. But I wonder: if I'm a Hadoop user on, 
for example, 2.7.4, 2.8.2 or some other 2.x version, would I first upgrade to 
this bridging release and then use the bridge support to upgrade to a 3.x 
version? I'm not sure. Instead, I might tend to look for guides or support in 
the 3.x docs on how to upgrade from 2.7 to 3.x. 

Frankly speaking, working on a bridging release that doesn't target any feature 
isn't so attractive to me as a contributor. Overall, a final minor release off 
branch-2 is good, but we should also give 3.x more time to evolve and mature, 
so it looks to me like we will have to work on two release lines in parallel 
for some time. I'd like option C), and suggest we focus on the recent releases.

Just some thoughts.

Regards,
Kai

-----Original Message-----
From: Vinod Kumar Vavilapalli [mailto:vino...@apache.org] 
Sent: Tuesday, November 07, 2017 9:43 AM
To: Andrew Wang
Cc: Arun Suresh; common-dev@hadoop.apache.org; yarn-...@hadoop.apache.org; 
Hdfs-dev; mapreduce-...@hadoop.apache.org
Subject: Re: [DISCUSS] A final minor release off branch-2?

The main goal of the bridging release is to ease transition on stuff that is 
guaranteed to be broken.

Off the top of my head, one of the biggest areas is application compatibility. 
When folks move from 2.x to 3.x, are their apps binary compatible? Source 
compatible? Or do they need changes?

In the 1.x -> 2.x upgrade, we did a bunch of work to at least make old apps 
source compatible. This means taking another look at API compatibility in 3.x 
and its impact on migrating applications. We will have to revisit and 
un-deprecate old APIs, un-delete old APIs and write documentation on how apps 
can be migrated.

Most of this work will be in the 3.x line. The bridging release, on the other 
hand, will have deprecations for APIs that cannot be undeleted. This may 
already have been done in many places, but we need to make sure and fill gaps, 
if any.

Other areas that I can recall from the old days:
 - Config migration: Many configs are deprecated or deleted. We need 
documentation to help folks move. We also need deprecations in the bridging 
release for configs that cannot be undeleted.
 - You mentioned rolling upgrades: It would be good to outline exactly the type 
of testing. For example, the rolling-upgrade orchestration order has a direct 
implication on the testing done.
 - Story for downgrades?
 - Copying data between 2.x clusters and 3.x clusters: Does this work already? 
Is it broken anywhere that we cannot fix? Do we need bridging features for this 
to work?
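
To make the config migration point concrete, here is a minimal Python sketch of 
how deprecated keys could be translated to their replacements. The function 
migrate_config is hypothetical and written only for this illustration. The 
three key pairs are examples of renames from the earlier 1.x -> 2.x transition, 
used here as stand-ins; the actual set of keys affected between 2.x and 3.x is 
not given in this thread.

```python
# Old key -> new key. These pairs are illustrative examples from the earlier
# 1.x -> 2.x renames, not a list of 2.x -> 3.x changes.
DEPRECATED_KEYS = {
    "fs.default.name": "fs.defaultFS",
    "dfs.name.dir": "dfs.namenode.name.dir",
    "dfs.data.dir": "dfs.datanode.data.dir",
}


def migrate_config(conf):
    """Return (migrated_conf, notes) with deprecated keys renamed.

    A value set under the new key always wins over one carried by the
    deprecated key, regardless of the order in which they appear.
    """
    migrated = {}
    notes = []
    for key, value in conf.items():
        new_key = DEPRECATED_KEYS.get(key, key)
        if new_key != key:
            notes.append(f"{key} is deprecated; use {new_key}")
            if new_key in conf:
                continue  # explicit new-style setting takes precedence
        migrated[new_key] = value
    return migrated, notes


old_conf = {"fs.default.name": "hdfs://nn.example:8020", "dfs.replication": "3"}
new_conf, notes = migrate_config(old_conf)
print(new_conf)
print(notes)
```

Returning the notes alongside the migrated settings mirrors the two needs named 
above: a deprecation notice for users, and a mapping they can follow.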

+Vinod

> On Nov 6, 2017, at 12:49 PM, Andrew Wang  wrote:
> 
> What are the known gaps that need bridging between 2.x and 3.x?
> 
> From an HDFS perspective, we've tested wire compat, rolling upgrade, 
> and rollback.
> 
> From a YARN perspective, we've tested wire compat and rolling upgrade. 
> Arun just mentioned an NM rollback issue that I'm not familiar with.
> 
> Anything else? External to this discussion, these should be documented 
> as known issues for 3.0.
> 
> Best.
> Andrew
> 
> On Sun, Nov 5, 2017 at 1:46 PM, Arun Suresh  wrote:
> 
>> Thanks for starting this discussion, Vinod.
>> 
>> I agree (C) is a bad idea.
>> I would prefer (A) given that ATM, branch-2 is still very close to
>> branch-2.9 - and it is a good time to make a collective decision to 
>> lock down commits to branch-2.
>> 
>> I think we should also clearly define what the 'bridging' release 
>> should be.
>> I assume it means the following:
>> * Any 2.x user wanting to move to 3.x must first upgrade to the 
>> bridging release and then upgrade to the 3.x release.
>> * With

RE: [VOTE] HADOOP-12756 - Aliyun OSS Support branch merge

2016-10-08 Thread Zheng, Kai

I'd like to conclude this vote.

+1 (binding):
Lei Xu,
Kai Zheng,
Gangumalla, Uma

+1 (non-binding):
Hao Cheng

No 0 or -1 votes. 

The VOTE passed. I will merge the branch into trunk accordingly.
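
The tally above can be sketched in Python as follows. The pass rule used here, 
at least three binding +1 votes and no binding -1, is an assumption made for 
the example and is not stated in this thread; treating Hao Cheng's +1 as 
non-binding is likewise inferred from the separate listing. The names Vote and 
merge_vote_passes are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Vote:
    voter: str
    value: int      # +1, 0 or -1
    binding: bool


def merge_vote_passes(votes, min_binding=3):
    """Assumed rule: enough binding +1 votes and no binding -1 (veto)."""
    binding_plus = sum(1 for v in votes if v.binding and v.value == 1)
    vetoes = sum(1 for v in votes if v.binding and v.value == -1)
    return binding_plus >= min_binding and vetoes == 0


votes = [
    Vote("Lei Xu", 1, True),
    Vote("Kai Zheng", 1, True),
    Vote("Uma Gangumalla", 1, True),
    Vote("Hao Cheng", 1, False),  # assumed non-binding
]
print(merge_vote_passes(votes))  # True
```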

Thanks for taking the time to review the work and cast your votes. Thanks also 
to Steve for providing review comments in HADOOP-12756 and to Genmao Yu for 
addressing them and running the complete "hadoop fs" tests.

Regards,
Kai


RE: [VOTE] HADOOP-12756 - Aliyun OSS Support branch merge

2016-10-05 Thread Zheng, Kai

Could I extend this vote a bit longer, considering the PRC holiday (Oct 1 to 
Oct 7)? If that sounds good, I'd like to have another week (until next 
Wednesday) for this. Please let me know if you think otherwise. Thanks.
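
As a quick check of the dates involved, the following Python sketch adds one 
week to the original close date from the vote announcement (Oct 5, 2016) and 
compares it with the end of the holiday. The variable names are arbitrary and 
the time of day and time zone are ignored for simplicity.

```python
from datetime import date, timedelta

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]

original_close = date(2016, 10, 5)   # from the vote announcement
holiday_end = date(2016, 10, 7)      # last day of the PRC holiday as given above
extended_close = original_close + timedelta(weeks=1)

print(WEEKDAYS[original_close.weekday()])    # Wednesday
print(extended_close.isoformat())            # 2016-10-12
print(WEEKDAYS[extended_close.weekday()])    # Wednesday
print((extended_close - holiday_end).days)   # 5 days after the holiday ends
```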

Regards,
Kai 


RE: desc error on official site http://hadoop.apache.org/

2016-09-28 Thread Zheng, Kai

Thank you for the catch. This should go to the common-dev mailing list.

Would you file an issue to fix this?

Regards,
Kai

-----Original Message-----
From: 444...@qq.com [mailto:444...@qq.com] 
Sent: Wednesday, September 28, 2016 9:10 AM
To: general 
Subject: desc error on official site http://hadoop.apache.org/

It is designed to scale up from single servers to thousands of machines, each 
offering local computation and storage.
===>
It is designed to scale up from single server to thousands of machines, each 
offering local computation and storage.


[VOTE] HADOOP-12756 - Aliyun OSS Support branch merge

2016-09-27 Thread Zheng, Kai

Hi all,

I would like to propose a merge vote for HADOOP-12756 branch to trunk. This 
branch develops support for Aliyun OSS (another cloud storage) in Hadoop.

The voting starts now and will run for 7 days till Oct 5, 2016 07:00 PM PDT.

Aliyun OSS is widely used among China's cloud users, and currently it is not 
easy to access data in Aliyun OSS from Hadoop. The branch develops a new module 
hadoop-aliyun and provides support for accessing data in Aliyun OSS cloud 
storage, which will enable more use cases and bring a better user experience for 
Hadoop users. Like the existing s3a support, AliyunOSSFileSystem, a new 
implementation of FileSystem backed by Aliyun OSS, is provided. During the 
implementation, the contributors referred to the s3a support, keeping the same 
coding style and project structure.

. The updated architecture document is here:
   [https://issues.apache.org/jira/secure/attachment/12829541/Aliyun-OSS-integration-v2.pdf]

. The merge patch that is a diff against trunk is posted here, which builds 
cleanly with manual testing results posted in HADOOP-13584:
   [https://issues.apache.org/jira/secure/attachment/12829738/HADOOP-13584.004.patch]

. The user documentation is also provided as part of the module.

HADOOP-12756 has a set of sub-tasks, ordered in the same sequence 
as they were committed to HADOOP-12756. Hopefully this will make reviewing 
easier.

What I want to emphasize is: this is a fundamental implementation aiming at 
guaranteeing functionality and stability. The major functionality has been 
running in production environments for some while. There are definitely 
performance optimizations that we can do, as the community has done for the 
existing s3a and azure support. Merging this to trunk would serve as a very 
good beginning for the following optimizations, aligning the related 
efforts together.

The new hadoop-aliyun module was made possible by many people. Thanks to 
the contributors Mingfei Shi, Genmao Yu and Ling Zhou; thanks to Cheng Hao, 
Steve Loughran, Chris Nauroth, Yi Liu, Lei (Eddy) Xu, Uma Maheswara Rao G and 
Allen Wittenauer for their kind reviewing and guidance. Thanks also to Arpit 
Agarwal, Andrew Wang and Anu Engineer for the great process discussions to 
bring this up.

Please kindly vote. Thanks in advance!

Regards,
Kai

RE: [VOTE] Release Apache Hadoop 3.0.0-alpha1 RC0

2016-09-02 Thread Zheng, Kai

Thanks Sammi.

My non-binding +1 to make the release candidate.

Regards,
Kai

-Original Message-
From: Chen, Sammi 
Sent: Friday, September 02, 2016 4:59 PM
To: Zheng, Kai <kai.zh...@intel.com>; Andrew Wang <andrew.w...@cloudera.com>; 
Arun Suresh <asur...@apache.org>
Cc: common-dev@hadoop.apache.org; hdfs-...@hadoop.apache.org; 
mapreduce-...@hadoop.apache.org; yarn-...@hadoop.apache.org; Chen, Sammi 
<sammi.c...@intel.com>
Subject: RE: [VOTE] Release Apache Hadoop 3.0.0-alpha1 RC0

+1 (non-binding).

Thanks for driving this Andrew!

* Download and built from source.
* Setup a 10 node cluster (1 name node + 9 data nodes)
* Verified normal HDFS file put/get operation with 3x replication
* With 2 data nodes failure, verified HDFS file put/get operation with 3x 
replication, file integrity is OK
* Enable Erasure Code policy "RS-DEFAULT-6-3-64k", verified HDFS file put/get 
operation
* Enable Erasure Code policy "RS-DEFAULT-6-3-64k", with 3 data nodes failure, 
verified HDFS file put/get operation, file integrity is OK
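
The failure counts in the checks above line up with simple redundancy arithmetic. The following is only an illustrative sketch, assuming the policy name "RS-DEFAULT-6-3-64k" denotes Reed-Solomon with 6 data units and 3 parity units per stripe; the function names are made up for this example and are not Hadoop APIs.

```python
def replication_profile(replicas):
    """Return (tolerated_failures, storage_overhead) for N-way replication."""
    # With N full copies, N - 1 of them can be lost; storage cost is N times.
    return replicas - 1, float(replicas)


def reed_solomon_profile(data_units, parity_units):
    """Return (tolerated_failures, storage_overhead) for an RS(data, parity) stripe."""
    # Any `parity_units` units of a stripe can be lost and still be rebuilt.
    overhead = (data_units + parity_units) / data_units
    return parity_units, overhead


if __name__ == "__main__":
    print("3x replication:", replication_profile(3))
    print("RS(6,3):", reed_solomon_profile(6, 3))
```

Under that reading, 3x replication survives two lost replicas at 3.0x storage, while RS(6,3) survives three lost units at 1.5x storage, which is consistent with the 2-node and 3-node failure tests listed above.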

Cheers
-Sammi

-Original Message-
From: Zheng, Kai
Sent: Friday, September 02, 2016 3:25 PM
To: Chen, Sammi
Subject: FW: [VOTE] Release Apache Hadoop 3.0.0-alpha1 RC0

Hi Sammi,

Could you help provide our feedback? I know you did lots of tests. Thanks!

Regards,
Kai

-Original Message-
From: Arun Suresh [mailto:asur...@apache.org]
Sent: Friday, September 02, 2016 11:33 AM
To: Andrew Wang <andrew.w...@cloudera.com>
Cc: common-dev@hadoop.apache.org; hdfs-...@hadoop.apache.org; 
mapreduce-...@hadoop.apache.org; yarn-...@hadoop.apache.org
Subject: Re: [VOTE] Release Apache Hadoop 3.0.0-alpha1 RC0

+1 (binding).

Thanks for driving this Andrew..

* Download and built from source.
* Setup a 5 node cluster.
* Verified that MR works with opportunistic containers
* Verified that the AMRMClient supports 'allocationRequestId'

Cheers
-Arun

On Thu, Sep 1, 2016 at 4:31 PM, Aaron Fabbri <fab...@cloudera.com> wrote:

> +1, non-binding.
>
> I built everything on OS X and ran the s3a contract tests successfully:
>
> mvn test -Dtest=org.apache.hadoop.fs.contract.s3a.\*
>
> ...
>
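> Results :
>
> Tests run: 78, Failures: 0, Errors: 0, Skipped: 1
>
> [INFO] ------------------------------------------------------------------------
> [INFO] BUILD SUCCESS
> [INFO] ------------------------------------------------------------------------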
>
> On Thu, Sep 1, 2016 at 3:39 PM, Andrew Wang <andrew.w...@cloudera.com>
> wrote:
>
> > Good point Allen, I forgot about `hadoop version`. Since it's populated by
> > a version-info.properties file, people can always cat that file.
> >
> > On Thu, Sep 1, 2016 at 3:21 PM, Allen Wittenauer
> > <a...@effectivemachines.com> wrote:
> >
> > >
> > > > On Sep 1, 2016, at 3:18 PM, Allen Wittenauer
> > > > <a...@effectivemachines.com> wrote:
> > > >
> > > >
> > > >> On Sep 1, 2016, at 2:57 PM, Andrew Wang <andrew.w...@cloudera.com>
> > > >> wrote:
> > > >>
> > > >> Steve requested a git hash for this release. This led us into a brief
> > > >> discussion of our use of git tags, wherein we realized that although
> > > >> release tags are immutable (start with "rel/"), RC tags are not. This
> > > >> is based on the HowToRelease instructions.
> > > >
> > > >   We should probably embed the git hash in one of the files that
> > > > gets gpg signed.  That's an easy change to create-release.
> > >
> > >
> > > (Well, one more easily accessible than 'hadoop version')
> >
>
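
The idea in the quoted exchange, reading the release's git hash out of a properties file, can be sketched as below. This is only an illustration: the sample content and the `revision` key are assumptions made for the example, and the real version-info file may use different keys and values.

```python
def parse_properties(text):
    """Parse simple key=value lines, skipping blanks and '#' comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


# Hypothetical file content, for illustration only.
SAMPLE = """\
# illustrative only
version=3.0.0-alpha1
revision=0123456789abcdef
"""


def release_revision(text):
    """Return the value of the (assumed) 'revision' key, or None if absent."""
    return parse_properties(text).get("revision")


if __name__ == "__main__":
    print(release_revision(SAMPLE))
```

If the file carrying the hash is one of the gpg-signed artifacts, as suggested in the thread, a reader can verify the signature first and then trust the extracted value.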


RE: [DISCUSS] The order of classpath isolation work and updating/shading dependencies on trunk

2016-07-22 Thread Zheng, Kai

For the leveldb issue, shouldn't we have an alternative option in Java for the 
platforms where leveldb isn't supported yet, for whatever reason? IMO, a 
native library is best used as an optimization, for performance in 
production. For development and for pure Java platforms, a pure Java 
approach should still be provided and used by default. That is to say, if no 
Hadoop native library is used, all the functionality should still work and 
not break.

HDFS erasure coding goes the same way. For that, we spent much effort 
developing an ISA-L compatible erasure coder in pure Java that's used by 
default, though for performance the ISA-L native one is recommended in 
production deployments.
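
The "native for speed, pure implementation by default" approach described above can be sketched in a few lines. This is a minimal illustration, not Hadoop code: the module name `hypothetical_native_coder` is invented, and a plain XOR parity stands in for a real erasure coder.

```python
import importlib


def xor_parity_pure(chunks):
    """Pure-Python XOR parity over equal-length byte chunks (the fallback)."""
    parity = bytearray(len(chunks[0]))
    for chunk in chunks:
        for i, b in enumerate(chunk):
            parity[i] ^= b
    return bytes(parity)


def load_parity_coder(native_module="hypothetical_native_coder"):
    """Prefer a native implementation if importable, else use the pure one."""
    try:
        module = importlib.import_module(native_module)
        return module.xor_parity, "native"
    except (ImportError, AttributeError):
        # No native library on this platform: everything must still work.
        return xor_parity_pure, "pure"


if __name__ == "__main__":
    coder, kind = load_parity_coder()
    print(kind, coder([b"\x01\x02", b"\x03\x04"]))
```

The caller only sees one function, so a platform without the native library loses performance but not functionality.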

Regards,
Kai

-Original Message-
From: Allen Wittenauer [mailto:a...@effectivemachines.com] 
Sent: Saturday, July 23, 2016 8:16 AM
To: Sangjin Lee 
Cc: Sean Busbey ; common-dev@hadoop.apache.org; 
yarn-...@hadoop.apache.org; hdfs-...@hadoop.apache.org; 
mapreduce-...@hadoop.apache.org
Subject: Re: [DISCUSS] The order of classpath isolation work and 
updating/shading dependencies on trunk


But if I don't use ApplicationClassLoader, my java app is basically screwed 
then, right?

Also:  right now, the non-Linux and/or non-x86 platforms have to supply their 
own leveldbjni jar (or at least the C level library?) in order to make YARN 
even functional.  How is that going to work with the class path manipulation?


> On Jul 22, 2016, at 9:57 AM, Sangjin Lee  wrote:
> 
> The work on HADOOP-13070 and the ApplicationClassLoader are generic and go 
> beyond YARN. It can be used in any JVM that uses hadoop. The current use 
> cases are MR containers, hadoop's RunJar (as in "hadoop jar"), and the YARN 
> node manager auxiliary services. I'm not sure if that's what you were asking, 
> but I hope it helps.
> 
> Regards,
> Sangjin
> 
> On Fri, Jul 22, 2016 at 9:16 AM, Sean Busbey  wrote:
> My work on HADOOP-11804 *only* helps processes that sit outside of 
> YARN. :)
> 
> On Fri, Jul 22, 2016 at 10:48 AM, Allen Wittenauer 
>  wrote:
> >
> > Does any of this work actually help processes that sit outside of YARN?
> >
> >> On Jul 21, 2016, at 12:29 PM, Sean Busbey  wrote:
> >>
> >> thanks for bringing this up! big +1 on upgrading dependencies for 3.0.
> >>
> >> I have an updated patch for HADOOP-11804 ready to post this week. 
> >> I've been updating HBase's master branch to try to make use of it, 
> >> but could use some other reviews.
> >>
> >> On Thu, Jul 21, 2016 at 4:30 AM, Tsuyoshi Ozawa  wrote:
> >>> Hi developers,
> >>>
> >>> I'd like to discuss how to make an advance towards dependency 
> >>> management in Apache Hadoop trunk code since there has been lots of 
> >>> work about updating dependencies in parallel. Summarizing recent 
> >>> works and activities as follows:
> >>>
> >>> 0) Currently, we have merged minimum update dependencies for 
> >>> making Hadoop JDK-8 compatible (compilable and runnable on JDK-8).
> >>> 1) After that, some people suggest that we should update the other 
> >>> dependencies on trunk (e.g. protobuf, netty, jackson etc.).
> >>> 2) In parallel, Sangjin and Sean are working on classpath isolation:
> >>> HADOOP-13070, HADOOP-11804 and HADOOP-11656.
> >>>
> >>> Main problems we try to solve in the activities above are as follows:
> >>>
> >>> * 1) tries to solve dependency hell between user-level jar and 
> >>> system(Hadoop)-level jar.
> >>> * 2) tries to solve updating old libraries.
> >>>
> >>> IIUC, 1) and 2) look unrelated, but they are related in fact. 2) 
> >>> tries to separate class loader between client-side dependencies 
> >>> and server-side dependencies in Hadoop, so we can change the 
> >>> policy of updating libraries after doing 2). We can also decide 
> >>> which libraries can be shaded after 2).
> >>>
> >>> Hence, IMHO, the straightforward way is to do 2) first.
> >>> After that, we can update both client-side and server-side 
> >>> dependencies based on the new policy (maybe we should discuss what kind 
> >>> of incompatibility is acceptable and what is not).
> >>>
> >>> Thoughts?
> >>>
> >>> Thanks,
> >>> - Tsuyoshi
> >>>
> >>>
> >>
> >>
> >>
> >> --
> >> busbey
> >>
> >>
> >
> >
> >
> 
> 
> 
> --

RE: Setting JIRA fix versions for 3.0.0 releases

2016-07-21 Thread Zheng, Kai

My humble feeling is almost the same regarding the urgent need for a 3.0 alpha 
release.

EC, the shell-script rewrite and so on are significant changes, and there are 
interested users who want to evaluate the EC storage method. An alpha 3.0 
release will definitely help a lot by allowing users to try the new features 
and then expose critical bugs or gaps. This may take quite some time, and it 
should be very important for building confidence in preparation for a solid 
3.0 release. I understand Vinod's concern and the need to line up the release 
efforts, so working on multiple 2.x releases at once should be avoided. The 
3.0 alpha is different, though: the best would be to let it proceed in 
parallel to speed up EC and the like, while no 2.x release is rushed because 
of the 3.0 release.

Thanks for any consideration.

Regards,
Kai

-Original Message-
From: Andrew Wang [mailto:andrew.w...@cloudera.com] 
Sent: Friday, July 22, 2016 3:33 AM
To: Vinod Kumar Vavilapalli 
Cc: common-dev@hadoop.apache.org; mapreduce-...@hadoop.apache.org; 
yarn-...@hadoop.apache.org
Subject: Re: Setting JIRA fix versions for 3.0.0 releases

I really, really want a 3.0.0-alpha1 ASAP, since it's basically impossible for 
downstreams to test incompat changes and new features without a release 
artifact. I've been doing test builds, and branch-3.0.0-alpha1 is ready for an 
RC besides possibly this fix version issue.

I'm not too worried about splitting community bandwidth, for the following
reasons:

* 3.0.0-alpha1 is very explicitly an alpha, which means no quality or 
compatibility guarantees. It needs less vetting than a 2.x release.
* Given that 3.0.0 is still in alpha, there aren't many true showstopper bugs. 
Most blockers I see also apply to both 2.x and 3.0.0.
* Community bandwidth isn't zero-sum. This particularly applies to people 
working on features that are only present in trunk, like EC, shell script 
rewrite, etc.

Longer-term, I assume the 2.x line is not ending with 2.8. So we'd still have 
the issue of things committed for 2.9.0 that will be appearing for the first 
time in 3.0.0-alpha1. Assuming a script exists to fix up 2.9 JIRAs, it's only 
incrementally more work to also fix up 2.8 and other unreleased versions too.
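
The fix-up logic discussed here can be sketched without touching JIRA at all. The following is an assumption-laden illustration: the issue keys are made up, the data is a plain dictionary rather than anything fetched from JIRA, and the rule "add the new version when none of the listed fix versions has been released" is one reading of the proposal, not the actual script.

```python
def add_first_release(issues, released, new_version):
    """Return a copy of `issues` with `new_version` added to every issue
    whose listed fix versions are all still unreleased."""
    updated = {}
    for key, fix_versions in issues.items():
        versions = set(fix_versions)
        # Skip issues with no fix version, and issues already shipped
        # in some released version.
        if versions and not (versions & released):
            versions.add(new_version)
        updated[key] = sorted(versions)
    return updated


if __name__ == "__main__":
    issues = {
        "HADOOP-1": ["2.8.0"],            # hypothetical issue keys
        "HADOOP-2": ["2.7.2", "2.8.0"],
        "HADOOP-3": ["2.9.0"],
        "HADOOP-4": [],
    }
    print(add_first_release(issues, {"2.7.2"}, "3.0.0-alpha1"))
```

Because the rule is expressed against a set of released versions, covering 2.8 as well as 2.9 is just a matter of which versions are in that set, which matches the "only incrementally more work" remark.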

Best,
Andrew

On Thu, Jul 21, 2016 at 11:53 AM, Vinod Kumar Vavilapalli < vino...@apache.org> 
wrote:

> The L & N fixes just went out, I’m working to push out 2.7.3 - running 
> into a Nexus issue. Once that goes out, I’ll immediately do a 2.8.0.
>
> Like I requested before in one of the 3.x threads, can we just line up
> 3.0.0-alpha1 right behind 2.8.0?
>
> That simplifies most of this confusion, we can avoid splitting the 
> bandwidth from the community on fixing blockers / vetting these 
> concurrent releases. Waiting a little more for 3.0.0 alpha to avoid 
> most of this is worth it, IMO.
>
> Thanks
> +Vinod
>
> > On Jul 21, 2016, at 11:34 AM, Andrew Wang 
> wrote:
> >
> > Hi all,
> >
> > Since we're planning to spin releases off of both branch-2 and 
> > trunk, the changelog for 3.0.0-alpha1 based on JIRA information 
> > isn't accurate. This is because historically, we've only set 2.x fix 
> > versions, and 2.8.0, 2.9.0, etc. have not been released. So there's a 
> > whole bunch of changes which will show up for the first time in 
> > 3.0.0-alpha1.
> >
> > I think I can write a script to (carefully) add 3.0.0-alpha1 to 
> > these JIRAs, but I figured I'd give a heads up here in case anyone 
> > felt differently. I can also update the HowToCommit page to match.
> >
> > Thanks,
> > Andrew
>
>


RE: Not being able to add HDFS contributors

2016-07-19 Thread Zheng, Kai

Thanks Andrew for the workaround!! It works great ...

Regards,
Kai

-Original Message-
From: Andrew Wang [mailto:andrew.w...@cloudera.com] 
Sent: Wednesday, July 20, 2016 8:10 AM
To: Chris Douglas <chris.doug...@gmail.com>
Cc: Zheng, Kai <kai.zh...@intel.com>; common-dev@hadoop.apache.org
Subject: Re: Not being able to add HDFS contributors

What works for me is pasting in the JIRA userID, checking "ignore popups from 
this page" to quash the browser alerts, and then hitting the "update"
button.

What's broken is the username auto-complete; actually saving works fine.

On Tue, Jul 19, 2016 at 5:08 PM, Chris Douglas <chris.doug...@gmail.com>
wrote:

> I had the same problem. Infra was able to add them, but I kept getting 
> an error. -C
>
> On Tue, Jul 19, 2016 at 2:29 PM, Zheng, Kai <kai.zh...@intel.com> wrote:
> > Hi,
> >
> > I tried many times in the week at different time but just found it's not
> > possible to add more HDFS contributors. I can add some Hadoop ones, though.
> > It becomes an issue because without adding someone and assigning issues to
> > him first, he won't be able to work on it and upload patches ...
> >
> > Could anyone help look at this? Thx a lot!
> >
> > Regards,
> > Kai
> >
>
>
>

Not being able to add HDFS contributors

2016-07-19 Thread Zheng, Kai

Hi,

I tried many times this week, at different times, but found it's not 
possible to add more HDFS contributors. I can add some Hadoop ones, though. 
This becomes an issue because unless someone is added and issues are assigned 
to them first, they won't be able to work on the issues and upload patches ...

Could anyone help look at this? Thx a lot!

Regards,
Kai

Clean up target/fix versions

2016-07-18 Thread Zheng, Kai

Hi,

I noticed it's pretty hard to pick a version (say 3.0-alpha1) in the fix/target 
version box in the JIRA system since the list is pretty long and not well 
sorted. Could we clean it up, or re-sort/re-list the versions so that the ones 
most likely to be used are displayed first?
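
As a rough illustration of what "well sorted" could mean for such a list, the sketch below orders version strings newest first, with pre-release tags such as "-alpha1" placed below the matching final release. It assumes versions look like dotted numbers with an optional "-tag" suffix; the sample list is made up and this is not how JIRA itself orders versions.

```python
def version_key(version):
    """Sort key: numeric parts first, then pre-releases before finals."""
    base, _, tag = version.partition("-")
    numbers = tuple(int(part) for part in base.split("."))
    # A version without a tag is a final release and sorts after
    # any pre-release of the same number.
    tag_rank = (0, tag) if tag else (1, "")
    return (numbers, tag_rank)


def newest_first(versions):
    """Return the versions ordered from newest to oldest."""
    return sorted(versions, key=version_key, reverse=True)


if __name__ == "__main__":
    sample = ["2.7.3", "3.0.0-alpha1", "2.8.0", "3.0.0", "2.10.0", "3.0.0-beta1"]
    print(newest_first(sample))
```

Comparing the numeric parts as integers matters here: a plain string sort would put 2.10.0 before 2.8.0.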

Regards,
Kai

RE: Different JIRA permissions for HADOOP and HDFS

2016-06-20 Thread Zheng, Kai

Yeah, this would be great, so some guys like me won't need to trouble you 
asking the question again and again :). Thanks a lot.

Regards,
Kai

-Original Message-
From: Akira AJISAKA [mailto:ajisa...@oss.nttdata.co.jp] 
Sent: Monday, June 20, 2016 3:17 PM
To: Zheng, Kai <kai.zh...@intel.com>; common-dev@hadoop.apache.org
Subject: Re: Different JIRA permissions for HADOOP and HDFS

There is no doc.

1. Login to ASF JIRA
2. Go to the project page (e.g. 
   https://issues.apache.org/jira/browse/HADOOP )
3. Hit "Administration" tab
4. Hit "Roles" tab in left side
5. Add administrators/committers/contributors

I'll document this in https://wiki.apache.org/hadoop/HowToCommit

Regards,
Akira

On 6/20/16 16:08, Zheng, Kai wrote:
> Thanks Akira for the nice info. So where is the link to do it or any how to 
> doc? Sorry I browsed the existing wiki doc but didn't find how to add 
> contributors.
>
> Regards,
> Kai
>
> -Original Message-
> From: Akira AJISAKA [mailto:ajisa...@oss.nttdata.co.jp]
> Sent: Monday, June 20, 2016 12:22 PM
> To: Zheng, Kai <kai.zh...@intel.com>; common-dev@hadoop.apache.org
> Subject: Re: Different JIRA permissions for HADOOP and HDFS
>
> Yes, the role allows committers to add/remove all the roles.
>
> Now about 400 accounts have contributors roles in Hadoop common, and about 
> 1000 contributors in history.
>
> Regards,
> Akira
>
> On 6/19/16 19:43, Zheng, Kai wrote:
>> Thanks Akira for the work.
>>
>> What the committer role can do in addition to the committing codes? Can the 
>> role allow to add/remove a contributor? As I said in my last email, I want 
>> to have some contributor(s) back and may add more in some time later.
>>
>> Not sure if we need to clean up long time no active contributors. It may be 
>> nice to know how many contributors the project has in its history. If the 
>> list is too long, maybe we can put them in another list, like 
>> OLD_CONTRIBUTORS.
>>
>> Regards,
>> Kai
>>
>> -Original Message-
>> From: Akira AJISAKA [mailto:ajisa...@oss.nttdata.co.jp]
>> Sent: Saturday, June 18, 2016 12:56 PM
>> To: Zheng, Kai <kai.zh...@intel.com>; common-dev@hadoop.apache.org
>> Subject: Re: Different JIRA permissions for HADOOP and HDFS
>>
>> I'm doing the following steps to reduce the number of contributors:
>>
>> 1. Find committers who have only contributor role
>> 2. Add them into committer role
>> 3. Remove them from contributor role
>>
>> However, this is a temporary solution.
>> Probably we need to do one of the followings in the near future.
>>
>> * Create contributor2 role to increase the limit
>> * Remove contributors who have not been active for a long time
>>
>> Regards,
>> Akira
>>
>> On 6/18/16 10:24, Zheng, Kai wrote:
>>> Hi Akira,
>>>
>>> Some contributors (not committer) I know were found lost and we can't 
>>> assign tasks to. Any way I can add them or have to trouble others for that 
>>> each time when there is a new one? Thanks!
>>>
>>> Regards,
>>> Kai
>>>
>>> -Original Message-
>>> From: Akira AJISAKA [mailto:ajisa...@oss.nttdata.co.jp]
>>> Sent: Monday, June 06, 2016 12:47 AM
>>> To: common-dev@hadoop.apache.org
>>> Subject: Re: Different JIRA permissions for HADOOP and HDFS
>>>
>>> Now I can't add any more contributors in HADOOP Common, so I'll remove the 
>>> contributors who have committers role to make the group smaller.
>>> Please tell me if you have lost your roles by mistake.
>>>
>>> Regards,
>>> Akira
>>>
>>> On 5/18/16 13:48, Akira AJISAKA wrote:
>>>> In HADOOP/HDFS/MAPREDUCE/YARN, I removed the administrators from 
>>>> contributor group. After that, added Varun into contributor roles.
>>>> # Ray is already added into contributor roles.
>>>>
>>>> Hi contributors/committers, please tell me if you have lost your 
>>>> roles by mistake.
>>>>
>>>>> just remove a big chunk of the committers from all the lists
>>>> In Apache Hadoop Project bylaws, "A Committer is considered 
>>>> emeritus by their own declaration or by not contributing in any 
>>>> form to the project for over six months." Therefore we can remove 
>>>> them from the list, but I'm thinking this is the last option.
>>>>
>>>> Regards,
>>>> Akira
>>>>
>>>> On 5/18/16 09:07, Allen Wittenauer wrote:
>>>>>
>>>>

RE: Different JIRA permissions for HADOOP and HDFS

2016-06-20 Thread Zheng, Kai

Thanks Akira for the nice info. So where is the link to do it or any how to 
doc? Sorry I browsed the existing wiki doc but didn't find how to add 
contributors.

Regards,
Kai

-Original Message-
From: Akira AJISAKA [mailto:ajisa...@oss.nttdata.co.jp] 
Sent: Monday, June 20, 2016 12:22 PM
To: Zheng, Kai <kai.zh...@intel.com>; common-dev@hadoop.apache.org
Subject: Re: Different JIRA permissions for HADOOP and HDFS

Yes, the role allows committers to add/remove all the roles.

Now about 400 accounts have contributors roles in Hadoop common, and about 1000 
contributors in history.

Regards,
Akira

On 6/19/16 19:43, Zheng, Kai wrote:
> Thanks Akira for the work.
>
> What the committer role can do in addition to the committing codes? Can the 
> role allow to add/remove a contributor? As I said in my last email, I want to 
> have some contributor(s) back and may add more in some time later.
>
> Not sure if we need to clean up long time no active contributors. It may be 
> nice to know how many contributors the project has in its history. If the 
> list is too long, maybe we can put them in another list, like 
> OLD_CONTRIBUTORS.
>
> Regards,
> Kai
>
> -Original Message-
> From: Akira AJISAKA [mailto:ajisa...@oss.nttdata.co.jp]
> Sent: Saturday, June 18, 2016 12:56 PM
> To: Zheng, Kai <kai.zh...@intel.com>; common-dev@hadoop.apache.org
> Subject: Re: Different JIRA permissions for HADOOP and HDFS
>
> I'm doing the following steps to reduce the number of contributors:
>
> 1. Find committers who have only contributor role
> 2. Add them into committer role
> 3. Remove them from contributor role
>
> However, this is a temporary solution.
> Probably we need to do one of the followings in the near future.
>
> * Create contributor2 role to increase the limit
> * Remove contributors who have not been active for a long time
>
> Regards,
> Akira
>
> On 6/18/16 10:24, Zheng, Kai wrote:
>> Hi Akira,
>>
>> Some contributors (not committer) I know were found lost and we can't assign 
>> tasks to. Any way I can add them or have to trouble others for that each 
>> time when there is a new one? Thanks!
>>
>> Regards,
>> Kai
>>
>> -Original Message-
>> From: Akira AJISAKA [mailto:ajisa...@oss.nttdata.co.jp]
>> Sent: Monday, June 06, 2016 12:47 AM
>> To: common-dev@hadoop.apache.org
>> Subject: Re: Different JIRA permissions for HADOOP and HDFS
>>
>> Now I can't add any more contributors in HADOOP Common, so I'll remove the 
>> contributors who have committers role to make the group smaller.
>> Please tell me if you have lost your roles by mistake.
>>
>> Regards,
>> Akira
>>
>> On 5/18/16 13:48, Akira AJISAKA wrote:
>>> In HADOOP/HDFS/MAPREDUCE/YARN, I removed the administrators from 
>>> contributor group. After that, added Varun into contributor roles.
>>> # Ray is already added into contributor roles.
>>>
>>> Hi contributors/committers, please tell me if you have lost your 
>>> roles by mistake.
>>>
>>>> just remove a big chunk of the committers from all the lists
>>> In Apache Hadoop Project bylaws, "A Committer is considered emeritus 
>>> by their own declaration or by not contributing in any form to the 
>>> project for over six months." Therefore we can remove them from the 
>>> list, but I'm thinking this is the last option.
>>>
>>> Regards,
>>> Akira
>>>
>>> On 5/18/16 09:07, Allen Wittenauer wrote:
>>>>
>>>> We should probably just remove a big chunk of the committers from 
>>>> all the lists.  Most of them have disappeared from Hadoop anyway.
>>>> (The 55% growth in JIRA issues in patch available state in the past 
>>>> year alone a pretty good testament to that fact.)
>>>>
>>>>> On May 17, 2016, at 4:40 PM, Akira Ajisaka <aajis...@apache.org> wrote:
>>>>>
>>>>>> Is there some way for us to add a "Contributors2" group with the 
>>>>>> same permissions as a workaround?  Or we could try to clean out 
>>>>>> contributors who are no longer active, but that might be hard to figure 
>>>>>> out.
>>>>>
>>>>> Contributors2 seems fine. AFAIK, committers sometimes cleaned out 
>>>>> contributors who are no longer active.
>>>>> http://search-hadoop.com/m/uOzYt77s6mnzcRu1/v=threaded
>>>>>
>>>>> Another option: Can we remove committers from contributor group to 
>>>>> reduce the number of contributor

RE: A top container module like hadoop-cloud for cloud integration modules

2016-06-19 Thread Zheng, Kai

Thanks Steve for the feedback and thoughts. 

Looks like people don't want to move the related modules around, as it may not 
add much real value. That's fine. I may offer better thoughts later, once I have 
learned this area more deeply.

Regards,
Kai

-Original Message-
From: Steve Loughran [mailto:ste...@hortonworks.com] 
Sent: Wednesday, June 15, 2016 6:16 PM
To: Zheng, Kai <kai.zh...@intel.com>
Cc: common-dev@hadoop.apache.org
Subject: Re: A top container module like hadoop-cloud for cloud integration 
modules

> On 13 Jun 2016, at 14:02, Zheng, Kai <kai.zh...@intel.com> wrote:
> 
> Hi,
> 
> Noticed it's an obvious trend Hadoop is supporting more and more cloud 
> platforms, I suggest we have a top container module to hold such integration 
> modules, like the ones for aws, openstack, azure and upcoming one aliyun. The 
> rationale is simple besides the trend:

I'm kind of =0 right now

> 
> 1.   Existing modules are mixed in Hadoop-tools that becomes a little big 
> being of 18 modules now. Cloud specific ones can be grouped together and 
> separated out, making more sense;

the reason for having separate hadoop-aws, hadoop-openstack modules was always 
to permit the modules to use APIs exclusive to cloud infrastructures, structure 
the downstream dependencies, *and* allow people like the EMR team to swap in 
their own closed-source version. I don't think anyone does that though.

It also lets us completely isolate testing: each module's tests only run if you 
have the credentials.
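
The "tests only run if you have the credentials" behaviour can be sketched as a skip condition. This is an illustration in Python rather than the Hadoop test setup itself: the environment variable name and the test class are invented for the example.

```python
import os
import unittest

CREDENTIALS_VAR = "HYPOTHETICAL_CLOUD_TEST_KEY"  # illustrative name only


def have_credentials(env=None):
    """Return True when the (assumed) credentials variable is set and non-empty."""
    env = os.environ if env is None else env
    return bool(env.get(CREDENTIALS_VAR))


class CloudStoreContractTest(unittest.TestCase):
    """Stand-in for a store-specific test suite."""

    @unittest.skipUnless(have_credentials(), "no cloud credentials configured")
    def test_round_trip(self):
        # A real test would write to and read back from the object store here.
        self.assertTrue(True)


if __name__ == "__main__":
    unittest.main()
```

Developers without an account see the tests reported as skipped rather than failed, so each module's suite stays isolated from the rest of the build.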

> 
> 2.   Future abstraction and common specs & codes sharing could be easier 
> or thereafter allowed;

Right now hadoop-common is where cross FS work and tests go. (Hint, reviewers 
for HADOOP-12807 needed.). I think we could start there with 
org.apache.hadoop.cloud package and only split it out if compilation ordering 
merits it —or it adds any dependencies to hadoop-common.

> 
> 3.   Common testing approach could be defined together, for example, some 
> mechanisms as discussed by Chris, Steve and Allen in HADOOP-12756;
> 

In SPARK-7481 I've added downstream tests for S3a and azure in spark; this 
shows up that S3a in Hadoop 2.6 gets its blocksize wrong (0) in listings, so 
the splits are all 1 byte wrong; work dies. I think downstream tests in: Spark, 
Hive, etc would really round out cloud infra testing, but we can't put those 
into Hadoop as the build DAG prevents it. (Reviews for SPARK-7481 needed too, 
BTW). System tests of Aliyun and perhaps GFS connectors would need to go in 
there or in bigtop —which is the other place I've discussed having cloud 
integration tests.

> 4.   Documentation for "Hadoop on Cloud"? Not sure it's needed, as we 
> already have a section for "Hadoop compatible File Systems".

Again, we can stick this in common

> 
> If sounds good, the change would be a good fit for Hadoop 3.0, even though 
> the change should not involve big impact, as it can avoid affecting the 
> artifacts. It may cause some inconveniences for the current development 
> efforts, though.
> 

I think it would make sense if other features went in. A good committer against 
object stores would be an example here: it depends on the MR libraries, so 
can't go into common. Today it'd have to go into hadoop-mapreduce. This isn't 
too bad, as long as the APIs it uses are all in hadoop-common. It's only as 
things get more complex that it matters.


RE: Different JIRA permissions for HADOOP and HDFS

2016-06-19 Thread Zheng, Kai

Thanks Akira for the work. 

What can the committer role do in addition to committing code? Can the 
role add/remove a contributor? As I said in my last email, I want to 
have some contributor(s) back and may add more some time later.

Not sure if we need to clean up contributors who have long been inactive. It 
may be nice to know how many contributors the project has had in its history. 
If the list is too long, maybe we can put them in another list, like 
OLD_CONTRIBUTORS.

Regards,
Kai

-Original Message-
From: Akira AJISAKA [mailto:ajisa...@oss.nttdata.co.jp] 
Sent: Saturday, June 18, 2016 12:56 PM
To: Zheng, Kai <kai.zh...@intel.com>; common-dev@hadoop.apache.org
Subject: Re: Different JIRA permissions for HADOOP and HDFS

I'm doing the following steps to reduce the number of contributors:

1. Find committers who have only contributor role
2. Add them into committer role
3. Remove them from contributor role

However, this is a temporary solution.
Probably we need to do one of the following in the near future.

* Create contributor2 role to increase the limit
* Remove contributors who have not been active for a long time
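
The three clean-up steps listed above amount to a few set operations. The sketch below uses plain Python sets with made-up account names; it does not call JIRA, and the function name is hypothetical.

```python
def migrate_committers(contributors, committer_role, known_committers):
    """Move committers who only hold the contributor role into the
    committer role, returning the two updated role memberships."""
    # Step 1: committers who currently have only the contributor role.
    to_move = (known_committers & contributors) - committer_role
    # Step 2: add them to the committer role.
    new_committer_role = committer_role | to_move
    # Step 3: remove them from the contributor role.
    new_contributors = contributors - to_move
    return new_contributors, new_committer_role


if __name__ == "__main__":
    contributors = {"alice", "bob", "carol"}   # hypothetical accounts
    committer_role = {"dave"}
    known_committers = {"bob", "dave"}
    print(migrate_committers(contributors, committer_role, known_committers))
```

Each committer moved frees one slot in the contributor role, which is why this only postpones the limit rather than removing it.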

Regards,
Akira

On 6/18/16 10:24, Zheng, Kai wrote:
> Hi Akira,
>
> Some contributors (not committer) I know were found lost and we can't assign 
> tasks to. Any way I can add them or have to trouble others for that each time 
> when there is a new one? Thanks!
>
> Regards,
> Kai
>
> -Original Message-
> From: Akira AJISAKA [mailto:ajisa...@oss.nttdata.co.jp]
> Sent: Monday, June 06, 2016 12:47 AM
> To: common-dev@hadoop.apache.org
> Subject: Re: Different JIRA permissions for HADOOP and HDFS
>
> Now I can't add any more contributors in HADOOP Common, so I'll remove the 
> contributors who have committers role to make the group smaller.
> Please tell me if you have lost your roles by mistake.
>
> Regards,
> Akira
>
> On 5/18/16 13:48, Akira AJISAKA wrote:
>> In HADOOP/HDFS/MAPREDUCE/YARN, I removed the administrators from 
>> contributor group. After that, added Varun into contributor roles.
>> # Ray is already added into contributor roles.
>>
>> Hi contributors/committers, please tell me if you have lost your 
>> roles by mistake.
>>
>>> just remove a big chunk of the committers from all the lists
>> In Apache Hadoop Project bylaws, "A Committer is considered emeritus 
>> by their own declaration or by not contributing in any form to the 
>> project for over six months." Therefore we can remove them from the 
>> list, but I'm thinking this is the last option.
>>
>> Regards,
>> Akira
>>
>> On 5/18/16 09:07, Allen Wittenauer wrote:
>>>
>>> We should probably just remove a big chunk of the committers from 
>>> all the lists.  Most of them have disappeared from Hadoop anyway.  
>>> (The 55% growth in JIRA issues in patch available state in the past 
>>> year alone a pretty good testament to that fact.)
>>>
>>>> On May 17, 2016, at 4:40 PM, Akira Ajisaka <aajis...@apache.org> wrote:
>>>>
>>>>> Is there some way for us to add a "Contributors2" group with the 
>>>>> same permissions as a workaround?  Or we could try to clean out 
>>>>> contributors who are no longer active, but that might be hard to figure 
>>>>> out.
>>>>
>>>> Contributors2 seems fine. AFAIK, committers sometimes cleaned out 
>>>> contributors who are no longer active.
>>>> http://search-hadoop.com/m/uOzYt77s6mnzcRu1/v=threaded
>>>>
>>>> Another option: Can we remove committers from contributor group to 
>>>> reduce the number of contributors? I've already removed myself from 
>>>> contributor group and it works well.
>>>>
>>>> Regards,
>>>> Akira
>>>>
>>>> On 5/18/16 03:16, Robert Kanter wrote:
>>>>> We've also had a related long-standing issue (or at least I have) 
>>>>> where I can't add any more contributors to HADOOP or HDFS because 
>>>>> JIRA times out on looking up their username.  I'm guessing we have 
>>>>> too many contributors for those projects.  I bet YARN and MAPREDUCE are 
>>>>> close.
>>>>> Is there some way for us to add a "Contributors2" group with the 
>>>>> same permissions as a workaround?  Or we could try to clean out 
>>>>> contributors who are no longer active, but that might be hard to figure 
>>>>> out.
>>>>>
>>>>> - Robert
>>>>>
>>>>> On Tue, May 17, 2016

RE: Different JIRA permissions for HADOOP and HDFS

2016-06-17 Thread Zheng, Kai

 HADOOP/MAPREDUCE/YARN.
>>>>> > >
>>>>> > > Regards,
>>>>> > > Akira
>>>>> > >
>>>>> > > On 5/17/16 13:45, Vinayakumar B wrote:
>>>>> > >
>>>>> > >> Hi Junping,
>>>>> > >>
>>>>> > >> It looks like, I too dont have permissions in projects except
>>>>HDFS.
>>>>> > >>
>>>>> > >> Please grant me also to the group.
>>>>> > >>
>>>>> > >> Thanks in advance,
>>>>> > >> -Vinay
>>>>> > >> On 17 May 2016 6:10 a.m., "Sangjin Lee" <sj...@apache.org
>>>><mailto:sj...@apache.org>> wrote:
>>>>> > >>
>>>>> > >> Thanks Junping! It seems to work now.
>>>>> > >>
>>>>> > >> On Mon, May 16, 2016 at 5:22 PM, Junping Du
>>>><j...@hortonworks.com <mailto:j...@hortonworks.com>>
>>>>> > wrote:
>>>>> > >>
>>>>> > >> Someone fix the permission issue so that Administrator,
>>>>committer and
>>>>> > >>> reporter can edit the issue now.
>>>>> > >>>
>>>>> > >>> Sangjin, it sounds like you were not in JIRA's committer
>>>>list before
>>>>> > and
>>>>> > >>> I
>>>>> > >>> just add you into committer roles for 4 projects. Hope it
>>>>works for
>>>>> > >>> you now.
>>>>> > >>>
>>>>> > >>>
>>>>> > >>> Thanks,
>>>>> > >>>
>>>>> > >>>
>>>>> > >>> Junping
>>>>> > >>> --
>>>>> > >>> *From:* sjl...@gmail.com <mailto:sjl...@gmail.com>
>>>><sjl...@gmail.com <mailto:sjl...@gmail.com>> on behalf of Sangjin
>>>>> Lee <
>>>>> > >>> sj...@apache.org <mailto:sj...@apache.org>>
>>>>> > >>> *Sent:* Monday, May 16, 2016 11:43 PM
>>>>> > >>> *To:* Zhihai Xu
>>>>> > >>> *Cc:* Junping Du; Arun Suresh; Zheng, Kai; Andrew Wang;
>>>>> > >>> common-dev@hadoop.apache.org
>>>><mailto:common-dev@hadoop.apache.org>; yarn-...@hadoop.apache.org
>>>><mailto:yarn-...@hadoop.apache.org>
>>>>> > >>>
>>>>> > >>> *Subject:* Re: Different JIRA permissions for HADOOP and 
>>>> HDFS
>>>>> > >>>
>>>>> > >>> I also find myself unable to edit most of the JIRA fields,
>>>>and that
>>>>> is
>>>>> > >>> across projects (HADOOP, YARN, MAPREDUCE, and HDFS).
>>>>Commenting and
>>>>> the
>>>>> > >>> workflow buttons seem to work, however.
>>>>> > >>>
>>>>> > >>> On Mon, May 16, 2016 at 8:14 AM, Zhihai Xu <zhi...@uber.com
>>>><mailto:zhi...@uber.com>> wrote:
>>>>> > >>>
>>>>> > >>> Great, Thanks Junping! Yes, the JIRA assignment works for me
>>>>now.
>>>>> > >>>>
>>>>> > >>>> zhihai
>>>>> > >>>>
>>>>> > >>>> On Mon, May 16, 2016 at 5:29 AM, Junping Du
>>>><j...@hortonworks.com <mailto:j...@hortonworks.com>>
>>>>> > >>>> wrote:
>>>>> > >>>>
>>>>> > >>>> Zhihai, I just set you with committer permissions on
>>>>MAPREDUCE JIRA.
>>>>> > >>>>>
>>>>> > >>>> Would
>>>>> > >>>>
>>>>> > >>>>> you try if the JIRA assignment works now? I cannot help on
>>>>Hive
>>>>> > >

A top container module like hadoop-cloud for cloud integration modules

2016-06-13 Thread Zheng, Kai

Hi,

Given the obvious trend that Hadoop is supporting more and more cloud 
platforms, I suggest we have a top-level container module to hold such 
integration modules, like the ones for aws, openstack, azure and the upcoming 
one for aliyun. The rationale, besides the trend, is simple:

1.   Existing modules are mixed into hadoop-tools, which has become a little 
big at 18 modules now. Cloud-specific ones can be grouped together and 
separated out, which makes more sense;

2.   Future abstraction and sharing of common specs and code could become 
easier, or at least possible;

3.   A common testing approach could be defined together, for example, some 
mechanisms as discussed by Chris, Steve and Allen in HADOOP-12756;

4.   Documentation for "Hadoop on Cloud"? Not sure it's needed, as we 
already have a section for "Hadoop compatible File Systems".

If this sounds good, the change would be a good fit for Hadoop 3.0, even though 
it should not have a big impact, as it can avoid affecting the artifacts. 
It may cause some inconvenience for the current development efforts, though.

Comments are welcome. Thanks!

Regards,
Kai

RE: About fix versions

2016-05-29 Thread Zheng, Kai

Thanks a lot, Allen. This is comprehensive. 

>> So if a patch has been committed to branch-2.8, branch-2, and trunk, then 
>> the fix version should be 2.8.0 and only 2.8.0.  
This sounds like exactly the rule I needed to know, though I guess it may 
change around the 3.0 release.

Regards,
Kai

-----Original Message-----
From: Allen Wittenauer [mailto:allenwittena...@yahoo.com] 
Sent: Sunday, May 29, 2016 12:17 PM
To: Zheng, Kai <kai.zh...@intel.com>
Cc: common-dev@hadoop.apache.org
Subject: Re: About fix versions

> On May 28, 2016, at 5:13 PM, Zheng, Kai <kai.zh...@intel.com> wrote:
> 
> Hi,
> 
> This may be a stupid question but I want to make sure. What fix versions 
> would we fill with when a committer just wants to commit a patch to trunk or 
> branch-2 branch?

This is covered on the https://wiki.apache.org/hadoop/HowToCommit page:

== snip ==
Resolve the issue as fixed, thanking the contributor. Always set the "Fix 
Version" at this point, but please only set a single fix version, the earliest 
release in which the change will appear. Special case- when committing to a 
non-mainline branch (such as branch-0.22 or branch-0.23 ATM), please set 
fix-version to either 2.x.x or 3.x.x appropriately too.
== snip ==

So if a patch has been committed to branch-2.8, branch-2, and trunk, 
then the fix version should be 2.8.0 and only 2.8.0.  

Bear in mind that this field determines what changes appear in the 
changelog and release notes.  If this field is filled out incorrectly, then 
this commit will effectively be missing for end users or appear in the wrong 
version as ‘new’.

>  I remembered it's a release manager's role to decide which jira/patch to 
> include when working on a release. Would anyone help clarify a bit about 
> this, thanks!

This is when cutting the release and has no bearing on what committers 
should be putting in JIRA.  If an RM decides that a commit shouldn’t be in a 
release, they are responsible for reverting the commit and changing the fix 
version to whatever is appropriate.
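
The "single, earliest fix version" rule above can be sketched in Java. This is only an illustration: the class and method names are hypothetical, and the mapping of branch-2 to 2.9.0 and trunk to 3.0.0-alpha1 is an assumption, not something stated in the thread. Given the next release of each branch a patch landed on, the fix version is the smallest of them.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Sketch of the "single, earliest fix version" rule; version strings are illustrative. */
class FixVersionSketch {

  /** Parses the numeric "major.minor.patch" part, dropping any "-alpha1" style suffix. */
  static int[] numericParts(String version) {
    String[] fields = version.split("-", 2)[0].split("\\.");
    int[] parts = new int[3];
    for (int i = 0; i < parts.length && i < fields.length; i++) {
      parts[i] = Integer.parseInt(fields[i]);
    }
    return parts;
  }

  /** Compares two versions by major, then minor, then patch number. */
  static int compareReleases(String a, String b) {
    int[] x = numericParts(a);
    int[] y = numericParts(b);
    for (int i = 0; i < 3; i++) {
      if (x[i] != y[i]) {
        return Integer.compare(x[i], y[i]);
      }
    }
    return 0;
  }

  /** Returns the one version to put in the JIRA "Fix Version" field. */
  static String earliestFixVersion(List<String> candidateVersions) {
    return Collections.min(candidateVersions, FixVersionSketch::compareReleases);
  }

  public static void main(String[] args) {
    // Assumed next releases for trunk, branch-2 and branch-2.8 respectively.
    List<String> versions = Arrays.asList("3.0.0-alpha1", "2.9.0", "2.8.0");
    System.out.println(earliestFixVersion(versions)); // prints 2.8.0
  }
}
```

Later releases are deliberately left out of the result, which matches the point that the field drives the changelog and release notes.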

About fix versions

2016-05-28 Thread Zheng, Kai

Hi,

This may be a stupid question but I want to make sure. Which fix versions 
should a committer fill in when committing a patch to just trunk or the 
branch-2 branch? As I remember, it's the release manager's role to decide which 
jira/patch to include when working on a release. Could anyone help clarify 
this a bit? Thanks!

Regards,
Kai

RE: Release numbering for 3.x leading up to GA

2016-05-24 Thread Zheng, Kai

Thanks for thinking about this, Andrew and Zhe. I updated the patch today for 
HADOOP-13010 (rather large) and it would be great if we could get it in this 
week. 

Regards,
Kai

-----Original Message-----
From: Zhe Zhang [mailto:zhe.zhang.resea...@gmail.com] 
Sent: Friday, May 20, 2016 2:38 PM
To: Andrew Wang 
Cc: Gangumalla, Uma; Roman Shaposhnik; Karthik Kambatla; 
common-dev@hadoop.apache.org
Subject: Re: Release numbering for 3.x leading up to GA

Thanks Andrew. I also had a talk with Kai offline. Agreed that we should try 
our best to finalize coder config changes for alpha1.

On Tue, May 17, 2016 at 5:34 PM Andrew Wang 
wrote:

> The sooner the better for incompatible changes, but at this point we 
> are explicitly not guaranteeing any compatibility between alpha releases.
>
> For EC, my understanding is that we're still working on the coder 
> configuration. Given that we're still working on L changes, I think 
> it's possible that the coder configuration will be finished in time.
>
> On Tue, May 17, 2016 at 4:42 PM, Zhe Zhang 
> 
> wrote:
>
> > Naming convention looks good. Thanks Andrew for driving this!
> >
> > Could you explain a little more on the criteria of cutting alpha1 / 
> > alpha2? What are the goals we want to achieve for alpha1? From EC's 
> > perspective, maybe we should target on having all 
> > compatibility-related changes in alpha1, like new config keys and fsimage 
> > format?
> >
> > Thanks,
> >
> > On Thu, May 12, 2016 at 11:35 AM Andrew Wang 
> > 
> > wrote:
> >
> >> Hi folks,
> >>
> >> I think it's working, though it takes some time for the rename to 
> >> propagate in JIRA. JIRA is also currently being hammered by 
> >> spammers, which might
> be
> >> related.
> >>
> >> Anyway, the new "3.0.0-alpha1" version should be live for all four 
> >> subprojects, so have at it!
> >>
> >> Best,
> >> Andrew
> >>
> >> On Thu, May 12, 2016 at 11:01 AM, Gangumalla, Uma < 
> >> uma.ganguma...@intel.com>
> >> wrote:
> >>
> >> > Thanks Andrew for driving. Sounds good. Go ahead please.
> >> >
> >> > Good luck :-)
> >> >
> >> > Regards,
> >> > Uma
> >> >
> >> > On 5/12/16, 10:52 AM, "Andrew Wang"  wrote:
> >> >
> >> > >Hi all,
> >> > >
> >> > >Sounds like we have general agreement on this release numbering
> scheme
> >> for
> >> > >3.x.
> >> > >
> >> > >I'm going to attempt some mvn and JIRA invocations to get the 
> >> > >version numbers lined up for alpha1, wish me luck.
> >> > >
> >> > >Best,
> >> > >Andrew
> >> > >
> >> > >On Tue, May 3, 2016 at 9:52 AM, Roman Shaposhnik <
> ro...@shaposhnik.org
> >> >
> >> > >wrote:
> >> > >
> >> > >> On Tue, May 3, 2016 at 8:18 AM, Karthik Kambatla <
> ka...@cloudera.com
> >> >
> >> > >> wrote:
> >> > >> > The naming scheme sounds good. Since we want to start out 
> >> > >> > sooner,
> >> I am
> >> > >> > assuming we are not limiting ourselves to two alphas as the 
> >> > >> > email
> >> > >>might
> >> > >> > indicate.
> >> > >> >
> >> > >> > Also, as the release manager, can you elaborate on your
> >> definitions of
> >> > >> > alpha and beta? Specifically, when do we expect downstream
> >> projects to
> >> > >> try
> >> > >> > and integrate and when we expect Hadoop users to try out the
> bits?
> >> > >>
> >> > >> Not to speak of all the downstream PMCs, but Bigtop project 
> >> > >> will
> jump
> >> > >> on the first alpha the same way we jumped on the first alpha 
> >> > >> back in the 1 -> 2 transition period.
> >> > >>
> >> > >> Given that Bigtop currently integrates quite a bit of Hadoop
> >> ecosystem
> >> > >> that work is going to produce valuable feedback that we plan 
> >> > >>to communicate  to the individual PMCs. What PMCs do with that 
> >> > >>feedback, of course,
> >> will
> >> > >> be up to them (obviously Bigtop can't take the ownership of 
> >> > >> issues
> >> that
> >> > >> go outside of integration work between projects in the Hadoop
> >> ecosystem).
> >> > >>
> >> > >> Thanks,
> >> > >> Roman.
> >> > >>
> >> >
> >> >
> >> > -
> >> >  To unsubscribe, e-mail: 
> >> > common-dev-unsubscr...@hadoop.apache.org
> >> > For additional commands, e-mail: 
> >> > common-dev-h...@hadoop.apache.org
> >> >
> >> >
> >>
> > --
> > Zhe Zhang
> > Apache Hadoop Committer
> > http://zhe-thoughts.github.io/about/ | @oldcap
> >
>
--
Zhe Zhang
Apache Hadoop Committer
http://zhe-thoughts.github.io/about/ | @oldcap

RE: Different JIRA permissions for HADOOP and HDFS

2016-05-16 Thread Zheng, Kai

It works for me now, thanks Andrew!

Regards,
Kai

-----Original Message-----
From: Andrew Wang [mailto:andrew.w...@cloudera.com] 
Sent: Monday, May 16, 2016 12:14 AM
To: Zheng, Kai <kai.zh...@intel.com>
Cc: common-dev@hadoop.apache.org
Subject: Re: Different JIRA permissions for HADOOP and HDFS

I just gave you committer permissions on JIRA, try now?

On Mon, May 16, 2016 at 12:03 AM, Zheng, Kai <kai.zh...@intel.com> wrote:

> I just ran into the bad situation that I committed HDFS-8449 but can't 
> resolve the issue due to lacking the required permission to me. Am not 
> sure if it's caused by my setup or environment change (temporally 
> working in a new time zone). Would anyone help resolve the issue for 
> me to avoid bad state? Thanks!
>
> -----Original Message-
> From: Zheng, Kai [mailto:kai.zh...@intel.com]
> Sent: Sunday, May 15, 2016 3:20 PM
> To: Allen Wittenauer <allenwittena...@yahoo.com.INVALID>
> Cc: common-dev@hadoop.apache.org
> Subject: RE: Different JIRA permissions for HADOOP and HDFS
>
> Thanks Allen for illustrating this in details. I understand. The left 
> question is, is it intended only JIRA owner (not sure about admin 
> users) can do the operations like updating a patch?
>
> Regards,
> Kai
>
> -----Original Message-----
> From: Allen Wittenauer [mailto:allenwittena...@yahoo.com.INVALID]
> Sent: Saturday, May 14, 2016 9:38 AM
> To: Zheng, Kai <kai.zh...@intel.com>
> Cc: common-dev@hadoop.apache.org
> Subject: Re: Different JIRA permissions for HADOOP and HDFS
>
>
> > On May 14, 2016, at 7:07 AM, Zheng, Kai <kai.zh...@intel.com> wrote:
> >
> > Hi,
> >
> > Noticed this difference but not sure if it’s intended. YARN is 
> > similar
> with HDFS. It’s not convenient. Any clarifying?
>
>
> Under JIRA, different projects (e.g., HADOOP, YARN, MAPREDUCE, 
> HDFS, YETUS, HBASE, ACCUMULO, etc) may have different settings.  At 
> one point in time, all of the Hadoop subprojects were under one JIRA 
> project (HADOOP). But then a bunch of folks decided they didn’t want 
> to see the other sub projects issues so they split them up…. and thus 
> setting the stage for duplicate code and operational divergence in the source.
>
> Since people don’t realize or care that they are separate, 
> people will file INFRA tickets or whatever to change “their project” 
> and not the rest. This leads to the JIRA projects also diverging… 
> which ultimately drives those of us who actually look at the project as a 
> whole bonkers.

RE: Different JIRA permissions for HADOOP and HDFS

2016-05-16 Thread Zheng, Kai

I just ran into a bad situation: I committed HDFS-8449 but can't resolve 
the issue because I lack the required permission. I'm not sure if it's 
caused by my setup or an environment change (I'm temporarily working in a new 
time zone). Could anyone help resolve the issue for me, to avoid leaving it in 
a bad state? Thanks!

-----Original Message-----
From: Zheng, Kai [mailto:kai.zh...@intel.com] 
Sent: Sunday, May 15, 2016 3:20 PM
To: Allen Wittenauer <allenwittena...@yahoo.com.INVALID>
Cc: common-dev@hadoop.apache.org
Subject: RE: Different JIRA permissions for HADOOP and HDFS

Thanks Allen for illustrating this in details. I understand. The left question 
is, is it intended only JIRA owner (not sure about admin users) can do the 
operations like updating a patch?

Regards,
Kai

-----Original Message-----
From: Allen Wittenauer [mailto:allenwittena...@yahoo.com.INVALID] 
Sent: Saturday, May 14, 2016 9:38 AM
To: Zheng, Kai <kai.zh...@intel.com>
Cc: common-dev@hadoop.apache.org
Subject: Re: Different JIRA permissions for HADOOP and HDFS

> On May 14, 2016, at 7:07 AM, Zheng, Kai <kai.zh...@intel.com> wrote:
> 
> Hi,
>  
> Noticed this difference but not sure if it’s intended. YARN is similar with 
> HDFS. It’s not convenient. Any clarifying?

Under JIRA, different projects (e.g., HADOOP, YARN, MAPREDUCE, HDFS, 
YETUS, HBASE, ACCUMULO, etc) may have different settings.  At one point in 
time, all of the Hadoop subprojects were under one JIRA project (HADOOP). But 
then a bunch of folks decided they didn’t want to see the other sub projects 
issues so they split them up…. and thus setting the stage for duplicate code 
and operational divergence in the source. 

Since people don’t realize or care that they are separate, people will 
file INFRA tickets or whatever to change “their project” and not the rest. This 
leads to the JIRA projects also diverging… which ultimately drives those of us 
who actually look at the project as a whole bonkers.

RE: Different JIRA permissions for HADOOP and HDFS

2016-05-15 Thread Zheng, Kai

Thanks Allen for explaining this in detail. I understand. The remaining 
question is: is it intended that only the JIRA owner (not sure about admin 
users) can perform operations like updating a patch?

Regards,
Kai

-----Original Message-----
From: Allen Wittenauer [mailto:allenwittena...@yahoo.com.INVALID] 
Sent: Saturday, May 14, 2016 9:38 AM
To: Zheng, Kai <kai.zh...@intel.com>
Cc: common-dev@hadoop.apache.org
Subject: Re: Different JIRA permissions for HADOOP and HDFS

> On May 14, 2016, at 7:07 AM, Zheng, Kai <kai.zh...@intel.com> wrote:
> 
> Hi,
>  
> Noticed this difference but not sure if it’s intended. YARN is similar with 
> HDFS. It’s not convenient. Any clarifying?

Under JIRA, different projects (e.g., HADOOP, YARN, MAPREDUCE, HDFS, 
YETUS, HBASE, ACCUMULO, etc) may have different settings.  At one point in 
time, all of the Hadoop subprojects were under one JIRA project (HADOOP). But 
then a bunch of folks decided they didn’t want to see the other sub projects 
issues so they split them up…. and thus setting the stage for duplicate code 
and operational divergence in the source. 

Since people don’t realize or care that they are separate, people will 
file INFRA tickets or whatever to change “their project” and not the rest. This 
leads to the JIRA projects also diverging… which ultimately drives those of us 
who actually look at the project as a whole bonkers.

RE: Different JIRA permissions for HADOOP and HDFS

2016-05-14 Thread Zheng, Kai

Yeah, kind of embarrassing. Thanks Ted.

Or more simply: when logged in and viewing HADOOP and HDFS JIRAs like the ones 
below, note that the allowed operations are quite different, and the most 
useful operations, like attaching files, are not shown for HDFS JIRAs. I wonder 
whether they were disabled recently for some reason?
https://issues.apache.org/jira/browse/HADOOP-12782
https://issues.apache.org/jira/browse/HDFS-10285

Regards,
Kai

From: Ted Yu [mailto:yuzhih...@gmail.com]
Sent: Saturday, May 14, 2016 7:28 AM
To: Zheng, Kai <kai.zh...@intel.com>
Cc: common-dev@hadoop.apache.org
Subject: Re: Different JIRA permissions for HADOOP and HDFS

Looks like you attached some images which didn't go through.

Consider using 3rd party image site.

Cheers

On Sat, May 14, 2016 at 7:07 AM, Zheng, Kai 
<kai.zh...@intel.com<mailto:kai.zh...@intel.com>> wrote:
Hi,

Noticed this difference but not sure if it’s intended. YARN is similar with 
HDFS. It’s not convenient. Any clarifying? Thanks. -kai

Different JIRA permissions for HADOOP and HDFS

2016-05-14 Thread Zheng, Kai

Hi,

I noticed this difference but am not sure if it's intended. YARN is similar to 
HDFS. It's not convenient. Could anyone clarify? Thanks. -kai

[Two inline image attachments (image001.png, image002.png) were not preserved 
in the archive.]

RE: Release numbering for 3.x leading up to GA

2016-05-02 Thread Zheng, Kai

Ok, got it. Thanks for the explanation.

Regards,
Kai

From: Andrew Wang [mailto:andrew.w...@cloudera.com]
Sent: Tuesday, May 03, 2016 5:41 AM
To: Zheng, Kai <kai.zh...@intel.com>
Cc: common-dev@hadoop.apache.org
Subject: Re: Release numbering for 3.x leading up to GA

>> but I'm going to spend time on our first RC this week.
> Sorry what does this mean? Did you mean the first RC version or 3.0.0-alpha1 
> will be cut out this week?
> Anyway will try to get some tasks done sooner.

First RC for whatever we name the first 3.0 alpha release.

There's no need to rush things to make this first alpha, since there are more 
alphas planned.

That said, if you have changes that affect compatibility, the sooner the better 
:)

RE: Release numbering for 3.x leading up to GA

2016-05-02 Thread Zheng, Kai

Thanks for driving this, Andrew. Sounds great.

>> but I'm going to spend time on our first RC this week.
Sorry, what does this mean? Do you mean the first RC version, or 3.0.0-alpha1, 
will be cut this week?
Anyway, I will try to get some tasks done sooner.

Regards,
Kai

-----Original Message-----
From: Andrew Wang [mailto:andrew.w...@cloudera.com] 
Sent: Tuesday, May 03, 2016 5:07 AM
To: Roman Shaposhnik 
Cc: common-dev@hadoop.apache.org
Subject: Re: Release numbering for 3.x leading up to GA

On Mon, May 2, 2016 at 2:00 PM, Roman Shaposhnik 
wrote:

> On Mon, May 2, 2016 at 1:50 PM, Andrew Wang 
> wrote:
> > Hi all,
> >
> > I wanted to confirm my version numbering plan for Hadoop 3.x. We had 
> > a related thread on this topic about a year ago, mostly focusing on 
> > the
> > branch-2 maintenance releases:
> >
> >
> http://mail-archives.apache.org/mod_mbox/hadoop-common-dev/201504.mbox
> /%3CFAA30A53-C2BD-4380-9245-C8DBEC7BF386%40hortonworks.com%3E
> >
> > For Hadoop 3, I wanted to do something like scheme (2) in Vinod's
> original
> > email from the above thread. e.g. leading up to GA, we'd have:
> >
> > 3.0.0-alpha1
> > 3.0.0-alpha2
> > 3.0.0-beta1
> > 3.0.0
>
> +1 on the naming scheme. Also (and I know this is an impossible
> question to answer, but still)
> what are the rough timing expectations on these?
>

Thanks Roman. Regarding release timing, this is what we discussed on
another thread:

> For exit criteria, how about we time box it? My plan was to do monthly
> alphas through the summer, leading up to beta in late August / early Sep.
> At that point we freeze and stabilize for GA in Nov/Dec.

As you say, release plans need to be flexible, but I'm going to spend time on 
our first RC this week.
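
The intended ordering of the proposed version names (alphas, then betas, then GA) can be sketched in Java. This is a minimal illustration with hypothetical class and method names; it is not how Maven or JIRA order versions internally, and it only understands the "x.y.z", "x.y.z-alphaN" and "x.y.z-betaN" forms used in this thread.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Sketch: order 3.x release names so that alphaN < betaN < GA for the same x.y.z. */
class ReleaseOrderSketch {

  private static final Pattern VERSION =
      Pattern.compile("(\\d+)\\.(\\d+)\\.(\\d+)(?:-(alpha|beta)(\\d+))?");

  /** Builds a sort key: major, minor, patch, stage (alpha=0, beta=1, GA=2), stage number. */
  static int[] sortKey(String version) {
    Matcher m = VERSION.matcher(version);
    if (!m.matches()) {
      throw new IllegalArgumentException("Unrecognized version: " + version);
    }
    int stage = 2; // no qualifier means the GA release
    int stageNumber = 0;
    if (m.group(4) != null) {
      stage = m.group(4).equals("alpha") ? 0 : 1;
      stageNumber = Integer.parseInt(m.group(5));
    }
    return new int[] {
      Integer.parseInt(m.group(1)),
      Integer.parseInt(m.group(2)),
      Integer.parseInt(m.group(3)),
      stage,
      stageNumber
    };
  }

  /** Compares two versions field by field using their sort keys. */
  static int compare(String a, String b) {
    int[] x = sortKey(a);
    int[] y = sortKey(b);
    for (int i = 0; i < x.length; i++) {
      if (x[i] != y[i]) {
        return Integer.compare(x[i], y[i]);
      }
    }
    return 0;
  }

  /** Returns a sorted copy, earliest release first. */
  static List<String> sorted(List<String> versions) {
    List<String> copy = new ArrayList<>(versions);
    copy.sort(ReleaseOrderSketch::compare);
    return copy;
  }

  public static void main(String[] args) {
    List<String> names =
        Arrays.asList("3.0.0", "3.0.0-beta1", "3.0.0-alpha2", "3.0.0-alpha1");
    // prints [3.0.0-alpha1, 3.0.0-alpha2, 3.0.0-beta1, 3.0.0]
    System.out.println(sorted(names));
  }
}
```

Because the alpha count is just a number in the key, the sketch does not cap the number of alphas, which fits the time-boxed monthly plan.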

RE: hadoop 2.7.2 build failure with error "plugin descriptor"

2016-03-29 Thread Zheng, Kai

It looks like a Maven plugin is missing. It may help to run `mvn install` 
before building.

-----Original Message-----
From: ? ? [mailto:yu20...@hotmail.com] 
Sent: Tuesday, March 29, 2016 3:39 PM
To: common-dev@hadoop.apache.org
Subject: hadoop 2.7.2 build failure with error "plugin descriptor"

Hi guys,


When trying to build hadoop 2.7.2 on Ubuntu 5.10, I ran into the following problem.

[INFO] 
[INFO] Reactor Summary:
[INFO]
[INFO] Apache Hadoop Main . SUCCESS [  0.281 s] 
[INFO] Apache Hadoop Project POM .. SUCCESS [  0.422 s] 
[INFO] Apache Hadoop Annotations .. SUCCESS [  0.388 s] 
[INFO] Apache Hadoop Project Dist POM . SUCCESS [  0.056 s] 
[INFO] Apache Hadoop Assemblies ... SUCCESS [  0.078 s] 
[INFO] Apache Hadoop Maven Plugins  SUCCESS [  0.179 s] 
[INFO] Apache Hadoop MiniKDC .. SUCCESS [  0.388 s] 
[INFO] Apache Hadoop Auth . SUCCESS [  0.150 s] 
[INFO] Apache Hadoop Auth Examples  SUCCESS [  0.074 s] 
[INFO] Apache Hadoop Common ... FAILURE [  0.002 s] 
[INFO] Apache Hadoop NFS .. SKIPPED
[INFO] Apache Hadoop KMS .. SKIPPED
[INFO] Apache Hadoop Common Project ... SKIPPED
[INFO] Apache Hadoop HDFS . SKIPPED
[INFO] Apache Hadoop HttpFS ... SKIPPED
[INFO] Apache Hadoop HDFS BookKeeper Journal .. SKIPPED
[INFO] Apache Hadoop HDFS-NFS . SKIPPED
[INFO] Apache Hadoop HDFS Project . SKIPPED



[INFO] 
[INFO] BUILD FAILURE
[INFO] 
[INFO] Total time: 4.249 s
[INFO] Finished at: 2016-03-29T15:35:05+08:00
[INFO] Final Memory: 39M/215M
[INFO] 
[ERROR] Failed to parse plugin descriptor for 
org.apache.hadoop:hadoop-maven-plugins:2.7.2 
(/home/opensrc/hadoop-2.7.2-src/hadoop-maven-plugins/target/classes): No plugin 
descriptor found at META-INF/maven/plugin.xml -> [Help 1]
org.apache.maven.plugin.PluginDescriptorParsingException: Failed to parse 
plugin descriptor for org.apache.hadoop:hadoop-maven-plugins:2.7.2 
(/home/opensrc/hadoop-2.7.2-src/hadoop-maven-plugins/target/classes): No plugin 
descriptor found at META-INF/maven/plugin.xml
at org.apache.maven.plugin.internal.DefaultMavenPluginManager.extractPluginDescriptor(DefaultMavenPluginManager.java:249)
at org.apache.maven.plugin.internal.DefaultMavenPluginManager.getPluginDescriptor(DefaultMavenPluginManager.java:184)
at org.apache.maven.plugin.internal.DefaultMavenPluginManager.getMojoDescriptor(DefaultMavenPluginManager.java:298)
at org.apache.maven.plugin.DefaultBuildPluginManager.getMojoDescriptor(DefaultBuildPluginManager.java:241)
at org.apache.maven.lifecycle.internal.DefaultLifecycleExecutionPlanCalculator.setupMojoExecution(DefaultLifecycleExecutionPlanCalculator.java:169)
at org.apache.maven.lifecycle.internal.DefaultLifecycleExecutionPlanCalculator.setupMojoExecutions(DefaultLifecycleExecutionPlanCalculator.java:155)
at org.apache.maven.lifecycle.internal.DefaultLifecycleExecutionPlanCalculator.calculateExecutionPlan(DefaultLifecycleExecutionPlanCalculator.java:131)
at org.apache.maven.lifecycle.internal.DefaultLifecycleExecutionPlanCalculator.calculateExecutionPlan(DefaultLifecycleExecutionPlanCalculator.java:145)
at org.apache.maven.lifecycle.internal.builder.BuilderCommon.resolveBuildPlan(BuilderCommon.java:96)
at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:109)
at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:80)
at org.apache.maven.lifecycle.internal.builder.singlethreaded.SingleThreadedBuilder.build(SingleThreadedBuilder.java:51)
at org.apache.maven.lifecycle.internal.LifecycleStarter.execute(LifecycleStarter.java:128)
at org.apache.maven.DefaultMaven.doExecute(DefaultMaven.java:307)
at org.apache.maven.DefaultMaven.doExecute(DefaultMaven.java:193)
at org.apache.maven.DefaultMaven.execute(DefaultMaven.java:106)
at org.apache.maven.cli.MavenCli.execute(MavenCli.java:862)
at org.apache.maven.cli.MavenCli.doMain(MavenCli.java:286)
at org.apache.maven.cli.MavenCli.main(MavenCli.java:197)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at

RE: [VOTE] Accept Chimera as new Apache Commons Component

2016-03-23 Thread Zheng, Kai

Nice proposal. It would be good to make the high-performance crypto offering 
broadly accessible. Thanks.

Non-binding +1.

Regards,
Kai

-----Original Message-----
From: Chris Nauroth [mailto:cnaur...@hortonworks.com] 
Sent: Wednesday, March 23, 2016 5:47 AM
To: common-dev@hadoop.apache.org; Commons Developers List
Subject: Re: [VOTE] Accept Chimera as new Apache Commons Component

+1 (non-binding)

--Chris Nauroth




On 3/21/16, 1:45 AM, "Benedikt Ritter"  wrote:

>Hi all,
>
>after long discussions I think we have gathered enough information to 
>decide whether we want to accept the Chimera project as a new Apache 
>Commons component.
>
>Proposed name: Apache Commons Crypto
>Proposal text:
>https://github.com/intel-hadoop/chimera/blob/master/PROPOSAL.html
>Initial Code Base:  https://github.com/intel-hadoop/chimera/
>Initial Committers (Names in alphabetical order):
>- Aaron T. Myers (a...@apache.org, Apache Hadoop PMC, one of the 
>original Crypto dev team in Apache Hadoop)
>- Andrew Wang (w...@apache.org, Apache Hadoop PMC, one of the original 
>Crypto dev team in Apache Hadoop)
>- Chris Nauroth (cnaur...@apache.org, Apache Hadoop PMC and active
>reviewer)
>- Colin P. McCabe (cmcc...@apache.org, Apache Hadoop PMC, one of the 
>original Crypto dev team in Apache Hadoop)
>- Dapeng Sun (s...@apache.org, Apache Sentry Committer, Chimera
>contributor)
>- Dian Fu (dia...@apache.org, Apache Sqoop Committer, Chimera 
>contributor)
>- Dong Chen (do...@apache.org, Apache Hive Committer, interested in
>Chimera)
>- Ferdinand Xu (x...@apache.org, Apache Hive Committer, Chimera
>contributor)
>- Haifeng Chen (haifengc...@apache.org, Chimera lead and code 
>contributor)
>- Marcelo Vanzin (Apache Spark Committer, Chimera contributor)
>- Uma Maheswara Rao G (umamah...@apache.org, Apache Hadoop PMC, One of 
>the original Crypto dev/review team in Apache Hadoop)
>- Yi Liu (y...@apache.org, Apache Hadoop PMC, One of the original 
>Crypto dev/review team in Apache Hadoop)
>
>Please review the proposal and vote.
>This vote will close no sooner than 72 hours from now, i.e. after 0900 
>GMT 24-Mar 2016
>
>  [ ] +1 Accept Chimera as new Apache Commons Component
>  [ ] +0 OK, but...
>  [ ] -0 OK, but really should fix...
>  [ ] -1 I oppose this because...
>
>Thank you!
>Benedikt

RE: Style checking related to getters

2016-02-29 Thread Zheng, Kai

Thanks Uma. HADOOP-12859 was created for this.

Correction: it's actually about class setters, not getters.

Regards,
Kai

-----Original Message-----
From: Gangumalla, Uma [mailto:uma.ganguma...@intel.com] 
Sent: Tuesday, March 01, 2016 8:22 AM
To: common-dev@hadoop.apache.org
Subject: Re: Style checking related to getters

+1 for disabling them.

Regards,
Uma

On 2/29/16, 11:16 AM, "Andrew Wang" <andrew.w...@cloudera.com> wrote:

>Hi Kai,
>
>Could you file a JIRA and post patch to disable that checkstyle rule? 
>You can look at HADOOP-12713 for an example. Ping me and I'll review.
>
>Best,
>Andrew
>
>On Sun, Feb 28, 2016 at 11:28 PM, Zheng, Kai <kai.zh...@intel.com> wrote:
>
>> Hi,
>>
>> I'm wondering if we could get rid of the style checking in getters 
>> like the following (from HDFS-9733). It's annoying because it's a 
>> common Java practice and widely used in the project.
>>
>>
>> void setBlockLocations(LocatedBlocks blockLocations) {:42:
>> 'blockLocations' hides a field.
>>
>> void setTimeout(int timeout) {:25: 'timeout' hides a field.
>>
>> void setLocatedBlocks(List locatedBlocks) {:46:
>> 'locatedBlocks' hides a field.
>>
>> void setRemaining(long remaining) {:28: 'remaining' hides a field.
>>
>> void setBytesPerCRC(int bytesPerCRC) {:29: 'bytesPerCRC' hides a field.
>>
>> void setCrcType(DataChecksum.Type crcType) {:39: 'crcType' hides a 
>>field.
>>
>> void setCrcPerBlock(long crcPerBlock) {:30: 'crcPerBlock' hides a field.
>>
>> void setRefetchBlocks(boolean refetchBlocks) {:35: 'refetchBlocks'
>>hides a
>> field.
>>
>> void setLastRetriedIndex(int lastRetriedIndex) {:34: 'lastRetriedIndex'
>> hides a field.
>>
>> Regards,
>> Kai
>>

RE: Style checking related to getters

2016-02-29 Thread Zheng, Kai

Thanks Andrew for confirming. Yes, I will do that.

Regards,
Kai

-----Original Message-----
From: Andrew Wang [mailto:andrew.w...@cloudera.com] 
Sent: Tuesday, March 01, 2016 3:16 AM
To: common-dev@hadoop.apache.org
Subject: Re: Style checking related to getters

Hi Kai,

Could you file a JIRA and post patch to disable that checkstyle rule? You can 
look at HADOOP-12713 for an example. Ping me and I'll review.

Best,
Andrew

On Sun, Feb 28, 2016 at 11:28 PM, Zheng, Kai <kai.zh...@intel.com> wrote:

> Hi,
>
> I'm wondering if we could get rid of the style checking in getters 
> like the following (from HDFS-9733). It's annoying because it's a 
> common Java practice and widely used in the project.
>
>
> void setBlockLocations(LocatedBlocks blockLocations) {:42:
> 'blockLocations' hides a field.
>
> void setTimeout(int timeout) {:25: 'timeout' hides a field.
>
> void setLocatedBlocks(List locatedBlocks) {:46:
> 'locatedBlocks' hides a field.
>
> void setRemaining(long remaining) {:28: 'remaining' hides a field.
>
> void setBytesPerCRC(int bytesPerCRC) {:29: 'bytesPerCRC' hides a field.
>
> void setCrcType(DataChecksum.Type crcType) {:39: 'crcType' hides a field.
>
> void setCrcPerBlock(long crcPerBlock) {:30: 'crcPerBlock' hides a field.
>
> void setRefetchBlocks(boolean refetchBlocks) {:35: 'refetchBlocks' 
> hides a field.
>
> void setLastRetriedIndex(int lastRetriedIndex) {:34: 'lastRetriedIndex'
> hides a field.
>
> Regards,
> Kai
>

Style checking related to getters

2016-02-28 Thread Zheng, Kai

Hi,

I'm wondering if we could get rid of the style check on setters like the 
following (from HDFS-9733). The check is annoying because naming a setter 
parameter after its field is a common Java practice and widely used in the 
project.


void setBlockLocations(LocatedBlocks blockLocations) {:42: 'blockLocations' 
hides a field.

void setTimeout(int timeout) {:25: 'timeout' hides a field.

void setLocatedBlocks(List locatedBlocks) {:46: 'locatedBlocks' 
hides a field.

void setRemaining(long remaining) {:28: 'remaining' hides a field.

void setBytesPerCRC(int bytesPerCRC) {:29: 'bytesPerCRC' hides a field.

void setCrcType(DataChecksum.Type crcType) {:39: 'crcType' hides a field.

void setCrcPerBlock(long crcPerBlock) {:30: 'crcPerBlock' hides a field.

void setRefetchBlocks(boolean refetchBlocks) {:35: 'refetchBlocks' hides a 
field.

void setLastRetriedIndex(int lastRetriedIndex) {:34: 'lastRetriedIndex' hides a 
field.
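
The warnings above appear to come from Checkstyle's HiddenField check, which 
reports a parameter that shares its name with a field. A minimal sketch of the 
flagged pattern follows; the class ReaderState and its members are hypothetical 
and not taken from HDFS-9733. Besides disabling the rule, the check also has an 
ignoreSetter property that could be considered.

```java
// Hypothetical class for illustration; the names are not taken from HDFS-9733.
final class ReaderState {
  private int timeout;
  private long remaining;

  // The parameter 'timeout' shadows the field 'timeout'; HiddenField
  // reports this as "'timeout' hides a field".
  void setTimeout(int timeout) {
    this.timeout = timeout; // 'this.' selects the field, not the parameter
  }

  // Same pattern, same warning.
  void setRemaining(long remaining) {
    this.remaining = remaining;
  }

  int getTimeout() {
    return timeout;
  }

  long getRemaining() {
    return remaining;
  }

  public static void main(String[] args) {
    ReaderState state = new ReaderState();
    state.setTimeout(25);
    state.setRemaining(28L);
    System.out.println(state.getTimeout() + " " + state.getRemaining());
  }
}
```

The code is correct as written because of the explicit this. qualifier, which 
is why the pattern is so common in setters.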

Regards,
Kai

RE: Introduce Apache Kerby to Hadoop

2016-02-27 Thread Zheng, Kai

Hi Haohui,

I'm glad to learn about gRPC, and it sounds cool. Proposing that Hadoop IPC/RPC 
move to gRPC seems like a good idea.

We haven't evaluated gRPC for the RPC encryption optimization because that is 
a separate matter. It doesn't overlap with the optimization work: even if we 
use gRPC, the RPC protocol messages still need to go through the 
SASL/GSSAPI/Kerberos stack. The goal here is not to re-implement the RPC layer 
or that stack, but to optimize the stack, possibly by implementing and 
plugging in a new SASL or GSSAPI mechanism. Hope this clarification helps. 
Thanks.
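
To make the stack a bit more concrete, here is a rough sketch of how the 
hadoop.rpc.protection setting corresponds to the SASL quality of protection 
(QoP) that gets negotiated. The class and method names are hypothetical and 
this is not Hadoop's actual code; it only uses the standard 
javax.security.sasl.Sasl.QOP property key.

```java
import java.util.HashMap;
import java.util.Map;
import javax.security.sasl.Sasl;

// Hypothetical helper for illustration; not Hadoop's actual implementation.
final class RpcProtectionSketch {

  // Translate a hadoop.rpc.protection value into the SASL QoP token.
  static String toSaslQop(String rpcProtection) {
    switch (rpcProtection) {
      case "authentication":
        return "auth";      // authentication only
      case "integrity":
        return "auth-int";  // authentication plus integrity checks
      case "privacy":
        return "auth-conf"; // authentication, integrity and encryption
      default:
        throw new IllegalArgumentException("unknown protection: " + rpcProtection);
    }
  }

  // Build the property map that a SASL client or server factory is given.
  static Map<String, String> saslProperties(String rpcProtection) {
    Map<String, String> props = new HashMap<>();
    props.put(Sasl.QOP, toSaslQop(rpcProtection));
    return props;
  }

  public static void main(String[] args) {
    System.out.println(saslProperties("privacy"));
  }
}
```

With auth-conf negotiated, every RPC message is wrapped by the SASL mechanism, 
which is the layer the optimization work targets regardless of the RPC 
framework on top.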

Regards,
Kai

-Original Message-
From: Haohui Mai [mailto:ricet...@gmail.com] 
Sent: Sunday, February 28, 2016 3:02 AM
To: common-dev@hadoop.apache.org
Subject: Re: Introduce Apache Kerby to Hadoop

Have we evaluated GRPC? A robust RPC requires significant effort. Migrating to 
GRPC can save ourselves a lot of headache.

Haohui
On Sat, Feb 27, 2016 at 1:35 AM Andrew Purtell <andrew.purt...@gmail.com>
wrote:

> I get excited thinking about the prospect of better performance with 
> auth-conf QoP. HBase RPC is an increasingly distant fork but still 
> close enough to Hadoop in that respect. Our bulk data transfer 
> protocol isn't a separate thing like in HDFS, which avoids a SASL 
> wrapped implementation, so we really suffer when auth-conf is 
> negotiated. You'll see the same impact where there might be a high 
> frequency of NameNode RPC calls or similar still. Throughput drops 3-4x, or 
> worse.
>
> > On Feb 22, 2016, at 4:56 PM, Zheng, Kai <kai.zh...@intel.com> wrote:
> >
> > Thanks for the confirm and further inputs, Steve.
> >
> >>> the latter would dramatically reduce the cost of wire-encrypting IPC.
> > Yes, optimizing Hadoop IPC/RPC encryption is another opportunity Kerby
> > can help with. It's possible because we may hook Chimera or AES-NI
> > support into the Kerberos layer by leveraging the Kerberos library. As
> > may be noted, HADOOP-12725 is ongoing for this aspect. There may be
> > good results and a further update on this soon.
> >
> >>> For now, I'd like to see basic steps -upgrading minkdc to krypto,
> >>> see how it works.
> > Yes, starting with the initial step of upgrading MiniKDC to use Kerby
> > is the right thing to do. After some interaction with the Kerby
> > project, we may have more ideas on how to proceed with the rest.
> >
> >>> Long term, I'd like Hadoop 3 to be Kerby-ized
> > This sounds great! With the necessary support from the community, like
> > feedback and patch reviews, we can speed up the related work.
> >
> > Regards,
> > Kai
> >
> > -Original Message-
> > From: Steve Loughran [mailto:ste...@hortonworks.com]
> > Sent: Monday, February 22, 2016 6:51 PM
> > To: common-dev@hadoop.apache.org
> > Subject: Re: Introduce Apache Kerby to Hadoop
> >
> >
> >
> > I've discussed this offline with Kai, as part of the "let's fix
> kerberos" project. Not only is it a better Kerberos engine, we can do 
> more diagnostics, get better algorithms and ultimately get better APIs 
> for doing Kerberos and SASL —the latter would dramatically reduce the 
> cost of wire-encrypting IPC.
> >
> > For now, I'd like to see basic steps -upgrading minkdc to krypto, 
> > see
> how it works.
> >
> > Long term, I'd like Hadoop 3 to be Kerby-ized
> >
> >
> >> On 22 Feb 2016, at 06:41, Zheng, Kai <kai.zh...@intel.com> wrote:
> >>
> >> Hi folks,
> >>
> >> I'd like to mention Apache Kerby [1] here to the community and 
> >> propose
> to introduce the project to Hadoop, a sub project of Apache Directory 
> project.
> >>
> >> Apache Kerby is a Kerberos centric project and aims to provide a 
> >> first
> Java Kerberos library that contains both client and server supports. 
> The relevant features include:
> >> It supports full Kerberos encryption types aligned with both MIT 
> >> KDC and MS AD; Client APIs to allow to login via password, 
> >> credential cache, keytab file and etc.; Utilities for generate, 
> >> operate and inspect keytab and credential cache files; A simple KDC 
> >> server that borrows some ideas from Hadoop-MiniKDC and can be used 
> >> in tests but with minimal overhead in external dependencies; A 
> >> brand new token
> mechanism is provided, can be experimentally used, using it a JWT 
> token can be used to exchange a TGT or service ticket; Anonymous 
> PKINIT support, can be experientially used, as the first Java library 
> that supports the Kerberos major ext

RE: Introduce Apache Kerby to Hadoop

2016-02-27 Thread Zheng, Kai

Thanks Andrew for the update on HBase side!

>> Throughput drops 3-4x, or worse.
Hopefully we can avoid much of the encryption overhead. We're prototyping a 
solution for that.

Regards,
Kai

-Original Message-
From: Andrew Purtell [mailto:andrew.purt...@gmail.com] 
Sent: Saturday, February 27, 2016 5:35 PM
To: common-dev@hadoop.apache.org
Subject: Re: Introduce Apache Kerby to Hadoop

I get excited thinking about the prospect of better performance with 
auth-conf QoP. HBase RPC is an increasingly distant fork but still close enough 
to Hadoop in that respect. Our bulk data transfer protocol isn't a separate 
thing like in HDFS, which avoids a SASL wrapped implementation, so we really 
suffer when auth-conf is negotiated. You'll see the same impact where there 
might be a high frequency of NameNode RPC calls or similar still. Throughput 
drops 3-4x, or worse. 

> On Feb 22, 2016, at 4:56 PM, Zheng, Kai <kai.zh...@intel.com> wrote:
> 
> Thanks for the confirm and further inputs, Steve. 
> 
>>> the latter would dramatically reduce the cost of wire-encrypting IPC.
> Yes, optimizing Hadoop IPC/RPC encryption is another opportunity Kerby can 
> help with. It's possible because we may hook Chimera or AES-NI support into 
> the Kerberos layer by leveraging the Kerberos library. As may be noted, 
> HADOOP-12725 is ongoing for this aspect. There may be good results and a 
> further update on this soon.
> 
>>> For now, I'd like to see basic steps -upgrading minkdc to krypto, see how 
>>> it works.
> Yes, starting with the initial step of upgrading MiniKDC to use Kerby is the 
> right thing to do. After some interaction with the Kerby project, we may 
> have more ideas on how to proceed with the rest.
> 
>>> Long term, I'd like Hadoop 3 to be Kerby-ized
> This sounds great! With necessary support from the community like feedback 
> and patch reviewing, we can speed up the related work.
> 
> Regards,
> Kai
> 
> -Original Message-
> From: Steve Loughran [mailto:ste...@hortonworks.com]
> Sent: Monday, February 22, 2016 6:51 PM
> To: common-dev@hadoop.apache.org
> Subject: Re: Introduce Apache Kerby to Hadoop
> 
> 
> 
> I've discussed this offline with Kai, as part of the "let's fix kerberos" 
> project. Not only is it a better Kerberos engine, we can do more diagnostics, 
> get better algorithms and ultimately get better APIs for doing Kerberos and 
> SASL —the latter would dramatically reduce the cost of wire-encrypting IPC.
> 
> For now, I'd like to see basic steps -upgrading minkdc to krypto, see how it 
> works.
> 
> Long term, I'd like Hadoop 3 to be Kerby-ized
> 
> 
>> On 22 Feb 2016, at 06:41, Zheng, Kai <kai.zh...@intel.com> wrote:
>> 
>> Hi folks,
>> 
>> I'd like to mention Apache Kerby [1] here to the community and propose to 
>> introduce the project to Hadoop, a sub project of Apache Directory project.
>> 
>> Apache Kerby is a Kerberos centric project and aims to provide a first Java 
>> Kerberos library that contains both client and server supports. The relevant 
>> features include:
>> It supports full Kerberos encryption types aligned with both MIT KDC 
>> and MS AD; Client APIs to allow to login via password, credential 
>> cache, keytab file and etc.; Utilities for generate, operate and 
>> inspect keytab and credential cache files; A simple KDC server that 
>> borrows some ideas from Hadoop-MiniKDC and can be used in tests but 
>> with minimal overhead in external dependencies; A brand new token mechanism 
>> is provided, can be experimentally used, using it a JWT token can be used to 
>> exchange a TGT or service ticket; Anonymous PKINIT support, can be 
>> experientially used, as the first Java library that supports the Kerberos 
>> major extension.
>> 
>> The project stands alone and is ensured to only depend on JRE for easier 
>> usage. It has made the first release (1.0.0-RC1) and 2nd release (RC2) is 
>> upcoming.
>> 
>> 
>> As an initial step, this proposal suggests using Apache Kerby to upgrade the 
>> existing codes related to ApacheDS for the Kerberos support. The 
>> advantageous:
>> 
>> 1. The kerby-kerb library is all the need, which is purely in Java, 
>> SLF4J is the only dependency, the whole is rather small;
>> 
>> 2. There is a SimpleKDC in the library for test usage, which borrowed 
>> the MiniKDC idea and implemented all the support existing in MiniKDC.
>> We had a POC that rewrote MiniKDC using Kerby SimpleKDC and it works 
>> fine;
>> 
>> 3. Full Kerberos encryption types (many of them are not available in 
>> JRE but supported

RE: Introduce Apache Kerby to Hadoop

2016-02-22 Thread Zheng, Kai

Thanks Larry for your thoughts and inputs.

>> Replacing MiniKDC with kerby certainly makes sense.
Thanks.

>> Kerby-izing Hadoop 3 needs to be defined carefully.
Fully agree. We're still working to bring the relevant Kerberos support to an 
ideal state, either in the Kerby project or outside of it. When the time is 
right, we can think about the next steps, come up with a design, and discuss 
it then. Maybe we can discuss these inputs separately after the initial work 
is done?

Regards,
Kai

-Original Message-
From: larry mccay [mailto:lmc...@apache.org] 
Sent: Monday, February 22, 2016 9:05 PM
To: common-dev@hadoop.apache.org
Subject: Re: Introduce Apache Kerby to Hadoop

Replacing MiniKDC with kerby certainly makes sense.

Kerby-izing Hadoop 3 needs to be defined carefully.
As much of a JWT proponent as I am, I don't know that taking up 
non-standard features such as the JWT token would necessarily serve us well.
If we are talking about client side only uptake in Hadoop 3 as a better 
diagnosable client library that completely makes sense.

Better algorithms and APIs would require server side compliance as well - no?
These decisions would need to align with deployment use cases that want to go 
directly to AD/MIT.
Perhaps, it just means careful configuration of algorithms to match the server 
side in those cases.

+1 on the baby step of replacing MiniKDC - as this is really just alignment 
with the directory project roadmap anyway.

On Mon, Feb 22, 2016 at 5:51 AM, Steve Loughran <ste...@hortonworks.com>
wrote:

>
>
> I've discussed this offline with Kai, as part of the "let's fix kerberos"
> project. Not only is it a better Kerberos engine, we can do more 
> diagnostics, get better algorithms and ultimately get better APIs for 
> doing Kerberos and SASL —the latter would dramatically reduce the cost 
> of wire-encrypting IPC.
>
> For now, I'd like to see basic steps -upgrading minkdc to krypto, see 
> how it works.
>
> Long term, I'd like Hadoop 3 to be Kerby-ized
>
>
> > On 22 Feb 2016, at 06:41, Zheng, Kai <kai.zh...@intel.com> wrote:
> >
> > Hi folks,
> >
> > I'd like to mention Apache Kerby [1] here to the community and 
> > propose
> to introduce the project to Hadoop, a sub project of Apache Directory 
> project.
> >
> > Apache Kerby is a Kerberos centric project and aims to provide a 
> > first
> Java Kerberos library that contains both client and server supports. 
> The relevant features include:
> > It supports full Kerberos encryption types aligned with both MIT KDC 
> > and
> MS AD;
> > Client APIs to allow to login via password, credential cache, keytab
> file and etc.;
> > Utilities for generate, operate and inspect keytab and credential 
> > cache
> files;
> > A simple KDC server that borrows some ideas from Hadoop-MiniKDC and 
> > can
> be used in tests but with minimal overhead in external dependencies;
> > A brand new token mechanism is provided, can be experimentally used,
> using it a JWT token can be used to exchange a TGT or service ticket;
> > Anonymous PKINIT support, can be experientially used, as the first 
> > Java
> library that supports the Kerberos major extension.
> >
> > The project stands alone and is ensured to only depend on JRE for 
> > easier
> usage. It has made the first release (1.0.0-RC1) and 2nd release (RC2) 
> is upcoming.
> >
> >
> > As an initial step, this proposal suggests using Apache Kerby to 
> > upgrade
> the existing codes related to ApacheDS for the Kerberos support. The
> advantageous:
> >
> > 1. The kerby-kerb library is all the need, which is purely in Java,
> SLF4J is the only dependency, the whole is rather small;
> >
> > 2. There is a SimpleKDC in the library for test usage, which 
> > borrowed
> the MiniKDC idea and implemented all the support existing in MiniKDC. 
> We had a POC that rewrote MiniKDC using Kerby SimpleKDC and it works 
> fine;
> >
> > 3. Full Kerberos encryption types (many of them are not available in 
> > JRE
> but supported by major Kerberos vendors) and more functionalities like 
> credential cache support;
> >
> > 4. Perhaps the most concerned, Hadoop MiniKDC and etc. depend on the 
> > old
> Kerberos implementation in Directory Server project, but the 
> implementation is stopped being maintained. Directory project has a 
> plan to replace the implementation using Kerby. MiniKDC can use Kerby 
> directly to simplify the deps;
> >
> > 5. Extensively tested with all kinds of unit tests, already being 
> > used
> for some time (like PSU), even in production environment;
> >
> > 6. Actively developed, an

Introduce Apache Kerby to Hadoop

2016-02-21 Thread Zheng, Kai

Hi folks,

I'd like to introduce Apache Kerby [1], a sub project of the Apache Directory 
project, to the community and propose adopting it in Hadoop.

Apache Kerby is a Kerberos-centric project that aims to provide a first Java 
Kerberos library with both client and server support. The relevant features 
include:
Support for the full set of Kerberos encryption types, aligned with both MIT 
KDC and MS AD;
Client APIs that allow login via password, credential cache, keytab file, 
etc.;
Utilities to generate, operate on, and inspect keytab and credential cache 
files;
A simple KDC server that borrows some ideas from Hadoop-MiniKDC and can be used 
in tests with minimal overhead in external dependencies;
A brand new token mechanism, available experimentally, with which a JWT token 
can be exchanged for a TGT or service ticket;
Anonymous PKINIT support, available experimentally, as the first Java library 
to support this major Kerberos extension.

The project stands alone and is designed to depend only on the JRE, for easier 
usage. It has made its first release (1.0.0-RC1), and a second release (RC2) 
is upcoming.


As an initial step, this proposal suggests using Apache Kerby to upgrade the 
existing code that relies on ApacheDS for Kerberos support. The advantages:

1. The kerby-kerb library is all that is needed. It is pure Java, SLF4J is 
the only dependency, and the whole is rather small;

2. There is a SimpleKDC in the library for test usage, which borrowed the 
MiniKDC idea and implements all the support existing in MiniKDC. We had a POC 
that rewrote MiniKDC using Kerby SimpleKDC and it works fine;

3. Full Kerberos encryption types (many of which are not available in the JRE 
but are supported by major Kerberos vendors) and more functionality such as 
credential cache support;

4. Perhaps the biggest concern: Hadoop MiniKDC and others depend on the old 
Kerberos implementation in the Directory Server project, but that 
implementation is no longer maintained. The Directory project plans to replace 
it with Kerby. MiniKDC can use Kerby directly to simplify the deps;

5. Extensively tested with all kinds of unit tests, and already in use for 
some time (like PSU), even in production environments;

6. Actively developed, so it can be fixed and released in time if necessary, 
separately and independently from other components in the Apache Directory 
project. By actively developing Apache Kerby and now applying it to Hadoop, we 
hope to make Kerberos deployment and troubleshooting much easier, and further 
enhancement possible.



We hope this is a good beginning and that Apache Kerby can eventually benefit 
other projects in the ecosystem as well.

This Kerberos-related work is actually a long-running effort led by Weihua 
Jiang at Intel, and it has been kindly encouraged by Andrew Purtell, Steve 
Loughran, Gangumalla Uma, Andrew Wang, and others. Thanks a lot for their 
great discussions and input in the past.



Your feedback is very welcome. Thanks in advance.



[1] https://github.com/apache/directory-kerby



Regards,

Kai

RE: Looking to a Hadoop 3 release

2016-02-18 Thread Zheng, Kai

Thanks Andrew for driving this. I wonder if this is a good chance to get 
HADOOP-12579 (Deprecate and remove WriteableRPCEngine) in. Note that it's not 
an incompatible change, but it feels better to do it in a major release.

Regards,
Kai

-Original Message-
From: Andrew Wang [mailto:andrew.w...@cloudera.com] 
Sent: Friday, February 19, 2016 7:04 AM
To: hdfs-...@hadoop.apache.org; Kihwal Lee 
Cc: mapreduce-...@hadoop.apache.org; common-dev@hadoop.apache.org; 
yarn-...@hadoop.apache.org
Subject: Re: Looking to a Hadoop 3 release

Hi Kihwal,

I think there's still value in continuing the 2.x releases. 3.x comes with the 
incompatible bump to a JDK8 runtime, and also the fact that 3.x won't be beta 
or GA for some number of months. In the meanwhile, it'd be good to keep putting 
out regular, stable 2.x releases.

Best,
Andrew


On Thu, Feb 18, 2016 at 2:50 PM, Kihwal Lee wrote:

> Moving Hadoop 3 forward sounds fine. If EC is one of the main 
> motivations, are we getting rid of branch-2.8?
>
> Kihwal
>
> From: Andrew Wang
> To: "common-dev@hadoop.apache.org"
> Cc: "yarn-...@hadoop.apache.org"; "mapreduce-...@hadoop.apache.org";
> hdfs-dev
> Sent: Thursday, February 18, 2016 4:35 PM
> Subject: Re: Looking to a Hadoop 3 release
>
> Hi all,
>
> Reviving this thread. I've seen renewed interest in a trunk release 
> since HDFS erasure coding has not yet made it to branch-2. Along with 
> JDK8, the shell script rewrite, and many other improvements, I think 
> it's time to revisit Hadoop 3.0 release plans.
>
> My overall plan is still the same as in my original email: a series of 
> regular alpha releases leading up to beta and GA. Alpha releases make 
> it easier for downstreams to integrate with our code, and making them 
> regular means features can be included when they are ready.
>
> I know there are some incompatible changes waiting in the wings (e.g. 
> HDFS-6984 making FileStatus a PB rather than Writable, some of
> HADOOP-9991 bumping dependency versions) that would be good to get in. 
> If you have changes like this, please set the target version to 3.0.0 
> and mark them "Incompatible". We can use this JIRA query to track:
>
>
> https://issues.apache.org/jira/issues/?jql=project%20in%20(HADOOP%2C%20HDFS%2C%20YARN%2C%20MAPREDUCE)%20and%20%22Target%20Version%2Fs%22%20%3D%20%223.0.0%22%20and%20resolution%3D%22unresolved%22%20and%20%22Hadoop%20Flags%22%3D%22Incompatible%20change%22%20order%20by%20priority
>
> There's some release-related stuff that needs to be sorted out 
> (namely, the new CHANGES.txt and release note generation from Yetus), 
> but I'd tentatively like to roll the first alpha a month out, so third 
> week of March.
>
> Best,
> Andrew
>
> On Mon, Mar 9, 2015 at 7:23 PM, Raymie Stata  wrote:
>
> > Avoiding the use of JDK8 language features (and, presumably, APIs) 
> > means you've abandoned #1, i.e., you haven't (really) bumped the JDK 
> > source version to JDK8.
> >
> > Also, note that releasing from trunk is a way of achieving #3, it's 
> > not a way of abandoning it.
> >
> >
> >
> > On Mon, Mar 9, 2015 at 7:10 PM, Andrew Wang wrote:
> > > Hi Raymie,
> > >
> > > Konst proposed just releasing off of trunk rather than cutting a
> > > branch-2, and there was general agreement there. So, consider #3
> > > abandoned. 1&2 can be achieved at the same time, we just need to avoid
> > > using JDK8 language features in trunk so things can be backported.
> > >
> > > Best,
> > > Andrew
> > >
> > > On Mon, Mar 9, 2015 at 7:01 PM, Raymie Stata wrote:
> > >
> > >> In this (and the related threads), I see the following three
> > >> requirements:
> > >>
> > >> 1. "Bump the source JDK version to JDK8" (ie, drop JDK7 support).
> > >>
> > >> 2. "We'll still be releasing 2.x releases for a while, with 
> > >> similar feature sets as 3.x."
> > >>
> > >> 3. Avoid the "risk of split-brain behavior" by "minimize 
> > >> backporting headaches. Pulling trunk > branch-2 > branch-2.x is already 
> > >> tedious.
> > >> Adding a branch-3, branch-3.x would be obnoxious."
> > >>
> > >> These three cannot be achieved at the same time.  Which do we abandon?
> > >>
> > >>
> > >> On Mon, Mar 9, 2015 at 12:45 PM, sanjay Radia wrote:
> > >> >
> > >> >> On Mar 5, 2015, at 3:21 PM, Siddharth Seth wrote:
> > >> >>
> > >> >> 2) Simplification of configs - potentially separating client side
> > >> >> configs and those used by daemons. This is another source of
> > >> >> perpetual confusion for users.
> > >> > + 1 on this.
> > >> >
> > >> > sanjay
> > >>
> >
>
>
>

Some questions about DistCp

2016-01-07 Thread Zheng, Kai

Hi,

I recently did some investigation into DistCp and have some questions. I 
thought it would be good to discuss them here first, before diving into 
JIRAs.

I read the doc at the following link and take it to be the latest revision, 
corresponding to the trunk codebase.
http://hadoop.apache.org/docs/current/hadoop-distcp/DistCp.html
If that's right, then we may need to add the following important features, 
because I don't see them mentioned in the doc.

1.   -diff option: uses the snapshot diff report to identify the differences 
between source and target when computing the copy list.

2.   -numListstatusThreads option: the number of threads used to build the 
copy list concurrently.

3.   -pt: preserves timestamps.

These features are great for users who want to speed up time-consuming inter- 
or intra-cluster syncs. So besides adding the options to the table of command 
line options, it would be better to document them as thoroughly as we did for 
other functions.

A main use case is copying from a source HDFS cluster to a target HDFS 
cluster. The doc mentions that each NodeManager can reach and communicate with 
both the source and destination file systems. In this case, where is it 
recommended to run the DistCp command: in the source cluster or the target? It 
might be better to run it on the source side so that the copy mappers can read 
locally via short-circuit reads (but would then write remotely). Are there any 
considerations here?

In the above case (both source and target are HDFS clusters), there is a 
consideration for replicated files: if the block size and checksum opt are not 
preserved (via -pb), then after the copy is done we may skip the file checksum 
comparison and avoid computing the checksums, because the block size and 
checksum type may differ, in which case the file checksums will surely differ. 
In most cases, though, the source and target clusters use the same settings, 
so even when not preserved, I guess the block size and checksum type may still 
be the same, particularly with default values. So, to be safer, maybe we can 
improve this as follows: compare the block size and checksum opt first; if 
they're the same, compare the file checksums; otherwise skip the comparison. 
Does that make sense? Note this is partly raised in HDFS-9613.
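
The proposed ordering can be sketched roughly as below. This is only an 
illustration of the idea, not DistCp code: FileInfo, Result and compare are 
hypothetical names, and the checksum opt is modelled here as a checksum type 
plus bytes-per-CRC.

```java
import java.util.Arrays;

// Hypothetical sketch of the proposed check; not actual DistCp code.
final class ChecksumCompareSketch {

  enum Result { MATCH, MISMATCH, SKIPPED }

  // Stand-in for the per-file attributes the comparison would look at.
  static final class FileInfo {
    final long blockSize;
    final String checksumType; // for example "CRC32C"
    final int bytesPerCrc;
    final byte[] fileChecksum;

    FileInfo(long blockSize, String checksumType, int bytesPerCrc,
             byte[] fileChecksum) {
      this.blockSize = blockSize;
      this.checksumType = checksumType;
      this.bytesPerCrc = bytesPerCrc;
      this.fileChecksum = fileChecksum;
    }
  }

  // Compare file checksums only when they were computed the same way.
  static Result compare(FileInfo source, FileInfo target) {
    boolean comparable = source.blockSize == target.blockSize
        && source.checksumType.equals(target.checksumType)
        && source.bytesPerCrc == target.bytesPerCrc;
    if (!comparable) {
      // Different block size or checksum opt: the checksums would differ
      // even for identical content, so the comparison is meaningless.
      return Result.SKIPPED;
    }
    return Arrays.equals(source.fileChecksum, target.fileChecksum)
        ? Result.MATCH : Result.MISMATCH;
  }

  public static void main(String[] args) {
    FileInfo src = new FileInfo(128L << 20, "CRC32C", 512, new byte[] {1, 2});
    FileInfo dst = new FileInfo(128L << 20, "CRC32C", 512, new byte[] {1, 2});
    System.out.println(compare(src, dst));
  }
}
```

The cheap attribute comparison happens before any file checksum is needed, so 
a real implementation could avoid computing checksums at all in the skipped 
case.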

For striped files, we'll need to update the command as well, and probably 
handle it specially. This is currently under discussion in HDFS-8430.

Thanks for the discussion.

Regards,
Kai

RE: Question about subtype exceptions in the thrown list in addition to a more general one

2015-12-30 Thread Zheng, Kai

Thanks Chris for the thoughts and details. I agree it's not easy to change 
those IOException subclasses, even the ones not related to I/O. 

Regards,
Kai

-Original Message-
From: Chris Nauroth [mailto:cnaur...@hortonworks.com] 
Sent: Wednesday, December 30, 2015 2:52 AM
To: common-dev@hadoop.apache.org
Subject: Re: Question about subtype exceptions in the thrown list in addition 
to a more general one

Hello Kai,

I'm not aware of a specific coding standard we have on this topic, and I don't 
have a strong opinion either way.  I think it can be valuable sometimes for 
public APIs to document each subclass in a separate @throws in the JavaDocs if 
we expect the caller might want to handle each case differently.  As you said, 
this is less relevant for the actual throws clause in the code.

Regarding IOException, there is a great tendency in the Hadoop codebase to 
subclass IOException, even if the failure is not obviously related to I/O.
This is partly a consequence of the way error handling is implemented in the 
RPC framework.  The RemoteException class is tightly coupled to IOException for 
its "unwrap" logic to pull out the root cause of the exception.  As a result, 
any exception that needs to cross a process boundary over RPC generally ends up 
needing to subclass IOException.  I don't think this is something that can be 
changed easily.
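
A simplified sketch of that coupling is shown below. It is a stand-in written 
for illustration, not the real RemoteException: RemoteFailure and 
QuotaSketchException are hypothetical names, and the assumption is that only 
the exception's class name and message cross the wire.

```java
import java.io.IOException;
import java.lang.reflect.Constructor;

// Simplified stand-in for illustration; not Hadoop's RemoteException.
final class UnwrapSketch {

  // Hypothetical server-side exception. It extends IOException so that
  // the client side is able to rebuild it.
  static class QuotaSketchException extends IOException {
    QuotaSketchException(String message) {
      super(message);
    }
  }

  // What the client receives: the remote class name and the message.
  static final class RemoteFailure extends IOException {
    final String className;

    RemoteFailure(String className, String message) {
      super(message);
      this.className = className;
    }

    // Rebuild the original exception if it is one of the expected types.
    IOException unwrap(Class<?>... lookupTypes) {
      for (Class<?> type : lookupTypes) {
        if (!type.getName().equals(className)) {
          continue;
        }
        try {
          // The coupling: only IOException subclasses can be rebuilt.
          Class<? extends IOException> cls = type.asSubclass(IOException.class);
          Constructor<? extends IOException> ctor =
              cls.getDeclaredConstructor(String.class);
          ctor.setAccessible(true);
          return ctor.newInstance(getMessage());
        } catch (ReflectiveOperationException | ClassCastException e) {
          return this; // fall back to the generic wrapper
        }
      }
      return this;
    }
  }

  public static void main(String[] args) {
    RemoteFailure failure = new RemoteFailure(
        QuotaSketchException.class.getName(), "over quota");
    System.out.println(failure.unwrap(QuotaSketchException.class).getClass());
  }
}
```

An exception type that does not extend IOException fails the asSubclass step 
and stays wrapped, which is one way to see why such types tend to end up as 
IOException subclasses.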

--Chris Nauroth

On 12/28/15, 7:46 PM, "Zheng, Kai" <kai.zh...@intel.com> wrote:

>
>Hi,
>
>Would it be good to add to throw a subtype exception in addition to a 
>more general exception already there in the thrown list? Is it some 
>coding style that's required to follow or developers can do it as they 
>like?
>
>It's often seen that only the general exception is in the thrown list, 
>like in ReadableByteChannel#read in Oracle Java, where in fact the 
>method may throw 4 subtype exceptions.
>http://docs.oracle.com/javase/7/docs/api/java/nio/channels/ReadableByteChannel.html
>int read(ByteBuffer dst) throws IOException
>
>In some of Hadoop codes, it goes otherwise. For example, in Hdfs.java, 
>ref. the following codes, note that in the thrown list, all the former 
>3 exceptions extends IOException.
>===
>  @Override
>  public RemoteIterator listStatusIterator(final Path f)
>throws AccessControlException, FileNotFoundException,
>UnresolvedLinkException, IOException {
>return new DirListingIterator(f, false) {
>
>  @Override
>  public FileStatus next() throws IOException {
>return getNext().makeQualified(getUri(), f);
>  }
>};
>  }
>===
>
>Doing this way, I thought the benefit could be the caller can see 
>clearly what kinds of exceptions could be thrown, however in this case 
>we can achieve it by adding the Javadoc. Sure there is no hurt but it 
>looks kinds of dummy in some IDE, hinting some information like "There 
>is a more general exception ... in the thrown list already". Given we 
>would list all the possible exceptions, it's a little hard to maintain 
>the codes.
>
>By the way, in the above example, it looks a little weird that 
>AccessControlException and UnresolvedLinkException both extend 
>IOException. There is a little reason to do that for the latter, but 
>for the former, it looks rather like an issue.
>
>Please help clarify if I missed something. Thanks.
>
>Regards,
>Kai

Question about subtype exceptions in the thrown list in addition to a more general one

2015-12-28 Thread Zheng, Kai


Hi,

Would it be good to list a subtype exception in the throws clause in addition 
to a more general exception that is already there? Is this a coding style 
we're required to follow, or can developers do as they like?

It's common to see only the general exception in the throws clause, as in 
ReadableByteChannel#read in Oracle Java, where the Javadoc in fact documents 
four more specific exceptions.
http://docs.oracle.com/javase/7/docs/api/java/nio/channels/ReadableByteChannel.html
int read(ByteBuffer dst) throws IOException

In some Hadoop code it goes the other way. For example, in Hdfs.java (see the 
following code), note that the first three exceptions in the throws clause all 
extend IOException.
===
  @Override
  public RemoteIterator<FileStatus> listStatusIterator(final Path f)
      throws AccessControlException, FileNotFoundException,
      UnresolvedLinkException, IOException {
    return new DirListingIterator<FileStatus>(f, false) {

      @Override
      public FileStatus next() throws IOException {
        return getNext().makeQualified(getUri(), f);
      }
    };
  }
===

I thought the benefit of doing it this way is that the caller can see clearly 
what kinds of exceptions could be thrown; however, we can achieve that by 
adding Javadoc instead. Sure, it does no harm, but it looks redundant in some 
IDEs, which hint something like "There is a more general exception ... in the 
thrown list already". And if we were to list all the possible exceptions, the 
code would be a little hard to maintain.
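
As a small sketch of the Javadoc alternative: the method below declares only 
IOException, documents the subtype with @throws, and callers can still catch 
the subtype separately. The class and method names are hypothetical.

```java
import java.io.FileNotFoundException;
import java.io.IOException;

// Hypothetical example; the names are for illustration only.
final class ThrowsListSketch {

  /**
   * Looks up a path in a made-up namespace.
   *
   * @throws FileNotFoundException if the path is not known
   * @throws IOException for any other failure
   */
  static String lookup(String path) throws IOException {
    if (path == null || path.isEmpty()) {
      throw new IOException("empty path");
    }
    if (!path.startsWith("/known")) {
      throw new FileNotFoundException(path);
    }
    return "status:" + path;
  }

  // The subtype can be handled on its own even though the throws clause
  // names only the general exception.
  static String describe(String path) {
    try {
      return lookup(path);
    } catch (FileNotFoundException e) {
      return "missing";
    } catch (IOException e) {
      return "failed";
    }
  }

  public static void main(String[] args) {
    System.out.println(describe("/known/a"));
    System.out.println(describe("/other"));
  }
}
```

The throws clause stays short, and the documentation carries the detail about 
which subtypes a caller may want to handle.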

By the way, in the above example, it looks a little odd that 
AccessControlException and UnresolvedLinkException both extend IOException. 
There is some reason to do that for the latter, but for the former it looks 
rather like an issue.

Please help clarify if I missed something. Thanks.

Regards,
Kai

RE: [VOTE] Merge HDFS-7285 (erasure coding) branch to trunk

2015-09-22 Thread Zheng, Kai

Non-binding +1

According to our extensive performance tests, erasure coding based on striping 
plus the ISA-L coder not only saves storage but can also increase the 
throughput of a client or a cluster. It will be a great addition for HDFS and 
its users. Based on the latest branch code, we also observed that it's very 
reliable in the concurrent tests. We'll provide the perf test report once it's 
sorted out, and hope it helps. 
Thanks!
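
For reference, the storage figures in the vote email quoted below (200% 
overhead for 3 replicas, no more than 50% for most EC configurations) and the 
round-robin cell placement can be worked through with a short sketch. The 
helper names are hypothetical, and the (6, 3) and (10, 4) data/parity layouts 
are used only as example schemas.

```java
// Hypothetical helpers for illustration; not HDFS code.
final class EcOverheadSketch {

  // Extra storage for replication, as a percentage of the logical data.
  static double replicationOverheadPercent(int replicas) {
    return (replicas - 1) * 100.0;
  }

  // Extra storage for erasure coding: parity units per data unit.
  static double ecOverheadPercent(int dataUnits, int parityUnits) {
    return parityUnits * 100.0 / dataUnits;
  }

  // Round-robin striping: the data block that holds a given cell.
  static int dataBlockIndexForCell(long cellIndex, int dataUnits) {
    return (int) (cellIndex % dataUnits);
  }

  // The stripe (row of cells) that a given cell belongs to.
  static long stripeIndexForCell(long cellIndex, int dataUnits) {
    return cellIndex / dataUnits;
  }

  public static void main(String[] args) {
    System.out.println("3 replicas: " + replicationOverheadPercent(3) + "%");
    System.out.println("EC (6, 3):  " + ecOverheadPercent(6, 3) + "%");
    System.out.println("EC (10, 4): " + ecOverheadPercent(10, 4) + "%");
    System.out.println("cell 7 of a 6-wide stripe goes to data block "
        + dataBlockIndexForCell(7, 6));
  }
}
```

Because consecutive cells land on different DataNodes, a single client read or 
write is spread over several nodes, which is consistent with the throughput 
gain mentioned above.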

Regards,
Kai

-Original Message-
From: Gangumalla, Uma [mailto:uma.ganguma...@intel.com] 
Sent: Wednesday, September 23, 2015 8:50 AM
To: hdfs-...@hadoop.apache.org; common-dev@hadoop.apache.org
Subject: Re: [VOTE] Merge HDFS-7285 (erasure coding) branch to trunk

+1

Great addition to HDFS. Thanks all contributors for the nice work.

Regards,
Uma

On 9/22/15, 3:40 PM, "Zhe Zhang"  wrote:

>Hi,
>
>I'd like to propose a vote to merge the HDFS-7285 feature branch back 
>to trunk. Since November 2014 we have been designing and developing 
>this feature under the umbrella JIRAs HDFS-7285 and HADOOP-11264, and 
>have committed approximately 210 patches.
>
>The HDFS-7285 feature branch was created to support the first phase of 
>HDFS erasure coding (HDFS-EC). The objective of HDFS-EC is to 
>significantly reduce storage space usage in HDFS clusters. Instead of 
>always creating 3 replicas of each block with 200% storage space 
>overhead, HDFS-EC provides data durability through parity data blocks. 
>With most EC configurations, the storage overhead is no more than 50%. 
>Based on profiling results of production clusters, we decided to 
>support EC with the striped block layout in the first phase, so that 
>small files can be better handled. This means dividing each logical 
>HDFS file block into smaller units (striping cells) and spreading them 
>on a set of DataNodes in round-robin fashion. Parity cells are 
>generated for each stripe of original data cells. We have made changes 
>to NameNode, client, and DataNode to generalize the block concept and 
>handle the mapping between a logical file block and its internal 
>storage blocks. For further details please see the design doc on 
>HDFS-7285.
>HADOOP-11264 focuses on providing flexible and high-performance codec 
>calculation support.
>
>The nightly Jenkins job of the branch has reported several successful 
>runs, and doesn't show new flaky tests compared with trunk. We have 
>posted several versions of the test plan including both unit testing 
>and cluster testing, and have executed most tests in the plan. The most 
>basic functionalities have been extensively tested and verified in 
>several real clusters with different hardware configurations; results 
>have been very stable. We have created follow-on tasks for more 
>advanced error handling and optimization under the umbrella HDFS-8031. 
>We also plan to implement or harden the integration of EC with existing 
>features such as WebHDFS, snapshot, append, truncate, hflush, hsync, 
>and so forth.
>
>Development of this feature has been a collaboration across many 
>companies and institutions. I'd like to thank J. Andreina, Takanobu 
>Asanuma, Vinayakumar B, Li Bo, Takuya Fukudome, Uma Maheswara Rao G, 
>Rui Li, Yi Liu, Colin McCabe, Xinwei Qin, Rakesh R, Gao Rui, Kai 
>Sasaki, Walter Su, Tsz Wo Nicholas Sze, Andrew Wang, Yong Zhang, Jing 
>Zhao, Hui Zheng and Kai Zheng for their code contributions and reviews. 
>Andrew and Kai Zheng also made fundamental contributions to the initial 
>design. Rui Li, Gao Rui, Kai Sasaki, Kai Zheng and many other 
>contributors have made great efforts in system testing. Many thanks go 
>to Weihua Jiang for proposing the JIRA, and ATM, Todd Lipcon, Silvius 
>Rus, Suresh, as well as many others for providing helpful feedback.
>
>Following the community convention, this vote will last for 7 days 
>(ending September 29th). Votes from Hadoop committers are binding but 
>non-binding votes are very welcome as well. And here's my non-binding +1.
>
>Thanks,
>---
>Zhe Zhang
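
The storage figures and the round-robin striping described in the proposal above can be checked with a few lines of arithmetic. The following is a minimal sketch, not code from the HDFS-7285 branch: the class and method names are hypothetical, and the 6 data + 3 parity layout used in the usage note is an assumed example of a configuration with 50% overhead.

```java
final class ErasureCodingSketch {
    /** Extra storage, as a percentage of the logical data size, under n-way replication. */
    static int replicationOverheadPercent(int replicas) {
        return (replicas - 1) * 100;
    }

    /** Extra storage, as a percentage of the logical data size, with d data and p parity units. */
    static int ecOverheadPercent(int dataUnits, int parityUnits) {
        return parityUnits * 100 / dataUnits;
    }

    /** 0-based index of the data block in the group that holds a cell under round-robin striping. */
    static int blockIndexForCell(long cellIndex, int dataUnits) {
        return (int) (cellIndex % dataUnits);
    }

    /** 0-based index of the stripe that a cell belongs to. */
    static long stripeForCell(long cellIndex, int dataUnits) {
        return cellIndex / dataUnits;
    }
}
```

With these definitions, replicationOverheadPercent(3) is 200 and ecOverheadPercent(6, 3) is 50, matching the numbers quoted in the proposal; cell 7 of a file striped over 6 data blocks lands on block index 1 in stripe 1.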

RE: Jira down :(

2015-06-05 Thread Zheng, Kai

It's OK now.

Regards,
Kai

-----Original Message-----
From: Vinayakumar B [mailto:vinayakum...@apache.org] 
Sent: Friday, June 05, 2015 3:12 PM
To: common-dev@hadoop.apache.org
Subject: Re: Jira down :(

Thanks Tsuyoshi.

Regards,
Vinay

On Fri, Jun 5, 2015 at 12:38 PM, Tsuyoshi Ozawa <oz...@apache.org> wrote:

> Hi Vinay,
>
> status.apache.org told us that JIRA is down.
>
> http://status.apache.org/
>
> I sent an email to the infra team a few minutes ago.
>
> Regards,
> - Tsuyoshi
>
> On Fri, Jun 5, 2015 at 3:08 PM, Vinayakumar B <vinayakum...@apache.org>
> wrote:
>> I am getting a 502 error from Jira.
>>
>> Does anybody know whom to contact to resolve it?
>>
>> Regards,
>> Vinay

RE: IMPORTANT: testing patches for branches

2015-04-29 Thread Zheng, Kai

Thanks Allen for the great work. I tried it in HADOOP-11847 (branch HDFS-7285) and 
it went well. Very helpful!

Regards,
Kai

-----Original Message-----
From: Allen Wittenauer [mailto:a...@altiscale.com] 
Sent: Thursday, April 23, 2015 7:22 PM
To: common-dev@hadoop.apache.org
Cc: hdfs-...@hadoop.apache.org; mapreduce-...@hadoop.apache.org; 
yarn-...@hadoop.apache.org
Subject: Re: IMPORTANT: testing patches for branches

On Apr 22, 2015, at 11:34 PM, Zheng, Kai <kai.zh...@intel.com> wrote:

> Hi Allen,
>
> This sounds great.
>
> Naming a patch foo-HDFS-7285.00.patch should get tested on the HDFS-7285
> branch.
> Does this happen locally on the developer's machine when running test-patch.sh,
> or does it also apply to the Hadoop Jenkins build when a JIRA becomes Patch
> Available? Thanks.

Both, now that a fix has been committed last night (there was a bug in 
the Jenkins handling).

Given a patch name or URL, Jenkins and even local runs will try a 
few different methods to figure out which branch to use.  Note that a 
branch name of 'gitX' where X is a valid git reference also works to force a 
patch to start at a particular commit. 

For local use, you'll want to use a 'spare' copy of the source tree via 
the -basedir option and use the -resetrepo flag.  That will enable Jenkins-like 
behavior and gives it permission to make modifications and effectively nuke any 
changes in the source tree you point it at.  (Basically the opposite of the 
-dirty-workspace flag).  If you want to force a branch (for whatever reason, 
including where the branch can't be figured out), you can use the -branch 
option. 

If you don't use -resetrepo, test-patch.sh will warn that it thinks the 
wrong branch is being used but will push on anyway.

In any case, the result of what it thinks the branch is/should be will 
be in the summary output at the bottom along with the git ref that it 
specifically used for the test.

RE: IMPORTANT: testing patches for branches

2015-04-22 Thread Zheng, Kai

Hi Allen,

This sounds great. 

> Naming a patch foo-HDFS-7285.00.patch should get tested on the HDFS-7285
> branch.
Does this happen locally on the developer's machine when running test-patch.sh, 
or does it also apply to the Hadoop Jenkins build when a JIRA becomes Patch 
Available? Thanks.

Regards,
Kai

-----Original Message-----
From: Allen Wittenauer [mailto:a...@altiscale.com] 
Sent: Thursday, April 23, 2015 3:35 AM
To: common-dev@hadoop.apache.org
Cc: yarn-...@hadoop.apache.org; mapreduce-...@hadoop.apache.org; 
hdfs-...@hadoop.apache.org
Subject: IMPORTANT: testing patches for branches


Hey gang, 

Just so everyone is aware, if you are working on a patch for either a 
feature branch or a major branch, if you name the patch with the branch name 
following the spec in HowToContribute (and a few other ways... test-patch tries 
to figure it out!), test-patch.sh *should* be switching the repo over to that 
branch for testing. 

For example,  naming a patch foo-branch-2.01.patch should get tested on 
branch-2.  Naming a patch foo-HDFS-7285.00.patch should get tested on the 
HDFS-7285 branch.
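
The naming convention in the two examples above can be sketched as a small parser. This is only an illustration: the regular expression and the PatchBranchSketch class are assumptions made for this example, not the logic that test-patch.sh actually uses (which, as noted, tries several methods).

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class PatchBranchSketch {
    // Assumed shape: <prefix>-<branch>.<NN>.patch, where <branch> is either
    // "branch-<version>" or a JIRA-style feature branch such as HDFS-7285.
    private static final Pattern NAME = Pattern.compile(
        "(.+?)-(branch-[\\w.]+|[A-Z]+-\\d+)\\.(\\d+)\\.patch");

    /** Returns the branch encoded in the file name, or empty if none is found. */
    static Optional<String> branchOf(String patchFileName) {
        Matcher m = NAME.matcher(patchFileName);
        return m.matches() ? Optional.of(m.group(2)) : Optional.empty();
    }
}
```

Under these assumptions, branchOf("foo-branch-2.01.patch") yields branch-2 and branchOf("foo-HDFS-7285.00.patch") yields HDFS-7285, while a name with no branch part yields an empty result and would fall back to whatever default the caller chooses.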

This hopefully means that there should really be no more 'blind' +1's 
to patches that go to branches.  The "we only test against trunk" argument is 
no longer valid. :)

RE: [RFE] Support MIT Kerberos localauth plugin API

2015-03-05 Thread Zheng, Kai

Hello Leo/Liou

> And the plugin interface can be as simple as this function (error handling
> ignored here) ...
I think it's good to have a pluggable interface that allows customizing how the 
mapping is performed. 
You could open a JIRA for this. If you'd like to work on it and need help, 
please feel free to ask (me or the community), or discuss it in the JIRA, as the 
community does.

With the pluggable interface, you could provide a native implementation 
that leverages the MIT localauth plugin via JNI, just as is done for the user 
groups mapping provider.

If you're looking for something in pure Java then, as Allen said, localauth 
plugin support isn't available in the JRE, since Java is not quick to catch up 
with the latest Kerberos features.
One possibility would be to leverage Apache Kerby; you can file an issue 
request there and let's see how it works out.
https://issues.apache.org/jira/browse/DIRKRB-102

Regards,
Kai

-----Original Message-----
From: Sunny Cheung [mailto:sunny.che...@centrify.com] 
Sent: Thursday, March 05, 2015 3:42 PM
To: common-dev@hadoop.apache.org
Cc: Leo Liou
Subject: RE: [RFE] Support MIT Kerberos localauth plugin API

Sorry I was not clear enough about the problem. Let me explain more here.

Our problem is that normal user principal names can be very different from 
their Unix login. Some customers simply have an arbitrary mapping between their 
Kerberos principals and Unix user accounts. For example, one customer has over 
200K users on AD with Kerberos principals in the format <first name>.<last 
name>@REALM (e.g. john@example.com). But their Unix names are in the format 
user<ID> or just <ID> (e.g. user123456, 123456).

So, when Kerberos security is enabled on Hadoop clusters, how should we 
configure to authenticate these users from Hadoop clients?

The current way is to use the hadoop.security.auth_to_local setting, e.g. from 
core-site.xml:

<property>
  <name>hadoop.security.auth_to_local</name>
  <value>
    RULE:[2:$1@$0]([jt]t@.*EXAMPLE.COM)s/.*/mapred/
    RULE:[2:$1@$0]([nd]n@.*EXAMPLE.COM)s/.*/hdfs/
    RULE:[2:$1@$0](hm@.*EXAMPLE.COM)s/.*/hbase/
    RULE:[2:$1@$0](rs@.*EXAMPLE.COM)s/.*/hbase/
    DEFAULT
  </value>
  <description>The mapping from kerberos principal names
  to local OS user names.</description>
</property>

These name translation rules can handle cases like mapping service accounts' 
principals (e.g. nn/host@REALM or dn/host@REALM to hdfs). But that does not 
scale to normal users: there are just too many of them to handle (compared 
with the finite number of service accounts).
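
To show what one of these rules does, here is a simplified Java sketch of a single rule of the form RULE:[2:$1@$0](filter)s/.*/name/. It is an assumption-laden illustration, not Hadoop's rule engine: the AuthToLocalRuleSketch class is hypothetical, it handles only two-component principals, and it supports only the whole-string substitution used in the rules above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class AuthToLocalRuleSketch {
    // Two-component principal: <service>/<host>@<REALM>
    private static final Pattern PRINCIPAL =
        Pattern.compile("([^/@]+)/([^/@]+)@([^/@]+)");

    /** Builds the "$1@$0" form: first component, then "@", then the realm. */
    static String shortForm(String principal) {
        Matcher m = PRINCIPAL.matcher(principal);
        if (!m.matches()) {
            return null; // not a two-component principal
        }
        return m.group(1) + "@" + m.group(3);
    }

    /**
     * Applies one rule: if the short form matches the filter regex, the whole
     * string is replaced by the local name (the s/.*/name/ part); otherwise
     * null is returned so that a caller could try the next rule.
     */
    static String applyRule(String principal, String filterRegex, String localName) {
        String candidate = shortForm(principal);
        if (candidate == null || !candidate.matches(filterRegex)) {
            return null;
        }
        return candidate.replaceFirst(".*", Matcher.quoteReplacement(localName));
    }
}
```

For example, applyRule("nn/host1.example.com@EXAMPLE.COM", "[nd]n@.*EXAMPLE.COM", "hdfs") maps the NameNode principal to hdfs, while an hm/... principal does not match that filter and falls through. Every such mapping needs its own rule, which is why the approach does not stretch to hundreds of thousands of individual users.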

Therefore, we would like to ask whether an alternative name resolution plugin 
interface could be supported by Hadoop. It could be similar to the way an 
alternative authentication plugin is supported for HTTP web-consoles [1]:

<property>
  <name>hadoop.http.authentication.type</name>
  <value>org.my.subclass.of.AltKerberosAuthenticationHandler</value>
</property>

And the plugin interface can be as simple as this function (error handling 
ignored here):

String auth_to_local (String krb5Principal) {
...
return unixName;
}

If this plugin interface is supported by Hadoop, then everyone can provide a 
plugin to support arbitrary mapping. This will be extremely useful when 
administrators need to tighten security on Hadoop with existing Kerberos 
infrastructure.
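
A slightly fuller sketch of what such a plugin contract might look like in Java follows. Everything in it is hypothetical: PrincipalToLocalNameMapper, TableBackedMapper and MapperChain are names invented for this illustration and do not exist in Hadoop, and the in-memory table stands in for whatever directory lookup a real plugin would perform.

```java
import java.util.Map;
import java.util.Optional;

/** Hypothetical plugin contract; not an existing Hadoop interface. */
interface PrincipalToLocalNameMapper {
    /** Returns the local user name, or empty if this plugin has no mapping. */
    Optional<String> authToLocal(String krb5Principal);
}

/** Example plugin backed by an in-memory table. */
final class TableBackedMapper implements PrincipalToLocalNameMapper {
    private final Map<String, String> table;

    TableBackedMapper(Map<String, String> table) {
        this.table = table;
    }

    @Override
    public Optional<String> authToLocal(String krb5Principal) {
        return Optional.ofNullable(table.get(krb5Principal));
    }
}

final class MapperChain {
    /** Tries the plugin first and falls back to the supplied default name. */
    static String resolve(PrincipalToLocalNameMapper plugin,
                          String principal, String fallback) {
        return plugin.authToLocal(principal).orElse(fallback);
    }
}
```

Returning an empty result instead of throwing lets the caller fall back to the existing rule-based mapping when the plugin has no answer, which is one possible way to keep current configurations working.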

References:
[1] Authentication for Hadoop HTTP web-consoles 
http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/HttpAuthentication.html


-----Original Message-----
From: Allen Wittenauer [mailto:a...@altiscale.com]
Sent: Tuesday, February 24, 2015 12:47 AM
To: common-dev@hadoop.apache.org
Subject: Re: [RFE] Support MIT Kerberos localauth plugin API


The big question is whether or not Java's implementation of Kerberos 
supports it. If so, which JDK release.  Java's implementation tends to run a 
bit behind MIT.  Additionally, there is a general reluctance to move Hadoop's 
baseline Java version to something even supported until user outcry demands it. 
 So I'd expect support to be a long way off.

It's worth noting that trunk exposes the hadoop kerbname command to 
help out with auth_to_local mapping, BTW.

On Feb 23, 2015, at 2:12 AM, Sunny Cheung <sunny.che...@centrify.com> wrote:

> Hi Hadoop Common developers,
>
> I am writing to seek your opinion about a feature request: support MIT
> Kerberos localauth plugin API [1].
>
> Hadoop currently provides the hadoop.security.auth_to_local setting to map
> Kerberos principal to OS user account [2][3]. However, the regex-based
> mappings (which mimics krb5.conf auth_to_local) could be difficult to use in
> complex scenarios. Therefore, MIT Kerberos 1.12 added a plugin interface to
> control krb5_aname_to_localname and krb5_kuserok behavior. And system daemon
> SSSD (RHEL/Fedora) has already implemented a plugin to leverage this feature
> [4].
>
> Is that possible for Hadoop to

RE: 2.7 status

2015-03-04 Thread Zheng, Kai

Thanks Vinod for the hints. 

I have updated both patches to align with the latest code, and added more unit 
tests. The build results look reasonable. Thanks to anyone who can give them 
more review; I will update them in a timely manner. 

Regards,
Kai

-----Original Message-----
From: Vinod Kumar Vavilapalli [mailto:vino...@hortonworks.com] 
Sent: Tuesday, March 03, 2015 11:31 AM
To: Zheng, Kai
Cc: mapreduce-...@hadoop.apache.org; hdfs-...@hadoop.apache.org; Hadoop Common; 
yarn-...@hadoop.apache.org
Subject: Re: 2.7 status

Kai, please ping the reviewers that were already looking at your patches 
before. If the patches go in by end of this week, we can include them.

Thanks,
+Vinod

On Mar 2, 2015, at 7:04 PM, Zheng, Kai <kai.zh...@intel.com> wrote:

 Would it be of interest to get the following issues into the release? Thanks!
 
 HADOOP-10670
 HADOOP-10671
 
 Regards,
 Kai
 
 -----Original Message-----
 From: Yongjun Zhang [mailto:yzh...@cloudera.com]
 Sent: Monday, March 02, 2015 4:46 AM
 To: hdfs-...@hadoop.apache.org
 Cc: Vinod Kumar Vavilapalli; Hadoop Common; 
 mapreduce-...@hadoop.apache.org; yarn-...@hadoop.apache.org
 Subject: Re: 2.7 status
 
 Hi,
 
 Thanks for working on 2.7 release.
 
 Currently the fallback from KerberosAuthenticator to PseudoAuthenticator is 
 enabled by default in a hardcoded way. HADOOP-10895 changes the default and 
 requires applications (such as oozie) to set a config property or call an API 
 to enable the fallback.
 
 This jira has been reviewed, and almost ready to get in. However, there is 
 a concern that we have to change the relevant applications. Please see my 
 comment here:
 
 https://issues.apache.org/jira/browse/HADOOP-10895?focusedCommentId=14321823&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14321823
 
 Any of your comments will be highly appreciated. This jira was postponed from 
 2.6. I think it should be no problem to skip 2.7. But your comments would 
 help us to decide what to do with this jira for future releases.
 
 Thanks.
 
 --Yongjun
 
 
 On Sun, Mar 1, 2015 at 11:58 AM, Arun Murthy a...@hortonworks.com wrote:
 
 Sounds good, thanks for the help Vinod!
 
 Arun
 
 
 From: Vinod Kumar Vavilapalli
 Sent: Sunday, March 01, 2015 11:43 AM
 To: Hadoop Common; Jason Lowe; Arun Murthy
 Subject: Re: 2.7 status
 
 Agreed. How about we roll an RC end of this week? As a Java 7+ 
 release with features, patches that already got in?
 
 Here's a filter tracking blocker tickets - 
 https://issues.apache.org/jira/issues/?filter=12330598. Nine open now.
 
 +Arun
 Arun, I'd like to help get 2.7 out without further delay. Do you mind 
 me taking over release duties?
 
 Thanks,
 +Vinod
 
 From: Jason Lowe jl...@yahoo-inc.com.INVALID
 Sent: Friday, February 13, 2015 8:11 AM
 To: common-dev@hadoop.apache.org
 Subject: Re: 2.7 status
 
 I'd like to see a 2.7 release sooner than later.  It has been almost 
 3 months since Hadoop 2.6 was released, and there have already been 
 634 JIRAs committed to 2.7.  That's a lot of changes waiting for an official 
 release.
 
 https://issues.apache.org/jira/issues/?jql=project%20in%20%28hadoop%2Chdfs%2Cyarn%2Cmapreduce%29%20AND%20fixversion%3D2.7.0%20AND%20resolution%3DFixed
 Jason
 
  From: Sangjin Lee sj...@apache.org
 To: common-dev@hadoop.apache.org common-dev@hadoop.apache.org
 Sent: Tuesday, February 10, 2015 1:30 PM
 Subject: 2.7 status
 
 Folks,
 
 What is the current status of the 2.7 release? I know initially it 
 started out as a java-7 only release, but looking at the JIRAs that 
 is very much not the case.
 
 Do we have a certain timeframe for 2.7 or is it time to discuss it?
 
 Thanks,
 Sangjin

RE: Looking to a Hadoop 3 release

2015-03-04 Thread Zheng, Kai

May I add some comments on this? Just sharing my thoughts. Thanks.

> If we start now, it might make it out by 2016. If we start now,
> downstreamers can start aligning themselves to land versions that suit at
> about the same time.
This helps not only downstreamers align with the long-term release, but maybe 
also contributors like me align our future efforts.

In addition to JDK8 support and classpath isolation, could we consider more 
candidates? 
How about this one, HADOOP-9797 "Pluggable and compatible UGI change"?
https://issues.apache.org/jira/browse/HADOOP-9797

The benefits: 
1) allow multiple login sessions/contexts and authentication methods to be used 
in the same Java application/process without conflicts, providing good 
isolation by getting rid of globals and statics.
2) allow new authentication methods to be plugged into UGI in a modular, 
manageable and maintainable manner.

In addition, we would also push the first release of Apache Kerby, preparing a 
strong, dedicated and clean Kerberos library in Java for both the client and KDC 
sides, and leverage that library to 
update Hadoop-MiniKDC and perform more security tests.
https://issues.apache.org/jira/browse/DIRKRB-102

Hope this makes sense. Thanks.

Regards,
Kai

-----Original Message-----
From: saint@gmail.com [mailto:saint@gmail.com] On Behalf Of Stack
Sent: Thursday, March 05, 2015 2:47 AM
To: common-dev@hadoop.apache.org
Cc: mapreduce-...@hadoop.apache.org; hdfs-...@hadoop.apache.org; 
yarn-...@hadoop.apache.org
Subject: Re: Looking to a Hadoop 3 release

In general +1 on 3.0.0. It's time. If we start now, it might make it out by 
2016. If we start now, downstreamers can start aligning themselves to land 
versions that suit at about the same time.

While two big items have been called out as possible incompatible changes, and 
there is ongoing discussion as to whether they are or not*, is there any chance 
of getting a longer list of big differences between the branches? In particular 
I'd be interested in improvements that are 'off' by default that would be 
better defaulted 'on'.

Thanks,
St.Ack

* Let me note that 'compatible' around these parts is a trampled concept, 
seemingly open to interpretation, with a definition other than the one that 
prevails elsewhere in software. See Allen's list above, and in our downstream 
project, the recent HBASE-13149 "HBase server MR tools are broken on Hadoop 
2.5+ Yarn", among others.  Let 3.x be incompatible with 2.x if only so we can 
leave behind all current notions of 'compatibility' 
and just start over (as per Allen).


On Mon, Mar 2, 2015 at 3:19 PM, Andrew Wang andrew.w...@cloudera.com
wrote:

 Hi devs,

 It's been a year and a half since 2.x went GA, and I think we're about 
 due for a 3.x release.
 Notably, there are two incompatible changes I'd like to call out, that 
 will have a tremendous positive impact for our users.

 First, classpath isolation being done at HADOOP-11656, which has been 
 a long-standing request from many downstreams and Hadoop users.

 Second, bumping the source and target JDK version to JDK8 (related to 
 HADOOP-11090), which is important since JDK7 is EOL in April 2015 (two 
 months from now). In the past, we've had issues with our dependencies 
 discontinuing support for old JDKs, so this will future-proof us.

 Between the two, we'll also have quite an opportunity to clean up and 
 upgrade our dependencies, another common user and developer request.

 I'd like to propose that we start rolling a series of monthly-ish 
 3.0 alpha releases ASAP, with myself volunteering to take on the RM 
 and other cat herding responsibilities. There are already quite a few 
 changes slated for 3.0 besides the above (for instance the shell 
 script rewrite) so there's already value in a 3.0 alpha, and the more 
 time we give downstreams to integrate, the better.

 This opens up discussion about inclusion of other changes, but I'm 
 hoping to freeze incompatible changes after maybe two alphas, do a 
 beta (with no further incompat changes allowed), and then finally a 
 3.x GA. For those keeping track, that means a 3.x GA in about four months.

 I would also like to stress though that this is not intended to be a 
 big bang release. For instance, it would be great if we could maintain 
 wire compatibility between 2.x and 3.x, so rolling upgrades work. 
 Keeping
 branch-2 and branch-3 similar also makes backports easier, since we're 
 likely maintaining 2.x for a while yet.

 Please let me know any comments / concerns related to the above. If 
 people are friendly to the idea, I'd like to cut a branch-3 and start 
 working on the first alpha.

 Best,
 Andrew

RE: Looking to a Hadoop 3 release

2015-03-02 Thread Zheng, Kai

JDK8 support is under consideration; it looks like many issues have already 
been reported and resolved.

https://issues.apache.org/jira/browse/HADOOP-11090

-----Original Message-----
From: Andrew Wang [mailto:andrew.w...@cloudera.com] 
Sent: Tuesday, March 03, 2015 7:20 AM
To: common-dev@hadoop.apache.org; mapreduce-...@hadoop.apache.org; 
hdfs-...@hadoop.apache.org; yarn-...@hadoop.apache.org
Subject: Looking to a Hadoop 3 release

Hi devs,

It's been a year and a half since 2.x went GA, and I think we're about due for 
a 3.x release.
Notably, there are two incompatible changes I'd like to call out, that will 
have a tremendous positive impact for our users.

First, classpath isolation being done at HADOOP-11656, which has been a 
long-standing request from many downstreams and Hadoop users.

Second, bumping the source and target JDK version to JDK8 (related to 
HADOOP-11090), which is important since JDK7 is EOL in April 2015 (two months 
from now). In the past, we've had issues with our dependencies discontinuing 
support for old JDKs, so this will future-proof us.

Between the two, we'll also have quite an opportunity to clean up and upgrade 
our dependencies, another common user and developer request.

I'd like to propose that we start rolling a series of monthly-ish
3.0 alpha releases ASAP, with myself volunteering to take on the RM and other 
cat herding responsibilities. There are already quite a few changes slated for 
3.0 besides the above (for instance the shell script rewrite) so there's 
already value in a 3.0 alpha, and the more time we give downstreams to 
integrate, the better.

This opens up discussion about inclusion of other changes, but I'm hoping to 
freeze incompatible changes after maybe two alphas, do a beta (with no further 
incompat changes allowed), and then finally a 3.x GA. For those keeping track, 
that means a 3.x GA in about four months.

I would also like to stress though that this is not intended to be a big bang 
release. For instance, it would be great if we could maintain wire 
compatibility between 2.x and 3.x, so rolling upgrades work. Keeping
branch-2 and branch-3 similar also makes backports easier, since we're likely 
maintaining 2.x for a while yet.

Please let me know any comments / concerns related to the above. If people are 
friendly to the idea, I'd like to cut a branch-3 and start working on the first 
alpha.

Best,
Andrew

RE: 2.7 status

2015-03-02 Thread Zheng, Kai

Would it be of interest to get the following issues into the release? Thanks!

HADOOP-10670
HADOOP-10671

Regards,
Kai

-----Original Message-----
From: Yongjun Zhang [mailto:yzh...@cloudera.com] 
Sent: Monday, March 02, 2015 4:46 AM
To: hdfs-...@hadoop.apache.org
Cc: Vinod Kumar Vavilapalli; Hadoop Common; mapreduce-...@hadoop.apache.org; 
yarn-...@hadoop.apache.org
Subject: Re: 2.7 status

Hi,

Thanks for working on 2.7 release.

Currently the fallback from KerberosAuthenticator to PseudoAuthenticator is 
enabled by default in a hardcoded way. HADOOP-10895 changes the default and 
requires applications (such as oozie) to set a config property or call an API 
to enable the fallback.

This jira has been reviewed, and almost ready to get in. However, there is a 
concern that we have to change the relevant applications. Please see my comment 
here:

https://issues.apache.org/jira/browse/HADOOP-10895?focusedCommentId=14321823&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14321823

Any of your comments will be highly appreciated. This jira was postponed from 
2.6. I think it should be no problem to skip 2.7. But your comments would help 
us to decide what to do with this jira for future releases.

Thanks.

--Yongjun


On Sun, Mar 1, 2015 at 11:58 AM, Arun Murthy a...@hortonworks.com wrote:

 Sounds good, thanks for the help Vinod!

 Arun

 
 From: Vinod Kumar Vavilapalli
 Sent: Sunday, March 01, 2015 11:43 AM
 To: Hadoop Common; Jason Lowe; Arun Murthy
 Subject: Re: 2.7 status

 Agreed. How about we roll an RC end of this week? As a Java 7+ release 
 with features, patches that already got in?

 Here's a filter tracking blocker tickets - 
 https://issues.apache.org/jira/issues/?filter=12330598. Nine open now.

 +Arun
 Arun, I'd like to help get 2.7 out without further delay. Do you mind 
 me taking over release duties?

 Thanks,
 +Vinod
 
 From: Jason Lowe jl...@yahoo-inc.com.INVALID
 Sent: Friday, February 13, 2015 8:11 AM
 To: common-dev@hadoop.apache.org
 Subject: Re: 2.7 status

 I'd like to see a 2.7 release sooner than later.  It has been almost 3 
 months since Hadoop 2.6 was released, and there have already been 634 
 JIRAs committed to 2.7.  That's a lot of changes waiting for an official 
 release.

 https://issues.apache.org/jira/issues/?jql=project%20in%20%28hadoop%2Chdfs%2Cyarn%2Cmapreduce%29%20AND%20fixversion%3D2.7.0%20AND%20resolution%3DFixed
 Jason

   From: Sangjin Lee sj...@apache.org
  To: common-dev@hadoop.apache.org common-dev@hadoop.apache.org
  Sent: Tuesday, February 10, 2015 1:30 PM
  Subject: 2.7 status

 Folks,

 What is the current status of the 2.7 release? I know initially it 
 started out as a java-7 only release, but looking at the JIRAs that 
 is very much not the case.

 Do we have a certain timeframe for 2.7 or is it time to discuss it?

 Thanks,
 Sangjin

RE: Looking to a Hadoop 3 release

2015-03-02 Thread Zheng, Kai

Sorry, my bad. I thought I was sending it to my colleagues. 

By the way, for the JDK8 support, we (Intel) would like to investigate further 
and help, thanks.

Regards,
Kai

-----Original Message-----
From: Zheng, Kai 
Sent: Tuesday, March 03, 2015 8:49 AM
To: common-dev@hadoop.apache.org; mapreduce-...@hadoop.apache.org; 
hdfs-...@hadoop.apache.org; yarn-...@hadoop.apache.org
Subject: RE: Looking to a Hadoop 3 release

JDK8 support is under consideration; it looks like many issues have already 
been reported and resolved.

https://issues.apache.org/jira/browse/HADOOP-11090


-----Original Message-----
From: Andrew Wang [mailto:andrew.w...@cloudera.com] 
Sent: Tuesday, March 03, 2015 7:20 AM
To: common-dev@hadoop.apache.org; mapreduce-...@hadoop.apache.org; 
hdfs-...@hadoop.apache.org; yarn-...@hadoop.apache.org
Subject: Looking to a Hadoop 3 release

Hi devs,

It's been a year and a half since 2.x went GA, and I think we're about due for 
a 3.x release.
Notably, there are two incompatible changes I'd like to call out, that will 
have a tremendous positive impact for our users.

First, classpath isolation being done at HADOOP-11656, which has been a 
long-standing request from many downstreams and Hadoop users.

Second, bumping the source and target JDK version to JDK8 (related to 
HADOOP-11090), which is important since JDK7 is EOL in April 2015 (two months 
from now). In the past, we've had issues with our dependencies discontinuing 
support for old JDKs, so this will future-proof us.

Between the two, we'll also have quite an opportunity to clean up and upgrade 
our dependencies, another common user and developer request.

I'd like to propose that we start rolling a series of monthly-ish
3.0 alpha releases ASAP, with myself volunteering to take on the RM and other 
cat herding responsibilities. There are already quite a few changes slated for 
3.0 besides the above (for instance the shell script rewrite) so there's 
already value in a 3.0 alpha, and the more time we give downstreams to 
integrate, the better.

This opens up discussion about inclusion of other changes, but I'm hoping to 
freeze incompatible changes after maybe two alphas, do a beta (with no further 
incompat changes allowed), and then finally a 3.x GA. For those keeping track, 
that means a 3.x GA in about four months.

I would also like to stress though that this is not intended to be a big bang 
release. For instance, it would be great if we could maintain wire 
compatibility between 2.x and 3.x, so rolling upgrades work. Keeping
branch-2 and branch-3 similar also makes backports easier, since we're likely 
maintaining 2.x for a while yet.

Please let me know any comments / concerns related to the above. If people are 
friendly to the idea, I'd like to cut a branch-3 and start working on the first 
alpha.

Best,
Andrew

RE: Anyone know how to mock a secured hdfs for unit test?

2014-06-30 Thread Zheng, Kai

Hi Chris,

Thanks for the great info. I will paste it in the JIRA for future reference 
in case I or somebody else gets the chance to work on it. 

Regards,
Kai

-----Original Message-----
From: Chris Nauroth [mailto:cnaur...@hortonworks.com] 
Sent: Saturday, June 28, 2014 4:27 AM
To: secur...@hadoop.apache.org
Cc: yarn-...@hadoop.apache.org; hdfs-...@hadoop.apache.org; 
hdfs-iss...@hadoop.apache.org; yarn-iss...@hadoop.apache.org; 
mapreduce-...@hadoop.apache.org
Subject: Re: Anyone know how to mock a secured hdfs for unit test?

Hi David and Kai,

There are a couple of challenges with this, but I just figured out a pretty 
decent setup while working on HDFS-2856.  That code isn't committed yet, but if 
you open patch version 5 attached to that issue and look for the 
TestSaslDataTransfer class, then you'll see how it works.  Most of the logic 
for bootstrapping a MiniKDC and setting up the right HDFS configuration 
properties is in an abstract base class named SaslDataTransferTestCase.
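
As a rough idea of the "right HDFS configuration properties" mentioned above, the following sketch collects a typical set of security-related keys into a plain map. It is an assumption-based illustration only: the exact properties used by SaslDataTransferTestCase are not shown in this thread, the principal hdfs/localhost and the keytab path are hypothetical, and the code deliberately uses only the Java standard library instead of starting a MiniKDC or a MiniDFSCluster.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class SecureHdfsTestConfSketch {
    /**
     * Builds key/value pairs that a secured HDFS test would typically set.
     * The keys are standard Hadoop property names; the values are
     * placeholders chosen for this example.
     */
    static Map<String, String> secureConf(String realm, String keytabPath) {
        Map<String, String> conf = new LinkedHashMap<>();
        String hdfsPrincipal = "hdfs/localhost@" + realm; // hypothetical principal

        conf.put("hadoop.security.authentication", "kerberos");
        conf.put("dfs.namenode.kerberos.principal", hdfsPrincipal);
        conf.put("dfs.namenode.keytab.file", keytabPath);
        conf.put("dfs.datanode.kerberos.principal", hdfsPrincipal);
        conf.put("dfs.datanode.keytab.file", keytabPath);
        conf.put("dfs.block.access.token.enable", "true");
        // SASL on the data transfer protocol, the subject of HDFS-2856.
        conf.put("dfs.data.transfer.protection", "authentication");
        return conf;
    }
}
```

In a real test these pairs would be copied into the Hadoop configuration object after the KDC has been started and the keytab has been generated for the same principal.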

I hope this helps.

There are a few other open issues out there related to tests in secure mode.  I 
know of HDFS-4312 and HDFS-5410.  It would be great to get more regular test 
coverage with something that more closely approximates a secured deployment.

Chris Nauroth
Hortonworks
http://hortonworks.com/



On Thu, Jun 26, 2014 at 7:27 AM, Zheng, Kai <kai.zh...@intel.com> wrote:

> Hi David,
>
> Quite some time ago I opened HADOOP-9952 and planned to create secured
> MiniClusters by making use of MiniKDC. Unfortunately since then I
> didn't get the chance to work on it yet. If you need something like
> that and would contribute, please let me know and see if anything I can help
> with. Thanks.
>
> Regards,
> Kai
>
> -----Original Message-----
> From: Liu, David [mailto:liujion...@gmail.com]
> Sent: Thursday, June 26, 2014 10:12 PM
> To: hdfs-...@hadoop.apache.org; hdfs-iss...@hadoop.apache.org;
> yarn-...@hadoop.apache.org; yarn-iss...@hadoop.apache.org;
> mapreduce-...@hadoop.apache.org; secur...@hadoop.apache.org
> Subject: Anyone know how to mock a secured hdfs for unit test?
>
> Hi all,
>
> I need to test my code which read data from secured hdfs, is there any
> library to mock secured hdfs, can minihdfscluster do the work?
> Any suggestion is appreciated.
>
>
> Thanks



RE: [DISCUSS] Security Efforts and Branching

2013-09-26 Thread Zheng, Kai

 think it works for us 
to have HAS as a community effort as the TokenAuth framework and we both 
contribute on the implementation?

To proceed, I would try to align between us, complementing your proposal and 
addressing your concerns as follows.

= Iteration Endstate =
Besides what you mentioned from the user's view, how about adding this consideration:
Additionally, the initial iteration would also lay the groundwork for the TokenAuth 
framework, with well-defined APIs, protocols, flows and core facilities for 
implementations. The framework should avoid rework and big changes for future 
implementations.

= Terminology and Naming =
It would be great if we can unify the related terminology in this effort, at 
least at the framework level. This could probably be achieved in the process 
of defining the relevant APIs for the TokenAuth framework.

= Project scope =
It's great we have the common list in scope for the first iteration as you 
mentioned as follows:
Usecases:
client types: REST, CLI, UI
authentication types: Simple, Kerberos, authentication/LDAP, federation/SAML

We might also consider OAuth 2.0 support. Anyway, please note that by defining 
this in-scope list we know what's required as a must-have in the iteration, as 
enforcement of our consensus; however, it should not prevent any relevant 
parties from contributing more in the meantime, unless it is not appropriate 
at the time.

= Branch =
As you mentioned, we may have different branches for different features with 
merging in mind. Another approach is to have just one branch with the relevant 
security features; the review and merge work can still be JIRA based.

1. Based on your proposal, how about the following as the branch(es) scope:
1)  Pluggable Authentication and Token based SSO
2)  CryptoFS for volume level encryption (HCFS)
3) Pluggable UGI change
4) Key management system
5) Unified authorization

2. With the above scope in mind, a candidate branch name could be like 
'security-branch' instead of 'tokenauth-branch'. How about creating the branch 
now if we don't have other concerns?

3. Check-in philosophy. Agree with your proposal, with slight concerns:
In terms of check-in philosophy, we should take a review then check-in approach 
to the branch with lazy consensus - wherein we do not need to explicitly +1 
every check-in to the branch but we will honor any -1's with discussion to 
resolve before checking in. This will provide us each with the opportunity to 
track the work being done and ensure that we understand it and find that it 
meets the intended goals.

We might need explicit +1s; otherwise we would need to define a time window to 
wait before checking in.
One issue we would like to clarify: does voting also include the security 
branch committers?
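
The check-in rule discussed above can be sketched as a small decision 
function. This is only an illustration: the 72-hour window, the function name 
and its parameters are assumptions, since the thread does not fix a window 
length or a voting mechanism.

```python
from datetime import datetime, timedelta


def can_check_in(votes, submitted_at, now,
                 window=timedelta(hours=72), require_plus_one=False):
    """Decide whether a reviewed patch may be committed to the branch.

    votes: iterable of +1 / 0 / -1 integers cast on the patch.
    window: hypothetical waiting period for lazy consensus.
    """
    votes = list(votes)
    # Any -1 blocks the check-in until it is resolved by discussion.
    if any(v < 0 for v in votes):
        return False
    # Explicit-approval mode: at least one +1 is needed, no waiting period.
    if require_plus_one:
        return any(v > 0 for v in votes)
    # Lazy consensus: silence counts as assent once the window has passed.
    return now - submitted_at >= window
```

With lazy consensus the only open parameter is the length of the window; with 
explicit +1s no window is needed, which is the trade-off raised above.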

= JIRA =
We might not need an additional umbrella JIRA for now since we already have 
HADOOP-9392 and HADOOP-9533. By the way, I would suggest we use the existing 
feature JIRAs to discuss relevant and specific issues as we go. By leveraging 
these JIRAs we can avoid too much detail in the common-dev thread, and it's 
also easy to track the relevant discussions.

I agree it's a good point to start with an inventory of the existing JIRAs. We 
can do that if there are no other concerns. We would then provide the full 
list of breakdown JIRAs and attach it to HADOOP-9392 for further collaboration.

Regards,
Kai

From: larry mccay [mailto:larry.mc...@gmail.com]
Sent: Wednesday, September 18, 2013 6:27 AM
To: Zheng, Kai; Chen, Haifeng; common-dev@hadoop.apache.org
Subject: Re: [DISCUSS] Security Efforts and Branching

All -

I apologize for not following up sooner. I have been heads down on some other 
matters that required my attention.

It seems that it may be easier to move forward by gaining consensus a little 
bit at a time rather than trying to hit the ground running where the other 
thread left off.

Would it be agreeable to everyone to start with an inventory of the existing 
Jiras that have patches available or nearly available so that we can determine 
what concrete bits we have to start with?

Once we get that done, we can try and frame a set of goals to make up the 
initial iteration and determine what from the inventory will be leveraged in 
that iteration.

Does this sound reasonable to everyone?
Would anyone like to propose another starting point?

thanks,

--larry

On Wed, Sep 4, 2013 at 4:26 PM, larry mccay <larry.mc...@gmail.com> wrote:
It doesn't look like the PDF made it all the way through to the archives and 
maybe even to recipients - so the following is the text version of the 
iteration-1 draft:

Iteration 1: Pluggable User Authentication and Federation

Introduction
The intent of this effort is to bootstrap the development of pluggable 
token-based authentication mechanisms to support certain goals of enterprise 
authentication integrations. By restricting the scope of this effort, we hope 
to provide immediate benefit to the community while keeping the initial 
contribution

RE: [DISCUSS] Hadoop SSO/Token Server Components

2013-09-05 Thread Zheng, Kai

Got it Suresh. 

So I guess HADOOP-9797 (and its family) for the UGI change would fit this 
rule, right? The refactoring improves and cleans up UGI, and also prepares for 
the TokenAuth feature. According to this rule the changes would go into trunk 
first. Thanks for your guidance.

Regards,
Kai

-----Original Message-----
From: Suresh Srinivas [mailto:sur...@hortonworks.com] 
Sent: Thursday, September 05, 2013 2:42 PM
To: common-dev@hadoop.apache.org
Subject: Re: [DISCUSS] Hadoop SSO/Token Server Components

 One aside: if you come across a bug, please try to fix it upstream and 
 then merge into the feature branch rather than cherry-picking patches 
 or only fixing it on the branch. It becomes very awkward to track. -C


Related to this, when refactoring the code, which is generally required for 
large feature development, consider first refactoring in trunk and then making 
the additional changes for the feature in the feature branch. This helps a lot 
in being able to merge trunk into the feature branch periodically. It also 
helps keep the change for merging the feature into trunk small and easier to 
review.

Regards,
Suresh

RE: Need help for building haddop source on windows 7

2013-09-05 Thread Zheng, Kai

Perhaps you could try the 'branch-trunk-win' branch and look for build 
instructions in it. I'm not sure whether this works, though.

-----Original Message-----
From: Ranjan Dutta [mailto:rdbmsdata.ran...@gmail.com] 
Sent: Thursday, September 05, 2013 3:43 PM
To: common-dev@hadoop.apache.org
Subject: Need help for building haddop source on windows 7

Hi,

I want to build the Hadoop source on Windows 7. Can anybody share a document 
related to the source build?

Thanks
Ranjan

RE: [DISCUSS] Hadoop SSO/Token Server Components

2013-09-03 Thread Zheng, Kai

 
   approach or an HSSO vs TAS discussion.

   Your latest design revision actually makes it clear that you are now
   targeting exactly what was described as HSSO - so comparing and
   contrasting is not going to add any value.

   What we need you to do at this point, is to look at those high-level
   components described on this thread and comment on whether we need
   additional components or any that are listed that don't seem necessary
   to you and why.
   In other words, we need to define and agree on the work that has to be
   done.

   We also need to determine those components that need to be done before
   anything else can be started.
   I happen to agree with Brian that #4 Hadoop SSO Tokens are central to
   all the other components and should probably be defined and POC'd in
   short order.

   Personally, I think that continuing the separation of 9533 and 9392
   will do this effort a disservice. There doesn't seem to be enough
   differences between the two to justify separate jiras anymore. It may
   be best to file a new one that reflects a single vision without the
   extra cruft that has built up in either of the existing ones. We would
   certainly reference the existing ones within the new one. This approach
   would align with the spirit of the discussions up to this point.

   I am prepared to start a discussion around the shape of the two Hadoop
   SSO tokens: identity and access. If this is what others feel the next
   topic should be.
   If we can identify a jira home for it, we can do it there - otherwise
   we can create another DISCUSS thread for it.

   thanks,

   --larry


   On Jul 3, 2013, at 2:39 PM, Zheng, Kai kai.zh...@intel.com wrote:

   Hi Larry,

   Thanks for the update. Good to see that with this update we are now
   aligned on most points.

   I have also updated our TokenAuth design in HADOOP-9392. The new
   revision incorporates feedback and suggestions in related discussion
   with the community, particularly from Microsoft and others attending
   the Security design lounge session at the Hadoop summit. Summary of the
   changes:
   1. Revised the approach to now use two tokens, Identity Token plus
   Access Token, particularly considering our authorization framework and
   compatibility with HSSO;
   2. Introduced Authorization Server (AS) from our authorization
   framework into the flow that issues access tokens for clients with
   identity tokens to access services;
   3. Refined proxy access token and the proxy/impersonation flow;
   4. Refined the browser web SSO flow regarding access to Hadoop web
   services;
   5. Added Hadoop RPC access flow regard
  
  
  
   --
   Best regards,
  
   - Andy
  
   Problems worthy of attack prove their worth by hitting 
   back. -
   Piet
   Hein (via Tom White)
  
  
  
  
  
   --
   Alejandro
  
  
  
  
  
  
   Iteration1PluggableUserAuthenticationandFederation.pdf
  
  
  
  

RE: [DISCUSS] Hadoop SSO/Token Server Components

2013-07-05 Thread Zheng, Kai

   for sure, there were some key differences that didn't just disappear.
   Subsequent discussion will make that clear. I also disagree with your
   characterization that we have simply endorsed all of the design
   decisions of the so-called HSSO, this is taking a mile from an inch. We
   are here to engage in a collaborative process as peers. I've been
   encouraged by the spirit of the discussions up to this point and hope
   that can continue beyond one design summit.


   On Wed, Jul 3, 2013 at 1:10 PM, Larry McCay lmc...@hortonworks.com
   wrote:

   Hi Kai -

   I think that I need to clarify something...

   This is not an update for 9533 but a continuation of the discussions
   that are focused on a fresh look at a SSO for Hadoop.
   We've agreed to leave our previous designs behind and therefore we
   aren't really seeing it as an HSSO layered on top of TAS approach or an
   HSSO vs TAS discussion.

   Your latest design revision actually makes it clear that you are now
   targeting exactly what was described as HSSO - so comparing and
   contrasting is not going to add any value.

   What we need you to do at this point, is to look at those high-level
   components described on this thread and comment on whether we need
   additional components or any that are listed that don't seem necessary
   to you and why.
   In other words, we need to define and agree on the work that has to be
   done.

   We also need to determine those components that need to be done before
   anything else can be started.
   I happen to agree with Brian that #4 Hadoop SSO Tokens are central to
   all the other components and should probably be defined and POC'd in
   short order.

   Personally, I think that continuing the separation of 9533 and 9392
   will do this effort a disservice. There doesn't seem to be enough
   differences between the two to justify separate jiras anymore. It may
   be best to file a new one that reflects a single vision without the
   extra cruft that has built up in either of the existing ones. We would
   certainly reference the existing ones within the new one. This approach
   would align with the spirit of the discussions up to this point.

   I am prepared to start a discussion around the shape of the two Hadoop
   SSO tokens: identity and access. If this is what others feel the next
   topic should be.
   If we can identify a jira home for it, we can do it there - otherwise
   we can create another DISCUSS thread for it.

   thanks,

   --larry


   On Jul 3, 2013, at 2:39 PM, Zheng, Kai kai.zh...@intel.com wrote:

   Hi Larry,

   Thanks for the update. Good to see that with this update we are now
   aligned on most points.

   I have also updated our TokenAuth design in HADOOP-9392. The new
   revision incorporates feedback and suggestions in related discussion
   with the community, particularly from Microsoft and others attending
   the Security design lounge session at the Hadoop summit. Summary of the
   changes:
   1. Revised the approach to now use two tokens, Identity Token plus
   Access Token, particularly considering our authorization framework and
   compatibility with HSSO;
   2. Introduced Authorization Server (AS) from our authorization
   framework into the flow that issues access tokens for clients with
   identity tokens to access services;
   3. Refined proxy access token and the proxy/impersonation flow;
   4. Refined the browser web SSO flow regarding access to Hadoop web
   services;
   5. Added Hadoop RPC access flow regard



 --
 Best regards,

- Andy

 Problems worthy of attack prove their worth by hitting back. - Piet 
 Hein (via Tom White)




--
Alejandro

RE: [DISCUSS] Hadoop SSO/Token Server Components

2013-07-04 Thread Zheng, Kai

Hi Larry,
 
Our design from its first revision focuses on and provides comprehensive 
support to allow pluggable authentication mechanisms based on a common token, 
trying to address single sign on issues across the ecosystem to support access 
to Hadoop services via RPC, REST, and web browser SSO flow. The updated design 
doc adds even more text and flows to explain or illustrate these existing 
items in detail, as requested by some on the JIRA.
 
In addition to the identity token we had proposed, we adopted an access token 
and adapted the approach, not only for the sake of making TokenAuth compatible 
with HSSO, but also for better support of fine grained access control and 
seamless integration with our authorization framework and even 3rd party 
authorization services like an OAuth Authorization Server. We regard these as 
important because Hadoop is evolving into an enterprise and cloud platform 
that needs a complete authN and authZ solution, and without this support we 
would need future rework to complete the solution.
 
Since you asked about the differences between TokenAuth and HSSO, here are some 
key ones:
 
TokenAuth supports TAS federation to allow clients to access multiple clusters 
without a centralized SSO server while HSSO provides a centralized SSO server 
for multiple clusters.
 
TokenAuth integrates an authorization framework with auditing support in order 
to provide a complete solution for enterprise data access security. This 
allows administrators to administer security policies centrally and have the 
policies enforced consistently across components in the ecosystem in a 
pluggable way that supports different authorization models like RBAC, ABAC and 
even XACML standards.
 
TokenAuth targets support for domain based authN & authZ to allow multi-tenant 
deployments. Authentication and authorization rules can be configured and 
enforced per domain, which allows organizations to manage their individual 
policies separately while sharing a common large pool of resources.
 
TokenAuth addresses the proxy/impersonation case with the flow Tianyou 
mentioned, where a service can proxy a client to access another service in a 
secured and constrained way.
 
Regarding token based authentication plus SSO and the unified authorization 
framework, let's continue to use HADOOP-9392 and HADOOP-9466 as umbrella JIRAs 
for these efforts. HSSO targets support for a centralized SSO server for 
multiple clusters and, as we have pointed out before, is a nice subset of the 
work proposed on HADOOP-9392. Let's align these two JIRAs and address the 
question Kevin raised multiple times in the 9392/9533 JIRAs: "How can HSSO and 
TAS work together? What is the relationship?" The design update I provided was 
meant to provide the necessary details so we can nail down that relationship 
and collaborate on the implementation of these JIRAs.

As you have also confirmed, this design aligns with related community 
discussions, so let's continue our collaborative effort to contribute code to 
these JIRAs.

Regards,
Kai

-----Original Message-----
From: Larry McCay [mailto:lmc...@hortonworks.com] 
Sent: Thursday, July 04, 2013 4:10 AM
To: Zheng, Kai
Cc: common-dev@hadoop.apache.org
Subject: Re: [DISCUSS] Hadoop SSO/Token Server Components

Hi Kai -

I think that I need to clarify something...

This is not an update for 9533 but a continuation of the discussions that are 
focused on a fresh look at a SSO for Hadoop.
We've agreed to leave our previous designs behind and therefore we aren't 
really seeing it as an HSSO layered on top of TAS approach or an HSSO vs TAS 
discussion.

Your latest design revision actually makes it clear that you are now targeting 
exactly what was described as HSSO - so comparing and contrasting is not going 
to add any value.

What we need you to do at this point, is to look at those high-level components 
described on this thread and comment on whether we need additional components 
or any that are listed that don't seem necessary to you and why.
In other words, we need to define and agree on the work that has to be done.

We also need to determine those components that need to be done before anything 
else can be started.
I happen to agree with Brian that #4 Hadoop SSO Tokens are central to all the 
other components and should probably be defined and POC'd in short order.

Personally, I think that continuing the separation of 9533 and 9392 will do 
this effort a disservice. There doesn't seem to be enough differences between 
the two to justify separate jiras anymore. It may be best to file a new one 
that reflects a single vision without the extra cruft that has built up in 
either of the existing ones. We would certainly reference the existing ones 
within the new one. This approach would align with the spirit of the 
discussions up to this point.

I am prepared to start a discussion around the shape of the two Hadoop SSO 
tokens: identity and access. If this is what others feel the next topic should

RE: [DISCUSS] Hadoop SSO/Token Server Components

2013-07-03 Thread Zheng, Kai

Hi Larry,

Thanks for the update. Good to see that with this update we are now aligned on 
most points.

 I have also updated our TokenAuth design in HADOOP-9392. The new revision 
incorporates feedback and suggestions in related discussion with the community, 
particularly from Microsoft and others attending the Security design lounge 
session at the Hadoop summit. Summary of the changes:
1. Revised the approach to now use two tokens, Identity Token plus Access 
Token, particularly considering our authorization framework and compatibility 
with HSSO;
2. Introduced Authorization Server (AS) from our authorization framework 
into the flow that issues access tokens for clients with identity tokens to 
access services;
3. Refined proxy access token and the proxy/impersonation flow;
4. Refined the browser web SSO flow regarding access to Hadoop web services;
5. Added Hadoop RPC access flow regarding CLI clients accessing Hadoop 
services via RPC/SASL;
6. Added client authentication integration flow to illustrate how desktop 
logins can be integrated into the authentication process to TAS to exchange 
identity token;
7. Introduced the fine grained access control flow from the authorization 
framework; I have put it in the appendices section for reference;
8. Added a detailed flow to illustrate Hadoop Simple authentication over 
TokenAuth, in the appendices section;
9. Added a secured task launcher in the appendices as a possible solution for 
the Windows platform;
10. Moved low level content and less relevant parts from the main body into 
the appendices section.

As we all think about how to layer HSSO on TAS in the TokenAuth framework, 
please take some time to look at the doc and then let's discuss the gaps we 
might have. I would like to discuss these gaps with a focus on the 
implementation details so we are all moving towards getting code done. Let's 
continue this part of the discussion in HADOOP-9392 to allow for better 
tracking on the JIRA itself. For discussions related to the centralized SSO 
server, I suggest we continue to use HADOOP-9533 to consolidate all discussion 
related to that JIRA. That way we don't need extra umbrella JIRAs.

I agree we should speed up these discussions and agree on some of the 
implementation specifics so both of us can get moving on the code without 
stepping on each other in our work.

Look forward to your comments and comments from others in the community. Thanks.

Regards,
Kai

-----Original Message-----
From: Larry McCay [mailto:lmc...@hortonworks.com] 
Sent: Wednesday, July 03, 2013 4:04 AM
To: common-dev@hadoop.apache.org
Subject: [DISCUSS] Hadoop SSO/Token Server Components

All -

As a follow up to the discussions that were had during Hadoop Summit, I would 
like to introduce the discussion topic around the moving parts of a Hadoop 
SSO/Token Service.
There are a couple of related Jira's that can be referenced and may or may not 
be updated as a result of this discuss thread.

https://issues.apache.org/jira/browse/HADOOP-9533
https://issues.apache.org/jira/browse/HADOOP-9392

As the first aspect of the discussion, we should probably state the overall 
goals and scoping for this effort:
* An alternative authentication mechanism to Kerberos for user authentication
* A broader capability for integration into enterprise identity and SSO 
solutions
* Possibly the advertisement/negotiation of available authentication mechanisms
* Backward compatibility for the existing use of Kerberos
* No (or minimal) changes to existing Hadoop tokens (delegation, job, block 
access, etc)
* Pluggable authentication mechanisms across: RPC, REST and webui enforcement 
points
* Continued support for existing authorization policy/ACLs, etc
* Keeping more fine grained authorization policies in mind - like attribute 
based access control
  - fine grained access control is a separate but related effort that we 
    must not preclude with this effort
* Cross cluster SSO

In order to tease out the moving parts here are a couple high level and 
simplified descriptions of SSO interaction flow:
                           +------+
+------+ credentials 1     | SSO  |
|CLIENT|-------------------|SERVER|
+------+  :tokens          +------+
  2 |
    | access token
    V :requested resource
+-------+
|HADOOP |
|SERVICE|
+-------+

The above diagram represents the simplest interaction model for an SSO service 
in Hadoop.
1. client authenticates to SSO service and acquires an access token
  a. client presents credentials to an authentication service endpoint exposed 
by the SSO server (AS) and receives a token representing the authentication 
event and verified identity
  b. client then presents the identity token from 1.a. to the token endpoint 
exposed by the SSO server (TGS) to request an access token to a particular 
Hadoop service and receives an access token
2. client presents the Hadoop access

RE: Fostering a Hadoop security dev community

2013-06-20 Thread Zheng, Kai

In my view it should be for the whole ecosystem. One motivation for this is to 
ease collaboration and discussion for the ongoing work on token based 
authentication and SSO, which absolutely targets the ecosystem, although the 
upcoming libraries and facilities might reside under the Hadoop Common 
umbrella. 

-----Original Message-----
From: Alejandro Abdelnur [mailto:t...@cloudera.com] 
Sent: Friday, June 21, 2013 1:32 AM
To: common-dev@hadoop.apache.org
Subject: Re: Fostering a Hadoop security dev community

This sounds great,

Is this restricted to the Hadoop project itself, or is the intention to cover 
the whole Hadoop ecosystem? If the latter, how are you planning to engage and 
sync up with the different projects?

Thanks.

On Thu, Jun 20, 2013 at 9:45 AM, Larry McCay lmc...@hortonworks.com wrote:

 It would be great to have dedicated resources like these.
 One thing missing for cross cutting concerns like security is a source 
 of truth for a holistic view of the entire model.
 A dedicated wiki space would allow for this view and facilitate the 
 filing of Jiras that align with the big picture.

 On Thu, Jun 20, 2013 at 12:31 PM, Kevin Minder 
 kevin.min...@hortonworks.com wrote:

  Hi PMCs & Everyone,

  There are a number of significant, complex and overlapping efforts 
  underway to improve the Hadoop security model.  Many involved are 
  struggling to form this into a cohesive whole across the numerous 
  Jiras and within the traffic of common-dev.  There has been a 
  suggestion made that having two additional pieces of infrastructure 
  might help.

  1) Establish a security-dev mailing list similar to hdfs-dev, 
  yarn-dev, mapreduce-dev, etc. that would help us have more focused 
  interaction on non-vulnerability security topics.  I understand that 
  this might devalue common-dev somewhat but the benefits might 
  outweigh that.

  2) Establish a corner of the wiki where cross cutting security design 
  could be worked out more collaboratively than a doc rev upload 
  mechanism.  I fear if we don't have this we will end up collaborating 
  outside Apache infrastructure which seems inappropriate.  I understand 
  the risk of losing context in the individual Jiras but again my sense 
  is that the cohesiveness provided will outweigh the risk.

  I'm open to and interested in other suggestions for how others have 
  solved these types of cross cutting collaboration challenges.

  Thanks.
  Kevin.

--
Alejandro

RE: [jira] [Updated] (HADOOP-9477) posixGroups support for LDAP groups mapping service

2013-05-06 Thread Zheng, Kai

Hi,

Can anyone help look into why submitting the patch doesn't trigger the 
HADOOP-QA check? Thanks.

Regards,
Kai

-----Original Message-----
From: Dapeng Sun (JIRA) [mailto:j...@apache.org] 
Sent: Sunday, May 05, 2013 11:00 PM
To: Zheng, Kai
Subject: [jira] [Updated] (HADOOP-9477) posixGroups support for LDAP groups 
mapping service


 [ 
https://issues.apache.org/jira/browse/HADOOP-9477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dapeng Sun updated HADOOP-9477:
---

 Target Version/s: 2.0.4-alpha
Affects Version/s: 2.0.4-alpha
   Status: Patch Available  (was: Open)

 posixGroups support for LDAP groups mapping service
 ---

 Key: HADOOP-9477
 URL: https://issues.apache.org/jira/browse/HADOOP-9477
 Project: Hadoop Common
  Issue Type: Improvement
Affects Versions: 2.0.4-alpha
Reporter: Kai Zheng
Assignee: Kai Zheng
 Fix For: 2.0.5-beta

 Attachments: HADOOP-9477.patch

   Original Estimate: 168h
  Remaining Estimate: 168h

 It would be nice to support posixGroups for LdapGroupsMapping service. Below 
 is from current description for the provider:
 hadoop.security.group.mapping.ldap.search.filter.group:
 An additional filter to use when searching for LDAP groups. This 
 should be changed when resolving groups against a non-Active Directory 
 installation.
 posixGroups are currently not a supported group class.
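
For context, the gap described above can be illustrated with the search filters involved. The following is a minimal Python sketch and is not taken from HADOOP-9477.patch. The helper names are hypothetical, and the filter shapes assume the common conventions that AD-style groups list member DNs in `member` while RFC 2307 posixGroup entries list login names in `memberUid`. Primary-group lookup via gidNumber is left out.

```python
def escape_filter_value(value):
    """Escape a value for use inside an LDAP search filter (RFC 4515)."""
    replacements = {
        "\\": r"\5c",
        "*": r"\2a",
        "(": r"\28",
        ")": r"\29",
        "\0": r"\00",
    }
    return "".join(replacements.get(ch, ch) for ch in value)


def ad_group_filter(user_dn):
    # Assumption: AD-style groups reference members by distinguished name.
    return "(&(objectClass=group)(member=%s))" % escape_filter_value(user_dn)


def posix_group_filter(uid):
    # Assumption: posixGroup entries reference members by login name (uid).
    return "(&(objectClass=posixGroup)(memberUid=%s))" % escape_filter_value(uid)
```

The point of the comparison is that a posixGroup search needs the user's login name rather than the user's DN, so it may not be enough to change only the group filter property.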

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators 
For more information on JIRA, see: http://www.atlassian.com/software/jira

RE: [jira] [Updated] (HADOOP-9477) posixGroups support for LDAP groups mapping service

2013-05-06 Thread Zheng, Kai

Thank you, Tianhong.

Dapeng, you can resubmit your patch as Tianhong suggested. Thanks.

Regards,
Kai

-----Original Message-----
From: Wang Tianhong [mailto:wangt...@linux.vnet.ibm.com] 
Sent: Monday, May 06, 2013 3:47 PM
To: common-dev@hadoop.apache.org
Subject: RE: [jira] [Updated] (HADOOP-9477) posixGroups support for LDAP groups 
mapping service

Hi Zheng,
You can resubmit the patch. Sometimes Jenkins does not work well.



About FileBasedGroupMapping provider and Virtual Groups

2012-12-10 Thread Zheng, Kai

Hi everyone,

Before I open a JIRA, I'd like to know what you think of a file-based group 
mapping provider. The idea is as follows.
1. Have a new user group mapping provider, such as FileBasedGroupMapping, which 
consumes a mapping file like the one below:
$HADOOP_CONF/groupsMapping.txt:
group1:user1,user2
group2:user3,user4
groupX:user5 group1
groupY:user6 group2
...
From this file, the provider derives the group list for each user:
user1 -> group1,groupX #same for user2
user3 -> group2,groupY #same for user4
user5 -> groupX
user6 -> groupY
Note that user1 gets group1 directly from the mapping file; then, since 
group1 belongs to groupX, user1 must also belong to groupX, so groupX is 
also one of user1's groups.

2. So what are the benefits?
1) It opens a door to role-based access control for Hadoop. As you can see, in 
the mapping file we can define virtual groups (or roles) like groupX and groupY 
to hold users and other groups. Such virtual groups can be used just like real 
groups: for example, assigned to an HDFS file as the owner group, added to an 
MR queue-level ACL list, or, in HBase/Hive, granted privileges on databases and 
tables.
2) It makes it possible in HDFS to allow users from more than one group to 
read/write some file/folder while disallowing others. For example, if we want 
to allow only user1 plus the users in group1 and group2 to read/write 
/data/secure, we can define a virtual group in the mapping file as 
secureGroup:user1 group1,group2, then chgrp the folder to secureGroup and 
chmod the folder with g+rw. 
3) As described above, this addresses a real need and is not just an attempt 
to resolve a corner case. As you may know, Hive supports HDFS as backend 
storage, as well as role-based access control. Using Hive, one can create a 
database and then grant some users/groups/roles the CREATE privilege on it. 
After that, a granted user (granted directly or via a granted group or role) 
runs a command to create a table in that database. The command can pass the 
access control check in Hive but may still be rejected by HDFS when Hive tries 
to create a file for the table in the database folder on behalf of the user, 
simply because the user has no write permission on the folder! Such issues can 
easily be resolved with this provider.
4) Minor but very convenient: we can use this mapping file and provider to 
define some users and groups for test purposes when we don't want to involve 
ShellBasedUnixGroupsMapping or LdapGroupsMapping.
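
To make the resolution rule in item 1 concrete, here is a minimal Python sketch of how such a provider could expand nested groups. It only illustrates the idea and is not Hadoop code: the names `parse_mapping` and `groups_for_user` are hypothetical, and the file syntax (users before the space, member groups after it) is assumed from the example file in this message.

```python
from collections import defaultdict

EXAMPLE = """\
group1:user1,user2
group2:user3,user4
groupX:user5 group1
groupY:user6 group2
"""


def parse_mapping(text):
    """Parse lines of the form 'group:user1,user2 subgroup1,subgroup2'.

    Returns {group: (set_of_users, set_of_member_groups)}. The part after
    the space is optional.
    """
    mapping = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        group, _, members = line.partition(":")
        users_part, _, groups_part = members.strip().partition(" ")
        users = {u for u in users_part.split(",") if u}
        subgroups = {g for g in groups_part.strip().split(",") if g}
        mapping[group.strip()] = (users, subgroups)
    return mapping


def groups_for_user(mapping, user):
    """Return every group the user belongs to, directly or through nesting."""
    # parents[g] holds the groups that list g as a member group.
    parents = defaultdict(set)
    for group, (_, subgroups) in mapping.items():
        for sub in subgroups:
            parents[sub].add(group)

    found = set()
    pending = [g for g, (users, _) in mapping.items() if user in users]
    while pending:
        group = pending.pop()
        if group in found:
            continue  # already expanded; this also guards against cycles
        found.add(group)
        pending.extend(parents[group])
    return found
```

With the example file, `groups_for_user(parse_mapping(EXAMPLE), "user1")` yields group1 and groupX, matching the list in item 1. A real provider would also need to decide how to treat cycles and unknown group names; this sketch simply ignores both.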

Thanks for your feedback!

Kai

Questions and possible improvements for LdapGroupsMapping

2012-10-18 Thread Zheng, Kai

Hi All,

Regarding LdapGroupsMapping, I have the following questions:


1.   Is it possible to use ShellBasedUnixGroupsMapping for Hadoop service 
principals/users, and LdapGroupsMapping for end user accounts?
In our environment, normal end users (along with their group info) for the 
Hadoop cluster come from AD, and for them we prefer to use the LDAP mapping;
but for hdfs/mapred service principals, the default shell-based one is enough, 
and we don't want to create the user/group entries in AD just for that.
It seems that in the current implementation only one user group mapping 
provider can be configured.


2.   Can we support multiple ADs? Hadoop users might come from more than 
one AD in a big organization.


3.   Is there any technical issue that prevents supporting LDAP servers like 
OpenLDAP? In my understanding, one possible difficulty is that it's not easy 
to extract a common group lookup mechanism, with common 
filters/configurations, that applies to both AD and OpenLDAP, right?

I'm wondering whether these are just limits of the current implementation and, 
if so, whether we need to improve that. Is the community already working on it?

RE: Questions and possible improvements for LdapGroupsMapping

2012-10-18 Thread Zheng, Kai

I just got a reply from Natty on the user mailing list, as follows.
I'd like to discuss further here since it's more appropriate.

Hi Natty,

1. It's a great idea to write a customized group mapping service to 
handle different mappings for AD users and service principals;
2. OK, I'd like to improve it to support multiple ADs;
3. Great to know. I will try the group mapping with OpenLDAP, making use of 
the current configuration properties.

Further, to support different mappings for different users/principals, 
and to support multiple ADs, we also need extra properties to 
configure which kind of user/principal (going by domain/realm is an option) 
should use which group mapping mechanism.
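
As a rough illustration of that routing idea, here is a Python sketch of a composite mapping that picks a provider by the principal's realm. This is not an existing Hadoop class. `CompositeGroupMapping`, `realm_of`, the realm names, and the stub providers are all hypothetical, and each provider is assumed to be a callable from a short user name to a list of groups.

```python
def realm_of(principal):
    """Return the realm of 'name@REALM' or 'name/host@REALM' ('' if absent)."""
    _, _, realm = principal.partition("@")
    return realm


class CompositeGroupMapping:
    """Route each lookup to a provider chosen by the principal's realm."""

    def __init__(self, providers_by_realm, default_provider):
        self.providers_by_realm = providers_by_realm
        self.default_provider = default_provider

    def get_groups(self, principal):
        provider = self.providers_by_realm.get(
            realm_of(principal), self.default_provider
        )
        # Strip '@REALM' and any '/host' part to get the short name.
        short_name = principal.partition("@")[0].partition("/")[0]
        return provider(short_name)


# Stub providers standing in for two AD lookups and the shell-based lookup.
ad_one = {"alice": ["analysts"]}
ad_two = {"bob": ["engineers"]}
local = {"hdfs": ["hadoop"], "mapred": ["hadoop"]}

mapping = CompositeGroupMapping(
    providers_by_realm={
        "CORP1.EXAMPLE.COM": lambda user: ad_one.get(user, []),
        "CORP2.EXAMPLE.COM": lambda user: ad_two.get(user, []),
    },
    default_provider=lambda user: local.get(user, []),
)
```

Here `mapping.get_groups("alice@CORP1.EXAMPLE.COM")` goes to the first AD stub, while a service principal such as `hdfs/nn1.example.com@HADOOP.EXAMPLE.COM` falls through to the default provider. The extra configuration properties mentioned above would be what populates the realm-to-provider table.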

To improve these things, I'm going to file a JIRA. It would be great 
if you could continue to comment on it. 

Thanks & regards,
Kai

From: Jonathan Natkins [mailto:na...@cloudera.com] 
Sent: Friday, October 19, 2012 8:58 AM
To: u...@hadoop.apache.org
Subject: Re: Secure hadoop and group permission on HDFS

Hi Kai,

1. To the best of my knowledge, you can only use one group mapping service at a 
time. In order to do what you're suggesting, you'd have to write a customized 
group mapping service.

2. Currently multiple ADs are not supported, but it's certainly an improvement 
that could be made.

3. The LdapGroupsMapping already supports OpenLDAP. It's pretty heavily 
configurable for the purpose of supporting multiple types of LDAP 
implementations. The defaults just happen to be geared towards Active Directory.

Thanks,
Natty


RE: Questions and possible improvements for LdapGroupsMapping

2012-10-18 Thread Zheng, Kai

A JIRA has been opened for this:

https://issues.apache.org/jira/browse/HADOOP-8943

