[jira] [Created] (HADOOP-11662) trunk's CHANGES.txt is missing releases

2015-03-02 Thread Allen Wittenauer (JIRA)
Allen Wittenauer created HADOOP-11662:
-

 Summary: trunk's CHANGES.txt is missing releases
 Key: HADOOP-11662
 URL: https://issues.apache.org/jira/browse/HADOOP-11662
 Project: Hadoop Common
  Issue Type: Bug
  Components: build
Reporter: Allen Wittenauer


I've been doing some archeological work on the release data.  Looking at trunk, 
it's missing 0.20.205 and all of 1.x.

We should either make the call to chop it off at a reasonable date, or we
fix the changelog to reflect the reality of these releases.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HADOOP-11663) Remove description about Java 6 from docs

2015-03-02 Thread Masatake Iwasaki (JIRA)
Masatake Iwasaki created HADOOP-11663:
-

 Summary: Remove description about Java 6 from docs
 Key: HADOOP-11663
 URL: https://issues.apache.org/jira/browse/HADOOP-11663
 Project: Hadoop Common
  Issue Type: Bug
  Components: documentation
Reporter: Masatake Iwasaki
Assignee: Masatake Iwasaki
Priority: Minor


{{hadoop-auth/BuildingIt.md}} has:
{noformat}
Hadoop Auth, Java HTTP SPNEGO - Building It
Requirements
Java 6+
{noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: Looking to a Hadoop 3 release

2015-03-02 Thread Karthik Kambatla
+1

On Mon, Mar 2, 2015 at 3:19 PM, Andrew Wang andrew.w...@cloudera.com
wrote:

 Hi devs,

 It's been a year and a half since 2.x went GA, and I think we're about due
 for a 3.x release.
 Notably, there are two incompatible changes I'd like to call out, that will
 have a tremendous positive impact for our users.


 First, classpath isolation being done at HADOOP-11656, which has been a
 long-standing request from many downstreams and Hadoop users.


Guava etc. have been such a pain in the past. Can't wait to have a release
where we don't have to worry about which versions of dependencies users want
to use.
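
As an aside, when two versions of a dependency collide, a small self-contained diagnostic like the following is often the quickest way to see which jar actually supplied a class; nothing here is Hadoop-specific, and the Guava class name is only an example default:

```
import java.security.CodeSource;

public class WhichJar {
    public static void main(String[] args) throws ClassNotFoundException {
        // Any fully-qualified class name can be passed as argv[0];
        // the Guava class below is only an example default.
        String name = args.length > 0 ? args[0] : "com.google.common.base.Preconditions";
        Class<?> clazz = Class.forName(name);
        CodeSource src = clazz.getProtectionDomain().getCodeSource();
        System.out.println(name + " was loaded from "
            + (src != null ? src.getLocation() : "<bootstrap or unknown>"));
    }
}
```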



 Second, bumping the source and target JDK version to JDK8 (related to
 HADOOP-11090), which is important since JDK7 is EOL in April 2015 (two
 months from now). In the past, we've had issues with our dependencies
 discontinuing support for old JDKs, so this will future-proof us.


Are you saying we can use lambdas without re-writing all of Hadoop in
Scala?
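
For anyone wondering what the JDK8 language level buys in practice, here is a minimal sketch (not taken from the Hadoop codebase) of the same comparator written both ways:

```
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

public class LambdaSketch {
    public static void main(String[] args) {
        List<String> hosts = new ArrayList<>(Arrays.asList("nn1", "dn12", "dn7"));

        // Pre-JDK8: an anonymous inner class just to compare two strings.
        Collections.sort(hosts, new Comparator<String>() {
            @Override
            public int compare(String a, String b) {
                return a.compareTo(b);
            }
        });

        // JDK8: the same thing as a lambda, plus forEach with a method reference.
        hosts.sort((a, b) -> a.compareTo(b));
        hosts.forEach(System.out::println);
    }
}
```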



 Between the two, we'll also have quite an opportunity to clean up and
 upgrade our dependencies, another common user and developer request.

 I'd like to propose that we start rolling a series of monthly-ish
 3.0 alpha releases ASAP, with myself volunteering to take on the RM and
 other cat herding responsibilities.


Will be glad to help.


 There are already quite a few changes
 slated for 3.0 besides the above (for instance the shell script rewrite) so
 there's already value in a 3.0 alpha, and the more time we give downstreams
 to integrate, the better.

 This opens up discussion about inclusion of other changes, but I'm hoping
 to freeze incompatible changes after maybe two alphas, do a beta (with no
 further incompat changes allowed), and then finally a 3.x GA. For those
 keeping track, that means a 3.x GA in about four months.


 I would also like to stress though that this is not intended to be a big
 bang release. For instance, it would be great if we could maintain wire
 compatibility between 2.x and 3.x, so rolling upgrades work. Keeping
 branch-2 and branch-3 similar also makes backports easier, since we're
 likely maintaining 2.x for a while yet.

 Please let me know any comments / concerns related to the above. If people
 are friendly to the idea, I'd like to cut a branch-3 and start working on
 the first alpha.

 Best,
 Andrew




-- 
Karthik Kambatla
Software Engineer, Cloudera Inc.

http://five.sentenc.es


Re: Looking to a Hadoop 3 release

2015-03-02 Thread Aaron T. Myers
+1, this sounds like a good plan to me.

Thanks a lot for volunteering to take this on, Andrew.

Best,
Aaron




Looking to a Hadoop 3 release

2015-03-02 Thread Andrew Wang
Hi devs,

It's been a year and a half since 2.x went GA, and I think we're about due
for a 3.x release.
Notably, there are two incompatible changes I'd like to call out, that will
have a tremendous positive impact for our users.

First, classpath isolation being done at HADOOP-11656, which has been a
long-standing request from many downstreams and Hadoop users.

Second, bumping the source and target JDK version to JDK8 (related to
HADOOP-11090), which is important since JDK7 is EOL in April 2015 (two
months from now). In the past, we've had issues with our dependencies
discontinuing support for old JDKs, so this will future-proof us.

Between the two, we'll also have quite an opportunity to clean up and
upgrade our dependencies, another common user and developer request.

I'd like to propose that we start rolling a series of monthly-ish
3.0 alpha releases ASAP, with myself volunteering to take on the RM and
other cat herding responsibilities. There are already quite a few changes
slated for 3.0 besides the above (for instance the shell script rewrite) so
there's already value in a 3.0 alpha, and the more time we give downstreams
to integrate, the better.

This opens up discussion about inclusion of other changes, but I'm hoping
to freeze incompatible changes after maybe two alphas, do a beta (with no
further incompat changes allowed), and then finally a 3.x GA. For those
keeping track, that means a 3.x GA in about four months.

I would also like to stress though that this is not intended to be a big
bang release. For instance, it would be great if we could maintain wire
compatibility between 2.x and 3.x, so rolling upgrades work. Keeping
branch-2 and branch-3 similar also makes backports easier, since we're
likely maintaining 2.x for a while yet.

Please let me know any comments / concerns related to the above. If people
are friendly to the idea, I'd like to cut a branch-3 and start working on
the first alpha.

Best,
Andrew


Re: Looking to a Hadoop 3 release

2015-03-02 Thread Yongjun Zhang
Thanks Andrew for the proposal.

+1, and I will be happy to help.

--Yongjun







Re: Looking to a Hadoop 3 release

2015-03-02 Thread sanjay Radia
Andrew,
  Thanks for bringing up the issue of moving to Java8. Java8 is important.
However, I am not seeing a strong motivation for changing the major number.
We can go to Java8 in the 2.x series.
The classpath issue for HADOOP-11656 is too minor to force a major number 
change (no pun intended).

Let's separate the issue of Java8 from Hadoop 3.0.

sanjay





Re: Looking to a Hadoop 3 release

2015-03-02 Thread Vinod Kumar Vavilapalli

 Second, bumping the source and target JDK version to JDK8 (related to
 HADOOP-11090), which is important since JDK7 is EOL in April 2015 (two
 months from now). In the past, we've had issues with our dependencies
 discontinuing support for old JDKs, so this will future-proof us.


Is moving to JDK8 fundamentally different from the move to JDK7? We are moving 
to JDK7 via release 2.7, which I am helping with now.


 I'd like to propose that we start rolling a series of monthly-ish
 3.0 alpha releases ASAP, with myself volunteering to take on the RM and
 other cat herding responsibilities. There are already quite a few changes
 slated for 3.0 besides the above (for instance the shell script rewrite) so
 there's already value in a 3.0 alpha, and the more time we give downstreams
 to integrate, the better.


Aren't the shell script rewrite changes supposed to be compatible?

Thanks,
+Vinod



Re: Looking to a Hadoop 3 release

2015-03-02 Thread Robert Kanter
+1  Happy to help too




RE: Looking to a Hadoop 3 release

2015-03-02 Thread Zheng, Kai
JDK8 support is under consideration; it looks like many issues have been reported 
and resolved already.

https://issues.apache.org/jira/browse/HADOOP-11090




RE: 2.7 status

2015-03-02 Thread Zheng, Kai
Is there interest in getting the following issues into the release? Thanks!

HADOOP-10670
HADOOP-10671

Regards,
Kai

-Original Message-
From: Yongjun Zhang [mailto:yzh...@cloudera.com] 
Sent: Monday, March 02, 2015 4:46 AM
To: hdfs-...@hadoop.apache.org
Cc: Vinod Kumar Vavilapalli; Hadoop Common; mapreduce-...@hadoop.apache.org; 
yarn-...@hadoop.apache.org
Subject: Re: 2.7 status

Hi,

Thanks for working on 2.7 release.

Currently the fallback from KerberosAuthenticator to PseudoAuthenticator is 
enabled by default in a hardcoded way. HADOOP-10895 changes the default and 
requires applications (such as Oozie) to set a config property or call an API 
to enable the fallback.
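
As context for the paragraph above, a minimal sketch of the hadoop-auth client usage in question, using only the public classes named there (the URL is a placeholder, and the opt-in property/API that HADOOP-10895 adds is deliberately not shown here since its exact name is defined by the patch):

```
import java.net.HttpURLConnection;
import java.net.URL;

import org.apache.hadoop.security.authentication.client.AuthenticatedURL;
import org.apache.hadoop.security.authentication.client.KerberosAuthenticator;

public class AuthFallbackSketch {
    public static void main(String[] args) throws Exception {
        // With today's hardcoded default, KerberosAuthenticator quietly falls back
        // to pseudo (simple) auth if SPNEGO is not available; after HADOOP-10895 an
        // application like this would have to opt in to that fallback explicitly.
        AuthenticatedURL.Token token = new AuthenticatedURL.Token();
        AuthenticatedURL authUrl = new AuthenticatedURL(new KerberosAuthenticator());
        URL url = new URL("http://namenode.example.com:50070/webhdfs/v1/?op=LISTSTATUS");
        HttpURLConnection conn = authUrl.openConnection(url, token);
        System.out.println("HTTP response: " + conn.getResponseCode());
    }
}
```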

This jira has been reviewed and is almost ready to get in. However, there is a 
concern that we have to change the relevant applications. Please see my comment 
here:

https://issues.apache.org/jira/browse/HADOOP-10895?focusedCommentId=14321823&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14321823

Any of your comments will be highly appreciated. This jira was postponed from 
2.6. I think it should be no problem to skip 2.7. But your comments would help 
us to decide what to do with this jira for future releases.

Thanks.

--Yongjun


On Sun, Mar 1, 2015 at 11:58 AM, Arun Murthy a...@hortonworks.com wrote:

 Sounds good, thanks for the help Vinod!

 Arun

 
 From: Vinod Kumar Vavilapalli
 Sent: Sunday, March 01, 2015 11:43 AM
 To: Hadoop Common; Jason Lowe; Arun Murthy
 Subject: Re: 2.7 status

 Agreed. How about we roll an RC end of this week? As a Java 7+ release 
 with features, patches that already got in?

 Here's a filter tracking blocker tickets - 
 https://issues.apache.org/jira/issues/?filter=12330598. Nine open now.

 +Arun
 Arun, I'd like to help get 2.7 out without further delay. Do you mind 
 me taking over release duties?

 Thanks,
 +Vinod
 
 From: Jason Lowe jl...@yahoo-inc.com.INVALID
 Sent: Friday, February 13, 2015 8:11 AM
 To: common-dev@hadoop.apache.org
 Subject: Re: 2.7 status

 I'd like to see a 2.7 release sooner than later.  It has been almost 3 
 months since Hadoop 2.6 was released, and there have already been 634 
 JIRAs committed to 2.7.  That's a lot of changes waiting for an official 
 release.

 https://issues.apache.org/jira/issues/?jql=project%20in%20%28hadoop%2Chdfs%2Cyarn%2Cmapreduce%29%20AND%20fixversion%3D2.7.0%20AND%20resolution%3DFixed
 Jason

   From: Sangjin Lee sj...@apache.org
  To: common-dev@hadoop.apache.org common-dev@hadoop.apache.org
  Sent: Tuesday, February 10, 2015 1:30 PM
  Subject: 2.7 status

 Folks,

 What is the current status of the 2.7 release? I know initially it 
 started out as a java-7 only release, but looking at the JIRAs that 
 is very much not the case.

 Do we have a certain timeframe for 2.7 or is it time to discuss it?

 Thanks,
 Sangjin




Re: 2.7 status

2015-03-02 Thread Vinod Kumar Vavilapalli
Seems like there is already some action on the JIRA. Can you please ping the 
previous reviewers on JIRA to make progress?

Thanks,
+Vinod






Re: DISCUSSION: Patch commit criteria.

2015-03-02 Thread Karthik Kambatla
On Mon, Mar 2, 2015 at 11:29 AM, Vinod Kumar Vavilapalli 
vino...@hortonworks.com wrote:

 We always needed another committer's +1 even if it isn't that clear in the
 bylaws. At a minimum, we should codify this in the bylaws to avoid stuff
 like people committing their own patches.

 Regarding trivial changes, I always distinguish between trivial *patches*
 and trivial changes to *existing* patches. Patches even if trivial need to
 be +1ed by another committer. OTOH, many a time, for patches that are
 extensively reviewed, potentially for months on end, I sometimes end up making
 a small javadoc/documentation change in the last version of the patch before
 committing. It just avoids one more cycle and more delay. It's hard to
 codify this distinction though.


In the past, I have made trivial (new lines, indentation, etc.) changes to
well reviewed patches before committing. Even then, I believe we should
upload the updated patch or the diff of trivial changes and wait for
someone else (potentially a non-committer contributor) to quickly check to
avoid making silly mistakes.


 Thanks
 +Vinod

 On Feb 27, 2015, at 1:04 PM, Konstantin Shvachko shv.had...@gmail.com
 wrote:

  There were discussions on several jiras and threads recently about how
 RTC
  actually works in Hadoop.
  My opinion has always been that for a patch to be committed it needs an
  approval  (+1) of at least one committer other than the author and no
 -1s.
  The Bylaws seem to be stating just that:
  Consensus approval of active committers, but with a minimum of one +1.
  See the full version under Actions / Code Change
  http://hadoop.apache.org/bylaws.html#Decision+Making
 
  Turned out people have different readings of that part of Bylaws, and
  different opinions on how RTC should work in different cases. Some of the
  questions that were raised include:
  - Should we clarify the Code Change decision making clause in Bylaws?
  - Should there be a relaxed criteria for trivial changes?
  - Can a patch be committed if approved only by a non-committer?
  - Can a patch be committed based on self-review by a committer?
  - What is the point for a non-committer to review the patch?
  Creating this thread to discuss these (and others that I surely missed)
 issues
  and to combine multiple discussions into one.
 
  My personal opinion is we should just stick to tradition. Good or bad,
 it
  worked for this community so far.
  I think most of the discrepancies arise from the fact that reviewers are
  hard to find. May be this should be the focus of improvements rather than
  the RTC rules.
 
  Thanks,
  --Konst




-- 
Karthik Kambatla
Software Engineer, Cloudera Inc.

http://five.sentenc.es


RE: Looking to a Hadoop 3 release

2015-03-02 Thread Zheng, Kai
Sorry about that. I thought I was sending this to my colleagues. 

By the way, for the JDK8 support, we (Intel) would like to investigate further 
and help, thanks.

Regards,
Kai



[jira] [Resolved] (HADOOP-11449) [JDK8] Cannot build on Windows: error: unexpected end tag: /ul

2015-03-02 Thread Steve Loughran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-11449?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran resolved HADOOP-11449.
-
   Resolution: Fixed
Fix Version/s: 2.7.0
 Assignee: Chris Nauroth  (was: Anu Engineer)

 [JDK8] Cannot build on Windows: error: unexpected end tag: /ul
 

 Key: HADOOP-11449
 URL: https://issues.apache.org/jira/browse/HADOOP-11449
 Project: Hadoop Common
  Issue Type: Bug
  Components: build
Affects Versions: site, 3.0.0, trunk-win, 2.6.0
 Environment: jdk8
 Windows 8.1 x64
 java version 1.8.0_25
 Java(TM) SE Runtime Environment (build 1.8.0_25-b18)
 Java HotSpot(TM) 64-Bit Server VM (build 25.25-b02, mixed mode)
 Apache Maven 3.2.5 (12a6b3acb947671f09b81f49094c53f426d8cea1; 2014-12-15T04:29:23)
Reporter: Alec Taylor
Assignee: Chris Nauroth
  Labels: build, easyfix, hadoop, windows
 Fix For: 2.7.0

   Original Estimate: 2h
  Remaining Estimate: 2h

 Tried on hadoop-2.6.0-src, branch-2.5 and branch-trunk-win. All gave this 
 error:
 ```
 [ERROR] Failed to execute goal 
 org.apache.maven.plugins:maven-javadoc-plugin:2.8.1:jar (module-javadocs) on 
 project hadoop-annotations: MavenReportException: Error while creating 
 archive:
 [ERROR] Exit code: 1 - 
 E:\Projects\hadoop-common\hadoop-common-project\hadoop-annotations\src\main\java\org\apache\hadoop\classification\InterfaceStability.java:27:
  error: unexpected end tag: /ul
 [ERROR] * /ul
 [ERROR] ^
 [ERROR] 
 [ERROR] Command line was: C:\Program 
 Files\Java\jdk1.8.0_25\jre\..\bin\javadoc.exe @options @packages
 [ERROR] 
 [ERROR] Refer to the generated Javadoc files in 
 'E:\Projects\hadoop-common\hadoop-common-project\hadoop-annotations\target' 
 dir.
 [ERROR] - [Help 1]
 [ERROR] 
 [ERROR] To see the full stack trace of the errors, re-run Maven with the -e 
 switch.
 [ERROR] Re-run Maven using the -X switch to enable full debug logging.
 [ERROR] 
 [ERROR] For more information about the errors and possible solutions, please 
 read the following articles:
 [ERROR] [Help 1] 
 http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException
 [ERROR] 
 [ERROR] After correcting the problems, you can resume the build with the 
 command
 [ERROR]   mvn goals -rf :hadoop-annotations
 ```
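
For context, the failure comes from the stricter doclint checks that JDK8's javadoc enables; it rejects malformed HTML such as an end tag with no matching start tag. A hedged sketch of the shape of the problem and the fix (illustrative only, not the actual InterfaceStability.java content):

```
// Sketch only, not the real InterfaceStability.java: JDK8's javadoc runs doclint,
// which fails the build on malformed HTML such as an end tag without a matching
// start tag -- the "error: unexpected end tag: </ul>" shown in the log above.
public class DoclintSketch {

    /**
     * Broken shape (shown escaped so this file itself still builds):
     * <pre>
     *   Stability levels:
     *   &lt;li&gt;Stable&lt;/li&gt;
     *   &lt;/ul&gt;   (end tag with no matching start tag)
     * </pre>
     *
     * Fixed shape: open and close the list properly.
     * <ul>
     *   <li>Stable</li>
     *   <li>Evolving</li>
     *   <li>Unstable</li>
     * </ul>
     */
    public void example() {
        // Intentionally empty; only the javadoc above matters.
    }
}
```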



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


RE: Looking to a Hadoop 3 release

2015-03-02 Thread Liu, Yi A
+1

Regards,
Yi Liu



Re: Looking to a Hadoop 3 release

2015-03-02 Thread Chen He
+1 non-binding

It would be nice to have a Hadoop 3.x release. It would be my honor to help.

Regards!

Chen




Re: Looking to a Hadoop 3 release

2015-03-02 Thread Arun Murthy
Andrew,

 Thanks for bringing up this discussion.

 I'm a little puzzled, for I feel like we are rehashing the same discussion from 
last year - where we agreed on a different course of action w.r.t. the switch to 
JDK7.

 IAC, breaking compatibility for hadoop-3 is a pretty big cost - particularly 
for users such as Yahoo/Twitter/eBay who have several clusters between which 
compatibility is paramount. 

 Now, breaking compatibility is perfectly fine over time where there is 
sufficient benefit e.g. HDFS HA or YARN in hadoop-2 (v/s hadoop-1). 

 However, I'm struggling to quantify the benefit of hadoop-3 for users for the 
cost of the breakage.

 Given that we already agreed to put in JDK7 in 2.7, and that the classpath is 
a fairly minor irritant given some existing solutions (e.g. a new default 
classloader), how do you quantify the benefit for users?
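
For reference, the existing per-job classloader Arun alludes to can be enabled roughly as sketched below, assuming the mapreduce.job.classloader* property names that shipped with the MR classloader work (illustrative, not a recommendation from this thread):

```
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class ClasspathIsolationSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Run the job's user code in a separate classloader so its own jar versions
        // win over the ones Hadoop happens to ship.
        conf.setBoolean("mapreduce.job.classloader", true);
        // Optionally pin which package prefixes must still come from the system classloader.
        conf.set("mapreduce.job.classloader.system.classes",
                 "java.,javax.,org.apache.hadoop.");
        Job job = Job.getInstance(conf, "classpath-isolation-example");
        System.out.println("job classloader enabled: "
            + job.getConfiguration().getBoolean("mapreduce.job.classloader", false));
    }
}
```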

 We could just do JDK8 in hadoop-2.10 or some such, you are definitely welcome 
to run the RM role for that release.

 Furthermore, I'm really concerned that this will be used as an opportunity to 
further break compat in more egregious ways. 

 Also, are you foreseeing more compat breaks? OTOH, if we all agree that we 
should absolutely prevent compat breakages such as the client-server wire 
protocol, I feel the point of a major release is kinda lost.

 Overall, my biggest concern is the compatibility story vis-a-vis the benefit. 

 Thoughts?

thanks,
Arun




Re: Looking to a Hadoop 3 release

2015-03-02 Thread Steve Loughran
I'm +1 for a migration to Java 8 as soon as possible.

That's branch-2 & trunk, as having them on the same language level makes 
cherry-picking stuff off trunk possible. That's particularly the case for Java 8, 
as it is the first major change to the language since Java 5.

w.r.t. shipping trunk as 3.x, it's going to take longer than planned. Hopefully 
not as long as the 2.x release process, but you never know. Which means I 
expect some more Hadoop 2 releases this year. We need to make the jump there 
too: get 2.7 out the door and include a roadmap in there for when the Java 8+-only 
switch happens across the codebase.


-Steve


ps. for anyone who wants a pure Java 8 build today, set -Djavac.version=1.8 on 
the command line of a maven build. Last time I tried, there were some (minor) bits 
of YARN that wouldn't compile...






Re: 2.7 status

2015-03-02 Thread Vinod Kumar Vavilapalli
Kai, please ping the reviewers that were already looking at your patches 
before. If the patches go in by end of this week, we can include them.

Thanks,
+Vinod

On Mar 2, 2015, at 7:04 PM, Zheng, Kai kai.zh...@intel.com wrote:

 Is there interest in getting the following issues into the release? Thanks!
 
 HADOOP-10670
 HADOOP-10671
 
 Regards,
 Kai
 



Re: Looking to a Hadoop 3 release

2015-03-02 Thread Jean-Baptiste Onofré

+1

It sounds like a good idea, especially regarding JDK.

Regards
JB




--
Jean-Baptiste Onofré
jbono...@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com


Re: Looking to a Hadoop 3 release

2015-03-02 Thread Andrew Wang
 Thanks as always for the feedback everyone. Some inline comments to Arun's
email, as his were the most extensive:


  Given that we already agreed to put in JDK7 in 2.7, and that the
 classpath is a fairly minor irritant given some existing solutions (e.g. a
 new default classloader), how do you quantify the benefit for users?

 I looked at our thread on this topic from last time, and we (meaning at
least myself and Tucu) agreed to a one-time exception to the JDK7 bump in
2.x for practical reasons. We waited for so long that we had some assurance
JDK6 was on the outs. Multiple distros also already had bumped their min
version to JDK7. This is not true this time around. Bumping the JDK version
is hugely impactful on the end user, and my email on the earlier thread
still reflects my thoughts on JDK compatibility:

http://mail-archives.apache.org/mod_mbox/hadoop-common-dev/201406.mbox/%3CCAGB5D2a5fEDfBApQyER_zyhc8a4Xd_ea1wJSsxxkiAiDZO9%2BNg%40mail.gmail.com%3E

Regarding classpath isolation, based on what I hear from our customers,
it's still a big problem (even after the MR classloader work). The latest
Jackson version bump was quite painful for our downstream projects, and the
HDFS client still leaks a lot of dependencies. Would welcome more
discussion of this on HADOOP-11656, Steve, Colin, and Haohui have already
chimed in.

Having the freedom to upgrade our dependencies at will would also be a big
win for us as developers.

 We could just do JDK8 in hadoop-2.10 or some such, you are definitely
 welcome to run the RM role for that release.

  Furthermore, I'm really concerned that this will be used as an
 opportunity to further break compat in more egregious ways.

  Also, are you foreseeing more compat breaks? OTOH, if we all agree that
 we should absolutely prevent compat breakages such as the client-server
 wire protocol, I feel the point of a major release is kinda lost.


Right now, the incompatible changes would be JDK8, classpath isolation, and
whatever is already in trunk. I can audit these existing trunk changes when
branch-3 is cut.

I would like to keep this list as short as possible, to preserve wire
compat and rolling upgrade. As far as major releases go, this is not one to
be scared of. However, since it's incompatible, it still needs that major
version bump.

Best,
Andrew

P.S. Vinod, the shell script rewrite is incompatible. Allen intentionally
excluded it from branch-2 for this reason.


Re: Looking to a Hadoop 3 release

2015-03-02 Thread Konstantin Shvachko
Andrew,

Hadoop 3 seems in general like a good idea to me.
1. I did not understand: do you propose to release 3.0 instead of 2.7, or in
addition to it?
   I think 2.7 is needed at least as a stabilization step for the 2.x line.

2. If Hadoop 3 and 2.x are meant to exist together, we run the risk of
split-brain behavior again, as we had with hadoop-1, hadoop-2 and other
versions. Even if that is somehow beneficial for commercial vendors, which I
don't see, for the community it has proven to be very disruptive. It would
be really good to avoid that this time.

3. Could we release Hadoop 3 directly from trunk, with a proper feature
freeze in advance? Current trunk is in the best working condition I've seen
in years - much better than when hadoop-2 was coming to life. It could
make a good alpha.
I believe we can start planning 3.0 from trunk right after 2.7 is out.

Thanks,
--Konst




Re: DISCUSSION: Patch commit criteria.

2015-03-02 Thread Konstantin Shvachko
Vinod,
I agree that triviality is hard to define and we should not add things that
can be interpreted multiple ways to the bylaws.
If something is not quite clear in the bylaws, it would make sense to
propose new phrasing, so that we can discuss it here and call a
vote once we reach agreement.
Thanks,
--Konst

On Mon, Mar 2, 2015 at 11:29 AM, Vinod Kumar Vavilapalli 
vino...@hortonworks.com wrote:

 We always needed another committer's +1 even if that isn't clear in the
 bylaws. At a minimum, we should codify this in the bylaws to avoid things
 like people committing their own patches.

 Regarding trivial changes, I always distinguish between trivial *patches*
 and trivial changes to *existing* patches. Patches, even if trivial, need to
 be +1ed by another committer. OTOH, many times, for patches that have been
 extensively reviewed, potentially for months on end, I sometimes end up making
 a small javadoc/documentation change in the last version of the patch before
 committing. It just avoids one more cycle and more delay. It's hard to
 codify this distinction, though.

 Thanks
 +Vinod

 On Feb 27, 2015, at 1:04 PM, Konstantin Shvachko shv.had...@gmail.com
 wrote:

  There were discussions on several jiras and threads recently about how
 RTC
  actually works in Hadoop.
  My opinion has always been that for a patch to be committed it needs an
  approval  (+1) of at least one committer other than the author and no
 -1s.
  The Bylaws seem to be stating just that:
  Consensus approval of active committers, but with a minimum of one +1.
  See the full version under Actions / Code Change
  http://hadoop.apache.org/bylaws.html#Decision+Making
 
  Turned out people have different readings of that part of Bylaws, and
  different opinions on how RTC should work in different cases. Some of the
  questions that were raised include:
  - Should we clarify the Code Change decision making clause in Bylaws?
  - Should there be relaxed criteria for trivial changes?
  - Can a patch be committed if approved only by a non-committer?
  - Can a patch be committed based on self-review by a committer?
  - What is the point for a non-committer to review the patch?
  Creating this thread to discuss these issues (and others that I'm sure I missed)
  and to combine multiple discussions into one.
 
  My personal opinion is that we should just stick to tradition. Good or bad,
 it
  worked for this community so far.
  I think most of the discrepancies arise from the fact that reviewers are
  hard to find. Maybe this should be the focus of improvements rather than
  the RTC rules.
 
  Thanks,
  --Konst




Build failed in Jenkins: Hadoop-common-trunk-Java8 #122

2015-03-02 Thread Apache Jenkins Server
See https://builds.apache.org/job/Hadoop-common-trunk-Java8/122/changes

Changes:

[aajisaka] HDFS-5853. Add hadoop.user.group.metrics.percentiles.intervals to 
hdfs-default.xml (aajisaka)

[ozawa] HADOOP-11634. Description of webhdfs' principal/keytab should switch 
places each other. Contributed by Brahma Reddy Battula.

[aajisaka] HADOOP-11657. Align the output of `hadoop fs -du` to be more 
Unix-like. (aajisaka)

[aajisaka] HADOOP-11615. Update ServiceLevelAuth.md for YARN. Contributed by 
Brahma Reddy Battula.

[szetszwo] HDFS-7439. Add BlockOpResponseProto's message to the exception 
messages.  Contributed by Takanobu Asanuma

--
[...truncated 9085 lines...]
[WARNING] ^
[WARNING] 
https://builds.apache.org/job/Hadoop-common-trunk-Java8/ws/hadoop-common-project/hadoop-nfs/src/main/java/org/apache/hadoop/oncrpc/XDR.java:223:
 warning: no @param for len
[WARNING] public static boolean verifyLength(XDR xdr, int len) {
[WARNING] ^
[WARNING] 
https://builds.apache.org/job/Hadoop-common-trunk-Java8/ws/hadoop-common-project/hadoop-nfs/src/main/java/org/apache/hadoop/oncrpc/XDR.java:223:
 warning: no @return
[WARNING] public static boolean verifyLength(XDR xdr, int len) {
[WARNING] ^
[WARNING] 
https://builds.apache.org/job/Hadoop-common-trunk-Java8/ws/hadoop-common-project/hadoop-nfs/src/main/java/org/apache/hadoop/oncrpc/XDR.java:235:
 warning: no @param for request
[WARNING] public static ChannelBuffer writeMessageTcp(XDR request, boolean 
last) {
[WARNING] ^
[WARNING] 
https://builds.apache.org/job/Hadoop-common-trunk-Java8/ws/hadoop-common-project/hadoop-nfs/src/main/java/org/apache/hadoop/oncrpc/XDR.java:235:
 warning: no @param for last
[WARNING] public static ChannelBuffer writeMessageTcp(XDR request, boolean 
last) {
[WARNING] ^
[WARNING] 
https://builds.apache.org/job/Hadoop-common-trunk-Java8/ws/hadoop-common-project/hadoop-nfs/src/main/java/org/apache/hadoop/oncrpc/XDR.java:235:
 warning: no @return
[WARNING] public static ChannelBuffer writeMessageTcp(XDR request, boolean 
last) {
[WARNING] ^
[WARNING] 
https://builds.apache.org/job/Hadoop-common-trunk-Java8/ws/hadoop-common-project/hadoop-nfs/src/main/java/org/apache/hadoop/oncrpc/XDR.java:247:
 warning: no @param for response
[WARNING] public static ChannelBuffer writeMessageUdp(XDR response) {
[WARNING] ^
[WARNING] 
https://builds.apache.org/job/Hadoop-common-trunk-Java8/ws/hadoop-common-project/hadoop-nfs/src/main/java/org/apache/hadoop/oncrpc/XDR.java:247:
 warning: no @return
[WARNING] public static ChannelBuffer writeMessageUdp(XDR response) {
[WARNING] ^
[WARNING] 
https://builds.apache.org/job/Hadoop-common-trunk-Java8/ws/hadoop-common-project/hadoop-nfs/src/main/java/org/apache/hadoop/oncrpc/security/Credentials.java:51:
 warning: no @param for cred
[WARNING] public static void writeFlavorAndCredentials(Credentials cred, XDR 
xdr) {
[WARNING] ^
[WARNING] 
https://builds.apache.org/job/Hadoop-common-trunk-Java8/ws/hadoop-common-project/hadoop-nfs/src/main/java/org/apache/hadoop/oncrpc/security/Credentials.java:51:
 warning: no @param for xdr
[WARNING] public static void writeFlavorAndCredentials(Credentials cred, XDR 
xdr) {
[WARNING] ^
[WARNING] 
https://builds.apache.org/job/Hadoop-common-trunk-Java8/ws/hadoop-common-project/hadoop-nfs/src/main/java/org/apache/hadoop/oncrpc/security/RpcAuthInfo.java:61:
 warning: no @param for xdr
[WARNING] public abstract void read(XDR xdr);
[WARNING] ^
[WARNING] 
https://builds.apache.org/job/Hadoop-common-trunk-Java8/ws/hadoop-common-project/hadoop-nfs/src/main/java/org/apache/hadoop/oncrpc/security/RpcAuthInfo.java:64:
 warning: no @param for xdr
[WARNING] public abstract void write(XDR xdr);
[WARNING] ^
[WARNING] 
https://builds.apache.org/job/Hadoop-common-trunk-Java8/ws/hadoop-common-project/hadoop-nfs/src/main/java/org/apache/hadoop/oncrpc/security/SecurityHandler.java:45:
 warning: no @param for request
[WARNING] public XDR unwrap(RpcCall request, byte[] data ) throws IOException {
[WARNING] ^
[INFO] Building jar: 
https://builds.apache.org/job/Hadoop-common-trunk-Java8/ws/hadoop-common-project/hadoop-nfs/target/hadoop-nfs-3.0.0-SNAPSHOT-javadoc.jar
[INFO] 
[INFO] --- maven-assembly-plugin:2.4:single (dist) @ hadoop-nfs ---
[INFO] Reading assembly descriptor: 
../../hadoop-assemblies/src/main/resources/assemblies/hadoop-nfs-dist.xml
[WARNING] The following patterns were never triggered in this artifact 
exclusion filter:
o  'org.apache.hadoop:hadoop-common'
o  'org.apache.hadoop:hadoop-hdfs'
o  'org.hsqldb:hsqldb'

[INFO] Copying files to 
https://builds.apache.org/job/Hadoop-common-trunk-Java8/ws/hadoop-common-project/hadoop-nfs/target/hadoop-nfs-3.0.0-SNAPSHOT
[INFO] 
[INFO] 
[INFO] Building Apache Hadoop KMS 3.0.0-SNAPSHOT
[INFO] 
[INFO] 

[jira] [Created] (HADOOP-11659) o.a.h.fs.FileSystem.Cache#remove should use a single hash map lookup

2015-03-02 Thread Gera Shegalov (JIRA)
Gera Shegalov created HADOOP-11659:
--

 Summary: o.a.h.fs.FileSystem.Cache#remove should use a single hash 
map lookup
 Key: HADOOP-11659
 URL: https://issues.apache.org/jira/browse/HADOOP-11659
 Project: Hadoop Common
  Issue Type: Improvement
  Components: fs
Affects Versions: 2.6.0
Reporter: Gera Shegalov
Priority: Trivial


The method looks up the same key in the same hash map potentially 3 times
{code}
if (map.containsKey(key) && fs == map.get(key)) {
  map.remove(key);
{code}

Instead it could do a single lookup
{code}
FileSystem cachedFs = map.remove(key);
{code}

and then test cachedFs == fs or something else.
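
For illustration, here is a minimal, self-contained sketch of the single-lookup
variant. It uses a plain HashMap and generic types as stand-ins rather than the
real FileSystem.Cache internals (which do this under the cache lock), so it is
only a sketch of the idea, not the actual patch. Note that on JDK 8 the
two-argument Map#remove(Object, Object) default method is close to this, though
it compares with equals() rather than ==.
{code}
import java.util.HashMap;
import java.util.Map;

/** Sketch of removing a cache entry with a single hash lookup (hypothetical types). */
public class SingleLookupRemove {

  /**
   * Removes key only if it is currently mapped to the given instance.
   * The real FileSystem.Cache#remove would do this while holding the cache lock.
   */
  static <K, V> boolean removeIfSame(Map<K, V> map, K key, V expected) {
    V cached = map.remove(key);   // one lookup instead of containsKey/get/remove
    if (cached == null) {
      return false;               // nothing was mapped to this key
    }
    if (cached == expected) {
      return true;                // removed exactly the instance we expected
    }
    map.put(key, cached);         // a different instance was mapped; put it back
    return false;
  }

  public static void main(String[] args) {
    Map<String, Object> cache = new HashMap<>();
    Object fs = new Object();
    cache.put("hdfs://nn1", fs);
    System.out.println(removeIfSame(cache, "hdfs://nn1", fs));           // true
    System.out.println(removeIfSame(cache, "hdfs://nn1", new Object())); // false
  }
}
{code}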





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HADOOP-11660) Add support for hardware crc on ARM aarch64 architecture

2015-03-02 Thread Edward Nevill (JIRA)
Edward Nevill created HADOOP-11660:
--

 Summary: Add support for hardware crc on ARM aarch64 architecture
 Key: HADOOP-11660
 URL: https://issues.apache.org/jira/browse/HADOOP-11660
 Project: Hadoop Common
  Issue Type: Improvement
  Components: native
Affects Versions: 3.0.0
 Environment: ARM aarch64 development platform
Reporter: Edward Nevill
Assignee: Edward Nevill
Priority: Minor
 Fix For: 3.0.0


This patch adds support for hardware CRC for ARM's new 64-bit architecture.

The patch is completely conditionalized on __aarch64__.

I have only added support for the non-pipelined version, as I benchmarked the
pipelined version on aarch64 and it showed no performance improvement.

The aarch64 version supports both Castagnoli and Zlib CRCs, as both of these are
supported on ARM aarch64 hardware.

To benchmark this I modified the test_bulk_crc32 test to print out the time 
taken to CRC a 1MB dataset 1000 times.

Before:

CRC 1048576 bytes @ 512 bytes per checksum X 1000 iterations = 2.55
CRC 1048576 bytes @ 512 bytes per checksum X 1000 iterations = 2.55

After:

CRC 1048576 bytes @ 512 bytes per checksum X 1000 iterations = 0.57
CRC 1048576 bytes @ 512 bytes per checksum X 1000 iterations = 0.57

So this represents a 5X performance improvement on raw CRC calculation.
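
For reference, below is a rough, JDK-only analogue of the timing loop described
above. It uses java.util.zip.CRC32 rather than the native test_bulk_crc32
harness, so the absolute numbers are not comparable to the figures quoted; it
only sketches the 1 MB / 512-bytes-per-checksum / 1000-iteration methodology.
{code}
import java.util.zip.CRC32;

/** Rough JDK-only sketch of the 1 MB x 1000-iteration CRC timing described above. */
public class CrcTiming {
  public static void main(String[] args) {
    final int bytesPerChecksum = 512;
    byte[] data = new byte[1024 * 1024];      // 1 MB of input data
    for (int i = 0; i < data.length; i++) {
      data[i] = (byte) i;
    }

    CRC32 crc = new CRC32();
    long start = System.nanoTime();
    for (int iter = 0; iter < 1000; iter++) {
      for (int off = 0; off < data.length; off += bytesPerChecksum) {
        crc.reset();                          // one checksum per 512-byte chunk
        crc.update(data, off, bytesPerChecksum);
        crc.getValue();
      }
    }
    double seconds = (System.nanoTime() - start) / 1e9;
    System.out.printf("CRC %d bytes @ %d bytes per checksum X 1000 iterations = %.2f%n",
        data.length, bytesPerChecksum, seconds);
  }
}
{code}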




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: DISCUSSION: Patch commit criteria.

2015-03-02 Thread Colin P. McCabe
I agree with Andrew and Konst here.  I don't think the language is
unclear in the rule, either... consensus with a minimum of one +1
clearly indicates that _other people_ are involved, not just one
person.

I would also mention that we created the branch committer role
specifically to make it easier to do rapid development on a new
feature.  Branch committers can commit patches to a branch without any
full committers involved at all.  When the branch is ready to merge,
the community can review it and give feedback.

best,
Colin

On Fri, Feb 27, 2015 at 2:27 PM, Andrew Wang andrew.w...@cloudera.com wrote:
 I have the same interpretation as Konst on this. +1 from at least one
 committer other than the author, no -1s.

 I don't think there should be an exclusion for trivial patches, since the
 definition of trivial is subjective. The exception here is CHANGES.txt,
 which is something we really should get rid of.

 Non-committers are still strongly encouraged to review patches even if
 their +1 is not binding. Additional reviews improve code quality. Also,
 when it comes to choosing new committers, one of the primary things I look
 for is a history of quality code reviews.

 Best,
 Andrew

 On Fri, Feb 27, 2015 at 1:04 PM, Konstantin Shvachko shv.had...@gmail.com
 wrote:

 There were discussions on several jiras and threads recently about how RTC
 actually works in Hadoop.
 My opinion has always been that for a patch to be committed it needs an
 approval  (+1) of at least one committer other than the author and no -1s.
 The Bylaws seem to be stating just that:
 Consensus approval of active committers, but with a minimum of one +1.
 See the full version under Actions / Code Change
 http://hadoop.apache.org/bylaws.html#Decision+Making

 Turned out people have different readings of that part of Bylaws, and
 different opinions on how RTC should work in different cases. Some of the
 questions that were raised include:
  - Should we clarify the Code Change decision making clause in Bylaws?
  - Should there be relaxed criteria for trivial changes?
  - Can a patch be committed if approved only by a non-committer?
  - Can a patch be committed based on self-review by a committer?
  - What is the point for a non-committer to review the patch?
 Creating this thread to discuss these issues (and others that I'm sure I missed)
 and to combine multiple discussions into one.

 My personal opinion is that we should just stick to tradition. Good or bad, it
 worked for this community so far.
 I think most of the discrepancies arise from the fact that reviewers are
 hard to find. Maybe this should be the focus of improvements rather than
 the RTC rules.

 Thanks,
 --Konst



Re: TimSort bug and its workaround

2015-03-02 Thread Colin P. McCabe
Thanks for bringing this up.  If you can find any place where an array
might realistically be larger than 67 million elements, then I guess
file a JIRA for it.  Also this array needs to be of objects, not of
primitives (quicksort is used for those in jdk7, apparently).  I can't
think of any such place off the top of my head, but I might be missing
something.

Potentially we could also file a JIRA just as a place to gather the
discussion, even if the only outcome is a release note.

best,
Colin

On Thu, Feb 26, 2015 at 12:16 AM, Tsuyoshi Ozawa oz...@apache.org wrote:
 Maybe we should discuss whether the number of elements in an array can be larger
 than 67108864 in our use cases - e.g. FairScheduler uses
 Collections.sort(), but the number of jobs isn't larger than 67108864 in
 many use cases, so we can keep using it. It's also reasonable to
 choose safe algorithms for the sake of stability.

 Thanks,
 - Tsuyoshi

 On Thu, Feb 26, 2015 at 5:04 PM, Tsuyoshi Ozawa oz...@apache.org wrote:
 Hi hadoop developers,

 In the last two weeks, a JDK bug in TimSort, which backs Collections#sort,
 has been reported. How should we deal with this problem?

 http://envisage-project.eu/timsort-specification-and-verification/
 https://bugs.openjdk.java.net/browse/JDK-8072909

 The bug causes an ArrayIndexOutOfBoundsException if the number of elements
 is larger than 67108864.

 We use the sort method in at least 77 places:
 find . -name '*.java' | xargs grep Collections.sort | wc -l
 77

 One reasonable workaround is to set the
 java.util.Arrays.useLegacyMergeSort system property to true by default.

 Thanks,
 - Tsuyoshi
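
For anyone who wants to try that workaround locally, here is a minimal,
self-contained sketch (not Hadoop code). The switch is a system property that
java.util.Arrays consults when its sorting machinery is first used, so passing
it as a JVM option (-Djava.util.Arrays.useLegacyMergeSort=true) is the reliable
way to set it.
{code}
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Sorts a large object list; run with and without the legacy merge sort flag. */
public class LegacySortCheck {
  public static void main(String[] args) {
    // Intended usage: java -Djava.util.Arrays.useLegacyMergeSort=true LegacySortCheck
    System.out.println("useLegacyMergeSort = "
        + System.getProperty("java.util.Arrays.useLegacyMergeSort"));

    List<Integer> values = new ArrayList<>();
    for (int i = 1_000_000; i > 0; i--) {
      values.add(i);
    }
    // Collections.sort on object elements goes through TimSort unless the legacy
    // flag is set; primitive arrays use a dual-pivot quicksort and are unaffected.
    Collections.sort(values);
    System.out.println("first=" + values.get(0)
        + ", last=" + values.get(values.size() - 1));
  }
}
{code}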


Re: DISCUSSION: Patch commit criteria.

2015-03-02 Thread Vinod Kumar Vavilapalli
We always needed another committer's +1 even if that isn't clear in the
bylaws. At a minimum, we should codify this in the bylaws to avoid things like
people committing their own patches.

Regarding trivial changes, I always distinguish between trivial *patches* and
trivial changes to *existing* patches. Patches, even if trivial, need to be +1ed
by another committer. OTOH, many times, for patches that have been extensively
reviewed, potentially for months on end, I sometimes end up making a small
javadoc/documentation change in the last version of the patch before committing.
It just avoids one more cycle and more delay. It's hard to codify this
distinction, though.

Thanks
+Vinod

On Feb 27, 2015, at 1:04 PM, Konstantin Shvachko shv.had...@gmail.com wrote:

 There were discussions on several jiras and threads recently about how RTC
 actually works in Hadoop.
 My opinion has always been that for a patch to be committed it needs an
 approval  (+1) of at least one committer other than the author and no -1s.
 The Bylaws seem to be stating just that:
 Consensus approval of active committers, but with a minimum of one +1.
 See the full version under Actions / Code Change
 http://hadoop.apache.org/bylaws.html#Decision+Making
 
 Turned out people have different readings of that part of Bylaws, and
 different opinions on how RTC should work in different cases. Some of the
 questions that were raised include:
 - Should we clarify the Code Change decision making clause in Bylaws?
 - Should there be relaxed criteria for trivial changes?
 - Can a patch be committed if approved only by a non-committer?
 - Can a patch be committed based on self-review by a committer?
 - What is the point for a non-committer to review the patch?
 Creating this thread to discuss these issues (and others that I'm sure I missed)
 and to combine multiple discussions into one.
 
 My personal opinion is that we should just stick to tradition. Good or bad, it
 worked for this community so far.
 I think most of the discrepancies arise from the fact that reviewers are
 hard to find. Maybe this should be the focus of improvements rather than
 the RTC rules.
 
 Thanks,
 --Konst



[jira] [Created] (HADOOP-11661) Deprecate FileUtil#copyMerge

2015-03-02 Thread Brahma Reddy Battula (JIRA)
Brahma Reddy Battula created HADOOP-11661:
-

 Summary: Deprecate FileUtil#copyMerge
 Key: HADOOP-11661
 URL: https://issues.apache.org/jira/browse/HADOOP-11661
 Project: Hadoop Common
  Issue Type: Bug
Reporter: Brahma Reddy Battula
Assignee: Brahma Reddy Battula


 FileUtil#copyMerge is currently unused in the Hadoop source tree. In branch-1, 
it had been part of the implementation of the hadoop fs -getmerge shell 
command. In branch-2, the code for that shell command was rewritten in a way 
that no longer requires this method.

Please see further details here:

https://issues.apache.org/jira/browse/HADOOP-11392?focusedCommentId=14339336page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14339336
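
If the outcome is deprecation rather than outright removal, the change would
presumably amount to an @Deprecated annotation plus a @deprecated javadoc tag on
the method. A self-contained illustration of that pattern is below; the class
and the simplified signature are stand-ins, not the real
org.apache.hadoop.fs.FileUtil.
{code}
import java.io.IOException;

/** Illustration of the deprecation pattern; not the real FileUtil. */
public class CopyMergeDeprecationExample {

  /**
   * Stand-in for FileUtil#copyMerge.
   *
   * @deprecated unused since the branch-2 rewrite of {@code hadoop fs -getmerge}.
   */
  @Deprecated
  public static boolean copyMerge(String srcDir, String dstFile) throws IOException {
    throw new UnsupportedOperationException("illustrative stub only");
  }

  public static void main(String[] args) {
    // Compiling a caller of copyMerge now produces a deprecation warning,
    // which is the intended effect of the proposed change.
    System.out.println("copyMerge carries @Deprecated in this sketch");
  }
}
{code}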



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)