[jira] [Created] (HADOOP-11662) trunk's CHANGES.txt is missing releases
Allen Wittenauer created HADOOP-11662:

Summary: trunk's CHANGES.txt is missing releases
Key: HADOOP-11662
URL: https://issues.apache.org/jira/browse/HADOOP-11662
Project: Hadoop Common
Issue Type: Bug
Components: build
Reporter: Allen Wittenauer

I've been doing some archaeological work on the release data. Looking at trunk, it's missing 0.20.205 and all of 1.x. We should either make the call to chop the changelog off at a reasonable date, or fix it to reflect the reality of these releases.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HADOOP-11663) Remove description about Java 6 from docs
Masatake Iwasaki created HADOOP-11663:

Summary: Remove description about Java 6 from docs
Key: HADOOP-11663
URL: https://issues.apache.org/jira/browse/HADOOP-11663
Project: Hadoop Common
Issue Type: Bug
Components: documentation
Reporter: Masatake Iwasaki
Assignee: Masatake Iwasaki
Priority: Minor

{{hadoop-auth/BuildingIt.md}} has:

{noformat}
Hadoop Auth, Java HTTP SPNEGO - Building It
Requirements
Java 6+
{noformat}

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: Looking to a Hadoop 3 release
+1

On Mon, Mar 2, 2015 at 3:19 PM, Andrew Wang andrew.w...@cloudera.com wrote:

Hi devs,

It's been a year and a half since 2.x went GA, and I think we're about due for a 3.x release. Notably, there are two incompatible changes I'd like to call out that will have a tremendous positive impact for our users.

First, classpath isolation, being done at HADOOP-11656, which has been a long-standing request from many downstreams and Hadoop users.

Guava etc. have been such a pain in the past. Can't wait to have a release where we don't have to worry about which versions of dependencies users want to use.

Second, bumping the source and target JDK version to JDK8 (related to HADOOP-11090), which is important since JDK7 is EOL in April 2015 (two months from now). In the past, we've had issues with our dependencies discontinuing support for old JDKs, so this will future-proof us.

Are you saying we can use lambdas without rewriting all of Hadoop in Scala?

Between the two, we'll also have quite an opportunity to clean up and upgrade our dependencies, another common user and developer request.

I'd like to propose that we start rolling a monthly-ish series of 3.0 alpha releases ASAP, with myself volunteering to take on the RM and other cat-herding responsibilities.

Will be glad to help.

There are already quite a few changes slated for 3.0 besides the above (for instance the shell script rewrite), so there's already value in a 3.0 alpha, and the more time we give downstreams to integrate, the better. This opens up discussion about inclusion of other changes, but I'm hoping to freeze incompatible changes after maybe two alphas, do a beta (with no further incompatible changes allowed), and then finally a 3.x GA. For those keeping track, that means a 3.x GA in about four months.

I would also like to stress, though, that this is not intended to be a big-bang release. For instance, it would be great if we could maintain wire compatibility between 2.x and 3.x so rolling upgrades work. Keeping branch-2 and branch-3 similar also makes backports easier, since we're likely maintaining 2.x for a while yet.

Please let me know any comments / concerns related to the above. If people are friendly to the idea, I'd like to cut a branch-3 and start working on the first alpha.

Best,
Andrew

--
Karthik Kambatla
Software Engineer, Cloudera Inc.
http://five.sentenc.es
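The lambda quip above is apt: bumping the source and target level to JDK8 lets existing Java code adopt lambdas and streams incrementally, with no rewrite. A minimal illustrative sketch (the class and data names here are made up for illustration, not taken from any Hadoop source):

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class LambdaDemo {
    // With a JDK8 source level, a filter that previously needed an
    // anonymous inner class becomes a one-line lambda in a stream pipeline.
    static List<String> finalized(List<String> blocks) {
        return blocks.stream()
                .filter(b -> b.endsWith("_finalized"))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> result = finalized(Arrays.asList("blk_1_finalized", "blk_2_rbw"));
        System.out.println(result); // prints [blk_1_finalized]
    }
}
```

The same pipeline written against JDK7 would require an explicit loop or an anonymous `Predicate`-style helper class, which is exactly the boilerplate the JDK8 bump removes.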
Re: Looking to a Hadoop 3 release
+1, this sounds like a good plan to me. Thanks a lot for volunteering to take this on, Andrew.

Best,
Aaron

On Mon, Mar 2, 2015 at 3:19 PM, Andrew Wang andrew.w...@cloudera.com wrote: [...]
Looking to a Hadoop 3 release
Hi devs,

It's been a year and a half since 2.x went GA, and I think we're about due for a 3.x release. Notably, there are two incompatible changes I'd like to call out that will have a tremendous positive impact for our users.

First, classpath isolation, being done at HADOOP-11656, which has been a long-standing request from many downstreams and Hadoop users. Second, bumping the source and target JDK version to JDK8 (related to HADOOP-11090), which is important since JDK7 is EOL in April 2015 (two months from now). In the past, we've had issues with our dependencies discontinuing support for old JDKs, so this will future-proof us. Between the two, we'll also have quite an opportunity to clean up and upgrade our dependencies, another common user and developer request.

I'd like to propose that we start rolling a monthly-ish series of 3.0 alpha releases ASAP, with myself volunteering to take on the RM and other cat-herding responsibilities. There are already quite a few changes slated for 3.0 besides the above (for instance the shell script rewrite), so there's already value in a 3.0 alpha, and the more time we give downstreams to integrate, the better. This opens up discussion about inclusion of other changes, but I'm hoping to freeze incompatible changes after maybe two alphas, do a beta (with no further incompatible changes allowed), and then finally a 3.x GA. For those keeping track, that means a 3.x GA in about four months.

I would also like to stress, though, that this is not intended to be a big-bang release. For instance, it would be great if we could maintain wire compatibility between 2.x and 3.x so rolling upgrades work. Keeping branch-2 and branch-3 similar also makes backports easier, since we're likely maintaining 2.x for a while yet.

Please let me know any comments / concerns related to the above. If people are friendly to the idea, I'd like to cut a branch-3 and start working on the first alpha.

Best,
Andrew
Re: Looking to a Hadoop 3 release
Thanks Andrew for the proposal. +1, and I will be happy to help.

--Yongjun

On Mon, Mar 2, 2015 at 3:19 PM, Andrew Wang andrew.w...@cloudera.com wrote: [...]
Re: Looking to a Hadoop 3 release
Andrew,

Thanks for bringing up the issue of moving to Java 8. Java 8 is important. However, I am not seeing a strong motivation for changing the major number; we can go to Java 8 in the 2 series. The classpath issue of HADOOP-11656 is too minor to force a major number change (no pun intended). Let's separate the issue of Java 8 from Hadoop 3.0.

sanjay

On Mar 2, 2015, at 3:19 PM, Andrew Wang andrew.w...@cloudera.com wrote: [...]
Re: Looking to a Hadoop 3 release
Second, bumping the source and target JDK version to JDK8 (related to HADOOP-11090), which is important since JDK7 is EOL in April 2015 (two months from now). In the past, we've had issues with our dependencies discontinuing support for old JDKs, so this will future-proof us.

Is moving to JDK8 fundamentally different from the move to JDK7? We are moving to JDK7 via release 2.7, which I am helping with now.

I'd like to propose that we start rolling a monthly-ish series of 3.0 alpha releases ASAP, with myself volunteering to take on the RM and other cat-herding responsibilities. There are already quite a few changes slated for 3.0 besides the above (for instance the shell script rewrite) so there's already value in a 3.0 alpha, and the more time we give downstreams to integrate, the better.

Aren't the shell script rewrite changes supposed to be compatible?

Thanks,
+Vinod
Re: Looking to a Hadoop 3 release
+1. Happy to help too.

On Mon, Mar 2, 2015 at 3:57 PM, Yongjun Zhang yzh...@cloudera.com wrote:

Thanks Andrew for the proposal. +1, and I will be happy to help.

--Yongjun

On Mon, Mar 2, 2015 at 3:19 PM, Andrew Wang andrew.w...@cloudera.com wrote: [...]
RE: Looking to a Hadoop 3 release
JDK8 support is under consideration; it looks like many issues were reported and resolved already: https://issues.apache.org/jira/browse/HADOOP-11090

-----Original Message-----
From: Andrew Wang [mailto:andrew.w...@cloudera.com]
Sent: Tuesday, March 03, 2015 7:20 AM
To: common-dev@hadoop.apache.org; mapreduce-...@hadoop.apache.org; hdfs-...@hadoop.apache.org; yarn-...@hadoop.apache.org
Subject: Looking to a Hadoop 3 release

[...]
RE: 2.7 status
Is there interest in getting the following issues into the release? Thanks!

HADOOP-10670
HADOOP-10671

Regards,
Kai

-----Original Message-----
From: Yongjun Zhang [mailto:yzh...@cloudera.com]
Sent: Monday, March 02, 2015 4:46 AM
To: hdfs-...@hadoop.apache.org
Cc: Vinod Kumar Vavilapalli; Hadoop Common; mapreduce-...@hadoop.apache.org; yarn-...@hadoop.apache.org
Subject: Re: 2.7 status

Hi,

Thanks for working on the 2.7 release.

Currently the fallback from KerberosAuthenticator to PseudoAuthenticator is enabled by default in a hardcoded way. HADOOP-10895 changes the default and requires applications (such as Oozie) to set a config property or call an API to enable the fallback. This jira has been reviewed and is almost ready to get in. However, there is a concern that we have to change the relevant applications. Please see my comment here:

https://issues.apache.org/jira/browse/HADOOP-10895?focusedCommentId=14321823&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14321823

Any of your comments will be highly appreciated. This jira was postponed from 2.6, and I think it should be no problem to skip 2.7, but your comments would help us decide what to do with it for future releases.

Thanks.

--Yongjun

On Sun, Mar 1, 2015 at 11:58 AM, Arun Murthy a...@hortonworks.com wrote:

Sounds good, thanks for the help Vinod!

Arun

From: Vinod Kumar Vavilapalli
Sent: Sunday, March 01, 2015 11:43 AM
To: Hadoop Common; Jason Lowe; Arun Murthy
Subject: Re: 2.7 status

Agreed. How about we roll an RC at the end of this week, as a Java 7+ release with the features and patches that already got in? Here's a filter tracking blocker tickets: https://issues.apache.org/jira/issues/?filter=12330598. Nine open now.

+Arun

Arun, I'd like to help get 2.7 out without further delay. Do you mind me taking over release duties?

Thanks,
+Vinod

From: Jason Lowe jl...@yahoo-inc.com.INVALID
Sent: Friday, February 13, 2015 8:11 AM
To: common-dev@hadoop.apache.org
Subject: Re: 2.7 status

I'd like to see a 2.7 release sooner rather than later. It has been almost 3 months since Hadoop 2.6 was released, and there have already been 634 JIRAs committed to 2.7. That's a lot of changes waiting for an official release.

https://issues.apache.org/jira/issues/?jql=project%20in%20%28hadoop%2Chdfs%2Cyarn%2Cmapreduce%29%20AND%20fixversion%3D2.7.0%20AND%20resolution%3DFixed

Jason

From: Sangjin Lee sj...@apache.org
To: common-dev@hadoop.apache.org
Sent: Tuesday, February 10, 2015 1:30 PM
Subject: 2.7 status

Folks,

What is the current status of the 2.7 release? I know it initially started out as a Java-7-only release, but looking at the JIRAs, that is very much not the case. Do we have a certain timeframe for 2.7, or is it time to discuss it?

Thanks,
Sangjin
Re: 2.7 status
Seems like there is already some action on the JIRA. Can you please ping the previous reviewers on the JIRA to make progress?

Thanks,
+Vinod

On Mar 1, 2015, at 12:45 PM, Yongjun Zhang yzh...@cloudera.com wrote: [...]
Re: DISCUSSION: Patch commit criteria.
On Mon, Mar 2, 2015 at 11:29 AM, Vinod Kumar Vavilapalli vino...@hortonworks.com wrote:

We always needed another committer's +1, even if that isn't clear in the bylaws. At a minimum, we should codify this in the bylaws to avoid stuff like people committing their own patches.

Regarding trivial changes, I always distinguish between trivial *patches* and trivial changes to *existing* patches. Patches, even if trivial, need to be +1ed by another committer. On the other hand, for patches that have been extensively reviewed, potentially for months, I sometimes end up making a small javadoc/documentation change to the last version of the patch before committing. It just avoids one more cycle and more delay. It's hard to codify this distinction, though. In the past, I have made trivial changes (new lines, indentation, etc.) to well-reviewed patches before committing. Even then, I believe we should upload the updated patch or the diff of trivial changes and wait for someone else (potentially a non-committer contributor) to quickly check it, to avoid making silly mistakes.

Thanks
+Vinod

On Feb 27, 2015, at 1:04 PM, Konstantin Shvachko shv.had...@gmail.com wrote:

There were discussions on several jiras and threads recently about how RTC actually works in Hadoop. My opinion has always been that for a patch to be committed, it needs an approval (+1) from at least one committer other than the author, and no -1s. The Bylaws seem to be stating just that: "Consensus approval of active committers, but with a minimum of one +1." See the full version under Actions / Code Change: http://hadoop.apache.org/bylaws.html#Decision+Making

It turned out people have different readings of that part of the Bylaws, and different opinions on how RTC should work in different cases. Some of the questions that were raised include:

- Should we clarify the Code Change decision-making clause in the Bylaws?
- Should there be relaxed criteria for trivial changes?
- Can a patch be committed if approved only by a non-committer?
- Can a patch be committed based on self-review by a committer?
- What is the point of a non-committer reviewing the patch?

I'm creating this thread to discuss these issues (and others I'm sure I missed) and to combine multiple discussions into one.

My personal opinion is that we should just stick to tradition. Good or bad, it has worked for this community so far. I think most of the discrepancies arise from the fact that reviewers are hard to find. Maybe that should be the focus of improvements, rather than the RTC rules.

Thanks,
--Konst

--
Karthik Kambatla
Software Engineer, Cloudera Inc.
http://five.sentenc.es
RE: Looking to a Hadoop 3 release
Sorry for the mistake; I thought I was sending that to my colleagues. By the way, for the JDK8 support, we (Intel) would like to investigate further and help. Thanks.

Regards,
Kai

-----Original Message-----
From: Zheng, Kai
Sent: Tuesday, March 03, 2015 8:49 AM
To: common-dev@hadoop.apache.org; mapreduce-...@hadoop.apache.org; hdfs-...@hadoop.apache.org; yarn-...@hadoop.apache.org
Subject: RE: Looking to a Hadoop 3 release

[...]
[jira] [Resolved] (HADOOP-11449) [JDK8] Cannot build on Windows: error: unexpected end tag: /ul
[ https://issues.apache.org/jira/browse/HADOOP-11449?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Steve Loughran resolved HADOOP-11449.

Resolution: Fixed
Fix Version/s: 2.7.0
Assignee: Chris Nauroth (was: Anu Engineer)

[JDK8] Cannot build on Windows: error: unexpected end tag: /ul

Key: HADOOP-11449
URL: https://issues.apache.org/jira/browse/HADOOP-11449
Project: Hadoop Common
Issue Type: Bug
Components: build
Affects Versions: site, 3.0.0, trunk-win, 2.6.0
Environment: jdk8, Windows 8.1 x64
java version 1.8.0_25
Java(TM) SE Runtime Environment (build 1.8.0_25-b18)
Java HotSpot(TM) 64-Bit Server VM (build 25.25-b02, mixed mode)
Apache Maven 3.2.5 (12a6b3acb947671f09b81f49094c53f426d8cea1; 2014-12-15T04:29:23)
Reporter: Alec Taylor
Assignee: Chris Nauroth
Labels: build, easyfix, hadoop, windows
Fix For: 2.7.0
Original Estimate: 2h
Remaining Estimate: 2h

Tried on hadoop-2.6.0-src, branch-2.5 and branch-trunk-win. All gave this error:

```
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-javadoc-plugin:2.8.1:jar (module-javadocs) on project hadoop-annotations: MavenReportException: Error while creating archive:
[ERROR] Exit code: 1 - E:\Projects\hadoop-common\hadoop-common-project\hadoop-annotations\src\main\java\org\apache\hadoop\classification\InterfaceStability.java:27: error: unexpected end tag: </ul>
[ERROR] * </ul>
[ERROR]   ^
[ERROR]
[ERROR] Command line was: C:\Program Files\Java\jdk1.8.0_25\jre\..\bin\javadoc.exe @options @packages
[ERROR]
[ERROR] Refer to the generated Javadoc files in 'E:\Projects\hadoop-common\hadoop-common-project\hadoop-annotations\target' dir.
[ERROR] -> [Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException
[ERROR]
[ERROR] After correcting the problems, you can resume the build with the command
[ERROR] mvn <goals> -rf :hadoop-annotations
```

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
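For context on the failure above: JDK8's javadoc tool enables strict HTML checking (doclint) by default, so list markup that JDK7's javadoc silently tolerated, such as an unmatched closing </ul> tag, now fails the build. A minimal sketch of the balanced markup doclint expects (the class below is illustrative, not the actual InterfaceStability source):

```java
/**
 * Stability levels used by the annotations, as a properly balanced HTML list:
 * <ul>
 *   <li>Stable</li>
 *   <li>Evolving</li>
 *   <li>Unstable</li>
 * </ul>
 * Under JDK8, a stray closing tag here fails javadoc with
 * "error: unexpected end tag", while JDK7's javadoc accepted it.
 */
public class DoclintSketch {
    /** @return the number of stability levels listed in the class comment */
    public static int levels() {
        return 3;
    }

    public static void main(String[] args) {
        System.out.println(levels()); // prints 3
    }
}
```

As an interim workaround, the strict checks can also be relaxed (e.g. by passing -Xdoclint:none to JDK8's javadoc), though fixing the markup, as this JIRA did, is the durable fix.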
RE: Looking to a Hadoop 3 release
+1

Regards,
Yi Liu

-----Original Message-----
From: Andrew Wang [mailto:andrew.w...@cloudera.com]
Sent: Tuesday, March 03, 2015 7:20 AM
To: common-dev@hadoop.apache.org; mapreduce-...@hadoop.apache.org; hdfs-...@hadoop.apache.org; yarn-...@hadoop.apache.org
Subject: Looking to a Hadoop 3 release

[...]
Re: Looking to a Hadoop 3 release
+1 (non-binding)

It would be nice to have a Hadoop 3.x release. It's my honor to help.

Regards!
Chen

On Mon, Mar 2, 2015 at 4:58 PM, Zheng, Kai kai.zh...@intel.com wrote:

Sorry for the noise; I thought I was sending to my colleagues. By the way, for the JDK8 support, we (Intel) would like to investigate further and help, thanks.

Regards,
Kai

-----Original Message-----
From: Zheng, Kai
Sent: Tuesday, March 03, 2015 8:49 AM
To: common-dev@hadoop.apache.org; mapreduce-...@hadoop.apache.org; hdfs-...@hadoop.apache.org; yarn-...@hadoop.apache.org
Subject: RE: Looking to a Hadoop 3 release

JDK8 support is under consideration; it looks like many issues were reported and resolved already. https://issues.apache.org/jira/browse/HADOOP-11090

-----Original Message-----
From: Andrew Wang [mailto:andrew.w...@cloudera.com]
Sent: Tuesday, March 03, 2015 7:20 AM
To: common-dev@hadoop.apache.org; mapreduce-...@hadoop.apache.org; hdfs-...@hadoop.apache.org; yarn-...@hadoop.apache.org
Subject: Looking to a Hadoop 3 release

[Andrew's original proposal quoted in full; trimmed]
Re: Looking to a Hadoop 3 release
Andrew,

Thanks for bringing up this discussion. I'm a little puzzled, for I feel like we are rehashing the same discussion from last year, where we agreed on a different course of action w.r.t. the switch to JDK7.

IAC, breaking compatibility for hadoop-3 is a pretty big cost, particularly for users such as Yahoo/Twitter/eBay who have several clusters between which compatibility is paramount. Now, breaking compatibility is perfectly fine over time where there is sufficient benefit, e.g. HDFS HA or YARN in hadoop-2 (v/s hadoop-1). However, I'm struggling to quantify the benefit of hadoop-3 for users against the cost of the breakage.

Given that we already agreed to put in JDK7 in 2.7, and that the classpath is a fairly minor irritant given some existing solutions (e.g. a new default classloader), how do you quantify the benefit for users? We could just do JDK8 in hadoop-2.10 or some such; you are definitely welcome to run the RM role for that release.

Furthermore, I'm really concerned that this will be used as an opportunity to further break compat in more egregious ways. Also, are you foreseeing more compat breaks? OTOH, if we all agree that we should absolutely prevent compat breakages such as the client-server wire protocol, I feel the point of a major release is kinda lost.

Overall, my biggest concern is the compatibility story vis-a-vis the benefit. Thoughts?

thanks,
Arun

From: Andrew Wang andrew.w...@cloudera.com
Sent: Monday, March 02, 2015 3:19 PM
To: common-dev@hadoop.apache.org; mapreduce-...@hadoop.apache.org; hdfs-...@hadoop.apache.org; yarn-...@hadoop.apache.org
Subject: Looking to a Hadoop 3 release

[Andrew's original proposal quoted in full; trimmed]
Re: Looking to a Hadoop 3 release
I'm +1 for migrating to Java 8 as soon as possible, in both branch-2 and trunk, as having them on the same language level makes cherrypicking stuff off trunk possible. That's particularly the case for Java 8, as it is the first major change to the language since Java 5.

w.r.t. shipping trunk as 3.x, it's going to take longer than planned. Hopefully not as long as the 2.x release process, but you never know. Which means I expect some more Hadoop 2 releases this year. We need to make the jump there too: get 2.7 out the door and include a roadmap in there for when the Java 8+ only event happens across the codebase.

-Steve

ps. for anyone who wants a pure Java 8 build today, set -Djavac.version=1.8 on the command line of a maven build. Last time I tried, there were some (minor) bits of YARN that wouldn't compile...

On 2 March 2015 at 18:31:00, Arun Murthy (a...@hortonworks.com) wrote:

[Arun's reply, including Andrew's original proposal, quoted in full; trimmed]
Re: 2.7 status
Kai, please ping the reviewers that were already looking at your patches before. If the patches go in by the end of this week, we can include them.

Thanks,
+Vinod

On Mar 2, 2015, at 7:04 PM, Zheng, Kai kai.zh...@intel.com wrote:

Would you be interested in getting the following issues into the release? Thanks!
HADOOP-10670
HADOOP-10671

Regards,
Kai

-----Original Message-----
From: Yongjun Zhang [mailto:yzh...@cloudera.com]
Sent: Monday, March 02, 2015 4:46 AM
To: hdfs-...@hadoop.apache.org
Cc: Vinod Kumar Vavilapalli; Hadoop Common; mapreduce-...@hadoop.apache.org; yarn-...@hadoop.apache.org
Subject: Re: 2.7 status

Hi,

Thanks for working on the 2.7 release.

Currently the fallback from KerberosAuthenticator to PseudoAuthenticator is enabled by default in a hardcoded way. HADOOP-10895 changes the default and requires applications (such as Oozie) to set a config property or call an API to enable the fallback. This jira has been reviewed and is almost ready to get in. However, there is a concern that we have to change the relevant applications. Please see my comment here:

https://issues.apache.org/jira/browse/HADOOP-10895?focusedCommentId=14321823&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14321823

Any of your comments will be highly appreciated. This jira was postponed from 2.6. I think it should be no problem for it to skip 2.7, but your comments would help us decide what to do with this jira for future releases.

Thanks.

--Yongjun

On Sun, Mar 1, 2015 at 11:58 AM, Arun Murthy a...@hortonworks.com wrote:

Sounds good, thanks for the help Vinod!

Arun

From: Vinod Kumar Vavilapalli
Sent: Sunday, March 01, 2015 11:43 AM
To: Hadoop Common; Jason Lowe; Arun Murthy
Subject: Re: 2.7 status

Agreed. How about we roll an RC at the end of this week, as a Java 7+ release with the features and patches that already got in? Here's a filter tracking blocker tickets: https://issues.apache.org/jira/issues/?filter=12330598. Nine open now.

+Arun

Arun, I'd like to help get 2.7 out without further delay. Do you mind me taking over release duties?

Thanks,
+Vinod

From: Jason Lowe jl...@yahoo-inc.com.INVALID
Sent: Friday, February 13, 2015 8:11 AM
To: common-dev@hadoop.apache.org
Subject: Re: 2.7 status

I'd like to see a 2.7 release sooner rather than later. It has been almost 3 months since Hadoop 2.6 was released, and there have already been 634 JIRAs committed to 2.7. That's a lot of changes waiting for an official release.

https://issues.apache.org/jira/issues/?jql=project%20in%20%28hadoop%2Chdfs%2Cyarn%2Cmapreduce%29%20AND%20fixversion%3D2.7.0%20AND%20resolution%3DFixed

Jason

From: Sangjin Lee sj...@apache.org
To: common-dev@hadoop.apache.org
Sent: Tuesday, February 10, 2015 1:30 PM
Subject: 2.7 status

Folks,

What is the current status of the 2.7 release? I know initially it started out as a java-7 only release, but looking at the JIRAs that is very much not the case. Do we have a certain timeframe for 2.7, or is it time to discuss it?

Thanks,
Sangjin
Re: Looking to a Hadoop 3 release
+1

It sounds like a good idea, especially regarding the JDK.

Regards
JB

On 03/03/2015 12:19 AM, Andrew Wang wrote:

[Andrew's original proposal quoted in full; trimmed]

--
Jean-Baptiste Onofré
jbono...@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com
Re: Looking to a Hadoop 3 release
Thanks as always for the feedback everyone. Some inline comments to Arun's email, as his were the most extensive:

> Given that we already agreed to put in JDK7 in 2.7, and that the classpath is a fairly minor irritant given some existing solutions (e.g. a new default classloader), how do you quantify the benefit for users?

I looked at our thread on this topic from last time, and we (meaning at least myself and Tucu) agreed to a one-time exception to the JDK7 bump in 2.x for practical reasons. We waited for so long that we had some assurance JDK6 was on the outs. Multiple distros also already had bumped their min version to JDK7. This is not true this time around. Bumping the JDK version is hugely impactful on the end user, and my email on the earlier thread still reflects my thoughts on JDK compatibility:

http://mail-archives.apache.org/mod_mbox/hadoop-common-dev/201406.mbox/%3CCAGB5D2a5fEDfBApQyER_zyhc8a4Xd_ea1wJSsxxkiAiDZO9%2BNg%40mail.gmail.com%3E

Regarding classpath isolation, based on what I hear from our customers, it's still a big problem (even after the MR classloader work). The latest Jackson version bump was quite painful for our downstream projects, and the HDFS client still leaks a lot of dependencies. Would welcome more discussion of this on HADOOP-11656; Steve, Colin, and Haohui have already chimed in. Having the freedom to upgrade our dependencies at will would also be a big win for us as developers.

> We could just do JDK8 in hadoop-2.10 or some such, you are definitely welcome to run the RM role for that release. Furthermore, I'm really concerned that this will be used as an opportunity to further break compat in more egregious ways. Also, are you foreseeing more compat breaks? OTOH, if we all agree that we should absolutely prevent compat breakages such as the client-server wire protocol, I feel the point of a major release is kinda lost.

Right now, the incompatible changes would be JDK8, classpath isolation, and whatever is already in trunk. I can audit these existing trunk changes when branch-3 is cut. I would like to keep this list as short as possible, to preserve wire compat and rolling upgrade. As far as major releases go, this is not one to be scared of. However, since it's incompatible, it still needs that major version bump.

Best,
Andrew

P.S. Vinod, the shell script rewrite is incompatible. Allen intentionally excluded it from branch-2 for this reason.
Re: Looking to a Hadoop 3 release
Andrew,

Hadoop 3 seems in general like a good idea to me.

1. I did not understand whether you propose to release 3.0 instead of 2.7 or in addition to it? I think 2.7 is needed at least as a stabilization step for the 2.x line.

2. If Hadoop 3 and 2.x are meant to exist together, we run a risk of manifesting split-brain behavior again, as we had with hadoop-1, hadoop-2, and other versions. Even if that were somehow beneficial for commercial vendors, which I don't see how, for the community it was proven to be very disruptive. It would be really good to avoid it this time.

3. Could we release Hadoop 3 directly from trunk, with a proper feature freeze in advance? Current trunk is in the best working condition I've seen in years, much better than when hadoop-2 was coming to life. It could make a good alpha. I believe we can start planning 3.0 from trunk right after 2.7 is out.

Thanks,
--Konst

On Mon, Mar 2, 2015 at 3:19 PM, Andrew Wang andrew.w...@cloudera.com wrote:

[Andrew's original proposal quoted in full; trimmed]
Re: DISCUSSION: Patch commit criteria.
Vinod,

I agree that triviality is hard to define and we should not add things that can be interpreted multiple ways to the bylaws. If something is not quite clear in the bylaws, it would make sense to have a proposal of new phrasing, so that we could discuss it here and call a vote upon reaching an agreement.

Thanks,
--Konst

On Mon, Mar 2, 2015 at 11:29 AM, Vinod Kumar Vavilapalli vino...@hortonworks.com wrote:

We always needed another committer's +1 even if it isn't that clear in the bylaws. At a minimum, we should codify this in the bylaws to avoid stuff like people committing their own patches.

Regarding trivial changes, I always distinguish between trivial *patches* and trivial changes to *existing* patches. Patches, even if trivial, need to be +1ed by another committer. OTOH, many a time, for patches that are extensively reviewed, potentially for months on end, I sometimes end up making a small javadoc/documentation change in the last version of the patch before committing. It just avoids one more cycle and more delay. It's hard to codify this distinction, though.

Thanks
+Vinod

On Feb 27, 2015, at 1:04 PM, Konstantin Shvachko shv.had...@gmail.com wrote:

There were discussions on several jiras and threads recently about how RTC actually works in Hadoop. My opinion has always been that for a patch to be committed it needs an approval (+1) of at least one committer other than the author, and no -1s. The Bylaws seem to be stating just that: "Consensus approval of active committers, but with a minimum of one +1." See the full version under Actions / Code Change:
http://hadoop.apache.org/bylaws.html#Decision+Making

It turned out people have different readings of that part of the Bylaws, and different opinions on how RTC should work in different cases. Some of the questions that were raised include:

- Should we clarify the Code Change decision making clause in the Bylaws?
- Should there be relaxed criteria for trivial changes?
- Can a patch be committed if approved only by a non-committer?
- Can a patch be committed based on self-review by a committer?
- What is the point for a non-committer to review the patch?

Creating this thread to discuss these (and other issues that I'm sure I missed) and to combine multiple discussions into one.

My personal opinion is we should just stick to the tradition. Good or bad, it worked for this community so far. I think most of the discrepancies arise from the fact that reviewers are hard to find. Maybe this should be the focus of improvements rather than the RTC rules.

Thanks,
--Konst
Build failed in Jenkins: Hadoop-common-trunk-Java8 #122
See https://builds.apache.org/job/Hadoop-common-trunk-Java8/122/changes

Changes:

[aajisaka] HDFS-5853. Add hadoop.user.group.metrics.percentiles.intervals to hdfs-default.xml (aajisaka)
[ozawa] HADOOP-11634. Description of webhdfs' principal/keytab should switch places each other. Contributed by Brahma Reddy Battula.
[aajisaka] HADOOP-11657. Align the output of `hadoop fs -du` to be more Unix-like. (aajisaka)
[aajisaka] HADOOP-11615. Update ServiceLevelAuth.md for YARN. Contributed by Brahma Reddy Battula.
[szetszwo] HDFS-7439. Add BlockOpResponseProto's message to the exception messages. Contributed by Takanobu Asanuma

------------------------------------------
[...truncated 9085 lines...]
[WARNING] ^
[WARNING] https://builds.apache.org/job/Hadoop-common-trunk-Java8/ws/hadoop-common-project/hadoop-nfs/src/main/java/org/apache/hadoop/oncrpc/XDR.java:223: warning: no @param for len
[WARNING] public static boolean verifyLength(XDR xdr, int len) {
[WARNING] ^
[WARNING] https://builds.apache.org/job/Hadoop-common-trunk-Java8/ws/hadoop-common-project/hadoop-nfs/src/main/java/org/apache/hadoop/oncrpc/XDR.java:223: warning: no @return
[WARNING] public static boolean verifyLength(XDR xdr, int len) {
[WARNING] ^
[WARNING] https://builds.apache.org/job/Hadoop-common-trunk-Java8/ws/hadoop-common-project/hadoop-nfs/src/main/java/org/apache/hadoop/oncrpc/XDR.java:235: warning: no @param for request
[WARNING] public static ChannelBuffer writeMessageTcp(XDR request, boolean last) {
[WARNING] ^
[WARNING] https://builds.apache.org/job/Hadoop-common-trunk-Java8/ws/hadoop-common-project/hadoop-nfs/src/main/java/org/apache/hadoop/oncrpc/XDR.java:235: warning: no @param for last
[WARNING] public static ChannelBuffer writeMessageTcp(XDR request, boolean last) {
[WARNING] ^
[WARNING] https://builds.apache.org/job/Hadoop-common-trunk-Java8/ws/hadoop-common-project/hadoop-nfs/src/main/java/org/apache/hadoop/oncrpc/XDR.java:235: warning: no @return
[WARNING] public static ChannelBuffer writeMessageTcp(XDR request, boolean last) {
[WARNING] ^
[WARNING] https://builds.apache.org/job/Hadoop-common-trunk-Java8/ws/hadoop-common-project/hadoop-nfs/src/main/java/org/apache/hadoop/oncrpc/XDR.java:247: warning: no @param for response
[WARNING] public static ChannelBuffer writeMessageUdp(XDR response) {
[WARNING] ^
[WARNING] https://builds.apache.org/job/Hadoop-common-trunk-Java8/ws/hadoop-common-project/hadoop-nfs/src/main/java/org/apache/hadoop/oncrpc/XDR.java:247: warning: no @return
[WARNING] public static ChannelBuffer writeMessageUdp(XDR response) {
[WARNING] ^
[WARNING] https://builds.apache.org/job/Hadoop-common-trunk-Java8/ws/hadoop-common-project/hadoop-nfs/src/main/java/org/apache/hadoop/oncrpc/security/Credentials.java:51: warning: no @param for cred
[WARNING] public static void writeFlavorAndCredentials(Credentials cred, XDR xdr) {
[WARNING] ^
[WARNING] https://builds.apache.org/job/Hadoop-common-trunk-Java8/ws/hadoop-common-project/hadoop-nfs/src/main/java/org/apache/hadoop/oncrpc/security/Credentials.java:51: warning: no @param for xdr
[WARNING] public static void writeFlavorAndCredentials(Credentials cred, XDR xdr) {
[WARNING] ^
[WARNING] https://builds.apache.org/job/Hadoop-common-trunk-Java8/ws/hadoop-common-project/hadoop-nfs/src/main/java/org/apache/hadoop/oncrpc/security/RpcAuthInfo.java:61: warning: no @param for xdr
[WARNING] public abstract void read(XDR xdr);
[WARNING] ^
[WARNING] https://builds.apache.org/job/Hadoop-common-trunk-Java8/ws/hadoop-common-project/hadoop-nfs/src/main/java/org/apache/hadoop/oncrpc/security/RpcAuthInfo.java:64: warning: no @param for xdr
[WARNING] public abstract void write(XDR xdr);
[WARNING] ^
[WARNING] https://builds.apache.org/job/Hadoop-common-trunk-Java8/ws/hadoop-common-project/hadoop-nfs/src/main/java/org/apache/hadoop/oncrpc/security/SecurityHandler.java:45: warning: no @param for request
[WARNING] public XDR unwrap(RpcCall request, byte[] data) throws IOException {
[WARNING] ^
[INFO] Building jar: https://builds.apache.org/job/Hadoop-common-trunk-Java8/ws/hadoop-common-project/hadoop-nfs/target/hadoop-nfs-3.0.0-SNAPSHOT-javadoc.jar
[INFO]
[INFO] --- maven-assembly-plugin:2.4:single (dist) @ hadoop-nfs ---
[INFO] Reading assembly descriptor: ../../hadoop-assemblies/src/main/resources/assemblies/hadoop-nfs-dist.xml
[WARNING] The following patterns were never triggered in this artifact exclusion filter:
o  'org.apache.hadoop:hadoop-common'
o  'org.apache.hadoop:hadoop-hdfs'
o  'org.hsqldb:hsqldb'
[INFO] Copying files to https://builds.apache.org/job/Hadoop-common-trunk-Java8/ws/hadoop-common-project/hadoop-nfs/target/hadoop-nfs-3.0.0-SNAPSHOT
[INFO]
[INFO] ------------------------------------------------------------------------
[INFO] Building Apache Hadoop KMS 3.0.0-SNAPSHOT
[INFO] ------------------------------------------------------------------------
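Each "no @param" / "no @return" warning in the log above is cleared by documenting every parameter and the return value. A minimal, self-contained sketch of the verifyLength case (the Xdr class below is an illustrative stand-in, not Hadoop's actual XDR implementation):

```java
/** Illustrates javadoc that satisfies the missing @param/@return checks. */
public class XdrDoc {
    /** Minimal stand-in for Hadoop's XDR buffer (illustrative only). */
    public static final class Xdr {
        private final byte[] data;
        private int pos = 0;
        public Xdr(byte[] data) { this.data = data; }
        int remaining() { return data.length - pos; }
    }

    /**
     * Check that the buffer holds at least {@code len} readable bytes.
     * Documenting each parameter and the return value is what clears the
     * "no @param" / "no @return" warnings seen in the build log.
     *
     * @param xdr the buffer to check
     * @param len the number of bytes expected to remain
     * @return true if at least {@code len} bytes remain
     */
    public static boolean verifyLength(Xdr xdr, int len) {
        return xdr.remaining() >= len;
    }

    public static void main(String[] args) {
        System.out.println(verifyLength(new Xdr(new byte[8]), 4)); // true
        System.out.println(verifyLength(new Xdr(new byte[8]), 9)); // false
    }
}
```

Running the javadoc tool over a file documented this way produces no missing-tag warnings.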
[jira] [Created] (HADOOP-11659) o.a.h.fs.FileSystem.Cache#remove should use a single hash map lookup
Gera Shegalov created HADOOP-11659:
--
Summary: o.a.h.fs.FileSystem.Cache#remove should use a single hash map lookup
Key: HADOOP-11659
URL: https://issues.apache.org/jira/browse/HADOOP-11659
Project: Hadoop Common
Issue Type: Improvement
Components: fs
Affects Versions: 2.6.0
Reporter: Gera Shegalov
Priority: Trivial

The method looks up the same key in the same hash map potentially 3 times:
{code}
if (map.containsKey(key) && fs == map.get(key)) {
  map.remove(key);
}
{code}
Instead it could do a single lookup:
{code}
FileSystem cachedFs = map.remove(key);
{code}
and then test cachedFs == fs or something else.
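A runnable sketch of the single-lookup pattern the issue proposes, using a plain HashMap in place of FileSystem.Cache (removeIfSame is an illustrative name, not the actual Hadoop method):

```java
import java.util.HashMap;
import java.util.Map;

public class SingleLookupRemove {
    // Remove 'fs' only if it is the instance currently cached under 'key'.
    // One hash lookup (remove) instead of three (containsKey + get + remove);
    // if a different instance was cached, it is restored.
    public static <K, V> boolean removeIfSame(Map<K, V> map, K key, V fs) {
        V cached = map.remove(key);      // the single lookup
        if (cached != null && cached != fs) {
            map.put(key, cached);        // not ours: put it back
            return false;
        }
        return cached != null;
    }

    public static void main(String[] args) {
        Map<String, Object> cache = new HashMap<>();
        Object fs = new Object();
        cache.put("hdfs://nn1", fs);
        System.out.println(removeIfSame(cache, "hdfs://nn1", fs)); // true
        System.out.println(removeIfSame(cache, "hdfs://nn1", fs)); // false (already gone)
    }
}
```

The check-then-restore branch assumes the caller holds a lock on the cache, as the real FileSystem.Cache methods are synchronized; without that, the remove/put-back pair would race.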
[jira] [Created] (HADOOP-11660) Add support for hardware crc on ARM aarch64 architecture
Edward Nevill created HADOOP-11660:
--
Summary: Add support for hardware crc on ARM aarch64 architecture
Key: HADOOP-11660
URL: https://issues.apache.org/jira/browse/HADOOP-11660
Project: Hadoop Common
Issue Type: Improvement
Components: native
Affects Versions: 3.0.0
Environment: ARM aarch64 development platform
Reporter: Edward Nevill
Assignee: Edward Nevill
Priority: Minor
Fix For: 3.0.0

This patch adds support for hardware CRC for ARM's new 64-bit architecture. The patch is completely conditionalized on __aarch64__. I have only added support for the non-pipelined version, as I benchmarked the pipelined version on aarch64 and it showed no performance improvement. The aarch64 version supports both Castagnoli and Zlib CRCs, as both of these are supported in ARM aarch64 hardware.

To benchmark this, I modified the test_bulk_crc32 test to print out the time taken to CRC a 1MB dataset 1000 times.

Before:
CRC 1048576 bytes @ 512 bytes per checksum X 1000 iterations = 2.55
CRC 1048576 bytes @ 512 bytes per checksum X 1000 iterations = 2.55

After:
CRC 1048576 bytes @ 512 bytes per checksum X 1000 iterations = 0.57
CRC 1048576 bytes @ 512 bytes per checksum X 1000 iterations = 0.57

So this represents about a 4.5X performance improvement (2.55/0.57) on raw CRC calculation.
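For a feel of the benchmark's shape, here is a hedged pure-Java analogue: java.util.zip.CRC32 implements the same Zlib polynomial mentioned above, though its timings are unrelated to the native test_bulk_crc32 numbers in the JIRA. CrcBench and crcOf are illustrative names.

```java
import java.util.zip.CRC32;

public class CrcBench {
    // Zlib CRC-32 of a byte array via the JDK's built-in implementation.
    public static long crcOf(byte[] data) {
        CRC32 crc = new CRC32();
        crc.update(data, 0, data.length);
        return crc.getValue();
    }

    public static void main(String[] args) {
        // Standard CRC-32 check value: the CRC of ASCII "123456789" is 0xCBF43926.
        System.out.printf("%08X%n", crcOf("123456789".getBytes())); // CBF43926

        // Rough timing of 1 MB x 1000 iterations, mirroring the JIRA's measurement.
        byte[] mb = new byte[1 << 20];
        long t0 = System.nanoTime();
        for (int i = 0; i < 1000; i++) crcOf(mb);
        System.out.printf("CRC 1048576 bytes X 1000 iterations = %.2f s%n",
                (System.nanoTime() - t0) / 1e9);
    }
}
```

The "123456789" check value is the conventional self-test for CRC-32 implementations, so it is a quick way to confirm two implementations compute the same polynomial.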
Re: DISCUSSION: Patch commit criteria.
I agree with Andrew and Konst here. I don't think the language in the rule is unclear, either... "consensus with a minimum of one +1" clearly indicates that _other people_ are involved, not just one person.

I would also mention that we created the branch committer role specifically to make it easier to do rapid development on a new feature. Branch committers can commit patches to a branch without any full committers involved at all. When the branch is ready to merge, the community can review it and give feedback.

best,
Colin

On Fri, Feb 27, 2015 at 2:27 PM, Andrew Wang andrew.w...@cloudera.com wrote:

I have the same interpretation as Konst on this: +1 from at least one committer other than the author, no -1s. I don't think there should be an exclusion for trivial patches, since the definition of trivial is subjective. The exception here is CHANGES.txt, which is something we really should get rid of.

Non-committers are still strongly encouraged to review patches even if their +1 is not binding. Additional reviews improve code quality. Also, when it comes to choosing new committers, one of the primary things I look for is a history of quality code reviews.

Best,
Andrew

On Fri, Feb 27, 2015 at 1:04 PM, Konstantin Shvachko shv.had...@gmail.com wrote:

[Konstantin's original message quoted in full; trimmed]
Re: TimSort bug and its workaround
Thanks for bringing this up. If you can find any place where an array might realistically be larger than 67 million elements, then I guess file a JIRA for it. Note that this array needs to be of objects, not of primitives (quicksort is apparently used for primitives in JDK 7). I can't think of any such place off the top of my head, but I might be missing something. Potentially we could also file a JIRA just as a place to gather the discussion, even if the only outcome is a release note.

best,
Colin

On Thu, Feb 26, 2015 at 12:16 AM, Tsuyoshi Ozawa oz...@apache.org wrote:

Maybe we should discuss whether the number of array elements can exceed 67108864 in our use cases. For example, FairScheduler uses Collections.sort(), but the number of jobs is not larger than 67108864 in most use cases, so we can keep using it. It is also reasonable to choose safe algorithms for the sake of stability.

Thanks,
- Tsuyoshi

On Thu, Feb 26, 2015 at 5:04 PM, Tsuyoshi Ozawa oz...@apache.org wrote:

Hi hadoop developers,

In the last two weeks, a bug in the JDK's TimSort implementation, which backs Collections#sort, was reported. How can we deal with this problem?

http://envisage-project.eu/timsort-specification-and-verification/
https://bugs.openjdk.java.net/browse/JDK-8072909

The bug causes an ArrayIndexOutOfBoundsException if the number of elements is larger than 67108864. We use the sort method in at least 77 places:

find . -name '*.java' | xargs grep Collections.sort | wc -l
77

One reasonable workaround is to set the java.util.Arrays.useLegacyMergeSort system property to true by default.

Thanks,
- Tsuyoshi
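The workaround discussed above can be sketched as follows. This is a minimal illustration, not Hadoop code: it sets the java.util.Arrays.useLegacyMergeSort system property so that the JDK falls back to the pre-TimSort merge sort for object arrays. In practice, the reliable way to set the property is on the JVM command line (-Djava.util.Arrays.useLegacyMergeSort=true), since it must be in place before java.util.Arrays decides which sort to use.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class LegacyMergeSortDemo {
    public static void main(String[] args) {
        // Request the pre-TimSort merge sort. Setting the property here only
        // takes effect if the JDK has not yet read it; the reliable form is
        // the JVM flag: -Djava.util.Arrays.useLegacyMergeSort=true
        System.setProperty("java.util.Arrays.useLegacyMergeSort", "true");

        // Collections.sort on a List of objects is the code path affected by
        // JDK-8072909 (TimSort on object arrays); primitives use quicksort.
        List<Integer> values = new ArrayList<>(Arrays.asList(5, 3, 1, 4, 2));
        Collections.sort(values);
        System.out.println(values); // [1, 2, 3, 4, 5]
    }
}
```

The sort result is identical either way for valid comparators; the property only swaps the algorithm, trading TimSort's performance for the older, known-safe merge sort.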
Re: DISCUSSION: Patch commit criteria.
We have always needed another committer's +1, even if that isn't clear in the Bylaws. At a minimum, we should codify this in the Bylaws to avoid things like people committing their own patches.

Regarding trivial changes, I always distinguish between trivial *patches* and trivial changes to *existing* patches. Patches, even trivial ones, need to be +1'ed by another committer. On the other hand, for patches that have been extensively reviewed, sometimes for months, I occasionally end up making a small javadoc/documentation change to the last version of the patch before committing. It just avoids one more cycle and more delay. It's hard to codify this distinction, though.

Thanks
+Vinod

On Feb 27, 2015, at 1:04 PM, Konstantin Shvachko shv.had...@gmail.com wrote:

[...]
[jira] [Created] (HADOOP-11661) Deprecate FileUtil#copyMerge
Brahma Reddy Battula created HADOOP-11661:

Summary: Deprecate FileUtil#copyMerge
Key: HADOOP-11661
URL: https://issues.apache.org/jira/browse/HADOOP-11661
Project: Hadoop Common
Issue Type: Bug
Reporter: Brahma Reddy Battula
Assignee: Brahma Reddy Battula

FileUtil#copyMerge is currently unused in the Hadoop source tree. In branch-1, it had been part of the implementation of the hadoop fs -getmerge shell command. In branch-2, the code for that shell command was rewritten in a way that no longer requires this method. More details here:
https://issues.apache.org/jira/browse/HADOOP-11392?focusedCommentId=14339336&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14339336
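For context, the behavior being deprecated can be sketched roughly as follows. This is a hypothetical, local-filesystem stand-in for FileUtil#copyMerge, not the real method (which operates on Hadoop FileSystem objects and takes additional parameters): it concatenates the part files in a directory, in name order, into a single output file, which is essentially what hadoop fs -getmerge does.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class CopyMergeSketch {

    // Hypothetical local stand-in for FileUtil#copyMerge: concatenate every
    // regular file under srcDir, sorted by name, into dst.
    static void copyMerge(Path srcDir, Path dst) throws IOException {
        List<Path> parts;
        try (Stream<Path> files = Files.list(srcDir)) {
            parts = files.filter(Files::isRegularFile)
                         .sorted() // deterministic part-file order
                         .collect(Collectors.toList());
        }
        try (OutputStream out = Files.newOutputStream(dst)) {
            for (Path part : parts) {
                Files.copy(part, out); // append this part's bytes
            }
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("parts");
        Files.write(dir.resolve("part-00000"), "hello ".getBytes());
        Files.write(dir.resolve("part-00001"), "world".getBytes());
        Path merged = Files.createTempFile("merged", ".txt");
        copyMerge(dir, merged);
        System.out.println(new String(Files.readAllBytes(merged))); // hello world
    }
}
```

Since the branch-2 -getmerge implementation streams the files directly, a standalone helper like this is redundant in the tree, which is the rationale for the deprecation.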