Re: Looking to a Hadoop 3 release
A heads up that I think we're getting close on the blockers for the first alpha. Looking at my list, I see two I'd still like to get in: YARN-5270 and HADOOP-13316. Will cut a branch and roll the release once those go in; my test builds have looked good thus far.

My original plan was to do alphas and then beta in Aug/Sep, but since the create-release and L changes delayed us by a few months, the beta timeframe is pushed out as well. Given that Nov/Dec is often a quiet period of development, I think a realistic new beta date is sometime early next year (Jan/Feb). FYI.

Thanks,
Andrew

On Thu, May 12, 2016 at 5:20 PM, Karthik Kambatla wrote:
> I am with Vinod on avoiding merging mostly_complete_branches to trunk since
> we are not shipping any release off it. If 3.x releases going off of trunk
> is going to help with this, I am fine with that approach. We should still
> make sure to keep trunk-incompat small and not include large features.
Re: Looking to a Hadoop 3 release
I am with Vinod on avoiding merging mostly_complete_branches to trunk since we are not shipping any release off it. If 3.x releases going off of trunk is going to help with this, I am fine with that approach. We should still make sure to keep trunk-incompat small and not include large features.

On Sat, Apr 23, 2016 at 6:53 PM, Chris Douglas wrote:
> If we're not starting branch-3/trunk, what would distinguish it from
> trunk/trunk-incompat? Is it the same mechanism with different labels?
>
> That may be a reasonable strategy when we create branch-3, as a
> release branch for beta. Releasing 3.x from trunk will help us figure
> out which incompatibilities can be called out in an upgrade guide
> (e.g., "new feature X is incompatible with uncommon configuration Y")
> and which require code changes (e.g., "data loss upgrading a cluster
> with feature X"). Given how long trunk has been unreleased, we need
> more data from deployments to triage. How to manage transitions
> between major versions will always be case-by-case; consensus on how
> we'll address generic incompatible changes is not saving any work.
>
> Once created, removing functionality from branch-3 (leaving it in
> trunk) _because_ nobody volunteers cycles to address urgent
> compatibility issues is fair. It's also more workable than asking that
> features be committed to a branch that we have no plan to release,
> even as alpha. -C
Re: Looking to a Hadoop 3 release
If we're not starting branch-3/trunk, what would distinguish it from trunk/trunk-incompat? Is it the same mechanism with different labels?

That may be a reasonable strategy when we create branch-3, as a release branch for beta. Releasing 3.x from trunk will help us figure out which incompatibilities can be called out in an upgrade guide (e.g., "new feature X is incompatible with uncommon configuration Y") and which require code changes (e.g., "data loss upgrading a cluster with feature X"). Given how long trunk has been unreleased, we need more data from deployments to triage. How to manage transitions between major versions will always be case-by-case; consensus on how we'll address generic incompatible changes is not saving any work.

Once created, removing functionality from branch-3 (leaving it in trunk) _because_ nobody volunteers cycles to address urgent compatibility issues is fair. It's also more workable than asking that features be committed to a branch that we have no plan to release, even as alpha. -C

On Fri, Apr 22, 2016 at 6:50 PM, Vinod Kumar Vavilapalli wrote:
> Tx for your replies, Andrew.
>
>> For exit criteria, how about we time box it? My plan was to do monthly
>> alphas through the summer, leading up to beta in late August / early Sep.
>> At that point we freeze and stabilize for GA in Nov/Dec.
>
> Time-boxing is a reasonable exit-criterion.
>
>> In this case, does trunk-incompat essentially become the new trunk? Or are
>> we treating trunk-incompat as a feature branch, which periodically merges
>> changes from trunk?
>
> It’s the latter. Essentially
> - trunk-incompat = trunk + only incompatible changes, periodically kept
>   up-to-date to trunk
> - trunk is always ready to ship
> - and no compatible code gets left behind
>
> The reason for my proposal like this is to address the tension between “there
> is a lot of compatible code in trunk that we are not shipping” and “don’t ship
> trunk, it has incompatibilities”. With this, we will not have (compatible)
> code not getting shipped to users.
>
> Obviously, we can forget about all of my proposal completely if everyone puts
> in all compatible code into branch-2 / branch-3 or whatever the main
> releasable branch is. This didn’t work in practice; I have seen this not
> happening prominently during 0.21, and now 3.x.
>
> There is another related issue - “my feature is nearly ready, so I’ll just
> merge it into trunk as we don’t release that anyways, but not the current
> releasable branch - I’m lazy to fix the last few stability related issues”.
> With this, we will (should) get more disciplined, take feature stability on a
> branch seriously and merge a feature branch only when it is truly ready!
>
>> For 3.x, my strawman was to release off trunk for the alphas, then branch a
>> branch-3 for the beta and onwards.
>
> Repeating above, I’m proposing continuing to make GA 3.x releases also off of
> trunk! This way only incompatible changes don’t get shipped to users - by
> design! Eventually, trunk-incompat will be latest 3.x GA + enough
> incompatible code to warrant a 4.x, 5.x etc.
>
> +Vinod
Re: Looking to a Hadoop 3 release
Tx for your replies, Andrew.

> For exit criteria, how about we time box it? My plan was to do monthly
> alphas through the summer, leading up to beta in late August / early Sep.
> At that point we freeze and stabilize for GA in Nov/Dec.

Time-boxing is a reasonable exit-criterion.

> In this case, does trunk-incompat essentially become the new trunk? Or are
> we treating trunk-incompat as a feature branch, which periodically merges
> changes from trunk?

It’s the latter. Essentially
 - trunk-incompat = trunk + only incompatible changes, periodically kept up-to-date to trunk
 - trunk is always ready to ship
 - and no compatible code gets left behind

The reason for my proposal like this is to address the tension between “there is a lot of compatible code in trunk that we are not shipping” and “don’t ship trunk, it has incompatibilities”. With this, we will not have (compatible) code not getting shipped to users.

Obviously, we can forget about all of my proposal completely if everyone puts in all compatible code into branch-2 / branch-3 or whatever the main releasable branch is. This didn’t work in practice; I have seen this not happening prominently during 0.21, and now 3.x.

There is another related issue - “my feature is nearly ready, so I’ll just merge it into trunk as we don’t release that anyways, but not the current releasable branch - I’m lazy to fix the last few stability related issues”. With this, we will (should) get more disciplined, take feature stability on a branch seriously and merge a feature branch only when it is truly ready!

> For 3.x, my strawman was to release off trunk for the alphas, then branch a
> branch-3 for the beta and onwards.

Repeating above, I’m proposing continuing to make GA 3.x releases also off of trunk! This way only incompatible changes don’t get shipped to users - by design! Eventually, trunk-incompat will be latest 3.x GA + enough incompatible code to warrant a 4.x, 5.x etc.

+Vinod
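
For readers who want the branching rule above in concrete terms, here is a minimal sketch in Python. It is only a toy model, not anything from the Hadoop tooling: the `BranchModel` class, its method names, and the change IDs are all hypothetical, and a branch is reduced to the set of changes it contains. It illustrates the stated rule "trunk-incompat = trunk + only incompatible changes, periodically kept up-to-date to trunk".

```python
from dataclasses import dataclass, field


@dataclass
class BranchModel:
    """Toy model (hypothetical): a branch is just the set of change IDs it holds."""
    trunk: set = field(default_factory=set)
    trunk_incompat: set = field(default_factory=set)

    def commit(self, change_id, incompatible=False):
        # Compatible changes land on trunk; incompatible ones only on trunk-incompat.
        target = self.trunk_incompat if incompatible else self.trunk
        target.add(change_id)

    def sync(self):
        # Periodic merge of trunk into trunk-incompat keeps it up to date.
        self.trunk_incompat |= self.trunk

    def held_back(self):
        # Changes that stay unshipped until the next major release.
        return self.trunk_incompat - self.trunk


model = BranchModel()
model.commit("compatible-fix-1")            # hypothetical change IDs
model.commit("compatible-feature-2")
model.commit("incompatible-api-removal", incompatible=True)
model.sync()

print(sorted(model.trunk))
print(sorted(model.held_back()))
```

After `sync()`, everything on trunk is also on trunk-incompat, so the only changes not shippable from trunk are the incompatible ones. That is the "no compatible code gets left behind" property, under the assumption that every compatible change is committed to trunk first.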
Re: Looking to a Hadoop 3 release
> On Apr 22, 2016, at 6:10 PM, Vinod Kumar Vavilapalli wrote:
>
> Nope.
>
> I’m proposing making a new 3.x release (as has been discussed in this thread)
> off today’s trunk (instead of creating a fresh branch-3) and create a new
> trunk-incompat where incompatible changes that we don’t want in 3.x go.
>
> This is mainly to avoid repeating the “we are not releasing 3.x off trunk”
> issue when we start thinking about 4.x or any such major release in the
> future.

The only difference between “we aren’t releasing 4.x off of trunk” and “we aren’t releasing 4.x off of trunk-incompat” is 10 characters.
Re: Looking to a Hadoop 3 release
Great comments Vinod, thanks for replying. Since trunk is a superset of branch-2.8, I think the two efforts are mostly aligned. The 2.8 blockers are likely also 3.0 blockers. For example, the create-release and L JIRAs I mentioned are in this camp.

The difference between the two is the expectation as to the level of quality. Once we get create-release and L settled, I think it's ready for an alpha. Yes, this means we ship with some known issues, but right now there's no 3.0 artifact for downstreams to compile and test against. Considering that we're shipping incompatible changes, I want to give downstreams as much opportunity to give feedback as possible.

> While welcoming the push for alphas, I think we should set some exit
> criteria. Otherwise, I can imagine us doing 3/4/5 alpha releases, and then
> getting restless about calling it beta or GA of whatever. Essentially,
> instead of today’s questions as to "why we aren’t doing a 3.x release",
> we’d be fielding a "why is 3.x still considered alpha" question. This
> happened with 2.x alpha releases too and it wasn’t fun.

For exit criteria, how about we time box it? My plan was to do monthly alphas through the summer, leading up to beta in late August / early Sep. At that point we freeze and stabilize for GA in Nov/Dec. I think we all have an interest in declaring beta/GA; no one wants eternal alpha releases.

> On an unrelated note, offline I was pitching to a bunch of contributors
> another idea to deal with rotting trunk post 3.x: *Make 3.x releases off of
> trunk directly*.
>
> What this gains us is that
> - Trunk is always nearly stable or nearly ready for releases
> - We no longer have some code lying around in some branch (today’s trunk)
>   that is not releasable because it gets mixed with other undesirable and
>   incompatible changes.
> - This needs to be coupled with more discipline on individual features -
>   medium to large features are always worked upon in branches and get
>   merged into trunk (and a nearing release!) when they are ready
> - All incompatible changes go into some sort of a trunk-incompat branch
>   and stay there till we accumulate enough of those to warrant another major
>   release.

In this case, does trunk-incompat essentially become the new trunk? Or are we treating trunk-incompat as a feature branch, which periodically merges changes from trunk?

Linux has a "next" branch, separate from master, for integrating pending feature branches. I think this is a good model, and would be even better if we published artifacts to assist with testing. However, that depends on someone stepping up to be the maintainer of the integration branch.

I really like a more stringent policy around branch merges and new feature development. That'd be great.

For 3.x, my strawman was to release off trunk for the alphas, then branch a branch-3 for the beta and onwards.

Best,
Andrew
Re: Looking to a Hadoop 3 release
Nope.

I’m proposing making a new 3.x release (as has been discussed in this thread) off today’s trunk (instead of creating a fresh branch-3) and create a new trunk-incompat where incompatible changes that we don’t want in 3.x go.

This is mainly to avoid repeating the “we are not releasing 3.x off trunk” issue when we start thinking about 4.x or any such major release in the future.

We’ll do 2.8.x independently and later figure out if 2.9 is needed or not.

+Vinod

> On Apr 22, 2016, at 5:59 PM, Allen Wittenauer wrote:
>
> Unless I’m missing something, all this proposal does is (using today’s
> branch names) effectively rename trunk to trunk-incompat and branch-2 to
> trunk. I’m unclear how moving “rotting trunk” to “rotting trunk-incompat” is
> really progress.
Re: Looking to a Hadoop 3 release
> On Apr 22, 2016, at 5:38 PM, Vinod Kumar Vavilapalli wrote:
>
> On an unrelated note, offline I was pitching to a bunch of contributors
> another idea to deal with rotting trunk post 3.x: *Make 3.x releases off of
> trunk directly*.
>
> What this gains us is that
> - Trunk is always nearly stable or nearly ready for releases
> - We no longer have some code lying around in some branch (today’s trunk)
>   that is not releasable because it gets mixed with other undesirable and
>   incompatible changes.
> - This needs to be coupled with more discipline on individual features -
>   medium to large features are always worked upon in branches and get merged
>   into trunk (and a nearing release!) when they are ready
> - All incompatible changes go into some sort of a trunk-incompat branch and
>   stay there till we accumulate enough of those to warrant another major
>   release.
>
> Thoughts?

Unless I’m missing something, all this proposal does is (using today’s branch names) effectively rename trunk to trunk-incompat and branch-2 to trunk. I’m unclear how moving “rotting trunk” to “rotting trunk-incompat” is really progress.
Re: Looking to a Hadoop 3 release
.0) >>>> and two stable releases (2.6.x and 2.7.x). It brings a lot of >> challenges in >>>> issues tracking and patch committing, not even mention the tremendous >>>> effort of release verification and voting. >>>>>> I would like to propose to wait 2.8 release become stable (may be 2nd >>>> release in 2.8 branch cause first release is alpha due to discussion in >>>> another email thread), then we can move to 3.0 as the only alpha >> release. >>>> In the meantime, we can bring more significant features (like ATS v2, >> etc.) >>>> to trunk and consolidate stable releases in 2.6.x and 2.7.x. I believe >> that >>>> make life easier. :) >>>>>> Thoughts? >>>>>> >>>>> >>>>> 2.8.0 is relatively close to shipping. I say relatively as I'm doing >>>> some work with ATS 1.5 downstream and I'd like to make sure all that >> works. >>>> There's also a large collection of S3 and swift patches needing >> attention >>>> from any reviewers with time and credentials. >>>>> >>>>> 3.x is going to take multiple iterations to stabilise, and with more >>>> changes, more significant a rollout. I'd also like to do a complete >> update >>>> of all the dependencies before a final release, so we can have less >>>> pressure to upgrade for a while, and get Sean's classloader patch in so >>>> it's slightly less visible. >>>>> >>>>> That means 3.0 is going to be an alpha release, not final. >>>>> >>>>> one thing that could be shared is any build.xml automation of the >>>> release process, to at least take away most of the manual steps in the >>>> process, to have something more repeatable. 
>>>>> >>>>> -steve >>>>> >>>>> >>>>>> Thanks, >>>>>> >>>>>> Junping >>>>>> >>>>>> From: Yongjun Zhang <yzh...@cloudera.com> >>>>>> Sent: Friday, February 19, 2016 8:05 PM >>>>>> To: hdfs-...@hadoop.apache.org >>>>>> Cc: common-dev@hadoop.apache.org; mapreduce-...@hadoop.apache.org; >>>> yarn-...@hadoop.apache.org >>>>>> Subject: Re: Looking to a Hadoop 3 release >>>>>> >>>>>> Thanks Andrew for initiating the effort! >>>>>> >>>>>> +1 on pushing 3.x with extended alpha cycle, and continuing the more >>>> stable >>>>>> 2.x releases. >>>>>> >>>>>> --Yongjun >>>>>> >>>>>> On Thu, Feb 18, 2016 at 5:58 PM, Andrew Wang < >> andrew.w...@cloudera.com> >>>>>> wrote: >>>>>> >>>>>>> Hi Kai, >>>>>>> >>>>>>> Sure, I'm open to it. It's a new major release, so we're allowed to >>>> make >>>>>>> these kinds of big changes. The idea behind the extended alpha >> cycle is >>>>>>> that downstreams can give us feedback. This way if we do anything >> too >>>>>>> radical, we can address it in the next alpha and have downstreams >>>> re-test. >>>>>>> >>>>>>> Best, >>>>>>> Andrew >>>>>>> >>>>>>> On Thu, Feb 18, 2016 at 5:23 PM, Zheng, Kai <kai.zh...@intel.com> >>>> wrote: >>>>>>> >>>>>>>> Thanks Andrew for driving this. Wonder if it's a good chance for >>>>>>>> HADOOP-12579 (Deprecate and remove WriteableRPCEngine) to be in. >> Note >>>>>>> it's >>>>>>>> not an incompatible change, but feel better to be done in the major >>>>>>> release. >>>>>>>> >>>>>>>> Regards, >>>>>>>> Kai >>>>>>>> >>>>>>>> -Original Message- >>>>>>>> From: Andrew Wang [mailto:andrew.w...@cloudera.com] >>>>>>>> Sent: Friday, February 19, 2016 7:04 AM >>>>>>>> To: hdfs-...@hadoop.apache.org; Kihwal Lee <kih...@yahoo-inc.com> >>>>>>>> Cc: mapreduce-...@hadoop.apache.org; common-dev@hadoop.apache.org; >>>>>>>> yarn-...@hadoop.a
Re: Looking to a Hadoop 3 release
I kind of echo Junping’s comment too.

While 2.8 and 3.0 don’t need to be serialized in theory, in practice I’m desperately looking for help on 2.8.0. We haven’t been converging on 2.8.0 what with 50+ blocker / critical patches still unfinished. If postponing 3.x alpha to after a 2.8.0 alpha means undivided attention from the community, I’d strongly root for such a proposal.

Thanks
+Vinod

> On Feb 20, 2016, at 9:07 PM, Andrew Wang wrote:
>
> Hi Junping, thanks for the mail, inline:
>
> On Sat, Feb 20, 2016 at 7:34 AM, Junping Du wrote:
>
>> Shall we consolidate effort for 2.8.0 and 3.0.0? It doesn't sounds
>> reasonable to have two alpha releases to go in parallel. Is EC feature the
>> main motivation of releasing hadoop 3 here? If so, I don't understand why
>> this feature cannot land on 2.8.x or 2.9.x as an alpha feature.
>
> EC is one motivation, there are others too (JDK8, shell scripts, jar
> bumps). I'm open to EC going into branch-2, but I haven't seen any
> backporting yet and it's a lot of code.
>
>> If we release 3.0 in a month like plan proposed below, it means we will
>> have 4 active releases going in parallel - two alpha releases (2.8 and 3.0)
>> and two stable releases (2.6.x and 2.7.x). It brings a lot of challenges in
>> issues tracking and patch committing, not even mention the tremendous
>> effort of release verification and voting.
>> I would like to propose to wait 2.8 release become stable (may be 2nd
>> release in 2.8 branch cause first release is alpha due to discussion in
>> another email thread), then we can move to 3.0 as the only alpha release.
>> In the meantime, we can bring more significant features (like ATS v2, etc.)
>> to trunk and consolidate stable releases in 2.6.x and 2.7.x. I believe that
>> make life easier. :)
>> Thoughts?
>
> Based on some earlier mails in this chain, I was planning to release off
> trunk. This way we avoid having to commit to yet-another-branch, and makes
> tracking easier since trunk will always be a superset of the branch-2's.
> This does mean though that trunk needs to be stable, and we need to be more
> judicious with branch merges, and quickly revert broken code.
>
> Regarding RM/voting/validation efforts, Steve mentioned some scripts that
> he uses to automate Slider releases. This is something I'd like to bring
> over to Hadoop. Ideally, publishing an RC is push-button, and it comes with
> automated validation. I think this will help with the overhead. Also, since
> these will be early alphas, and there will be a lot of them, I'm not
> expecting anyone to do endurance runs on a large cluster before casting a
> +1.
>
> Best,
> Andrew
Re: Looking to a Hadoop 3 release
Hi folks,

Very optimistically, we're still on track for a 3.0 alpha this month. Here's a JIRA query for 3.0 and 2.8:

https://issues.apache.org/jira/issues/?jql=project%20in%20(HADOOP%2C%20HDFS%2C%20MAPREDUCE%2C%20YARN)%20AND%20%22Target%20Version%2Fs%22%20in%20(3.0.0%2C%202.8.0)%20AND%20statusCategory%20not%20in%20(Complete)%20ORDER%20BY%20priority

I think two of these are true alpha blockers: HADOOP-12892 and HADOOP-12893. I'm trying to help push both of those forward.

For the rest, I think it's probably okay to delay until the next alpha, since we're planning a few alphas leading up to beta. That said, if you are the owner of a Blocker targeted at 3.0.0, I'd encourage reviving those patches. The earlier the better for incompatible changes.

In all likelihood, this first release will slip into early May, but I'll be disappointed if we don't have an RC out before ApacheCon.

Best,
Andrew

On Mon, Feb 22, 2016 at 3:19 PM, Colin P. McCabe <cmcc...@apache.org> wrote:
> I think starting a 3.0 alpha soon would be a great idea. As some
> other people commented, this would come with no compatibility
> guarantees, so that we can iron out any issues.
>
> Colin
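
The JIRA link in the message above is hard to read in its percent-encoded form. As a small aid, the sketch below decodes it with Python's standard `urllib.parse` module so the underlying JQL filter is visible. The URL is copied from the message; the variable names are only illustrative.

```python
from urllib.parse import parse_qs, urlsplit

# URL copied verbatim from the message; split across lines only for readability.
url = (
    "https://issues.apache.org/jira/issues/?jql=project%20in%20"
    "(HADOOP%2C%20HDFS%2C%20MAPREDUCE%2C%20YARN)%20AND%20"
    "%22Target%20Version%2Fs%22%20in%20(3.0.0%2C%202.8.0)%20AND%20"
    "statusCategory%20not%20in%20(Complete)%20ORDER%20BY%20priority"
)

# parse_qs percent-decodes the query string and returns a dict of value lists.
jql = parse_qs(urlsplit(url).query)["jql"][0]
print(jql)
```

Decoded, the filter reads: `project in (HADOOP, HDFS, MAPREDUCE, YARN) AND "Target Version/s" in (3.0.0, 2.8.0) AND statusCategory not in (Complete) ORDER BY priority`. In other words, it lists every unresolved issue targeted at 3.0.0 or 2.8.0 across the four projects, sorted by priority.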
Re: Looking to a Hadoop 3 release
I think starting a 3.0 alpha soon would be a great idea. As some other people commented, this would come with no compatibility guarantees, so that we can iron out any issues.

Colin

On Mon, Feb 22, 2016 at 1:26 PM, Zhe Zhang <zhezh...@cloudera.com> wrote:
> Thanks Andrew for driving the effort!
>
> +1 (non-binding) on starting the 3.0 release process now with 3.0 as an
> alpha.
>
> I wanted to echo Andrew's point that backporting EC to branch-2 is a lot of
> work. Considering that no concrete backporting plan has been proposed, it
> seems quite uncertain whether / when it can be released in 2.9. I think we
> should rather concentrate our EC dev efforts to harden key features under
> the follow-on umbrella HDFS-8031 and make it solid for a 3.0 release.
>
> Sincerely,
> Zhe
Re: Looking to a Hadoop 3 release
Thanks Andrew for driving the effort!

+1 (non-binding) on starting the 3.0 release process now with 3.0 as an alpha.

I wanted to echo Andrew's point that backporting EC to branch-2 is a lot of work. Considering that no concrete backporting plan has been proposed, it seems quite uncertain whether / when it can be released in 2.9. I think we should rather concentrate our EC dev efforts to harden key features under the follow-on umbrella HDFS-8031 and make it solid for a 3.0 release.

Sincerely,
Zhe

On Mon, Feb 22, 2016 at 9:25 AM Colin P. McCabe <cmcc...@apache.org> wrote:

> +1 for a release of 3.0. There are a lot of significant,
> compatibility-breaking, but necessary changes in this release... we've
> touched on some of them in this thread.
>
> +1 for a parallel release of 2.8 as well. I think we are pretty close
> to this, barring a dozen or so blockers.
>
> best,
> Colin
Re: Looking to a Hadoop 3 release
+1 for a release of 3.0. There are a lot of significant, compatibility-breaking, but necessary changes in this release... we've touched on some of them in this thread.

+1 for a parallel release of 2.8 as well. I think we are pretty close to this, barring a dozen or so blockers.

best,
Colin

On Mon, Feb 22, 2016 at 2:56 AM, Steve Loughran <ste...@hortonworks.com> wrote:

>
>> On 20 Feb 2016, at 15:34, Junping Du <j...@hortonworks.com> wrote:
>>
>> Shall we consolidate effort for 2.8.0 and 3.0.0? It doesn't sounds
>> reasonable to have two alpha releases to go in parallel. Is EC feature the
>> main motivation of releasing hadoop 3 here? If so, I don't understand why
>> this feature cannot land on 2.8.x or 2.9.x as an alpha feature.
>
>> If we release 3.0 in a month like plan proposed below, it means we will have
>> 4 active releases going in parallel - two alpha releases (2.8 and 3.0) and
>> two stable releases (2.6.x and 2.7.x). It brings a lot of challenges in
>> issues tracking and patch committing, not even mention the tremendous effort
>> of release verification and voting.
>> I would like to propose to wait 2.8 release become stable (may be 2nd
>> release in 2.8 branch cause first release is alpha due to discussion in
>> another email thread), then we can move to 3.0 as the only alpha release. In
>> the meantime, we can bring more significant features (like ATS v2, etc.) to
>> trunk and consolidate stable releases in 2.6.x and 2.7.x. I believe that
>> make life easier. :)
>> Thoughts?
>>
>
> 2.8.0 is relatively close to shipping. I say relatively as I'm doing some
> work with ATS 1.5 downstream and I'd like to make sure all that works.
> There's also a large collection of S3 and swift patches needing attention
> from any reviewers with time and credentials.
>
> 3.x is going to take multiple iterations to stabilise, and with more changes,
> more significant a rollout. I'd also like to do a complete update of all the
> dependencies before a final release, so we can have less pressure to upgrade
> for a while, and get Sean's classloader patch in so it's slightly less
> visible.
>
> That means 3.0 is going to be an alpha release, not final.
>
> one thing that could be shared is any build.xml automation of the release
> process, to at least take away most of the manual steps in the process, to
> have something more repeatable.
>
> -steve
Re: Looking to a Hadoop 3 release
> On 20 Feb 2016, at 15:34, Junping Du <j...@hortonworks.com> wrote:
>
> Shall we consolidate effort for 2.8.0 and 3.0.0? It doesn't sounds reasonable
> to have two alpha releases to go in parallel. Is EC feature the main
> motivation of releasing hadoop 3 here? If so, I don't understand why this
> feature cannot land on 2.8.x or 2.9.x as an alpha feature.
> If we release 3.0 in a month like plan proposed below, it means we will have
> 4 active releases going in parallel - two alpha releases (2.8 and 3.0) and
> two stable releases (2.6.x and 2.7.x). It brings a lot of challenges in
> issues tracking and patch committing, not even mention the tremendous effort
> of release verification and voting.
> I would like to propose to wait 2.8 release become stable (may be 2nd release
> in 2.8 branch cause first release is alpha due to discussion in another email
> thread), then we can move to 3.0 as the only alpha release. In the meantime,
> we can bring more significant features (like ATS v2, etc.) to trunk and
> consolidate stable releases in 2.6.x and 2.7.x. I believe that make life
> easier. :)
> Thoughts?
>

2.8.0 is relatively close to shipping. I say relatively as I'm doing some work with ATS 1.5 downstream and I'd like to make sure all that works. There's also a large collection of S3 and swift patches needing attention from any reviewers with time and credentials.

3.x is going to take multiple iterations to stabilise, and with more changes, more significant a rollout. I'd also like to do a complete update of all the dependencies before a final release, so we can have less pressure to upgrade for a while, and get Sean's classloader patch in so it's slightly less visible.

That means 3.0 is going to be an alpha release, not final.

one thing that could be shared is any build.xml automation of the release process, to at least take away most of the manual steps in the process, to have something more repeatable.

-steve

> Thanks,
>
> Junping
Re: Looking to a Hadoop 3 release
Hi Junping, thanks for the mail, inline:

On Sat, Feb 20, 2016 at 7:34 AM, Junping Du wrote:

> Shall we consolidate effort for 2.8.0 and 3.0.0? It doesn't sounds
> reasonable to have two alpha releases to go in parallel. Is EC feature the
> main motivation of releasing hadoop 3 here? If so, I don't understand why
> this feature cannot land on 2.8.x or 2.9.x as an alpha feature.
>

EC is one motivation, there are others too (JDK8, shell scripts, jar bumps). I'm open to EC going into branch-2, but I haven't seen any backporting yet and it's a lot of code.

> If we release 3.0 in a month like plan proposed below, it means we will
> have 4 active releases going in parallel - two alpha releases (2.8 and 3.0)
> and two stable releases (2.6.x and 2.7.x). It brings a lot of challenges in
> issues tracking and patch committing, not even mention the tremendous
> effort of release verification and voting.
> I would like to propose to wait 2.8 release become stable (may be 2nd
> release in 2.8 branch cause first release is alpha due to discussion in
> another email thread), then we can move to 3.0 as the only alpha release.
> In the meantime, we can bring more significant features (like ATS v2, etc.)
> to trunk and consolidate stable releases in 2.6.x and 2.7.x. I believe that
> make life easier. :)
> Thoughts?
>

Based on some earlier mails in this chain, I was planning to release off trunk. This way we avoid having to commit to yet-another-branch, and makes tracking easier since trunk will always be a superset of the branch-2's. This does mean though that trunk needs to be stable, and we need to be more judicious with branch merges, and quickly revert broken code.

Regarding RM/voting/validation efforts, Steve mentioned some scripts that he uses to automate Slider releases. This is something I'd like to bring over to Hadoop. Ideally, publishing an RC is push-button, and it comes with automated validation. I think this will help with the overhead. Also, since these will be early alphas, and there will be a lot of them, I'm not expecting anyone to do endurance runs on a large cluster before casting a +1.

Best,
Andrew
Re: Looking to a Hadoop 3 release
Shall we consolidate effort for 2.8.0 and 3.0.0? It doesn't sounds reasonable to have two alpha releases to go in parallel. Is EC feature the main motivation of releasing hadoop 3 here? If so, I don't understand why this feature cannot land on 2.8.x or 2.9.x as an alpha feature.

If we release 3.0 in a month like plan proposed below, it means we will have 4 active releases going in parallel - two alpha releases (2.8 and 3.0) and two stable releases (2.6.x and 2.7.x). It brings a lot of challenges in issues tracking and patch committing, not even mention the tremendous effort of release verification and voting.

I would like to propose to wait 2.8 release become stable (may be 2nd release in 2.8 branch cause first release is alpha due to discussion in another email thread), then we can move to 3.0 as the only alpha release. In the meantime, we can bring more significant features (like ATS v2, etc.) to trunk and consolidate stable releases in 2.6.x and 2.7.x. I believe that make life easier. :)

Thoughts?

Thanks,

Junping

From: Yongjun Zhang <yzh...@cloudera.com>
Sent: Friday, February 19, 2016 8:05 PM
To: hdfs-...@hadoop.apache.org
Cc: common-dev@hadoop.apache.org; mapreduce-...@hadoop.apache.org; yarn-...@hadoop.apache.org
Subject: Re: Looking to a Hadoop 3 release

Thanks Andrew for initiating the effort!

+1 on pushing 3.x with extended alpha cycle, and continuing the more stable 2.x releases.

--Yongjun
Re: Looking to a Hadoop 3 release
Thanks Andrew for initiating the effort!

+1 on pushing 3.x with extended alpha cycle, and continuing the more stable 2.x releases.

--Yongjun

On Thu, Feb 18, 2016 at 5:58 PM, Andrew Wang <andrew.w...@cloudera.com> wrote:

> Hi Kai,
>
> Sure, I'm open to it. It's a new major release, so we're allowed to make
> these kinds of big changes. The idea behind the extended alpha cycle is
> that downstreams can give us feedback. This way if we do anything too
> radical, we can address it in the next alpha and have downstreams re-test.
>
> Best,
> Andrew
>
> On Thu, Feb 18, 2016 at 5:23 PM, Zheng, Kai <kai.zh...@intel.com> wrote:
>
> > Thanks Andrew for driving this. Wonder if it's a good chance for
> > HADOOP-12579 (Deprecate and remove WriteableRPCEngine) to be in. Note it's
> > not an incompatible change, but feel better to be done in the major release.
> >
> > Regards,
> > Kai
> >
> > -----Original Message-----
> > From: Andrew Wang [mailto:andrew.w...@cloudera.com]
> > Sent: Friday, February 19, 2016 7:04 AM
> > To: hdfs-...@hadoop.apache.org; Kihwal Lee <kih...@yahoo-inc.com>
> > Cc: mapreduce-...@hadoop.apache.org; common-dev@hadoop.apache.org;
> > yarn-...@hadoop.apache.org
> > Subject: Re: Looking to a Hadoop 3 release
> >
> > Hi Kihwal,
> >
> > I think there's still value in continuing the 2.x releases. 3.x comes with
> > the incompatible bump to a JDK8 runtime, and also the fact that 3.x won't
> > be beta or GA for some number of months. In the meanwhile, it'd be good to
> > keep putting out regular, stable 2.x releases.
> >
> > Best,
> > Andrew
> >
> >
> > On Thu, Feb 18, 2016 at 2:50 PM, Kihwal Lee <kih...@yahoo-inc.com.invalid>
> > wrote:
> >
> > > Moving Hadoop 3 forward sounds fine. If EC is one of the main
> > > motivations, are we getting rid of branch-2.8?
> > >
> > > Kihwal
> > >
> > > From: Andrew Wang <andrew.w...@cloudera.com>
> > > To: "common-dev@hadoop.apache.org" <common-dev@hadoop.apache.org>
> > > Cc: "yarn-...@hadoop.apache.org" <yarn-...@hadoop.apache.org>;
> > > "mapreduce-...@hadoop.apache.org" <mapreduce-...@hadoop.apache.org>;
> > > hdfs-dev <hdfs-...@hadoop.apache.org>
> > > Sent: Thursday, February 18, 2016 4:35 PM
> > > Subject: Re: Looking to a Hadoop 3 release
> > >
> > > Hi all,
> > >
> > > Reviving this thread. I've seen renewed interest in a trunk release
> > > since HDFS erasure coding has not yet made it to branch-2. Along with
> > > JDK8, the shell script rewrite, and many other improvements, I think
> > > it's time to revisit Hadoop 3.0 release plans.
> > >
> > > My overall plan is still the same as in my original email: a series of
> > > regular alpha releases leading up to beta and GA. Alpha releases make
> > > it easier for downstreams to integrate with our code, and making them
> > > regular means features can be included when they are ready.
> > >
> > > I know there are some incompatible changes waiting in the wings (i.e.
> > > HDFS-6984 making FileStatus a PB rather than Writable, some of
> > > HADOOP-9991 bumping dependency versions) that would be good to get in.
> > > If you have changes like this, please set the target version to 3.0.0
> > > and mark them "Incompatible". We can use this JIRA query to track:
> > >
> > > https://issues.apache.org/jira/issues/?jql=project%20in%20(HADOOP%2C%20HDFS%2C%20YARN%2C%20MAPREDUCE)%20and%20%22Target%20Version%2Fs%22%20%3D%20%223.0.0%22%20and%20resolution%3D%22unresolved%22%20and%20%22Hadoop%20Flags%22%3D%22Incompatible%20change%22%20order%20by%20priority
> > >
> > > There's some release-related stuff that needs to be sorted out
> > > (namely, the new CHANGES.txt and release note generation from Yetus),
> > > but I'd tentatively like to roll the first alpha a month out, so third
> > > week of March.
> > >
> > > Best,
> > > Andrew
> > >
> > > On Mon, Mar 9, 2015 at 7:23 PM, Raymie Stata <rst...@altiscale.com>
> > > wrote:
> > >
> > > > Avoiding the use of JDK8 language features (and, presumably, APIs)
> > > > means you've abandoned #1, i.e., you haven't (really) bumped the JDK
> > > > source version to JDK8.
> > > >
> > > > Also, note that releasing from trunk
Re: Looking to a Hadoop 3 release
+1 for the plan to start cutting 3.x alpha releases. Thanks for the initiative Andrew!

On Fri, Feb 19, 2016 at 6:19 AM, Steve Loughran wrote:

>
> > On 19 Feb 2016, at 11:27, Dmitry Sivachenko wrote:
> >
> >> On 19 Feb 2016, at 01:35, Andrew Wang wrote:
> >>
> >> Hi all,
> >>
> >> Reviving this thread. I've seen renewed interest in a trunk release since
> >> HDFS erasure coding has not yet made it to branch-2. Along with JDK8, the
> >> shell script rewrite, and many other improvements, I think it's time to
> >> revisit Hadoop 3.0 release plans.
> >>
>
> It's time to start ... I suspect it'll take a while to stabilise. I look
> forward to the new shell scripts already
>
> One thing I do want there is for all the alpha releases to make clear that
> there are no compatibility policies here; protocols may change and there is
> no requirement of the first 3.x release to be compatible with all the 3.0.x
> alphas. That's something we missed out on the 2.0.x-alpha process, or at
> least not repeated often enough.
>
> > Hello,
> >
> > any chance IPv6 support (HADOOP-11890) will be finished before 3.0 comes
> > out?
> >
> > Thanks!
>
> sounds like a good time for a status update on the FB work - and anything
> people can do to test it would be appreciated by all. That includes testing
> on ipv4 systems, and especially, IPv4/v6 systems with Kerberos turned on
> and both MIT and AD kerberos servers. At the same time, IPv6 support ought
> to be something that could be added in.
>
> I don't have any opinions on timescale, but
>
> +1 to anything related to classpath isolation
> +1 to a careful bump of versions of dependencies.
> +1 to fixing the outstanding Java 8 migration issues, especially the big
> Jersey patch that's just been updated.
> +1 to switching to JIRA-created release notes
>
> Having been doing the slider releases recently, it's clear to me that you
> can do a lot in automating the release process itself. All those steps in
> the release runbook can be turned into targets in a special ant release.xml
> build file, calling maven, gpg, etc.
>
> I think doing something like this for 3.0 will significantly benefit both
> the release phase here but the future releases
>
> This is the slider one:
> https://github.com/apache/incubator-slider/blob/develop/bin/release.xml
>
> It doesn't replace maven, instead it choreographs that along with all the
> other steps: signing and checksumming artifacts, publishing them, voting
>
> it includes
> -refusing to release if the git repo is modified
> -making the various git branch/tag/push operations
> -issuing the various mvn versions:update commands
> -signing
> -publishing via asf SVN
> -using GET calls too verify the artifacts made it
> -generating the vote and vote result emails (it even counts the votes)
>
> I recommend this is included as part of the release process. It does make
> a difference; we can now cut new releases with no human intervention other
> than editing a properties file and running different targets as the process
> goes through its release and vote phases.
>
> -Steve
Re: Looking to a Hadoop 3 release
> On 19 Feb 2016, at 11:27, Dmitry Sivachenko wrote:
>
>> On 19 Feb 2016, at 01:35, Andrew Wang wrote:
>>
>> Hi all,
>>
>> Reviving this thread. I've seen renewed interest in a trunk release since
>> HDFS erasure coding has not yet made it to branch-2. Along with JDK8, the
>> shell script rewrite, and many other improvements, I think it's time to
>> revisit Hadoop 3.0 release plans.
>>

It's time to start ... I suspect it'll take a while to stabilise. I look forward to the new shell scripts already

One thing I do want there is for all the alpha releases to make clear that there are no compatibility policies here; protocols may change and there is no requirement of the first 3.x release to be compatible with all the 3.0.x alphas. That's something we missed out on the 2.0.x-alpha process, or at least not repeated often enough.

> Hello,
>
> any chance IPv6 support (HADOOP-11890) will be finished before 3.0 comes out?
>
> Thanks!
>

sounds like a good time for a status update on the FB work - and anything people can do to test it would be appreciated by all. That includes testing on ipv4 systems, and especially, IPv4/v6 systems with Kerberos turned on and both MIT and AD kerberos servers. At the same time, IPv6 support ought to be something that could be added in.

I don't have any opinions on timescale, but

+1 to anything related to classpath isolation
+1 to a careful bump of versions of dependencies.
+1 to fixing the outstanding Java 8 migration issues, especially the big Jersey patch that's just been updated.
+1 to switching to JIRA-created release notes

Having been doing the slider releases recently, it's clear to me that you can do a lot in automating the release process itself. All those steps in the release runbook can be turned into targets in a special ant release.xml build file, calling maven, gpg, etc.

I think doing something like this for 3.0 will significantly benefit both the release phase here but the future releases

This is the slider one:
https://github.com/apache/incubator-slider/blob/develop/bin/release.xml

It doesn't replace maven, instead it choreographs that along with all the other steps: signing and checksumming artifacts, publishing them, voting

it includes
-refusing to release if the git repo is modified
-making the various git branch/tag/push operations
-issuing the various mvn versions:update commands
-signing
-publishing via asf SVN
-using GET calls too verify the artifacts made it
-generating the vote and vote result emails (it even counts the votes)

I recommend this is included as part of the release process. It does make a difference; we can now cut new releases with no human intervention other than editing a properties file and running different targets as the process goes through its release and vote phases.

-Steve
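
For illustration only, two of the checks listed in the message above (refusing to release from a modified git checkout, and counting votes) could be sketched as follows. This is Python rather than the Ant release.xml the message describes, and it is not taken from the Slider build file: the function names, the (voter, value, binding) tuple layout, and the pass rule of at least three binding +1 votes with more binding +1 than binding -1 are all assumptions made for the sketch.

```python
import subprocess


def worktree_is_clean(porcelain_output: str) -> bool:
    """Return True when `git status --porcelain` printed nothing."""
    return porcelain_output.strip() == ""


def read_git_status(repo_dir: str = ".") -> str:
    """Run `git status --porcelain` in repo_dir and return its output."""
    result = subprocess.run(
        ["git", "status", "--porcelain"],
        cwd=repo_dir, capture_output=True, text=True, check=True,
    )
    return result.stdout


def tally_votes(votes):
    """Count votes given as (voter, value, binding) tuples.

    value is +1, 0 or -1. Assumed pass rule (hypothetical, for this
    sketch): at least three binding +1 votes and more binding +1 than
    binding -1 votes.
    """
    binding_plus = sum(1 for _, v, b in votes if b and v == 1)
    binding_minus = sum(1 for _, v, b in votes if b and v == -1)
    non_binding_plus = sum(1 for _, v, b in votes if not b and v == 1)
    passed = binding_plus >= 3 and binding_plus > binding_minus
    return {
        "binding_plus": binding_plus,
        "binding_minus": binding_minus,
        "non_binding_plus": non_binding_plus,
        "passed": passed,
    }
```

A release script along these lines would call worktree_is_clean(read_git_status()) first and stop if it returns False, then feed the collected votes to tally_votes() when drafting the result email.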
Re: Looking to a Hadoop 3 release
> On 19 Feb 2016, at 01:35, Andrew Wang wrote:
>
> Hi all,
>
> Reviving this thread. I've seen renewed interest in a trunk release since
> HDFS erasure coding has not yet made it to branch-2. Along with JDK8, the
> shell script rewrite, and many other improvements, I think it's time to
> revisit Hadoop 3.0 release plans.
>

Hello,

any chance IPv6 support (HADOOP-11890) will be finished before 3.0 comes out?

Thanks!
Re: Looking to a Hadoop 3 release
+1 for the 3.0 release plan and continuing 2.x releases. I'm thinking we should consider stopping new 2.x minor releases after 3.x reaches GA. Thanks, Akira On 2/19/16 10:33, Gangumalla, Uma wrote: Yes. I think starting 3.0 release with alpha is good idea. So it would get some time to reach the beta or GA. +1 for the plan. For the compatibility purposes and as current stable versions, we should continue 2.x releases anyway. Thanks Andrew for starting the thread. Regards, Uma On 2/18/16, 3:04 PM, "Andrew Wang" <andrew.w...@cloudera.com> wrote: Hi Kihwal, I think there's still value in continuing the 2.x releases. 3.x comes with the incompatible bump to a JDK8 runtime, and also the fact that 3.x won't be beta or GA for some number of months. In the meanwhile, it'd be good to keep putting out regular, stable 2.x releases. Best, Andrew On Thu, Feb 18, 2016 at 2:50 PM, Kihwal Lee <kih...@yahoo-inc.com.invalid> wrote: Moving Hadoop 3 forward sounds fine. If EC is one of the main motivations, are we getting rid of branch-2.8? Kihwal From: Andrew Wang <andrew.w...@cloudera.com> To: "common-dev@hadoop.apache.org" <common-dev@hadoop.apache.org> Cc: "yarn-...@hadoop.apache.org" <yarn-...@hadoop.apache.org>; " mapreduce-...@hadoop.apache.org" <mapreduce-...@hadoop.apache.org>; hdfs-dev <hdfs-...@hadoop.apache.org> Sent: Thursday, February 18, 2016 4:35 PM Subject: Re: Looking to a Hadoop 3 release Hi all, Reviving this thread. I've seen renewed interest in a trunk release since HDFS erasure coding has not yet made it to branch-2. Along with JDK8, the shell script rewrite, and many other improvements, I think it's time to revisit Hadoop 3.0 release plans. My overall plan is still the same as in my original email: a series of regular alpha releases leading up to beta and GA. Alpha releases make it easier for downstreams to integrate with our code, and making them regular means features can be included when they are ready. 
I know there are some incompatible changes waiting in the wings (i.e. HDFS-6984 making FileStatus a PB rather than Writable, some of HADOOP-9991 bumping dependency versions) that would be good to get in. If you have changes like this, please set the target version to 3.0.0 and mark them "Incompatible". We can use this JIRA query to track: https://issues.apache.org/jira/issues/?jql=project%20in%20(HADOOP%2C%20HDFS%2C%20YARN%2C%20MAPREDUCE)%20and%20%22Target%20Version%2Fs%22%20%3D%20%223.0.0%22%20and%20resolution%3D%22unresolved%22%20and%20%22Hadoop%20Flags%22%3D%22Incompatible%20change%22%20order%20by%20priority There's some release-related stuff that needs to be sorted out (namely, the new CHANGES.txt and release note generation from Yetus), but I'd tentatively like to roll the first alpha a month out, so third week of March. Best, Andrew On Mon, Mar 9, 2015 at 7:23 PM, Raymie Stata <rst...@altiscale.com> wrote: Avoiding the use of JDK8 language features (and, presumably, APIs) means you've abandoned #1, i.e., you haven't (really) bumped the JDK source version to JDK8. Also, note that releasing from trunk is a way of achieving #3, it's not a way of abandoning it. On Mon, Mar 9, 2015 at 7:10 PM, Andrew Wang <andrew.w...@cloudera.com> wrote: Hi Raymie, Konst proposed just releasing off of trunk rather than cutting a branch-2, and there was general agreement there. So, consider #3 abandoned. 1&2 can be achieved at the same time, we just need to avoid using JDK8 language features in trunk so things can be backported. Best, Andrew On Mon, Mar 9, 2015 at 7:01 PM, Raymie Stata <rst...@altiscale.com> wrote: In this (and the related threads), I see the following three requirements: 1. "Bump the source JDK version to JDK8" (ie, drop JDK7 support). 2. "We'll still be releasing 2.x releases for a while, with similar feature sets as 3.x." 3. Avoid the "risk of split-brain behavior" by "minimize backporting headaches. Pulling trunk > branch-2 > branch-2.x is already tedious.
Adding a branch-3, branch-3.x would be obnoxious." These three cannot be achieved at the same time. Which do we abandon? On Mon, Mar 9, 2015 at 12:45 PM, sanjay Radia <sanjayo...@gmail.com> wrote: On Mar 5, 2015, at 3:21 PM, Siddharth Seth <ss...@apache.org> wrote: 2) Simplification of configs - potentially separating client side configs and those used by daemons. This is another source of perpetual confusion for users. + 1 on this. sanjay
Re: Looking to a Hadoop 3 release
Hi Kai, Sure, I'm open to it. It's a new major release, so we're allowed to make these kinds of big changes. The idea behind the extended alpha cycle is that downstreams can give us feedback. This way if we do anything too radical, we can address it in the next alpha and have downstreams re-test. Best, Andrew On Thu, Feb 18, 2016 at 5:23 PM, Zheng, Kai <kai.zh...@intel.com> wrote: > Thanks Andrew for driving this. Wonder if it's a good chance for > HADOOP-12579 (Deprecate and remove WriteableRPCEngine) to be in. Note it's > not an incompatible change, but feel better to be done in the major release. > > Regards, > Kai > > -Original Message- > From: Andrew Wang [mailto:andrew.w...@cloudera.com] > Sent: Friday, February 19, 2016 7:04 AM > To: hdfs-...@hadoop.apache.org; Kihwal Lee <kih...@yahoo-inc.com> > Cc: mapreduce-...@hadoop.apache.org; common-dev@hadoop.apache.org; > yarn-...@hadoop.apache.org > Subject: Re: Looking to a Hadoop 3 release > > Hi Kihwal, > > I think there's still value in continuing the 2.x releases. 3.x comes with > the incompatible bump to a JDK8 runtime, and also the fact that 3.x won't > be beta or GA for some number of months. In the meanwhile, it'd be good to > keep putting out regular, stable 2.x releases. > > Best, > Andrew > > > On Thu, Feb 18, 2016 at 2:50 PM, Kihwal Lee <kih...@yahoo-inc.com.invalid> > wrote: > > > Moving Hadoop 3 forward sounds fine. If EC is one of the main > > motivations, are we getting rid of branch-2.8? > > > > Kihwal > > > > From: Andrew Wang <andrew.w...@cloudera.com> > > To: "common-dev@hadoop.apache.org" <common-dev@hadoop.apache.org> > > Cc: "yarn-...@hadoop.apache.org" <yarn-...@hadoop.apache.org>; " > > mapreduce-...@hadoop.apache.org" <mapreduce-...@hadoop.apache.org>; > > hdfs-dev <hdfs-...@hadoop.apache.org> > > Sent: Thursday, February 18, 2016 4:35 PM > > Subject: Re: Looking to a Hadoop 3 release > > > > Hi all, > > > > Reviving this thread. 
I've seen renewed interest in a trunk release > > since HDFS erasure coding has not yet made it to branch-2. Along with > > JDK8, the shell script rewrite, and many other improvements, I think > > it's time to revisit Hadoop 3.0 release plans. > > > > My overall plan is still the same as in my original email: a series of > > regular alpha releases leading up to beta and GA. Alpha releases make > > it easier for downstreams to integrate with our code, and making them > > regular means features can be included when they are ready. > > > > I know there are some incompatible changes waiting in the wings (i.e. > > HDFS-6984 making FileStatus a PB rather than Writable, some of > > HADOOP-9991 bumping dependency versions) that would be good to get in. > > If you have changes like this, please set the target version to 3.0.0 > > and mark them "Incompatible". We can use this JIRA query to track: > > > > > > https://issues.apache.org/jira/issues/?jql=project%20in%20(HADOOP%2C%2 > > 0HDFS%2C%20YARN%2C%20MAPREDUCE)%20and%20%22Target%20Version%2Fs%22%20% > > 3D%20%223.0.0%22%20and%20resolution%3D%22unresolved%22%20and%20%22Hado > > op%20Flags%22%3D%22Incompatible%20change%22%20order%20by%20priority > > > > There's some release-related stuff that needs to be sorted out > > (namely, the new CHANGES.txt and release note generation from Yetus), > > but I'd tentatively like to roll the first alpha a month out, so third > > week of March. > > > > Best, > > Andrew > > > > On Mon, Mar 9, 2015 at 7:23 PM, Raymie Stata <rst...@altiscale.com> > wrote: > > > > > Avoiding the use of JDK8 language features (and, presumably, APIs) > > > means you've abandoned #1, i.e., you haven't (really) bumped the JDK > > > source version to JDK8. > > > > > > Also, note that releasing from trunk is a way of achieving #3, it's > > > not a way of abandoning it. 
> > > > > > > > > > > > On Mon, Mar 9, 2015 at 7:10 PM, Andrew Wang > > > <andrew.w...@cloudera.com> > > > wrote: > > > > Hi Raymie, > > > > > > > > Konst proposed just releasing off of trunk rather than cutting a > > > branch-2, > > > > and there was general agreement there. So, consider #3 abandoned. > > > > 1&2 > > can > > > > be achieved at the same time, we just need to avoid using JDK8 > > > > language features in trunk so things can be backported. > > > > > > > > Best,
Re: Looking to a Hadoop 3 release
Another thing to throw in there is the dependency/classpath isolation (HADOOP-11656). Some efforts have already been made by Sean, and it'd be great to complete this to have a much better dependency isolation solution for 3.x. On Thu, Feb 18, 2016 at 5:33 PM, Gangumalla, Uma <uma.ganguma...@intel.com> wrote: > Yes. I think starting 3.0 release with alpha is good idea. So it would get > some time to reach the beta or GA. > > +1 for the plan. > > For the compatibility purposes and as current stable versions, we should > continue 2.x releases anyway. > > Thanks Andrew for starting the thread. > > Regards, > Uma > > On 2/18/16, 3:04 PM, "Andrew Wang" <andrew.w...@cloudera.com> wrote: > > >Hi Kihwal, > > > >I think there's still value in continuing the 2.x releases. 3.x comes with > >the incompatible bump to a JDK8 runtime, and also the fact that 3.x won't > >be beta or GA for some number of months. In the meanwhile, it'd be good to > >keep putting out regular, stable 2.x releases. > > > >Best, > >Andrew > > > > > >On Thu, Feb 18, 2016 at 2:50 PM, Kihwal Lee <kih...@yahoo-inc.com.invalid > > > >wrote: > > > >> Moving Hadoop 3 forward sounds fine. If EC is one of the main > >>motivations, > >> are we getting rid of branch-2.8? > >> > >> Kihwal > >> > >> From: Andrew Wang <andrew.w...@cloudera.com> > >> To: "common-dev@hadoop.apache.org" <common-dev@hadoop.apache.org> > >> Cc: "yarn-...@hadoop.apache.org" <yarn-...@hadoop.apache.org>; " > >> mapreduce-...@hadoop.apache.org" <mapreduce-...@hadoop.apache.org>; > >> hdfs-dev <hdfs-...@hadoop.apache.org> > >> Sent: Thursday, February 18, 2016 4:35 PM > >> Subject: Re: Looking to a Hadoop 3 release > >> > >> Hi all, > >> > >> Reviving this thread. I've seen renewed interest in a trunk release > >>since > >> HDFS erasure coding has not yet made it to branch-2. Along with JDK8, > >>the > >> shell script rewrite, and many other improvements, I think it's time to > >> revisit Hadoop 3.0 release plans. 
> >> > >> My overall plan is still the same as in my original email: a series of > >> regular alpha releases leading up to beta and GA. Alpha releases make it > >> easier for downstreams to integrate with our code, and making them > >>regular > >> means features can be included when they are ready. > >> > >> I know there are some incompatible changes waiting in the wings > >> (i.e. HDFS-6984 making FileStatus a PB rather than Writable, some of > >> HADOOP-9991 bumping dependency versions) that would be good to get in. > >>If > >> you have changes like this, please set the target version to 3.0.0 and > >>mark > >> them "Incompatible". We can use this JIRA query to track: > >> > >> > >> > >> > https://issues.apache.org/jira/issues/?jql=project%20in%20(HADOOP%2C%20HD > >>FS%2C%20YARN%2C%20MAPREDUCE)%20and%20%22Target%20Version%2Fs%22%20%3D%20% > >>223.0.0%22%20and%20resolution%3D%22unresolved%22%20and%20%22Hadoop%20Flag > >>s%22%3D%22Incompatible%20change%22%20order%20by%20priority > >> > >> There's some release-related stuff that needs to be sorted out (namely, > >>the > >> new CHANGES.txt and release note generation from Yetus), but I'd > >> tentatively like to roll the first alpha a month out, so third week of > >> March. > >> > >> Best, > >> Andrew > >> > >> On Mon, Mar 9, 2015 at 7:23 PM, Raymie Stata <rst...@altiscale.com> > >>wrote: > >> > >> > Avoiding the use of JDK8 language features (and, presumably, APIs) > >> > means you've abandoned #1, i.e., you haven't (really) bumped the JDK > >> > source version to JDK8. > >> > > >> > Also, note that releasing from trunk is a way of achieving #3, it's > >> > not a way of abandoning it. > >> > > >> > > >> > > >> > On Mon, Mar 9, 2015 at 7:10 PM, Andrew Wang <andrew.w...@cloudera.com > > > >> > wrote: > >> > > Hi Raymie, > >> > > > >> > > Konst proposed just releasing off of trunk rather than cutting a > >> > branch-2, > >> > > and there was general agreement there. So, consider #3 abandoned. 
> >>1&2 > >> can > >> > > be achieved at the
Re: Looking to a Hadoop 3 release
Yes. I think starting 3.0 release with alpha is good idea. So it would get some time to reach the beta or GA. +1 for the plan. For the compatibility purposes and as current stable versions, we should continue 2.x releases anyway. Thanks Andrew for starting the thread. Regards, Uma On 2/18/16, 3:04 PM, "Andrew Wang" <andrew.w...@cloudera.com> wrote: >Hi Kihwal, > >I think there's still value in continuing the 2.x releases. 3.x comes with >the incompatible bump to a JDK8 runtime, and also the fact that 3.x won't >be beta or GA for some number of months. In the meanwhile, it'd be good to >keep putting out regular, stable 2.x releases. > >Best, >Andrew > > >On Thu, Feb 18, 2016 at 2:50 PM, Kihwal Lee <kih...@yahoo-inc.com.invalid> >wrote: > >> Moving Hadoop 3 forward sounds fine. If EC is one of the main >>motivations, >> are we getting rid of branch-2.8? >> >> Kihwal >> >> From: Andrew Wang <andrew.w...@cloudera.com> >> To: "common-dev@hadoop.apache.org" <common-dev@hadoop.apache.org> >> Cc: "yarn-...@hadoop.apache.org" <yarn-...@hadoop.apache.org>; " >> mapreduce-...@hadoop.apache.org" <mapreduce-...@hadoop.apache.org>; >> hdfs-dev <hdfs-...@hadoop.apache.org> >> Sent: Thursday, February 18, 2016 4:35 PM >> Subject: Re: Looking to a Hadoop 3 release >> >> Hi all, >> >> Reviving this thread. I've seen renewed interest in a trunk release >>since >> HDFS erasure coding has not yet made it to branch-2. Along with JDK8, >>the >> shell script rewrite, and many other improvements, I think it's time to >> revisit Hadoop 3.0 release plans. >> >> My overall plan is still the same as in my original email: a series of >> regular alpha releases leading up to beta and GA. Alpha releases make it >> easier for downstreams to integrate with our code, and making them >>regular >> means features can be included when they are ready. >> >> I know there are some incompatible changes waiting in the wings >> (i.e. 
HDFS-6984 making FileStatus a PB rather than Writable, some of >> HADOOP-9991 bumping dependency versions) that would be good to get in. >>If >> you have changes like this, please set the target version to 3.0.0 and >>mark >> them "Incompatible". We can use this JIRA query to track: >> >> >> >>https://issues.apache.org/jira/issues/?jql=project%20in%20(HADOOP%2C%20HD >>FS%2C%20YARN%2C%20MAPREDUCE)%20and%20%22Target%20Version%2Fs%22%20%3D%20% >>223.0.0%22%20and%20resolution%3D%22unresolved%22%20and%20%22Hadoop%20Flag >>s%22%3D%22Incompatible%20change%22%20order%20by%20priority >> >> There's some release-related stuff that needs to be sorted out (namely, >>the >> new CHANGES.txt and release note generation from Yetus), but I'd >> tentatively like to roll the first alpha a month out, so third week of >> March. >> >> Best, >> Andrew >> >> On Mon, Mar 9, 2015 at 7:23 PM, Raymie Stata <rst...@altiscale.com> >>wrote: >> >> > Avoiding the use of JDK8 language features (and, presumably, APIs) >> > means you've abandoned #1, i.e., you haven't (really) bumped the JDK >> > source version to JDK8. >> > >> > Also, note that releasing from trunk is a way of achieving #3, it's >> > not a way of abandoning it. >> > >> > >> > >> > On Mon, Mar 9, 2015 at 7:10 PM, Andrew Wang <andrew.w...@cloudera.com> >> > wrote: >> > > Hi Raymie, >> > > >> > > Konst proposed just releasing off of trunk rather than cutting a >> > branch-2, >> > > and there was general agreement there. So, consider #3 abandoned. >>1&2 >> can >> > > be achieved at the same time, we just need to avoid using JDK8 >>language >> > > features in trunk so things can be backported. >> > > >> > > Best, >> > > Andrew >> > > >> > > On Mon, Mar 9, 2015 at 7:01 PM, Raymie Stata <rst...@altiscale.com> >> > wrote: >> > > >> > >> In this (and the related threads), I see the following three >> > requirements: >> > >> >> > >> 1. "Bump the source JDK version to JDK8" (ie, drop JDK7 support). >> > >> >> > >> 2. 
"We'll still be releasing 2.x releases for a while, with similar >> > >> feature sets as 3.x." >> > >> >> > >> 3. Avoid the "risk of split-brain behavior" by "minimize >>backporting >> > >> headaches. Pulling trunk > branch-2 > branch-2.x is already >>tedious. >> > >> Adding a branch-3, branch-3.x would be obnoxious." >> > >> >> > >> These three cannot be achieved at the same time. Which do we >>abandon? >> > >> >> > >> >> > >> On Mon, Mar 9, 2015 at 12:45 PM, sanjay Radia >><sanjayo...@gmail.com> >> > >> wrote: >> > >> > >> > >> >> On Mar 5, 2015, at 3:21 PM, Siddharth Seth <ss...@apache.org> >> wrote: >> > >> >> >> > >> >> 2) Simplification of configs - potentially separating client >>side >> > >> configs >> > >> >> and those used by daemons. This is another source of perpetual >> > confusion >> > >> >> for users. >> > >> > + 1 on this. >> > >> > >> > >> > sanjay >> > >> >> > >> >> >>
RE: Looking to a Hadoop 3 release
Thanks Andrew for driving this. Wonder if it's a good chance for HADOOP-12579 (Deprecate and remove WriteableRPCEngine) to be in. Note it's not an incompatible change, but feel better to be done in the major release. Regards, Kai -Original Message- From: Andrew Wang [mailto:andrew.w...@cloudera.com] Sent: Friday, February 19, 2016 7:04 AM To: hdfs-...@hadoop.apache.org; Kihwal Lee <kih...@yahoo-inc.com> Cc: mapreduce-...@hadoop.apache.org; common-dev@hadoop.apache.org; yarn-...@hadoop.apache.org Subject: Re: Looking to a Hadoop 3 release Hi Kihwal, I think there's still value in continuing the 2.x releases. 3.x comes with the incompatible bump to a JDK8 runtime, and also the fact that 3.x won't be beta or GA for some number of months. In the meanwhile, it'd be good to keep putting out regular, stable 2.x releases. Best, Andrew On Thu, Feb 18, 2016 at 2:50 PM, Kihwal Lee <kih...@yahoo-inc.com.invalid> wrote: > Moving Hadoop 3 forward sounds fine. If EC is one of the main > motivations, are we getting rid of branch-2.8? > > Kihwal > > From: Andrew Wang <andrew.w...@cloudera.com> > To: "common-dev@hadoop.apache.org" <common-dev@hadoop.apache.org> > Cc: "yarn-...@hadoop.apache.org" <yarn-...@hadoop.apache.org>; " > mapreduce-...@hadoop.apache.org" <mapreduce-...@hadoop.apache.org>; > hdfs-dev <hdfs-...@hadoop.apache.org> > Sent: Thursday, February 18, 2016 4:35 PM > Subject: Re: Looking to a Hadoop 3 release > > Hi all, > > Reviving this thread. I've seen renewed interest in a trunk release > since HDFS erasure coding has not yet made it to branch-2. Along with > JDK8, the shell script rewrite, and many other improvements, I think > it's time to revisit Hadoop 3.0 release plans. > > My overall plan is still the same as in my original email: a series of > regular alpha releases leading up to beta and GA. Alpha releases make > it easier for downstreams to integrate with our code, and making them > regular means features can be included when they are ready. 
> > I know there are some incompatible changes waiting in the wings (i.e. > HDFS-6984 making FileStatus a PB rather than Writable, some of > HADOOP-9991 bumping dependency versions) that would be good to get in. > If you have changes like this, please set the target version to 3.0.0 > and mark them "Incompatible". We can use this JIRA query to track: > > > https://issues.apache.org/jira/issues/?jql=project%20in%20(HADOOP%2C%2 > 0HDFS%2C%20YARN%2C%20MAPREDUCE)%20and%20%22Target%20Version%2Fs%22%20% > 3D%20%223.0.0%22%20and%20resolution%3D%22unresolved%22%20and%20%22Hado > op%20Flags%22%3D%22Incompatible%20change%22%20order%20by%20priority > > There's some release-related stuff that needs to be sorted out > (namely, the new CHANGES.txt and release note generation from Yetus), > but I'd tentatively like to roll the first alpha a month out, so third > week of March. > > Best, > Andrew > > On Mon, Mar 9, 2015 at 7:23 PM, Raymie Stata <rst...@altiscale.com> wrote: > > > Avoiding the use of JDK8 language features (and, presumably, APIs) > > means you've abandoned #1, i.e., you haven't (really) bumped the JDK > > source version to JDK8. > > > > Also, note that releasing from trunk is a way of achieving #3, it's > > not a way of abandoning it. > > > > > > > > On Mon, Mar 9, 2015 at 7:10 PM, Andrew Wang > > <andrew.w...@cloudera.com> > > wrote: > > > Hi Raymie, > > > > > > Konst proposed just releasing off of trunk rather than cutting a > > branch-2, > > > and there was general agreement there. So, consider #3 abandoned. > > > 1&2 > can > > > be achieved at the same time, we just need to avoid using JDK8 > > > language features in trunk so things can be backported. > > > > > > Best, > > > Andrew > > > > > > On Mon, Mar 9, 2015 at 7:01 PM, Raymie Stata > > > <rst...@altiscale.com> > > wrote: > > > > > >> In this (and the related threads), I see the following three > > requirements: > > >> > > >> 1. "Bump the source JDK version to JDK8" (ie, drop JDK7 support). 
> > >> > > >> 2. "We'll still be releasing 2.x releases for a while, with > > >> similar feature sets as 3.x." > > >> > > >> 3. Avoid the "risk of split-brain behavior" by "minimize > > >> backporting headaches. Pulling trunk > branch-2 > branch-2.x is already > > >> tedious. > > >> Adding a branch-3, branch-3.x would be obnoxious." > > >> > > >> These three cannot be achieved at the same time. Which do we abandon? > > >> > > >> > > >> On Mon, Mar 9, 2015 at 12:45 PM, sanjay Radia > > >> <sanjayo...@gmail.com> > > >> wrote: > > >> > > > >> >> On Mar 5, 2015, at 3:21 PM, Siddharth Seth <ss...@apache.org> > wrote: > > >> >> > > >> >> 2) Simplification of configs - potentially separating client > > >> >> side > > >> configs > > >> >> and those used by daemons. This is another source of perpetual > > confusion > > >> >> for users. > > >> > + 1 on this. > > >> > > > >> > sanjay > > >> > > > > >
Re: Looking to a Hadoop 3 release
Hi Kihwal, I think there's still value in continuing the 2.x releases. 3.x comes with the incompatible bump to a JDK8 runtime, and also the fact that 3.x won't be beta or GA for some number of months. In the meanwhile, it'd be good to keep putting out regular, stable 2.x releases. Best, Andrew On Thu, Feb 18, 2016 at 2:50 PM, Kihwal Lee <kih...@yahoo-inc.com.invalid> wrote: > Moving Hadoop 3 forward sounds fine. If EC is one of the main motivations, > are we getting rid of branch-2.8? > > Kihwal > > From: Andrew Wang <andrew.w...@cloudera.com> > To: "common-dev@hadoop.apache.org" <common-dev@hadoop.apache.org> > Cc: "yarn-...@hadoop.apache.org" <yarn-...@hadoop.apache.org>; " > mapreduce-...@hadoop.apache.org" <mapreduce-...@hadoop.apache.org>; > hdfs-dev <hdfs-...@hadoop.apache.org> > Sent: Thursday, February 18, 2016 4:35 PM > Subject: Re: Looking to a Hadoop 3 release > > Hi all, > > Reviving this thread. I've seen renewed interest in a trunk release since > HDFS erasure coding has not yet made it to branch-2. Along with JDK8, the > shell script rewrite, and many other improvements, I think it's time to > revisit Hadoop 3.0 release plans. > > My overall plan is still the same as in my original email: a series of > regular alpha releases leading up to beta and GA. Alpha releases make it > easier for downstreams to integrate with our code, and making them regular > means features can be included when they are ready. > > I know there are some incompatible changes waiting in the wings > (i.e. HDFS-6984 making FileStatus a PB rather than Writable, some of > HADOOP-9991 bumping dependency versions) that would be good to get in. If > you have changes like this, please set the target version to 3.0.0 and mark > them "Incompatible". 
We can use this JIRA query to track: > > > https://issues.apache.org/jira/issues/?jql=project%20in%20(HADOOP%2C%20HDFS%2C%20YARN%2C%20MAPREDUCE)%20and%20%22Target%20Version%2Fs%22%20%3D%20%223.0.0%22%20and%20resolution%3D%22unresolved%22%20and%20%22Hadoop%20Flags%22%3D%22Incompatible%20change%22%20order%20by%20priority > > There's some release-related stuff that needs to be sorted out (namely, the > new CHANGES.txt and release note generation from Yetus), but I'd > tentatively like to roll the first alpha a month out, so third week of > March. > > Best, > Andrew > > On Mon, Mar 9, 2015 at 7:23 PM, Raymie Stata <rst...@altiscale.com> wrote: > > > Avoiding the use of JDK8 language features (and, presumably, APIs) > > means you've abandoned #1, i.e., you haven't (really) bumped the JDK > > source version to JDK8. > > > > Also, note that releasing from trunk is a way of achieving #3, it's > > not a way of abandoning it. > > > > > > > > On Mon, Mar 9, 2015 at 7:10 PM, Andrew Wang <andrew.w...@cloudera.com> > > wrote: > > > Hi Raymie, > > > > > > Konst proposed just releasing off of trunk rather than cutting a > > branch-2, > > > and there was general agreement there. So, consider #3 abandoned. 1&2 > can > > > be achieved at the same time, we just need to avoid using JDK8 language > > > features in trunk so things can be backported. > > > > > > Best, > > > Andrew > > > > > > On Mon, Mar 9, 2015 at 7:01 PM, Raymie Stata <rst...@altiscale.com> > > wrote: > > > > > >> In this (and the related threads), I see the following three > > requirements: > > >> > > >> 1. "Bump the source JDK version to JDK8" (ie, drop JDK7 support). > > >> > > >> 2. "We'll still be releasing 2.x releases for a while, with similar > > >> feature sets as 3.x." > > >> > > >> 3. Avoid the "risk of split-brain behavior" by "minimize backporting > > >> headaches. Pulling trunk > branch-2 > branch-2.x is already tedious. > > >> Adding a branch-3, branch-3.x would be obnoxious." 
> > >> > > >> These three cannot be achieved at the same time. Which do we abandon? > > >> > > >> > > >> On Mon, Mar 9, 2015 at 12:45 PM, sanjay Radia <sanjayo...@gmail.com> > > >> wrote: > > >> > > > >> >> On Mar 5, 2015, at 3:21 PM, Siddharth Seth <ss...@apache.org> > wrote: > > >> >> > > >> >> 2) Simplification of configs - potentially separating client side > > >> configs > > >> >> and those used by daemons. This is another source of perpetual > > confusion > > >> >> for users. > > >> > + 1 on this. > > >> > > > >> > sanjay > > >> > > > > >
Re: Looking to a Hadoop 3 release
Moving Hadoop 3 forward sounds fine. If EC is one of the main motivations, are we getting rid of branch-2.8? Kihwal From: Andrew Wang <andrew.w...@cloudera.com> To: "common-dev@hadoop.apache.org" <common-dev@hadoop.apache.org> Cc: "yarn-...@hadoop.apache.org" <yarn-...@hadoop.apache.org>; "mapreduce-...@hadoop.apache.org" <mapreduce-...@hadoop.apache.org>; hdfs-dev <hdfs-...@hadoop.apache.org> Sent: Thursday, February 18, 2016 4:35 PM Subject: Re: Looking to a Hadoop 3 release Hi all, Reviving this thread. I've seen renewed interest in a trunk release since HDFS erasure coding has not yet made it to branch-2. Along with JDK8, the shell script rewrite, and many other improvements, I think it's time to revisit Hadoop 3.0 release plans. My overall plan is still the same as in my original email: a series of regular alpha releases leading up to beta and GA. Alpha releases make it easier for downstreams to integrate with our code, and making them regular means features can be included when they are ready. I know there are some incompatible changes waiting in the wings (i.e. HDFS-6984 making FileStatus a PB rather than Writable, some of HADOOP-9991 bumping dependency versions) that would be good to get in. If you have changes like this, please set the target version to 3.0.0 and mark them "Incompatible". We can use this JIRA query to track: https://issues.apache.org/jira/issues/?jql=project%20in%20(HADOOP%2C%20HDFS%2C%20YARN%2C%20MAPREDUCE)%20and%20%22Target%20Version%2Fs%22%20%3D%20%223.0.0%22%20and%20resolution%3D%22unresolved%22%20and%20%22Hadoop%20Flags%22%3D%22Incompatible%20change%22%20order%20by%20priority There's some release-related stuff that needs to be sorted out (namely, the new CHANGES.txt and release note generation from Yetus), but I'd tentatively like to roll the first alpha a month out, so third week of March. 
Best, Andrew On Mon, Mar 9, 2015 at 7:23 PM, Raymie Stata <rst...@altiscale.com> wrote: > Avoiding the use of JDK8 language features (and, presumably, APIs) > means you've abandoned #1, i.e., you haven't (really) bumped the JDK > source version to JDK8. > > Also, note that releasing from trunk is a way of achieving #3, it's > not a way of abandoning it. > > > > On Mon, Mar 9, 2015 at 7:10 PM, Andrew Wang <andrew.w...@cloudera.com> > wrote: > > Hi Raymie, > > > > Konst proposed just releasing off of trunk rather than cutting a > branch-2, > > and there was general agreement there. So, consider #3 abandoned. 1&2 can > > be achieved at the same time, we just need to avoid using JDK8 language > > features in trunk so things can be backported. > > > > Best, > > Andrew > > > > On Mon, Mar 9, 2015 at 7:01 PM, Raymie Stata <rst...@altiscale.com> > wrote: > > > >> In this (and the related threads), I see the following three > requirements: > >> > >> 1. "Bump the source JDK version to JDK8" (ie, drop JDK7 support). > >> > >> 2. "We'll still be releasing 2.x releases for a while, with similar > >> feature sets as 3.x." > >> > >> 3. Avoid the "risk of split-brain behavior" by "minimize backporting > >> headaches. Pulling trunk > branch-2 > branch-2.x is already tedious. > >> Adding a branch-3, branch-3.x would be obnoxious." > >> > >> These three cannot be achieved at the same time. Which do we abandon? > >> > >> > >> On Mon, Mar 9, 2015 at 12:45 PM, sanjay Radia <sanjayo...@gmail.com> > >> wrote: > >> > > >> >> On Mar 5, 2015, at 3:21 PM, Siddharth Seth <ss...@apache.org> wrote: > >> >> > >> >> 2) Simplification of configs - potentially separating client side > >> configs > >> >> and those used by daemons. This is another source of perpetual > confusion > >> >> for users. > >> > + 1 on this. > >> > > >> > sanjay > >> >
Re: Looking to a Hadoop 3 release
Hi all, Reviving this thread. I've seen renewed interest in a trunk release since HDFS erasure coding has not yet made it to branch-2. Along with JDK8, the shell script rewrite, and many other improvements, I think it's time to revisit Hadoop 3.0 release plans. My overall plan is still the same as in my original email: a series of regular alpha releases leading up to beta and GA. Alpha releases make it easier for downstreams to integrate with our code, and making them regular means features can be included when they are ready. I know there are some incompatible changes waiting in the wings (i.e. HDFS-6984 making FileStatus a PB rather than Writable, some of HADOOP-9991 bumping dependency versions) that would be good to get in. If you have changes like this, please set the target version to 3.0.0 and mark them "Incompatible". We can use this JIRA query to track: https://issues.apache.org/jira/issues/?jql=project%20in%20(HADOOP%2C%20HDFS%2C%20YARN%2C%20MAPREDUCE)%20and%20%22Target%20Version%2Fs%22%20%3D%20%223.0.0%22%20and%20resolution%3D%22unresolved%22%20and%20%22Hadoop%20Flags%22%3D%22Incompatible%20change%22%20order%20by%20priority There's some release-related stuff that needs to be sorted out (namely, the new CHANGES.txt and release note generation from Yetus), but I'd tentatively like to roll the first alpha a month out, so third week of March. Best, Andrew On Mon, Mar 9, 2015 at 7:23 PM, Raymie Stata wrote: > Avoiding the use of JDK8 language features (and, presumably, APIs) > means you've abandoned #1, i.e., you haven't (really) bumped the JDK > source version to JDK8. > > Also, note that releasing from trunk is a way of achieving #3, it's > not a way of abandoning it. > > > > On Mon, Mar 9, 2015 at 7:10 PM, Andrew Wang > wrote: > > Hi Raymie, > > > > Konst proposed just releasing off of trunk rather than cutting a > branch-2, > > and there was general agreement there. So, consider #3 abandoned.
1&2 can > > be achieved at the same time, we just need to avoid using JDK8 language > > features in trunk so things can be backported. > > > > Best, > > Andrew > > > > On Mon, Mar 9, 2015 at 7:01 PM, Raymie Stata > wrote: > > > >> In this (and the related threads), I see the following three > requirements: > >> > >> 1. "Bump the source JDK version to JDK8" (ie, drop JDK7 support). > >> > >> 2. "We'll still be releasing 2.x releases for a while, with similar > >> feature sets as 3.x." > >> > >> 3. Avoid the "risk of split-brain behavior" by "minimize backporting > >> headaches. Pulling trunk > branch-2 > branch-2.x is already tedious. > >> Adding a branch-3, branch-3.x would be obnoxious." > >> > >> These three cannot be achieved at the same time. Which do we abandon? > >> > >> > >> On Mon, Mar 9, 2015 at 12:45 PM, sanjay Radia > >> wrote: > >> > > >> >> On Mar 5, 2015, at 3:21 PM, Siddharth Seth wrote: > >> >> > >> >> 2) Simplification of configs - potentially separating client side > >> configs > >> >> and those used by daemons. This is another source of perpetual > confusion > >> >> for users. > >> > + 1 on this. > >> > > >> > sanjay > >> >
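The tracking link in the message above is a percent-encoded JQL query, which is hard to read or adapt as posted. The following is a minimal sketch, assuming only Python's standard library, that decodes the link back into JQL and re-encodes it. The names `TRACKING_URL`, `decode_jql`, and `build_tracking_url` are hypothetical helpers made up for this illustration; they are not part of JIRA or of any Hadoop tooling.

```python
from urllib.parse import urlsplit, parse_qs, quote

# Tracking link copied from the message above, split across lines for readability.
TRACKING_URL = (
    "https://issues.apache.org/jira/issues/?jql="
    "project%20in%20(HADOOP%2C%20HDFS%2C%20YARN%2C%20MAPREDUCE)"
    "%20and%20%22Target%20Version%2Fs%22%20%3D%20%223.0.0%22"
    "%20and%20resolution%3D%22unresolved%22"
    "%20and%20%22Hadoop%20Flags%22%3D%22Incompatible%20change%22"
    "%20order%20by%20priority"
)


def decode_jql(url):
    """Return the human-readable JQL carried in the URL's `jql` parameter."""
    return parse_qs(urlsplit(url).query)["jql"][0]


def build_tracking_url(jql):
    """Percent-encode a JQL string into an issue-search URL.

    Parentheses are left unescaped so the result matches the link in the email
    (an assumption about the desired form, not a JIRA requirement).
    """
    return "https://issues.apache.org/jira/issues/?jql=" + quote(jql, safe="()")


jql = decode_jql(TRACKING_URL)
print(jql)
```

Decoded, the query selects unresolved issues in the HADOOP, HDFS, YARN, and MAPREDUCE projects whose "Target Version/s" is "3.0.0" and whose "Hadoop Flags" field is "Incompatible change", ordered by priority. Editing the decoded string and passing it to `build_tracking_url` is one way to derive a similar link, for example for a different target version.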
Re: Looking to a Hadoop 3 release
On Mar 6, 2015, at 5:20 PM, Chris Douglas cdoug...@apache.org wrote:

> On Fri, Mar 6, 2015 at 4:32 PM, Vinod Kumar Vavilapalli vino...@hortonworks.com wrote:
>> I'd encourage everyone to post their wish list on the Roadmap wiki that *warrants* making incompatible changes forcing us to go 3.x.
>
> This is a useful exercise, but not a prerequisite to releasing 3.0.0 as an alpha off of trunk, right? Andrew summarized the operating assumptions for anyone working on it: rolling upgrades still work, wire compat is preserved, breaking changes may get rolled back when branch-3 is in beta (so be very conservative, notify others loudly). This applies to branches merged to trunk, also.

Not a prerequisite for alpha releases, yes. But it will be for a 'GA' release, because after that we will be back to restricting incompatible changes on 3.x line and we have to say no to features that need API breakage after that. If others feel there are features that warrant incompatibility, we should hear about them for inclusion in such a 3.x release. Till now, the operating assumption was to not break anything as much as possible. If we are opening the window on incompatibilities in 3.x, might as well get everyone to think about stuff that they want.

>> +1 to Jason's comments on general. We can keep rolling alphas that downstream can pick up, but I'd also like us to clarify the exit criterion for a GA release of 3.0 and its relation to the life of 2.x if we are going this route. This brings us back to the roadmap discussion, and a collective agreement about a logical step at a future point in time where we say we have enough incompatible features in 3.x that we can stop putting more of them and start stabilizing it.
>
> We'll have this discussion again. We don't need to reach consensus on the roadmap, just that each artifact reflects the output of the project.

Agreed. I wasn't requesting us to reach a consensus on the roadmap. Just requesting others to put their wish list up.

>> Irrespective of that, here is my proposal in the interim:
>> - Run JDK7 + JDK8 first in a compatible manner like I mentioned before for at least two releases in branch-2: say 2.8 and 2.9 before we consider taking up the gauntlet on 3.0.
>> - Continue working on the classpath isolation effort and try making it as compatible as is possible for users to opt in and migrate easily.
>
> +1 for 2.x, but again I don't understand the sequencing. -C

There isn't. I was saying "Irrespective of that".

Thanks,
+Vinod
Re: Looking to a Hadoop 3 release
On Mar 5, 2015, at 3:21 PM, Siddharth Seth ss...@apache.org wrote: 2) Simplification of configs - potentially separating client side configs and those used by daemons. This is another source of perpetual confusion for users. + 1 on this. sanjay
Re: Looking to a Hadoop 3 release
Avoiding the use of JDK8 language features (and, presumably, APIs) means you've abandoned #1, i.e., you haven't (really) bumped the JDK source version to JDK8. Also, note that releasing from trunk is a way of achieving #3, it's not a way of abandoning it.

On Mon, Mar 9, 2015 at 7:10 PM, Andrew Wang andrew.w...@cloudera.com wrote:
Hi Raymie, Konst proposed just releasing off of trunk rather than cutting a branch-2, and there was general agreement there. So, consider #3 abandoned. 1 & 2 can be achieved at the same time, we just need to avoid using JDK8 language features in trunk so things can be backported. Best, Andrew

On Mon, Mar 9, 2015 at 7:01 PM, Raymie Stata rst...@altiscale.com wrote:
In this (and the related threads), I see the following three requirements:
1. Bump the source JDK version to JDK8 (i.e., drop JDK7 support).
2. We'll still be releasing 2.x releases for a while, with similar feature sets as 3.x.
3. Avoid the risk of split-brain behavior by minimizing backporting headaches. Pulling trunk -> branch-2 -> branch-2.x is already tedious. Adding a branch-3, branch-3.x would be obnoxious.
These three cannot be achieved at the same time. Which do we abandon?

On Mon, Mar 9, 2015 at 12:45 PM, sanjay Radia sanjayo...@gmail.com wrote:
On Mar 5, 2015, at 3:21 PM, Siddharth Seth ss...@apache.org wrote:
2) Simplification of configs - potentially separating client side configs and those used by daemons. This is another source of perpetual confusion for users.
+1 on this. sanjay
Re: Looking to a Hadoop 3 release
In this (and the related threads), I see the following three requirements:
1. Bump the source JDK version to JDK8 (i.e., drop JDK7 support).
2. We'll still be releasing 2.x releases for a while, with similar feature sets as 3.x.
3. Avoid the risk of split-brain behavior by minimizing backporting headaches. Pulling trunk -> branch-2 -> branch-2.x is already tedious. Adding a branch-3, branch-3.x would be obnoxious.
These three cannot be achieved at the same time. Which do we abandon?

On Mon, Mar 9, 2015 at 12:45 PM, sanjay Radia sanjayo...@gmail.com wrote:
On Mar 5, 2015, at 3:21 PM, Siddharth Seth ss...@apache.org wrote:
2) Simplification of configs - potentially separating client side configs and those used by daemons. This is another source of perpetual confusion for users.
+1 on this. sanjay
Re: Looking to a Hadoop 3 release
Hi Raymie, Konst proposed just releasing off of trunk rather than cutting a branch-2, and there was general agreement there. So, consider #3 abandoned. 1 & 2 can be achieved at the same time, we just need to avoid using JDK8 language features in trunk so things can be backported. Best, Andrew

On Mon, Mar 9, 2015 at 7:01 PM, Raymie Stata rst...@altiscale.com wrote:
In this (and the related threads), I see the following three requirements:
1. Bump the source JDK version to JDK8 (i.e., drop JDK7 support).
2. We'll still be releasing 2.x releases for a while, with similar feature sets as 3.x.
3. Avoid the risk of split-brain behavior by minimizing backporting headaches. Pulling trunk -> branch-2 -> branch-2.x is already tedious. Adding a branch-3, branch-3.x would be obnoxious.
These three cannot be achieved at the same time. Which do we abandon?

On Mon, Mar 9, 2015 at 12:45 PM, sanjay Radia sanjayo...@gmail.com wrote:
On Mar 5, 2015, at 3:21 PM, Siddharth Seth ss...@apache.org wrote:
2) Simplification of configs - potentially separating client side configs and those used by daemons. This is another source of perpetual confusion for users.
+1 on this. sanjay
Re: Looking to a Hadoop 3 release
been mentioned as part of this. Any feature that breaks wire compatibility better be absolutely amazing, as it creates a huge hurdle for people to jump. To summarize:
+1 for a community-discussed roadmap of what we're breaking in Hadoop 3 and why it's worth it for users
-1 for creating branch-3 now, we can release from trunk until the next incompatibility for Hadoop 4 arrives
+1 for baking classpath isolation as opt-in on 2.x and eventually default on in 3.0
Jason

From: Andrew Wang andrew.w...@cloudera.com
To: hdfs-...@hadoop.apache.org
Cc: common-dev@hadoop.apache.org; mapreduce-...@hadoop.apache.org; yarn-...@hadoop.apache.org
Sent: Wednesday, March 4, 2015 12:15 PM
Subject: Re: Looking to a Hadoop 3 release

Let's not dismiss this quite so handily. Sean, Jason, and Stack replied on HADOOP-11656 pointing out that while we could make classpath isolation opt-in via configuration, what we really want longer term is to have it on by default (or just always on). Stack in particular points out the practical difficulties in using an opt-in method in 2.x from a downstream project perspective. It's not pretty. The plan that both Sean and Jason propose (which I support) is to have an opt-in solution in 2.x, bake it there, then turn it on by default (incompatible) in a new major release. I think this lines up well with my proposal of some alphas and betas leading up to a GA 3.x. I'm also willing to help with 2.x release management if that would help with testing this feature.

Even setting aside classpath isolation, a new major release is still justified by JDK8. Somehow this is being ignored in the discussion. Allen, historically the voice of the user in our community, just highlighted it as a major compatibility issue, and Tucu and I have also expressed our very strong concerns about bumping this in a minor release. 2.7's bump is a unique exception, but this is not something to be cited as precedent or policy.

Where does this resistance to a new major release stem from? As I've described from the beginning, this will look basically like a 2.x release, except for the inclusion of classpath isolation by default and target version JDK8. I've expressed my desire to maintain API and wire compatibility, and we can audit the set of incompatible changes in trunk to ensure this. My proposal for doing alpha and beta releases leading up to GA also gives downstreams a nice amount of time for testing and validation. Regards, Andrew

On Tue, Mar 3, 2015 at 2:32 PM, Arun Murthy a...@hortonworks.com wrote:
Awesome, looks like we can just do this in a compatible manner - nothing else on the list seems like it warrants a (premature) major release. Thanks Vinod. Arun

From: Vinod Kumar Vavilapalli vino...@hortonworks.com
Sent: Tuesday, March 03, 2015 2:30 PM
To: common-dev@hadoop.apache.org
Cc: hdfs-...@hadoop.apache.org; mapreduce-...@hadoop.apache.org; yarn-...@hadoop.apache.org
Subject: Re: Looking to a Hadoop 3 release

I started pitching in more on that JIRA. To add, I think we can and should strive for doing this in a compatible manner, whatever the approach. Marking and calling it incompatible before we see a proposal/patch seems premature to me. Commented the same on JIRA: https://issues.apache.org/jira/browse/HADOOP-11656?focusedCommentId=14345875&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14345875 . Thanks +Vinod

On Mar 2, 2015, at 8:08 PM, Andrew Wang andrew.w...@cloudera.com wrote:
Regarding classpath isolation, based on what I hear from our customers, it's still a big problem (even after the MR classloader work). The latest Jackson version bump was quite painful for our downstream projects, and the HDFS client still leaks a lot of dependencies.
Would welcome more discussion of this on HADOOP-11656, Steve, Colin, and Haohui have already chimed in.
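To make the "opt-in on 2.x, default on in 3.0" idea concrete, here is a minimal sketch of a child-first classloader guarded by a switch. It is not the HADOOP-11656 design (no design doc exists on that JIRA yet); the class name `IsolatingClassLoader` and the `isolate` flag are hypothetical, and a real implementation would also need a list of framework classes that must always come from the parent.

```java
import java.net.URL;
import java.net.URLClassLoader;

/** Hypothetical sketch of opt-in classpath isolation; not a Hadoop API. */
final class IsolatingClassLoader extends URLClassLoader {
    /** Assumed switch: false = today's parent-first behaviour, true = child-first. */
    private final boolean isolate;

    IsolatingClassLoader(URL[] appClasspath, ClassLoader parent, boolean isolate) {
        super(appClasspath, parent);
        this.isolate = isolate;
    }

    @Override
    protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
        // JDK classes, and everything when isolation is off, use normal delegation.
        if (!isolate || name.startsWith("java.")) {
            return super.loadClass(name, resolve);
        }
        synchronized (getClassLoadingLock(name)) {
            Class<?> c = findLoadedClass(name);
            if (c == null) {
                try {
                    // Child-first: the application's own copy of a dependency wins.
                    c = findClass(name);
                } catch (ClassNotFoundException e) {
                    // Not on the application classpath: fall back to the parent.
                    return super.loadClass(name, resolve);
                }
            }
            if (resolve) {
                resolveClass(c);
            }
            return c;
        }
    }
}
```

With `isolate` set to false this behaves like a plain `URLClassLoader`, which is what makes it usable as an opt-in in a 2.x line; flipping the default to true is the incompatible step that would belong in a major release.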
Re: Looking to a Hadoop 3 release
I'd encourage everyone to post their wish list on the Roadmap wiki that *warrants* making incompatible changes forcing us to go 3.x.

+1 to Jason's comments in general. We can keep rolling alphas that downstream can pick up, but I'd also like us to clarify the exit criterion for a GA release of 3.0 and its relation to the life of 2.x if we are going this route. This brings us back to the roadmap discussion, and a collective agreement about a logical step at a future point in time where we say we have enough incompatible features in 3.x that we can stop putting more of them and start stabilizing it.

Irrespective of that, here is my proposal in the interim:
- Run JDK7 + JDK8 first in a compatible manner like I mentioned before for at least two releases in branch-2: say 2.8 and 2.9 before we consider taking up the gauntlet on 3.0.
- Continue working on the classpath isolation effort and try making it as compatible as is possible for users to opt in and migrate easily.

Thanks, +Vinod

On Mar 5, 2015, at 1:44 PM, Jason Lowe jl...@yahoo-inc.com.INVALID wrote:
I'm OK with a 3.0.0 release as long as we are minimizing the pain of maintaining yet another release line and conscious of the incompatibilities going into that release line.

For the former, I would really rather not see a branch-3 cut so soon. It's yet another line onto which to cherry-pick, and I don't see why we need to add this overhead at such an early phase. We should only create branch-3 when there's an incompatible change that the community wants and it should _not_ go into the next major release (i.e.: it's for Hadoop 4.0). We can develop 3.0 alphas and betas on trunk and release from trunk in the interim. IMHO we need to stop treating trunk as a place to exile patches.

For the latter, I think as a community we need to evaluate the benefits of breaking compatibility against the costs of migrating. Each time we break compatibility we create a hurdle for people to jump when they move to the new release, and we should make those hurdles worth their time. For example, wire-compatibility has been mentioned as part of this. Any feature that breaks wire compatibility better be absolutely amazing, as it creates a huge hurdle for people to jump. To summarize:
+1 for a community-discussed roadmap of what we're breaking in Hadoop 3 and why it's worth it for users
-1 for creating branch-3 now, we can release from trunk until the next incompatibility for Hadoop 4 arrives
+1 for baking classpath isolation as opt-in on 2.x and eventually default on in 3.0
Jason

From: Andrew Wang andrew.w...@cloudera.com
To: hdfs-...@hadoop.apache.org
Cc: common-dev@hadoop.apache.org; mapreduce-...@hadoop.apache.org; yarn-...@hadoop.apache.org
Sent: Wednesday, March 4, 2015 12:15 PM
Subject: Re: Looking to a Hadoop 3 release

Let's not dismiss this quite so handily. Sean, Jason, and Stack replied on HADOOP-11656 pointing out that while we could make classpath isolation opt-in via configuration, what we really want longer term is to have it on by default (or just always on). Stack in particular points out the practical difficulties in using an opt-in method in 2.x from a downstream project perspective. It's not pretty. The plan that both Sean and Jason propose (which I support) is to have an opt-in solution in 2.x, bake it there, then turn it on by default (incompatible) in a new major release. I think this lines up well with my proposal of some alphas and betas leading up to a GA 3.x. I'm also willing to help with 2.x release management if that would help with testing this feature.

Even setting aside classpath isolation, a new major release is still justified by JDK8. Somehow this is being ignored in the discussion.
Allen, historically the voice of the user in our community, just highlighted it as a major compatibility issue, and myself and Tucu have also expressed our very strong concerns about bumping this in a minor release. 2.7's bump is a unique exception, but this is not something to be cited as precedent or policy. Where does this resistance to a new major release stem from? As I've described from the beginning, this will look basically like a 2.x release, except for the inclusion of classpath isolation by default and target version JDK8. I've expressed my desire to maintain API and wire compatibility, and we can audit the set of incompatible changes in trunk to ensure this. My proposal for doing alpha and beta releases leading up to GA also gives downstreams a nice amount of time for testing and validation. Regards, Andrew On Tue, Mar 3, 2015 at 2:32 PM, Arun Murthy a...@hortonworks.com wrote: Awesome, looks like we can just do this in a compatible manner - nothing else on the list seems like it warrants a (premature) major release
Re: Looking to a Hadoop 3 release
On Fri, Mar 6, 2015 at 4:32 PM, Vinod Kumar Vavilapalli vino...@hortonworks.com wrote:
> I'd encourage everyone to post their wish list on the Roadmap wiki that *warrants* making incompatible changes forcing us to go 3.x.

This is a useful exercise, but not a prerequisite to releasing 3.0.0 as an alpha off of trunk, right? Andrew summarized the operating assumptions for anyone working on it: rolling upgrades still work, wire compat is preserved, breaking changes may get rolled back when branch-3 is in beta (so be very conservative, notify others loudly). This applies to branches merged to trunk, also.

> +1 to Jason's comments in general. We can keep rolling alphas that downstream can pick up, but I'd also like us to clarify the exit criterion for a GA release of 3.0 and its relation to the life of 2.x if we are going this route. This brings us back to the roadmap discussion, and a collective agreement about a logical step at a future point in time where we say we have enough incompatible features in 3.x that we can stop putting more of them and start stabilizing it.

We'll have this discussion again. We don't need to reach consensus on the roadmap, just that each artifact reflects the output of the project.

> Irrespective of that, here is my proposal in the interim:
> - Run JDK7 + JDK8 first in a compatible manner like I mentioned before for at least two releases in branch-2: say 2.8 and 2.9 before we consider taking up the gauntlet on 3.0.
> - Continue working on the classpath isolation effort and try making it as compatible as is possible for users to opt in and migrate easily.

+1 for 2.x, but again I don't understand the sequencing. -C

On Mar 5, 2015, at 1:44 PM, Jason Lowe jl...@yahoo-inc.com.INVALID wrote:
I'm OK with a 3.0.0 release as long as we are minimizing the pain of maintaining yet another release line and conscious of the incompatibilities going into that release line.

For the former, I would really rather not see a branch-3 cut so soon. It's yet another line onto which to cherry-pick, and I don't see why we need to add this overhead at such an early phase. We should only create branch-3 when there's an incompatible change that the community wants and it should _not_ go into the next major release (i.e.: it's for Hadoop 4.0). We can develop 3.0 alphas and betas on trunk and release from trunk in the interim. IMHO we need to stop treating trunk as a place to exile patches.

For the latter, I think as a community we need to evaluate the benefits of breaking compatibility against the costs of migrating. Each time we break compatibility we create a hurdle for people to jump when they move to the new release, and we should make those hurdles worth their time. For example, wire-compatibility has been mentioned as part of this. Any feature that breaks wire compatibility better be absolutely amazing, as it creates a huge hurdle for people to jump. To summarize:
+1 for a community-discussed roadmap of what we're breaking in Hadoop 3 and why it's worth it for users
-1 for creating branch-3 now, we can release from trunk until the next incompatibility for Hadoop 4 arrives
+1 for baking classpath isolation as opt-in on 2.x and eventually default on in 3.0
Jason

From: Andrew Wang andrew.w...@cloudera.com
To: hdfs-...@hadoop.apache.org
Cc: common-dev@hadoop.apache.org; mapreduce-...@hadoop.apache.org; yarn-...@hadoop.apache.org
Sent: Wednesday, March 4, 2015 12:15 PM
Subject: Re: Looking to a Hadoop 3 release

Let's not dismiss this quite so handily. Sean, Jason, and Stack replied on HADOOP-11656 pointing out that while we could make classpath isolation opt-in via configuration, what we really want longer term is to have it on by default (or just always on). Stack in particular points out the practical difficulties in using an opt-in method in 2.x from a downstream project perspective. It's not pretty.
The plan that both Sean and Jason propose (which I support) is to have an opt-in solution in 2.x, bake it there, then turn it on by default (incompatible) in a new major release. I think this lines up well with my proposal of some alphas and betas leading up to a GA 3.x. I'm also willing to help with 2.x release management if that would help with testing this feature. Even setting aside classpath isolation, a new major release is still justified by JDK8. Somehow this is being ignored in the discussion. Allen, historically the voice of the user in our community, just highlighted it as a major compatibility issue, and myself and Tucu have also expressed our very strong concerns about bumping this in a minor release. 2.7's bump is a unique exception, but this is not something to be cited as precedent or policy. Where does this resistance to a new major release stem from? As I've described
Re: Looking to a Hadoop 3 release
Yes, these are the kind of enhancements that need to be proposed and discussed for inclusion! Thanks, +Vinod

On Mar 5, 2015, at 3:21 PM, Siddharth Seth ss...@apache.org wrote:
Some features that come to mind immediately would be
1) Enhancements to the RPC mechanics - specifically support for AsyncRPC / two-way communication. There are a lot of places where we re-use heartbeats to send more information than we would if the RPC layer supported these features. Some of this can be done in a manner compatible with the existing RPC sub-system. Others, like two-way communication, probably cannot. After this, having HDFS/YARN actually make use of these changes. The other consideration is adoption of an alternate system like gRPC, which would be incompatible.
2) Simplification of configs - potentially separating client side configs and those used by daemons. This is another source of perpetual confusion for users.
Thanks - Sid

On Thu, Mar 5, 2015 at 2:46 PM, Steve Loughran ste...@hortonworks.com wrote:
Sorry, Outlook dequoted Alejandro's comments. Let me try again with his comments in italic and proofreading of mine.

On 05/03/2015 13:59, Steve Loughran ste...@hortonworks.com wrote:
On 05/03/2015 13:05, Alejandro Abdelnur tuc...@gmail.com wrote:
> IMO, if part of the community wants to take on the responsibility and work that it takes to do a new major release, we should not discourage them from doing that. Having multiple major branches active is a standard practice.

Looking @ 2.x, the major work (HDFS HA, YARN) meant that it did take a long time to get out, and during that time 0.21, 0.22, got released and ignored; 0.23 picked up and used in production. The 2.0.4-alpha release was more of a troublespot as it got picked up widely enough to be used in products, and changes were made between that alpha and 2.2 itself which raised compatibility issues. For 3.x I'd propose
1. Have less longevity of 3.x alpha/beta artifacts
2. Make clear there are no guarantees of compatibility from alpha/beta releases to shipping. Best effort, but not to the extent that it gets in the way. More succinctly: we will care more about seamless migration from 2.2+ to 3.x than from a 3.0-alpha to 3.3 production.
3. Anybody who ships code based on 3.x alpha/beta to recognise and accept policy (2).
Hadoop's instability guarantee for the 3.x alpha/beta phase

As well as backwards compatibility, we need to think about forwards compatibility, with the goal being: any app written/shipped with the 3.x release binaries (JAR and native) will work in and against a 3.y Hadoop cluster, for all x, y in Natural where y>=x and is-release(x) and is-release(y).

That's important, as it means all server-side changes in 3.x which are expected to mandate client-side updates (protocols, HDFS erasure decoding, security features) must be considered complete and stable before we can say is-release(x). In an ideal world, we'll even get the semantics right with tests to show this.

Fixing classpath hell downstream is certainly one feature I am +1 on. But: it's only one of the features, and given there's not any design doc on that JIRA, way too immature to set a release schedule on. An alpha schedule with no guarantees and a regular alpha roll could be viable, as new features go in and can then be used to experimentally try this stuff in branches of HBase (well volunteered, Stack!), etc. Of course instability guarantees will be transitive downstream.

> This time around we are not replacing the guts as we did from Hadoop 1 to Hadoop 2, but doing superficial surgery to address issues that were not considered (or were too much to take on top of the guts transplant). For the split brain concern, we did a great job of maintaining Hadoop 1 and Hadoop 2 until Hadoop 1 faded away.

And a significant argument about 2.0.4-alpha to 2.2 protobuf/HDFS compatibility.

> Based on that experience I would say that the coexistence of Hadoop 2 and Hadoop 3 will be much less demanding/traumatic.

The re-layout of all the source trees was a major change there; assuming there's no refactoring or switch of build tools, picking things back will be tractable.

> Also, to facilitate the coexistence we should limit Java language features to Java 7 (even if the runtime is Java 8); once Java 7 is not used anymore we can remove this limitation.

+1; setting javac.version will fix this. What is nice about having Java 8 as the base JVM is that it means you can be confident that all Hadoop 3 servers will be JDK8+, so downstream apps and libs can use all the Java 8 features they want to. There's one policy change to consider there, which is that possibly, just possibly, we could allow new modules in hadoop-tools to adopt Java 8 language features early, provided everyone recognised that backport to branch-2 isn't going to
Re: Looking to a Hadoop 3 release
Right, but that doesn't really answer the question….

On Mar 5, 2015, at 10:23 PM, Alejandro Abdelnur tuc...@gmail.com wrote:
If classloader isolation is in place, then dependency versions can freely be upgraded as they won't pollute the apps' space (things get trickier if there is an ON/OFF switch).

On Thu, Mar 5, 2015 at 9:21 PM, Allen Wittenauer a...@altiscale.com wrote:
Is there going to be a general upgrade of dependencies? I'm thinking of jetty & jackson in particular.

On Mar 5, 2015, at 5:24 PM, Andrew Wang andrew.w...@cloudera.com wrote:
I've taken the liberty of adding a Hadoop 3 section to the Roadmap wiki page. In addition to the two things I've been pushing, I also looked through Allen's list (thanks Allen for making this) and picked out the shell script rewrite and the removal of HFTP as big changes. This would be the place to propose features for inclusion in 3.x; I'd particularly appreciate help on the YARN/MR side.

Based on what I'm hearing, let me modulate my proposal to the following:
- We avoid cutting branch-3, and release off of trunk. The trunk-only changes don't look that scary, so I think this is fine. This does mean we need to be more rigorous before merging branches to trunk. I think Vinod/Giri's work on getting test-patch.sh runs on non-trunk branches would be very helpful in this regard.
- We do not include anything to break wire compatibility unless (as Jason says) it's an unbelievably awesome feature.
- No harm in rolling alphas from trunk, as it doesn't lock us to anything compatibility wise. Downstreams like releases.

I'll take Steve's advice about not locking GA to a given date, but I also share his belief that we can alpha/beta/GA faster than it took for Hadoop 2. Let's roll some intermediate releases, work on the roadmap items, and see how we're feeling in a few months. Best, Andrew

On Thu, Mar 5, 2015 at 3:21 PM, Siddharth Seth ss...@apache.org wrote:
I think it'll be useful to have a discussion about what else people would like to see in Hadoop 3.x - especially if the change is potentially incompatible. Also, what we expect the release schedule to be for major releases and what triggers them - JVM version, major features, the need for incompatible changes? Assuming major versions will not be released every 6 months/1 year (adoption time, fairly disruptive for downstream projects, and users) - considering additional features/incompatible changes for 3.x would be useful. Some features that come to mind immediately would be
1) Enhancements to the RPC mechanics - specifically support for AsyncRPC / two-way communication. There are a lot of places where we re-use heartbeats to send more information than we would if the RPC layer supported these features. Some of this can be done in a manner compatible with the existing RPC sub-system. Others, like two-way communication, probably cannot. After this, having HDFS/YARN actually make use of these changes. The other consideration is adoption of an alternate system like gRPC, which would be incompatible.
2) Simplification of configs - potentially separating client side configs and those used by daemons. This is another source of perpetual confusion for users.
Thanks - Sid

On Thu, Mar 5, 2015 at 2:46 PM, Steve Loughran ste...@hortonworks.com wrote:
Sorry, Outlook dequoted Alejandro's comments. Let me try again with his comments in italic and proofreading of mine.

On 05/03/2015 13:59, Steve Loughran ste...@hortonworks.com wrote:
On 05/03/2015 13:05, Alejandro Abdelnur tuc...@gmail.com wrote:
> IMO, if part of the community wants to take on the responsibility and work that it takes to do a new major release, we should not discourage them from doing that. Having multiple major branches active is a standard practice.

Looking @ 2.x, the major work (HDFS HA, YARN) meant that it did take a long time to get out, and during that time 0.21, 0.22, got released and ignored; 0.23 picked up and used in production. The 2.0.4-alpha release was more of a troublespot as it got picked up widely enough to be used in products, and changes were made between that alpha and 2.2 itself which raised compatibility issues. For 3.x I'd propose
1. Have less longevity of 3.x alpha/beta artifacts
2. Make clear there are no guarantees of compatibility from alpha/beta releases to shipping. Best effort, but not to the extent that it gets in the way. More succinctly: we will care more about seamless migration from 2.2+ to 3.x than from a 3.0-alpha to 3.3 production.
3. Anybody who ships code based on 3.x alpha/beta to recognise and accept policy (2).
Hadoop's instability guarantee for the 3.x alpha/beta phase

As well as backwards compatibility, we need to think about forwards compatibility, with the goal being: Any app written/shipped with the
Re: Looking to a Hadoop 3 release
IMO, if part of the community wants to take on the responsibility and work that it takes to do a new major release, we should not discourage them from doing that. Having multiple major branches active is a standard practice.

This time around we are not replacing the guts as we did from Hadoop 1 to Hadoop 2, but doing superficial surgery to address issues that were not considered (or were too much to take on top of the guts transplant).

For the split brain concern, we did a great job of maintaining Hadoop 1 and Hadoop 2 until Hadoop 1 faded away. Based on that experience I would say that the coexistence of Hadoop 2 and Hadoop 3 will be much less demanding/traumatic.

Also, to facilitate the coexistence we should limit Java language features to Java 7 (even if the runtime is Java 8); once Java 7 is not used anymore we can remove this limitation. Thanks.

On Thu, Mar 5, 2015 at 11:40 AM, Vinod Kumar Vavilapalli vino...@hortonworks.com wrote:
The 'resistance' is not so much about a new major release, more so about the content and the roadmap of the release. Other than the two specific features raised (the need for breaking compat for them is something that I am debating), I haven't seen a roadmap of branch-3 with any more features that this community needs to discuss.

If all the difference between branch-2 and branch-3 is going to be JDK + a couple of incompat changes, it is a big problem in two dimensions: (1) it's a burden keeping the branches in sync and avoiding the split-brain we experienced with 1.x, 2.x or worse branch-0.23, branch-2 and (2) it's very hard to ask people to not break more things in branch-3.

We seem to have agreed upon a course of action for JDK7. And now we are taking a different direction for JDK8. Going by this new proposal, come 2016, we will have to deal with JDK9 and 3 mainline incompatible hadoop releases.

Regarding individual improvements like classpath isolation, shell script stuff, Jason Lowe captured it perfectly on HADOOP-11656 - it should be possible for every major feature that we develop to be opt-in, unless the change is so great that users can balance out the incompatibilities against the new stuff they are getting. Even with a ground-breaking change like YARN, we spent a bit of time to ensure compatibility (MAPREDUCE-5108) that has paid off many times over in return. Breaking compatibility shouldn't come across as too cheap a thing. Thanks, +Vinod

On Mar 4, 2015, at 10:15 AM, Andrew Wang andrew.w...@cloudera.com wrote:
Where does this resistance to a new major release stem from? As I've described from the beginning, this will look basically like a 2.x release, except for the inclusion of classpath isolation by default and target version JDK8. I've expressed my desire to maintain API and wire compatibility, and we can audit the set of incompatible changes in trunk to ensure this. My proposal for doing alpha and beta releases leading up to GA also gives downstreams a nice amount of time for testing and validation.
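As a small illustration of what "limit Java language features to Java 7 (even if the runtime is Java 8)" means in practice, the sketch below is written so that it still compiles at source level 7 (for example with `javac -source 1.7 -target 1.7`) and therefore stays backportable. The class and method names are made up for the example.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

/** Hypothetical helper, written using Java 7 language features only. */
final class Java7Style {
    /** Returns a copy of the input sorted by string length, shortest first. */
    static List<String> sortByLength(List<String> in) {
        List<String> out = new ArrayList<String>(in);
        // Java 7: anonymous inner class instead of a lambda.
        Collections.sort(out, new Comparator<String>() {
            @Override
            public int compare(String a, String b) {
                return Integer.compare(a.length(), b.length());
            }
        });
        // The Java 8 form, out.sort(Comparator.comparingInt(String::length)),
        // uses a method reference and Java 8 APIs, so it would not compile for Java 7.
        return out;
    }
}
```

Code in this style runs unchanged on a Java 8 JVM, which is the point of the restriction: the runtime moves forward while patches can still be cherry-picked to a branch that must build with Java 7.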
Re: Looking to a Hadoop 3 release
The 'resistance' is not so much about a new major release, more so about the content and the roadmap of the release. Other than the two specific features raised (the need for breaking compat for them is something that I am debating), I haven't seen a roadmap of branch-3 with any more features that this community needs to discuss.

If all the difference between branch-2 and branch-3 is going to be JDK + a couple of incompat changes, it is a big problem in two dimensions: (1) it's a burden keeping the branches in sync and avoiding the split-brain we experienced with 1.x, 2.x or worse branch-0.23, branch-2 and (2) it's very hard to ask people to not break more things in branch-3.

We seem to have agreed upon a course of action for JDK7. And now we are taking a different direction for JDK8. Going by this new proposal, come 2016, we will have to deal with JDK9 and 3 mainline incompatible hadoop releases.

Regarding individual improvements like classpath isolation, shell script stuff, Jason Lowe captured it perfectly on HADOOP-11656 - it should be possible for every major feature that we develop to be opt-in, unless the change is so great that users can balance out the incompatibilities against the new stuff they are getting. Even with a ground-breaking change like YARN, we spent a bit of time to ensure compatibility (MAPREDUCE-5108) that has paid off many times over in return. Breaking compatibility shouldn't come across as too cheap a thing. Thanks, +Vinod

On Mar 4, 2015, at 10:15 AM, Andrew Wang andrew.w...@cloudera.com wrote:
Where does this resistance to a new major release stem from? As I've described from the beginning, this will look basically like a 2.x release, except for the inclusion of classpath isolation by default and target version JDK8. I've expressed my desire to maintain API and wire compatibility, and we can audit the set of incompatible changes in trunk to ensure this.
My proposal for doing alpha and beta releases leading up to GA also gives downstreams a nice amount of time for testing and validation.
Re: Looking to a Hadoop 3 release
Sorry, Outlook dequoted Alejandro's comments. Let me try again with his comments in italic and proofreading of mine.

On 05/03/2015 13:59, Steve Loughran ste...@hortonworks.com wrote:
On 05/03/2015 13:05, Alejandro Abdelnur tuc...@gmail.com wrote:
> IMO, if part of the community wants to take on the responsibility and work that it takes to do a new major release, we should not discourage them from doing that. Having multiple major branches active is a standard practice.

Looking @ 2.x, the major work (HDFS HA, YARN) meant that it did take a long time to get out, and during that time 0.21 and 0.22 got released and ignored; 0.23 got picked up and used in production. The 2.0.4-alpha release was more of a troublespot, as it got picked up widely enough to be used in products, and changes were made between that alpha and 2.2 itself which raised compatibility issues.

For 3.x I'd propose:
1. Have less longevity of 3.x alpha/beta artifacts.
2. Make clear there are no guarantees of compatibility from alpha/beta releases to shipping. Best effort, but not to the extent that it gets in the way. More succinctly: we will care more about seamless migration from 2.2+ to 3.x than from a 3.0-alpha to 3.3 production.
3. Anybody who ships code based on 3.x alpha/beta to recognise and accept policy (2).

Hadoop's instability guarantee for the 3.x alpha/beta phase

As well as backwards compatibility, we need to think about forwards compatibility, with the goal being: any app written/shipped with the 3.x release binaries (JAR and native) will work in and against a 3.y Hadoop cluster, for all x, y in Natural where y >= x and is-release(x) and is-release(y).

That's important, as it means all server-side changes in 3.x which are expected to mandate client-side updates (protocols, HDFS erasure decoding, security features) must be considered complete and stable before we can say is-release(x). In an ideal world, we'll even get the semantics right with tests to show this.

Fixing classpath hell downstream is certainly one feature I am +1 on. But: it's only one of the features, and given there's not any design doc on that JIRA, it is way too immature to set a release schedule on. An alpha schedule with no guarantees and a regular alpha roll could be viable, as new features go in and can then be used to experimentally try this stuff in branches of HBase (well volunteered, Stack!), etc. Of course instability guarantees will be transitive downstream.

> This time around we are not replacing the guts as we did from Hadoop 1 to Hadoop 2, but doing superficial surgery to address issues that were not considered (or were too much to take on top of the guts transplant).
> For the split brain concern, we did a great job of maintaining Hadoop 1 and Hadoop 2 until Hadoop 1 faded away.

And a significant argument about 2.0.4-alpha to 2.2 protobuf/HDFS compatibility.

> Based on that experience I would say that the coexistence of Hadoop 2 and Hadoop 3 will be much less demanding/traumatic.

The re-layout of all the source trees was a major change there; assuming there's no refactoring or switch of build tools, picking things back will be tractable.

> Also, to facilitate the coexistence we should limit Java language features to Java 7 (even if the runtime is Java 8); once Java 7 is not used anymore we can remove this limitation.

+1; setting javac.version will fix this.

What is nice about having Java 8 as the base JVM is that it means you can be confident that all Hadoop 3 servers will be JDK8+, so downstream apps and libs can use all the Java 8 features they want to.

There's one policy change to consider there, which is that possibly, just possibly, we could allow new modules in hadoop-tools to adopt Java 8 language features early, provided everyone recognised that backport to branch-2 isn't going to happen.

-Steve
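The forward-compatibility goal stated above can be written down as a small predicate. The following is only a sketch of that rule, assuming the condition is meant as y >= x; the `ForwardCompat` and `Version` names are hypothetical and are not Hadoop classes.

```java
/** Hypothetical model of the 3.x forward-compatibility goal; not a Hadoop API. */
final class ForwardCompat {

    /** A minor version in the 3.x line, plus whether it is a full release (not alpha/beta). */
    static final class Version {
        final int minor;
        final boolean isRelease;

        Version(int minor, boolean isRelease) {
            if (minor < 0) {
                throw new IllegalArgumentException("minor must be >= 0");
            }
            this.minor = minor;
            this.isRelease = isRelease;
        }
    }

    /**
     * True when an app shipped with the binaries of {@code client} is covered by the
     * goal on a cluster running {@code cluster}: both are releases and y >= x.
     */
    static boolean guaranteed(Version client, Version cluster) {
        return client.isRelease && cluster.isRelease && cluster.minor >= client.minor;
    }

    public static void main(String[] args) {
        Version alpha = new Version(0, false); // a 3.0 alpha: no guarantee applies
        Version v31 = new Version(1, true);
        Version v33 = new Version(3, true);

        System.out.println(guaranteed(v31, v33));   // true: 3.1 app on a 3.3 cluster
        System.out.println(guaranteed(v33, v31));   // false: cluster is older than the app
        System.out.println(guaranteed(alpha, v33)); // false: alpha artifacts are excluded
    }
}
```

The third case is the point of policy (2): anything built on alpha or beta artifacts falls outside the guarantee, however new the cluster is.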
Re: Looking to a Hadoop 3 release
I'm OK with a 3.0.0 release as long as we are minimizing the pain of maintaining yet another release line and are conscious of the incompatibilities going into that release line.

For the former, I would really rather not see a branch-3 cut so soon. It's yet another line onto which to cherry-pick, and I don't see why we need to add this overhead at such an early phase. We should only create branch-3 when there's an incompatible change that the community wants and it should _not_ go into the next major release (i.e.: it's for Hadoop 4.0). We can develop 3.0 alphas and betas on trunk and release from trunk in the interim. IMHO we need to stop treating trunk as a place to exile patches.

For the latter, I think as a community we need to evaluate the benefits of breaking compatibility against the costs of migrating. Each time we break compatibility we create a hurdle for people to jump when they move to the new release, and we should make those hurdles worth their time. For example, wire compatibility has been mentioned as part of this. Any feature that breaks wire compatibility had better be absolutely amazing, as it creates a huge hurdle for people to jump.

To summarize:
+1 for a community-discussed roadmap of what we're breaking in Hadoop 3 and why it's worth it for users
-1 for creating branch-3 now; we can release from trunk until the next incompatibility for Hadoop 4 arrives
+1 for baking classpath isolation as opt-in on 2.x and eventually default-on in 3.0

Jason
Re: Looking to a Hadoop 3 release
On 05/03/2015 13:05, Alejandro Abdelnur tuc...@gmail.com wrote:

IMO, if part of the community wants to take on the responsibility and work that it takes to do a new major release, we should not discourage them from doing that. Having multiple major branches active is a standard practice.

Looking @ 2.x, the major work (HDFS HA, YARN) meant that it did take a long time to get out, and during that time 0.21 and 0.22 got released and ignored; 0.23 got picked up and used in production. The 2.0.4-alpha release was more of a troublespot, as it got picked up widely enough to be used in products, and changes were made between that alpha and 2.2 itself which raised compatibility issues.

For 3.x I'd propose:
1. Have less longevity of 3.x alpha/beta artifacts.
2. Make clear there are no guarantees of compatibility from alpha/beta releases to shipping. Best effort, but not to the extent that it gets in the way. More succinctly: we will care more about seamless migration from 2.2+ to 3.x than from a 3.0-alpha to 3.3 production.
3. Anybody who ships code based on 3.x alpha/beta to recognise and accept policy (2).

Hadoop's instability guarantee for the 3.x alpha/beta phase

As well as backwards compatibility, we need to think about forwards compatibility, with the goal being: any app written/shipped with the 3.x release binaries (JAR and native) will work against a 3.y Hadoop release, for all x, y in Natural where y >= x and is-release(x) and is-release(y).

That's important, as it means all server-side changes in 3.x which are expected to mandate client-side updates (protocols, HDFS erasure decoding, security features) must be considered complete and stable before we can say is-release(x). In an ideal world, we'll even get the semantics right with tests to show this.

Fixing classpath hell downstream is certainly one feature I am +1 on. But: it's only one of the features, and given there's not any design doc on that JIRA, it is way too immature to set a release schedule on. An alpha schedule with no guarantees and a regular alpha roll could be viable, as new features go in and can then be used to experimentally try this stuff in branches of HBase (well volunteered, Stack!), etc. Of course instability guarantees will be transitive.

This time around we are not replacing the guts as we did from Hadoop 1 to Hadoop 2, but doing superficial surgery to address issues that were not considered (or were too much to take on top of the guts transplant). For the split brain concern, we did a great job of maintaining Hadoop 1 and Hadoop 2 until Hadoop 1 faded away.

And a significant argument about 2.0.4-alpha to 2.2 protobuf/HDFS compatibility.

Based on that experience I would say that the coexistence of Hadoop 2 and Hadoop 3 will be much less demanding/traumatic.

The re-layout of all the source trees was a major change there; assuming there's no refactoring or switch of build tools, picking things back will be tractable.

Also, to facilitate the coexistence we should limit Java language features to Java 7 (even if the runtime is Java 8); once Java 7 is not used anymore we can remove this limitation.

+1; setting javac.version will fix this.

What is nice about having Java 8 as the base JVM is that it means you can be confident that all Hadoop 3 servers will be JDK8+, so downstream apps and libs can use all the Java 8 features they want to.

There's one policy change to consider there, which is that possibly, just possibly, we could allow new modules in hadoop-tools to adopt Java 8 language features early, provided everyone recognised that backport to branch-2 isn't going to happen.

-Steve
Re: Looking to a Hadoop 3 release
I think it'll be useful to have a discussion about what else people would like to see in Hadoop 3.x, especially if the change is potentially incompatible. Also, what do we expect the release schedule to be for major releases, and what triggers them: JVM version, major features, the need for incompatible changes?

Assuming major versions will not be released every 6 months/1 year (adoption time, fairly disruptive for downstream projects and users), considering additional features/incompatible changes for 3.x would be useful.

Some features that come to mind immediately would be:
1) Enhancements to the RPC mechanics, specifically support for async RPC / two-way communication. There are a lot of places where we re-use heartbeats to send more information than we would if the RPC layer supported these features. Some of this can be done in a manner compatible with the existing RPC sub-system. Others, like two-way communication, probably cannot. After this, having HDFS/YARN actually make use of these changes. The other consideration is adoption of an alternate system like gRPC, which would be incompatible.
2) Simplification of configs, potentially separating client-side configs from those used by daemons. This is another source of perpetual confusion for users.

Thanks
- Sid
Re: Looking to a Hadoop 3 release
If classloader isolation is in place, then dependency versions can be upgraded freely, as they won't pollute the apps' space (things get trickier if there is an ON/OFF switch).

On Thu, Mar 5, 2015 at 9:21 PM, Allen Wittenauer a...@altiscale.com wrote:
> Is there going to be a general upgrade of dependencies? I'm thinking of jetty and jackson in particular.
Re: Looking to a Hadoop 3 release
I've taken the liberty of adding a Hadoop 3 section to the Roadmap wiki page. In addition to the two things I've been pushing, I also looked through Allen's list (thanks Allen for making this) and picked out the shell script rewrite and the removal of HFTP as big changes. This would be the place to propose features for inclusion in 3.x; I'd particularly appreciate help on the YARN/MR side.

Based on what I'm hearing, let me modulate my proposal to the following:

- We avoid cutting branch-3, and release off of trunk. The trunk-only changes don't look that scary, so I think this is fine. This does mean we need to be more rigorous before merging branches to trunk. I think Vinod/Giri's work on getting test-patch.sh runs on non-trunk branches would be very helpful in this regard.
- We do not include anything to break wire compatibility unless (as Jason says) it's an unbelievably awesome feature.
- No harm in rolling alphas from trunk, as it doesn't lock us into anything compatibility-wise. Downstreams like releases.

I'll take Steve's advice about not locking GA to a given date, but I also share his belief that we can alpha/beta/GA faster than it took for Hadoop 2. Let's roll some intermediate releases, work on the roadmap items, and see how we're feeling in a few months.

Best,
Andrew
Re: Looking to a Hadoop 3 release
Thanks all. There is an open issue, HDFS-6962 (ACLs inheritance conflicts with umaskmode), whose incompatibility appears to make it unsuitable for 2.x, so it is targeted at 3.0. Please see:

https://issues.apache.org/jira/browse/HDFS-6962?focusedCommentId=14335418&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14335418

Best,
--Yongjun

On Wed, Mar 4, 2015 at 8:13 PM, Allen Wittenauer a...@altiscale.com wrote:
> One of the questions that keeps popping up is “what exactly is in trunk?” As some may recall, I had done some experiments creating the change log based upon JIRA. While the interest level appeared to be approaching zero, I kept playing with it a bit and eventually also started playing with the release notes script (for various reasons I won’t bore you with.)
>
> In any case, I’ve started posting the results of these runs on one of my github repos if anyone was wanting a quick reference as to JIRA’s opinion on the matter:
>
> https://github.com/aw-altiscale/hadoop-release-metadata/tree/master/3.0.0
Re: Looking to a Hadoop 3 release
On Mon, Mar 2, 2015 at 11:04 PM, Konstantin Shvachko shv.had...@gmail.com wrote:
> 2. If Hadoop 3 and 2.x are meant to exist together, we run a risk of manifesting split-brain behavior again, as we had with hadoop-1, hadoop-2 and other versions. If that is somehow beneficial for commercial vendors, which I don't see how, for the community it was proven to be very disruptive. Would be really good to avoid it this time.

Agreed; let's try to minimize backporting headaches. Pulling trunk > branch-2 > branch-2.x is already tedious. Adding a branch-3, branch-3.x would be obnoxious.

> 3. Could we release Hadoop 3 directly from trunk? With a proper feature freeze in advance. Current trunk is in the best working condition I've seen in years, much better than when hadoop-2 was coming to life. It could make a good alpha.

+1 This sounds like a good approach. Marked as alpha, we can break compatibility in minor versions. Stabilizing a beta can correspond with cutting branch-3, since that will be winding down branch-2. This shouldn't disrupt existing plans for branch-2.

However, this requires that committers not accumulate too much compatibility debt in trunk. Undoing all that in branch-3 imposes a burdensome tax. Scanning through Allen's diff: that doesn't appear to be the case so far, but it recommends against developing features in place on trunk. Just be considerate of users and developers who will need to move from (and maintain) branch-2.

> I believe we can start planning 3.0 from trunk right after 2.7 is out.

If we're publishing a snapshot, we don't need too much planning.

-C

On Mon, Mar 2, 2015 at 3:19 PM, Andrew Wang andrew.w...@cloudera.com wrote:
> Hi devs,
>
> It's been a year and a half since 2.x went GA, and I think we're about due for a 3.x release. Notably, there are two incompatible changes I'd like to call out that will have a tremendous positive impact for our users.
>
> First, classpath isolation, being done at HADOOP-11656, which has been a long-standing request from many downstreams and Hadoop users.
>
> Second, bumping the source and target JDK version to JDK8 (related to HADOOP-11090), which is important since JDK7 is EOL in April 2015 (two months from now). In the past, we've had issues with our dependencies discontinuing support for old JDKs, so this will future-proof us.
>
> Between the two, we'll also have quite an opportunity to clean up and upgrade our dependencies, another common user and developer request.
>
> I'd like to propose that we start rolling a monthly-ish series of 3.0 alpha releases ASAP, with myself volunteering to take on the RM and other cat herding responsibilities. There are already quite a few changes slated for 3.0 besides the above (for instance the shell script rewrite), so there's already value in a 3.0 alpha, and the more time we give downstreams to integrate, the better.
>
> This opens up discussion about inclusion of other changes, but I'm hoping to freeze incompatible changes after maybe two alphas, do a beta (with no further incompat changes allowed), and then finally a 3.x GA. For those keeping track, that means a 3.x GA in about four months.
>
> I would also like to stress, though, that this is not intended to be a big bang release. For instance, it would be great if we could maintain wire compatibility between 2.x and 3.x, so rolling upgrades work. Keeping branch-2 and branch-3 similar also makes backports easier, since we're likely maintaining 2.x for a while yet.
>
> Please let me know any comments / concerns related to the above. If people are friendly to the idea, I'd like to cut a branch-3 and start working on the first alpha.
>
> Best,
> Andrew
Re: Looking to a Hadoop 3 release
Let's not dismiss this quite so handily. Sean, Jason, and Stack replied on HADOOP-11656 pointing out that while we could make classpath isolation opt-in via configuration, what we really want longer term is to have it on by default (or just always on). Stack in particular points out the practical difficulties in using an opt-in method in 2.x from a downstream project perspective. It's not pretty.

The plan that both Sean and Jason propose (which I support) is to have an opt-in solution in 2.x, bake it there, then turn it on by default (incompatible) in a new major release. I think this lines up well with my proposal of some alphas and betas leading up to a GA 3.x. I'm also willing to help with 2.x release management if that would help with testing this feature.

Even setting aside classpath isolation, a new major release is still justified by JDK8. Somehow this is being ignored in the discussion. Allen, historically the voice of the user in our community, just highlighted it as a major compatibility issue, and Tucu and I have also expressed our very strong concerns about bumping this in a minor release. 2.7's bump is a unique exception, but this is not something to be cited as precedent or policy.

Where does this resistance to a new major release stem from? As I've described from the beginning, this will look basically like a 2.x release, except for the inclusion of classpath isolation by default and target version JDK8. I've expressed my desire to maintain API and wire compatibility, and we can audit the set of incompatible changes in trunk to ensure this. My proposal for doing alpha and beta releases leading up to GA also gives downstreams a nice amount of time for testing and validation.

Regards,
Andrew

On Tue, Mar 3, 2015 at 2:32 PM, Arun Murthy a...@hortonworks.com wrote:
> Awesome, looks like we can just do this in a compatible manner - nothing else on the list seems like it warrants a (premature) major release.
>
> Thanks Vinod.
>
> Arun
>
> From: Vinod Kumar Vavilapalli vino...@hortonworks.com
> Sent: Tuesday, March 03, 2015 2:30 PM
> To: common-dev@hadoop.apache.org
> Cc: hdfs-...@hadoop.apache.org; mapreduce-...@hadoop.apache.org; yarn-...@hadoop.apache.org
> Subject: Re: Looking to a Hadoop 3 release
>
> I started pitching in more on that JIRA.
>
> To add, I think we can and should strive for doing this in a compatible manner, whatever the approach. Marking and calling it incompatible before we see a proposal/patch seems premature to me. Commented the same on JIRA: https://issues.apache.org/jira/browse/HADOOP-11656?focusedCommentId=14345875&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14345875 .
>
> Thanks
> +Vinod
>
> On Mar 2, 2015, at 8:08 PM, Andrew Wang andrew.w...@cloudera.com wrote:
>
> Regarding classpath isolation, based on what I hear from our customers, it's still a big problem (even after the MR classloader work). The latest Jackson version bump was quite painful for our downstream projects, and the HDFS client still leaks a lot of dependencies. Would welcome more discussion of this on HADOOP-11656; Steve, Colin, and Haohui have already chimed in.
Re: Looking to a Hadoop 3 release
In general +1 on 3.0.0. It's time. If we start now, it might make it out by 2016. If we start now, downstreamers can start aligning themselves to land versions that suit at about the same time.

While two big items have been called out as possible incompatible changes, and there is ongoing discussion as to whether they are or not*, is there any chance of getting a longer list of big differences between the branches? In particular I'd be interested in improvements that are 'off' by default that would be better defaulted 'on'.

Thanks,
St.Ack

* Let me note that 'compatible' around these parts is a trampled concept seemingly open to interpretation, with a definition other than the one that prevails elsewhere in software. See Allen's list above, and in our downstream project, the recent HBASE-13149 ("HBase server MR tools are broken on Hadoop 2.5+ Yarn"), among others. Let 3.x be incompatible with 2.x if only so we can leave behind all current notions of 'compatibility' and just start over (as per Allen).

On Mon, Mar 2, 2015 at 3:19 PM, Andrew Wang andrew.w...@cloudera.com wrote:
> Hi devs,
>
> It's been a year and a half since 2.x went GA, and I think we're about due for a 3.x release. Notably, there are two incompatible changes I'd like to call out, that will have a tremendous positive impact for our users.
>
> First, classpath isolation being done at HADOOP-11656, which has been a long-standing request from many downstreams and Hadoop users.
>
> Second, bumping the source and target JDK version to JDK8 (related to HADOOP-11090), which is important since JDK7 is EOL in April 2015 (two months from now). In the past, we've had issues with our dependencies discontinuing support for old JDKs, so this will future-proof us.
>
> Between the two, we'll also have quite an opportunity to clean up and upgrade our dependencies, another common user and developer request.
>
> I'd like to propose that we start rolling a series of monthly-ish series of 3.0 alpha releases ASAP, with myself volunteering to take on the RM and other cat herding responsibilities. There are already quite a few changes slated for 3.0 besides the above (for instance the shell script rewrite) so there's already value in a 3.0 alpha, and the more time we give downstreams to integrate, the better.
>
> This opens up discussion about inclusion of other changes, but I'm hoping to freeze incompatible changes after maybe two alphas, do a beta (with no further incompat changes allowed), and then finally a 3.x GA. For those keeping track, that means a 3.x GA in about four months.
>
> I would also like to stress though that this is not intended to be a big bang release. For instance, it would be great if we could maintain wire compatibility between 2.x and 3.x, so rolling upgrades work. Keeping branch-2 and branch-3 similar also makes backports easier, since we're likely maintaining 2.x for a while yet.
>
> Please let me know any comments / concerns related to the above. If people are friendly to the idea, I'd like to cut a branch-3 and start working on the first alpha.
>
> Best,
> Andrew
RE: Looking to a Hadoop 3 release
May I add some comments? Just sharing my thoughts. Thanks.

> If we start now, it might make it out by 2016. If we start now, downstreamers can start aligning themselves to land versions that suit at about the same time.

This helps not only downstreamers align with the long-term release, but perhaps also contributors like me align our future efforts.

In addition to JDK8 support and classpath isolation, could we consider more candidates? How about HADOOP-9797, "Pluggable and compatible UGI change"?
https://issues.apache.org/jira/browse/HADOOP-9797

The benefits:
1) It allows multiple login sessions/contexts and authentication methods to be used in the same Java application/process without conflicts, providing good isolation by getting rid of globals and statics.
2) It allows new authentication methods to be plugged into UGI in a modular, manageable and maintainable manner.

We would also push the first release of Apache Kerby, preparing a strong, dedicated and clean Kerberos library in Java for both the client and KDC sides. By leveraging the library, we could update Hadoop-MiniKDC and perform more security tests.
https://issues.apache.org/jira/browse/DIRKRB-102

Hope this makes sense. Thanks.

Regards,
Kai

-----Original Message-----
From: saint@gmail.com [mailto:saint@gmail.com] On Behalf Of Stack
Sent: Thursday, March 05, 2015 2:47 AM
To: common-dev@hadoop.apache.org
Cc: mapreduce-...@hadoop.apache.org; hdfs-...@hadoop.apache.org; yarn-...@hadoop.apache.org
Subject: Re: Looking to a Hadoop 3 release

In general +1 on 3.0.0. It's time. If we start now, it might make it out by 2016. If we start now, downstreamers can start aligning themselves to land versions that suit at about the same time. While two big items have been called out as possible incompatible changes, and there is ongoing discussion as to whether they are or not*, is there any chance of getting a longer list of big differences between the branches?
In particular I'd be interested in improvements that are 'off' by default that would be better defaulted 'on'. Thanks, St.Ack * Let me note that 'compatible' around these parts is a trampled concept seemingly open to interpretation with a definition that is other than prevails elsewhere in software. See Allen's list above, and in our downstream project, the recent HBASE-13149 HBase server MR tools are broken on Hadoop 2.5+ Yarn, among others. Let 3.x be incompatible with 2.x if only so we can leave behind all current notions of 'compatibility' and just start over (as per Allen). On Mon, Mar 2, 2015 at 3:19 PM, Andrew Wang andrew.w...@cloudera.com wrote: Hi devs, It's been a year and a half since 2.x went GA, and I think we're about due for a 3.x release. Notably, there are two incompatible changes I'd like to call out, that will have a tremendous positive impact for our users. First, classpath isolation being done at HADOOP-11656, which has been a long-standing request from many downstreams and Hadoop users. Second, bumping the source and target JDK version to JDK8 (related to HADOOP-11090), which is important since JDK7 is EOL in April 2015 (two months from now). In the past, we've had issues with our dependencies discontinuing support for old JDKs, so this will future-proof us. Between the two, we'll also have quite an opportunity to clean up and upgrade our dependencies, another common user and developer request. I'd like to propose that we start rolling a series of monthly-ish series of 3.0 alpha releases ASAP, with myself volunteering to take on the RM and other cat herding responsibilities. There are already quite a few changes slated for 3.0 besides the above (for instance the shell script rewrite) so there's already value in a 3.0 alpha, and the more time we give downstreams to integrate, the better. 
This opens up discussion about inclusion of other changes, but I'm hoping to freeze incompatible changes after maybe two alphas, do a beta (with no further incompat changes allowed), and then finally a 3.x GA. For those keeping track, that means a 3.x GA in about four months. I would also like to stress though that this is not intended to be a big bang release. For instance, it would be great if we could maintain wire compatibility between 2.x and 3.x, so rolling upgrades work. Keeping branch-2 and branch-3 similar also makes backports easier, since we're likely maintaining 2.x for a while yet. Please let me know any comments / concerns related to the above. If people are friendly to the idea, I'd like to cut a branch-3 and start working on the first alpha. Best, Andrew
Re: Looking to a Hadoop 3 release
One of the questions that keeps popping up is “what exactly is in trunk?” As some may recall, I had done some experiments creating the change log based upon JIRA. While the interest level appeared to be approaching zero, I kept playing with it a bit and eventually also started playing with the release notes script (for various reasons I won’t bore you with.) In any case, I’ve started posting the results of these runs on one of my github repos if anyone was wanting a quick reference as to JIRA’s opinion on the matter: https://github.com/aw-altiscale/hadoop-release-metadata/tree/master/3.0.0
Re: Looking to a Hadoop 3 release
Hi Konst, thanks for taking a look. I think I essentially agree with your points.

On Mon, Mar 2, 2015 at 11:04 PM, Konstantin Shvachko shv.had...@gmail.com wrote:
> Andrew,
>
> Hadoop 3 seems in general like a good idea to me.
>
> 1. I did not understand if you propose to release 3.0 instead of 2.7 or in addition? I think 2.7 is needed at least as a stabilization step for the 2.x line.

I agree with this, 2.7 is needed, and I think Vinod/Arun are working on it now. I expect branch-2 to be maintained for a while yet, separate from a branch-3.

> 2. If Hadoop 3 and 2.x are meant to exist together, we run a risk to manifest split-brain behavior again, as we had with hadoop-1, hadoop-2 and other versions. If that is somehow beneficial for commercial vendors, which I don't see how, for the community it was proven to be very disruptive. Would be really good to avoid it this time.

My motivations here are purely what I've stated above. I remember the pain of the branch-1 days as well, and this would be a far, far smaller difference. JDK8 min version and classpath isolation are compelling, yet incompatible, which is why I'm proposing Hadoop 3. Besides those two features, it should be approximately the same size as our 2.x releases.

> 3. Could we release Hadoop 3 directly from trunk? With a proper feature freeze in advance. Current trunk is in the best working condition I've seen in years - much better than when hadoop-2 was coming to life. It could make a good alpha. I believe we can start planning 3.0 from trunk right after 2.7 is out.

I agree with this, and would be okay with this if our audit of trunk reveals no incompatible changes we're uncomfortable releasing. I'll note though that committing to multiple branches is way easier now with git and cherry-pick, so that overhead is reduced. Rolling out an alpha now is strictly a good thing for our downstreams, even if it means we need to do extra commits.

Thanks,
Andrew
Re: Looking to a Hadoop 3 release
Hi Akira, thanks for responding,

On Tue, Mar 3, 2015 at 4:04 AM, Akira AJISAKA ajisa...@oss.nttdata.co.jp wrote:
> Thanks Andrew for bringing this up. +1 mostly looks fine but I'm thinking it's not yet time to cut branch-3.
>
> * classpath isolation
>
> IMHO, classpath isolation is a good thing to do. We should pay down the technical debt ASAP. I'm willing to help. I'm thinking we can cut branch-3 and release 3.0 alpha after HADOOP-11656 is fixed. That is, I'd like to mark this issue as a blocker for 3.0. I wonder that even if we cut branch-3 now, trunk and branch-3 would be the same for a while. That seems useless.

I'm willing to wait a bit here, but I think even what we have now is worth kicking the tires, and either the JDK8 target version or classpath isolation would make it even more compelling. If you're worried about backport overheads, Konst's proposal of releasing directly from trunk might be appealing. Needs some more examination though.

> * JDK8
>
> As Steve suggested, JDK8 can be in both trunk and branch-2. +1 for moving to JDK8 ASAP.

We can make sure branch-2 runs well under JDK8, but I'm against doing a target version bump to JDK8 like we're planning to do for JDK7 in a minor release. As I described in my reply to Arun, that was a special circumstance, and JDK target version bumps really are deserving of a new major release.

> * maintaining 2.x
>
> For user side, now there is little merit to upgrade to 3.x. More important thing is how long 2.x will be maintained. Therefore we should consider when to stop backporting new features to 2.x, and when to stop maintaining 2.x. I'd like to maintain 2.x as long as possible, at least one year after 3.x GA release.

The value in releasing alphas right now is not so much for end users, but for downstream projects which need time to integrate. I don't expect end-users to really jump on 3.x until the downstreams have also rolled new releases based on 3.x.

Determining when support for 2.x is over is done by the community. I personally plan to keep backporting for a while after 3.x GA is released. If backports to branch-2 tail off, it just takes one committer with the interest to keep maintaining it. This has been a common thing in HBase for instance, Lars H maintained 0.92 for a long time because he had the interest.

> * Other issue
>
> What's the current status of HDFS symlink? If HADOOP-10019 requires some incompatible changes, I'd like to include in 3.x.

There are still a lot of unresolved compatibility and security issues, especially with cross-filesystem symlinks. We tabled this work before, and frankly I'm not sure these issues will ever be satisfactorily resolved. Even today, there are plenty of Unix apps that don't handle symlinks correctly, and we still lack equivalents of more secure syscalls like openat() in the first place.

Thanks,
Andrew
Re: Looking to a Hadoop 3 release
Hi Junping, thanks for your response, I view branch-3 as essentially the same size as our recent 2.x releases, with the exception of incompatible changes like classpath isolation and JDK8 target version. These, while perhaps not revolutionary, are still incompatible, and require a major version bump. I don't see a forking of the community effort, since backports should flow pretty easily from branch-3 to branch-2 the same way they currently can flow from branch-2 to branch-2.6. It's just an extra git commit, not like what we had to deal with in the branch-1 days with a custom backport. Hopefully that addresses your concerns. Thanks, Andrew On Tue, Mar 3, 2015 at 6:12 AM, Junping Du j...@hortonworks.com wrote: Thanks all for good discussions here. +1 on supporting Java 8 ASAP. In addition, I agree that we should separating this effort with cutting down Hadoop 3. IMO, Hadoop is still very cool today, and we should only consider Hadoop 3 until we have revolutionary feature (like YARN for 2.0) which deserve to break fundamental compatibilities. Or it may just cause more distractions for community effort. Just 2 cents. Thanks, Junping From: Akira AJISAKA ajisa...@oss.nttdata.co.jp Sent: Tuesday, March 03, 2015 12:04 PM To: common-dev@hadoop.apache.org; mapreduce-...@hadoop.apache.org; hdfs-...@hadoop.apache.org; yarn-...@hadoop.apache.org Subject: Re: Looking to a Hadoop 3 release Thanks Andrew for bringing this up. +1 mostly looks fine but I'm thinking it's not now to cut branch-3. classpath isolation IMHO, classpath isolation is a good thing to do. We should pay down the technical dept ASAP. I'm willing to help. I'm thinking we can cut branch-3 and release 3.0 alpha after HADOOP-11656 is fixed. That is, I'd like to mark this issue as a blocker for 3.0. I wonder that even if we cut branch-3 now, trunk and branch-3 would be the same for a while. That seems useless. JDK8 As Steve suggested, JDK8 can be in both trunk and branch-2. +1 for moving to JDK8 ASAP. 
maintaining 2.x For user side, now there is little merit to upgrade to 3.x. More important thing is how long 2.x will be maintained. Therefore we should consider when to stop backporting new features to 2.x, and when to stop maintaining 2.x. I'd like to maintain 2.x as long as possible, at least one year after 3.x GA release. * Other issue What's the current status of HDFS symlink? If HADOOP-10019 requires some incompatible changes, I'd like to include in 3.x. Regards, Akira On 3/2/15 15:19, Andrew Wang wrote: Hi devs, It's been a year and a half since 2.x went GA, and I think we're about due for a 3.x release. Notably, there are two incompatible changes I'd like to call out, that will have a tremendous positive impact for our users. First, classpath isolation being done at HADOOP-11656, which has been a long-standing request from many downstreams and Hadoop users. Second, bumping the source and target JDK version to JDK8 (related to HADOOP-11090), which is important since JDK7 is EOL in April 2015 (two months from now). In the past, we've had issues with our dependencies discontinuing support for old JDKs, so this will future-proof us. Between the two, we'll also have quite an opportunity to clean up and upgrade our dependencies, another common user and developer request. I'd like to propose that we start rolling a series of monthly-ish series of 3.0 alpha releases ASAP, with myself volunteering to take on the RM and other cat herding responsibilities. There are already quite a few changes slated for 3.0 besides the above (for instance the shell script rewrite) so there's already value in a 3.0 alpha, and the more time we give downstreams to integrate, the better. This opens up discussion about inclusion of other changes, but I'm hoping to freeze incompatible changes after maybe two alphas, do a beta (with no further incompat changes allowed), and then finally a 3.x GA. For those keeping track, that means a 3.x GA in about four months. 
I would also like to stress though that this is not intended to be a big bang release. For instance, it would be great if we could maintain wire compatibility between 2.x and 3.x, so rolling upgrades work. Keeping branch-2 and branch-3 similar also makes backports easier, since we're likely maintaining 2.x for a while yet. Please let me know any comments / concerns related to the above. If people are friendly to the idea, I'd like to cut a branch-3 and start working on the first alpha. Best, Andrew
Re: Looking to a Hadoop 3 release
I want to understand a lot more about the classpath isolation (HADOOP-11656) proposal: specifically, what is proposed, and does it have to be tagged as incompatible? That's a bigger change than just setting javac.version=8 in the POM, though given what a fundamental problem it addresses, I'm in favour of doing something there.

On 3 March 2015 at 08:05:46, Andrew Wang (andrew.w...@cloudera.com) wrote:
> I view branch-3 as essentially the same size as our recent 2.x releases, with the exception of incompatible changes like classpath isolation and JDK8 target version. These, while perhaps not revolutionary, are still incompatible, and require a major version bump.
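As background to why the thread treats the JDK8 target bump as incompatible: classes compiled with a Java 8 target carry class-file major version 52, and a Java 7 JVM (which accepts at most 51) refuses to load them. The following is a minimal sketch of that check, not anything from Hadoop itself; the class name `ClassFileVersion` and its methods are hypothetical helpers introduced only for illustration.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

/** Hypothetical helper for illustration only; not part of Hadoop. */
final class ClassFileVersion {

    private ClassFileVersion() {}

    /**
     * Reads the class-file major version of a class that is visible to the
     * system class loader (51 = Java 7 bytecode, 52 = Java 8 bytecode).
     */
    static int majorVersion(Class<?> clazz) {
        String resource = clazz.getName().replace('.', '/') + ".class";
        try (InputStream in = ClassLoader.getSystemResourceAsStream(resource)) {
            if (in == null) {
                throw new IllegalArgumentException("class file not found: " + resource);
            }
            DataInputStream data = new DataInputStream(in);
            if (data.readInt() != 0xCAFEBABE) {   // every class file starts with this magic number
                throw new IllegalArgumentException("not a class file: " + resource);
            }
            data.readUnsignedShort();             // minor version, ignored here
            return data.readUnsignedShort();      // major version
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Maps a major version to the Java release that introduced it (valid for Java 5 and later). */
    static int javaRelease(int majorVersion) {
        return majorVersion - 44;
    }

    /** True if a JVM of the given release can load bytecode with this major version. */
    static boolean loadableOn(int jvmRelease, int majorVersion) {
        return javaRelease(majorVersion) <= jvmRelease;
    }
}
```

Under these assumptions, `ClassFileVersion.loadableOn(7, 52)` is false while `ClassFileVersion.loadableOn(8, 51)` is true, which is the asymmetry behind the concern: bumping the target drops every user still running on the older JVM, whereas merely running existing 2.x bytecode on JDK8 does not.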
Re: Looking to a Hadoop 3 release
Totally agreed. I just left a comment there on the current state and what is needed. As of now, I think the big (and only?) changes are flipping the default classloader for tasks and splitting the HDFS jar.

Thanks,
+Vinod

On Mar 3, 2015, at 9:02 AM, Steve Loughran ste...@hortonworks.com wrote:
> I want to understand a lot more about the classpath isolation (HADOOP-11656) proposal, specifically, what is proposed and does it have to be tagged as incompatible? That's a bigger change than just setting javac.version=8 in the POM, though given what a fundamental problem it addresses, I'm in favour of doing something there.
>
> On 3 March 2015 at 08:05:46, Andrew Wang (andrew.w...@cloudera.com) wrote:
>> I view branch-3 as essentially the same size as our recent 2.x releases, with the exception of incompatible changes like classpath isolation and JDK8 target version. These, while perhaps not revolutionary, are still incompatible, and require a major version bump.
Re: Looking to a Hadoop 3 release
Between:

* removing -finalize
* breaking HDFS browsing
* changing du’s output (in the 2.7 branch)
* changing various names of metrics (either intentionally or otherwise)
* changing the JDK release

… and probably lots of other stuff in branch-2 I haven’t seen/know about, our best course of action is to:

    $ git rm hadoop-common-project/hadoop-common/src/site/markdown/Compatibility.md

At least this way we as caretakers don’t come across as hypocrites. It’s pretty clear the direction has shown we only care about API compatibility and the rest is ignored when it isn’t “convenient”. [The next time someone tells you that Hadoop is hard to operate, I want you to think about this email.] (1)

Making 2.7 build with JDK7 led to the *exact* situation I figured it would: now we have a precedent where we just say to the community “You know those guarantees? Yeah, you might as well ignore them because we’re going to change the core component any damn time we feel like it.”

We haven’t made a release branch off of trunk since branch-0.23. If anyone thinks that’s healthy, there is some beach property in Alberta you might be interested in as well. Our release cycle came to a screeching halt after 0.20 and we’ve never recovered.

However, I offer an alternative. This same circular argument comes up all the time: (2)

* There aren’t enough changes in trunk to make a new branch.
* We can’t upgrade/change component X because there is no plan to make a new major release.

To quote Frozen: Let It Go

We’re probably at the point where there aren’t likely to be very many more earth shattering changes to the Hadoop code base. The community has decided instead to push these types of changes as separate projects via incubator to avoid the committer paralysis that this community suffers.

Because of this, I don’t think the “enough changes” argument works anymore. Instead, we need to pick a new metric to build a cadence to force regular updates. I’d offer that the “every two years” JDK EOL sets the perfect cadence, matched by many other enterprise and OSS software, and gives us an opportunity to reflect in the version number that the critical component of our software has changed.

This cadence allows for people to plan appropriately and know what our roadmap and direction actually is. Folks are more likely to build “real” solutions rather than make compromises that suffer in quality in the name of compatibility simply because they don’t know when their work will actually show up. We’ll have a normal, regular opportunity to update dependencies (regardless of the state of HADOOP-11656).

Now, if you’ll excuse me, I have more contributor's patches to go through.

(1) FWIW, I made the decision not to worry about backward compatibility in the shell code rewrite when I made the realization that the jsvc log and pid file names were poorly chosen to allow for certain capabilities. Did anyone actually touch them from outside the software? Probably not. But it is still effectively an interface, so off to trunk it went.

(2) … and that’s before we even get to the “Version numbers are cheap” arguments that were made during the Great Renames of 0.20 and 0.23.
Re: Looking to a Hadoop 3 release
On Mar 3, 2015, at 9:36 AM, Karthik Kambatla ka...@cloudera.com wrote:
> If we preserve API compat and try to preserve wire compat, I don't see the harm in bumping the major release.

If we preserve compatibility, then there is no need to bump major number.

> It allows us to include several fixes/features in trunk in a release. If we are not actively thinking of a way to release items in trunk, why even have it?

What are the fixes and features in trunk that you would like to see get out quickly? Can these be back ported easily to branch 2?

sanjay
Re: Looking to a Hadoop 3 release
Awesome, looks like we can just do this in a compatible manner - nothing else on the list seems like it warrants a (premature) major release.

Thanks Vinod.

Arun

From: Vinod Kumar Vavilapalli vino...@hortonworks.com
Sent: Tuesday, March 03, 2015 2:30 PM
To: common-dev@hadoop.apache.org
Cc: hdfs-...@hadoop.apache.org; mapreduce-...@hadoop.apache.org; yarn-...@hadoop.apache.org
Subject: Re: Looking to a Hadoop 3 release

I started pitching in more on that JIRA. To add, I think we can and should strive for doing this in a compatible manner, whatever the approach. Marking and calling it incompatible before we see proposal/patch seems premature to me. Commented the same on JIRA: https://issues.apache.org/jira/browse/HADOOP-11656?focusedCommentId=14345875&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14345875

Thanks
+Vinod

On Mar 2, 2015, at 8:08 PM, Andrew Wang andrew.w...@cloudera.com wrote:
> Regarding classpath isolation, based on what I hear from our customers, it's still a big problem (even after the MR classloader work). The latest Jackson version bump was quite painful for our downstream projects, and the HDFS client still leaks a lot of dependencies. Would welcome more discussion of this on HADOOP-11656, Steve, Colin, and Haohui have already chimed in.
Re: Looking to a Hadoop 3 release
I started pitching in more on that JIRA. To add, I think we can and should strive for doing this in a compatible manner, whatever the approach. Marking and calling it incompatible before we see proposal/patch seems premature to me. Commented the same on JIRA: https://issues.apache.org/jira/browse/HADOOP-11656?focusedCommentId=14345875&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14345875

Thanks
+Vinod

On Mar 2, 2015, at 8:08 PM, Andrew Wang andrew.w...@cloudera.com wrote:
> Regarding classpath isolation, based on what I hear from our customers, it's still a big problem (even after the MR classloader work). The latest Jackson version bump was quite painful for our downstream projects, and the HDFS client still leaks a lot of dependencies. Would welcome more discussion of this on HADOOP-11656, Steve, Colin, and Haohui have already chimed in.
Re: Looking to a Hadoop 3 release
I am surprised classpath-isolation is being called a minor issue. We have been hearing users complain about Hadoop leaking its dependencies into the classpath for a while now, with Guava often being the culprit. Not being able to upgrade our dependencies without affecting users has started to hamper our development too; e.g. the Guava conflict when upgrading the Curator version.

If we preserve API compat and try to preserve wire compat, I don't see the harm in bumping the major release. It allows us to include several fixes/features in trunk in a release. If we are not actively thinking of a way to release items in trunk, why even have it?

If there are any disadvantages to doing a major release, I would like to know. Maybe we could arrive at a plan to accomplish it without those problems.

Thanks
Karthik

On Tue, Mar 3, 2015 at 9:02 AM, Steve Loughran ste...@hortonworks.com wrote:
> I want to understand a lot more about the classpath isolation (HADOOP-11656) proposal, specifically, what is proposed and does it have to be tagged as incompatible? That's a bigger change than just setting javac.version=8 in the POM, though given what a fundamental problem it addresses, I'm in favour of doing something there.
>
> On 3 March 2015 at 08:05:46, Andrew Wang (andrew.w...@cloudera.com) wrote:
>> I view branch-3 as essentially the same size as our recent 2.x releases, with the exception of incompatible changes like classpath isolation and JDK8 target version. These, while perhaps not revolutionary, are still incompatible, and require a major version bump.

--
Karthik Kambatla
Software Engineer, Cloudera Inc.
http://five.sentenc.es
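To make the dependency-leak problem discussed above concrete, here is a minimal sketch of one common isolation technique: a child-first class loader that resolves user classes (say, the user's own Guava) from the user's jars before falling back to the framework's classpath. This is an assumption-laden illustration, not the approach proposed on HADOOP-11656 and not Hadoop's actual code; the class name `ChildFirstClassLoader` and the `parentFirstPrefixes` parameter are hypothetical.

```java
import java.net.URL;
import java.net.URLClassLoader;

/** Hypothetical sketch of a child-first ("isolating") class loader; not Hadoop's implementation. */
final class ChildFirstClassLoader extends URLClassLoader {

    /** Package prefixes that are always delegated to the parent loader. */
    private final String[] parentFirstPrefixes;

    ChildFirstClassLoader(URL[] urls, ClassLoader parent, String... parentFirstPrefixes) {
        super(urls, parent);
        this.parentFirstPrefixes = parentFirstPrefixes.clone();
    }

    /** JDK classes and the configured framework packages must never be shadowed by user jars. */
    boolean isParentFirst(String className) {
        if (className.startsWith("java.")) {
            return true;
        }
        for (String prefix : parentFirstPrefixes) {
            if (className.startsWith(prefix)) {
                return true;
            }
        }
        return false;
    }

    @Override
    protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
        synchronized (getClassLoadingLock(name)) {
            Class<?> loaded = findLoadedClass(name);
            if (loaded == null) {
                if (isParentFirst(name)) {
                    // Standard delegation: ask the parent before looking in our own URLs.
                    loaded = super.loadClass(name, false);
                } else {
                    try {
                        // Child-first: prefer the copy shipped in the user's jars.
                        loaded = findClass(name);
                    } catch (ClassNotFoundException notInChild) {
                        // Not in the user's jars, so fall back to the parent.
                        loaded = super.loadClass(name, false);
                    }
                }
            }
            if (resolve) {
                resolveClass(loaded);
            }
            return loaded;
        }
    }
}
```

A caller would construct it with the user's jar URLs and a prefix list such as `"org.apache.hadoop."`, so that framework types stay shared while third-party libraries resolve from the user's side first. Flipping a default like this changes which copy of a library existing jobs see, which is why the thread debates whether it can be done compatibly.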
Re: Looking to a Hadoop 3 release
Thanks all for the good discussions here. +1 on supporting Java 8 ASAP. In addition, I agree that we should separate this effort from cutting Hadoop 3. IMO, Hadoop is still very cool today, and we should only consider Hadoop 3 once we have a revolutionary feature (like YARN for 2.0) which deserves to break fundamental compatibilities. Otherwise it may just cause more distractions for the community effort. Just 2 cents.

Thanks,
Junping

From: Akira AJISAKA ajisa...@oss.nttdata.co.jp
Sent: Tuesday, March 03, 2015 12:04 PM
To: common-dev@hadoop.apache.org; mapreduce-...@hadoop.apache.org; hdfs-...@hadoop.apache.org; yarn-...@hadoop.apache.org
Subject: Re: Looking to a Hadoop 3 release

Thanks Andrew for bringing this up. +1 mostly looks fine but I'm thinking it's not yet time to cut branch-3.

* classpath isolation

IMHO, classpath isolation is a good thing to do. We should pay down the technical debt ASAP. I'm willing to help. I'm thinking we can cut branch-3 and release 3.0 alpha after HADOOP-11656 is fixed. That is, I'd like to mark this issue as a blocker for 3.0. I wonder that even if we cut branch-3 now, trunk and branch-3 would be the same for a while. That seems useless.

* JDK8

As Steve suggested, JDK8 can be in both trunk and branch-2. +1 for moving to JDK8 ASAP.

* maintaining 2.x

For user side, now there is little merit to upgrade to 3.x. More important thing is how long 2.x will be maintained. Therefore we should consider when to stop backporting new features to 2.x, and when to stop maintaining 2.x. I'd like to maintain 2.x as long as possible, at least one year after 3.x GA release.

* Other issue

What's the current status of HDFS symlink? If HADOOP-10019 requires some incompatible changes, I'd like to include in 3.x.

Regards,
Akira

On 3/2/15 15:19, Andrew Wang wrote:
Hi devs, It's been a year and a half since 2.x went GA, and I think we're about due for a 3.x release.
Notably, there are two incompatible changes I'd like to call out, that will have a tremendous positive impact for our users. First, classpath isolation being done at HADOOP-11656, which has been a long-standing request from many downstreams and Hadoop users. Second, bumping the source and target JDK version to JDK8 (related to HADOOP-11090), which is important since JDK7 is EOL in April 2015 (two months from now). In the past, we've had issues with our dependencies discontinuing support for old JDKs, so this will future-proof us. Between the two, we'll also have quite an opportunity to clean up and upgrade our dependencies, another common user and developer request. I'd like to propose that we start rolling a series of monthly-ish series of 3.0 alpha releases ASAP, with myself volunteering to take on the RM and other cat herding responsibilities. There are already quite a few changes slated for 3.0 besides the above (for instance the shell script rewrite) so there's already value in a 3.0 alpha, and the more time we give downstreams to integrate, the better. This opens up discussion about inclusion of other changes, but I'm hoping to freeze incompatible changes after maybe two alphas, do a beta (with no further incompat changes allowed), and then finally a 3.x GA. For those keeping track, that means a 3.x GA in about four months. I would also like to stress though that this is not intended to be a big bang release. For instance, it would be great if we could maintain wire compatibility between 2.x and 3.x, so rolling upgrades work. Keeping branch-2 and branch-3 similar also makes backports easier, since we're likely maintaining 2.x for a while yet. Please let me know any comments / concerns related to the above. If people are friendly to the idea, I'd like to cut a branch-3 and start working on the first alpha. Best, Andrew
Re: Looking to a Hadoop 3 release
+1

On Mon, Mar 2, 2015 at 3:19 PM, Andrew Wang andrew.w...@cloudera.com wrote:

> Hi devs,
>
> It's been a year and a half since 2.x went GA, and I think we're about due for a 3.x release. Notably, there are two incompatible changes I'd like to call out, that will have a tremendous positive impact for our users.
>
> First, classpath isolation being done at HADOOP-11656, which has been a long-standing request from many downstreams and Hadoop users.

Guava etc. have been such a pain in the past. Can't wait to have a release where we don't have to worry about what version of dependencies users want to use.

> Second, bumping the source and target JDK version to JDK8 (related to HADOOP-11090), which is important since JDK7 is EOL in April 2015 (two months from now). In the past, we've had issues with our dependencies discontinuing support for old JDKs, so this will future-proof us.

Are you saying we can use lambdas without re-writing all of Hadoop in Scala?

> Between the two, we'll also have quite an opportunity to clean up and upgrade our dependencies, another common user and developer request.
>
> I'd like to propose that we start rolling a series of monthly-ish 3.0 alpha releases ASAP, with myself volunteering to take on the RM and other cat herding responsibilities.

Will be glad to help.

> There are already quite a few changes slated for 3.0 besides the above (for instance the shell script rewrite) so there's already value in a 3.0 alpha, and the more time we give downstreams to integrate, the better.
>
> This opens up discussion about inclusion of other changes, but I'm hoping to freeze incompatible changes after maybe two alphas, do a beta (with no further incompat changes allowed), and then finally a 3.x GA. For those keeping track, that means a 3.x GA in about four months.
>
> I would also like to stress though that this is not intended to be a big bang release. For instance, it would be great if we could maintain wire compatibility between 2.x and 3.x, so rolling upgrades work. Keeping branch-2 and branch-3 similar also makes backports easier, since we're likely maintaining 2.x for a while yet.
>
> Please let me know any comments / concerns related to the above. If people are friendly to the idea, I'd like to cut a branch-3 and start working on the first alpha.
>
> Best,
> Andrew

--
Karthik Kambatla
Software Engineer, Cloudera Inc.
http://five.sentenc.es
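For readers less familiar with what the JDK8 source/target bump would allow at the language level, here is a minimal sketch of the same comparator written in pre-Java 8 style and as a Java 8 lambda. It is only an illustration: the class and method names (`LambdaSketch`, `sortByLengthJava7`, `sortByLengthJava8`) are made up for this example and do not come from the Hadoop codebase.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

// Hypothetical example class; not part of Hadoop.
public class LambdaSketch {

    // Pre-Java 8 style: an anonymous inner class. Compiles at source level 1.7.
    static List<String> sortByLengthJava7(List<String> names) {
        List<String> copy = new ArrayList<String>(names);
        Collections.sort(copy, new Comparator<String>() {
            @Override
            public int compare(String a, String b) {
                return Integer.compare(a.length(), b.length());
            }
        });
        return copy;
    }

    // Java 8 style: a lambda plus List.sort. Requires source level 1.8 or later.
    static List<String> sortByLengthJava8(List<String> names) {
        List<String> copy = new ArrayList<>(names);
        copy.sort((a, b) -> Integer.compare(a.length(), b.length()));
        return copy;
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("mapreduce", "hdfs", "yarn", "common");
        System.out.println(sortByLengthJava7(names));
        System.out.println(sortByLengthJava8(names));
    }
}
```

Both methods produce the same ordering; the second one simply will not compile until the source level is raised to 8, which is why the bump is tied to a branch-wide decision.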
Re: Looking to a Hadoop 3 release
+1, this sounds like a good plan to me. Thanks a lot for volunteering to take this on, Andrew.

Best,
Aaron
Looking to a Hadoop 3 release
Hi devs,

It's been a year and a half since 2.x went GA, and I think we're about due for a 3.x release. Notably, there are two incompatible changes I'd like to call out that will have a tremendous positive impact for our users.

First, classpath isolation, being done in HADOOP-11656, which has been a long-standing request from many downstreams and Hadoop users.

Second, bumping the source and target JDK version to JDK8 (related to HADOOP-11090), which is important since JDK7 is EOL in April 2015 (two months from now). In the past, we've had issues with our dependencies discontinuing support for old JDKs, so this will future-proof us.

Between the two, we'll also have quite an opportunity to clean up and upgrade our dependencies, another common user and developer request.

I'd like to propose that we start rolling a series of monthly-ish 3.0 alpha releases ASAP, with myself volunteering to take on the RM and other cat herding responsibilities. There are already quite a few changes slated for 3.0 besides the above (for instance the shell script rewrite), so there's already value in a 3.0 alpha, and the more time we give downstreams to integrate, the better.

This opens up discussion about inclusion of other changes, but I'm hoping to freeze incompatible changes after maybe two alphas, do a beta (with no further incompat changes allowed), and then finally a 3.x GA. For those keeping track, that means a 3.x GA in about four months.

I would also like to stress, though, that this is not intended to be a big bang release. For instance, it would be great if we could maintain wire compatibility between 2.x and 3.x, so rolling upgrades work. Keeping branch-2 and branch-3 similar also makes backports easier, since we're likely maintaining 2.x for a while yet.

Please let me know any comments / concerns related to the above. If people are friendly to the idea, I'd like to cut a branch-3 and start working on the first alpha.

Best,
Andrew
Re: Looking to a Hadoop 3 release
Thanks Andrew for the proposal. +1, and I will be happy to help.

--Yongjun
Re: Looking to a Hadoop 3 release
Andrew,

Thanks for bringing up the issue of moving to Java 8. Java 8 is important. However, I am not seeing a strong motivation for changing the major number. We can go to Java 8 in the 2.x series. The classpath issue in HADOOP-11656 is too minor to force a major number change (no pun intended). Let's separate the issues of Java 8 and Hadoop 3.0.

sanjay
Re: Looking to a Hadoop 3 release
> Second, bumping the source and target JDK version to JDK8 (related to HADOOP-11090), which is important since JDK7 is EOL in April 2015 (two months from now). In the past, we've had issues with our dependencies discontinuing support for old JDKs, so this will future-proof us.

Is moving to JDK8 fundamentally different from the move to JDK7? We are moving to JDK7 via release 2.7 that I am helping with now.

> I'd like to propose that we start rolling a series of monthly-ish series of 3.0 alpha releases ASAP, with myself volunteering to take on the RM and other cat herding responsibilities. There are already quite a few changes slated for 3.0 besides the above (for instance the shell script rewrite) so there's already value in a 3.0 alpha, and the more time we give downstreams to integrate, the better.

Aren't the shell script rewrite changes supposed to be compatible?

Thanks,
+Vinod
Re: Looking to a Hadoop 3 release
+1 Happy to help too

On Mon, Mar 2, 2015 at 3:57 PM, Yongjun Zhang yzh...@cloudera.com wrote:

> Thanks Andrew for the proposal. +1, and I will be happy to help. --Yongjun
RE: Looking to a Hadoop 3 release
JDK8 support is under consideration; it looks like many issues have already been reported and resolved. https://issues.apache.org/jira/browse/HADOOP-11090
RE: Looking to a Hadoop 3 release
Sorry for the previous message; I thought I was sending it to my colleagues. By the way, for the JDK8 support, we (Intel) would like to investigate further and help. Thanks.

Regards,
Kai
RE: Looking to a Hadoop 3 release
+1

Regards,
Yi Liu
Re: Looking to a Hadoop 3 release
+1 (non-binding). It would be nice to have a Hadoop 3.x release. It would be my honor to help.

Regards!
Chen
Re: Looking to a Hadoop 3 release
Andrew,

Thanks for bringing up this discussion. I'm a little puzzled, for I feel like we are rehashing the same discussion from last year, where we agreed on a different course of action w.r.t. the switch to JDK7.

IAC, breaking compatibility for hadoop-3 is a pretty big cost, particularly for users such as Yahoo/Twitter/eBay who have several clusters between which compatibility is paramount.

Now, breaking compatibility is perfectly fine over time where there is sufficient benefit, e.g. HDFS HA or YARN in hadoop-2 (vs. hadoop-1). However, I'm struggling to quantify the benefit of hadoop-3 for users against the cost of the breakage. Given that we already agreed to put JDK7 in 2.7, and that the classpath is a fairly minor irritant given some existing solutions (e.g. a new default classloader), how do you quantify the benefit for users?

We could just do JDK8 in hadoop-2.10 or some such; you are definitely welcome to run the RM role for that release.

Furthermore, I'm really concerned that this will be used as an opportunity to further break compat in more egregious ways. Also, are you foreseeing more compat breaks? OTOH, if we all agree that we should absolutely prevent compat breakages such as the client-server wire protocol, I feel the point of a major release is kinda lost.

Overall, my biggest concern is the compatibility story vis-a-vis the benefit. Thoughts?

thanks,
Arun
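As a rough illustration of the "new default classloader" idea mentioned above as an existing way to contain classpath conflicts, here is a minimal child-first class loader sketch. This is a generic technique written under stated assumptions: it is not Hadoop's actual class loader and not a description of what HADOOP-11656 implements, and the class name `ChildFirstClassLoader` and helper `delegatesJdkClasses` are hypothetical.

```java
import java.io.IOException;
import java.net.URL;
import java.net.URLClassLoader;

// Hypothetical sketch of a child-first ("isolating") class loader.
// Classes found on this loader's own URLs win over the parent's copies,
// so an application can ship its own version of a library such as Guava.
public class ChildFirstClassLoader extends URLClassLoader {

    public ChildFirstClassLoader(URL[] urls, ClassLoader parent) {
        super(urls, parent);
    }

    @Override
    protected Class<?> loadClass(String name, boolean resolve)
            throws ClassNotFoundException {
        synchronized (getClassLoadingLock(name)) {
            Class<?> c = findLoadedClass(name);
            if (c == null) {
                if (name.startsWith("java.") || name.startsWith("javax.")) {
                    // JDK classes always come from the parent chain.
                    c = super.loadClass(name, false);
                } else {
                    try {
                        // Look in our own URLs first.
                        c = findClass(name);
                    } catch (ClassNotFoundException e) {
                        // Fall back to normal parent-first delegation.
                        c = super.loadClass(name, false);
                    }
                }
            }
            if (resolve) {
                resolveClass(c);
            }
            return c;
        }
    }

    // Small self-check: with no URLs of its own, JDK classes still resolve
    // to the same Class objects the rest of the JVM sees.
    static boolean delegatesJdkClasses() {
        ClassLoader parent = ChildFirstClassLoader.class.getClassLoader();
        try (ChildFirstClassLoader cl = new ChildFirstClassLoader(new URL[0], parent)) {
            return cl.loadClass("java.lang.String") == String.class;
        } catch (ClassNotFoundException | IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("JDK classes delegated to parent: " + delegatesJdkClasses());
    }
}
```

A loader like this trades one problem for another: two copies of the same class loaded by different loaders are distinct types, so objects cannot be passed freely across the boundary. That is part of why the thread disagrees on whether such a workaround is a sufficient answer to the classpath problem.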
Re: Looking to a Hadoop 3 release
I'm +1 for a migration to Java 8 as soon as possible. That's branch-2 and trunk, as having them on the same language level makes cherry-picking stuff off trunk possible. That's particularly the case for Java 8, as it is the first major change to the language since Java 5.

w.r.t. shipping trunk as 3.x, it's going to take longer than planned. Hopefully not as long as the 2.x release process, but you never know. Which means I expect some more Hadoop 2 releases this year. We need to make the jump there too: get 2.7 out the door and include a roadmap in there for when the Java 8+ only event happens across the codebase.

-Steve

ps. for anyone who wants a pure Java 8 build today, set -Djavac.version=1.8 on the command line of a Maven build. Last time I tried there were some (minor) bits of YARN that wouldn't compile...
Re: Looking to a Hadoop 3 release
+1

It sounds like a good idea, especially regarding JDK.

Regards
JB

--
Jean-Baptiste Onofré
jbono...@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com
Re: Looking to a Hadoop 3 release
Thanks as always for the feedback everyone. Some inline comments to Arun's email, as his were the most extensive:

> Given that we already agreed to put in JDK7 in 2.7, and that the classpath is a fairly minor irritant given some existing solutions (e.g. a new default classloader), how do you quantify the benefit for users?

I looked at our thread on this topic from last time, and we (meaning at least myself and Tucu) agreed to a one-time exception to the JDK7 bump in 2.x for practical reasons. We waited for so long that we had some assurance JDK6 was on the outs. Multiple distros also already had bumped their min version to JDK7. This is not true this time around. Bumping the JDK version is hugely impactful on the end user, and my email on the earlier thread still reflects my thoughts on JDK compatibility:

http://mail-archives.apache.org/mod_mbox/hadoop-common-dev/201406.mbox/%3CCAGB5D2a5fEDfBApQyER_zyhc8a4Xd_ea1wJSsxxkiAiDZO9%2BNg%40mail.gmail.com%3E

Regarding classpath isolation, based on what I hear from our customers, it's still a big problem (even after the MR classloader work). The latest Jackson version bump was quite painful for our downstream projects, and the HDFS client still leaks a lot of dependencies. Would welcome more discussion of this on HADOOP-11656, Steve, Colin, and Haohui have already chimed in.

Having the freedom to upgrade our dependencies at will would also be a big win for us as developers.

> We could just do JDK8 in hadoop-2.10 or some such, you are definitely welcome to run the RM role for that release. Furthermore, I'm really concerned that this will be used as an opportunity to further break compat in more egregious ways. Also, are you foreseeing more compat breaks? OTOH, if we all agree that we should absolutely prevent compat breakages such as the client-server wire protocol, I feel the point of a major release is kinda lost.

Right now, the incompatible changes would be JDK8, classpath isolation, and whatever is already in trunk. I can audit these existing trunk changes when branch-3 is cut. I would like to keep this list as short as possible, to preserve wire compat and rolling upgrade. As far as major releases go, this is not one to be scared of. However, since it's incompatible, it still needs that major version bump.

Best,
Andrew

P.S. Vinod, the shell script rewrite is incompatible. Allen intentionally excluded it from branch-2 for this reason.
Re: Looking to a Hadoop 3 release
Andrew,

Hadoop 3 seems in general like a good idea to me.

1. I did not understand if you propose to release 3.0 instead of 2.7 or in addition? I think 2.7 is needed at least as a stabilization step for the 2.x line.

2. If Hadoop 3 and 2.x are meant to exist together, we run the risk of manifesting split-brain behavior again, as we had with hadoop-1, hadoop-2 and other versions. If that is somehow beneficial for commercial vendors, which I don't see how, for the community it was proven to be very disruptive. Would be really good to avoid it this time.

3. Could we release Hadoop 3 directly from trunk? With a proper feature freeze in advance. Current trunk is in the best working condition I've seen in years - much better than when hadoop-2 was coming to life. It could make a good alpha.

I believe we can start planning 3.0 from trunk right after 2.7 is out.

Thanks,
--Konst