Re: Looking to a Hadoop 3 release

2016-06-27 Thread Andrew Wang
A heads up that I think we're getting close on the blockers for the first
alpha. Looking at my list, I see two I'd like to get in still: YARN-5270
and HADOOP-13316. Will cut a branch and roll the release once those go in;
my test builds have looked good thus far.

My original plan was to do alphas and then beta in Aug/Sep, but the
create-release and L&N changes delayed us by a few months, which also
pushes out the beta timeframe. Given that Nov/Dec is often a quiet period
of development, I think a realistic new beta date is sometime early next
year (Jan/Feb). FYI.

Thanks,
Andrew



Re: Looking to a Hadoop 3 release

2016-05-12 Thread Karthik Kambatla
I am with Vinod on avoiding merging mostly-complete branches into trunk just
because we are not shipping any release off it. If releasing 3.x off of trunk
is going to help with this, I am fine with that approach. We should still
make sure to keep trunk-incompat small and not include large features.



Re: Looking to a Hadoop 3 release

2016-04-23 Thread Chris Douglas
If we're not starting branch-3/trunk, what would distinguish it from
trunk/trunk-incompat? Is it the same mechanism with different labels?

That may be a reasonable strategy when we create branch-3, as a
release branch for beta. Releasing 3.x from trunk will help us figure
out which incompatibilities can be called out in an upgrade guide
(e.g., "new feature X is incompatible with uncommon configuration Y")
and which require code changes (e.g., "data loss upgrading a cluster
with feature X"). Given how long trunk has been unreleased, we need
more data from deployments to triage. How to manage transitions
between major versions will always be case-by-case; consensus on how
we'll address generic incompatible changes is not saving any work.

Once created, removing functionality from branch-3 (leaving it in
trunk) _because_ nobody volunteers cycles to address urgent
compatibility issues is fair. It's also more workable than asking that
features be committed to a branch that we have no plan to release,
even as alpha. -C



Re: Looking to a Hadoop 3 release

2016-04-22 Thread Vinod Kumar Vavilapalli
Tx for your replies, Andrew.

> For exit criteria, how about we time box it? My plan was to do monthly
> alphas through the summer, leading up to beta in late August / early Sep.
> At that point we freeze and stabilize for GA in Nov/Dec.


Time-boxing is a reasonable exit-criterion.


> In this case, does trunk-incompat essentially become the new trunk? Or are
> we treating trunk-incompat as a feature branch, which periodically merges
> changes from trunk?


It’s the latter. Essentially:
 - trunk-incompat = trunk + only incompatible changes, periodically kept
up-to-date with trunk
 - trunk is always ready to ship
 - and no compatible code gets left behind

The reason for this proposal is to address the tension between “there is a
lot of compatible code in trunk that we are not shipping” and “don’t ship
trunk, it has incompatibilities”. With this, no compatible code will be left
unshipped to users.

Obviously, we can forget about my proposal entirely if everyone puts all
compatible code into branch-2 / branch-3 or whatever the main releasable
branch is. That hasn’t worked in practice; we saw it fail prominently during
0.21, and now again with 3.x.

There is another related issue: “my feature is nearly ready, so I’ll just
merge it into trunk since we don’t release that anyway, but not into the
current releasable branch; I’m too lazy to fix the last few stability-related
issues”. With this, we will (should) become more disciplined, take feature
stability on a branch seriously, and merge a feature branch only when it is
truly ready!

> For 3.x, my strawman was to release off trunk for the alphas, then branch a
> branch-3 for the beta and onwards.


Repeating the above, I’m proposing that we continue making GA 3.x releases
off of trunk too! This way, the only changes that don’t get shipped to users
are incompatible ones, by design! Eventually, trunk-incompat will be the
latest 3.x GA + enough incompatible code to warrant a 4.x, 5.x, etc.

+Vinod

Re: Looking to a Hadoop 3 release

2016-04-22 Thread Allen Wittenauer

> On Apr 22, 2016, at 6:10 PM, Vinod Kumar Vavilapalli  
> wrote:
> 
> Nope.
> 
> I’m proposing making a new 3.x release (as has been discussed in this thread) 
> off today’s trunk (instead of creating a fresh branch-3) and create a new 
> trunk-incompat where incompatible changes that we don’t want in 3.x go.
> 
> This is mainly to avoid repeating the “we are not releasing 3.x off trunk” 
> issue when we start thinking about 4.x or any such major release in the 
> future.

The only difference between “we aren’t releasing 4.x off of trunk” and 
“we aren’t releasing 4.x off of trunk-incompat” is 10 characters.

Re: Looking to a Hadoop 3 release

2016-04-22 Thread Andrew Wang
Great comments Vinod, thanks for replying.

Since trunk is a superset of branch-2.8, I think the two efforts are mostly
aligned. The 2.8 blockers are likely also 3.0 blockers. For example, the
create-release and L&N JIRAs I mentioned are in this camp. The difference
between the two is the expected level of quality. Once we get
create-release and L&N settled, I think it's ready for an alpha. Yes, this
means we ship with some known issues, but right now there's no 3.0 artifact
for downstreams to compile and test against. Considering that we're
shipping incompatible changes, I want to give downstreams as much
opportunity to give feedback as possible.

> While welcoming the push for alphas, I think we should set some exit
> criteria. Otherwise, I can imagine us doing 3/4/5 alpha releases, and then
> getting restless about calling it beta or GA or whatever. Essentially,
> instead of today’s questions as to "why we aren’t doing a 3.x release",
> we’d be fielding a "why is 3.x still considered alpha" question. This
> happened with 2.x alpha releases too and it wasn’t fun.

For exit criteria, how about we time box it? My plan was to do monthly
alphas through the summer, leading up to beta in late August / early Sep.
At that point we freeze and stabilize for GA in Nov/Dec.

I think we all have an interest in declaring beta/GA, no one wants eternal
alpha releases.

> On an unrelated note, offline I was pitching to a bunch of contributors
> another idea to deal with rotting trunk post 3.x: *Make 3.x releases off of
> trunk directly*.
>
> What this gains us is that
>  - Trunk is always nearly stable or nearly ready for releases
>  - We no longer have some code lying around in some branch (today’s trunk)
> that is not releasable because it gets mixed with other undesirable and
> incompatible changes.
>  - This needs to be coupled with more discipline on individual features -
> medium to large features are always worked upon in branches and get
> merged into trunk (and a nearing release!) when they are ready
>  - All incompatible changes go into some sort of a trunk-incompat branch
> and stay there till we accumulate enough of those to warrant another major
> release.
>

In this case, does trunk-incompat essentially become the new trunk? Or are
we treating trunk-incompat as a feature branch, which periodically merges
changes from trunk?

Linux has a "next" branch, separate from master, for integrating pending
feature branches. I think this is a good model, and it would be even better
if we published artifacts to assist with testing. However, that depends on
someone stepping up to maintain the integration branch.

I really like a more stringent policy around branch merges and new feature
development. That'd be great.

For 3.x, my strawman was to release off trunk for the alphas, then branch a
branch-3 for the beta and onwards.

Best,
Andrew


Re: Looking to a Hadoop 3 release

2016-04-22 Thread Vinod Kumar Vavilapalli
Nope.

I’m proposing making a new 3.x release (as has been discussed in this thread)
off today’s trunk (instead of creating a fresh branch-3) and creating a new
trunk-incompat branch where incompatible changes that we don’t want in 3.x go.

This is mainly to avoid repeating the “we are not releasing 3.x off trunk” 
issue when we start thinking about 4.x or any such major release in the future.

We’ll do 2.8.x independently and later figure out if 2.9 is needed or not.

+Vinod




Re: Looking to a Hadoop 3 release

2016-04-22 Thread Allen Wittenauer

> On Apr 22, 2016, at 5:38 PM, Vinod Kumar Vavilapalli  
> wrote:
> 
> On an unrelated note, offline I was pitching to a bunch of contributors 
> another idea to deal with rotting trunk post 3.x: *Make 3.x releases off of 
> trunk directly*.
> 
> What this gains us is that
> - Trunk is always nearly stable or nearly ready for releases
> - We no longer have some code lying around in some branch (today’s trunk) 
> that is not releasable because it gets mixed with other undesirable and 
> incompatible changes.
> - This needs to be coupled with more discipline on individual features - 
> medium to large features are always worked upon in branches and get merged
> into trunk (and a nearing release!) when they are ready
> - All incompatible changes go into some sort of a trunk-incompat branch and 
> stay there till we accumulate enough of those to warrant another major 
> release.
> 
> Thoughts?

Unless I’m missing something, all this proposal does is (using today’s
branch names) effectively rename trunk to trunk-incompat and branch-2 to trunk.
I’m unclear how moving “rotting trunk” to “rotting trunk-incompat” is really
progress.



Re: Looking to a Hadoop 3 release

2016-04-22 Thread Vinod Kumar Vavilapalli
.0)
>>>>>> and two stable releases (2.6.x and 2.7.x). It brings a lot of challenges in
>>>>>> issues tracking and patch committing, not even mention the tremendous
>>>>>> effort of release verification and voting.
>>>>>> I would like to propose to wait 2.8 release become stable (may be 2nd
>>>>>> release in 2.8 branch cause first release is alpha due to discussion in
>>>>>> another email thread), then we can move to 3.0 as the only alpha release.
>>>>>> In the meantime, we can bring more significant features (like ATS v2, etc.)
>>>>>> to trunk and consolidate stable releases in 2.6.x and 2.7.x. I believe that
>>>>>> make life easier. :)
>>>>>> Thoughts?
>>>>>>
>>>>>
>>>>> 2.8.0 is relatively close to shipping. I say relatively as I'm doing
>>>>> some work with ATS 1.5 downstream and I'd like to make sure all that works.
>>>>> There's also a large collection of S3 and swift patches needing attention
>>>>> from any reviewers with time and credentials.
>>>>>
>>>>> 3.x is going to take multiple iterations to stabilise, and with more
>>>>> changes, more significant a rollout. I'd also like to do a complete update
>>>>> of all the dependencies before a final release, so we can have less
>>>>> pressure to upgrade for a while, and get Sean's classloader patch in so
>>>>> it's slightly less visible.
>>>>>
>>>>> That means 3.0 is going to be an alpha release, not final.
>>>>>
>>>>> one thing that could be shared is any build.xml automation of the
>>>>> release process, to at least take away most of the manual steps in the
>>>>> process, to have something more repeatable.
>>>>>
>>>>> -steve
>>>>>
>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> Junping
>>>>>>
>>>>>> From: Yongjun Zhang <yzh...@cloudera.com>
>>>>>> Sent: Friday, February 19, 2016 8:05 PM
>>>>>> To: hdfs-...@hadoop.apache.org
>>>>>> Cc: common-dev@hadoop.apache.org; mapreduce-...@hadoop.apache.org;
>>>>>> yarn-...@hadoop.apache.org
>>>>>> Subject: Re: Looking to a Hadoop 3 release
>>>>>>
>>>>>> Thanks Andrew for initiating the effort!
>>>>>>
>>>>>> +1 on pushing 3.x with extended alpha cycle, and continuing the more
>>>>>> stable 2.x releases.
>>>>>>
>>>>>> --Yongjun
>>>>>>
>>>>>> On Thu, Feb 18, 2016 at 5:58 PM, Andrew Wang <andrew.w...@cloudera.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi Kai,
>>>>>>>
>>>>>>> Sure, I'm open to it. It's a new major release, so we're allowed to
>>>>>>> make these kinds of big changes. The idea behind the extended alpha
>>>>>>> cycle is that downstreams can give us feedback. This way if we do
>>>>>>> anything too radical, we can address it in the next alpha and have
>>>>>>> downstreams re-test.
>>>>>>>
>>>>>>> Best,
>>>>>>> Andrew
>>>>>>>
>>>>>>> On Thu, Feb 18, 2016 at 5:23 PM, Zheng, Kai <kai.zh...@intel.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Thanks Andrew for driving this. Wonder if it's a good chance for
>>>>>>>> HADOOP-12579 (Deprecate and remove WriteableRPCEngine) to be in.
>>>>>>>> Note it's not an incompatible change, but feel better to be done in
>>>>>>>> the major release.
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>> Kai
>>>>>>>>
>>>>>>>> -Original Message-
>>>>>>>> From: Andrew Wang [mailto:andrew.w...@cloudera.com]
>>>>>>>> Sent: Friday, February 19, 2016 7:04 AM
>>>>>>>> To: hdfs-...@hadoop.apache.org; Kihwal Lee <kih...@yahoo-inc.com>
>>>>>>>> Cc: mapreduce-...@hadoop.apache.org; common-dev@hadoop.apache.org;
>>>>>>>> yarn-...@hadoop.a

Re: Looking to a Hadoop 3 release

2016-04-22 Thread Vinod Kumar Vavilapalli
I kind of echo Junping’s comment too.

While 2.8 and 3.0 don’t need to be serialized in theory, in practice I’m 
desperately looking for help on 2.8.0. We haven’t been converging on 2.8.0 what 
with 50+ blocker / critical patches still unfinished. If postponing 3.x alpha 
to after a 2.8.0 alpha means undivided attention from the community, I’d 
strongly root for such a proposal.

Thanks
+Vinod

> On Feb 20, 2016, at 9:07 PM, Andrew Wang  wrote:
> 
> Hi Junping, thanks for the mail, inline:
> 
> On Sat, Feb 20, 2016 at 7:34 AM, Junping Du  wrote:
> 
>> Shall we consolidate effort for 2.8.0 and 3.0.0? It doesn't sounds
>> reasonable to have two alpha releases to go in parallel. Is EC feature the
>> main motivation of releasing hadoop 3 here? If so, I don't understand why
>> this feature cannot land on 2.8.x or 2.9.x as an alpha feature.
>> 
> 
> EC is one motivation, there are others too (JDK8, shell scripts, jar
> bumps). I'm open to EC going into branch-2, but I haven't seen any
> backporting yet and it's a lot of code.
> 
> 
>> If we release 3.0 in a month like plan proposed below, it means we will
>> have 4 active releases going in parallel - two alpha releases (2.8 and 3.0)
>> and two stable releases (2.6.x and 2.7.x). It brings a lot of challenges in
>> issues tracking and patch committing, not even mention the tremendous
>> effort of release verification and voting.
>> I would like to propose to wait 2.8 release become stable (may be 2nd
>> release in 2.8 branch cause first release is alpha due to discussion in
>> another email thread), then we can move to 3.0 as the only alpha release.
>> In the meantime, we can bring more significant features (like ATS v2, etc.)
>> to trunk and consolidate stable releases in 2.6.x and 2.7.x. I believe that
>> make life easier. :)
>> Thoughts?
>> 
> Based on some earlier mails in this chain, I was planning to release off
> trunk. This way we avoid having to commit to yet another branch, and it makes
> tracking easier since trunk will always be a superset of the branch-2s.
> This does mean, though, that trunk needs to be stable, and we need to be more
> judicious with branch merges and quickly revert broken code.
> 
> Regarding RM/voting/validation efforts, Steve mentioned some scripts that
> he uses to automate Slider releases. This is something I'd like to bring
> over to Hadoop. Ideally, publishing an RC is push-button, and it comes with
> automated validation. I think this will help with the overhead. Also, since
> these will be early alphas, and there will be a lot of them, I'm not
> expecting anyone to do endurance runs on a large cluster before casting a
> +1.
> 
> Best,
> Andrew



Re: Looking to a Hadoop 3 release

2016-04-21 Thread Andrew Wang
Hi folks,

Very optimistically, we're still on track for a 3.0 alpha this month.
Here's a JIRA query for 3.0 and 2.8:

https://issues.apache.org/jira/issues/?jql=project%20in%20(HADOOP%2C%20HDFS%2C%20MAPREDUCE%2C%20YARN)%20AND%20%22Target%20Version%2Fs%22%20in%20(3.0.0%2C%202.8.0)%20AND%20statusCategory%20not%20in%20(Complete)%20ORDER%20BY%20priority

I think two of these are true alpha blockers: HADOOP-12892 and
HADOOP-12893. I'm trying to help push both of those forward.

For the rest, I think it's probably okay to delay until the next alpha,
since we're planning a few alphas leading up to beta. That said, if you are
the owner of a Blocker targeted at 3.0.0, I'd encourage reviving those
patches. The earlier the better for incompatible changes.

In all likelihood, this first release will slip into early May, but I'll be
disappointed if we don't have an RC out before ApacheCon.

Best,
Andrew


Re: Looking to a Hadoop 3 release

2016-02-22 Thread Colin P. McCabe
I think starting a 3.0 alpha soon would be a great idea.  As some
other people commented, this would come with no compatibility
guarantees, so that we can iron out any issues.

Colin

On Mon, Feb 22, 2016 at 1:26 PM, Zhe Zhang <zhezh...@cloudera.com> wrote:
> Thanks Andrew for driving the effort!
>
> +1 (non-binding) on starting the 3.0 release process now with 3.0 as an
> alpha.
>
> I wanted to echo Andrew's point that backporting EC to branch-2 is a lot of
> work. Considering that no concrete backporting plan has been proposed, it
> seems quite uncertain whether / when it can be released in 2.9. I think we
> should rather concentrate our EC dev efforts to harden key features under
> the follow-on umbrella HDFS-8031 and make it solid for a 3.0 release.
>
> Sincerely,
> Zhe
>
> On Mon, Feb 22, 2016 at 9:25 AM Colin P. McCabe <cmcc...@apache.org> wrote:
>
>> +1 for a release of 3.0.  There are a lot of significant,
>> compatibility-breaking, but necessary changes in this release... we've
>> touched on some of them in this thread.
>>
>> +1 for a parallel release of 2.8 as well.  I think we are pretty close
>> to this, barring a dozen or so blockers.
>>
>> best,
>> Colin
>>
>> On Mon, Feb 22, 2016 at 2:56 AM, Steve Loughran <ste...@hortonworks.com>
>> wrote:
>> >
>> >> On 20 Feb 2016, at 15:34, Junping Du <j...@hortonworks.com> wrote:
>> >>
>> >> Shall we consolidate effort for 2.8.0 and 3.0.0? It doesn't sound
>> >> reasonable to have two alpha releases going in parallel. Is the EC feature
>> >> the main motivation for releasing Hadoop 3 here? If so, I don't understand
>> >> why this feature cannot land on 2.8.x or 2.9.x as an alpha feature.
>> >
>> >
>> >
>> >> If we release 3.0 in a month like the plan proposed below, it means we
>> >> will have 4 active releases going in parallel - two alpha releases (2.8
>> >> and 3.0) and two stable releases (2.6.x and 2.7.x). It brings a lot of
>> >> challenges in issue tracking and patch committing, not to mention the
>> >> tremendous effort of release verification and voting.
>> >> I would like to propose that we wait for the 2.8 release to become stable
>> >> (maybe the 2nd release in the 2.8 branch, because the first release is
>> >> alpha due to the discussion in another email thread), then we can move to
>> >> 3.0 as the only alpha release. In the meantime, we can bring more
>> >> significant features (like ATS v2, etc.) to trunk and consolidate stable
>> >> releases in 2.6.x and 2.7.x. I believe that makes life easier. :)
>> >> Thoughts?
>> >>
>> >
>> > 2.8.0 is relatively close to shipping. I say relatively as I'm doing
>> > some work with ATS 1.5 downstream and I'd like to make sure all that works.
>> > There's also a large collection of S3 and swift patches needing attention
>> > from any reviewers with time and credentials.
>> >
>> > 3.x is going to take multiple iterations to stabilise, and with more
>> > changes, more significant a rollout. I'd also like to do a complete update
>> > of all the dependencies before a final release, so we can have less
>> > pressure to upgrade for a while, and get Sean's classloader patch in so
>> > it's slightly less visible.
>> >
>> > That means 3.0 is going to be an alpha release, not final.
>> >
>> > one thing that could be shared is any build.xml automation of the
>> > release process, to at least take away most of the manual steps in the
>> > process, to have something more repeatable.
>> >
>> > -steve
>> >
>> >
>> >> Thanks,
>> >>
>> >> Junping
>> >> 
>> >> From: Yongjun Zhang <yzh...@cloudera.com>
>> >> Sent: Friday, February 19, 2016 8:05 PM
>> >> To: hdfs-...@hadoop.apache.org
>> >> Cc: common-dev@hadoop.apache.org; mapreduce-...@hadoop.apache.org;
>> >> yarn-...@hadoop.apache.org
>> >> Subject: Re: Looking to a Hadoop 3 release
>> >>
>> >> Thanks Andrew for initiating the effort!
>> >>
>> >> +1 on pushing 3.x with extended alpha cycle, and continuing the more
>> >> stable 2.x releases.
>> >>
>> >> --Yongjun
>> >>
>> >> On Thu, Feb 18, 2016 at 5:58 PM, Andrew Wang <andrew.w...@cloudera.com>
>> >> wrote:
>> >>
>> >>> Hi Kai,
>> >>>
>> >>> Sure, I'm open to it. It's a new major release, so we're allowed to
>> >>> make these kinds of big changes. The ide

Re: Looking to a Hadoop 3 release

2016-02-22 Thread Zhe Zhang
Thanks Andrew for driving the effort!

+1 (non-binding) on starting the 3.0 release process now with 3.0 as an
alpha.

I wanted to echo Andrew's point that backporting EC to branch-2 is a lot of
work. Considering that no concrete backporting plan has been proposed, it
seems quite uncertain whether / when it can be released in 2.9. I think we
should rather concentrate our EC dev efforts to harden key features under
the follow-on umbrella HDFS-8031 and make it solid for a 3.0 release.

Sincerely,
Zhe

On Mon, Feb 22, 2016 at 9:25 AM Colin P. McCabe <cmcc...@apache.org> wrote:

> +1 for a release of 3.0.  There are a lot of significant,
> compatibility-breaking, but necessary changes in this release... we've
> touched on some of them in this thread.
>
> +1 for a parallel release of 2.8 as well.  I think we are pretty close
> to this, barring a dozen or so blockers.
>
> best,
> Colin
>
> On Mon, Feb 22, 2016 at 2:56 AM, Steve Loughran <ste...@hortonworks.com>
> wrote:
> >
> >> On 20 Feb 2016, at 15:34, Junping Du <j...@hortonworks.com> wrote:
> >>
> >> Shall we consolidate effort for 2.8.0 and 3.0.0? It doesn't sound
> >> reasonable to have two alpha releases going in parallel. Is the EC feature
> >> the main motivation for releasing Hadoop 3 here? If so, I don't understand
> >> why this feature cannot land on 2.8.x or 2.9.x as an alpha feature.
> >
> >
> >
> >> If we release 3.0 in a month like the plan proposed below, it means we
> >> will have 4 active releases going in parallel - two alpha releases (2.8
> >> and 3.0) and two stable releases (2.6.x and 2.7.x). It brings a lot of
> >> challenges in issue tracking and patch committing, not to mention the
> >> tremendous effort of release verification and voting.
> >> I would like to propose that we wait for the 2.8 release to become stable
> >> (maybe the 2nd release in the 2.8 branch, because the first release is
> >> alpha due to the discussion in another email thread), then we can move to
> >> 3.0 as the only alpha release. In the meantime, we can bring more
> >> significant features (like ATS v2, etc.) to trunk and consolidate stable
> >> releases in 2.6.x and 2.7.x. I believe that makes life easier. :)
> >> Thoughts?
> >>
> >
> > 2.8.0 is relatively close to shipping. I say relatively as I'm doing
> > some work with ATS 1.5 downstream and I'd like to make sure all that works.
> > There's also a large collection of S3 and swift patches needing attention
> > from any reviewers with time and credentials.
> >
> > 3.x is going to take multiple iterations to stabilise, and with more
> > changes, more significant a rollout. I'd also like to do a complete update
> > of all the dependencies before a final release, so we can have less
> > pressure to upgrade for a while, and get Sean's classloader patch in so
> > it's slightly less visible.
> >
> > That means 3.0 is going to be an alpha release, not final.
> >
> > one thing that could be shared is any build.xml automation of the
> > release process, to at least take away most of the manual steps in the
> > process, to have something more repeatable.
> >
> > -steve
> >
> >
> >> Thanks,
> >>
> >> Junping
> >> ____
> >> From: Yongjun Zhang <yzh...@cloudera.com>
> >> Sent: Friday, February 19, 2016 8:05 PM
> >> To: hdfs-...@hadoop.apache.org
> >> Cc: common-dev@hadoop.apache.org; mapreduce-...@hadoop.apache.org;
> >> yarn-...@hadoop.apache.org
> >> Subject: Re: Looking to a Hadoop 3 release
> >>
> >> Thanks Andrew for initiating the effort!
> >>
> >> +1 on pushing 3.x with extended alpha cycle, and continuing the more
> >> stable 2.x releases.
> >>
> >> --Yongjun
> >>
> >> On Thu, Feb 18, 2016 at 5:58 PM, Andrew Wang <andrew.w...@cloudera.com>
> >> wrote:
> >>
> >>> Hi Kai,
> >>>
> >>> Sure, I'm open to it. It's a new major release, so we're allowed to
> >>> make these kinds of big changes. The idea behind the extended alpha
> >>> cycle is that downstreams can give us feedback. This way if we do
> >>> anything too radical, we can address it in the next alpha and have
> >>> downstreams re-test.
> >>>
> >>> Best,
> >>> Andrew
> >>>
> >>> On Thu, Feb 18, 2016 at 5:23 PM, Zheng, Kai <kai.zh...@intel.com>
> >>> wrote:
> >>>
> >>>> Thanks Andrew for driving this. I wonder if it's a good chance for
> >>>> HADOOP-12579 (Deprecate and remove WriteableRPCEngine) to be in. Note
> >>>> it's not an 

Re: Looking to a Hadoop 3 release

2016-02-22 Thread Colin P. McCabe
+1 for a release of 3.0.  There are a lot of significant,
compatibility-breaking, but necessary changes in this release... we've
touched on some of them in this thread.

+1 for a parallel release of 2.8 as well.  I think we are pretty close
to this, barring a dozen or so blockers.

best,
Colin

On Mon, Feb 22, 2016 at 2:56 AM, Steve Loughran <ste...@hortonworks.com> wrote:
>
>> On 20 Feb 2016, at 15:34, Junping Du <j...@hortonworks.com> wrote:
>>
>> Shall we consolidate effort for 2.8.0 and 3.0.0? It doesn't sound
>> reasonable to have two alpha releases going in parallel. Is the EC feature
>> the main motivation for releasing Hadoop 3 here? If so, I don't understand
>> why this feature cannot land on 2.8.x or 2.9.x as an alpha feature.
>
>
>
>> If we release 3.0 in a month like the plan proposed below, it means we will
>> have 4 active releases going in parallel - two alpha releases (2.8 and 3.0)
>> and two stable releases (2.6.x and 2.7.x). It brings a lot of challenges in
>> issue tracking and patch committing, not to mention the tremendous effort
>> of release verification and voting.
>> I would like to propose that we wait for the 2.8 release to become stable
>> (maybe the 2nd release in the 2.8 branch, because the first release is
>> alpha due to the discussion in another email thread), then we can move to
>> 3.0 as the only alpha release. In the meantime, we can bring more
>> significant features (like ATS v2, etc.) to trunk and consolidate stable
>> releases in 2.6.x and 2.7.x. I believe that makes life easier. :)
>> Thoughts?
>>
>
> 2.8.0 is relatively close to shipping. I say relatively as I'm doing some 
> work with ATS 1.5 downstream and I'd like to make sure all that works. 
> There's also a large collection of S3 and swift patches needing attention 
> from any reviewers with time and credentials.
>
> 3.x is going to take multiple iterations to stabilise, and with more changes, 
> more significant a rollout. I'd also like to do a complete update of all the 
> dependencies before a final release, so we can have less pressure to upgrade 
> for a while, and get Sean's classloader patch in so it's slightly less 
> visible.
>
> That means 3.0 is going to be an alpha release, not final.
>
> one thing that could be shared is any build.xml automation of the release 
> process, to at least take away most of the manual steps in the process, to 
> have something more repeatable.
>
> -steve
>
>
>> Thanks,
>>
>> Junping
>> 
>> From: Yongjun Zhang <yzh...@cloudera.com>
>> Sent: Friday, February 19, 2016 8:05 PM
>> To: hdfs-...@hadoop.apache.org
>> Cc: common-dev@hadoop.apache.org; mapreduce-...@hadoop.apache.org; 
>> yarn-...@hadoop.apache.org
>> Subject: Re: Looking to a Hadoop 3 release
>>
>> Thanks Andrew for initiating the effort!
>>
>> +1 on pushing 3.x with extended alpha cycle, and continuing the more stable
>> 2.x releases.
>>
>> --Yongjun
>>
>> On Thu, Feb 18, 2016 at 5:58 PM, Andrew Wang <andrew.w...@cloudera.com>
>> wrote:
>>
>>> Hi Kai,
>>>
>>> Sure, I'm open to it. It's a new major release, so we're allowed to make
>>> these kinds of big changes. The idea behind the extended alpha cycle is
>>> that downstreams can give us feedback. This way if we do anything too
>>> radical, we can address it in the next alpha and have downstreams re-test.
>>>
>>> Best,
>>> Andrew
>>>
>>> On Thu, Feb 18, 2016 at 5:23 PM, Zheng, Kai <kai.zh...@intel.com> wrote:
>>>
>>>> Thanks Andrew for driving this. I wonder if it's a good chance for
>>>> HADOOP-12579 (Deprecate and remove WriteableRPCEngine) to be in. Note
>>>> it's not an incompatible change, but it feels better to do it in the
>>>> major release.
>>>>
>>>> Regards,
>>>> Kai
>>>>
>>>> -Original Message-
>>>> From: Andrew Wang [mailto:andrew.w...@cloudera.com]
>>>> Sent: Friday, February 19, 2016 7:04 AM
>>>> To: hdfs-...@hadoop.apache.org; Kihwal Lee <kih...@yahoo-inc.com>
>>>> Cc: mapreduce-...@hadoop.apache.org; common-dev@hadoop.apache.org;
>>>> yarn-...@hadoop.apache.org
>>>> Subject: Re: Looking to a Hadoop 3 release
>>>>
>>>> Hi Kihwal,
>>>>
>>>> I think there's still value in continuing the 2.x releases. 3.x comes
>>>> with the incompatible bump to a JDK8 runtime, and also the fact that 3.x
>>>> won't be be

Re: Looking to a Hadoop 3 release

2016-02-22 Thread Steve Loughran

> On 20 Feb 2016, at 15:34, Junping Du <j...@hortonworks.com> wrote:
> 
> Shall we consolidate effort for 2.8.0 and 3.0.0? It doesn't sound reasonable
> to have two alpha releases going in parallel. Is the EC feature the main
> motivation for releasing Hadoop 3 here? If so, I don't understand why this
> feature cannot land on 2.8.x or 2.9.x as an alpha feature.



> If we release 3.0 in a month like the plan proposed below, it means we will
> have 4 active releases going in parallel - two alpha releases (2.8 and 3.0)
> and two stable releases (2.6.x and 2.7.x). It brings a lot of challenges in
> issue tracking and patch committing, not to mention the tremendous effort
> of release verification and voting.
> I would like to propose that we wait for the 2.8 release to become stable
> (maybe the 2nd release in the 2.8 branch, because the first release is alpha
> due to the discussion in another email thread), then we can move to 3.0 as
> the only alpha release. In the meantime, we can bring more significant
> features (like ATS v2, etc.) to trunk and consolidate stable releases in
> 2.6.x and 2.7.x. I believe that makes life easier. :)
> Thoughts?
> 

2.8.0 is relatively close to shipping. I say relatively as I'm doing some work 
with ATS 1.5 downstream and I'd like to make sure all that works. There's also 
a large collection of S3 and swift patches needing attention from any reviewers 
with time and credentials. 

3.x is going to take multiple iterations to stabilise, and with more changes, 
more significant a rollout. I'd also like to do a complete update of all the 
dependencies before a final release, so we can have less pressure to upgrade 
for a while, and get Sean's classloader patch in so it's slightly less visible.

That means 3.0 is going to be an alpha release, not final. 

one thing that could be shared is any build.xml automation of the release 
process, to at least take away most of the manual steps in the process, to have 
something more repeatable.

-steve


> Thanks,
> 
> Junping 
> 
> From: Yongjun Zhang <yzh...@cloudera.com>
> Sent: Friday, February 19, 2016 8:05 PM
> To: hdfs-...@hadoop.apache.org
> Cc: common-dev@hadoop.apache.org; mapreduce-...@hadoop.apache.org; 
> yarn-...@hadoop.apache.org
> Subject: Re: Looking to a Hadoop 3 release
> 
> Thanks Andrew for initiating the effort!
> 
> +1 on pushing 3.x with extended alpha cycle, and continuing the more stable
> 2.x releases.
> 
> --Yongjun
> 
> On Thu, Feb 18, 2016 at 5:58 PM, Andrew Wang <andrew.w...@cloudera.com>
> wrote:
> 
>> Hi Kai,
>> 
>> Sure, I'm open to it. It's a new major release, so we're allowed to make
>> these kinds of big changes. The idea behind the extended alpha cycle is
>> that downstreams can give us feedback. This way if we do anything too
>> radical, we can address it in the next alpha and have downstreams re-test.
>> 
>> Best,
>> Andrew
>> 
>> On Thu, Feb 18, 2016 at 5:23 PM, Zheng, Kai <kai.zh...@intel.com> wrote:
>> 
>>> Thanks Andrew for driving this. I wonder if it's a good chance for
>>> HADOOP-12579 (Deprecate and remove WriteableRPCEngine) to be in. Note
>>> it's not an incompatible change, but it feels better to do it in the
>>> major release.
>>> 
>>> Regards,
>>> Kai
>>> 
>>> -Original Message-----
>>> From: Andrew Wang [mailto:andrew.w...@cloudera.com]
>>> Sent: Friday, February 19, 2016 7:04 AM
>>> To: hdfs-...@hadoop.apache.org; Kihwal Lee <kih...@yahoo-inc.com>
>>> Cc: mapreduce-...@hadoop.apache.org; common-dev@hadoop.apache.org;
>>> yarn-...@hadoop.apache.org
>>> Subject: Re: Looking to a Hadoop 3 release
>>> 
>>> Hi Kihwal,
>>> 
>>> I think there's still value in continuing the 2.x releases. 3.x comes
>>> with the incompatible bump to a JDK8 runtime, and also the fact that 3.x
>>> won't be beta or GA for some number of months. In the meanwhile, it'd be
>>> good to keep putting out regular, stable 2.x releases.
>>> 
>>> Best,
>>> Andrew
>>> 
>>> 
>>> On Thu, Feb 18, 2016 at 2:50 PM, Kihwal Lee <kih...@yahoo-inc.com.invalid>
>>> wrote:
>>> 
>>>> Moving Hadoop 3 forward sounds fine. If EC is one of the main
>>>> motivations, are we getting rid of branch-2.8?
>>>> 
>>>> Kihwal
>>>> 
>>>>  From: Andrew Wang <andrew.w...@cloudera.com>
>>>> To: "common-dev@hadoop.apache.org" <common-dev@hadoop.apache.org>
>>>> Cc: "yarn-...@hadoop.apache

Re: Looking to a Hadoop 3 release

2016-02-20 Thread Andrew Wang
Hi Junping, thanks for the mail, inline:

On Sat, Feb 20, 2016 at 7:34 AM, Junping Du  wrote:

> Shall we consolidate effort for 2.8.0 and 3.0.0? It doesn't sound
> reasonable to have two alpha releases going in parallel. Is the EC feature
> the main motivation for releasing Hadoop 3 here? If so, I don't understand
> why this feature cannot land on 2.8.x or 2.9.x as an alpha feature.
>

EC is one motivation, there are others too (JDK8, shell scripts, jar
bumps). I'm open to EC going into branch-2, but I haven't seen any
backporting yet and it's a lot of code.


> If we release 3.0 in a month like the plan proposed below, it means we
> will have 4 active releases going in parallel - two alpha releases (2.8
> and 3.0) and two stable releases (2.6.x and 2.7.x). It brings a lot of
> challenges in issue tracking and patch committing, not to mention the
> tremendous effort of release verification and voting.
> I would like to propose that we wait for the 2.8 release to become stable
> (maybe the 2nd release in the 2.8 branch, because the first release is
> alpha due to the discussion in another email thread), then we can move to
> 3.0 as the only alpha release. In the meantime, we can bring more
> significant features (like ATS v2, etc.) to trunk and consolidate stable
> releases in 2.6.x and 2.7.x. I believe that makes life easier. :)
> Thoughts?
>

Based on some earlier mails in this chain, I was planning to release off
trunk. This way we avoid having to commit to yet another branch, and it makes
tracking easier since trunk will always be a superset of the branch-2 branches.
This does mean, though, that trunk needs to be stable, and we need to be more
judicious with branch merges and quickly revert broken code.

Regarding RM/voting/validation efforts, Steve mentioned some scripts that
he uses to automate Slider releases. This is something I'd like to bring
over to Hadoop. Ideally, publishing an RC is push-button, and it comes with
automated validation. I think this will help with the overhead. Also, since
these will be early alphas, and there will be a lot of them, I'm not
expecting anyone to do endurance runs on a large cluster before casting a
+1.

Best,
Andrew

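Andrew's hope above that an RC "comes with automated validation" can start with something as small as checking artifacts against published checksums. The following is a hypothetical sketch, not Hadoop's or Slider's actual release tooling: it assumes each artifact ships with a `sha512sum`-style `<artifact>.sha512` file (the thread does not say which digest or file layout the release uses), and the artifact name is made up.

```python
import hashlib
import tempfile
from pathlib import Path


def sha512_hex(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream the file through SHA-512 so large tarballs are not read into memory at once."""
    digest = hashlib.sha512()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_checksum(artifact: Path) -> Path:
    """Write '<artifact>.sha512' in the '<hex digest>  <file name>' layout used by sha512sum."""
    checksum_file = artifact.with_name(artifact.name + ".sha512")
    checksum_file.write_text(f"{sha512_hex(artifact)}  {artifact.name}\n")
    return checksum_file


def verify_checksum(artifact: Path, checksum_file: Path) -> bool:
    """Return True if the artifact's SHA-512 matches the first token of the checksum file."""
    expected = checksum_file.read_text().split()[0].lower()
    return sha512_hex(artifact) == expected


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        rc = Path(tmp) / "hadoop-example-rc0.tar.gz"  # hypothetical artifact name
        rc.write_bytes(b"pretend release tarball")
        sums = write_checksum(rc)
        print(verify_checksum(rc, sums))  # True
        rc.write_bytes(b"tampered")
        print(verify_checksum(rc, sums))  # False
```

A real pipeline would also verify GPG signatures and run builds or tests on the source tarball; the checksum step is only the cheapest, fully mechanical part.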

Re: Looking to a Hadoop 3 release

2016-02-20 Thread Junping Du
Shall we consolidate effort for 2.8.0 and 3.0.0? It doesn't sound reasonable
to have two alpha releases going in parallel. Is the EC feature the main
motivation for releasing Hadoop 3 here? If so, I don't understand why this
feature cannot land on 2.8.x or 2.9.x as an alpha feature.
If we release 3.0 in a month like the plan proposed below, it means we will
have 4 active releases going in parallel - two alpha releases (2.8 and 3.0)
and two stable releases (2.6.x and 2.7.x). It brings a lot of challenges in
issue tracking and patch committing, not to mention the tremendous effort of
release verification and voting.
I would like to propose that we wait for the 2.8 release to become stable
(maybe the 2nd release in the 2.8 branch, because the first release is alpha
due to the discussion in another email thread), then we can move to 3.0 as
the only alpha release. In the meantime, we can bring more significant
features (like ATS v2, etc.) to trunk and consolidate stable releases in
2.6.x and 2.7.x. I believe that makes life easier. :)
Thoughts?

Thanks,

Junping 

From: Yongjun Zhang <yzh...@cloudera.com>
Sent: Friday, February 19, 2016 8:05 PM
To: hdfs-...@hadoop.apache.org
Cc: common-dev@hadoop.apache.org; mapreduce-...@hadoop.apache.org; 
yarn-...@hadoop.apache.org
Subject: Re: Looking to a Hadoop 3 release

Thanks Andrew for initiating the effort!

+1 on pushing 3.x with extended alpha cycle, and continuing the more stable
2.x releases.

--Yongjun

On Thu, Feb 18, 2016 at 5:58 PM, Andrew Wang <andrew.w...@cloudera.com>
wrote:

> Hi Kai,
>
> Sure, I'm open to it. It's a new major release, so we're allowed to make
> these kinds of big changes. The idea behind the extended alpha cycle is
> that downstreams can give us feedback. This way if we do anything too
> radical, we can address it in the next alpha and have downstreams re-test.
>
> Best,
> Andrew
>
> On Thu, Feb 18, 2016 at 5:23 PM, Zheng, Kai <kai.zh...@intel.com> wrote:
>
> > Thanks Andrew for driving this. I wonder if it's a good chance for
> > HADOOP-12579 (Deprecate and remove WriteableRPCEngine) to be in. Note
> > it's not an incompatible change, but it feels better to do it in the
> > major release.
> >
> > Regards,
> > Kai
> >
> > -Original Message-
> > From: Andrew Wang [mailto:andrew.w...@cloudera.com]
> > Sent: Friday, February 19, 2016 7:04 AM
> > To: hdfs-...@hadoop.apache.org; Kihwal Lee <kih...@yahoo-inc.com>
> > Cc: mapreduce-...@hadoop.apache.org; common-dev@hadoop.apache.org;
> > yarn-...@hadoop.apache.org
> > Subject: Re: Looking to a Hadoop 3 release
> >
> > Hi Kihwal,
> >
> > I think there's still value in continuing the 2.x releases. 3.x comes
> > with the incompatible bump to a JDK8 runtime, and also the fact that 3.x
> > won't be beta or GA for some number of months. In the meanwhile, it'd be
> > good to keep putting out regular, stable 2.x releases.
> >
> > Best,
> > Andrew
> >
> >
> > On Thu, Feb 18, 2016 at 2:50 PM, Kihwal Lee <kih...@yahoo-inc.com.invalid>
> > wrote:
> >
> > > Moving Hadoop 3 forward sounds fine. If EC is one of the main
> > > motivations, are we getting rid of branch-2.8?
> > >
> > > Kihwal
> > >
> > >   From: Andrew Wang <andrew.w...@cloudera.com>
> > >  To: "common-dev@hadoop.apache.org" <common-dev@hadoop.apache.org>
> > > Cc: "yarn-...@hadoop.apache.org" <yarn-...@hadoop.apache.org>; "
> > > mapreduce-...@hadoop.apache.org" <mapreduce-...@hadoop.apache.org>;
> > > hdfs-dev <hdfs-...@hadoop.apache.org>
> > >  Sent: Thursday, February 18, 2016 4:35 PM
> > >  Subject: Re: Looking to a Hadoop 3 release
> > >
> > > Hi all,
> > >
> > > Reviving this thread. I've seen renewed interest in a trunk release
> > > since HDFS erasure coding has not yet made it to branch-2. Along with
> > > JDK8, the shell script rewrite, and many other improvements, I think
> > > it's time to revisit Hadoop 3.0 release plans.
> > >
> > > My overall plan is still the same as in my original email: a series of
> > > regular alpha releases leading up to beta and GA. Alpha releases make
> > > it easier for downstreams to integrate with our code, and making them
> > > regular means features can be included when they are ready.
> > >
> > > I know there are some incompatible changes waiting in the wings (i.e.
> > > HDFS-6984 making FileStatus a PB rather than Writable, some of
> > > HADOOP-9991 bumping dependency versions) that would be good to get 

Re: Looking to a Hadoop 3 release

2016-02-19 Thread Yongjun Zhang
Thanks Andrew for initiating the effort!

+1 on pushing 3.x with extended alpha cycle, and continuing the more stable
2.x releases.

--Yongjun

On Thu, Feb 18, 2016 at 5:58 PM, Andrew Wang <andrew.w...@cloudera.com>
wrote:

> Hi Kai,
>
> Sure, I'm open to it. It's a new major release, so we're allowed to make
> these kinds of big changes. The idea behind the extended alpha cycle is
> that downstreams can give us feedback. This way if we do anything too
> radical, we can address it in the next alpha and have downstreams re-test.
>
> Best,
> Andrew
>
> On Thu, Feb 18, 2016 at 5:23 PM, Zheng, Kai <kai.zh...@intel.com> wrote:
>
> > Thanks Andrew for driving this. I wonder if it's a good chance for
> > HADOOP-12579 (Deprecate and remove WriteableRPCEngine) to be in. Note
> > it's not an incompatible change, but it feels better to do it in the
> > major release.
> >
> > Regards,
> > Kai
> >
> > -Original Message-
> > From: Andrew Wang [mailto:andrew.w...@cloudera.com]
> > Sent: Friday, February 19, 2016 7:04 AM
> > To: hdfs-...@hadoop.apache.org; Kihwal Lee <kih...@yahoo-inc.com>
> > Cc: mapreduce-...@hadoop.apache.org; common-dev@hadoop.apache.org;
> > yarn-...@hadoop.apache.org
> > Subject: Re: Looking to a Hadoop 3 release
> >
> > Hi Kihwal,
> >
> > I think there's still value in continuing the 2.x releases. 3.x comes
> > with the incompatible bump to a JDK8 runtime, and also the fact that 3.x
> > won't be beta or GA for some number of months. In the meanwhile, it'd be
> > good to keep putting out regular, stable 2.x releases.
> >
> > Best,
> > Andrew
> >
> >
> > On Thu, Feb 18, 2016 at 2:50 PM, Kihwal Lee <kih...@yahoo-inc.com.invalid>
> > wrote:
> >
> > > Moving Hadoop 3 forward sounds fine. If EC is one of the main
> > > motivations, are we getting rid of branch-2.8?
> > >
> > > Kihwal
> > >
> > >   From: Andrew Wang <andrew.w...@cloudera.com>
> > >  To: "common-dev@hadoop.apache.org" <common-dev@hadoop.apache.org>
> > > Cc: "yarn-...@hadoop.apache.org" <yarn-...@hadoop.apache.org>; "
> > > mapreduce-...@hadoop.apache.org" <mapreduce-...@hadoop.apache.org>;
> > > hdfs-dev <hdfs-...@hadoop.apache.org>
> > >  Sent: Thursday, February 18, 2016 4:35 PM
> > >  Subject: Re: Looking to a Hadoop 3 release
> > >
> > > Hi all,
> > >
> > > Reviving this thread. I've seen renewed interest in a trunk release
> > > since HDFS erasure coding has not yet made it to branch-2. Along with
> > > JDK8, the shell script rewrite, and many other improvements, I think
> > > it's time to revisit Hadoop 3.0 release plans.
> > >
> > > My overall plan is still the same as in my original email: a series of
> > > regular alpha releases leading up to beta and GA. Alpha releases make
> > > it easier for downstreams to integrate with our code, and making them
> > > regular means features can be included when they are ready.
> > >
> > > I know there are some incompatible changes waiting in the wings (i.e.
> > > HDFS-6984 making FileStatus a PB rather than Writable, some of
> > > HADOOP-9991 bumping dependency versions) that would be good to get in.
> > > If you have changes like this, please set the target version to 3.0.0
> > > and mark them "Incompatible". We can use this JIRA query to track:
> > >
> > >
> > > https://issues.apache.org/jira/issues/?jql=project%20in%20(HADOOP%2C%20HDFS%2C%20YARN%2C%20MAPREDUCE)%20and%20%22Target%20Version%2Fs%22%20%3D%20%223.0.0%22%20and%20resolution%3D%22unresolved%22%20and%20%22Hadoop%20Flags%22%3D%22Incompatible%20change%22%20order%20by%20priority
> > >
> > > There's some release-related stuff that needs to be sorted out
> > > (namely, the new CHANGES.txt and release note generation from Yetus),
> > > but I'd tentatively like to roll the first alpha a month out, so third
> > > week of March.
> > >
> > > Best,
> > > Andrew
> > >
> > > On Mon, Mar 9, 2015 at 7:23 PM, Raymie Stata <rst...@altiscale.com>
> > > wrote:
> > >
> > > > Avoiding the use of JDK8 language features (and, presumably, APIs)
> > > > means you've abandoned #1, i.e., you haven't (really) bumped the JDK
> > > > source version to JDK8.
> > > >
> > > > Also, note that releasing from trunk 

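The JIRA link in Andrew's mail above is a percent-encoded JQL filter: open issues in HADOOP, HDFS, YARN and MAPREDUCE targeted at 3.0.0 and flagged as incompatible. As a sketch using Python's standard `urllib.parse` (the helper and constant names are hypothetical), this shows how the readable query maps to that URL and back, which makes it easier to adjust the filter, for example for a different target version. Leaving parentheses unencoded only mirrors the link in the mail; any valid percent-encoding of the same JQL should work equally well.

```python
from urllib.parse import parse_qs, quote, urlsplit

JIRA_SEARCH = "https://issues.apache.org/jira/issues/"

# The filter from Andrew's mail, written out as plain JQL.
INCOMPATIBLE_300_JQL = (
    'project in (HADOOP, HDFS, YARN, MAPREDUCE) and "Target Version/s" = "3.0.0" '
    'and resolution="unresolved" and "Hadoop Flags"="Incompatible change" '
    "order by priority"
)


def jql_to_url(jql: str) -> str:
    """Percent-encode a JQL string into a JIRA issue-search URL.

    Parentheses stay literal; spaces, commas, quotes, '=' and '/' are encoded.
    """
    return JIRA_SEARCH + "?jql=" + quote(jql, safe="()")


def url_to_jql(url: str) -> str:
    """Recover the readable JQL from a search URL."""
    return parse_qs(urlsplit(url).query)["jql"][0]


if __name__ == "__main__":
    print(jql_to_url(INCOMPATIBLE_300_JQL))
    print(url_to_jql(jql_to_url(INCOMPATIBLE_300_JQL)))
```
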
Re: Looking to a Hadoop 3 release

2016-02-19 Thread Ravi Prakash
+1 for the plan to start cutting 3.x alpha releases. Thanks for the
initiative Andrew!

On Fri, Feb 19, 2016 at 6:19 AM, Steve Loughran 
wrote:

>
> > On 19 Feb 2016, at 11:27, Dmitry Sivachenko  wrote:
> >
> >
> >> On 19 Feb 2016, at 01:35, Andrew Wang  wrote:
> >>
> >> Hi all,
> >>
> >> Reviving this thread. I've seen renewed interest in a trunk release
> >> since HDFS erasure coding has not yet made it to branch-2. Along with
> >> JDK8, the shell script rewrite, and many other improvements, I think it's
> >> time to revisit Hadoop 3.0 release plans.
> >>
> >
>
> It's time to start ... I suspect it'll take a while to stabilise. I look
> forward to the new shell scripts already
>
> One thing I do want there is for all the alpha releases to make clear that
> there are no compatibility policies here; protocols may change and there is
> no requirement for the first 3.x release to be compatible with all the 3.0.x
> alphas. That's something we missed out on in the 2.0.x-alpha process, or at
> least didn't repeat often enough.
>
> >
> > Hello,
> >
> > any chance IPv6 support (HADOOP-11890) will be finished before 3.0 comes
> > out?
> >
> > Thanks!
> >
> >
>
> sounds like a good time for a status update on the FB work —and anything
> people can do to test it would be appreciated by all. That includes testing
> on ipv4 systems, and especially, IPv4/v6 systems with Kerberos turned on
> and both MIT and AD kerberos servers. At the same time, IPv6 support ought
> to be something that could be added in.
>
>
> I don't have any opinions on timescale, but
>
> +1 to anything related to classpath isolation
> +1 to a careful bump of versions of dependencies.
> +1 to fixing the outstanding Java 8 migration issues, especially the big
> Jersey patch that's just been updated.
> +1 to switching to JIRA-created release notes
>
> Having been doing the slider releases recently, it's clear to me that you
> can do a lot in automating the release process itself. All those steps in
> the release runbook can be turned into targets in a special ant release.xml
> build file, calling maven, gpg, etc.
>
> I think doing something like this for 3.0 will significantly benefit both
> the release phase here and future releases.
>
> This is the slider one:
> https://github.com/apache/incubator-slider/blob/develop/bin/release.xml
>
> It doesn't replace maven, instead it choreographs that along with all the
> other steps: signing and checksumming artifacts, publishing them, voting
>
> it includes
>  -refusing to release if the git repo is modified
>  -making the various git branch/tag/push operations
>  -issuing the various mvn versions:update commands
>  -signing
>  -publishing via asf SVN
>  -using GET calls to verify the artifacts made it
>  -generating the vote and vote result emails (it even counts the votes)
>
> I recommend this is included as part of the release process. It does make
> a difference; we can now cut new releases with no human intervention other
> than editing a properties file and running different targets as the process
> goes through its release and vote phases.
>
> -Steve


Re: Looking to a Hadoop 3 release

2016-02-19 Thread Steve Loughran

> On 19 Feb 2016, at 11:27, Dmitry Sivachenko  wrote:
> 
> 
>> On 19 Feb 2016, at 01:35, Andrew Wang  wrote:
>> 
>> Hi all,
>> 
>> Reviving this thread. I've seen renewed interest in a trunk release since
>> HDFS erasure coding has not yet made it to branch-2. Along with JDK8, the
>> shell script rewrite, and many other improvements, I think it's time to
>> revisit Hadoop 3.0 release plans.
>> 
> 

It's time to start ... I suspect it'll take a while to stabilise. I look 
forward to the new shell scripts already

One thing I do want there is for all the alpha releases to make clear that
there are no compatibility policies here; protocols may change and there is no
requirement for the first 3.x release to be compatible with all the 3.0.x
alphas. That's something we missed out on in the 2.0.x-alpha process, or at
least didn't repeat often enough.

> 
> Hello,
> 
> any chance IPv6 support (HADOOP-11890) will be finished before 3.0 comes out?
> 
> Thanks!
> 
> 

sounds like a good time for a status update on the FB work —and anything people 
can do to test it would be appreciated by all. That includes testing on ipv4 
systems, and especially, IPv4/v6 systems with Kerberos turned on and both MIT 
and AD kerberos servers. At the same time, IPv6 support ought to be something 
that could be added in.


I don't have any opinions on timescale, but

+1 to anything related to classpath isolation
+1 to a careful bump of versions of dependencies.
+1 to fixing the outstanding Java 8 migration issues, especially the big Jersey 
patch that's just been updated.
+1 to switching to JIRA-created release notes

Having been doing the slider releases recently, it's clear to me that you can 
do a lot in automating the release process itself. All those steps in the 
release runbook can be turned into targets in a special ant release.xml build 
file, calling maven, gpg, etc.

I think doing something like this for 3.0 will significantly benefit both the
release phase here and future releases.

This is the slider one: 
https://github.com/apache/incubator-slider/blob/develop/bin/release.xml

It doesn't replace Maven; instead it choreographs Maven along with all the other 
steps: signing and checksumming artifacts, publishing them, and voting.
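As a rough illustration of just the checksumming step (this is not the Slider release.xml, and signing is left to gpg), a Python sketch that writes a sha512sum-style `.sha512` file next to an artifact and later re-verifies it might look like this. The function names and the choice of SHA-512 are assumptions made for the example.

```python
import hashlib
from pathlib import Path


def sha512_hex(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream the file through SHA-512 so large tarballs need not fit in memory."""
    digest = hashlib.sha512()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_checksum(artifact: Path) -> Path:
    """Write <artifact>.sha512 using the 'digest  filename' layout of sha512sum."""
    out = artifact.with_name(artifact.name + ".sha512")
    out.write_text(f"{sha512_hex(artifact)}  {artifact.name}\n")
    return out


def verify_checksum(artifact: Path) -> bool:
    """Recompute the digest and compare it with the stored .sha512 file."""
    stored = artifact.with_name(artifact.name + ".sha512").read_text().split()[0]
    return stored == sha512_hex(artifact)
```

A build target would call `write_checksum` on each tarball before signing, and `verify_checksum` again after download to confirm the published bytes match.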

The release.xml file includes:
 -refusing to release if the git repo is modified
 -making the various git branch/tag/push operations
 -issuing the various mvn versions:update commands
 -signing
 -publishing via ASF SVN
 -using GET calls to verify the artifacts made it
 -generating the vote and vote result emails (it even counts the votes)

I recommend that this be included as part of the release process. It does make a 
difference: we can now cut new releases with no human intervention other than 
editing a properties file and running different targets as the process goes 
through its release and vote phases.
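To illustrate what "it even counts the votes" can involve, here is a minimal Python sketch of the tally. It assumes the votes have already been parsed out of the vote thread into records, and it applies the ASF release-vote rule: at least three binding +1 votes and more binding +1 than binding -1 votes. The `Vote` record and the "latest vote from a person wins" behaviour are assumptions for the example, not how the Slider build file does it.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass(frozen=True)
class Vote:
    voter: str     # hypothetical: the voter's Apache ID
    value: int     # +1, 0 or -1
    binding: bool  # True if the voter is on the PMC


def tally(votes: Iterable[Vote]) -> Tuple[int, int, bool]:
    """Return (binding +1s, binding -1s, passed); a later vote replaces an earlier one."""
    latest = {}
    for vote in votes:
        latest[vote.voter] = vote
    binding = [v for v in latest.values() if v.binding]
    plus = sum(1 for v in binding if v.value > 0)
    minus = sum(1 for v in binding if v.value < 0)
    # Release votes pass with at least three binding +1s and more +1s than -1s.
    return plus, minus, plus >= 3 and plus > minus
```

Non-binding votes are still recorded so they can be listed in the result email, but they do not affect whether the release passes.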

-Steve

Re: Looking to a Hadoop 3 release

2016-02-19 Thread Dmitry Sivachenko

> On 19 Feb 2016, at 01:35, Andrew Wang  wrote:
> 
> Hi all,
> 
> Reviving this thread. I've seen renewed interest in a trunk release since
> HDFS erasure coding has not yet made it to branch-2. Along with JDK8, the
> shell script rewrite, and many other improvements, I think it's time to
> revisit Hadoop 3.0 release plans.
> 


Hello,

any chance IPv6 support (HADOOP-11890) will be finished before 3.0 comes out?

Thanks!



Re: Looking to a Hadoop 3 release

2016-02-18 Thread Akira AJISAKA

+1 for the 3.0 release plan and continuing 2.x releases.
I'm thinking we should consider stopping new 2.x minor releases after 
3.x reaches GA.


Thanks,
Akira

On 2/19/16 10:33, Gangumalla, Uma wrote:

Yes. I think starting 3.0 release with alpha is good idea. So it would get
some time to reach the beta or GA.

+1 for the plan.

For the compatibility purposes and as current stable versions, we should
continue 2.x releases anyway.

Thanks Andrew for starting the thread.

Regards,
Uma

On 2/18/16, 3:04 PM, "Andrew Wang" <andrew.w...@cloudera.com> wrote:


Hi Kihwal,

I think there's still value in continuing the 2.x releases. 3.x comes with
the incompatible bump to a JDK8 runtime, and also the fact that 3.x won't
be beta or GA for some number of months. In the meanwhile, it'd be good to
keep putting out regular, stable 2.x releases.

Best,
Andrew


On Thu, Feb 18, 2016 at 2:50 PM, Kihwal Lee <kih...@yahoo-inc.com.invalid>
wrote:


Moving Hadoop 3 forward sounds fine. If EC is one of the main
motivations, are we getting rid of branch-2.8?

Kihwal

   From: Andrew Wang <andrew.w...@cloudera.com>
  To: "common-dev@hadoop.apache.org" <common-dev@hadoop.apache.org>
Cc: "yarn-...@hadoop.apache.org" <yarn-...@hadoop.apache.org>; "
mapreduce-...@hadoop.apache.org" <mapreduce-...@hadoop.apache.org>;
hdfs-dev <hdfs-...@hadoop.apache.org>
  Sent: Thursday, February 18, 2016 4:35 PM
  Subject: Re: Looking to a Hadoop 3 release

Hi all,

Reviving this thread. I've seen renewed interest in a trunk release
since HDFS erasure coding has not yet made it to branch-2. Along with
JDK8, the shell script rewrite, and many other improvements, I think
it's time to revisit Hadoop 3.0 release plans.

My overall plan is still the same as in my original email: a series of
regular alpha releases leading up to beta and GA. Alpha releases make it
easier for downstreams to integrate with our code, and making them
regular means features can be included when they are ready.

I know there are some incompatible changes waiting in the wings
(i.e. HDFS-6984 making FileStatus a PB rather than Writable, some of
HADOOP-9991 bumping dependency versions) that would be good to get in.
If you have changes like this, please set the target version to 3.0.0
and mark them "Incompatible". We can use this JIRA query to track:

https://issues.apache.org/jira/issues/?jql=project%20in%20(HADOOP%2C%20HDFS%2C%20YARN%2C%20MAPREDUCE)%20and%20%22Target%20Version%2Fs%22%20%3D%20%223.0.0%22%20and%20resolution%3D%22unresolved%22%20and%20%22Hadoop%20Flags%22%3D%22Incompatible%20change%22%20order%20by%20priority

There's some release-related stuff that needs to be sorted out (namely,
the new CHANGES.txt and release note generation from Yetus), but I'd
tentatively like to roll the first alpha a month out, so third week of
March.

Best,
Andrew

On Mon, Mar 9, 2015 at 7:23 PM, Raymie Stata <rst...@altiscale.com>
wrote:


Avoiding the use of JDK8 language features (and, presumably, APIs)
means you've abandoned #1, i.e., you haven't (really) bumped the JDK
source version to JDK8.

Also, note that releasing from trunk is a way of achieving #3, it's
not a way of abandoning it.



On Mon, Mar 9, 2015 at 7:10 PM, Andrew Wang <andrew.w...@cloudera.com>
wrote:

Hi Raymie,

Konst proposed just releasing off of trunk rather than cutting a
branch-2, and there was general agreement there. So, consider #3
abandoned. 1&2 can be achieved at the same time, we just need to avoid
using JDK8 language features in trunk so things can be backported.

Best,
Andrew

On Mon, Mar 9, 2015 at 7:01 PM, Raymie Stata <rst...@altiscale.com>
wrote:

In this (and the related threads), I see the following three
requirements:

1. "Bump the source JDK version to JDK8" (ie, drop JDK7 support).

2. "We'll still be releasing 2.x releases for a while, with similar
feature sets as 3.x."

3. Avoid the "risk of split-brain behavior" by "minimize backporting
headaches. Pulling trunk > branch-2 > branch-2.x is already tedious.
Adding a branch-3, branch-3.x would be obnoxious."

These three cannot be achieved at the same time.  Which do we abandon?



On Mon, Mar 9, 2015 at 12:45 PM, sanjay Radia <sanjayo...@gmail.com>
wrote:

On Mar 5, 2015, at 3:21 PM, Siddharth Seth <ss...@apache.org> wrote:

2) Simplification of configs - potentially separating client side
configs and those used by daemons. This is another source of perpetual
confusion for users.

+ 1 on this.

sanjay














Re: Looking to a Hadoop 3 release

2016-02-18 Thread Andrew Wang
Hi Kai,

Sure, I'm open to it. It's a new major release, so we're allowed to make
these kinds of big changes. The idea behind the extended alpha cycle is
that downstreams can give us feedback. This way if we do anything too
radical, we can address it in the next alpha and have downstreams re-test.

Best,
Andrew

On Thu, Feb 18, 2016 at 5:23 PM, Zheng, Kai <kai.zh...@intel.com> wrote:

> Thanks Andrew for driving this. Wonder if it's a good chance for
> HADOOP-12579 (Deprecate and remove WriteableRPCEngine) to be in. Note it's
> not an incompatible change, but feel better to be done in the major release.
>
> Regards,
> Kai
>
> -Original Message-
> From: Andrew Wang [mailto:andrew.w...@cloudera.com]
> Sent: Friday, February 19, 2016 7:04 AM
> To: hdfs-...@hadoop.apache.org; Kihwal Lee <kih...@yahoo-inc.com>
> Cc: mapreduce-...@hadoop.apache.org; common-dev@hadoop.apache.org;
> yarn-...@hadoop.apache.org
> Subject: Re: Looking to a Hadoop 3 release
>
> Hi Kihwal,
>
> I think there's still value in continuing the 2.x releases. 3.x comes with
> the incompatible bump to a JDK8 runtime, and also the fact that 3.x won't
> be beta or GA for some number of months. In the meanwhile, it'd be good to
> keep putting out regular, stable 2.x releases.
>
> Best,
> Andrew
>
>
> On Thu, Feb 18, 2016 at 2:50 PM, Kihwal Lee <kih...@yahoo-inc.com.invalid>
> wrote:
>
> > Moving Hadoop 3 forward sounds fine. If EC is one of the main
> > motivations, are we getting rid of branch-2.8?
> >
> > Kihwal
> >
> >   From: Andrew Wang <andrew.w...@cloudera.com>
> >  To: "common-dev@hadoop.apache.org" <common-dev@hadoop.apache.org>
> > Cc: "yarn-...@hadoop.apache.org" <yarn-...@hadoop.apache.org>; "
> > mapreduce-...@hadoop.apache.org" <mapreduce-...@hadoop.apache.org>;
> > hdfs-dev <hdfs-...@hadoop.apache.org>
> >  Sent: Thursday, February 18, 2016 4:35 PM
> >  Subject: Re: Looking to a Hadoop 3 release
> >
> > Hi all,
> >
> > Reviving this thread. I've seen renewed interest in a trunk release
> > since HDFS erasure coding has not yet made it to branch-2. Along with
> > JDK8, the shell script rewrite, and many other improvements, I think
> > it's time to revisit Hadoop 3.0 release plans.
> >
> > My overall plan is still the same as in my original email: a series of
> > regular alpha releases leading up to beta and GA. Alpha releases make
> > it easier for downstreams to integrate with our code, and making them
> > regular means features can be included when they are ready.
> >
> > I know there are some incompatible changes waiting in the wings (i.e.
> > HDFS-6984 making FileStatus a PB rather than Writable, some of
> > HADOOP-9991 bumping dependency versions) that would be good to get in.
> > If you have changes like this, please set the target version to 3.0.0
> > and mark them "Incompatible". We can use this JIRA query to track:
> >
> >
> > https://issues.apache.org/jira/issues/?jql=project%20in%20(HADOOP%2C%20HDFS%2C%20YARN%2C%20MAPREDUCE)%20and%20%22Target%20Version%2Fs%22%20%3D%20%223.0.0%22%20and%20resolution%3D%22unresolved%22%20and%20%22Hadoop%20Flags%22%3D%22Incompatible%20change%22%20order%20by%20priority
> >
> > There's some release-related stuff that needs to be sorted out
> > (namely, the new CHANGES.txt and release note generation from Yetus),
> > but I'd tentatively like to roll the first alpha a month out, so third
> > week of March.
> >
> > Best,
> > Andrew
> >
> > On Mon, Mar 9, 2015 at 7:23 PM, Raymie Stata <rst...@altiscale.com>
> wrote:
> >
> > > Avoiding the use of JDK8 language features (and, presumably, APIs)
> > > means you've abandoned #1, i.e., you haven't (really) bumped the JDK
> > > source version to JDK8.
> > >
> > > Also, note that releasing from trunk is a way of achieving #3, it's
> > > not a way of abandoning it.
> > >
> > >
> > >
> > > On Mon, Mar 9, 2015 at 7:10 PM, Andrew Wang
> > > <andrew.w...@cloudera.com>
> > > wrote:
> > > > Hi Raymie,
> > > >
> > > > Konst proposed just releasing off of trunk rather than cutting a
> > > > branch-2, and there was general agreement there. So, consider #3
> > > > abandoned. 1&2 can be achieved at the same time, we just need to
> > > > avoid using JDK8 language features in trunk so things can be
> > > > backported.
> > > >
> > > > Best,

Re: Looking to a Hadoop 3 release

2016-02-18 Thread Sangjin Lee
Another thing to throw in there is the dependency/classpath isolation
(HADOOP-11656). Some efforts have already been made by Sean, and it'd be
great to complete this to have a much better dependency isolation solution
for 3.x.

On Thu, Feb 18, 2016 at 5:33 PM, Gangumalla, Uma <uma.ganguma...@intel.com>
wrote:

> Yes. I think starting 3.0 release with alpha is good idea. So it would get
> some time to reach the beta or GA.
>
> +1 for the plan.
>
> For the compatibility purposes and as current stable versions, we should
> continue 2.x releases anyway.
>
> Thanks Andrew for starting the thread.
>
> Regards,
> Uma
>
> On 2/18/16, 3:04 PM, "Andrew Wang" <andrew.w...@cloudera.com> wrote:
>
> >Hi Kihwal,
> >
> >I think there's still value in continuing the 2.x releases. 3.x comes with
> >the incompatible bump to a JDK8 runtime, and also the fact that 3.x won't
> >be beta or GA for some number of months. In the meanwhile, it'd be good to
> >keep putting out regular, stable 2.x releases.
> >
> >Best,
> >Andrew
> >
> >
> >On Thu, Feb 18, 2016 at 2:50 PM, Kihwal Lee <kih...@yahoo-inc.com.invalid>
> >wrote:
> >
> >> Moving Hadoop 3 forward sounds fine. If EC is one of the main
> >> motivations, are we getting rid of branch-2.8?
> >>
> >> Kihwal
> >>
> >>   From: Andrew Wang <andrew.w...@cloudera.com>
> >>  To: "common-dev@hadoop.apache.org" <common-dev@hadoop.apache.org>
> >> Cc: "yarn-...@hadoop.apache.org" <yarn-...@hadoop.apache.org>; "
> >> mapreduce-...@hadoop.apache.org" <mapreduce-...@hadoop.apache.org>;
> >> hdfs-dev <hdfs-...@hadoop.apache.org>
> >>  Sent: Thursday, February 18, 2016 4:35 PM
> >>  Subject: Re: Looking to a Hadoop 3 release
> >>
> >> Hi all,
> >>
> >> Reviving this thread. I've seen renewed interest in a trunk release
> >> since HDFS erasure coding has not yet made it to branch-2. Along with
> >> JDK8, the shell script rewrite, and many other improvements, I think
> >> it's time to revisit Hadoop 3.0 release plans.
> >>
> >> My overall plan is still the same as in my original email: a series of
> >> regular alpha releases leading up to beta and GA. Alpha releases make it
> >> easier for downstreams to integrate with our code, and making them
> >> regular means features can be included when they are ready.
> >>
> >> I know there are some incompatible changes waiting in the wings
> >> (i.e. HDFS-6984 making FileStatus a PB rather than Writable, some of
> >> HADOOP-9991 bumping dependency versions) that would be good to get in.
> >> If you have changes like this, please set the target version to 3.0.0
> >> and mark them "Incompatible". We can use this JIRA query to track:
> >>
> >> https://issues.apache.org/jira/issues/?jql=project%20in%20(HADOOP%2C%20HDFS%2C%20YARN%2C%20MAPREDUCE)%20and%20%22Target%20Version%2Fs%22%20%3D%20%223.0.0%22%20and%20resolution%3D%22unresolved%22%20and%20%22Hadoop%20Flags%22%3D%22Incompatible%20change%22%20order%20by%20priority
> >>
> >> There's some release-related stuff that needs to be sorted out
> >> (namely, the new CHANGES.txt and release note generation from Yetus),
> >> but I'd tentatively like to roll the first alpha a month out, so third
> >> week of March.
> >>
> >> Best,
> >> Andrew
> >>
> >> On Mon, Mar 9, 2015 at 7:23 PM, Raymie Stata <rst...@altiscale.com>
> >>wrote:
> >>
> >> > Avoiding the use of JDK8 language features (and, presumably, APIs)
> >> > means you've abandoned #1, i.e., you haven't (really) bumped the JDK
> >> > source version to JDK8.
> >> >
> >> > Also, note that releasing from trunk is a way of achieving #3, it's
> >> > not a way of abandoning it.
> >> >
> >> >
> >> >
> >> > On Mon, Mar 9, 2015 at 7:10 PM, Andrew Wang <andrew.w...@cloudera.com>
> >> > wrote:
> >> > > Hi Raymie,
> >> > >
> >> > > Konst proposed just releasing off of trunk rather than cutting a
> >> > > branch-2, and there was general agreement there. So, consider #3
> >> > > abandoned. 1&2 can
> >> > > be achieved at the 

Re: Looking to a Hadoop 3 release

2016-02-18 Thread Gangumalla, Uma
Yes. I think starting the 3.0 release with an alpha is a good idea, so it gets
some time to reach beta or GA.

+1 for the plan.

For compatibility purposes, and since they are the current stable versions, we
should continue 2.x releases anyway.

Thanks Andrew for starting the thread.

Regards,
Uma

On 2/18/16, 3:04 PM, "Andrew Wang" <andrew.w...@cloudera.com> wrote:

>Hi Kihwal,
>
>I think there's still value in continuing the 2.x releases. 3.x comes with
>the incompatible bump to a JDK8 runtime, and also the fact that 3.x won't
>be beta or GA for some number of months. In the meanwhile, it'd be good to
>keep putting out regular, stable 2.x releases.
>
>Best,
>Andrew
>
>
>On Thu, Feb 18, 2016 at 2:50 PM, Kihwal Lee <kih...@yahoo-inc.com.invalid>
>wrote:
>
>> Moving Hadoop 3 forward sounds fine. If EC is one of the main
>> motivations, are we getting rid of branch-2.8?
>>
>> Kihwal
>>
>>   From: Andrew Wang <andrew.w...@cloudera.com>
>>  To: "common-dev@hadoop.apache.org" <common-dev@hadoop.apache.org>
>> Cc: "yarn-...@hadoop.apache.org" <yarn-...@hadoop.apache.org>; "
>> mapreduce-...@hadoop.apache.org" <mapreduce-...@hadoop.apache.org>;
>> hdfs-dev <hdfs-...@hadoop.apache.org>
>>  Sent: Thursday, February 18, 2016 4:35 PM
>>  Subject: Re: Looking to a Hadoop 3 release
>>
>> Hi all,
>>
>> Reviving this thread. I've seen renewed interest in a trunk release
>> since HDFS erasure coding has not yet made it to branch-2. Along with
>> JDK8, the shell script rewrite, and many other improvements, I think
>> it's time to revisit Hadoop 3.0 release plans.
>>
>> My overall plan is still the same as in my original email: a series of
>> regular alpha releases leading up to beta and GA. Alpha releases make it
>> easier for downstreams to integrate with our code, and making them
>> regular means features can be included when they are ready.
>>
>> I know there are some incompatible changes waiting in the wings
>> (i.e. HDFS-6984 making FileStatus a PB rather than Writable, some of
>> HADOOP-9991 bumping dependency versions) that would be good to get in.
>> If you have changes like this, please set the target version to 3.0.0
>> and mark them "Incompatible". We can use this JIRA query to track:
>>
>> https://issues.apache.org/jira/issues/?jql=project%20in%20(HADOOP%2C%20HDFS%2C%20YARN%2C%20MAPREDUCE)%20and%20%22Target%20Version%2Fs%22%20%3D%20%223.0.0%22%20and%20resolution%3D%22unresolved%22%20and%20%22Hadoop%20Flags%22%3D%22Incompatible%20change%22%20order%20by%20priority
>>
>> There's some release-related stuff that needs to be sorted out
>> (namely, the new CHANGES.txt and release note generation from Yetus),
>> but I'd tentatively like to roll the first alpha a month out, so third
>> week of March.
>>
>> Best,
>> Andrew
>>
>> On Mon, Mar 9, 2015 at 7:23 PM, Raymie Stata <rst...@altiscale.com>
>>wrote:
>>
>> > Avoiding the use of JDK8 language features (and, presumably, APIs)
>> > means you've abandoned #1, i.e., you haven't (really) bumped the JDK
>> > source version to JDK8.
>> >
>> > Also, note that releasing from trunk is a way of achieving #3, it's
>> > not a way of abandoning it.
>> >
>> >
>> >
>> > On Mon, Mar 9, 2015 at 7:10 PM, Andrew Wang <andrew.w...@cloudera.com>
>> > wrote:
>> > > Hi Raymie,
>> > >
>> > > Konst proposed just releasing off of trunk rather than cutting a
>> > > branch-2, and there was general agreement there. So, consider #3
>> > > abandoned. 1&2 can be achieved at the same time, we just need to
>> > > avoid using JDK8 language features in trunk so things can be
>> > > backported.
>> > >
>> > > Best,
>> > > Andrew
>> > >
>> > > On Mon, Mar 9, 2015 at 7:01 PM, Raymie Stata <rst...@altiscale.com>
>> > > wrote:
>> > >
>> > >> In this (and the related threads), I see the following three
>> > >> requirements:
>> > >>
>> > >> 1. "Bump the source JDK version to JDK8" (ie, drop JDK7 support).
>> > >>
>> > >> 2. "We'll still be releasing 2.x releases for a while, with similar
>> > >> feature sets as 3.x."
>> > >>
>> > >> 3. Avoid the "risk of split-brain behavior" by "minimize backporting
>> > >> headaches. Pulling trunk > branch-2 > branch-2.x is already tedious.
>> > >> Adding a branch-3, branch-3.x would be obnoxious."
>> > >>
>> > >> These three cannot be achieved at the same time.  Which do we abandon?
>> > >>
>> > >>
>> > >> On Mon, Mar 9, 2015 at 12:45 PM, sanjay Radia <sanjayo...@gmail.com>
>> > >> wrote:
>> > >> >
>> > >> >> On Mar 5, 2015, at 3:21 PM, Siddharth Seth <ss...@apache.org> wrote:
>> > >> >>
>> > >> >> 2) Simplification of configs - potentially separating client side
>> > >> >> configs and those used by daemons. This is another source of
>> > >> >> perpetual confusion for users.
>> > >> > + 1 on this.
>> > >> >
>> > >> > sanjay
>> > >>
>> >
>>
>>
>>



RE: Looking to a Hadoop 3 release

2016-02-18 Thread Zheng, Kai
Thanks Andrew for driving this. I wonder if it's a good chance for HADOOP-12579 
(Deprecate and remove WriteableRPCEngine) to get in. Note it's not an 
incompatible change, but it feels better done in a major release.

Regards,
Kai

-Original Message-
From: Andrew Wang [mailto:andrew.w...@cloudera.com] 
Sent: Friday, February 19, 2016 7:04 AM
To: hdfs-...@hadoop.apache.org; Kihwal Lee <kih...@yahoo-inc.com>
Cc: mapreduce-...@hadoop.apache.org; common-dev@hadoop.apache.org; 
yarn-...@hadoop.apache.org
Subject: Re: Looking to a Hadoop 3 release

Hi Kihwal,

I think there's still value in continuing the 2.x releases. 3.x comes with the 
incompatible bump to a JDK8 runtime, and also the fact that 3.x won't be beta 
or GA for some number of months. In the meanwhile, it'd be good to keep putting 
out regular, stable 2.x releases.

Best,
Andrew


On Thu, Feb 18, 2016 at 2:50 PM, Kihwal Lee <kih...@yahoo-inc.com.invalid>
wrote:

> Moving Hadoop 3 forward sounds fine. If EC is one of the main 
> motivations, are we getting rid of branch-2.8?
>
> Kihwal
>
>   From: Andrew Wang <andrew.w...@cloudera.com>
>  To: "common-dev@hadoop.apache.org" <common-dev@hadoop.apache.org>
> Cc: "yarn-...@hadoop.apache.org" <yarn-...@hadoop.apache.org>; "
> mapreduce-...@hadoop.apache.org" <mapreduce-...@hadoop.apache.org>;
> hdfs-dev <hdfs-...@hadoop.apache.org>
>  Sent: Thursday, February 18, 2016 4:35 PM
>  Subject: Re: Looking to a Hadoop 3 release
>
> Hi all,
>
> Reviving this thread. I've seen renewed interest in a trunk release 
> since HDFS erasure coding has not yet made it to branch-2. Along with 
> JDK8, the shell script rewrite, and many other improvements, I think 
> it's time to revisit Hadoop 3.0 release plans.
>
> My overall plan is still the same as in my original email: a series of 
> regular alpha releases leading up to beta and GA. Alpha releases make 
> it easier for downstreams to integrate with our code, and making them 
> regular means features can be included when they are ready.
>
> I know there are some incompatible changes waiting in the wings (i.e. 
> HDFS-6984 making FileStatus a PB rather than Writable, some of
> HADOOP-9991 bumping dependency versions) that would be good to get in. 
> If you have changes like this, please set the target version to 3.0.0 
> and mark them "Incompatible". We can use this JIRA query to track:
>
>
> https://issues.apache.org/jira/issues/?jql=project%20in%20(HADOOP%2C%20HDFS%2C%20YARN%2C%20MAPREDUCE)%20and%20%22Target%20Version%2Fs%22%20%3D%20%223.0.0%22%20and%20resolution%3D%22unresolved%22%20and%20%22Hadoop%20Flags%22%3D%22Incompatible%20change%22%20order%20by%20priority
>
> There's some release-related stuff that needs to be sorted out 
> (namely, the new CHANGES.txt and release note generation from Yetus), 
> but I'd tentatively like to roll the first alpha a month out, so third 
> week of March.
>
> Best,
> Andrew
>
> On Mon, Mar 9, 2015 at 7:23 PM, Raymie Stata <rst...@altiscale.com> wrote:
>
> > Avoiding the use of JDK8 language features (and, presumably, APIs) 
> > means you've abandoned #1, i.e., you haven't (really) bumped the JDK 
> > source version to JDK8.
> >
> > Also, note that releasing from trunk is a way of achieving #3, it's 
> > not a way of abandoning it.
> >
> >
> >
> > On Mon, Mar 9, 2015 at 7:10 PM, Andrew Wang 
> > <andrew.w...@cloudera.com>
> > wrote:
> > > Hi Raymie,
> > >
> > > Konst proposed just releasing off of trunk rather than cutting a
> > > branch-2, and there was general agreement there. So, consider #3
> > > abandoned. 1&2 can be achieved at the same time, we just need to
> > > avoid using JDK8 language features in trunk so things can be
> > > backported.
> > >
> > > Best,
> > > Andrew
> > >
> > > On Mon, Mar 9, 2015 at 7:01 PM, Raymie Stata 
> > > <rst...@altiscale.com>
> > wrote:
> > >
> > >> In this (and the related threads), I see the following three
> > >> requirements:
> > >>
> > >> 1. "Bump the source JDK version to JDK8" (ie, drop JDK7 support).
> > >>
> > >> 2. "We'll still be releasing 2.x releases for a while, with
> > >> similar feature sets as 3.x."
> > >>
> > >> 3. Avoid the "risk of split-brain behavior" by "minimize backporting
> > >> headaches. Pulling trunk > branch-2 > branch-2.x is already tedious.
> > >> Adding a branch-3, branch-3.x would be obnoxious."
> > >>
> > >> These three cannot be achieved at the same time.  Which do we abandon?
> > >>
> > >>
> > >> On Mon, Mar 9, 2015 at 12:45 PM, sanjay Radia 
> > >> <sanjayo...@gmail.com>
> > >> wrote:
> > >> >
> > >> >> On Mar 5, 2015, at 3:21 PM, Siddharth Seth <ss...@apache.org> wrote:
> > >> >>
> > >> >> 2) Simplification of configs - potentially separating client side
> > >> >> configs and those used by daemons. This is another source of
> > >> >> perpetual confusion for users.
> > >> > + 1 on this.
> > >> >
> > >> > sanjay
> > >>
> >
>
>
>


Re: Looking to a Hadoop 3 release

2016-02-18 Thread Andrew Wang
Hi Kihwal,

I think there's still value in continuing the 2.x releases. 3.x comes with
the incompatible bump to a JDK8 runtime, and also the fact that 3.x won't
be beta or GA for some number of months. In the meanwhile, it'd be good to
keep putting out regular, stable 2.x releases.

Best,
Andrew


On Thu, Feb 18, 2016 at 2:50 PM, Kihwal Lee <kih...@yahoo-inc.com.invalid>
wrote:

> Moving Hadoop 3 forward sounds fine. If EC is one of the main motivations,
> are we getting rid of branch-2.8?
>
> Kihwal
>
>   From: Andrew Wang <andrew.w...@cloudera.com>
>  To: "common-dev@hadoop.apache.org" <common-dev@hadoop.apache.org>
> Cc: "yarn-...@hadoop.apache.org" <yarn-...@hadoop.apache.org>; "
> mapreduce-...@hadoop.apache.org" <mapreduce-...@hadoop.apache.org>;
> hdfs-dev <hdfs-...@hadoop.apache.org>
>  Sent: Thursday, February 18, 2016 4:35 PM
>  Subject: Re: Looking to a Hadoop 3 release
>
> Hi all,
>
> Reviving this thread. I've seen renewed interest in a trunk release since
> HDFS erasure coding has not yet made it to branch-2. Along with JDK8, the
> shell script rewrite, and many other improvements, I think it's time to
> revisit Hadoop 3.0 release plans.
>
> My overall plan is still the same as in my original email: a series of
> regular alpha releases leading up to beta and GA. Alpha releases make it
> easier for downstreams to integrate with our code, and making them regular
> means features can be included when they are ready.
>
> I know there are some incompatible changes waiting in the wings
> (i.e. HDFS-6984 making FileStatus a PB rather than Writable, some of
> HADOOP-9991 bumping dependency versions) that would be good to get in. If
> you have changes like this, please set the target version to 3.0.0 and mark
> them "Incompatible". We can use this JIRA query to track:
>
>
> https://issues.apache.org/jira/issues/?jql=project%20in%20(HADOOP%2C%20HDFS%2C%20YARN%2C%20MAPREDUCE)%20and%20%22Target%20Version%2Fs%22%20%3D%20%223.0.0%22%20and%20resolution%3D%22unresolved%22%20and%20%22Hadoop%20Flags%22%3D%22Incompatible%20change%22%20order%20by%20priority
>
> There's some release-related stuff that needs to be sorted out (namely, the
> new CHANGES.txt and release note generation from Yetus), but I'd
> tentatively like to roll the first alpha a month out, so third week of
> March.
>
> Best,
> Andrew
>
> On Mon, Mar 9, 2015 at 7:23 PM, Raymie Stata <rst...@altiscale.com> wrote:
>
> > Avoiding the use of JDK8 language features (and, presumably, APIs)
> > means you've abandoned #1, i.e., you haven't (really) bumped the JDK
> > source version to JDK8.
> >
> > Also, note that releasing from trunk is a way of achieving #3, it's
> > not a way of abandoning it.
> >
> >
> >
> > On Mon, Mar 9, 2015 at 7:10 PM, Andrew Wang <andrew.w...@cloudera.com>
> > wrote:
> > > Hi Raymie,
> > >
> > > Konst proposed just releasing off of trunk rather than cutting a
> > > branch-2, and there was general agreement there. So, consider #3
> > > abandoned. 1&2 can be achieved at the same time, we just need to
> > > avoid using JDK8 language features in trunk so things can be
> > > backported.
> > >
> > > Best,
> > > Andrew
> > >
> > > On Mon, Mar 9, 2015 at 7:01 PM, Raymie Stata <rst...@altiscale.com>
> > > wrote:
> > >
> > >> In this (and the related threads), I see the following three
> > >> requirements:
> > >>
> > >> 1. "Bump the source JDK version to JDK8" (ie, drop JDK7 support).
> > >>
> > >> 2. "We'll still be releasing 2.x releases for a while, with similar
> > >> feature sets as 3.x."
> > >>
> > >> 3. Avoid the "risk of split-brain behavior" by "minimize backporting
> > >> headaches. Pulling trunk > branch-2 > branch-2.x is already tedious.
> > >> Adding a branch-3, branch-3.x would be obnoxious."
> > >>
> > >> These three cannot be achieved at the same time.  Which do we abandon?
> > >>
> > >>
> > >> On Mon, Mar 9, 2015 at 12:45 PM, sanjay Radia <sanjayo...@gmail.com>
> > >> wrote:
> > >> >
> > >> >> On Mar 5, 2015, at 3:21 PM, Siddharth Seth <ss...@apache.org> wrote:
> > >> >>
> > >> >> 2) Simplification of configs - potentially separating client side
> > >> >> configs and those used by daemons. This is another source of
> > >> >> perpetual confusion for users.
> > >> > + 1 on this.
> > >> >
> > >> > sanjay
> > >>
> >
>
>
>


Re: Looking to a Hadoop 3 release

2016-02-18 Thread Kihwal Lee
Moving Hadoop 3 forward sounds fine. If EC is one of the main motivations, are 
we getting rid of branch-2.8? 

Kihwal

  From: Andrew Wang <andrew.w...@cloudera.com>
 To: "common-dev@hadoop.apache.org" <common-dev@hadoop.apache.org> 
Cc: "yarn-...@hadoop.apache.org" <yarn-...@hadoop.apache.org>; 
"mapreduce-...@hadoop.apache.org" <mapreduce-...@hadoop.apache.org>; hdfs-dev 
<hdfs-...@hadoop.apache.org>
 Sent: Thursday, February 18, 2016 4:35 PM
 Subject: Re: Looking to a Hadoop 3 release
   
Hi all,

Reviving this thread. I've seen renewed interest in a trunk release since
HDFS erasure coding has not yet made it to branch-2. Along with JDK8, the
shell script rewrite, and many other improvements, I think it's time to
revisit Hadoop 3.0 release plans.

My overall plan is still the same as in my original email: a series of
regular alpha releases leading up to beta and GA. Alpha releases make it
easier for downstreams to integrate with our code, and making them regular
means features can be included when they are ready.

I know there are some incompatible changes waiting in the wings
(i.e. HDFS-6984 making FileStatus a PB rather than Writable, some of
HADOOP-9991 bumping dependency versions) that would be good to get in. If
you have changes like this, please set the target version to 3.0.0 and mark
them "Incompatible". We can use this JIRA query to track:

https://issues.apache.org/jira/issues/?jql=project%20in%20(HADOOP%2C%20HDFS%2C%20YARN%2C%20MAPREDUCE)%20and%20%22Target%20Version%2Fs%22%20%3D%20%223.0.0%22%20and%20resolution%3D%22unresolved%22%20and%20%22Hadoop%20Flags%22%3D%22Incompatible%20change%22%20order%20by%20priority

There's some release-related stuff that needs to be sorted out (namely, the
new CHANGES.txt and release note generation from Yetus), but I'd
tentatively like to roll the first alpha a month out, so third week of
March.

Best,
Andrew

On Mon, Mar 9, 2015 at 7:23 PM, Raymie Stata <rst...@altiscale.com> wrote:

> Avoiding the use of JDK8 language features (and, presumably, APIs)
> means you've abandoned #1, i.e., you haven't (really) bumped the JDK
> source version to JDK8.
>
> Also, note that releasing from trunk is a way of achieving #3, it's
> not a way of abandoning it.
>
>
>
> On Mon, Mar 9, 2015 at 7:10 PM, Andrew Wang <andrew.w...@cloudera.com>
> wrote:
> > Hi Raymie,
> >
> > Konst proposed just releasing off of trunk rather than cutting a
> > branch-2, and there was general agreement there. So, consider #3
> > abandoned. 1&2 can be achieved at the same time, we just need to avoid
> > using JDK8 language features in trunk so things can be backported.
> >
> > Best,
> > Andrew
> >
> > On Mon, Mar 9, 2015 at 7:01 PM, Raymie Stata <rst...@altiscale.com>
> > wrote:
> >
> >> In this (and the related threads), I see the following three
> >> requirements:
> >>
> >> 1. "Bump the source JDK version to JDK8" (ie, drop JDK7 support).
> >>
> >> 2. "We'll still be releasing 2.x releases for a while, with similar
> >> feature sets as 3.x."
> >>
> >> 3. Avoid the "risk of split-brain behavior" by "minimize backporting
> >> headaches. Pulling trunk > branch-2 > branch-2.x is already tedious.
> >> Adding a branch-3, branch-3.x would be obnoxious."
> >>
> >> These three cannot be achieved at the same time.  Which do we abandon?
> >>
> >>
> >> On Mon, Mar 9, 2015 at 12:45 PM, sanjay Radia <sanjayo...@gmail.com>
> >> wrote:
> >> >
> >> >> On Mar 5, 2015, at 3:21 PM, Siddharth Seth <ss...@apache.org> wrote:
> >> >>
> >> >> 2) Simplification of configs - potentially separating client side
> >> >> configs and those used by daemons. This is another source of
> >> >> perpetual confusion for users.
> >> > + 1 on this.
> >> >
> >> > sanjay
> >>
>

  

Re: Looking to a Hadoop 3 release

2016-02-18 Thread Andrew Wang
Hi all,

Reviving this thread. I've seen renewed interest in a trunk release since
HDFS erasure coding has not yet made it to branch-2. Along with JDK8, the
shell script rewrite, and many other improvements, I think it's time to
revisit Hadoop 3.0 release plans.

My overall plan is still the same as in my original email: a series of
regular alpha releases leading up to beta and GA. Alpha releases make it
easier for downstreams to integrate with our code, and making them regular
means features can be included when they are ready.

I know there are some incompatible changes waiting in the wings
(e.g., HDFS-6984 making FileStatus a PB rather than Writable, some of
HADOOP-9991 bumping dependency versions) that would be good to get in. If
you have changes like this, please set the target version to 3.0.0 and mark
them "Incompatible". We can use this JIRA query to track:

https://issues.apache.org/jira/issues/?jql=project%20in%20(HADOOP%2C%20HDFS%2C%20YARN%2C%20MAPREDUCE)%20and%20%22Target%20Version%2Fs%22%20%3D%20%223.0.0%22%20and%20resolution%3D%22unresolved%22%20and%20%22Hadoop%20Flags%22%3D%22Incompatible%20change%22%20order%20by%20priority
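
The tracking link above is just a URL-encoded JQL query. As a minimal sketch (the class, method, and constant names are hypothetical and not part of any Hadoop or JIRA tooling), the readable JQL and the link can be converted in both directions with the JDK's URLEncoder/URLDecoder, which makes it easy to reuse the query for a different target version:

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

public class JiraQueryLink {
    // Base of the search link quoted in the message above.
    static final String SEARCH_BASE = "https://issues.apache.org/jira/issues/?jql=";

    // The tracking query written out as plain JQL (hypothetical constant name).
    static final String INCOMPATIBLE_300_JQL =
        "project in (HADOOP, HDFS, YARN, MAPREDUCE)"
        + " and \"Target Version/s\" = \"3.0.0\""
        + " and resolution=\"unresolved\""
        + " and \"Hadoop Flags\"=\"Incompatible change\""
        + " order by priority";

    /** Percent-encodes a JQL string into a search link. */
    static String toLink(String jql) {
        try {
            // URLEncoder uses form encoding ('+' for space). A literal '+' in the
            // JQL is already escaped as %2B, so this replace only touches spaces.
            return SEARCH_BASE + URLEncoder.encode(jql, "UTF-8").replace("+", "%20");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    /** Recovers the readable JQL from a link that starts with SEARCH_BASE. */
    static String fromLink(String link) {
        if (!link.startsWith(SEARCH_BASE)) {
            throw new IllegalArgumentException("not a JIRA search link: " + link);
        }
        try {
            return URLDecoder.decode(link.substring(SEARCH_BASE.length()), "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    public static void main(String[] args) {
        // Swap "3.0.0" in the JQL for another version to track a different release.
        System.out.println(toLink(INCOMPATIBLE_300_JQL));
    }
}
```

The re-encoded link escapes the parentheses as %28/%29 where the original keeps them literal; both decode to the same JQL.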

There's some release-related stuff that needs to be sorted out (namely, the
new CHANGES.txt and release note generation from Yetus), but I'd
tentatively like to roll the first alpha a month out, so third week of
March.

Best,
Andrew

On Mon, Mar 9, 2015 at 7:23 PM, Raymie Stata  wrote:

> Avoiding the use of JDK8 language features (and, presumably, APIs)
> means you've abandoned #1, i.e., you haven't (really) bumped the JDK
> source version to JDK8.
>
> Also, note that releasing from trunk is a way of achieving #3, it's
> not a way of abandoning it.
>
>
>
> On Mon, Mar 9, 2015 at 7:10 PM, Andrew Wang 
> wrote:
> > Hi Raymie,
> >
> > Konst proposed just releasing off of trunk rather than cutting a branch-2,
> > and there was general agreement there. So, consider #3 abandoned. 1&2 can
> > be achieved at the same time, we just need to avoid using JDK8 language
> > features in trunk so things can be backported.
> >
> > Best,
> > Andrew
> >
> > On Mon, Mar 9, 2015 at 7:01 PM, Raymie Stata 
> wrote:
> >
> >> In this (and the related threads), I see the following three requirements:
> >>
> >> 1. "Bump the source JDK version to JDK8" (ie, drop JDK7 support).
> >>
> >> 2. "We'll still be releasing 2.x releases for a while, with similar
> >> feature sets as 3.x."
> >>
> >> 3. Avoid the "risk of split-brain behavior" by "minimize backporting
> >> headaches. Pulling trunk > branch-2 > branch-2.x is already tedious.
> >> Adding a branch-3, branch-3.x would be obnoxious."
> >>
> >> These three cannot be achieved at the same time.  Which do we abandon?
> >>
> >>
> >> On Mon, Mar 9, 2015 at 12:45 PM, sanjay Radia 
> >> wrote:
> >> >
> >> >> On Mar 5, 2015, at 3:21 PM, Siddharth Seth  wrote:
> >> >>
> >> >> 2) Simplification of configs - potentially separating client side configs
> >> >> and those used by daemons. This is another source of perpetual confusion
> >> >> for users.
> >> > + 1 on this.
> >> >
> >> > sanjay
> >>
>


Re: Looking to a Hadoop 3 release

2015-03-09 Thread Vinod Kumar Vavilapalli

On Mar 6, 2015, at 5:20 PM, Chris Douglas cdoug...@apache.org wrote:

 On Fri, Mar 6, 2015 at 4:32 PM, Vinod Kumar Vavilapalli
 vino...@hortonworks.com wrote:
 I'd encourage everyone to post their wish list on the Roadmap wiki that 
 *warrants* making incompatible changes forcing us to go 3.x.
 
 This is a useful exercise, but not a prerequisite to releasing 3.0.0
 as an alpha off of trunk, right? Andrew summarized the operating
 assumptions for anyone working on it: rolling upgrades still work,
 wire compat is preserved, breaking changes may get rolled back when
 branch-3 is in beta (so be very conservative, notify others loudly).
 This applies to branches merged to trunk, also.


Not a prerequisite for alpha releases, yes. But it will be for a 'GA' release, 
because after that we will be back to restricting incompatible changes on 3.x 
line and we have to say no to features that need API breakage after that. If 
others feel there are features that warrant incompatibility, we should hear 
about them for inclusion in such a 3.x release. Till now, the operating 
assumption was to not break anything as much as possible. If we are opening the 
window on incompatibilities in 3.x, might as well get everyone to think about 
stuff that they want.



 +1 to Jason's comments in general. We can keep rolling alphas that 
 downstream can pick up, but I'd also like us to clarify the exit criterion 
 for a GA release of 3.0 and its relation to the life of 2.x if we are going 
 this route. This brings us back to the roadmap discussion, and a collective 
 agreement about a logical step at a future point in time where we say we 
 have enough incompatible features in 3.x that we can stop putting more of 
 them and start stabilizing it.
 
 We'll have this discussion again. We don't need to reach consensus on
 the roadmap, just that each artifact reflects the output of the
 project.


Agreed. I wasn't requesting us to reach a consensus on the roadmap. Just 
requesting others to put their wish list up.



 Irrespective of that, here is my proposal in the interim:
 - Run JDK7 + JDK8 first in a compatible manner like I mentioned before for 
 at least two releases in branch-2: say 2.8 and 2.9 before we consider taking 
 up the gauntlet on 3.0.
 - Continue working on the classpath isolation effort and try making it as 
 compatible as is possible for users to opt in and migrate easily.
 
 +1 for 2.x, but again I don't understand the sequencing. -C

There isn't. I was saying "Irrespective of that..."

Thanks,
+Vinod


Re: Looking to a Hadoop 3 release

2015-03-09 Thread sanjay Radia

 On Mar 5, 2015, at 3:21 PM, Siddharth Seth ss...@apache.org wrote:
 
 2) Simplification of configs - potentially separating client side configs
 and those used by daemons. This is another source of perpetual confusion
 for users.
+ 1 on this.

sanjay

Re: Looking to a Hadoop 3 release

2015-03-09 Thread Raymie Stata
Avoiding the use of JDK8 language features (and, presumably, APIs)
means you've abandoned #1, i.e., you haven't (really) bumped the JDK
source version to JDK8.

Also, note that releasing from trunk is a way of achieving #3, it's
not a way of abandoning it.
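
Raymie's parenthetical about APIs can be made concrete. A minimal sketch, assuming the goal is trunk code that cherry-picks to a JDK7 branch-2 unchanged (class and method names are hypothetical): compiling with -source 1.7 rejects lambdas and method references, but when javac runs against the JDK8 class library it still accepts JDK8-only library additions such as String.join or java.util.Optional, so those have to be avoided by convention or caught by building against a JDK7 class library.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

public class Java7CompatibleSort {
    /**
     * Returns a copy of the names sorted by length; ties keep their input order
     * because Collections.sort is stable. Uses no lambdas, method references,
     * or JDK8-only APIs, so it builds on both JDK7 and JDK8.
     */
    static List<String> sortByLength(List<String> names) {
        List<String> copy = new ArrayList<String>(names);
        // JDK8-only equivalent, rejected under -source 1.7:
        //   Collections.sort(copy, (a, b) -> Integer.compare(a.length(), b.length()));
        Collections.sort(copy, new Comparator<String>() {
            @Override
            public int compare(String a, String b) {
                return Integer.compare(a.length(), b.length()); // available since JDK7
            }
        });
        return copy;
    }

    public static void main(String[] args) {
        System.out.println(sortByLength(Arrays.asList("mapreduce", "hdfs", "common", "yarn")));
    }
}
```

The anonymous class is more verbose than a lambda, which is the readability cost of keeping trunk backportable while branch-2 still supports JDK7.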



On Mon, Mar 9, 2015 at 7:10 PM, Andrew Wang andrew.w...@cloudera.com wrote:
 Hi Raymie,

 Konst proposed just releasing off of trunk rather than cutting a branch-2,
 and there was general agreement there. So, consider #3 abandoned. 1&2 can
 be achieved at the same time, we just need to avoid using JDK8 language
 features in trunk so things can be backported.

 Best,
 Andrew

 On Mon, Mar 9, 2015 at 7:01 PM, Raymie Stata rst...@altiscale.com wrote:

 In this (and the related threads), I see the following three requirements:

 1. "Bump the source JDK version to JDK8" (ie, drop JDK7 support).

 2. "We'll still be releasing 2.x releases for a while, with similar
 feature sets as 3.x."

 3. Avoid the "risk of split-brain behavior" by "minimize backporting
 headaches. Pulling trunk > branch-2 > branch-2.x is already tedious.
 Adding a branch-3, branch-3.x would be obnoxious."

 These three cannot be achieved at the same time.  Which do we abandon?


 On Mon, Mar 9, 2015 at 12:45 PM, sanjay Radia sanjayo...@gmail.com
 wrote:
 
  On Mar 5, 2015, at 3:21 PM, Siddharth Seth ss...@apache.org wrote:
 
  2) Simplification of configs - potentially separating client side configs
  and those used by daemons. This is another source of perpetual confusion
  for users.
  + 1 on this.
 
  sanjay



Re: Looking to a Hadoop 3 release

2015-03-09 Thread Raymie Stata
In this (and the related threads), I see the following three requirements:

1. "Bump the source JDK version to JDK8" (ie, drop JDK7 support).

2. "We'll still be releasing 2.x releases for a while, with similar
feature sets as 3.x."

3. Avoid the "risk of split-brain behavior" by "minimize backporting
headaches. Pulling trunk > branch-2 > branch-2.x is already tedious.
Adding a branch-3, branch-3.x would be obnoxious."

These three cannot be achieved at the same time.  Which do we abandon?


On Mon, Mar 9, 2015 at 12:45 PM, sanjay Radia sanjayo...@gmail.com wrote:

 On Mar 5, 2015, at 3:21 PM, Siddharth Seth ss...@apache.org wrote:

 2) Simplification of configs - potentially separating client side configs
 and those used by daemons. This is another source of perpetual confusion
 for users.
 + 1 on this.

 sanjay


Re: Looking to a Hadoop 3 release

2015-03-09 Thread Andrew Wang
Hi Raymie,

Konst proposed just releasing off of trunk rather than cutting a branch-2,
and there was general agreement there. So, consider #3 abandoned. 1&2 can
be achieved at the same time, we just need to avoid using JDK8 language
features in trunk so things can be backported.

Best,
Andrew

On Mon, Mar 9, 2015 at 7:01 PM, Raymie Stata rst...@altiscale.com wrote:

 In this (and the related threads), I see the following three requirements:

 1. "Bump the source JDK version to JDK8" (ie, drop JDK7 support).

 2. "We'll still be releasing 2.x releases for a while, with similar
 feature sets as 3.x."

 3. Avoid the "risk of split-brain behavior" by "minimize backporting
 headaches. Pulling trunk > branch-2 > branch-2.x is already tedious.
 Adding a branch-3, branch-3.x would be obnoxious."

 These three cannot be achieved at the same time.  Which do we abandon?


 On Mon, Mar 9, 2015 at 12:45 PM, sanjay Radia sanjayo...@gmail.com
 wrote:
 
  On Mar 5, 2015, at 3:21 PM, Siddharth Seth ss...@apache.org wrote:
 
  2) Simplification of configs - potentially separating client side configs
  and those used by daemons. This is another source of perpetual confusion
  for users.
  + 1 on this.
 
  sanjay



Re: Looking to a Hadoop 3 release

2015-03-07 Thread Eric Yang
 been mentioned as part of this.  Any
 feature that breaks wire compatibility better be absolutely amazing, as it
 creates a huge hurdle for people to jump.
  To summarize: +1 for a community-discussed roadmap of what we're
 breaking in Hadoop 3 and why it's worth it for users
  -1 for creating branch-3 now, we can release from trunk until the next
 incompatibility for Hadoop 4 arrives
  +1 for baking classpath isolation as opt-in on 2.x and eventually
 default on in 3.0
  Jason
   From: Andrew Wang andrew.w...@cloudera.com
  To: hdfs-...@hadoop.apache.org hdfs-...@hadoop.apache.org
  Cc: common-dev@hadoop.apache.org common-dev@hadoop.apache.org; 
 mapreduce-...@hadoop.apache.org mapreduce-...@hadoop.apache.org; 
 yarn-...@hadoop.apache.org yarn-...@hadoop.apache.org
  Sent: Wednesday, March 4, 2015 12:15 PM
  Subject: Re: Looking to a Hadoop 3 release
 
  Let's not dismiss this quite so handily.
 
  Sean, Jason, and Stack replied on HADOOP-11656 pointing out that while we
  could make classpath isolation opt-in via configuration, what we really
  want longer term is to have it on by default (or just always on). Stack in
  particular points out the practical difficulties in using an opt-in method
  in 2.x from a downstream project perspective. It's not pretty.
 
  The plan that both Sean and Jason propose (which I support) is to have an
  opt-in solution in 2.x, bake it there, then turn it on by default
  (incompatible) in a new major release. I think this lines up well with my
  proposal of some alphas and betas leading up to a GA 3.x. I'm also willing
  to help with 2.x release management if that would help with testing this
  feature.
 
  Even setting aside classpath isolation, a new major release is still
  justified by JDK8. Somehow this is being ignored in the discussion. Allen,
  historically the voice of the user in our community, just highlighted it as
  a major compatibility issue, and myself and Tucu have also expressed our
  very strong concerns about bumping this in a minor release. 2.7's bump is a
  unique exception, but this is not something to be cited as precedent or
  policy.
 
  Where does this resistance to a new major release stem from? As I've
  described from the beginning, this will look basically like a 2.x release,
  except for the inclusion of classpath isolation by default and target
  version JDK8. I've expressed my desire to maintain API and wire
  compatibility, and we can audit the set of incompatible changes in trunk to
  ensure this. My proposal for doing alpha and beta releases leading up to GA
  also gives downstreams a nice amount of time for testing and validation.
 
  Regards,
  Andrew
 
 
 
  On Tue, Mar 3, 2015 at 2:32 PM, Arun Murthy a...@hortonworks.com wrote:
 
  Awesome, looks like we can just do this in a compatible manner - nothing
  else on the list seems like it warrants a (premature) major release.
 
  Thanks Vinod.
 
  Arun
 
  
  From: Vinod Kumar Vavilapalli vino...@hortonworks.com
  Sent: Tuesday, March 03, 2015 2:30 PM
  To: common-dev@hadoop.apache.org
  Cc: hdfs-...@hadoop.apache.org; mapreduce-...@hadoop.apache.org;
  yarn-...@hadoop.apache.org
  Subject: Re: Looking to a Hadoop 3 release
 
  I started pitching in more on that JIRA.
 
  To add, I think we can and should strive for doing this in a compatible
  manner, whatever the approach. Marking and calling it incompatible before
  we see proposal/patch seems premature to me. Commented the same on JIRA:
 
 https://issues.apache.org/jira/browse/HADOOP-11656?focusedCommentId=14345875&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14345875
  .
 
  Thanks
  +Vinod
 
  On Mar 2, 2015, at 8:08 PM, Andrew Wang andrew.w...@cloudera.com wrote:
 
  Regarding classpath isolation, based on what I hear from our customers,
  it's still a big problem (even after the MR classloader work). The latest
  Jackson version bump was quite painful for our downstream projects, and the
  HDFS client still leaks a lot of dependencies. Would welcome more
  discussion of this on HADOOP-11656; Steve, Colin, and Haohui have already
  chimed in.
 
 
 
 
 



Re: Looking to a Hadoop 3 release

2015-03-06 Thread Vinod Kumar Vavilapalli
I'd encourage everyone to post their wish list on the Roadmap wiki that 
*warrants* making incompatible changes forcing us to go 3.x.

+1 to Jason's comments in general. We can keep rolling alphas that downstream 
can pick up, but I'd also like us to clarify the exit criterion for a GA 
release of 3.0 and its relation to the life of 2.x if we are going this route. 
This brings us back to the roadmap discussion, and a collective agreement about 
a logical step at a future point in time where we say we have enough 
incompatible features in 3.x that we can stop putting more of them and start 
stabilizing it.

Irrespective of that, here is my proposal in the interim:
 - Run JDK7 + JDK8 first in a compatible manner like I mentioned before for 
at least two releases in branch-2: say 2.8 and 2.9 before we consider taking up 
the gauntlet on 3.0.
 - Continue working on the classpath isolation effort and try making it as 
compatible as is possible for users to opt in and migrate easily.

Thanks,
+Vinod

On Mar 5, 2015, at 1:44 PM, Jason Lowe jl...@yahoo-inc.com.INVALID wrote:

 I'm OK with a 3.0.0 release as long as we are minimizing the pain of 
 maintaining yet another release line and conscious of the incompatibilities 
 going into that release line.
 For the former, I would really rather not see a branch-3 cut so soon.  It's 
 yet another line onto which to cherry-pick, and I don't see why we need to 
 add this overhead at such an early phase.  We should only create branch-3 
 when there's an incompatible change that the community wants and it should 
 _not_ go into the next major release (i.e.: it's for Hadoop 4.0).  We can 
 develop 3.0 alphas and betas on trunk and release from trunk in the interim.  
 IMHO we need to stop treating trunk as a place to exile patches.
 
 For the latter, I think as a community we need to evaluate the benefits of 
 breaking compatibility against the costs of migrating.  Each time we break 
 compatibility we create a hurdle for people to jump when they move to the new 
 release, and we should make those hurdles worth their time.  For example, 
 wire-compatibility has been mentioned as part of this.  Any feature that 
 breaks wire compatibility better be absolutely amazing, as it creates a huge 
 hurdle for people to jump.
 To summarize: +1 for a community-discussed roadmap of what we're breaking in 
 Hadoop 3 and why it's worth it for users
 -1 for creating branch-3 now, we can release from trunk until the next 
 incompatibility for Hadoop 4 arrives
 +1 for baking classpath isolation as opt-in on 2.x and eventually default on 
 in 3.0
 Jason
  From: Andrew Wang andrew.w...@cloudera.com
 To: hdfs-...@hadoop.apache.org hdfs-...@hadoop.apache.org 
 Cc: common-dev@hadoop.apache.org common-dev@hadoop.apache.org; 
 mapreduce-...@hadoop.apache.org mapreduce-...@hadoop.apache.org; 
 yarn-...@hadoop.apache.org yarn-...@hadoop.apache.org 
 Sent: Wednesday, March 4, 2015 12:15 PM
 Subject: Re: Looking to a Hadoop 3 release
 
 Let's not dismiss this quite so handily.
 
 Sean, Jason, and Stack replied on HADOOP-11656 pointing out that while we
 could make classpath isolation opt-in via configuration, what we really
 want longer term is to have it on by default (or just always on). Stack in
 particular points out the practical difficulties in using an opt-in method
 in 2.x from a downstream project perspective. It's not pretty.
 
 The plan that both Sean and Jason propose (which I support) is to have an
 opt-in solution in 2.x, bake it there, then turn it on by default
 (incompatible) in a new major release. I think this lines up well with my
 proposal of some alphas and betas leading up to a GA 3.x. I'm also willing
 to help with 2.x release management if that would help with testing this
 feature.
 
 Even setting aside classpath isolation, a new major release is still
 justified by JDK8. Somehow this is being ignored in the discussion. Allen,
 historically the voice of the user in our community, just highlighted it as
 a major compatibility issue, and myself and Tucu have also expressed our
 very strong concerns about bumping this in a minor release. 2.7's bump is a
 unique exception, but this is not something to be cited as precedent or
 policy.
 
 Where does this resistance to a new major release stem from? As I've
 described from the beginning, this will look basically like a 2.x release,
 except for the inclusion of classpath isolation by default and target
 version JDK8. I've expressed my desire to maintain API and wire
 compatibility, and we can audit the set of incompatible changes in trunk to
 ensure this. My proposal for doing alpha and beta releases leading up to GA
 also gives downstreams a nice amount of time for testing and validation.
 
 Regards,
 Andrew
 
 
 
 On Tue, Mar 3, 2015 at 2:32 PM, Arun Murthy a...@hortonworks.com wrote:
 
 Awesome, looks like we can just do this in a compatible manner - nothing
 else on the list seems like it warrants a (premature) major release

Re: Looking to a Hadoop 3 release

2015-03-06 Thread Chris Douglas
On Fri, Mar 6, 2015 at 4:32 PM, Vinod Kumar Vavilapalli
vino...@hortonworks.com wrote:
 I'd encourage everyone to post their wish list on the Roadmap wiki that 
 *warrants* making incompatible changes forcing us to go 3.x.

This is a useful exercise, but not a prerequisite to releasing 3.0.0
as an alpha off of trunk, right? Andrew summarized the operating
assumptions for anyone working on it: rolling upgrades still work,
wire compat is preserved, breaking changes may get rolled back when
branch-3 is in beta (so be very conservative, notify others loudly).
This applies to branches merged to trunk, also.

 +1 to Jason's comments in general. We can keep rolling alphas that downstream 
 can pick up, but I'd also like us to clarify the exit criterion for a GA 
 release of 3.0 and its relation to the life of 2.x if we are going this 
 route. This brings us back to the roadmap discussion, and a collective 
 agreement about a logical step at a future point in time where we say we have 
 enough incompatible features in 3.x that we can stop putting more of them and 
 start stabilizing it.

We'll have this discussion again. We don't need to reach consensus on
the roadmap, just that each artifact reflects the output of the
project.

 Irrespective of that, here is my proposal in the interim:
  - Run JDK7 + JDK8 first in a compatible manner like I mentioned before for 
 at least two releases in branch-2: say 2.8 and 2.9 before we consider taking 
 up the gauntlet on 3.0.
  - Continue working on the classpath isolation effort and try making it as 
 compatible as is possible for users to opt in and migrate easily.

+1 for 2.x, but again I don't understand the sequencing. -C

 On Mar 5, 2015, at 1:44 PM, Jason Lowe jl...@yahoo-inc.com.INVALID wrote:

 I'm OK with a 3.0.0 release as long as we are minimizing the pain of 
 maintaining yet another release line and conscious of the incompatibilities 
 going into that release line.
 For the former, I would really rather not see a branch-3 cut so soon.  It's 
 yet another line onto which to cherry-pick, and I don't see why we need to 
 add this overhead at such an early phase.  We should only create branch-3 
 when there's an incompatible change that the community wants and it should 
 _not_ go into the next major release (i.e.: it's for Hadoop 4.0).  We can 
 develop 3.0 alphas and betas on trunk and release from trunk in the interim. 
  IMHO we need to stop treating trunk as a place to exile patches.

 For the latter, I think as a community we need to evaluate the benefits of 
 breaking compatibility against the costs of migrating.  Each time we break 
 compatibility we create a hurdle for people to jump when they move to the 
 new release, and we should make those hurdles worth their time.  For 
 example, wire-compatibility has been mentioned as part of this.  Any feature 
 that breaks wire compatibility better be absolutely amazing, as it creates a 
 huge hurdle for people to jump.
 To summarize: +1 for a community-discussed roadmap of what we're breaking in 
 Hadoop 3 and why it's worth it for users
 -1 for creating branch-3 now, we can release from trunk until the next 
 incompatibility for Hadoop 4 arrives
 +1 for baking classpath isolation as opt-in on 2.x and eventually default on 
 in 3.0
 Jason
  From: Andrew Wang andrew.w...@cloudera.com
 To: hdfs-...@hadoop.apache.org hdfs-...@hadoop.apache.org
 Cc: common-dev@hadoop.apache.org common-dev@hadoop.apache.org; 
 mapreduce-...@hadoop.apache.org mapreduce-...@hadoop.apache.org; 
 yarn-...@hadoop.apache.org yarn-...@hadoop.apache.org
 Sent: Wednesday, March 4, 2015 12:15 PM
 Subject: Re: Looking to a Hadoop 3 release

 Let's not dismiss this quite so handily.

 Sean, Jason, and Stack replied on HADOOP-11656 pointing out that while we
 could make classpath isolation opt-in via configuration, what we really
 want longer term is to have it on by default (or just always on). Stack in
 particular points out the practical difficulties in using an opt-in method
 in 2.x from a downstream project perspective. It's not pretty.

 The plan that both Sean and Jason propose (which I support) is to have an
 opt-in solution in 2.x, bake it there, then turn it on by default
 (incompatible) in a new major release. I think this lines up well with my
 proposal of some alphas and betas leading up to a GA 3.x. I'm also willing
 to help with 2.x release management if that would help with testing this
 feature.

 Even setting aside classpath isolation, a new major release is still
 justified by JDK8. Somehow this is being ignored in the discussion. Allen,
 historically the voice of the user in our community, just highlighted it as
 a major compatibility issue, and myself and Tucu have also expressed our
 very strong concerns about bumping this in a minor release. 2.7's bump is a
 unique exception, but this is not something to be cited as precedent or
 policy.

 Where does this resistance to a new major release stem from? As I've
 described

Re: Looking to a Hadoop 3 release

2015-03-06 Thread Vinod Kumar Vavilapalli
Yes, these are the kind of enhancements that need to be proposed and discussed 
for inclusion!

Thanks,
+Vinod

On Mar 5, 2015, at 3:21 PM, Siddharth Seth ss...@apache.org wrote:


 Some features that come to mind immediately would be
 1) enhancements to the RPC mechanics - specifically support for AsyncRPC /
 two way communication. There are a lot of places where we re-use heartbeats
 to send more information than what would be done if the RPC layer supported
 these features. Some of this can be done in a compatible manner to the
 existing RPC sub-system. Others like 2 way communication probably cannot.
 After this, having HDFS/YARN actually make use of these changes. The other
 consideration is adoption of an alternate system like gRPC which would be
 incompatible.
 2) Simplification of configs - potentially separating client side configs
 and those used by daemons. This is another source of perpetual confusion
 for users.
 
 Thanks
 - Sid
 
 
 On Thu, Mar 5, 2015 at 2:46 PM, Steve Loughran ste...@hortonworks.com
 wrote:
 
 Sorry, outlook dequoted Alejandro's comments.
 
 Let me try again with his comments in italic and proofreading of mine
 
 On 05/03/2015 13:59, Steve Loughran ste...@hortonworks.com wrote:
 
 
 
 On 05/03/2015 13:05, Alejandro Abdelnur tuc...@gmail.com wrote:
 
 IMO, if part of the community wants to take on the responsibility and work
 that it takes to do a new major release, we should not discourage them from
 doing that.
 
 Having multiple major branches active is a standard practice.
 
 Looking @ 2.x, the major work (HDFS HA, YARN) meant that it did take a
 long time to get out, and during that time 0.21, 0.22, got released and
 ignored; 0.23 picked up and used in production.
 
 The 2.0.4-alpha release was more of a trouble spot as it got picked up
 widely enough to be used in products, and changes were made between that
 alpha & 2.2 itself which raised compatibility issues.
 
 For 3.x I'd propose
 
 
  1.  Have less longevity of 3.x alpha/beta artifacts
  2.  Make clear there are no guarantees of compatibility from alpha/beta
 releases to shipping. Best effort, but not to the extent that it gets in
 the way. More succinctly: we will care more about seamless migration from
 2.2+ to 3.x than from a 3.0-alpha to 3.3 production.
  3.  Anybody who ships code based on 3.x alpha/beta to recognise and
 accept policy (2). Hadoop's instability guarantee for the 3.x alpha/beta
 phase
 
 As well as backwards compatibility, we need to think about Forwards
 compatibility, with the goal being:
 
 Any app written/shipped with the 3.x release binaries (JAR and native)
 will work in and against a 3.y Hadoop cluster, for all x, y in Natural
 where y>=x and is-release(x) and is-release(y)
 
 That's important, as it means all server-side changes in 3.x which are
 expected to mandate client-side updates: protocols, HDFS erasure
 decoding, security features, must be considered complete and stable before
 we can say is-release(x). In an ideal world, we'll even get the semantics
 right with tests to show this.
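
The forward-compatibility goal above can be read as a predicate over 3.x release lines. A minimal sketch, assuming each line is identified only by its minor version and a flag for whether it is a final release rather than an alpha/beta artifact (all names here are hypothetical):

```java
public class ForwardCompatRule {
    /** A 3.x release line; isRelease is false for alpha/beta artifacts. */
    static final class Line {
        final int minor;
        final boolean isRelease;

        Line(int minor, boolean isRelease) {
            this.minor = minor;
            this.isRelease = isRelease;
        }
    }

    /**
     * True when an app built against the client binaries of line x is expected
     * to work against a cluster running line y: y >= x and both are releases.
     */
    static boolean mustWork(Line client, Line cluster) {
        return cluster.minor >= client.minor && client.isRelease && cluster.isRelease;
    }

    public static void main(String[] args) {
        Line v30 = new Line(0, true);
        Line v33 = new Line(3, true);
        Line v31Alpha = new Line(1, false);
        System.out.println(mustWork(v30, v33));      // true: older client, newer cluster
        System.out.println(mustWork(v33, v30));      // false: newer client, older cluster
        System.out.println(mustWork(v31Alpha, v33)); // false: alpha client not covered
    }
}
```

Nothing is promised for y < x, and alpha/beta artifacts sit outside the rule entirely; the constraint falls on the server side, which is why changes that require client-side updates must be complete and stable before is-release(x) holds.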
 
 Fixing classpath hell downstream is certainly one feature I am +1 on. But:
 it's only one of the features, and given there's not any design doc on that
 JIRA, way too immature to set a release schedule on. An alpha schedule with
 no-guarantees and a regular alpha roll, could be viable, as new features go
 in and can then be used to experimentally try this stuff in branches of
 HBase (well volunteered, Stack!), etc. Of course instability guarantees
 will be transitive downstream.
 
 
 This time around we are not replacing the guts as we did from Hadoop 1 to
 Hadoop 2, but doing superficial surgery to address issues that were not considered (or
 were too much to take on top of the guts transplant).
 
 For the split brain concern, we did a great job maintaining Hadoop 1 and
 Hadoop 2 until Hadoop 1 faded away.
 
 And a significant argument about 2.0.4-alpha to 2.2 protobuf/HDFS
 compatibility.
 
 
 Based on that experience I would say that the coexistence of Hadoop 2 and
 Hadoop 3 will be much less demanding/traumatic.
 
 The re-layout of all the source trees was a major change there, assuming
 there's no refactoring or switch of build tools then picking things back
 will be tractable.
 
 
 Also, to facilitate the coexistence we should limit Java language features
 to Java 7 (even if the runtime is Java 8), once Java 7 is not used anymore
 we can remove this limitation.
 
 +1; setting javac.version will fix this
 
 What is nice about having java 8 as the base JVM is that it means you can
 be confident that all Hadoop 3 servers will be JDK8+, so downstream apps
 and libs can use all Java 8 features they want to.
 
 There's one policy change to consider there which is possibly, just
 possibly, we could allow new modules in hadoop-tools to adopt Java 8
 language features early, provided everyone recognised that backport to branch-2
 isn't going to 

Re: Looking to a Hadoop 3 release

2015-03-06 Thread Allen Wittenauer

Right, but that doesn't really answer the question….

On Mar 5, 2015, at 10:23 PM, Alejandro Abdelnur tuc...@gmail.com wrote:

 If classloader isolation is in place, then dependency versions can freely
 be upgraded as they won't pollute the apps' space (things get trickier if
 there is an ON/OFF switch).
 
 On Thu, Mar 5, 2015 at 9:21 PM, Allen Wittenauer a...@altiscale.com wrote:
 
 
 Is there going to be a general upgrade of dependencies?  I'm thinking of
 jetty & jackson in particular.
 
 On Mar 5, 2015, at 5:24 PM, Andrew Wang andrew.w...@cloudera.com wrote:
 
 I've taken the liberty of adding a Hadoop 3 section to the Roadmap wiki
 page. In addition to the two things I've been pushing, I also looked
 through Allen's list (thanks Allen for making this) and picked out the
 shell script rewrite and the removal of HFTP as big changes. This would be
 the place to propose features for inclusion in 3.x, I'd particularly
 appreciate help on the YARN/MR side.
 
 Based on what I'm hearing, let me modulate my proposal to the following:
 
 - We avoid cutting branch-3, and release off of trunk. The trunk-only
 changes don't look that scary, so I think this is fine. This does mean we
 need to be more rigorous before merging branches to trunk. I think
 Vinod/Giri's work on getting test-patch.sh runs on non-trunk branches would
 be very helpful in this regard.
 - We do not include anything to break wire compatibility unless (as Jason
 says) it's an unbelievably awesome feature.
 - No harm in rolling alphas from trunk, as it doesn't lock us to anything
 compatibility wise. Downstreams like releases.
 
 I'll take Steve's advice about not locking GA to a given date, but I also
 share his belief that we can alpha/beta/GA faster than it took for Hadoop
 2. Let's roll some intermediate releases, work on the roadmap items, and
 see how we're feeling in a few months.
 
 Best,
 Andrew
 
 On Thu, Mar 5, 2015 at 3:21 PM, Siddharth Seth ss...@apache.org wrote:
 
 I think it'll be useful to have a discussion about what else people would
 like to see in Hadoop 3.x - especially if the change is potentially
 incompatible. Also, what we expect the release schedule to be for major
 releases and what triggers them - JVM version, major features, the need for
 incompatible changes? Assuming major versions will not be released every 6
 months/1 year (adoption time, fairly disruptive for downstream projects,
 and users) - considering additional features/incompatible changes for 3.x
 would be useful.
 
 Some features that come to mind immediately would be
 1) enhancements to the RPC mechanics - specifically support for AsynRPC
 /
 two way communication. There's a lot of places where we re-use
 heartbeats
 to send more information than what would be done if the PRC layer
 supported
 these features. Some of this can be done in a compatible manner to the
 existing RPC sub-system. Others like 2 way communication probably
 cannot.
 After this, having HDFS/YARN actually make use of these changes. The
 other
 consideration is adoption of an alternate system ike gRpc which would be
 incompatible.
 2) Simplification of configs - potentially separating client side
 configs
 and those used by daemons. This is another source of perpetual confusion
 for users.
 
 Thanks
 - Sid
 
 
 On Thu, Mar 5, 2015 at 2:46 PM, Steve Loughran ste...@hortonworks.com
 wrote:
 
 Sorry, outlook dequoted Alejandros's comments.
 
 Let me try again with his comments in italic and proofreading of mine
 
 On 05/03/2015 13:59, Steve Loughran ste...@hortonworks.commailto:
 ste...@hortonworks.com wrote:
 
 
 
 On 05/03/2015 13:05, Alejandro Abdelnur tuc...@gmail.commailto:
 tuc...@gmail.commailto:tuc...@gmail.com wrote:
 
 IMO, if part of the community wants to take on the responsibility and
 work
 that takes to do a new major release, we should not discourage them
 from
 doing that.
 
 Having multiple major branches active is a standard practice.
 
 Looking @ 2.x, the major work (HDFS HA, YARN) meant that it did take a
 long time to get out, and during that time 0.21, 0.22, got released and
 ignored; 0.23 picked up and used in production.
 
 The 2.04-alpha release was more of a troublespot as it got picked up
 widely enough to be used in products, and changes were made between
 that
 alpha  2.2 itself which raised compatibility issues.
 
 For 3.x I'd propose
 
 
 1.  Have less longevity of 3.x alpha/beta artifacts
 2.  Make clear there are no guarantees of compatibility from
 alpha/beta
 releases to shipping. Best effort, but not to the extent that it gets
 in
 the way. More succinctly: we will care more about seamless migration
 from
 2.2+ to 3.x than from a 3.0-alpha to 3.3 production.
 3.  Anybody who ships code based on 3.x alpha/beta to recognise and
 accept policy (2). Hadoop's instability guarantee for the 3.x
 alpha/beta
 phase
 
 As well as backwards compatibility, we need to think about Forwards
 compatibility, with the goal being:
 
 Any app written/shipped with the 

Re: Looking to a Hadoop 3 release

2015-03-05 Thread Alejandro Abdelnur
IMO, if part of the community wants to take on the responsibility and work
that it takes to do a new major release, we should not discourage them from
doing that.

Having multiple major branches active is a standard practice.

This time around we are not replacing the guts as we did from Hadoop 1 to
Hadoop 2, but doing superficial surgery to address issues that were not
considered (or were too much to take on top of the guts transplant).

For the split-brain concern, we did a great job maintaining Hadoop 1 and
Hadoop 2 until Hadoop 1 faded away.

Based on that experience I would say that the coexistence of Hadoop 2 and
Hadoop 3 will be much less demanding/traumatic.

Also, to facilitate the coexistence we should limit Java language features
to Java 7 (even if the runtime is Java 8); once Java 7 is no longer used, we
can remove this limitation.

Thanks.






Re: Looking to a Hadoop 3 release

2015-03-05 Thread Vinod Kumar Vavilapalli
The 'resistance' is not so much about a new major release, more so about the
content and the roadmap of the release. Other than the two specific features
raised (the need for breaking compat for them is something that I am debating),
I haven't seen a branch-3 roadmap with any other features that this community
needs to discuss. If all the difference between branch-2 and branch-3 is going
to be JDK + a couple of incompatible changes, it is a big problem in two
dimensions: (1) it's a burden to keep the branches in sync and avoid the
split-brain we experienced with 1.x/2.x, or worse, branch-0.23/branch-2; and
(2) it will be very hard to ask people not to break more things in branch-3.

We seem to have agreed upon a course of action for JDK7, and now we are taking
a different direction for JDK8. Going by this new proposal, come 2016, we will
have to deal with JDK9 and three mainline incompatible Hadoop releases.

Regarding individual improvements like classpath isolation and the shell script
stuff, Jason Lowe captured it perfectly on HADOOP-11656: it should be possible
for every major feature that we develop to be opt-in, unless the change is so
great that users can balance out the incompatibilities against the new stuff
they are getting. Even with a ground-breaking change like YARN, we spent a bit
of time ensuring compatibility (MAPREDUCE-5108), and that has paid off many
times over. Breaking compatibility shouldn't come across as too cheap a thing.

Thanks,
+Vinod

On Mar 4, 2015, at 10:15 AM, Andrew Wang andrew.w...@cloudera.com wrote:

Where does this resistance to a new major release stem from? As I've
described from the beginning, this will look basically like a 2.x release,
except for the inclusion of classpath isolation by default and target
version JDK8. I've expressed my desire to maintain API and wire
compatibility, and we can audit the set of incompatible changes in trunk to
ensure this. My proposal for doing alpha and beta releases leading up to GA
also gives downstreams a nice amount of time for testing and validation.



Re: Looking to a Hadoop 3 release

2015-03-05 Thread Steve Loughran
Sorry, Outlook dequoted Alejandro's comments.

Let me try again, with his comments in italics and mine proofread.

On 05/03/2015 13:59, Steve Loughran ste...@hortonworks.com wrote:



On 05/03/2015 13:05, Alejandro Abdelnur tuc...@gmail.com wrote:

IMO, if part of the community wants to take on the responsibility and work
that it takes to do a new major release, we should not discourage them from
doing that.

Having multiple major branches active is a standard practice.

Looking at 2.x, the major work (HDFS HA, YARN) meant that it took a long
time to get out, and during that time 0.21 and 0.22 got released and ignored,
while 0.23 was picked up and used in production.

The 2.0.4-alpha release was more of a trouble spot, as it got picked up widely
enough to be used in products, and changes made between that alpha and 2.2
itself raised compatibility issues.

For 3.x I'd propose:

  1.  Give 3.x alpha/beta artifacts a shorter lifespan.
  2.  Make clear there are no guarantees of compatibility from alpha/beta
releases to shipping. Best effort, but not to the extent that it gets in the
way. More succinctly: we will care more about seamless migration from 2.2+ to
3.x than from a 3.0-alpha to 3.3 production.
  3.  Anybody who ships code based on a 3.x alpha/beta must recognise and accept
policy (2): Hadoop's instability guarantee for the 3.x alpha/beta phase.

As well as backwards compatibility, we need to think about forwards
compatibility, with the goal being:

Any app written/shipped with the 3.x release binaries (JAR and native) will
work in and against a 3.y Hadoop cluster, for all natural numbers x, y where
y >= x and is-release(x) and is-release(y).

That's important, as it means all server-side changes in 3.x that are expected
to mandate client-side updates (protocols, HDFS erasure decoding, security
features) must be considered complete and stable before we can say
is-release(x). In an ideal world, we'll even get the semantics right, with tests
to show this.
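The forward-compatibility goal can be read as a simple predicate over versions. The sketch below is only an illustration under stated assumptions: it treats x and y as the minor versions in a `3.minor.patch` string and interprets is-release as "no `-alpha`/`-beta` style qualifier". `ForwardCompat` is a hypothetical name, not a Hadoop class.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical helper expressing the forward-compatibility goal:
 * an app built against 3.x release binaries should work against a 3.y
 * cluster when y >= x and both are releases.
 *
 * Assumptions (not from the thread): x and y are the minor versions of a
 * "3.minor.patch" string, and is-release means "no -alpha/-beta qualifier".
 */
public final class ForwardCompat {
    // major.minor.patch, optionally followed by a qualifier such as "-alpha1"
    private static final Pattern VERSION =
            Pattern.compile("(\\d+)\\.(\\d+)\\.(\\d+)(-[A-Za-z0-9.]+)?");

    private ForwardCompat() {
    }

    /** True if the goal promises that a client built on clientVersion works against clusterVersion. */
    public static boolean isGuaranteed(String clientVersion, String clusterVersion) {
        Matcher client = parse(clientVersion);
        Matcher cluster = parse(clusterVersion);
        // The promise only covers the 3.x line.
        if (!"3".equals(client.group(1)) || !"3".equals(cluster.group(1))) {
            return false;
        }
        // is-release(x) and is-release(y): no pre-release qualifier on either side.
        if (client.group(4) != null || cluster.group(4) != null) {
            return false;
        }
        int x = Integer.parseInt(client.group(2));
        int y = Integer.parseInt(cluster.group(2));
        return y >= x;
    }

    private static Matcher parse(String version) {
        Matcher m = VERSION.matcher(version);
        if (!m.matches()) {
            throw new IllegalArgumentException("Unrecognised version: " + version);
        }
        return m;
    }

    public static void main(String[] args) {
        System.out.println(isGuaranteed("3.1.0", "3.2.0"));        // true: newer cluster
        System.out.println(isGuaranteed("3.2.0", "3.1.0"));        // false: older cluster
        System.out.println(isGuaranteed("3.0.0-alpha1", "3.3.0")); // false: alpha client
    }
}
```

Note that the predicate deliberately says nothing about alpha/beta builds: under policy (2) above, those fall outside the promise rather than being declared incompatible.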

Fixing classpath hell downstream is certainly one feature I am +1 on. But it's
only one of the features, and given there's no design doc on that JIRA yet, it
is far too immature to set a release schedule on. An alpha schedule with no
guarantees and a regular alpha roll could be viable, as new features go in and
can then be tried experimentally in branches of HBase (well volunteered,
Stack!), etc. Of course, instability guarantees will be transitive downstream.


This time around we are not replacing the guts as we did from Hadoop 1 to
Hadoop 2, but doing superficial surgery to address issues that were not
considered (or were too much to take on top of the guts transplant).

For the split-brain concern, we did a great job maintaining Hadoop 1 and
Hadoop 2 until Hadoop 1 faded away.

And a significant argument about 2.0.4-alpha to 2.2 protobuf/HDFS compatibility.


Based on that experience I would say that the coexistence of Hadoop 2 and
Hadoop 3 will be much less demanding/traumatic.

The re-layout of all the source trees was a major change there. Assuming
there's no refactoring or switch of build tools, picking things back will be
tractable.


Also, to facilitate the coexistence we should limit Java language features
to Java 7 (even if the runtime is Java 8); once Java 7 is no longer used, we
can remove this limitation.

+1; setting javac.version will fix this
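One way to confirm that a build really targets Java 7 is to read the class-file header: bytes 0-3 hold the 0xCAFEBABE magic number, bytes 4-5 the minor version and bytes 6-7 the major version (51 for Java 7, 52 for Java 8). This is a sketch under assumptions: it presumes javac.version feeds the compiler's source/target level, and `ClassFileVersion` is a hypothetical helper. A Java 7 target alone does not stop code from calling Java 8-only library APIs when compiling on JDK 8; that needs a Java 7 boot classpath or equivalent.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.file.Files;
import java.nio.file.Paths;

/**
 * Hypothetical helper that reports which JVM a compiled .class file targets,
 * by reading the class-file header: magic (u4), minor_version (u2), major_version (u2).
 */
public final class ClassFileVersion {
    static final int JAVA_7 = 51;

    private ClassFileVersion() {
    }

    /** Returns the class-file major version, e.g. 51 for a Java 7 target. */
    public static int majorVersion(byte[] classBytes) {
        if (classBytes.length < 8) {
            throw new IllegalArgumentException("Too short to be a class file");
        }
        ByteBuffer buf = ByteBuffer.wrap(classBytes); // big-endian, as the class-file format requires
        if (buf.getInt() != 0xCAFEBABE) {
            throw new IllegalArgumentException("Missing 0xCAFEBABE magic number");
        }
        buf.getShort();                 // minor_version, not needed here
        return buf.getShort() & 0xFFFF; // major_version, read as unsigned
    }

    /** True if the class can be loaded by a Java 7 JVM (major version 51 or lower). */
    public static boolean targetsJava7OrEarlier(byte[] classBytes) {
        return majorVersion(classBytes) <= JAVA_7;
    }

    public static void main(String[] args) throws IOException {
        // Usage: java ClassFileVersion path/to/Foo.class ...
        for (String path : args) {
            byte[] bytes = Files.readAllBytes(Paths.get(path));
            System.out.println(path + ": major " + majorVersion(bytes)
                    + (targetsJava7OrEarlier(bytes) ? " (Java 7 compatible)" : " (needs Java 8+)"));
        }
    }
}
```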

What is nice about having Java 8 as the base JVM is that you can be confident
that all Hadoop 3 servers will be JDK8+, so downstream apps and libs can use
all the Java 8 features they want.

There's one policy change to consider there: possibly, just possibly, we could
allow new modules in hadoop-tools to adopt Java 8 language features early,
provided everyone recognised that a backport to branch-2 isn't going to happen.

-Steve



Re: Looking to a Hadoop 3 release

2015-03-05 Thread Jason Lowe
I'm OK with a 3.0.0 release as long as we minimize the pain of maintaining
yet another release line and stay conscious of the incompatibilities going
into that release line.

For the former, I would really rather not see a branch-3 cut so soon. It's yet
another line onto which to cherry-pick, and I don't see why we need to add this
overhead at such an early phase. We should only create branch-3 when there's
an incompatible change that the community wants and it should _not_ go into the
next major release (i.e., it's for Hadoop 4.0). We can develop 3.0 alphas and
betas on trunk and release from trunk in the interim. IMHO we need to stop
treating trunk as a place to exile patches.

For the latter, I think as a community we need to evaluate the benefits of
breaking compatibility against the costs of migrating. Each time we break
compatibility we create a hurdle for people to jump when they move to the new
release, and we should make those hurdles worth their time. For example,
wire compatibility has been mentioned as part of this. Any feature that breaks
wire compatibility had better be absolutely amazing, as it creates a huge
hurdle for people to jump.

To summarize:
+1 for a community-discussed roadmap of what we're breaking in Hadoop 3 and why it's worth it for users
-1 for creating branch-3 now; we can release from trunk until the next incompatibility for Hadoop 4 arrives
+1 for baking classpath isolation as opt-in on 2.x and eventually default on in 3.0

Jason
From: Andrew Wang andrew.w...@cloudera.com
To: hdfs-...@hadoop.apache.org
Cc: common-dev@hadoop.apache.org; mapreduce-...@hadoop.apache.org; yarn-...@hadoop.apache.org
Sent: Wednesday, March 4, 2015 12:15 PM
Subject: Re: Looking to a Hadoop 3 release

Let's not dismiss this quite so handily.

Sean, Jason, and Stack replied on HADOOP-11656 pointing out that while we
could make classpath isolation opt-in via configuration, what we really
want longer term is to have it on by default (or just always on). Stack in
particular points out the practical difficulties in using an opt-in method
in 2.x from a downstream project perspective. It's not pretty.

The plan that both Sean and Jason propose (which I support) is to have an
opt-in solution in 2.x, bake it there, then turn it on by default
(incompatible) in a new major release. I think this lines up well with my
proposal of some alphas and betas leading up to a GA 3.x. I'm also willing
to help with 2.x release management if that would help with testing this
feature.

Even setting aside classpath isolation, a new major release is still
justified by JDK8. Somehow this is being ignored in the discussion. Allen,
historically the voice of the user in our community, just highlighted it as
a major compatibility issue, and Tucu and I have also expressed our very
strong concerns about bumping this in a minor release. 2.7's bump is a
unique exception, but it is not something to be cited as precedent or
policy.

Where does this resistance to a new major release stem from? As I've
described from the beginning, this will look basically like a 2.x release,
except for the inclusion of classpath isolation by default and target
version JDK8. I've expressed my desire to maintain API and wire
compatibility, and we can audit the set of incompatible changes in trunk to
ensure this. My proposal for doing alpha and beta releases leading up to GA
also gives downstreams a nice amount of time for testing and validation.

Regards,
Andrew



On Tue, Mar 3, 2015 at 2:32 PM, Arun Murthy a...@hortonworks.com wrote:

 Awesome, looks like we can just do this in a compatible manner - nothing
 else on the list seems like it warrants a (premature) major release.

 Thanks Vinod.

 Arun

 
 From: Vinod Kumar Vavilapalli vino...@hortonworks.com
 Sent: Tuesday, March 03, 2015 2:30 PM
 To: common-dev@hadoop.apache.org
 Cc: hdfs-...@hadoop.apache.org; mapreduce-...@hadoop.apache.org;
 yarn-...@hadoop.apache.org
 Subject: Re: Looking to a Hadoop 3 release

 I started pitching in more on that JIRA.

 To add, I think we can and should strive for doing this in a compatible
 manner, whatever the approach. Marking and calling it incompatible before
 we see a proposal/patch seems premature to me. Commented the same on JIRA:
 https://issues.apache.org/jira/browse/HADOOP-11656?focusedCommentId=14345875&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14345875

 Thanks
 +Vinod

 On Mar 2, 2015, at 8:08 PM, Andrew Wang andrew.w...@cloudera.com wrote:

 Regarding classpath isolation, based on what I hear from our customers,
 it's still a big problem (even after the MR classloader work). The latest
 Jackson version bump was quite painful for our downstream projects, and the
 HDFS client still leaks a lot

Re: Looking to a Hadoop 3 release

2015-03-05 Thread Steve Loughran


On 05/03/2015 13:05, Alejandro Abdelnur tuc...@gmail.com wrote:

IMO, if part of the community wants to take on the responsibility and work
that it takes to do a new major release, we should not discourage them from
doing that.

Having multiple major branches active is a standard practice.

Looking at 2.x, the major work (HDFS HA, YARN) meant that it took a long
time to get out, and during that time 0.21 and 0.22 got released and ignored,
while 0.23 was picked up and used in production.

The 2.0.4-alpha release was more of a trouble spot, as it got picked up widely
enough to be used in products, and changes made between that alpha and 2.2
itself raised compatibility issues.

For 3.x I'd propose:

  1.  Give 3.x alpha/beta artifacts a shorter lifespan.
  2.  Make clear there are no guarantees of compatibility from alpha/beta
releases to shipping. Best effort, but not to the extent that it gets in the
way. More succinctly: we will care more about seamless migration from 2.2+ to
3.x than from a 3.0-alpha to 3.3 production.
  3.  Anybody who ships code based on a 3.x alpha/beta must recognise and accept
policy (2): Hadoop's instability guarantee for the 3.x alpha/beta phase.

As well as backwards compatibility, we need to think about forwards
compatibility, with the goal being:

Any app written/shipped with the 3.x release binaries (JAR and native) will
work against a 3.y Hadoop release, for all natural numbers x, y where y >= x
and is-release(x) and is-release(y).

That's important, as it means all server-side changes in 3.x that are expected
to mandate client-side updates (protocols, HDFS erasure decoding, security
features) must be considered complete and stable before we can say
is-release(x). In an ideal world, we'll even get the semantics right, with tests
to show this.

Fixing classpath hell downstream (classpath isolation) is certainly one feature
on this roadmap I am +1 on. But it's only one of the features, and given
there's no design doc on that JIRA yet, it is far too immature to set a release
schedule on. An alpha schedule with no guarantees and a regular alpha roll
could be viable, as new features go in and can then be tried experimentally in
branches of HBase (well volunteered, Stack!), etc. Of course, instability
guarantees will be transitive.


This time around we are not replacing the guts as we did from Hadoop 1 to
Hadoop 2, but doing superficial surgery to address issues that were not
considered (or were too much to take on top of the guts transplant).

For the split-brain concern, we did a great job maintaining Hadoop 1 and
Hadoop 2 until Hadoop 1 faded away.

And a significant argument about 2.0.4-alpha to 2.2 protobuf/HDFS compatibility.


Based on that experience I would say that the coexistence of Hadoop 2 and
Hadoop 3 will be much less demanding/traumatic.

The re-layout of all the source trees was a major change there. Assuming
there's no refactoring or switch of build tools, picking things back will be
tractable.


Also, to facilitate the coexistence we should limit Java language features
to Java 7 (even if the runtime is Java 8); once Java 7 is no longer used, we
can remove this limitation.

+1; setting javac.version will fix this

What is nice about having Java 8 as the base JVM is that you can be confident
that all Hadoop 3 servers will be JDK8+, so downstream apps and libs can use
all the Java 8 features they want.

There's one policy change to consider there: possibly, just possibly, we could
allow new modules in hadoop-tools to adopt Java 8 language features early,
provided everyone recognised that a backport to branch-2 isn't going to happen.

-Steve


Re: Looking to a Hadoop 3 release

2015-03-05 Thread Siddharth Seth
I think it'll be useful to have a discussion about what else people would
like to see in Hadoop 3.x, especially if the change is potentially
incompatible. Also, what we expect the release schedule to be for major
releases and what triggers them: JVM version, major features, the need for
incompatible changes? Assuming major versions will not be released every 6
months to a year (adoption time; fairly disruptive for downstream projects
and users), considering additional features/incompatible changes for 3.x
would be useful.

Some features that come to mind immediately would be:
1) Enhancements to the RPC mechanics, specifically support for AsyncRPC /
two-way communication. There are a lot of places where we re-use heartbeats
to send more information than would be needed if the RPC layer supported
these features. Some of this can be done in a manner compatible with the
existing RPC sub-system; other parts, like two-way communication, probably
cannot. After this, HDFS/YARN would need to actually make use of these
changes. The other consideration is adoption of an alternate system like
gRPC, which would be incompatible.
2) Simplification of configs, potentially separating client-side configs
from those used by daemons. This is another source of perpetual confusion
for users.
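To make the first item concrete, here is a hedged sketch of what an asynchronous, two-way RPC surface could look like from the caller's side, using only the JDK's `CompletableFuture`. `AsyncEndpoint` and `InProcessEndpoint` are hypothetical names, not Hadoop's RPC API, and the in-process endpoint stands in for a remote server so the example runs without a network.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Consumer;

public final class AsyncRpcSketch {

    /** Hypothetical async, two-way RPC endpoint; not an existing Hadoop interface. */
    interface AsyncEndpoint {
        /** Request/response without blocking the caller's thread. */
        CompletableFuture<String> call(String method, String arg);

        /** Server-to-client push: the server invokes the listener when it has news. */
        void onPush(Consumer<String> listener);
    }

    /** In-process stand-in for a remote server so the sketch runs without a network. */
    static final class InProcessEndpoint implements AsyncEndpoint {
        private final ExecutorService pool = Executors.newFixedThreadPool(2);
        private final List<Consumer<String>> listeners = new CopyOnWriteArrayList<>();

        @Override
        public CompletableFuture<String> call(String method, String arg) {
            // The "remote" work runs on the pool; the caller just gets a future back.
            return CompletableFuture.supplyAsync(() -> method + ":" + arg, pool);
        }

        @Override
        public void onPush(Consumer<String> listener) {
            listeners.add(listener);
        }

        /** Simulates the server pushing an event instead of waiting for the next heartbeat. */
        void push(String message) {
            for (Consumer<String> listener : listeners) {
                listener.accept(message);
            }
        }

        void shutdown() {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        InProcessEndpoint server = new InProcessEndpoint();
        server.onPush(msg -> System.out.println("pushed: " + msg));

        // Two calls are in flight at once; thenCombine joins them without a blocked thread per call.
        CompletableFuture<String> both = server.call("getStatus", "node1")
                .thenCombine(server.call("getStatus", "node2"), (a, b) -> a + "," + b);
        System.out.println(both.join());

        server.push("container-completed");
        server.shutdown();
    }
}
```

The design point is that an event can reach the client through the push callback as soon as it happens, rather than being piggybacked on the next heartbeat response.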

Thanks
- Sid



Re: Looking to a Hadoop 3 release

2015-03-05 Thread Alejandro Abdelnur
If classloader isolation is in place, then dependency versions can be freely
upgraded, as they won't pollute the apps' space (things get trickier if there
is an ON/OFF switch).
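Isolation of this kind is commonly built on a child-first (parent-last) class loader: classes bundled with the application are looked up in the application's own jars before the parent, while JDK classes always come from the parent. This is a generic sketch, not Hadoop's implementation; `ChildFirstClassLoader` and the rule that `java.`/`javax.` names are "system" classes are assumptions for illustration.

```java
import java.io.File;
import java.net.URL;
import java.net.URLClassLoader;

/** Generic child-first class loader sketch; not Hadoop's own class loader. */
public class ChildFirstClassLoader extends URLClassLoader {

    public ChildFirstClassLoader(URL[] urls, ClassLoader parent) {
        super(urls, parent);
    }

    @Override
    protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
        synchronized (getClassLoadingLock(name)) {
            Class<?> c = findLoadedClass(name);
            if (c == null && !isSystemClass(name)) {
                try {
                    // Look in the application's own jars first.
                    c = findClass(name);
                } catch (ClassNotFoundException notBundled) {
                    // Not shipped by the app; fall through to the parent below.
                }
            }
            if (c == null) {
                // JDK classes, or classes the app does not bundle: normal parent-first delegation.
                c = super.loadClass(name, false);
            }
            if (resolve) {
                resolveClass(c);
            }
            return c;
        }
    }

    /** Assumed rule for this sketch: JDK packages are never overridden by the app. */
    private static boolean isSystemClass(String name) {
        return name.startsWith("java.") || name.startsWith("javax.");
    }

    public static void main(String[] args) throws Exception {
        // Usage: java ChildFirstClassLoader app-lib-1.jar app-lib-2.jar ...
        URL[] appJars = new URL[args.length];
        for (int i = 0; i < args.length; i++) {
            appJars[i] = new File(args[i]).toURI().toURL();
        }
        try (ChildFirstClassLoader loader =
                     new ChildFirstClassLoader(appJars, ChildFirstClassLoader.class.getClassLoader())) {
            // JDK classes are shared with the parent, so this prints true.
            System.out.println(loader.loadClass("java.lang.String") == String.class);
        }
    }
}
```

A real deployment would also need a configurable list of classes that must always come from the framework (for example, its public API types), otherwise two copies of the same class can meet and fail with a ClassCastException; that list, plus any ON/OFF switch, is where things get trickier.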

On Thu, Mar 5, 2015 at 9:21 PM, Allen Wittenauer a...@altiscale.com wrote:


 Is there going to be a general upgrade of dependencies? I'm thinking of
 Jetty and Jackson in particular.


Re: Looking to a Hadoop 3 release

2015-03-05 Thread Andrew Wang
I've taken the liberty of adding a Hadoop 3 section to the Roadmap wiki
page. In addition to the two things I've been pushing, I also looked
through Allen's list (thanks Allen for making this) and picked out the
shell script rewrite and the removal of HFTP as big changes. This would be
the place to propose features for inclusion in 3.x; I'd particularly
appreciate help on the YARN/MR side.

Based on what I'm hearing, let me modulate my proposal to the following:

- We avoid cutting branch-3 and release off of trunk. The trunk-only
changes don't look that scary, so I think this is fine. This does mean we
need to be more rigorous before merging branches to trunk. I think
Vinod/Giri's work on getting test-patch.sh runs on non-trunk branches would
be very helpful in this regard.
- We do not include anything that breaks wire compatibility unless (as Jason
says) it's an unbelievably awesome feature.
- No harm in rolling alphas from trunk, as it doesn't lock us into anything
compatibility-wise. Downstreams like releases.

I'll take Steve's advice about not locking GA to a given date, but I also
share his belief that we can alpha/beta/GA faster than it took for Hadoop
2. Let's roll some intermediate releases, work on the roadmap items, and
see how we're feeling in a few months.

Best,
Andrew

On Thu, Mar 5, 2015 at 3:21 PM, Siddharth Seth ss...@apache.org wrote:

 I think it'll be useful to have a discussion about what else people would
 like to see in Hadoop 3.x - especially if the change is potentially
 incompatible. Also, what we expect the release schedule to be for major
 releases and what triggers them - JVM version, major features, the need for
 incompatible changes? Assuming major versions will not be released every 6
 months/1 year (adoption time, fairly disruptive for downstream projects
 and users), considering additional features/incompatible changes for 3.x
 would be useful.

 Some features that come to mind immediately would be:
 1) enhancements to the RPC mechanics - specifically support for async RPC /
 two-way communication. There are a lot of places where we re-use heartbeats
 to send more information than would be needed if the RPC layer supported
 these features. Some of this can be done in a manner compatible with the
 existing RPC sub-system. Others, like two-way communication, probably cannot.
 After this, HDFS/YARN would need to actually make use of these changes. The
 other consideration is adoption of an alternate system like gRPC, which
 would be incompatible.
 2) Simplification of configs - potentially separating client side configs
 and those used by daemons. This is another source of perpetual confusion
 for users.

 Thanks
 - Sid


 On Thu, Mar 5, 2015 at 2:46 PM, Steve Loughran ste...@hortonworks.com
 wrote:

  Sorry, Outlook dequoted Alejandro's comments.
 
  Let me try again with his comments in italic and proofreading of mine
 
  On 05/03/2015 13:59, Steve Loughran ste...@hortonworks.com wrote:
 
 
 
  On 05/03/2015 13:05, Alejandro Abdelnur tuc...@gmail.com wrote:
 
  IMO, if part of the community wants to take on the responsibility and
  work it takes to do a new major release, we should not discourage them
  from doing that.
 
  Having multiple major branches active is a standard practice.
 
  Looking @ 2.x, the major work (HDFS HA, YARN) meant that it did take a
  long time to get out, and during that time 0.21 and 0.22 got released and
  ignored; 0.23 was picked up and used in production.
 
  The 2.0.4-alpha release was more of a trouble spot, as it got picked up
  widely enough to be used in products, and changes were made between that
  alpha and 2.2 itself which raised compatibility issues.
 
  For 3.x I'd propose
 
 
   1.  Have less longevity of 3.x alpha/beta artifacts
   2.  Make clear there are no guarantees of compatibility from alpha/beta
  releases to shipping. Best effort, but not to the extent that it gets in
  the way. More succinctly: we will care more about seamless migration from
  2.2+ to 3.x than from a 3.0-alpha to 3.3 production.
   3.  Anybody who ships code based on 3.x alpha/beta to recognise and
  accept policy (2). Hadoop's instability guarantee for the 3.x alpha/beta
  phase
 
  As well as backwards compatibility, we need to think about Forwards
  compatibility, with the goal being:
 
  Any app written/shipped with the 3.x release binaries (JAR and native)
  will work in and against a 3.y Hadoop cluster, for all x, y in Natural
  where y >= x and is-release(x) and is-release(y)
 
  That's important, as it means all server-side changes in 3.x which are
  expected to mandate client-side updates (protocols, HDFS erasure
  decoding, security features) must be considered complete and stable
  before we can say is-release(x). In an ideal world, we'll even get the
  semantics right, with tests to show this.
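A minimal Python sketch of the forward-compatibility goal above, offered as an illustration rather than anything from the thread. It assumes the garbled condition reads y >= x, treats 3.x versions as plain minor numbers, and drops alpha/beta versions per policy (2). The helper `required_compat_pairs` and the version history are hypothetical; the sketch just enumerates which (client, cluster) pairs the goal would oblige the project to keep working.

```python
from itertools import product


def required_compat_pairs(minor_versions, is_release):
    """Return the (client, cluster) minor-version pairs the goal covers.

    Hypothetical helper: a client built against 3.x binaries must work
    against a 3.y cluster whenever y >= x (assumed reading) and both 3.x
    and 3.y are releases. Alpha/beta versions carry no guarantee, so they
    are filtered out before pairing.
    """
    released = sorted(v for v in minor_versions if is_release(v))
    return [(x, y) for x, y in product(released, repeat=2) if y >= x]


# Hypothetical history: 3.0 and 3.1 are alphas/betas, 3.2 onwards are releases.
history = [0, 1, 2, 3, 4]
pairs = required_compat_pairs(history, is_release=lambda v: v >= 2)
for x, y in pairs:
    print(f"3.{x} client -> 3.{y} cluster")
```

The pairs are deliberately asymmetric: as stated, the goal does not require a newer client to work against an older cluster, so a pair like (3, 2) never appears.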
 
  Fixing classpath hell downstream is certainly one feature I am +1 on.
  But: it's 

Re: Looking to a Hadoop 3 release

2015-03-05 Thread Yongjun Zhang
Thanks all.

There is an open issue, HDFS-6962 (ACLs inheritance conflicts with
umaskmode), whose incompatibility appears to make it unsuitable for 2.x,
so it's targeted at 3.0; please see:

https://issues.apache.org/jira/browse/HDFS-6962?focusedCommentId=14335418page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14335418

Best,

--Yongjun


On Wed, Mar 4, 2015 at 8:13 PM, Allen Wittenauer a...@altiscale.com wrote:


 One of the questions that keeps popping up is “what exactly is in trunk?”

 As some may recall, I had done some experiments creating the change log
 based upon JIRA.  While the interest level appeared to be approaching zero,
 I kept playing with it a bit and eventually also started playing with the
 release notes script (for various reasons I won’t bore you with.)

 In any case, I’ve started posting the results of these runs on one of my
 github repos if anyone was wanting a quick reference as to JIRA’s opinion
 on the matter:

 https://github.com/aw-altiscale/hadoop-release-metadata/tree/master/3.0.0





Re: Looking to a Hadoop 3 release

2015-03-05 Thread Chris Douglas
On Mon, Mar 2, 2015 at 11:04 PM, Konstantin Shvachko
shv.had...@gmail.com wrote:
 2. If Hadoop 3 and 2.x are meant to exist together, we run the risk of
 manifesting split-brain behavior again, as we had with hadoop-1, hadoop-2 and
 other versions. Even if that is somehow beneficial for commercial vendors, though I
 don't see how, for the community it has proven to be very disruptive. It would
 be really good to avoid it this time.

Agreed; let's try to minimize backporting headaches. Pulling trunk >
branch-2 > branch-2.x is already tedious. Adding a branch-3 >
branch-3.x would be obnoxious.

 3. Could we release Hadoop 3 directly from trunk? With a proper feature
 freeze in advance. Current trunk is in the best working condition I've seen
 in years - much better than when hadoop-2 was coming to life. It could
 make a good alpha.

+1 This sounds like a good approach. Marked as alpha, we can break
compatibility in minor versions. Stabilizing a beta can correspond
with cutting branch-3, since that will be winding down branch-2. This
shouldn't disrupt existing plans for branch-2.

However, this requires that committers not accumulate too much
compatibility debt in trunk. Undoing all that in branch-3 imposes a
burdensome tax. Scanning through Allen's diff: that doesn't appear to
be the case so far, but it recommends against developing features in
place on trunk. Just be considerate of users and developers who will
need to move from (and maintain) branch-2.

 I believe we can start planning 3.0 from trunk right after 2.7 is out.

If we're publishing a snapshot, we don't need too much planning. -C

 On Mon, Mar 2, 2015 at 3:19 PM, Andrew Wang andrew.w...@cloudera.com
 wrote:

 Hi devs,

 It's been a year and a half since 2.x went GA, and I think we're about due
 for a 3.x release.
 Notably, there are two incompatible changes I'd like to call out, that will
 have a tremendous positive impact for our users.

 First, classpath isolation being done at HADOOP-11656, which has been a
 long-standing request from many downstreams and Hadoop users.

 Second, bumping the source and target JDK version to JDK8 (related to
 HADOOP-11090), which is important since JDK7 is EOL in April 2015 (two
 months from now). In the past, we've had issues with our dependencies
 discontinuing support for old JDKs, so this will future-proof us.

 Between the two, we'll also have quite an opportunity to clean up and
 upgrade our dependencies, another common user and developer request.

 I'd like to propose that we start rolling a monthly-ish series of
 3.0 alpha releases ASAP, with myself volunteering to take on the RM and
 other cat herding responsibilities. There are already quite a few changes
 slated for 3.0 besides the above (for instance the shell script rewrite) so
 there's already value in a 3.0 alpha, and the more time we give downstreams
 to integrate, the better.

 This opens up discussion about inclusion of other changes, but I'm hoping
 to freeze incompatible changes after maybe two alphas, do a beta (with no
 further incompat changes allowed), and then finally a 3.x GA. For those
 keeping track, that means a 3.x GA in about four months.

 I would also like to stress though that this is not intended to be a big
 bang release. For instance, it would be great if we could maintain wire
 compatibility between 2.x and 3.x, so rolling upgrades work. Keeping
 branch-2 and branch-3 similar also makes backports easier, since we're
 likely maintaining 2.x for a while yet.

 Please let me know any comments / concerns related to the above. If people
 are friendly to the idea, I'd like to cut a branch-3 and start working on
 the first alpha.

 Best,
 Andrew



Re: Looking to a Hadoop 3 release

2015-03-04 Thread Andrew Wang
Let's not dismiss this quite so handily.

Sean, Jason, and Stack replied on HADOOP-11656 pointing out that while we
could make classpath isolation opt-in via configuration, what we really
want longer term is to have it on by default (or just always on). Stack in
particular points out the practical difficulties in using an opt-in method
in 2.x from a downstream project perspective. It's not pretty.

The plan that both Sean and Jason propose (which I support) is to have an
opt-in solution in 2.x, bake it there, then turn it on by default
(incompatible) in a new major release. I think this lines up well with my
proposal of some alphas and betas leading up to a GA 3.x. I'm also willing
to help with 2.x release management if that would help with testing this
feature.

Even setting aside classpath isolation, a new major release is still
justified by JDK8. Somehow this is being ignored in the discussion. Allen,
historically the voice of the user in our community, just highlighted it as
a major compatibility issue, and Tucu and I have also expressed our
very strong concerns about bumping this in a minor release. 2.7's bump is a
unique exception, but this is not something to be cited as precedent or
policy.

Where does this resistance to a new major release stem from? As I've
described from the beginning, this will look basically like a 2.x release,
except for the inclusion of classpath isolation by default and target
version JDK8. I've expressed my desire to maintain API and wire
compatibility, and we can audit the set of incompatible changes in trunk to
ensure this. My proposal for doing alpha and beta releases leading up to GA
also gives downstreams a nice amount of time for testing and validation.

Regards,
Andrew

On Tue, Mar 3, 2015 at 2:32 PM, Arun Murthy a...@hortonworks.com wrote:

 Awesome, looks like we can just do this in a compatible manner - nothing
 else on the list seems like it warrants a (premature) major release.

 Thanks Vinod.

 Arun

 
 From: Vinod Kumar Vavilapalli vino...@hortonworks.com
 Sent: Tuesday, March 03, 2015 2:30 PM
 To: common-dev@hadoop.apache.org
 Cc: hdfs-...@hadoop.apache.org; mapreduce-...@hadoop.apache.org;
 yarn-...@hadoop.apache.org
 Subject: Re: Looking to a Hadoop 3 release

 I started pitching in more on that JIRA.

 To add, I think we can and should strive for doing this in a compatible
 manner, whatever the approach. Marking and calling it incompatible before
 we see proposal/patch seems premature to me. Commented the same on JIRA:
 https://issues.apache.org/jira/browse/HADOOP-11656?focusedCommentId=14345875page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14345875

 Thanks
 +Vinod

 On Mar 2, 2015, at 8:08 PM, Andrew Wang andrew.w...@cloudera.com wrote:

 Regarding classpath isolation, based on what I hear from our customers,
 it's still a big problem (even after the MR classloader work). The latest
 Jackson version bump was quite painful for our downstream projects, and the
 HDFS client still leaks a lot of dependencies. I would welcome more
 discussion of this on HADOOP-11656; Steve, Colin, and Haohui have already
 chimed in.




Re: Looking to a Hadoop 3 release

2015-03-04 Thread Stack
In general +1 on 3.0.0. It's time. If we start now, it might make it out by
2016. If we start now, downstreamers can start aligning themselves to land
versions that suit at about the same time.

While two big items have been called out as possible incompatible changes,
and there is ongoing discussion as to whether they are or not*, is there
any chance of getting a longer list of big differences between the
branches? In particular I'd be interested in improvements that are 'off' by
default that would be better defaulted 'on'.

Thanks,
St.Ack

* Let me note that 'compatible' around these parts is a trampled concept,
seemingly open to interpretation, with a definition other than the one that
prevails elsewhere in software. See Allen's list above, and in our
downstream project, the recent HBASE-13149 (HBase server MR tools are
broken on Hadoop 2.5+ Yarn), among others. Let 3.x be incompatible with
2.x if only so we can leave behind all current notions of 'compatibility'
and just start over (as per Allen).


On Mon, Mar 2, 2015 at 3:19 PM, Andrew Wang andrew.w...@cloudera.com
wrote:

 Hi devs,

 It's been a year and a half since 2.x went GA, and I think we're about due
 for a 3.x release.
 Notably, there are two incompatible changes I'd like to call out, that will
 have a tremendous positive impact for our users.

 First, classpath isolation being done at HADOOP-11656, which has been a
 long-standing request from many downstreams and Hadoop users.

 Second, bumping the source and target JDK version to JDK8 (related to
 HADOOP-11090), which is important since JDK7 is EOL in April 2015 (two
 months from now). In the past, we've had issues with our dependencies
 discontinuing support for old JDKs, so this will future-proof us.

 Between the two, we'll also have quite an opportunity to clean up and
 upgrade our dependencies, another common user and developer request.

 I'd like to propose that we start rolling a monthly-ish series of
 3.0 alpha releases ASAP, with myself volunteering to take on the RM and
 other cat herding responsibilities. There are already quite a few changes
 slated for 3.0 besides the above (for instance the shell script rewrite) so
 there's already value in a 3.0 alpha, and the more time we give downstreams
 to integrate, the better.

 This opens up discussion about inclusion of other changes, but I'm hoping
 to freeze incompatible changes after maybe two alphas, do a beta (with no
 further incompat changes allowed), and then finally a 3.x GA. For those
 keeping track, that means a 3.x GA in about four months.

 I would also like to stress though that this is not intended to be a big
 bang release. For instance, it would be great if we could maintain wire
 compatibility between 2.x and 3.x, so rolling upgrades work. Keeping
 branch-2 and branch-3 similar also makes backports easier, since we're
 likely maintaining 2.x for a while yet.

 Please let me know any comments / concerns related to the above. If people
 are friendly to the idea, I'd like to cut a branch-3 and start working on
 the first alpha.

 Best,
 Andrew



RE: Looking to a Hadoop 3 release

2015-03-04 Thread Zheng, Kai
I'd like to offer a few comments on this, just sharing my thoughts. Thanks.

 If we start now, it might make it out by 2016. If we start now, 
 downstreamers can start aligning themselves to land versions that suit at 
 about the same time.
Not only for downstreamers to align with the long-term release, but maybe also for
contributors like me to align their future efforts.

In addition to JDK8 support and classpath isolation, might we consider more
candidates? How about this one, HADOOP-9797 (Pluggable and compatible UGI change)?
https://issues.apache.org/jira/browse/HADOOP-9797

The benefits: 
1) allow multiple login sessions/contexts and authentication methods to be used 
in the same Java application/process without conflicts, providing good 
isolation by getting rid of globals and statics.
2) allow new authentication methods to be plugged into UGI in a modular,
manageable and maintainable manner.

Another: we would also push for the first release of Apache Kerby, a strong,
dedicated and clean Kerberos library in Java for both the client and KDC
sides. By leveraging that library, we could update Hadoop-MiniKDC and
perform more security tests.
https://issues.apache.org/jira/browse/DIRKRB-102

Hope this makes sense. Thanks.

Regards,
Kai

-----Original Message-----
From: saint@gmail.com [mailto:saint@gmail.com] On Behalf Of Stack
Sent: Thursday, March 05, 2015 2:47 AM
To: common-dev@hadoop.apache.org
Cc: mapreduce-...@hadoop.apache.org; hdfs-...@hadoop.apache.org; 
yarn-...@hadoop.apache.org
Subject: Re: Looking to a Hadoop 3 release

In general +1 on 3.0.0. It's time. If we start now, it might make it out by 
2016. If we start now, downstreamers can start aligning themselves to land 
versions that suit at about the same time.

While two big items have been called out as possible incompatible changes, and 
there is ongoing discussion as to whether they are or not*, is there any chance 
of getting a longer list of big differences between the branches? In particular 
I'd be interested in improvements that are 'off' by default that would be 
better defaulted 'on'.

Thanks,
St.Ack

* Let me note that 'compatible' around these parts is a trampled concept, 
seemingly open to interpretation, with a definition other than the one that 
prevails elsewhere in software. See Allen's list above, and in our downstream 
project, the recent HBASE-13149 (HBase server MR tools are broken on Hadoop 2.5+ 
Yarn), among others. Let 3.x be incompatible with 2.x if only so we can leave 
behind all current notions of 'compatibility' and just start over (as per Allen).


On Mon, Mar 2, 2015 at 3:19 PM, Andrew Wang andrew.w...@cloudera.com
wrote:

 Hi devs,

 It's been a year and a half since 2.x went GA, and I think we're about 
 due for a 3.x release.
 Notably, there are two incompatible changes I'd like to call out, that 
 will have a tremendous positive impact for our users.

 First, classpath isolation being done at HADOOP-11656, which has been 
 a long-standing request from many downstreams and Hadoop users.

 Second, bumping the source and target JDK version to JDK8 (related to 
 HADOOP-11090), which is important since JDK7 is EOL in April 2015 (two 
 months from now). In the past, we've had issues with our dependencies 
 discontinuing support for old JDKs, so this will future-proof us.

 Between the two, we'll also have quite an opportunity to clean up and 
 upgrade our dependencies, another common user and developer request.

 I'd like to propose that we start rolling a monthly-ish series of
 3.0 alpha releases ASAP, with myself volunteering to take on the RM 
 and other cat herding responsibilities. There are already quite a few 
 changes slated for 3.0 besides the above (for instance the shell 
 script rewrite) so there's already value in a 3.0 alpha, and the more 
 time we give downstreams to integrate, the better.

 This opens up discussion about inclusion of other changes, but I'm 
 hoping to freeze incompatible changes after maybe two alphas, do a 
 beta (with no further incompat changes allowed), and then finally a 
 3.x GA. For those keeping track, that means a 3.x GA in about four months.

 I would also like to stress though that this is not intended to be a 
 big bang release. For instance, it would be great if we could maintain 
 wire compatibility between 2.x and 3.x, so rolling upgrades work. Keeping
 branch-2 and branch-3 similar also makes backports easier, since we're 
 likely maintaining 2.x for a while yet.

 Please let me know any comments / concerns related to the above. If 
 people are friendly to the idea, I'd like to cut a branch-3 and start 
 working on the first alpha.

 Best,
 Andrew



Re: Looking to a Hadoop 3 release

2015-03-04 Thread Allen Wittenauer

One of the questions that keeps popping up is “what exactly is in trunk?”

As some may recall, I had done some experiments creating the change log based 
upon JIRA.  While the interest level appeared to be approaching zero, I kept 
playing with it a bit and eventually also started playing with the release 
notes script (for various reasons I won’t bore you with.)

In any case, I’ve started posting the results of these runs on one of my github 
repos if anyone was wanting a quick reference as to JIRA’s opinion on the 
matter:

https://github.com/aw-altiscale/hadoop-release-metadata/tree/master/3.0.0




Re: Looking to a Hadoop 3 release

2015-03-03 Thread Andrew Wang
Hi Konst, thanks for taking a look. I think I essentially agree with your
points.

On Mon, Mar 2, 2015 at 11:04 PM, Konstantin Shvachko shv.had...@gmail.com
wrote:

 Andrew,

 Hadoop 3 seems in general like a good idea to me.
 1. I did not understand whether you propose to release 3.0 instead of 2.7 or in
 addition?
 I think 2.7 is needed at least as a stabilization step for the 2.x line.

I agree with this; 2.7 is needed, and I think Vinod/Arun are working on it
now.

I expect branch-2 to be maintained for a while yet, separate from a
branch-3.


 2. If Hadoop 3 and 2.x are meant to exist together, we run the risk of
 manifesting split-brain behavior again, as we had with hadoop-1, hadoop-2 and
 other versions. Even if that is somehow beneficial for commercial vendors, though I
 don't see how, for the community it has proven to be very disruptive. It would
 be really good to avoid it this time.

My motivations here are purely what I've stated above. I remember the pain
of the branch-1 days as well, and this would be a far, far smaller
difference. JDK8 min version and classpath isolation are compelling, yet
incompatible, which is why I'm proposing Hadoop 3. Besides those two
features, it should be approximately the same size as our 2.x releases.


 3. Could we release Hadoop 3 directly from trunk? With a proper feature
 freeze in advance. Current trunk is in the best working condition I've seen
 in years - much better than when hadoop-2 was coming to life. It could
 make a good alpha.
 I believe we can start planning 3.0 from trunk right after 2.7 is out.


I agree with this, and would be okay with this if our audit of trunk
reveals no incompatible changes we're uncomfortable releasing.

I'll note though that committing to multiple branches is way easier now
with git and cherry-pick, so that overhead is reduced. Rolling out an alpha
now is strictly a good thing for our downstreams, even if it means we need
to do extra commits.

Thanks,
Andrew


Re: Looking to a Hadoop 3 release

2015-03-03 Thread Andrew Wang
Hi Akira, thanks for responding,

On Tue, Mar 3, 2015 at 4:04 AM, Akira AJISAKA ajisa...@oss.nttdata.co.jp
wrote:

 Thanks Andrew for bringing this up.
 +1, mostly looks fine, but I don't think now is the time to cut branch-3.

  classpath isolation

 IMHO, classpath isolation is a good thing to do.
 We should pay down the technical debt ASAP. I'm willing to help.

 I'm thinking we can cut branch-3 and release 3.0 alpha
 after HADOOP-11656 is fixed. That is, I'd like to mark
 this issue as a blocker for 3.0.
 I suspect that even if we cut branch-3 now, trunk and
 branch-3 would be the same for a while. That seems useless.

I'm willing to wait a bit here, but I think even what we have now is worth
kicking the tires, and either the JDK8 target version or classpath
isolation would make it even more compelling.

If you're worried about backport overheads, Konst's proposal of releasing
directly from trunk might be appealing. Needs some more examination though.


  JDK8

 As Steve suggested, JDK8 can be in both trunk and branch-2.
 +1 for moving to JDK8 ASAP.

We can make sure branch-2 runs well under JDK8, but I'm against doing a
target version bump to JDK8 like we're planning to do for JDK7 in a minor
release. As I described in my reply to Arun, that was a special
circumstance, and JDK target version bumps really are deserving of a new
major release.


  maintaining 2.x

 For users, there is currently little merit in upgrading to 3.x.
 The more important thing is how long 2.x will be maintained.
 Therefore we should consider when to stop backporting
 new features to 2.x, and when to stop maintaining 2.x.
 I'd like to maintain 2.x as long as possible, at least
 one year after 3.x GA release.

The value in releasing alphas right now is not so much for end users, but
for downstream projects which need time to integrate. I don't expect
end-users to really jump on 3.x until the downstreams have also rolled new
releases based on 3.x.

Determining when support for 2.x is over is done by the community. I
personally plan to keep backporting for a while after 3.x GA is released.
If backports to branch-2 tail off, it just takes one committer with the
interest to keep maintaining it. This has been a common thing in HBase; for
instance, Lars H maintained 0.92 for a long time because he had the
interest.


 * Other issue

 What's the current status of HDFS symlink?
 If HADOOP-10019 requires some incompatible changes,
 I'd like to include them in 3.x.

There are still a lot of unresolved compatibility and security issues,
especially with cross-filesystem symlinks. We tabled this work before, and
frankly I'm not sure these issues will ever be satisfactorily resolved.
Even today, there are plenty of Unix apps that don't handle symlinks
correctly, and we still lack equivalents of more secure syscalls like
openat() in the first place.

Thanks,
Andrew


Re: Looking to a Hadoop 3 release

2015-03-03 Thread Andrew Wang
Hi Junping, thanks for your response,

I view branch-3 as essentially the same size as our recent 2.x releases,
with the exception of incompatible changes like classpath isolation and
JDK8 target version. These, while perhaps not revolutionary, are still
incompatible, and require a major version bump.

I don't see a forking of the community effort, since backports should flow
pretty easily from branch-3 to branch-2 the same way they currently can
flow from branch-2 to branch-2.6. It's just an extra git commit, not like
what we had to deal with in the branch-1 days with a custom backport.

Hopefully that addresses your concerns.

Thanks,
Andrew

On Tue, Mar 3, 2015 at 6:12 AM, Junping Du j...@hortonworks.com wrote:

 Thanks all for good discussions here.
 +1 on supporting Java 8 ASAP. In addition, I agree that we should
 separate this effort from cutting Hadoop 3.
 IMO, Hadoop is still very cool today, and we should only consider Hadoop 3
 once we have a revolutionary feature (like YARN for 2.0) which deserves to
 break fundamental compatibility. Otherwise it may just cause more distraction
 for the community effort.
 Just 2 cents.

 Thanks,

 Junping
 
 From: Akira AJISAKA ajisa...@oss.nttdata.co.jp
 Sent: Tuesday, March 03, 2015 12:04 PM
 To: common-dev@hadoop.apache.org; mapreduce-...@hadoop.apache.org;
 hdfs-...@hadoop.apache.org; yarn-...@hadoop.apache.org
 Subject: Re: Looking to a Hadoop 3 release

 Thanks Andrew for bringing this up.
 +1, mostly looks fine, but I don't think now is the time to cut branch-3.

   classpath isolation

 IMHO, classpath isolation is a good thing to do.
 We should pay down the technical debt ASAP. I'm willing to help.

 I'm thinking we can cut branch-3 and release 3.0 alpha
 after HADOOP-11656 is fixed. That is, I'd like to mark
 this issue as a blocker for 3.0.
 I suspect that even if we cut branch-3 now, trunk and
 branch-3 would be the same for a while. That seems useless.

   JDK8

 As Steve suggested, JDK8 can be in both trunk and branch-2.
 +1 for moving to JDK8 ASAP.

   maintaining 2.x

 For users, there is currently little merit in upgrading to 3.x.
 The more important thing is how long 2.x will be maintained.
 Therefore we should consider when to stop backporting
 new features to 2.x, and when to stop maintaining 2.x.
 I'd like to maintain 2.x as long as possible, at least
 one year after 3.x GA release.

 * Other issue

 What's the current status of HDFS symlink?
 If HADOOP-10019 requires some incompatible changes,
 I'd like to include them in 3.x.

 Regards,
 Akira

 On 3/2/15 15:19, Andrew Wang wrote:
  Hi devs,
 
  It's been a year and a half since 2.x went GA, and I think we're about due
  for a 3.x release.
  Notably, there are two incompatible changes I'd like to call out, that will
  have a tremendous positive impact for our users.
 
  First, classpath isolation being done at HADOOP-11656, which has been a
  long-standing request from many downstreams and Hadoop users.
 
  Second, bumping the source and target JDK version to JDK8 (related to
  HADOOP-11090), which is important since JDK7 is EOL in April 2015 (two
  months from now). In the past, we've had issues with our dependencies
  discontinuing support for old JDKs, so this will future-proof us.
 
  Between the two, we'll also have quite an opportunity to clean up and
  upgrade our dependencies, another common user and developer request.
 
  I'd like to propose that we start rolling a monthly-ish series of
  3.0 alpha releases ASAP, with myself volunteering to take on the RM and
  other cat herding responsibilities. There are already quite a few changes
  slated for 3.0 besides the above (for instance the shell script rewrite) so
  there's already value in a 3.0 alpha, and the more time we give downstreams
  to integrate, the better.
 
  This opens up discussion about inclusion of other changes, but I'm hoping
  to freeze incompatible changes after maybe two alphas, do a beta (with no
  further incompat changes allowed), and then finally a 3.x GA. For those
  keeping track, that means a 3.x GA in about four months.
 
  I would also like to stress though that this is not intended to be a big
  bang release. For instance, it would be great if we could maintain wire
  compatibility between 2.x and 3.x, so rolling upgrades work. Keeping
  branch-2 and branch-3 similar also makes backports easier, since we're
  likely maintaining 2.x for a while yet.
 
  Please let me know any comments / concerns related to the above. If people
  are friendly to the idea, I'd like to cut a branch-3 and start working on
  the first alpha.
 
  Best,
  Andrew
 




Re: Looking to a Hadoop 3 release

2015-03-03 Thread Steve Loughran

I want to understand a lot more about the classpath isolation (HADOOP-11656) 
proposal, specifically, what is proposed and does it have to be tagged as 
incompatible? That's a bigger change than just setting javac.version=8 in the 
POM —though given what a fundamental problem it addresses, I'm in favour of 
doing something there.


On 3 March 2015 at 08:05:46, Andrew Wang 
(andrew.w...@cloudera.com) wrote:

I view branch-3 as essentially the same size as our recent 2.x releases,
with the exception of incompatible changes like classpath isolation and
JDK8 target version. These, while perhaps not revolutionary, are still
incompatible, and require a major version bump.


Re: Looking to a Hadoop 3 release

2015-03-03 Thread Vinod Kumar Vavilapalli
Totally agreed. I just left a comment there on the current state and what is 
needed. As of now, I think the big (and only?) changes are flipping the default 
classloader for tasks and splitting the HDFS jar.

Thanks,
+Vinod

On Mar 3, 2015, at 9:02 AM, Steve Loughran 
ste...@hortonworks.com wrote:


I want to understand a lot more about the classpath isolation (HADOOP-11656) 
proposal, specifically, what is proposed and does it have to be tagged as 
incompatible? That's a bigger change than just setting javac.version=8 in the 
POM —though given what a fundamental problem it addresses, I'm in favour of 
doing something there.


On 3 March 2015 at 08:05:46, Andrew Wang 
(andrew.w...@cloudera.com) wrote:

I view branch-3 as essentially the same size as our recent 2.x releases,
with the exception of incompatible changes like classpath isolation and
JDK8 target version. These, while perhaps not revolutionary, are still
incompatible, and require a major version bump.



Re: Looking to a Hadoop 3 release

2015-03-03 Thread Allen Wittenauer
Between:

* removing -finalize
* breaking HDFS browsing
* changing du’s output (in the 2.7 branch)
* changing various names of metrics (either intentionally or otherwise)
* changing the JDK release

… and probably lots of other stuff in branch-2 I haven’t seen/know 
about, our best course of action is to:

$ git rm hadoop-common-project/hadoop-common/src/site/markdown/Compatibility.md

At least this way we as caretakers don’t come across as hypocrites.  
It’s pretty clear the direction has shown we only care about API compatibility 
and the rest is ignored when it isn’t “convenient”.  [The next time someone 
tells you that Hadoop is hard to operate, I want you to think about this email.]  
(1)

Making 2.7 build with JDK7 led to the *exact* situation I figured it 
would:  now we have a precedent where we just say to the community “You know 
those guarantees?  Yeah, you might as well ignore them because we’re going to 
change the core component any damn time we feel like it.”

We haven’t made a release branch off of trunk since branch-0.23.  If 
anyone thinks that’s healthy, there is some beach property in Alberta you might 
be interested in as well. Our release cycle came to a screeching halt after 
0.20 and we’ve never recovered.

However, I offer an alternative.

This same circular argument comes up all the time: (2)

* There aren’t enough changes in trunk to make a new branch. 
* We can’t upgrade/change component X because there is no plan to make 
a new major release.

To quote Frozen:  Let It Go

We’re probably at the point where there aren’t likely to be very many 
more earth-shattering changes to the Hadoop code base.  The community has 
decided instead to push these types of changes as separate projects via 
incubator to avoid the committer paralysis that this community suffers.  

Because of this, I don’t think the “enough changes” argument works 
anymore.  Instead, we need to pick a new metric to build a cadence to force 
regular updates.  I’d offer that the “every two years” JDK EOL sets the perfect 
cadence, matched by many other enterprise and OSS software, and gives us an 
opportunity to reflect in the version number that the critical component of our 
software has changed.

This cadence allows for people to plan appropriately and know what our 
roadmap and direction actually is.  Folks are more likely to build “real” 
solutions rather than make compromises that suffer in quality in the name of 
compatibility simply because they don’t know when their work will actually show 
up. We’ll have a normal, regular opportunity to update dependencies (regardless 
of the state of HADOOP-11656).

Now, if you’ll excuse me, I have more contributors' patches to go 
through.

(1) FWIW, I made the decision not to worry about backward compatibility in the 
shell code rewrite when I made the realization that the jsvc log and pid file 
names were poorly chosen to allow for certain capabilities.  Did anyone 
actually touch them from outside the software? Probably not.  But it is still 
effectively an interface, so off to trunk it went. 

(2) … and that’s before we even get to the “Version numbers are cheap” 
arguments that were made during the Great Renames of 0.20 and 0.23.

Re: Looking to a Hadoop 3 release

2015-03-03 Thread sanjay Radia

 On Mar 3, 2015, at 9:36 AM, Karthik Kambatla ka...@cloudera.com wrote:
 
 If we preserve API compat and try to preserve wire compat, I don't see the
 harm in bumping the major release.

If we preserve compatibility, then there is no need to bump the major number.
 It allows us to include several
 fixes/features in trunk in a release. If we are not actively thinking of a
 way to release items in trunk, why even have it?

What are the fixes and features in trunk that you would like to see get out 
quickly?
Can these be backported easily to branch-2?

sanjay



Re: Looking to a Hadoop 3 release

2015-03-03 Thread Arun Murthy
Awesome, looks like we can just do this in a compatible manner - nothing else 
on the list seems like it warrants a (premature) major release.

Thanks Vinod.

Arun


From: Vinod Kumar Vavilapalli vino...@hortonworks.com
Sent: Tuesday, March 03, 2015 2:30 PM
To: common-dev@hadoop.apache.org
Cc: hdfs-...@hadoop.apache.org; mapreduce-...@hadoop.apache.org; 
yarn-...@hadoop.apache.org
Subject: Re: Looking to a Hadoop 3 release

I started pitching in more on that JIRA.

To add, I think we can and should strive for doing this in a compatible manner, 
whatever the approach. Marking and calling it incompatible before we see 
proposal/patch seems premature to me. Commented the same on JIRA: 
https://issues.apache.org/jira/browse/HADOOP-11656?focusedCommentId=14345875page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14345875.

Thanks
+Vinod

On Mar 2, 2015, at 8:08 PM, Andrew Wang 
andrew.w...@cloudera.com wrote:

Regarding classpath isolation, based on what I hear from our customers,
it's still a big problem (even after the MR classloader work). The latest
Jackson version bump was quite painful for our downstream projects, and the
HDFS client still leaks a lot of dependencies. I'd welcome more
discussion of this on HADOOP-11656; Steve, Colin, and Haohui have already
chimed in.



Re: Looking to a Hadoop 3 release

2015-03-03 Thread Vinod Kumar Vavilapalli

I started pitching in more on that JIRA.

To add, I think we can and should strive for doing this in a compatible manner, 
whatever the approach. Marking and calling it incompatible before we see 
proposal/patch seems premature to me. Commented the same on JIRA: 
https://issues.apache.org/jira/browse/HADOOP-11656?focusedCommentId=14345875page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14345875.

Thanks
+Vinod

On Mar 2, 2015, at 8:08 PM, Andrew Wang 
andrew.w...@cloudera.com wrote:

Regarding classpath isolation, based on what I hear from our customers,
it's still a big problem (even after the MR classloader work). The latest
Jackson version bump was quite painful for our downstream projects, and the
HDFS client still leaks a lot of dependencies. I'd welcome more
discussion of this on HADOOP-11656; Steve, Colin, and Haohui have already
chimed in.



Re: Looking to a Hadoop 3 release

2015-03-03 Thread Karthik Kambatla
I am surprised classpath-isolation is being called a minor issue. We have
been hearing users complain about Hadoop leaking its dependencies into the
classpath for a while now, Guava being the culprit often. Not being able to
upgrade our dependencies without affecting users has started to hamper our
development too; e.g., the Guava conflict when upgrading Curator.

If we preserve API compat and try to preserve wire compat, I don't see the
harm in bumping the major release. It allows us to include several
fixes/features in trunk in a release. If we are not actively thinking of a
way to release items in trunk, why even have it?

If there are any disadvantages to doing a major release, I would like to
know. Maybe we could arrive at a plan to accomplish it without those
problems.

Thanks
Karthik

On Tue, Mar 3, 2015 at 9:02 AM, Steve Loughran ste...@hortonworks.com
wrote:


  I want to understand a lot more about the classpath isolation
 (HADOOP-11656) proposal, specifically, what is proposed and does it have to
 be tagged as incompatible? That's a bigger change than just setting
 javac.version=8 in the POM —though given what a fundamental problem it
 addresses, I'm in favour of doing something there.

 On 3 March 2015 at 08:05:46, Andrew Wang (andrew.w...@cloudera.com) wrote:

 I view branch-3 as essentially the same size as our recent 2.x releases,
 with the exception of incompatible changes like classpath isolation and
 JDK8 target version. These, while perhaps not revolutionary, are still
 incompatible, and require a major version bump.




-- 
Karthik Kambatla
Software Engineer, Cloudera Inc.

http://five.sentenc.es


Re: Looking to a Hadoop 3 release

2015-03-03 Thread Junping Du
Thanks all for good discussions here.
+1 on supporting Java 8 ASAP. In addition, I agree that we should separate 
this effort from cutting Hadoop 3. 
IMO, Hadoop is still very cool today, and we should only consider Hadoop 3 
once we have a revolutionary feature (like YARN for 2.0) that deserves to break 
fundamental compatibility. Otherwise it may just distract from the 
community effort.
Just my 2 cents.

Thanks,

Junping

From: Akira AJISAKA ajisa...@oss.nttdata.co.jp
Sent: Tuesday, March 03, 2015 12:04 PM
To: common-dev@hadoop.apache.org; mapreduce-...@hadoop.apache.org; 
hdfs-...@hadoop.apache.org; yarn-...@hadoop.apache.org
Subject: Re: Looking to a Hadoop 3 release

Thanks Andrew for bringing this up.
+1, this mostly looks fine, but I don't think now is the time to cut branch-3.

  classpath isolation

IMHO, classpath isolation is a good thing to do.
We should pay down the technical debt ASAP. I'm willing to help.

I'm thinking we can cut branch-3 and release 3.0 alpha
after HADOOP-11656 is fixed. That is, I'd like to mark
this issue as a blocker for 3.0.
I suspect that even if we cut branch-3 now, trunk and
branch-3 would be the same for a while, which seems pointless.

  JDK8

As Steve suggested, JDK8 can be in both trunk and branch-2.
+1 for moving to JDK8 ASAP.

  maintaining 2.x

On the user side, there is currently little incentive to upgrade to 3.x.
More important thing is how long 2.x will be maintained.
Therefore we should consider when to stop backporting
new features to 2.x, and when to stop maintaining 2.x.
I'd like to maintain 2.x as long as possible, at least
one year after 3.x GA release.

* Other issue

What's the current status of HDFS symlinks?
If HADOOP-10019 requires some incompatible changes,
I'd like to include them in 3.x.

Regards,
Akira

On 3/2/15 15:19, Andrew Wang wrote:
 Hi devs,

 It's been a year and a half since 2.x went GA, and I think we're about due
 for a 3.x release.
 Notably, there are two incompatible changes I'd like to call out, that will
 have a tremendous positive impact for our users.

 First, classpath isolation being done at HADOOP-11656, which has been a
 long-standing request from many downstreams and Hadoop users.

 Second, bumping the source and target JDK version to JDK8 (related to
 HADOOP-11090), which is important since JDK7 is EOL in April 2015 (two
 months from now). In the past, we've had issues with our dependencies
 discontinuing support for old JDKs, so this will future-proof us.

 Between the two, we'll also have quite an opportunity to clean up and
 upgrade our dependencies, another common user and developer request.

 I'd like to propose that we start rolling a monthly-ish series of
 3.0 alpha releases ASAP, with myself volunteering to take on the RM and
 other cat herding responsibilities. There are already quite a few changes
 slated for 3.0 besides the above (for instance the shell script rewrite) so
 there's already value in a 3.0 alpha, and the more time we give downstreams
 to integrate, the better.

 This opens up discussion about inclusion of other changes, but I'm hoping
 to freeze incompatible changes after maybe two alphas, do a beta (with no
 further incompat changes allowed), and then finally a 3.x GA. For those
 keeping track, that means a 3.x GA in about four months.

 I would also like to stress though that this is not intended to be a big
 bang release. For instance, it would be great if we could maintain wire
 compatibility between 2.x and 3.x, so rolling upgrades work. Keeping
 branch-2 and branch-3 similar also makes backports easier, since we're
 likely maintaining 2.x for a while yet.

 Please let me know any comments / concerns related to the above. If people
 are friendly to the idea, I'd like to cut a branch-3 and start working on
 the first alpha.

 Best,
 Andrew




Re: Looking to a Hadoop 3 release

2015-03-02 Thread Karthik Kambatla
+1

On Mon, Mar 2, 2015 at 3:19 PM, Andrew Wang andrew.w...@cloudera.com
wrote:

 Hi devs,

 It's been a year and a half since 2.x went GA, and I think we're about due
 for a 3.x release.
 Notably, there are two incompatible changes I'd like to call out, that will
 have a tremendous positive impact for our users.


 First, classpath isolation being done at HADOOP-11656, which has been a
 long-standing request from many downstreams and Hadoop users.


Guava etc. have been such a pain in the past. Can't wait to have a release
we don't have to worry about what version of dependencies users want to
use.



 Second, bumping the source and target JDK version to JDK8 (related to
 HADOOP-11090), which is important since JDK7 is EOL in April 2015 (two
 months from now). In the past, we've had issues with our dependencies
 discontinuing support for old JDKs, so this will future-proof us.


Are you saying we can use lambdas without re-writing all of Hadoop in
Scala?



 Between the two, we'll also have quite an opportunity to clean up and
 upgrade our dependencies, another common user and developer request.

 I'd like to propose that we start rolling a monthly-ish series of
 3.0 alpha releases ASAP, with myself volunteering to take on the RM and
 other cat herding responsibilities.


Will be glad to help.


 There are already quite a few changes
 slated for 3.0 besides the above (for instance the shell script rewrite) so
 there's already value in a 3.0 alpha, and the more time we give downstreams
 to integrate, the better.

 This opens up discussion about inclusion of other changes, but I'm hoping
 to freeze incompatible changes after maybe two alphas, do a beta (with no
 further incompat changes allowed), and then finally a 3.x GA. For those
 keeping track, that means a 3.x GA in about four months.


 I would also like to stress though that this is not intended to be a big
 bang release. For instance, it would be great if we could maintain wire
 compatibility between 2.x and 3.x, so rolling upgrades work. Keeping
 branch-2 and branch-3 similar also makes backports easier, since we're
 likely maintaining 2.x for a while yet.

 Please let me know any comments / concerns related to the above. If people
 are friendly to the idea, I'd like to cut a branch-3 and start working on
 the first alpha.

 Best,
 Andrew




-- 
Karthik Kambatla
Software Engineer, Cloudera Inc.

http://five.sentenc.es


Re: Looking to a Hadoop 3 release

2015-03-02 Thread Aaron T. Myers
+1, this sounds like a good plan to me.

Thanks a lot for volunteering to take this on, Andrew.

Best,
Aaron

On Mon, Mar 2, 2015 at 3:19 PM, Andrew Wang andrew.w...@cloudera.com
wrote:

 Hi devs,

 It's been a year and a half since 2.x went GA, and I think we're about due
 for a 3.x release.
 Notably, there are two incompatible changes I'd like to call out, that will
 have a tremendous positive impact for our users.

 First, classpath isolation being done at HADOOP-11656, which has been a
 long-standing request from many downstreams and Hadoop users.

 Second, bumping the source and target JDK version to JDK8 (related to
 HADOOP-11090), which is important since JDK7 is EOL in April 2015 (two
 months from now). In the past, we've had issues with our dependencies
 discontinuing support for old JDKs, so this will future-proof us.

 Between the two, we'll also have quite an opportunity to clean up and
 upgrade our dependencies, another common user and developer request.

 I'd like to propose that we start rolling a monthly-ish series of
 3.0 alpha releases ASAP, with myself volunteering to take on the RM and
 other cat herding responsibilities. There are already quite a few changes
 slated for 3.0 besides the above (for instance the shell script rewrite) so
 there's already value in a 3.0 alpha, and the more time we give downstreams
 to integrate, the better.

 This opens up discussion about inclusion of other changes, but I'm hoping
 to freeze incompatible changes after maybe two alphas, do a beta (with no
 further incompat changes allowed), and then finally a 3.x GA. For those
 keeping track, that means a 3.x GA in about four months.

 I would also like to stress though that this is not intended to be a big
 bang release. For instance, it would be great if we could maintain wire
 compatibility between 2.x and 3.x, so rolling upgrades work. Keeping
 branch-2 and branch-3 similar also makes backports easier, since we're
 likely maintaining 2.x for a while yet.

 Please let me know any comments / concerns related to the above. If people
 are friendly to the idea, I'd like to cut a branch-3 and start working on
 the first alpha.

 Best,
 Andrew



Looking to a Hadoop 3 release

2015-03-02 Thread Andrew Wang
Hi devs,

It's been a year and a half since 2.x went GA, and I think we're about due
for a 3.x release.
Notably, there are two incompatible changes I'd like to call out, that will
have a tremendous positive impact for our users.

First, classpath isolation being done at HADOOP-11656, which has been a
long-standing request from many downstreams and Hadoop users.

Second, bumping the source and target JDK version to JDK8 (related to
HADOOP-11090), which is important since JDK7 is EOL in April 2015 (two
months from now). In the past, we've had issues with our dependencies
discontinuing support for old JDKs, so this will future-proof us.

Between the two, we'll also have quite an opportunity to clean up and
upgrade our dependencies, another common user and developer request.

I'd like to propose that we start rolling a monthly-ish series of
3.0 alpha releases ASAP, with myself volunteering to take on the RM and
other cat herding responsibilities. There are already quite a few changes
slated for 3.0 besides the above (for instance the shell script rewrite) so
there's already value in a 3.0 alpha, and the more time we give downstreams
to integrate, the better.

This opens up discussion about inclusion of other changes, but I'm hoping
to freeze incompatible changes after maybe two alphas, do a beta (with no
further incompat changes allowed), and then finally a 3.x GA. For those
keeping track, that means a 3.x GA in about four months.

I would also like to stress though that this is not intended to be a big
bang release. For instance, it would be great if we could maintain wire
compatibility between 2.x and 3.x, so rolling upgrades work. Keeping
branch-2 and branch-3 similar also makes backports easier, since we're
likely maintaining 2.x for a while yet.

Please let me know any comments / concerns related to the above. If people
are friendly to the idea, I'd like to cut a branch-3 and start working on
the first alpha.

Best,
Andrew


Re: Looking to a Hadoop 3 release

2015-03-02 Thread Yongjun Zhang
Thanks Andrew for the proposal.

+1, and I will be happy to help.

--Yongjun




On Mon, Mar 2, 2015 at 3:19 PM, Andrew Wang andrew.w...@cloudera.com
wrote:

 Hi devs,

 It's been a year and a half since 2.x went GA, and I think we're about due
 for a 3.x release.
 Notably, there are two incompatible changes I'd like to call out, that will
 have a tremendous positive impact for our users.

 First, classpath isolation being done at HADOOP-11656, which has been a
 long-standing request from many downstreams and Hadoop users.

 Second, bumping the source and target JDK version to JDK8 (related to
 HADOOP-11090), which is important since JDK7 is EOL in April 2015 (two
 months from now). In the past, we've had issues with our dependencies
 discontinuing support for old JDKs, so this will future-proof us.

 Between the two, we'll also have quite an opportunity to clean up and
 upgrade our dependencies, another common user and developer request.

 I'd like to propose that we start rolling a monthly-ish series of
 3.0 alpha releases ASAP, with myself volunteering to take on the RM and
 other cat herding responsibilities. There are already quite a few changes
 slated for 3.0 besides the above (for instance the shell script rewrite) so
 there's already value in a 3.0 alpha, and the more time we give downstreams
 to integrate, the better.

 This opens up discussion about inclusion of other changes, but I'm hoping
 to freeze incompatible changes after maybe two alphas, do a beta (with no
 further incompat changes allowed), and then finally a 3.x GA. For those
 keeping track, that means a 3.x GA in about four months.

 I would also like to stress though that this is not intended to be a big
 bang release. For instance, it would be great if we could maintain wire
 compatibility between 2.x and 3.x, so rolling upgrades work. Keeping
 branch-2 and branch-3 similar also makes backports easier, since we're
 likely maintaining 2.x for a while yet.

 Please let me know any comments / concerns related to the above. If people
 are friendly to the idea, I'd like to cut a branch-3 and start working on
 the first alpha.

 Best,
 Andrew



Re: Looking to a Hadoop 3 release

2015-03-02 Thread sanjay Radia
Andrew 
  Thanks for bringing up the issue of moving to Java8. Java8 is important.
However, I am not seeing a strong motivation for changing the major number.
We can go to Java8 in the 2.x series.
The classpath issue for HADOOP-11656 is too minor to force a major number 
change (no pun intended).

Let's separate the issue of Java8 and Hadoop 3.0

sanjay


 On Mar 2, 2015, at 3:19 PM, Andrew Wang andrew.w...@cloudera.com wrote:
 
 Hi devs,
 
 It's been a year and a half since 2.x went GA, and I think we're about due
 for a 3.x release.
 Notably, there are two incompatible changes I'd like to call out, that will
 have a tremendous positive impact for our users.
 
 First, classpath isolation being done at HADOOP-11656, which has been a
 long-standing request from many downstreams and Hadoop users.
 
 Second, bumping the source and target JDK version to JDK8 (related to
 HADOOP-11090), which is important since JDK7 is EOL in April 2015 (two
 months from now). In the past, we've had issues with our dependencies
 discontinuing support for old JDKs, so this will future-proof us.
 
 Between the two, we'll also have quite an opportunity to clean up and
 upgrade our dependencies, another common user and developer request.
 
 I'd like to propose that we start rolling a monthly-ish series of
 3.0 alpha releases ASAP, with myself volunteering to take on the RM and
 other cat herding responsibilities. There are already quite a few changes
 slated for 3.0 besides the above (for instance the shell script rewrite) so
 there's already value in a 3.0 alpha, and the more time we give downstreams
 to integrate, the better.
 
 This opens up discussion about inclusion of other changes, but I'm hoping
 to freeze incompatible changes after maybe two alphas, do a beta (with no
 further incompat changes allowed), and then finally a 3.x GA. For those
 keeping track, that means a 3.x GA in about four months.
 
 I would also like to stress though that this is not intended to be a big
 bang release. For instance, it would be great if we could maintain wire
 compatibility between 2.x and 3.x, so rolling upgrades work. Keeping
 branch-2 and branch-3 similar also makes backports easier, since we're
 likely maintaining 2.x for a while yet.
 
 Please let me know any comments / concerns related to the above. If people
 are friendly to the idea, I'd like to cut a branch-3 and start working on
 the first alpha.
 
 Best,
 Andrew



Re: Looking to a Hadoop 3 release

2015-03-02 Thread Vinod Kumar Vavilapalli

 Second, bumping the source and target JDK version to JDK8 (related to
 HADOOP-11090), which is important since JDK7 is EOL in April 2015 (two
 months from now). In the past, we've had issues with our dependencies
 discontinuing support for old JDKs, so this will future-proof us.


Is moving to JDK8 fundamentally different from the move to JDK7? We are moving 
to JDK7 via release 2.7 that I am helping with now.


 I'd like to propose that we start rolling a monthly-ish series of
 3.0 alpha releases ASAP, with myself volunteering to take on the RM and
 other cat herding responsibilities. There are already quite a few changes
 slated for 3.0 besides the above (for instance the shell script rewrite) so
 there's already value in a 3.0 alpha, and the more time we give downstreams
 to integrate, the better.


Aren't the shell script rewrite changes supposed to be compatible?

Thanks,
+Vinod



Re: Looking to a Hadoop 3 release

2015-03-02 Thread Robert Kanter
+1  Happy to help too

On Mon, Mar 2, 2015 at 3:57 PM, Yongjun Zhang yzh...@cloudera.com wrote:

 Thanks Andrew for the proposal.

 +1, and I will be happy to help.

 --Yongjun




 On Mon, Mar 2, 2015 at 3:19 PM, Andrew Wang andrew.w...@cloudera.com
 wrote:

  Hi devs,
 
  It's been a year and a half since 2.x went GA, and I think we're about
 due
  for a 3.x release.
  Notably, there are two incompatible changes I'd like to call out, that
 will
  have a tremendous positive impact for our users.
 
  First, classpath isolation being done at HADOOP-11656, which has been a
  long-standing request from many downstreams and Hadoop users.
 
  Second, bumping the source and target JDK version to JDK8 (related to
  HADOOP-11090), which is important since JDK7 is EOL in April 2015 (two
  months from now). In the past, we've had issues with our dependencies
  discontinuing support for old JDKs, so this will future-proof us.
 
  Between the two, we'll also have quite an opportunity to clean up and
  upgrade our dependencies, another common user and developer request.
 
  I'd like to propose that we start rolling a monthly-ish series
 of
  3.0 alpha releases ASAP, with myself volunteering to take on the RM and
  other cat herding responsibilities. There are already quite a few changes
  slated for 3.0 besides the above (for instance the shell script rewrite)
 so
  there's already value in a 3.0 alpha, and the more time we give
 downstreams
  to integrate, the better.
 
  This opens up discussion about inclusion of other changes, but I'm hoping
  to freeze incompatible changes after maybe two alphas, do a beta (with no
  further incompat changes allowed), and then finally a 3.x GA. For those
  keeping track, that means a 3.x GA in about four months.
 
  I would also like to stress though that this is not intended to be a big
  bang release. For instance, it would be great if we could maintain wire
  compatibility between 2.x and 3.x, so rolling upgrades work. Keeping
  branch-2 and branch-3 similar also makes backports easier, since we're
  likely maintaining 2.x for a while yet.
 
  Please let me know any comments / concerns related to the above. If
 people
  are friendly to the idea, I'd like to cut a branch-3 and start working on
  the first alpha.
 
  Best,
  Andrew
 



RE: Looking to a Hadoop 3 release

2015-03-02 Thread Zheng, Kai
JDK8 support is under consideration; it looks like many issues have already 
been reported and resolved.

https://issues.apache.org/jira/browse/HADOOP-11090


-Original Message-
From: Andrew Wang [mailto:andrew.w...@cloudera.com] 
Sent: Tuesday, March 03, 2015 7:20 AM
To: common-dev@hadoop.apache.org; mapreduce-...@hadoop.apache.org; 
hdfs-...@hadoop.apache.org; yarn-...@hadoop.apache.org
Subject: Looking to a Hadoop 3 release

Hi devs,

It's been a year and a half since 2.x went GA, and I think we're about due for 
a 3.x release.
Notably, there are two incompatible changes I'd like to call out, that will 
have a tremendous positive impact for our users.

First, classpath isolation being done at HADOOP-11656, which has been a 
long-standing request from many downstreams and Hadoop users.

Second, bumping the source and target JDK version to JDK8 (related to 
HADOOP-11090), which is important since JDK7 is EOL in April 2015 (two months 
from now). In the past, we've had issues with our dependencies discontinuing 
support for old JDKs, so this will future-proof us.

Between the two, we'll also have quite an opportunity to clean up and upgrade 
our dependencies, another common user and developer request.

I'd like to propose that we start rolling a monthly-ish series of
3.0 alpha releases ASAP, with myself volunteering to take on the RM and other 
cat herding responsibilities. There are already quite a few changes slated for 
3.0 besides the above (for instance the shell script rewrite) so there's 
already value in a 3.0 alpha, and the more time we give downstreams to 
integrate, the better.

This opens up discussion about inclusion of other changes, but I'm hoping to 
freeze incompatible changes after maybe two alphas, do a beta (with no further 
incompat changes allowed), and then finally a 3.x GA. For those keeping track, 
that means a 3.x GA in about four months.

I would also like to stress though that this is not intended to be a big bang 
release. For instance, it would be great if we could maintain wire 
compatibility between 2.x and 3.x, so rolling upgrades work. Keeping
branch-2 and branch-3 similar also makes backports easier, since we're likely 
maintaining 2.x for a while yet.

Please let me know any comments / concerns related to the above. If people are 
friendly to the idea, I'd like to cut a branch-3 and start working on the first 
alpha.

Best,
Andrew


RE: Looking to a Hadoop 3 release

2015-03-02 Thread Zheng, Kai
Sorry, my bad. I thought I was sending this to my colleagues. 

By the way, for the JDK8 support, we (Intel) would like to investigate further 
and help, thanks.

Regards,
Kai

-Original Message-
From: Zheng, Kai 
Sent: Tuesday, March 03, 2015 8:49 AM
To: common-dev@hadoop.apache.org; mapreduce-...@hadoop.apache.org; 
hdfs-...@hadoop.apache.org; yarn-...@hadoop.apache.org
Subject: RE: Looking to a Hadoop 3 release

JDK8 support is under consideration; it looks like many issues have already 
been reported and resolved.

https://issues.apache.org/jira/browse/HADOOP-11090


-Original Message-
From: Andrew Wang [mailto:andrew.w...@cloudera.com] 
Sent: Tuesday, March 03, 2015 7:20 AM
To: common-dev@hadoop.apache.org; mapreduce-...@hadoop.apache.org; 
hdfs-...@hadoop.apache.org; yarn-...@hadoop.apache.org
Subject: Looking to a Hadoop 3 release

Hi devs,

It's been a year and a half since 2.x went GA, and I think we're about due for 
a 3.x release.
Notably, there are two incompatible changes I'd like to call out, that will 
have a tremendous positive impact for our users.

First, classpath isolation being done at HADOOP-11656, which has been a 
long-standing request from many downstreams and Hadoop users.

Second, bumping the source and target JDK version to JDK8 (related to 
HADOOP-11090), which is important since JDK7 is EOL in April 2015 (two months 
from now). In the past, we've had issues with our dependencies discontinuing 
support for old JDKs, so this will future-proof us.

Between the two, we'll also have quite an opportunity to clean up and upgrade 
our dependencies, another common user and developer request.

I'd like to propose that we start rolling a monthly-ish series of
3.0 alpha releases ASAP, with myself volunteering to take on the RM and other 
cat herding responsibilities. There are already quite a few changes slated for 
3.0 besides the above (for instance the shell script rewrite) so there's 
already value in a 3.0 alpha, and the more time we give downstreams to 
integrate, the better.

This opens up discussion about inclusion of other changes, but I'm hoping to 
freeze incompatible changes after maybe two alphas, do a beta (with no further 
incompat changes allowed), and then finally a 3.x GA. For those keeping track, 
that means a 3.x GA in about four months.

I would also like to stress though that this is not intended to be a big bang 
release. For instance, it would be great if we could maintain wire 
compatibility between 2.x and 3.x, so rolling upgrades work. Keeping
branch-2 and branch-3 similar also makes backports easier, since we're likely 
maintaining 2.x for a while yet.

Please let me know any comments / concerns related to the above. If people are 
friendly to the idea, I'd like to cut a branch-3 and start working on the first 
alpha.

Best,
Andrew


RE: Looking to a Hadoop 3 release

2015-03-02 Thread Liu, Yi A
+1

Regards,
Yi Liu



Re: Looking to a Hadoop 3 release

2015-03-02 Thread Chen He
+1 non-binding

It would be nice to have a Hadoop 3.x release. It's my honor to help.

Regards!

Chen

On Mon, Mar 2, 2015 at 4:58 PM, Zheng, Kai kai.zh...@intel.com wrote:

 Sorry, my bad. I thought I was sending this to my colleagues.

 By the way, for JDK8 support, we (Intel) would like to investigate
 further and help. Thanks.

 Regards,
 Kai

 -----Original Message-----
 From: Zheng, Kai
 Sent: Tuesday, March 03, 2015 8:49 AM
 To: common-dev@hadoop.apache.org; mapreduce-...@hadoop.apache.org;
 hdfs-...@hadoop.apache.org; yarn-...@hadoop.apache.org
 Subject: RE: Looking to a Hadoop 3 release

 JDK8 support is under consideration; it looks like many issues have already
 been reported and resolved.

 https://issues.apache.org/jira/browse/HADOOP-11090





Re: Looking to a Hadoop 3 release

2015-03-02 Thread Arun Murthy
Andrew,

Thanks for bringing up this discussion.

I'm a little puzzled, since I feel like we are rehashing the same discussion from 
last year, where we agreed on a different course of action w.r.t. the switch to 
JDK7.

IAC, breaking compatibility for hadoop-3 is a pretty big cost, particularly 
for users such as Yahoo/Twitter/eBay who have several clusters between which 
compatibility is paramount.

Now, breaking compatibility is perfectly fine over time where there is 
sufficient benefit, e.g. HDFS HA or YARN in hadoop-2 (v/s hadoop-1).

However, I'm struggling to quantify the benefit of hadoop-3 for users against the 
cost of the breakage.

Given that we already agreed to put in JDK7 in 2.7, and that the classpath is 
a fairly minor irritant given some existing solutions (e.g. a new default 
classloader), how do you quantify the benefit for users?

We could just do JDK8 in hadoop-2.10 or some such; you are definitely welcome 
to run the RM role for that release.

Furthermore, I'm really concerned that this will be used as an opportunity to 
further break compat in more egregious ways.

Also, are you foreseeing more compat breaks? OTOH, if we all agree that we 
should absolutely prevent compat breakages such as the client-server wire 
protocol, I feel the point of a major release is kinda lost.

Overall, my biggest concern is the compatibility story vis-a-vis the benefit.

Thoughts?

thanks,
Arun




Re: Looking to a Hadoop 3 release

2015-03-02 Thread Steve Loughran
I'm +1 for a migration to Java 8 as soon as possible.

That's branch-2 and trunk, as having them on the same language level makes 
cherry-picking stuff off trunk possible. That's particularly the case for Java 8, 
as it is the first major change to the language since Java 5.
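The cherry-picking concern can be made concrete. Code written on a Java 8 trunk using lambdas, method references, or streams does not compile on a branch built at the Java 7 language level, so each such backport has to be rewritten by hand. The following is a small illustration with hypothetical names, showing the same logic in both styles.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical example: identical logic written for Java 8 and for Java 7. */
public class Java8Syntax {

    // Requires -source 1.8 or later: lambda, method reference, and streams.
    static List<String> upperCaseNonEmpty(List<String> names) {
        return names.stream()
                .filter(s -> !s.isEmpty())
                .map(String::toUpperCase)
                .collect(Collectors.toList());
    }

    // What a backport to a branch compiled at the Java 7 level would need instead.
    static List<String> upperCaseNonEmptyJava7(List<String> names) {
        List<String> out = new ArrayList<String>();
        for (String s : names) {
            if (!s.isEmpty()) {
                out.add(s.toUpperCase());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> in = Arrays.asList("hdfs", "", "yarn");
        System.out.println(upperCaseNonEmpty(in));      // prints [HDFS, YARN]
        System.out.println(upperCaseNonEmptyJava7(in)); // prints [HDFS, YARN]
    }
}
```

Once branch-2 and trunk share the same language level, the first method can be cherry-picked as-is.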

w.r.t. shipping trunk as 3.x, it's going to take longer than planned. Hopefully 
not as long as the 2.x release process, but you never know. Which means I 
expect some more Hadoop 2 releases this year. We need to make the jump there 
too: get 2.7 out the door and include a roadmap in there for when the switch 
to Java 8+ only happens across the codebase.


-Steve


ps. for anyone who wants a pure Java 8 build today, set -Djavac.version=1.8 on 
the Maven command line. Last time I tried there were some (minor) bits 
of YARN that wouldn't compile...






Re: Looking to a Hadoop 3 release

2015-03-02 Thread Jean-Baptiste Onofré

+1

It sounds like a good idea, especially regarding JDK.

Regards
JB




--
Jean-Baptiste Onofré
jbono...@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com


Re: Looking to a Hadoop 3 release

2015-03-02 Thread Andrew Wang
Thanks as always for the feedback, everyone. Some inline comments on Arun's
email, as his were the most extensive:


> Given that we already agreed to put in JDK7 in 2.7, and that the
> classpath is a fairly minor irritant given some existing solutions (e.g. a
> new default classloader), how do you quantify the benefit for users?

I looked at our thread on this topic from last time, and we (meaning at
least myself and Tucu) agreed to a one-time exception to the JDK7 bump in
2.x for practical reasons. We waited for so long that we had some assurance
JDK6 was on the outs. Multiple distros also already had bumped their min
version to JDK7. This is not true this time around. Bumping the JDK version
is hugely impactful on the end user, and my email on the earlier thread
still reflects my thoughts on JDK compatibility:

http://mail-archives.apache.org/mod_mbox/hadoop-common-dev/201406.mbox/%3CCAGB5D2a5fEDfBApQyER_zyhc8a4Xd_ea1wJSsxxkiAiDZO9%2BNg%40mail.gmail.com%3E

Regarding classpath isolation, based on what I hear from our customers,
it's still a big problem (even after the MR classloader work). The latest
Jackson version bump was quite painful for our downstream projects, and the
HDFS client still leaks a lot of dependencies. I'd welcome more
discussion of this on HADOOP-11656; Steve, Colin, and Haohui have already
chimed in.

Having the freedom to upgrade our dependencies at will would also be a big
win for us as developers.
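The "new default classloader" and the MR classloader work mentioned above rest on the same idea. An application's own jars are searched before delegating to the parent, so its dependency versions (Jackson, for example) do not collide with the ones Hadoop puts on the classpath. The following is a minimal sketch of such a child-first loader. It assumes a simplified rule that only `java.`/`javax.` classes are always taken from the parent. `ChildFirstClassLoader` is a hypothetical name, and this is neither the approach taken in HADOOP-11656 nor Hadoop's own classloader code.

```java
import java.net.URL;
import java.net.URLClassLoader;

/**
 * Hypothetical child-first class loader: classes found in the supplied URLs
 * (for example, an application's own jars) win over the parent's copies.
 */
public class ChildFirstClassLoader extends URLClassLoader {

    public ChildFirstClassLoader(URL[] urls, ClassLoader parent) {
        super(urls, parent);
    }

    /** Simplified assumption: JDK classes must always come from the parent. */
    static boolean isSystemClass(String name) {
        return name.startsWith("java.") || name.startsWith("javax.");
    }

    @Override
    protected Class<?> loadClass(String name, boolean resolve)
            throws ClassNotFoundException {
        synchronized (getClassLoadingLock(name)) {
            // 1. Reuse a class already loaded through this loader.
            Class<?> c = findLoadedClass(name);
            // 2. Try the child's own URLs first, unless it is a system class.
            if (c == null && !isSystemClass(name)) {
                try {
                    c = findClass(name);
                } catch (ClassNotFoundException ignored) {
                    // Not in the child's URLs; fall through to the parent.
                }
            }
            // 3. Fall back to the normal parent-first delegation.
            if (c == null) {
                c = super.loadClass(name, false);
            }
            if (resolve) {
                resolveClass(c);
            }
            return c;
        }
    }
}
```

A caller would wrap the application's jars, e.g. `new ChildFirstClassLoader(new URL[] { appJarUrl }, getClass().getClassLoader())`, where `appJarUrl` is a hypothetical URL of the user's jar. A real implementation needs a configurable list of system classes, since some packages (logging, Hadoop's own APIs) usually have to be shared with the parent.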

> We could just do JDK8 in hadoop-2.10 or some such, you are definitely
> welcome to run the RM role for that release.
>
> Furthermore, I'm really concerned that this will be used as an
> opportunity to further break compat in more egregious ways.
>
> Also, are you foreseeing more compat breaks? OTOH, if we all agree that
> we should absolutely prevent compat breakages such as the client-server
> wire protocol, I feel the point of a major release is kinda lost.


Right now, the incompatible changes would be JDK8, classpath isolation, and
whatever is already in trunk. I can audit these existing trunk changes when
branch-3 is cut.

I would like to keep this list as short as possible, to preserve wire
compat and rolling upgrade. As far as major releases go, this is not one to
be scared of. However, since it's incompatible, it still needs that major
version bump.

Best,
Andrew

P.S. Vinod, the shell script rewrite is incompatible. Allen intentionally
excluded it from branch-2 for this reason.


Re: Looking to a Hadoop 3 release

2015-03-02 Thread Konstantin Shvachko
Andrew,

Hadoop 3 seems in general like a good idea to me.
1. I did not understand whether you propose to release 3.0 instead of 2.7 or in
addition to it.
   I think 2.7 is needed at least as a stabilization step for the 2.x line.

2. If Hadoop 3 and 2.x are meant to exist together, we run the risk of
manifesting split-brain behavior again, as we had with hadoop-1, hadoop-2 and
other versions. Even if that is somehow beneficial for commercial vendors, which
I don't see, for the community it has proven to be very disruptive. It would
be really good to avoid it this time.

3. Could we release Hadoop 3 directly from trunk? With a proper feature
freeze in advance. Current trunk is in the best working condition I've seen
in years, much better than when hadoop-2 was coming to life. It could
make a good alpha.
I believe we can start planning 3.0 from trunk right after 2.7 is out.

Thanks,
--Konst
