Re: [DISCUSS] Branches and versions for Hadoop 3

2017-09-01 Thread Andrew Wang
Hi folks,

We've landed two of our beta1 features, S3Guard and TSv2, into trunk. Jian
just sent out the vote for our remaining beta1 feature, YARN native
services, but I think it's time to branch to unblock the resource profiles
merge to 3.1.

I'll cut just branch-3.0 for now, since we don't have anything urgent that
needs to go into 3.0.0-beta1 vs. 3.0.0 GA.

Cheers,
Andrew

On Tue, Aug 29, 2017 at 11:21 PM, varunsax...@apache.org <
varun.saxena.apa...@gmail.com> wrote:

> Hi Andrew,
>
> We have completed the merge of TSv2 to trunk.
> You can now go ahead with the branching.
>
> Regards,
> Varun Saxena.
>
> On Tue, Aug 29, 2017 at 11:35 PM, Andrew Wang 
> wrote:
>
>> Sure. Ping me when the TSv2 goes in, and I can take care of branching.
>>
>> We're still waiting on the native services and S3Guard merges, but I
>> don't want to hold branching to the last minute.
>>
>> On Tue, Aug 29, 2017 at 10:51 AM, Vrushali C 
>> wrote:
>>
>>> Hi Andrew,
>>> As Rohith mentioned, if you are good with it, from the TSv2 side, we are
>>> ready to go for merge tonight itself (Pacific time)  right after the voting
>>> period ends. Varun Saxena has been diligently rebasing up until now so most
>>> likely our merge should be reasonably straightforward.
>>>
>>> @Wangda: your resource profile vote ends tomorrow, could we please
>>> coordinate our merges?
>>>
>>> thanks
>>> Vrushali
>>>
>>>
>>> On Mon, Aug 28, 2017 at 10:45 PM, Rohith Sharma K S <
>>> rohithsharm...@apache.org> wrote:
>>>
>>>> On 29 August 2017 at 06:24, Andrew Wang wrote:
>>>>
>>>> > So far I've seen no -1's to the branching proposal, so I plan to
>>>> > execute this tomorrow unless there's further feedback.
>>>> >
>>>> For on going branch merge threads i.e TSv2, voting will be closing
>>>> tomorrow. Does it end up in merging into trunk(3.1.0-SNAPSHOT) and
>>>> branch-3.0(3.0.0-beta1-SNAPSHOT) ? If so, would you be able to wait for
>>>> couple of more days before creating branch-3.0 so that TSv2 branch merge
>>>> would be done directly to trunk?
>>>>
>>>> >
>>>> > Regarding the above discussion, I think Jason and I have essentially
>>>> > the same opinion.
>>>> >
>>>> > I hope that keeping trunk a release branch means a higher bar for
>>>> > merges and code review in general. In the past, I've seen some patches
>>>> > committed to trunk-only as a way of passing responsibility to a future
>>>> > user or reviewer. That doesn't help anyone; patches should be committed
>>>> > with the intent of running them in production.
>>>> >
>>>> > I'd also like to repeat the above thanks to the many, many
>>>> > contributors who've helped with release improvements. Allen's work on
>>>> > create-release and automated changes and release notes were essential,
>>>> > as was Xiao's work on LICENSE and NOTICE files. I'm also looking
>>>> > forward to Marton's site improvements, which addresses one of the
>>>> > remaining sore spots in the release process.
>>>> >
>>>> > Things have gotten smoother with each alpha we've done over the last
>>>> > year, and it's a testament to everyone's work that we have a good
>>>> > probability of shipping beta and GA later this year.
>>>> >
>>>> > Cheers,
>>>> > Andrew
>>>> >
>>>>
>>>
>>>
>>
>


YARN javadoc failures Re: [DISCUSS] Branches and versions for Hadoop 3

2017-09-01 Thread Allen Wittenauer

> On Aug 28, 2017, at 9:58 AM, Allen Wittenauer wrote:
>   The automation only goes so far.  At least while investigating Yetus 
> bugs, I've seen more than enough blatant and purposeful ignored errors and 
> warnings that I'm not convinced it will be effective. ("That javadoc compile 
> failure didn't come from my patch!"  Um, yes, yes it did.) PR for features 
> has greatly trumped code correctness for a few years now.


I'm psychic.

Looks like YARN-6877 is crashing JDK8 javadoc.  Maven stops processing 
and errors out before even giving a build error/success. Reverting the patch 
makes things work again. Anyway, Yetus caught it, warned about it continuously, 
but it was still committed.  





Re: [DISCUSS] Branches and versions for Hadoop 3

2017-08-30 Thread varunsax...@apache.org
Hi Andrew,

We have completed the merge of TSv2 to trunk.
You can now go ahead with the branching.

Regards,
Varun Saxena.

On Tue, Aug 29, 2017 at 11:35 PM, Andrew Wang wrote:

> Sure. Ping me when the TSv2 goes in, and I can take care of branching.
>
> We're still waiting on the native services and S3Guard merges, but I don't
> want to hold branching to the last minute.
>
> On Tue, Aug 29, 2017 at 10:51 AM, Vrushali C 
> wrote:
>
>> Hi Andrew,
>> As Rohith mentioned, if you are good with it, from the TSv2 side, we are
>> ready to go for merge tonight itself (Pacific time)  right after the voting
>> period ends. Varun Saxena has been diligently rebasing up until now so most
>> likely our merge should be reasonably straightforward.
>>
>> @Wangda: your resource profile vote ends tomorrow, could we please
>> coordinate our merges?
>>
>> thanks
>> Vrushali
>>
>>
>> On Mon, Aug 28, 2017 at 10:45 PM, Rohith Sharma K S <
>> rohithsharm...@apache.org> wrote:
>>
>>> On 29 August 2017 at 06:24, Andrew Wang 
>>> wrote:
>>>
>>> > So far I've seen no -1's to the branching proposal, so I plan to
>>> execute
>>> > this tomorrow unless there's further feedback.
>>> >
>>> For on going branch merge threads i.e TSv2, voting will be closing
>>> tomorrow. Does it end up in merging into trunk(3.1.0-SNAPSHOT) and
>>> branch-3.0(3.0.0-beta1-SNAPSHOT) ? If so, would you be able to wait for
>>> couple of more days before creating branch-3.0 so that TSv2 branch merge
>>> would be done directly to trunk?
>>>
>>>
>>>
>>> >
>>> > Regarding the above discussion, I think Jason and I have essentially
>>> the
>>> > same opinion.
>>> >
>>> > I hope that keeping trunk a release branch means a higher bar for
>>> merges
>>> > and code review in general. In the past, I've seen some patches
>>> committed
>>> > to trunk-only as a way of passing responsibility to a future user or
>>> > reviewer. That doesn't help anyone; patches should be committed with
>>> the
>>> > intent of running them in production.
>>> >
>>> > I'd also like to repeat the above thanks to the many, many contributors
>>> > who've helped with release improvements. Allen's work on
>>> create-release and
>>> > automated changes and release notes were essential, as was Xiao's work
>>> on
>>> > LICENSE and NOTICE files. I'm also looking forward to Marton's site
>>> > improvements, which addresses one of the remaining sore spots in the
>>> > release process.
>>> >
>>> > Things have gotten smoother with each alpha we've done over the last
>>> year,
>>> > and it's a testament to everyone's work that we have a good
>>> probability of
>>> > shipping beta and GA later this year.
>>> >
>>> > Cheers,
>>> > Andrew
>>> >
>>> >
>>>
>>
>>
>


Re: [DISCUSS] Branches and versions for Hadoop 3

2017-08-29 Thread Andrew Wang
Hi Subru,

Basically we're amending the proposal from the original email in the chain
to also immediately create the branch-3.0.0-beta1 release branch. As
described in my 2017-08-25 wiki update, we're gating the merge of these two
features to branch-3.0 on additional testing, but this keeps 3.0.0 open
for development.

https://cwiki.apache.org/confluence/display/HADOOP/Hadoop+3+release+status+updates

For completeness, here's what our branches and versions would look like:

trunk: 3.1.0-SNAPSHOT
branch-3.0: 3.0.0-SNAPSHOT
branch-3.0.0-beta1: 3.0.0-beta1-SNAPSHOT
branch-2, etc.: remain as is
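
As a rough illustration of the layout above, here is a minimal Python sketch. The branch names and versions are taken from the list; the `branches_for` helper, the `TARGET_DEPTH` table, and the commit-to-trunk-then-backport convention are assumptions added for illustration, not something this thread specifies.

```python
# Branch -> Maven version, copied from the list above.
BRANCH_VERSIONS = {
    "trunk": "3.1.0-SNAPSHOT",
    "branch-3.0": "3.0.0-SNAPSHOT",
    "branch-3.0.0-beta1": "3.0.0-beta1-SNAPSHOT",
}

# Assumption: a change lands on trunk first and is then backported
# down this chain as far as its target release requires.
BACKPORT_ORDER = ["trunk", "branch-3.0", "branch-3.0.0-beta1"]

# Hypothetical table: how far down the chain each target release reaches.
TARGET_DEPTH = {"3.1.0": 1, "3.0.0": 2, "3.0.0-beta1": 3}


def branches_for(target_release):
    """Return the branches a change should land on for a target release."""
    try:
        depth = TARGET_DEPTH[target_release]
    except KeyError:
        raise ValueError(f"unknown target release: {target_release}") from None
    return BACKPORT_ORDER[:depth]


if __name__ == "__main__":
    for target in TARGET_DEPTH:
        landed = ", ".join(
            f"{b} ({BRANCH_VERSIONS[b]})" for b in branches_for(target)
        )
        print(f"{target}: {landed}")
```

Under these assumptions, a feature targeted at 3.0.0 GA lands on trunk and branch-3.0 but stays out of branch-3.0.0-beta1, which matches the intent of cutting the separate beta1 branch.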

Best,
Andrew

On Tue, Aug 29, 2017 at 12:21 PM, Subramaniam V K wrote:

> Andrew,
>
> First up thanks for tirelessly pushing on 3.0 release.
>
> I am confused about your comment on creating 2 branches as my
> understanding of Jason's (and Vinod's) comments are that we defer creating
> branch-3?
>
> IMHO, we should consider creating branch-3 (necessary but not sufficient)
> only when we have:
>
>1. a significant incompatible change.
>2. a new feature that cannot be turned off without affecting core
>components.
>
> In summary, I feel we should follow a lazy rather than eager approach
> towards creating mainline branches.
>
> Thanks,
> Subru
>
>
>
> On Tue, Aug 29, 2017 at 11:45 AM, Wangda Tan  wrote:
>
>> Gotcha, make sense, so I will hold commit until you cut the two branches
>> and TSv2 get committed.
>>
>> Thanks,
>> Wangda
>>
>> On Tue, Aug 29, 2017 at 11:25 AM, Andrew Wang 
>> wrote:
>>
>> > Hi Wangda,
>> >
>> > I'll cut two branches: branch-3.0 (3.0.0-SNAPSHOT) and
>> branch-3.0.0-beta1
>> > (3.0.0-beta1-SNAPSHOT). This way we can merge GA features to branch-3.0
>> but
>> > not branch-3.0.0-beta1.
>> >
>> > Best,
>> > Andrew
>> >
>> > On Tue, Aug 29, 2017 at 11:18 AM, Wangda Tan 
>> wrote:
>> >
>> >> Vrushali,
>> >>
>> >> Sure we can wait TSv2 merged before merge resource profile branch.
>> >>
>> >> Andrew,
>> >>
>> >> My understanding is you're going to cut branch-3.0 for 3.0-beta1, and
>> the
>> >> same branch (branch-3.0) will be used for 3.0-GA as well. So my
>> question
>> >> is, there're several features (TSv2, resource profile, YARN-5734) are
>> >> targeted to merge to 3.0-GA but not 3.0-beta1, which branch we should
>> >> commit to, and when we can commit? Also, similar to 3.0.0-alpha1 to 4,
>> you
>> >> will cut branch-3.0.0-beta1, correct?
>> >>
>> >> Thanks,
>> >> Wangda
>> >>
>> >>
>> >> On Tue, Aug 29, 2017 at 11:05 AM, Andrew Wang <
>> andrew.w...@cloudera.com>
>> >> wrote:
>> >>
>> >>> Sure. Ping me when the TSv2 goes in, and I can take care of branching.
>> >>>
>> >>> We're still waiting on the native services and S3Guard merges, but I
>> >>> don't want to hold branching to the last minute.

Re: [DISCUSS] Branches and versions for Hadoop 3

2017-08-29 Thread Subramaniam V K
Andrew,

First up, thanks for tirelessly pushing on the 3.0 release.

I am confused about your comment on creating 2 branches, as my understanding
of Jason's (and Vinod's) comments is that we defer creating branch-3?

IMHO, we should consider creating branch-3 (necessary but not sufficient)
only when we have:

   1. a significant incompatible change.
   2. a new feature that cannot be turned off without affecting core
   components.

In summary, I feel we should follow a lazy rather than eager approach
towards creating mainline branches.

Thanks,
Subru



On Tue, Aug 29, 2017 at 11:45 AM, Wangda Tan  wrote:

> Gotcha, make sense, so I will hold commit until you cut the two branches
> and TSv2 get committed.
>
> Thanks,
> Wangda
>
> On Tue, Aug 29, 2017 at 11:25 AM, Andrew Wang 
> wrote:
>
> > Hi Wangda,
> >
> > I'll cut two branches: branch-3.0 (3.0.0-SNAPSHOT) and branch-3.0.0-beta1
> > (3.0.0-beta1-SNAPSHOT). This way we can merge GA features to branch-3.0
> but
> > not branch-3.0.0-beta1.
> >
> > Best,
> > Andrew
> >
> > On Tue, Aug 29, 2017 at 11:18 AM, Wangda Tan 
> wrote:
> >
> >> Vrushali,
> >>
> >> Sure we can wait TSv2 merged before merge resource profile branch.
> >>
> >> Andrew,
> >>
> >> My understanding is you're going to cut branch-3.0 for 3.0-beta1, and
> the
> >> same branch (branch-3.0) will be used for 3.0-GA as well. So my question
> >> is, there're several features (TSv2, resource profile, YARN-5734) are
> >> targeted to merge to 3.0-GA but not 3.0-beta1, which branch we should
> >> commit to, and when we can commit? Also, similar to 3.0.0-alpha1 to 4,
> you
> >> will cut branch-3.0.0-beta1, correct?
> >>
> >> Thanks,
> >> Wangda
> >>
> >>
> >> On Tue, Aug 29, 2017 at 11:05 AM, Andrew Wang wrote:
> >>
> >>> Sure. Ping me when the TSv2 goes in, and I can take care of branching.
> >>>
> >>> We're still waiting on the native services and S3Guard merges, but I
> >>> don't want to hold branching to the last minute.


Re: [DISCUSS] Branches and versions for Hadoop 3

2017-08-29 Thread Andrew Wang
Hi Wangda,

I'll cut two branches: branch-3.0 (3.0.0-SNAPSHOT) and branch-3.0.0-beta1
(3.0.0-beta1-SNAPSHOT). This way we can merge GA features to branch-3.0 but
not branch-3.0.0-beta1.

Best,
Andrew

On Tue, Aug 29, 2017 at 11:18 AM, Wangda Tan  wrote:

> Vrushali,
>
> Sure we can wait TSv2 merged before merge resource profile branch.
>
> Andrew,
>
> My understanding is you're going to cut branch-3.0 for 3.0-beta1, and the
> same branch (branch-3.0) will be used for 3.0-GA as well. So my question
> is, there're several features (TSv2, resource profile, YARN-5734) are
> targeted to merge to 3.0-GA but not 3.0-beta1, which branch we should
> commit to, and when we can commit? Also, similar to 3.0.0-alpha1 to 4, you
> will cut branch-3.0.0-beta1, correct?
>
> Thanks,
> Wangda
>
>
> On Tue, Aug 29, 2017 at 11:05 AM, Andrew Wang 
> wrote:
>
>> Sure. Ping me when the TSv2 goes in, and I can take care of branching.
>>
>> We're still waiting on the native services and S3Guard merges, but I
>> don't want to hold branching to the last minute.
>>
>> On Tue, Aug 29, 2017 at 10:51 AM, Vrushali C 
>> wrote:
>>
>>> Hi Andrew,
>>> As Rohith mentioned, if you are good with it, from the TSv2 side, we are
>>> ready to go for merge tonight itself (Pacific time)  right after the voting
>>> period ends. Varun Saxena has been diligently rebasing up until now so most
>>> likely our merge should be reasonably straightforward.
>>>
>>> @Wangda: your resource profile vote ends tomorrow, could we please
>>> coordinate our merges?
>>>
>>> thanks
>>> Vrushali
>>>
>>>
>>> On Mon, Aug 28, 2017 at 10:45 PM, Rohith Sharma K S <
>>> rohithsharm...@apache.org> wrote:
>>>
>>>> On 29 August 2017 at 06:24, Andrew Wang wrote:
>>>>
>>>> > So far I've seen no -1's to the branching proposal, so I plan to
>>>> > execute this tomorrow unless there's further feedback.
>>>> >
>>>> For on going branch merge threads i.e TSv2, voting will be closing
>>>> tomorrow. Does it end up in merging into trunk(3.1.0-SNAPSHOT) and
>>>> branch-3.0(3.0.0-beta1-SNAPSHOT) ? If so, would you be able to wait for
>>>> couple of more days before creating branch-3.0 so that TSv2 branch merge
>>>> would be done directly to trunk?
>>>>
>>>> >
>>>> > Regarding the above discussion, I think Jason and I have essentially
>>>> > the same opinion.
>>>> >
>>>> > I hope that keeping trunk a release branch means a higher bar for
>>>> > merges and code review in general. In the past, I've seen some patches
>>>> > committed to trunk-only as a way of passing responsibility to a future
>>>> > user or reviewer. That doesn't help anyone; patches should be committed
>>>> > with the intent of running them in production.
>>>> >
>>>> > I'd also like to repeat the above thanks to the many, many
>>>> > contributors who've helped with release improvements. Allen's work on
>>>> > create-release and automated changes and release notes were essential,
>>>> > as was Xiao's work on LICENSE and NOTICE files. I'm also looking
>>>> > forward to Marton's site improvements, which addresses one of the
>>>> > remaining sore spots in the release process.
>>>> >
>>>> > Things have gotten smoother with each alpha we've done over the last
>>>> > year, and it's a testament to everyone's work that we have a good
>>>> > probability of shipping beta and GA later this year.
>>>> >
>>>> > Cheers,
>>>> > Andrew
>>>> >
>>>>
>>>
>>>
>>
>


Re: [DISCUSS] Branches and versions for Hadoop 3

2017-08-29 Thread Wangda Tan
Vrushali,

Sure, we can wait for TSv2 to be merged before merging the resource profile
branch.

Andrew,

My understanding is you're going to cut branch-3.0 for 3.0-beta1, and the
same branch (branch-3.0) will be used for 3.0-GA as well. So my question
is: several features (TSv2, resource profile, YARN-5734) are targeted to
merge to 3.0-GA but not 3.0-beta1, so which branch should we commit to, and
when can we commit? Also, similar to 3.0.0-alpha1 to alpha4, you will cut
branch-3.0.0-beta1, correct?

Thanks,
Wangda


On Tue, Aug 29, 2017 at 11:05 AM, Andrew Wang wrote:

> Sure. Ping me when the TSv2 goes in, and I can take care of branching.
>
> We're still waiting on the native services and S3Guard merges, but I don't
> want to hold branching to the last minute.
>
> On Tue, Aug 29, 2017 at 10:51 AM, Vrushali C 
> wrote:
>
>> Hi Andrew,
>> As Rohith mentioned, if you are good with it, from the TSv2 side, we are
>> ready to go for merge tonight itself (Pacific time)  right after the voting
>> period ends. Varun Saxena has been diligently rebasing up until now so most
>> likely our merge should be reasonably straightforward.
>>
>> @Wangda: your resource profile vote ends tomorrow, could we please
>> coordinate our merges?
>>
>> thanks
>> Vrushali
>>
>>
>> On Mon, Aug 28, 2017 at 10:45 PM, Rohith Sharma K S <
>> rohithsharm...@apache.org> wrote:
>>
>>> On 29 August 2017 at 06:24, Andrew Wang 
>>> wrote:
>>>
>>> > So far I've seen no -1's to the branching proposal, so I plan to
>>> execute
>>> > this tomorrow unless there's further feedback.
>>> >
>>> For on going branch merge threads i.e TSv2, voting will be closing
>>> tomorrow. Does it end up in merging into trunk(3.1.0-SNAPSHOT) and
>>> branch-3.0(3.0.0-beta1-SNAPSHOT) ? If so, would you be able to wait for
>>> couple of more days before creating branch-3.0 so that TSv2 branch merge
>>> would be done directly to trunk?
>>>
>>>
>>>
>>> >
>>> > Regarding the above discussion, I think Jason and I have essentially
>>> the
>>> > same opinion.
>>> >
>>> > I hope that keeping trunk a release branch means a higher bar for
>>> merges
>>> > and code review in general. In the past, I've seen some patches
>>> committed
>>> > to trunk-only as a way of passing responsibility to a future user or
>>> > reviewer. That doesn't help anyone; patches should be committed with
>>> the
>>> > intent of running them in production.
>>> >
>>> > I'd also like to repeat the above thanks to the many, many contributors
>>> > who've helped with release improvements. Allen's work on
>>> create-release and
>>> > automated changes and release notes were essential, as was Xiao's work
>>> on
>>> > LICENSE and NOTICE files. I'm also looking forward to Marton's site
>>> > improvements, which addresses one of the remaining sore spots in the
>>> > release process.
>>> >
>>> > Things have gotten smoother with each alpha we've done over the last
>>> year,
>>> > and it's a testament to everyone's work that we have a good
>>> probability of
>>> > shipping beta and GA later this year.
>>> >
>>> > Cheers,
>>> > Andrew
>>> >
>>> >
>>>
>>
>>
>


Re: [DISCUSS] Branches and versions for Hadoop 3

2017-08-29 Thread Andrew Wang
Sure. Ping me when the TSv2 goes in, and I can take care of branching.

We're still waiting on the native services and S3Guard merges, but I don't
want to hold branching to the last minute.

On Tue, Aug 29, 2017 at 10:51 AM, Vrushali C wrote:

> Hi Andrew,
> As Rohith mentioned, if you are good with it, from the TSv2 side, we are
> ready to go for merge tonight itself (Pacific time)  right after the voting
> period ends. Varun Saxena has been diligently rebasing up until now so most
> likely our merge should be reasonably straightforward.
>
> @Wangda: your resource profile vote ends tomorrow, could we please
> coordinate our merges?
>
> thanks
> Vrushali
>
>
> On Mon, Aug 28, 2017 at 10:45 PM, Rohith Sharma K S <
> rohithsharm...@apache.org> wrote:
>
>> On 29 August 2017 at 06:24, Andrew Wang  wrote:
>>
>> > So far I've seen no -1's to the branching proposal, so I plan to execute
>> > this tomorrow unless there's further feedback.
>> >
>> For on going branch merge threads i.e TSv2, voting will be closing
>> tomorrow. Does it end up in merging into trunk(3.1.0-SNAPSHOT) and
>> branch-3.0(3.0.0-beta1-SNAPSHOT) ? If so, would you be able to wait for
>> couple of more days before creating branch-3.0 so that TSv2 branch merge
>> would be done directly to trunk?
>>
>>
>>
>> >
>> > Regarding the above discussion, I think Jason and I have essentially the
>> > same opinion.
>> >
>> > I hope that keeping trunk a release branch means a higher bar for merges
>> > and code review in general. In the past, I've seen some patches
>> committed
>> > to trunk-only as a way of passing responsibility to a future user or
>> > reviewer. That doesn't help anyone; patches should be committed with the
>> > intent of running them in production.
>> >
>> > I'd also like to repeat the above thanks to the many, many contributors
>> > who've helped with release improvements. Allen's work on create-release
>> and
>> > automated changes and release notes were essential, as was Xiao's work
>> on
>> > LICENSE and NOTICE files. I'm also looking forward to Marton's site
>> > improvements, which addresses one of the remaining sore spots in the
>> > release process.
>> >
>> > Things have gotten smoother with each alpha we've done over the last
>> year,
>> > and it's a testament to everyone's work that we have a good probability
>> of
>> > shipping beta and GA later this year.
>> >
>> > Cheers,
>> > Andrew
>> >
>> >
>>
>
>


Re: [DISCUSS] Branches and versions for Hadoop 3

2017-08-29 Thread Vrushali C
Hi Andrew,
As Rohith mentioned, if you are good with it, from the TSv2 side, we are
ready to go for merge tonight itself (Pacific time)  right after the voting
period ends. Varun Saxena has been diligently rebasing up until now so most
likely our merge should be reasonably straightforward.

@Wangda: your resource profile vote ends tomorrow, could we please
coordinate our merges?

thanks
Vrushali


On Mon, Aug 28, 2017 at 10:45 PM, Rohith Sharma K S <
rohithsharm...@apache.org> wrote:

> On 29 August 2017 at 06:24, Andrew Wang  wrote:
>
> > So far I've seen no -1's to the branching proposal, so I plan to execute
> > this tomorrow unless there's further feedback.
> >
> For on going branch merge threads i.e TSv2, voting will be closing
> tomorrow. Does it end up in merging into trunk(3.1.0-SNAPSHOT) and
> branch-3.0(3.0.0-beta1-SNAPSHOT) ? If so, would you be able to wait for
> couple of more days before creating branch-3.0 so that TSv2 branch merge
> would be done directly to trunk?
>
>
>
> >
> > Regarding the above discussion, I think Jason and I have essentially the
> > same opinion.
> >
> > I hope that keeping trunk a release branch means a higher bar for merges
> > and code review in general. In the past, I've seen some patches committed
> > to trunk-only as a way of passing responsibility to a future user or
> > reviewer. That doesn't help anyone; patches should be committed with the
> > intent of running them in production.
> >
> > I'd also like to repeat the above thanks to the many, many contributors
> > who've helped with release improvements. Allen's work on create-release
> and
> > automated changes and release notes were essential, as was Xiao's work on
> > LICENSE and NOTICE files. I'm also looking forward to Marton's site
> > improvements, which addresses one of the remaining sore spots in the
> > release process.
> >
> > Things have gotten smoother with each alpha we've done over the last
> year,
> > and it's a testament to everyone's work that we have a good probability
> of
> > shipping beta and GA later this year.
> >
> > Cheers,
> > Andrew
> >
> >
>


Re: [DISCUSS] Branches and versions for Hadoop 3

2017-08-28 Thread Rohith Sharma K S
On 29 August 2017 at 06:24, Andrew Wang  wrote:

> So far I've seen no -1's to the branching proposal, so I plan to execute
> this tomorrow unless there's further feedback.
>
For ongoing branch merge threads, i.e. TSv2, voting will be closing
tomorrow. Does it end up merging into both trunk (3.1.0-SNAPSHOT) and
branch-3.0 (3.0.0-beta1-SNAPSHOT)? If so, would you be able to wait a
couple more days before creating branch-3.0, so that the TSv2 branch merge
can be done directly to trunk?



>
> Regarding the above discussion, I think Jason and I have essentially the
> same opinion.
>
> I hope that keeping trunk a release branch means a higher bar for merges
> and code review in general. In the past, I've seen some patches committed
> to trunk-only as a way of passing responsibility to a future user or
> reviewer. That doesn't help anyone; patches should be committed with the
> intent of running them in production.
>
> I'd also like to repeat the above thanks to the many, many contributors
> who've helped with release improvements. Allen's work on create-release and
> automated changes and release notes were essential, as was Xiao's work on
> LICENSE and NOTICE files. I'm also looking forward to Marton's site
> improvements, which addresses one of the remaining sore spots in the
> release process.
>
> Things have gotten smoother with each alpha we've done over the last year,
> and it's a testament to everyone's work that we have a good probability of
> shipping beta and GA later this year.
>
> Cheers,
> Andrew
>
>


Re: [DISCUSS] Branches and versions for Hadoop 3

2017-08-28 Thread Andrew Wang
So far I've seen no -1's to the branching proposal, so I plan to execute
this tomorrow unless there's further feedback.

Regarding the above discussion, I think Jason and I have essentially the
same opinion.

I hope that keeping trunk a release branch means a higher bar for merges
and code review in general. In the past, I've seen some patches committed
to trunk-only as a way of passing responsibility to a future user or
reviewer. That doesn't help anyone; patches should be committed with the
intent of running them in production.

I'd also like to repeat the above thanks to the many, many contributors
who've helped with release improvements. Allen's work on create-release and
automated changes and release notes was essential, as was Xiao's work on
LICENSE and NOTICE files. I'm also looking forward to Marton's site
improvements, which address one of the remaining sore spots in the
release process.

Things have gotten smoother with each alpha we've done over the last year,
and it's a testament to everyone's work that we have a good probability of
shipping beta and GA later this year.

Cheers,
Andrew

On Mon, Aug 28, 2017 at 3:48 PM, Colin McCabe wrote:

> On Mon, Aug 28, 2017, at 14:22, Allen Wittenauer wrote:
> >
> > > On Aug 28, 2017, at 12:41 PM, Jason Lowe wrote:
> > >
> > > I think this gets back to the "if it's worth committing" part.
> >
> >   This brings us back to my original question:
> >
> >   "Doesn't this place an undue burden on the contributor with the
> >   first incompatible patch to prove worthiness?  What happens if it is
> >   decided that it's not good enough?"
>
> I feel like this line of argument is flawed by definition.  "What
> happens if the patch isn't worth breaking compatibility over"?  Then we
> shouldn't break compatibility over it.  We all know that most
> compatibility breaks are avoidable with enough effort.  And it's an
> effort we should make, for the good of our users.
>
> Most useful features can be implemented without compatibility breaks.
> And for the few that truly can't, the community should surely agree that
> it's worth breaking compatibility before we do it.  If it's a really
> cool feature, that approval will surely not be hard to get (I'm tempted
> to quote your earlier email about how much we love features...)
>
> >
> >   The answer, if I understand your position, is then at least a
> >   maybe leaning towards yes: a patch that prior to this branching policy
> >   change that would have gone in without any notice now has a higher burden
> >   (i.e., major feature) to prove worthiness ... and in the process eliminates
> >   a whole class of contributors and empowers others. Thus my concern ...
> >
> > > As you mentioned, people are already breaking compatibility left and
> > > right as it is, which is why I wondered if it was really any better in
> > > practice.  Personally I'd rather find out about a major breakage sooner
> > > than later, since if trunk remains an active area of development at all
> > > times it's more likely the community will sit up and take notice when
> > > something crazy goes in.  In the past, trunk was not really an actively
> > > deployed area for over 5 years, and all sorts of stuff went in without
> > > people really being aware of it.
> >
> >   Given the general acknowledgement that the compatibility
> >   guidelines are mostly useless in reality, maybe the answer is really that
> >   we're doing releases all wrong.  Would it necessarily be a bad thing if we
> >   moved to a model where incompatible changes gradually released instead of
> >   one big one every seven?
>
> I haven't seen anyone "acknowledge that... compatibility guidelines are
> mostly useless"... even you.  Reading your posts from the past, I don't
> get that impression.  On the contrary, you are often upset about
> compatibility breakages.
>
> What would be positive about allowing compatibility breaks in minor
> releases?  Can you give a specific example of what would be improved?
>
> >
> >   Yes, I lived through the "walking on glass" days at Yahoo! and
> >   realize what I'm saying.  But I also think the rate of incompatible changes
> >   has slowed tremendously.  Entire groups of APIs aren't getting tossed out
> >   every week anymore.
> >
> > > It sounds like we agree on that part but disagree on the specifics of
> > > how to help trunk remain active.
> >
> >   Yup, and there is nothing wrong with that. ;)
> >
> > > Given that historically trunk has languished for years I was hoping
> > > this proposal would help reduce the likelihood of it happening again.  If
> > > we eventually decide that cutting branch-3 now makes more sense then I'll
> > > do what I can to make that work well, but it would be good to see concrete
> > > proposals on how to avoid the problems we had with it over the last 6 years.
> >
> >
> >   Yup, agree. But proposals rarely seem to get much actual traction.
> >   (It's kind of fun reading the Hadoop bylaws and compatibility guidelines
> >   and old [VOTE] threads to realize 

Re: [DISCUSS] Branches and versions for Hadoop 3

2017-08-28 Thread Colin McCabe
On Mon, Aug 28, 2017, at 14:22, Allen Wittenauer wrote:
> 
> > On Aug 28, 2017, at 12:41 PM, Jason Lowe  wrote:
> > 
> > I think this gets back to the "if it's worth committing" part.
> 
>   This brings us back to my original question:
> 
>   "Doesn't this place an undue burden on the contributor with the first 
> incompatible patch to prove worthiness?  What happens if it is decided that 
> it's not good enough?"

I feel like this line of argument is flawed by definition.  "What
happens if the patch isn't worth breaking compatibility over"?  Then we
shouldn't break compatibility over it.  We all know that most
compatibility breaks are avoidable with enough effort.  And it's an
effort we should make, for the good of our users.

Most useful features can be implemented without compatibility breaks. 
And for the few that truly can't, the community should surely agree that
it's worth breaking compatibility before we do it.  If it's a really
cool feature, that approval will surely not be hard to get (I'm tempted
to quote your earlier email about how much we love features...)

> 
>   The answer, if I understand your position, is then at least a maybe 
> leaning towards yes: a patch that prior to this branching policy change that  
> would have gone in without any notice now has a higher burden (i.e., major 
> feature) to prove worthiness ... and in the process eliminates a whole class 
> of contributors and empowers others. Thus my concern ...
> 
> > As you mentioned, people are already breaking compatibility left and right 
> > as it is, which is why I wondered if it was really any better in practice.  
> > Personally I'd rather find out about a major breakage sooner than later, 
> > since if trunk remains an active area of development at all times it's more 
> > likely the community will sit up and take notice when something crazy goes 
> > in.  In the past, trunk was not really an actively deployed area for over 5 
> > years, and all sorts of stuff went in without people really being aware of 
> > it.
> 
>   Given the general acknowledgement that the compatibility guidelines are 
> mostly useless in reality, maybe the answer is really that we're doing 
> releases all wrong.  Would it necessarily be a bad thing if we moved to a 
> model where incompatible changes gradually released instead of one big one 
> every seven?

I haven't seen anyone "acknowledge that... compatibility guidelines are
mostly useless"... even you.  Reading your posts from the past, I don't
get that impression.  On the contrary, you are often upset about
compatibility breakages.

What would be positive about allowing compatibility breaks in minor
releases?  Can you give a specific example of what would be improved?

> 
>   Yes, I lived through the "walking on glass" days at Yahoo! and realize 
> what I'm saying.  But I also think the rate of incompatible changes has 
> slowed tremendously.  Entire groups of APIs aren't getting tossed out every 
> week anymore.
> 
> > It sounds like we agree on that part but disagree on the specifics of how 
> > to help trunk remain active.
> 
>   Yup, and there is nothing wrong with that. ;)
> 
> >  Given that historically trunk has languished for years I was hoping this 
> > proposal would help reduce the likelihood of it happening again.  If we 
> > eventually decide that cutting branch-3 now makes more sense then I'll do 
> > what I can to make that work well, but it would be good to see concrete 
> > proposals on how to avoid the problems we had with it over the last 6 years.
> 
> 
>   Yup, agree. But proposals rarely seem to get much actual traction. 
> (It's kind of fun reading the Hadoop bylaws and compatibility guidelines and 
> old [VOTE] threads to realize how much stuff doesn't actually happen despite 
> everyone generally agree that abc is a good idea.)  To circle back a bit, I 
> do also agree that automation has a role to play
> 
>Before anyone can accuse or imply me of being a hypocrite (and I'm 
> sure someone eventually will privately if not publicly), I'm sure some folks 
> don't realize I've been working on this set of problems from a different 
> angle for the past few years.
> 
>   There are a handful of people that know I was going to attempt to do a 
> 3.x release a few years ago. [Andrew basically beat me to it. :) ] But I ran 
> into the release process.  What a mess.  Way too much manual work, lots of 
> undocumented bits, violation of ASF rules(!) , etc, etc.  We've all heard the 
> complaints.
> 
>   My hypothesis:  if the release process itself is easier, then getting a 
> release based on trunk is easier too. The more we automate, the more 
> non-vendors ("non traditional release managers"?) will be willing to roll 
> releases.  The more people that feel comfortable rolling a release, the more 
> likelihood releases will happen.  The more likelihood of releases happening, 
> the greater chance trunk had of 

Re: [DISCUSS] Branches and versions for Hadoop 3

2017-08-28 Thread Allen Wittenauer

> On Aug 28, 2017, at 12:41 PM, Jason Lowe  wrote:
> 
> I think this gets back to the "if it's worth committing" part.

This brings us back to my original question:

"Doesn't this place an undue burden on the contributor with the first 
incompatible patch to prove worthiness?  What happens if it is decided that 
it's not good enough?"

The answer, if I understand your position, is then at least a maybe 
leaning towards yes: a patch that prior to this branching policy change 
would have gone in without any notice now has a higher burden (i.e., major 
feature) to prove worthiness ... and in the process eliminates a whole class of 
contributors and empowers others. Thus my concern ...

> As you mentioned, people are already breaking compatibility left and right as 
> it is, which is why I wondered if it was really any better in practice.  
> Personally I'd rather find out about a major breakage sooner than later, 
> since if trunk remains an active area of development at all times it's more 
> likely the community will sit up and take notice when something crazy goes 
> in.  In the past, trunk was not really an actively deployed area for over 5 
> years, and all sorts of stuff went in without people really being aware of it.

Given the general acknowledgement that the compatibility guidelines are 
mostly useless in reality, maybe the answer is really that we're doing releases 
all wrong.  Would it necessarily be a bad thing if we moved to a model where 
incompatible changes are gradually released instead of one big one every seven?

Yes, I lived through the "walking on glass" days at Yahoo! and realize 
what I'm saying.  But I also think the rate of incompatible changes has slowed 
tremendously.  Entire groups of APIs aren't getting tossed out every week 
anymore.

> It sounds like we agree on that part but disagree on the specifics of how to 
> help trunk remain active.

Yup, and there is nothing wrong with that. ;)

>  Given that historically trunk has languished for years I was hoping this 
> proposal would help reduce the likelihood of it happening again.  If we 
> eventually decide that cutting branch-3 now makes more sense then I'll do 
> what I can to make that work well, but it would be good to see concrete 
> proposals on how to avoid the problems we had with it over the last 6 years.


Yup, agree. But proposals rarely seem to get much actual traction. 
(It's kind of fun reading the Hadoop bylaws and compatibility guidelines and 
old [VOTE] threads to realize how much stuff doesn't actually happen despite 
everyone generally agreeing that abc is a good idea.)  To circle back a bit, I do 
also agree that automation has a role to play.

Before anyone can accuse me of being a hypocrite, or imply it (and I'm 
sure someone eventually will privately if not publicly), I'm sure some folks 
don't realize I've been working on this set of problems from a different angle 
for the past few years.

There are a handful of people that know I was going to attempt to do a 
3.x release a few years ago. [Andrew basically beat me to it. :) ] But I ran 
into the release process.  What a mess.  Way too much manual work, lots of 
undocumented bits, violation of ASF rules(!) , etc, etc.  We've all heard the 
complaints.

My hypothesis:  if the release process itself is easier, then getting a 
release based on trunk is easier too. The more we automate, the more 
non-vendors ("non traditional release managers"?) will be willing to roll 
releases.  The more people that feel comfortable rolling a release, the more 
likely it is that releases will happen.  The more likely releases are, 
the greater the chance trunk has of getting out the door.

That turned into years' worth of fixing and automating lots of stuff 
that was continually complained about but never fixed:  release notes, 
changes.txt, chunks of the build process, chunks of the release tarball 
process, fixing consistency, etc.  Some of that became a part of Yetus, some of 
it didn't.  Some of that work leaked into branch-2 at some point. Many probably 
don't know why this stuff was happening.  Then there were the people that 
claimed I was "wasting my time" and that I should be focusing on "more 
important" things.  (Press release features, I'm assuming.)

So, yes, I'd like to see proposals, but I'd also like to challenge the 
community at large to spend more time on these build processes.  There's a 
tremendous amount of cruft and our usage of Maven is still nearly primordial in 
implementation. (Shout out to Marton Elek who has some great although ambitious 
ideas.)  

Also kudos to Andrew for putting create-release and a lot of my other 
changes through their paces in the early days.  When he publicly stepped up to 
do the release, I don't know if he realized what he was walking into... 

Re: [DISCUSS] Branches and versions for Hadoop 3

2017-08-28 Thread Vinod Kumar Vavilapalli
+1 to Andrew’s proposal for 3.x releases.

We had fairly elaborate threads on this branching & compatibility topic before. 
One of them’s here: [1]

+1 to what Jason said.
 (a) Incompatible changes are not to be treated lightly.  We need to stop 
breaking stuff and ‘just dumping it on trunk’.
 (b) Major versions are expensive. We should hesitate before asking our users 
to move from 2.0 to 3.0 or 3.0 to 4.0 (with incompatible changes) *without* any 
other major value proposition.

Some of the incompatible changes can clearly wait while others cannot and so may 
mandate a major release. What are some of the common types of incompatible 
changes?
 - Renaming APIs, removing deprecated APIs, renaming configuration properties, 
changing the default value of a configuration, changing shell output / logging 
etc:
   Today, we do this on trunk even though the actual effort involved is 
minimal compared to the overhead it forces in maintaining an incompatible trunk.
 - Dependency library updates - updating guava, protobuf etc in Hadoop breaks 
downstream applications. I am assuming Classpath Isolation [2] is still a 
blocker for 3.0 GA.
 - JDK upgrades: We tried two different ways with JDK 7 and JDK 8, we need a 
formal policy on this.

If we can manage the above common breaking changes, we can cause less pain to 
our end users.

Here’s what we can do for 3.x / 4.x specifically.
 - Stay on trunk based 3.x releases
 - Avoid all incompatible changes as much as possible
 - If we run into a bunch of minor incompatible changes that have to be done, we 
either (a) make the incompatible behavior optional or (b) just park them, say 
with a parked-incompatible-change label, if making it optional is not possible
 - We create a 4.0 only when (a) we hit the first major incompatible change 
because a major next-step for Hadoop needs it (for e.g. Erasure Coding), and/or 
(b) the number of parked incompatible changes passes a certain threshold. 
Unlike Jason, I don’t see the threshold to be 1 for cases that don’t fit (a).
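
The 4.0 rule in the last bullet above can be sketched as a small decision 
function. This is only an illustration under stated assumptions: the names 
should_cut_major, needs_incompatible_major_feature and parked_threshold are 
hypothetical, and nothing in this thread fixes a threshold value.

```python
def should_cut_major(needs_incompatible_major_feature: bool,
                     parked_incompatible_changes: int,
                     parked_threshold: int) -> bool:
    """Return True when the proposed conditions for a new major line hold.

    (a) a major next-step feature requires an incompatible change, and/or
    (b) the number of parked incompatible changes passes a threshold.

    All names are hypothetical; the threshold is a policy knob the
    community would have to choose.
    """
    if parked_threshold < 1:
        raise ValueError("parked_threshold must be at least 1")
    # "Passes a threshold" is read here as strictly greater than it.
    return (needs_incompatible_major_feature
            or parked_incompatible_changes > parked_threshold)
```

With a threshold of 10, for example, ten parked changes alone would not 
trigger a 4.0, while an eleventh would, as would a single major feature 
that cannot be done compatibly.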

References
 [1] Looking to a Hadoop 3 release: 
http://markmail.org/thread/2daldggjaeewdmdf#query:+page:1+mid:m6x73t6srlchywsn+state:results
 [2] Classpath isolation for downstream client: 
https://issues.apache.org/jira/browse/HADOOP-11656


Thanks
+Vinod

> On Aug 25, 2017, at 1:23 PM, Jason Lowe  wrote:
> 
> Allen Wittenauer wrote:
> 
> 
>> Doesn't this place an undue burden on the contributor with the first
>> incompatible patch to prove worthiness?  What happens if it is decided that
>> it's not good enough?
> 
> 
> It is a burden for that first, "this can't go anywhere else but 4.x"
> change, but arguably that should not be a change done lightly anyway.  (Or
> any other backwards-incompatible change for that matter.)  If it's worth
> committing then I think it's perfectly reasonable to send out the dev
> announce that there's reason for trunk to diverge from 3.x, cut branch-3,
> and move on.  This is no different than Andrew's recent announcement that
> there's now a need for separating trunk and the 3.0 line based on what's
> about to go in.
> 
> I do not think it makes sense to pay for the maintenance overhead of two
> nearly-identical lines with no backwards-incompatible changes between them
> until we have the need.  Otherwise if past trunk behavior is any
> indication, it ends up mostly enabling people to commit to just trunk,
> forgetting that the thing they are committing is perfectly valid for
> branch-3.  If we can agree that trunk and branch-3 should be equivalent
> until an incompatible change goes into trunk, why pay for the commit
> overhead and potential for accidentally missed commits until it is really
> necessary?
> 
> How many will it take before the dam will break?  Or is there a timeline
>> going to be given before trunk gets set to 4.x?
> 
> 
> I think the threshold count for the dam should be 1.  As soon as we have a
> JIRA that needs to be committed to move the project forward and we cannot
> ship it in a 3.x release then we create branch-3 and move trunk to 4.x.
> As for a timeline going to 4.x, again I don't see it so much as a "baking
> period" as a "when we need it" criteria.  If we need it in a week then we
> should cut it in a week.  Or a year then a year.  It all depends upon when
> that 4.x-only change is ready to go in.
> 
> Given the number of committers that openly ignore discussions like this,
>> who is going to verify that incompatible changes don't get in?
>> 
> 
> The same entities who are verifying other bugs don't get in, i.e.: the
> committers and the Hadoop QA bot running the tests.  Yes, I know that means
> it's inevitable that compatibility breakages will happen, and we can and
> should improve the automation around compatibility testing when possible.
> But I don't think there's a magic 

Re: [DISCUSS] Branches and versions for Hadoop 3

2017-08-28 Thread Jason Lowe
Allen Wittenauer wrote:


> > On Aug 25, 2017, at 1:23 PM, Jason Lowe  wrote:
> >
> > Allen Wittenauer wrote:
> >
> > > Doesn't this place an undue burden on the contributor with the first
> incompatible patch to prove worthiness?  What happens if it is decided that
> it's not good enough?
> >
> > It is a burden for that first, "this can't go anywhere else but 4.x"
> change, but arguably that should not be a change done lightly anyway.  (Or
> any other backwards-incompatible change for that matter.)  If it's worth
> committing then I think it's perfectly reasonable to send out the dev
> announce that there's reason for trunk to diverge from 3.x, cut branch-3,
> and move on.  This is no different than Andrew's recent announcement that
> there's now a need for separating trunk and the 3.0 line based on what's
> about to go in.
>
> So, by this definition as soon as a patch comes in to remove
> deprecated bits there will be no issue with a branch-3 getting created,
> correct?
>

I think this gets back to the "if it's worth committing" part.  I feel the
community should collectively decide when it's worth taking the hit to
maintain the separate code line.  IMHO removing deprecated bits alone is
not reason enough to diverge the code base and the additional maintenance
that comes along with the extra code line.  A new feature is traditionally
the reason to diverge because that's something users would actually care
enough about to take the compatibility hit when moving to the version that
has it.  That also helps drive a timely release of the new code line
because users want the feature that went into it.


> >  Otherwise if past trunk behavior is any indication, it ends up mostly
> enabling people to commit to just trunk, forgetting that the thing they are
> committing is perfectly valid for branch-3.
>
> I'm not sure there was any "forgetting" involved.  We likely
> wouldn't be talking about 3.x at all if it wasn't for the code diverging
> enough.
>

I don't think it was the myriad of small patches that went only into trunk
over the last 6 years that drove this.  Instead I think it was simply that
an "important enough" feature went in, like erasure coding, that gathered
momentum behind this release.  Trunk sat ignored for basically 5+ years,
and plenty of patches went into just trunk that should have gone into at
least branch-2 as well.  I don't think we as a community did the
contributors any favors by putting their changes into a code line that
didn't see a release for a very long time.  Yes 3.x could have released
sooner to help solve that issue, but given the complete lack of excitement
around 3.x until just recently is there any reason this won't happen again
with 4.x?  Seems to me 4.x will need to have something "interesting enough"
to drive people to release it relative to 3.x, which to me indicates we
shouldn't commit things only to there until we have an interest to do so.

> > Given the number of committers that openly ignore discussions like
> this, who is going to verify that incompatible changes don't get in?
> >
> > The same entities who are verifying other bugs don't get in, i.e.: the
> committers and the Hadoop QA bot running the tests.
> >  Yes, I know that means it's inevitable that compatibility breakages
> will happen, and we can and should improve the automation around
> compatibility testing when possible.
>
> The automation only goes so far.  At least while investigating
> Yetus bugs, I've seen more than enough blatant and purposeful ignored
> errors and warnings that I'm not convinced it will be effective. ("That
> javadoc compile failure didn't come from my patch!"  Um, yes, yes it did.)
> PR for features has greatly trumped code correctness for a few years now.
>

I totally agree here.  We can and should do better about this outside of
automation.  I brought up automation since I see it as a useful part of the
total solution along with better developer education, oversight, etc.  I'm
thinking specifically about tools that can report on public API signature
changes, but that's just one aspect of compatibility.  Semantic behavior is
not something a static analysis tool can automatically detect, and the only
way to automate some of that is something like end-to-end compatibility
testing.  Bigtop may cover some of this with testing of older versions of
downstream projects like HBase, Hive, Oozie, etc., and we could set up some
tests that stand up two different Hadoop clusters and run tests that verify
interop between them.  But the tests will never be exhaustive and we will
still need educated committers and oversight to fill in the gaps.
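
As a rough sketch of the kind of tool meant above for reporting public API
signature changes, the following compares two snapshots of an API. It assumes
each snapshot has already been extracted into a mapping from a member name to
its signature string; how that extraction is done, and every name in the
example, is hypothetical. It catches only signature-level changes, not the
semantic behavior changes discussed above.

```python
from typing import Dict, List


def diff_public_api(old: Dict[str, str],
                    new: Dict[str, str]) -> Dict[str, List[str]]:
    """Compare two {member name: signature} snapshots of a public API."""
    removed = sorted(name for name in old if name not in new)
    changed = sorted(name for name in old
                     if name in new and old[name] != new[name])
    added = sorted(name for name in new if name not in old)
    return {"removed": removed, "changed": changed, "added": added}


def is_incompatible(report: Dict[str, List[str]]) -> bool:
    """Removals and signature changes break callers; additions do not."""
    return bool(report["removed"] or report["changed"])


# Hypothetical snapshots of two releases of some client API.
old_api = {"Client.open": "(path: str) -> Stream",
           "Client.delete": "(path: str) -> bool"}
new_api = {"Client.open": "(path: str, buffer: int) -> Stream",
           "Client.rename": "(src: str, dst: str) -> bool"}
report = diff_public_api(old_api, new_api)
```

Here the report would flag Client.delete as removed and Client.open as
changed, so is_incompatible(report) is true even though nothing about runtime
behavior was examined.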

>  But I don't think there's a magic bullet for preventing all
> compatibility bugs from being introduced, just like there isn't one for
> preventing general bugs.  Does having a trunk branch separate but
> essentially similar to branch-3 make this any better?
>
> Yes: it's been the process for over a decade 

Re: [DISCUSS] Branches and versions for Hadoop 3

2017-08-28 Thread Allen Wittenauer

> On Aug 25, 2017, at 1:23 PM, Jason Lowe  wrote:
> 
> Allen Wittenauer wrote:
>  
> > Doesn't this place an undue burden on the contributor with the first 
> > incompatible patch to prove worthiness?  What happens if it is decided that 
> > it's not good enough?
> 
> It is a burden for that first, "this can't go anywhere else but 4.x" change, 
> but arguably that should not be a change done lightly anyway.  (Or any other 
> backwards-incompatible change for that matter.)  If it's worth committing 
> then I think it's perfectly reasonable to send out the dev announce that 
> there's reason for trunk to diverge from 3.x, cut branch-3, and move on.  
> This is no different than Andrew's recent announcement that there's now a 
> need for separating trunk and the 3.0 line based on what's about to go in.

So, by this definition as soon as a patch comes in to remove deprecated 
bits there will be no issue with a branch-3 getting created, correct?

>  Otherwise if past trunk behavior is any indication, it ends up mostly 
> enabling people to commit to just trunk, forgetting that the thing they are 
> committing is perfectly valid for branch-3. 

I'm not sure there was any "forgetting" involved.  We likely wouldn't 
be talking about 3.x at all if it wasn't for the code diverging enough.

> > Given the number of committers that openly ignore discussions like this, 
> > who is going to verify that incompatible changes don't get in?
>  
> The same entities who are verifying other bugs don't get in, i.e.: the 
> committers and the Hadoop QA bot running the tests.
>  Yes, I know that means it's inevitable that compatibility breakages will 
> happen, and we can and should improve the automation around compatibility 
> testing when possible.

The automation only goes so far.  At least while investigating Yetus 
bugs, I've seen more than enough blatantly and purposefully ignored errors and 
warnings that I'm not convinced it will be effective. ("That javadoc compile 
failure didn't come from my patch!"  Um, yes, yes it did.) PR for features has 
greatly trumped code correctness for a few years now.

In any case, I'm specifically thinking of the folks that commit maybe one 
or two patches a year.  They generally don't pay attention to *any* of this 
stuff and it doesn't seem like many people are actually paying attention to 
what gets committed until it breaks their universe.

>  But I don't think there's a magic bullet for preventing all compatibility 
> bugs from being introduced, just like there isn't one for preventing general 
> bugs.  Does having a trunk branch separate but essentially similar to 
> branch-3 make this any better?

Yes: it's been the process for over a decade now.  Unless there is some 
outreach done, it is almost a guarantee that someone will commit something to 
trunk they shouldn't because they simply won't know (or care?) the process has 
changed.  

> > Longer term:  what is the PMC doing to make sure we start doing major 
> > releases in a timely fashion again?  In other words, is this really an 
> > issue if we shoot for another major in (throws dart) 2 years?
> 
> If we're trying to do semantic versioning

FWIW: Hadoop has *never* done semantic versioning. A large percentage 
of our minors should really have been majors. 

> then we shouldn't have a regular cadence for major releases unless we have a 
> regular cadence of changes that break compatibility.  

But given that we don't follow semantic versioning

> I'd hope that's not something we would strive towards.  I do agree that we 
> should try to be better about shipping releases, major or minor, in a more 
> timely manner, but I don't agree that we should cut 4.0 simply based on a 
> duration since the last major release.

... the only thing we're really left with is (technically) time, either 
in the form of a volunteer saying "hey, I've got time to cut a release" or "my 
employer has a corporate goal based upon a feature in this release".   I would 
*love* for the PMC to define a policy or guidelines that says the community 
should strive for a major after x  incompatible changes, a minor after y 
changes, a micro after z fixes.  Even if it doesn't have any teeth, it would at 
least give people hope that their contributions won't be lost in the dustbin of 
history and may actually push others to work on getting a release out.  (Hadoop 
has people made committers based upon features that have never gotten into a 
stable release.  Needless to say, most of those people no longer contribute 
actively if at all.)
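
The kind of guideline described above could be as simple as the following 
sketch. The function name and the idea of counting changes since the last 
release are assumptions made for illustration; x, y and z are the unspecified 
thresholds from the paragraph above, and "after x" is read as "once at least 
x have accumulated".

```python
from typing import Optional


def suggested_release(incompatible_changes: int, changes: int, fixes: int,
                      x: int, y: int, z: int) -> Optional[str]:
    """Suggest which kind of release the community should strive for.

    Counts are changes accumulated since the last release of that kind.
    Returns None when no threshold has been reached yet.
    """
    if incompatible_changes >= x:
        return "major"
    if changes >= y:
        return "minor"
    if fixes >= z:
        return "micro"
    return None
```

Even without teeth, a rule like this gives contributors a way to see how 
close their change is to shipping.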

Because no one really has any idea of when releases happen, we have situations 
like we see with fsck:  a completely untenable number of options for things 
that shouldn't even be options.  It's incredibly user unfriendly and a great 
example of why Hadoop comes off as hostile to its own users.  But because no 
one really knows when the next incompat 

Re: [DISCUSS] Branches and versions for Hadoop 3

2017-08-25 Thread Jason Lowe
Allen Wittenauer wrote:


> Doesn't this place an undue burden on the contributor with the first
> incompatible patch to prove worthiness?  What happens if it is decided that
> it's not good enough?


It is a burden for that first, "this can't go anywhere else but 4.x"
change, but arguably that should not be a change done lightly anyway.  (Or
any other backwards-incompatible change for that matter.)  If it's worth
committing then I think it's perfectly reasonable to send out the dev
announce that there's reason for trunk to diverge from 3.x, cut branch-3,
and move on.  This is no different than Andrew's recent announcement that
there's now a need for separating trunk and the 3.0 line based on what's
about to go in.

I do not think it makes sense to pay for the maintenance overhead of two
nearly-identical lines with no backwards-incompatible changes between them
until we have the need.  Otherwise if past trunk behavior is any
indication, it ends up mostly enabling people to commit to just trunk,
forgetting that the thing they are committing is perfectly valid for
branch-3.  If we can agree that trunk and branch-3 should be equivalent
until an incompatible change goes into trunk, why pay for the commit
overhead and potential for accidentally missed commits until it is really
necessary?

> How many will it take before the dam will break?  Or is there a timeline
> going to be given before trunk gets set to 4.x?


I think the threshold count for the dam should be 1.  As soon as we have a
JIRA that needs to be committed to move the project forward and we cannot
ship it in a 3.x release then we create branch-3 and move trunk to 4.x.
As for a timeline going to 4.x, again I don't see it so much as a "baking
period" as a "when we need it" criterion.  If we need it in a week then we
should cut it in a week.  Or a year then a year.  It all depends upon when
that 4.x-only change is ready to go in.

> Given the number of committers that openly ignore discussions like this,
> who is going to verify that incompatible changes don't get in?
>

The same entities who are verifying other bugs don't get in, i.e.: the
committers and the Hadoop QA bot running the tests.  Yes, I know that means
it's inevitable that compatibility breakages will happen, and we can and
should improve the automation around compatibility testing when possible.
But I don't think there's a magic bullet for preventing all compatibility
bugs from being introduced, just like there isn't one for preventing
general bugs.  Does having a trunk branch separate but essentially similar
to branch-3 make this any better?

> Longer term:  what is the PMC doing to make sure we start doing major
> releases in a timely fashion again?  In other words, is this really an
> issue if we shoot for another major in (throws dart) 2 years?
>

If we're trying to do semantic versioning then we shouldn't have a regular
cadence for major releases unless we have a regular cadence of changes that
break compatibility.  I'd hope that's not something we would strive
towards.  I do agree that we should try to be better about shipping
releases, major or minor, in a more timely manner, but I don't agree that
we should cut 4.0 simply based on a duration since the last major release.
The release contents and community's desire for those contents should
dictate the release numbering and schedule, respectively.

Jason


On Fri, Aug 25, 2017 at 2:16 PM, Allen Wittenauer 
wrote:

>
> > On Aug 25, 2017, at 10:36 AM, Andrew Wang 
> wrote:
>
> > Until we need to make incompatible changes, there's no need for
> > a Hadoop 4.0 version.
>
> Some questions:
>
> Doesn't this place an undue burden on the contributor with the
> first incompatible patch to prove worthiness?  What happens if it is
> decided that it's not good enough?
>
> How many will it take before the dam will break?  Or is there a
> timeline going to be given before trunk gets set to 4.x?
>
> Given the number of committers that openly ignore discussions like
> this, who is going to verify that incompatible changes don't get in?
>
> Longer term:  what is the PMC doing to make sure we start doing
> major releases in a timely fashion again?  In other words, is this really
> an issue if we shoot for another major in (throws dart) 2 years?
> -
> To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
> For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org
>
>


Re: [DISCUSS] Branches and versions for Hadoop 3

2017-08-25 Thread Gangumalla, Uma
Plan looks good to me.

+1

Regards,
Uma

On 8/25/17, 10:36 AM, "Andrew Wang"  wrote:

>Hi folks,
>
>With 3.0.0-beta1 fast approaching, I wanted to go over the proposed
>branching strategy.
>
>In the early 2.x days, moving trunk immediately to 3.0.0 was a mistake.
>branch-2 and trunk were virtually identical, which increased backport
>complexity. Until we need to make incompatible changes, there's no need
>for a Hadoop 4.0 version.
>
>Thus, here's a proposal of branches and versions:
>
>trunk: 3.1.0-SNAPSHOT
>branch-3.0: 3.0.0-beta1-SNAPSHOT
>branch-2 and etc: remain as is
>
>LMK questions/comments/etc. Appreciate your attentiveness; I'm hoping to
>build consensus quickly since we have a number of open VOTEs for branch
>merges.
>
>Thanks,
>Andrew


-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



Re: [DISCUSS] Branches and versions for Hadoop 3

2017-08-25 Thread Wangda Tan
Hi Andrew,

Thanks for updating the proposal, +1.

- Wangda

On Fri, Aug 25, 2017 at 12:16 PM, Allen Wittenauer  wrote:

>
> > On Aug 25, 2017, at 10:36 AM, Andrew Wang 
> wrote:
>
> > Until we need to make incompatible changes, there's no need for
> > a Hadoop 4.0 version.
>
> Some questions:
>
> Doesn't this place an undue burden on the contributor with the
> first incompatible patch to prove worthiness?  What happens if it is
> decided that it's not good enough?
>
> How many will it take before the dam will break?  Or is there a
> timeline going to be given before trunk gets set to 4.x?
>
> Given the number of committers that openly ignore discussions like
> this, who is going to verify that incompatible changes don't get in?
>
> Longer term:  what is the PMC doing to make sure we start doing
> major releases in a timely fashion again?  In other words, is this really
> an issue if we shoot for another major in (throws dart) 2 years?


Re: [DISCUSS] Branches and versions for Hadoop 3

2017-08-25 Thread Allen Wittenauer

> On Aug 25, 2017, at 10:36 AM, Andrew Wang  wrote:

> Until we need to make incompatible changes, there's no need for
> a Hadoop 4.0 version.

Some questions:

Doesn't this place an undue burden on the contributor with the first 
incompatible patch to prove worthiness?  What happens if it is decided that 
it's not good enough?

How many will it take before the dam will break?  Or is there a 
timeline going to be given before trunk gets set to 4.x?  

Given the number of committers that openly ignore discussions like 
this, who is going to verify that incompatible changes don't get in?

Longer term:  what is the PMC doing to make sure we start doing major 
releases in a timely fashion again?  In other words, is this really an issue if 
we shoot for another major in (throws dart) 2 years?



Re: [DISCUSS] Branches and versions for Hadoop 3

2017-08-25 Thread Rushabh Shah
+1 from me too for the branching proposal.

On Fri, Aug 25, 2017 at 12:56 PM, Eric Payne <eric.payne1...@yahoo.com.invalid> wrote:

> +1 for this branching proposal. -Eric
>
>
> From: Andrew Wang <andrew.w...@cloudera.com>
> To: "common-...@hadoop.apache.org" <common-...@hadoop.apache.org>;
>   "mapreduce-...@hadoop.apache.org" <mapreduce-...@hadoop.apache.org>;
>   "hdfs-dev@hadoop.apache.org" <hdfs-dev@hadoop.apache.org>;
>   "yarn-...@hadoop.apache.org" <yarn-...@hadoop.apache.org>
> Sent: Friday, August 25, 2017 12:36 PM
> Subject: [DISCUSS] Branches and versions for Hadoop 3
>
> Hi folks,
>
> With 3.0.0-beta1 fast approaching, I wanted to go over the proposed
> branching strategy.
>
> In the early 2.x days, moving trunk immediately to 3.0.0 was a mistake.
> branch-2 and trunk were virtually identical, and increased backport
> complexity. Until we need to make incompatible changes, there's no need for
> a Hadoop 4.0 version.
>
> Thus, here's a proposal of branches and versions:
>
> trunk: 3.1.0-SNAPSHOT
> branch-3.0: 3.0.0-beta1-SNAPSHOT
> branch-2 and etc: remain as is
>
> LMK questions/comments/etc. Appreciate your attentiveness; I'm hoping to
> build consensus quickly since we have a number of open VOTEs for branch
> merges.
>
> Thanks,
> Andrew
>
>
>
>


Re: [DISCUSS] Branches and versions for Hadoop 3

2017-08-25 Thread Eric Payne
+1 for this branching proposal. -Eric


From: Andrew Wang <andrew.w...@cloudera.com>
To: "common-...@hadoop.apache.org" <common-...@hadoop.apache.org>;
  "mapreduce-...@hadoop.apache.org" <mapreduce-...@hadoop.apache.org>;
  "hdfs-dev@hadoop.apache.org" <hdfs-dev@hadoop.apache.org>;
  "yarn-...@hadoop.apache.org" <yarn-...@hadoop.apache.org>
Sent: Friday, August 25, 2017 12:36 PM
Subject: [DISCUSS] Branches and versions for Hadoop 3

Hi folks,

With 3.0.0-beta1 fast approaching, I wanted to go over the proposed
branching strategy.

In the early 2.x days, moving trunk immediately to 3.0.0 was a mistake.
branch-2 and trunk were virtually identical, and increased backport
complexity. Until we need to make incompatible changes, there's no need for
a Hadoop 4.0 version.

Thus, here's a proposal of branches and versions:

trunk: 3.1.0-SNAPSHOT
branch-3.0: 3.0.0-beta1-SNAPSHOT
branch-2 and etc: remain as is

LMK questions/comments/etc. Appreciate your attentiveness; I'm hoping to
build consensus quickly since we have a number of open VOTEs for branch
merges.

Thanks,
Andrew


   

[DISCUSS] Branches and versions for Hadoop 3

2017-08-25 Thread Andrew Wang
Hi folks,

With 3.0.0-beta1 fast approaching, I wanted to go over the proposed
branching strategy.

In the early 2.x days, moving trunk immediately to 3.0.0 was a mistake.
branch-2 and trunk were virtually identical, and increased backport
complexity. Until we need to make incompatible changes, there's no need for
a Hadoop 4.0 version.

Thus, here's a proposal of branches and versions:

trunk: 3.1.0-SNAPSHOT
branch-3.0: 3.0.0-beta1-SNAPSHOT
branch-2 and etc: remain as is

LMK questions/comments/etc. Appreciate your attentiveness; I'm hoping to
build consensus quickly since we have a number of open VOTEs for branch
merges.

Thanks,
Andrew
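

As a rough illustration of the branch/version mapping proposed above, here is
a minimal Python sketch. It is not part of any Hadoop tooling: the names
BRANCH_VERSIONS, release_line and branch_for are hypothetical, and it assumes
a fix version is matched to a branch purely by its major.minor release line.

```python
import re

# Branch-to-version mapping copied from the proposal in this thread.
# (branch-2 and older branches "remain as is" and are left out here.)
BRANCH_VERSIONS = {
    "trunk": "3.1.0-SNAPSHOT",
    "branch-3.0": "3.0.0-beta1-SNAPSHOT",
}


def release_line(version):
    """Return (major, minor) for a version such as '3.0.0-beta1-SNAPSHOT'."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version: {version!r}")
    return int(match.group(1)), int(match.group(2))


def branch_for(fix_version):
    """Return the branch whose release line matches fix_version, or None.

    Hypothetical helper: assumes one branch per major.minor line.
    """
    wanted = release_line(fix_version)
    for branch, version in BRANCH_VERSIONS.items():
        if release_line(version) == wanted:
            return branch
    return None


if __name__ == "__main__":
    for fix in ("3.0.0", "3.1.0", "4.0.0"):
        print(fix, "->", branch_for(fix))
```

Under this sketch a 4.0.0 fix version maps to no branch at all, which mirrors
the point of the proposal: no 4.0 line exists until an incompatible change
actually requires one.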