Re: [discuss] Spark 2.x release cadence

2016-09-29 Thread Weiqing Yang
Sorry. I think I just replied to the wrong thread. :(


WQ

On Thu, Sep 29, 2016 at 10:58 AM, Weiqing Yang 
wrote:

> +1 (non binding)
>
>
>
> RC4 was compiled and tested on the following system: CentOS Linux release
> 7.0.1406 / openjdk 1.8.0_102 / R 3.3.1
>
> All tests passed.
>
>
>
> ./build/mvn -Pyarn -Phadoop-2.7 -Pkinesis-asl -Phive -Phive-thriftserver
> -Dpyspark -Dsparkr -DskipTests clean package
>
> ./build/mvn -Pyarn -Phadoop-2.7 -Pkinesis-asl -Phive -Phive-thriftserver
> -Dpyspark -Dsparkr test
>
>
>
>
>
> Best,
>
> Weiqing
>
> On Thu, Sep 29, 2016 at 8:02 AM, Cody Koeninger 
> wrote:
>
>> Regarding documentation debt, is there a reason not to deploy
>> documentation updates more frequently than releases?  I recall this
>> used to be the case.
>>
>> On Wed, Sep 28, 2016 at 3:35 PM, Joseph Bradley 
>> wrote:
>> > +1 for 4 months.  With QA taking about a month, that's very reasonable.
>> >
>> > My main ask (especially for MLlib) is for contributors and committers to
>> > take extra care not to delay on updating the Programming Guide for new
>> APIs.
>> > Documentation debt often collects and has to be paid off during QA, and
>> a
>> > longer cycle will exacerbate this problem.
>> >
>> > On Wed, Sep 28, 2016 at 7:30 AM, Tom Graves
>> 
>> > wrote:
>> >>
>> >> +1 to 4 months.
>> >>
>> >> Tom
>> >>
>> >>
>> >> On Tuesday, September 27, 2016 2:07 PM, Reynold Xin <
>> r...@databricks.com>
>> >> wrote:
>> >>
>> >>
>> >> We are 2 months past releasing Spark 2.0.0, an important milestone for
>> the
>> >> project. Spark 2.0.0 deviated (took 6 months) from the regular release
>> cadence
>> >> we had for the 1.x line, and we never explicitly discussed what the
>> release
>> >> cadence should look like for 2.x. Thus this email.
>> >>
>> >> During Spark 1.x, roughly every three months we made a new 1.x feature
>> >> release (e.g. 1.5.0 comes out three months after 1.4.0). Development
>> >> happened primarily in the first two months, and then a release branch
>> was
>> >> cut at the end of month 2, and the last month was reserved for QA and
>> >> release preparation.
>> >>
>> >> During 2.0.0 development, I really enjoyed the longer release cycle
>> >> because there were a lot of major changes happening and the longer time
>> was
>> >> critical for thinking through architectural changes as well as API
>> design.
>> >> While I don't expect the same degree of drastic changes in a 2.x
>> feature
>> >> release, I do think it'd make sense to increase the length of release
>> cycle
>> >> so we can make better designs.
>> >>
>> >> My strawman proposal is to maintain a regular release cadence, as we
>> did
>> >> in Spark 1.x, and increase the cycle from 3 months to 4 months. This
>> >> effectively gives us ~50% more time to develop (in reality it'd be
>> slightly
>> >> less than 50% since longer dev time also means longer QA time). As for
>> >> maintenance releases, I think those should still be cut on-demand,
>> similar
>> >> to Spark 1.x, but more aggressively.
>> >>
>> >> To put this into perspective, a 4-month cycle means we will release Spark
>> >> 2.1.0 at the end of Nov or early Dec (and branch cut / code freeze at
>> the
>> >> end of Oct).
>> >>
>> >> I am curious what others think.
>> >>
>> >>
>> >>
>> >>
>> >
>>
>>
>>
>


Re: [discuss] Spark 2.x release cadence

2016-09-29 Thread Weiqing Yang
+1 (non binding)



RC4 was compiled and tested on the following system: CentOS Linux release
7.0.1406 / openjdk 1.8.0_102 / R 3.3.1

All tests passed.



./build/mvn -Pyarn -Phadoop-2.7 -Pkinesis-asl -Phive -Phive-thriftserver
-Dpyspark -Dsparkr -DskipTests clean package

./build/mvn -Pyarn -Phadoop-2.7 -Pkinesis-asl -Phive -Phive-thriftserver
-Dpyspark -Dsparkr test
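
For anyone repeating this verification, a minimal wrapper along these lines may
help. It is only a sketch: the profiles and flags are copied verbatim from the
two commands above, and it assumes it is run from the root of the unpacked RC
source tree.

#!/usr/bin/env bash
# Sketch of the verification run above: build the distribution first,
# then run the full test suite with the same profiles and flags.
set -euo pipefail

PROFILES="-Pyarn -Phadoop-2.7 -Pkinesis-asl -Phive -Phive-thriftserver"

./build/mvn $PROFILES -Dpyspark -Dsparkr -DskipTests clean package
./build/mvn $PROFILES -Dpyspark -Dsparkr test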





Best,

Weiqing

On Thu, Sep 29, 2016 at 8:02 AM, Cody Koeninger  wrote:

> Regarding documentation debt, is there a reason not to deploy
> documentation updates more frequently than releases?  I recall this
> used to be the case.
>
> On Wed, Sep 28, 2016 at 3:35 PM, Joseph Bradley 
> wrote:
> > +1 for 4 months.  With QA taking about a month, that's very reasonable.
> >
> > My main ask (especially for MLlib) is for contributors and committers to
> > take extra care not to delay on updating the Programming Guide for new
> APIs.
> > Documentation debt often collects and has to be paid off during QA, and a
> > longer cycle will exacerbate this problem.
> >
> > On Wed, Sep 28, 2016 at 7:30 AM, Tom Graves  >
> > wrote:
> >>
> >> +1 to 4 months.
> >>
> >> Tom
> >>
> >>
> >> On Tuesday, September 27, 2016 2:07 PM, Reynold Xin <
> r...@databricks.com>
> >> wrote:
> >>
> >>
> >> We are 2 months past releasing Spark 2.0.0, an important milestone for
> the
> >> project. Spark 2.0.0 deviated (took 6 months) from the regular release
> cadence
> >> we had for the 1.x line, and we never explicitly discussed what the
> release
> >> cadence should look like for 2.x. Thus this email.
> >>
> >> During Spark 1.x, roughly every three months we made a new 1.x feature
> >> release (e.g. 1.5.0 comes out three months after 1.4.0). Development
> >> happened primarily in the first two months, and then a release branch
> was
> >> cut at the end of month 2, and the last month was reserved for QA and
> >> release preparation.
> >>
> >> During 2.0.0 development, I really enjoyed the longer release cycle
> >> because there were a lot of major changes happening and the longer time
> was
> >> critical for thinking through architectural changes as well as API
> design.
> >> While I don't expect the same degree of drastic changes in a 2.x feature
> >> release, I do think it'd make sense to increase the length of release
> cycle
> >> so we can make better designs.
> >>
> >> My strawman proposal is to maintain a regular release cadence, as we did
> >> in Spark 1.x, and increase the cycle from 3 months to 4 months. This
> >> effectively gives us ~50% more time to develop (in reality it'd be
> slightly
> >> less than 50% since longer dev time also means longer QA time). As for
> >> maintenance releases, I think those should still be cut on-demand,
> similar
> >> to Spark 1.x, but more aggressively.
> >>
> >> To put this into perspective, a 4-month cycle means we will release Spark
> >> 2.1.0 at the end of Nov or early Dec (and branch cut / code freeze at
> the
> >> end of Oct).
> >>
> >> I am curious what others think.
> >>
> >>
> >>
> >>
> >
>
>
>


Re: [discuss] Spark 2.x release cadence

2016-09-29 Thread Cody Koeninger
Regarding documentation debt, is there a reason not to deploy
documentation updates more frequently than releases?  I recall this
used to be the case.
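
For reference, a between-release docs refresh could be as small as the sketch
below. This is only an illustration: it assumes the site under docs/ still
builds with Jekyll as described in docs/README.md, and that SKIP_API=1 still
skips the API doc generation.

cd docs
SKIP_API=1 jekyll build    # renders the static site into docs/_site
# then publish docs/_site to wherever the live documentation is hosted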

On Wed, Sep 28, 2016 at 3:35 PM, Joseph Bradley  wrote:
> +1 for 4 months.  With QA taking about a month, that's very reasonable.
>
> My main ask (especially for MLlib) is for contributors and committers to
> take extra care not to delay on updating the Programming Guide for new APIs.
> Documentation debt often collects and has to be paid off during QA, and a
> longer cycle will exacerbate this problem.
>
> On Wed, Sep 28, 2016 at 7:30 AM, Tom Graves 
> wrote:
>>
>> +1 to 4 months.
>>
>> Tom
>>
>>
>> On Tuesday, September 27, 2016 2:07 PM, Reynold Xin 
>> wrote:
>>
>>
>> We are 2 months past releasing Spark 2.0.0, an important milestone for the
>> project. Spark 2.0.0 deviated (took 6 months) from the regular release cadence
>> we had for the 1.x line, and we never explicitly discussed what the release
>> cadence should look like for 2.x. Thus this email.
>>
>> During Spark 1.x, roughly every three months we made a new 1.x feature
>> release (e.g. 1.5.0 comes out three months after 1.4.0). Development
>> happened primarily in the first two months, and then a release branch was
>> cut at the end of month 2, and the last month was reserved for QA and
>> release preparation.
>>
>> During 2.0.0 development, I really enjoyed the longer release cycle
>> because there were a lot of major changes happening and the longer time was
>> critical for thinking through architectural changes as well as API design.
>> While I don't expect the same degree of drastic changes in a 2.x feature
>> release, I do think it'd make sense to increase the length of release cycle
>> so we can make better designs.
>>
>> My strawman proposal is to maintain a regular release cadence, as we did
>> in Spark 1.x, and increase the cycle from 3 months to 4 months. This
>> effectively gives us ~50% more time to develop (in reality it'd be slightly
>> less than 50% since longer dev time also means longer QA time). As for
>> maintenance releases, I think those should still be cut on-demand, similar
>> to Spark 1.x, but more aggressively.
>>
>> To put this into perspective, a 4-month cycle means we will release Spark
>> 2.1.0 at the end of Nov or early Dec (and branch cut / code freeze at the
>> end of Oct).
>>
>> I am curious what others think.
>>
>>
>>
>>
>




Re: [discuss] Spark 2.x release cadence

2016-09-28 Thread Joseph Bradley
+1 for 4 months.  With QA taking about a month, that's very reasonable.

My main ask (especially for MLlib) is for contributors and committers to
take extra care not to delay on updating the Programming Guide for new
APIs.  Documentation debt often collects and has to be paid off during QA,
and a longer cycle will exacerbate this problem.

On Wed, Sep 28, 2016 at 7:30 AM, Tom Graves 
wrote:

> +1 to 4 months.
>
> Tom
>
>
> On Tuesday, September 27, 2016 2:07 PM, Reynold Xin 
> wrote:
>
>
> We are 2 months past releasing Spark 2.0.0, an important milestone for the
> project. Spark 2.0.0 deviated (took 6 months) from the regular release
> cadence we had for the 1.x line, and we never explicitly discussed what the
> release cadence should look like for 2.x. Thus this email.
>
> During Spark 1.x, roughly every three months we made a new 1.x feature
> release (e.g. 1.5.0 comes out three months after 1.4.0). Development
> happened primarily in the first two months, and then a release branch was
> cut at the end of month 2, and the last month was reserved for QA and
> release preparation.
>
> During 2.0.0 development, I really enjoyed the longer release cycle
> because there were a lot of major changes happening and the longer time was
> critical for thinking through architectural changes as well as API design.
> While I don't expect the same degree of drastic changes in a 2.x feature
> release, I do think it'd make sense to increase the length of release cycle
> so we can make better designs.
>
> My strawman proposal is to maintain a regular release cadence, as we did
> in Spark 1.x, and increase the cycle from 3 months to 4 months. This
> effectively gives us ~50% more time to develop (in reality it'd be slightly
> less than 50% since longer dev time also means longer QA time). As for
> maintenance releases, I think those should still be cut on-demand, similar
> to Spark 1.x, but more aggressively.
>
> To put this into perspective, a 4-month cycle means we will release Spark
> 2.1.0 at the end of Nov or early Dec (and branch cut / code freeze at the
> end of Oct).
>
> I am curious what others think.
>
>
>
>
>


Re: [discuss] Spark 2.x release cadence

2016-09-28 Thread Tom Graves
+1 to 4 months.
Tom 

On Tuesday, September 27, 2016 2:07 PM, Reynold Xin  
wrote:
 

 We are 2 months past releasing Spark 2.0.0, an important milestone for the 
project. Spark 2.0.0 deviated (took 6 months) from the regular release cadence we
had for the 1.x line, and we never explicitly discussed what the release 
cadence should look like for 2.x. Thus this email.
During Spark 1.x, roughly every three months we made a new 1.x feature release
(e.g. 1.5.0 comes out three months after 1.4.0). Development happened primarily 
in the first two months, and then a release branch was cut at the end of month 
2, and the last month was reserved for QA and release preparation.
During 2.0.0 development, I really enjoyed the longer release cycle because 
there were a lot of major changes happening and the longer time was critical for
thinking through architectural changes as well as API design. While I don't 
expect the same degree of drastic changes in a 2.x feature release, I do think 
it'd make sense to increase the length of release cycle so we can make better 
designs.
My strawman proposal is to maintain a regular release cadence, as we did in 
Spark 1.x, and increase the cycle from 3 months to 4 months. This effectively 
gives us ~50% more time to develop (in reality it'd be slightly less than 50% 
since longer dev time also means longer QA time). As for maintenance releases, 
I think those should still be cut on-demand, similar to Spark 1.x, but more 
aggressively.
To put this into perspective, a 4-month cycle means we will release Spark 2.1.0
at the end of Nov or early Dec (and branch cut / code freeze at the end of Oct).
I am curious what others think.



   

Re: [discuss] Spark 2.x release cadence

2016-09-27 Thread Felix Cheung
+1 on a longer release cycle kept on schedule, and on more maintenance releases.


_
From: Mark Hamstra <m...@clearstorydata.com>
Sent: Tuesday, September 27, 2016 2:01 PM
Subject: Re: [discuss] Spark 2.x release cadence
To: Reynold Xin <r...@databricks.com>
Cc: <dev@spark.apache.org>


+1

And I'll dare say that for those running Spark in production, it matters more
that maintenance releases come out in a timely fashion than that new features
are released a month sooner or later.

On Tue, Sep 27, 2016 at 12:06 PM, Reynold Xin 
<r...@databricks.com> wrote:
We are 2 months past releasing Spark 2.0.0, an important milestone for the 
project. Spark 2.0.0 deviated (took 6 months) from the regular release cadence we
had for the 1.x line, and we never explicitly discussed what the release 
cadence should look like for 2.x. Thus this email.

During Spark 1.x, roughly every three months we made a new 1.x feature release
(e.g. 1.5.0 comes out three months after 1.4.0). Development happened primarily 
in the first two months, and then a release branch was cut at the end of month 
2, and the last month was reserved for QA and release preparation.

During 2.0.0 development, I really enjoyed the longer release cycle because 
there were a lot of major changes happening and the longer time was critical for
thinking through architectural changes as well as API design. While I don't 
expect the same degree of drastic changes in a 2.x feature release, I do think 
it'd make sense to increase the length of release cycle so we can make better 
designs.

My strawman proposal is to maintain a regular release cadence, as we did in 
Spark 1.x, and increase the cycle from 3 months to 4 months. This effectively 
gives us ~50% more time to develop (in reality it'd be slightly less than 50% 
since longer dev time also means longer QA time). As for maintenance releases, 
I think those should still be cut on-demand, similar to Spark 1.x, but more 
aggressively.

To put this into perspective, a 4-month cycle means we will release Spark 2.1.0
at the end of Nov or early Dec (and branch cut / code freeze at the end of Oct).

I am curious what others think.







Re: [discuss] Spark 2.x release cadence

2016-09-27 Thread Mark Hamstra
+1

And I'll dare say that for those running Spark in production, it matters more
that maintenance releases come out in a timely fashion than that new features
are released a month sooner or later.

On Tue, Sep 27, 2016 at 12:06 PM, Reynold Xin  wrote:

> We are 2 months past releasing Spark 2.0.0, an important milestone for the
> project. Spark 2.0.0 deviated (took 6 months) from the regular release
> cadence we had for the 1.x line, and we never explicitly discussed what the
> release cadence should look like for 2.x. Thus this email.
>
> During Spark 1.x, roughly every three months we made a new 1.x feature
> release (e.g. 1.5.0 comes out three months after 1.4.0). Development
> happened primarily in the first two months, and then a release branch was
> cut at the end of month 2, and the last month was reserved for QA and
> release preparation.
>
> During 2.0.0 development, I really enjoyed the longer release cycle
> because there were a lot of major changes happening and the longer time was
> critical for thinking through architectural changes as well as API design.
> While I don't expect the same degree of drastic changes in a 2.x feature
> release, I do think it'd make sense to increase the length of release cycle
> so we can make better designs.
>
> My strawman proposal is to maintain a regular release cadence, as we did
> in Spark 1.x, and increase the cycle from 3 months to 4 months. This
> effectively gives us ~50% more time to develop (in reality it'd be slightly
> less than 50% since longer dev time also means longer QA time). As for
> maintenance releases, I think those should still be cut on-demand, similar
> to Spark 1.x, but more aggressively.
>
> To put this into perspective, a 4-month cycle means we will release Spark
> 2.1.0 at the end of Nov or early Dec (and branch cut / code freeze at the
> end of Oct).
>
> I am curious what others think.
>
>
>


Re: [discuss] Spark 2.x release cadence

2016-09-27 Thread Sean Owen
+1 -- I think the minor releases were taking more like 4 months than 3
months anyway, and that was good for the reasons you give. This proposal
reflects reality, which is a good thing. All the better if we can then
follow the timeline more comfortably.

On Tue, Sep 27, 2016 at 3:06 PM, Reynold Xin  wrote:
> We are 2 months past releasing Spark 2.0.0, an important milestone for the
> project. Spark 2.0.0 deviated (took 6 months) from the regular release cadence
> we had for the 1.x line, and we never explicitly discussed what the release
> cadence should look like for 2.x. Thus this email.
>
> During Spark 1.x, roughly every three months we made a new 1.x feature
> release (e.g. 1.5.0 comes out three months after 1.4.0). Development
> happened primarily in the first two months, and then a release branch was
> cut at the end of month 2, and the last month was reserved for QA and
> release preparation.
>
> During 2.0.0 development, I really enjoyed the longer release cycle because
> there were a lot of major changes happening and the longer time was critical
> for thinking through architectural changes as well as API design. While I
> don't expect the same degree of drastic changes in a 2.x feature release, I
> do think it'd make sense to increase the length of release cycle so we can
> make better designs.
>
> My strawman proposal is to maintain a regular release cadence, as we did in
> Spark 1.x, and increase the cycle from 3 months to 4 months. This
> effectively gives us ~50% more time to develop (in reality it'd be slightly
> less than 50% since longer dev time also means longer QA time). As for
> maintenance releases, I think those should still be cut on-demand, similar
> to Spark 1.x, but more aggressively.
>
> To put this into perspective, a 4-month cycle means we will release Spark
> 2.1.0 at the end of Nov or early Dec (and branch cut / code freeze at the
> end of Oct).
>
> I am curious what others think.
>
>




Re: [discuss] Spark 2.x release cadence

2016-09-27 Thread Shivaram Venkataraman
+1. I think having a 4-month window instead of a 3-month window sounds good.

However, I think figuring out a timeline for maintenance releases would
also be good. This is a common concern that comes up in many user
threads, and it would be better to have some structure around it. It
doesn't need to be strict, but something like the first maintenance
release for the latest 2.x.0 within 2 months, and a second maintenance
release within 6 months.

Thanks
Shivaram

On Tue, Sep 27, 2016 at 12:06 PM, Reynold Xin  wrote:
> We are 2 months past releasing Spark 2.0.0, an important milestone for the
> project. Spark 2.0.0 deviated (took 6 months) from the regular release cadence
> we had for the 1.x line, and we never explicitly discussed what the release
> cadence should look like for 2.x. Thus this email.
>
> During Spark 1.x, roughly every three months we made a new 1.x feature
> release (e.g. 1.5.0 comes out three months after 1.4.0). Development
> happened primarily in the first two months, and then a release branch was
> cut at the end of month 2, and the last month was reserved for QA and
> release preparation.
>
> During 2.0.0 development, I really enjoyed the longer release cycle because
> there were a lot of major changes happening and the longer time was critical
> for thinking through architectural changes as well as API design. While I
> don't expect the same degree of drastic changes in a 2.x feature release, I
> do think it'd make sense to increase the length of release cycle so we can
> make better designs.
>
> My strawman proposal is to maintain a regular release cadence, as we did in
> Spark 1.x, and increase the cycle from 3 months to 4 months. This
> effectively gives us ~50% more time to develop (in reality it'd be slightly
> less than 50% since longer dev time also means longer QA time). As for
> maintenance releases, I think those should still be cut on-demand, similar
> to Spark 1.x, but more aggressively.
>
> To put this into perspective, a 4-month cycle means we will release Spark
> 2.1.0 at the end of Nov or early Dec (and branch cut / code freeze at the
> end of Oct).
>
> I am curious what others think.
>
>




[discuss] Spark 2.x release cadence

2016-09-27 Thread Reynold Xin
We are 2 months past releasing Spark 2.0.0, an important milestone for the
project. Spark 2.0.0 deviated (took 6 months) from the regular release
cadence we had for the 1.x line, and we never explicitly discussed what the
release cadence should look like for 2.x. Thus this email.

During Spark 1.x, roughly every three months we made a new 1.x feature
release (e.g. 1.5.0 comes out three months after 1.4.0). Development
happened primarily in the first two months, and then a release branch was
cut at the end of month 2, and the last month was reserved for QA and
release preparation.

During 2.0.0 development, I really enjoyed the longer release cycle because
there were a lot of major changes happening and the longer time was critical
for thinking through architectural changes as well as API design. While I
don't expect the same degree of drastic changes in a 2.x feature release, I
do think it'd make sense to increase the length of release cycle so we can
make better designs.

My strawman proposal is to maintain a regular release cadence, as we did in
Spark 1.x, and increase the cycle from 3 months to 4 months. This
effectively gives us ~50% more time to develop (in reality it'd be slightly
less than 50% since longer dev time also means longer QA time). As for
maintenance releases, I think those should still be cut on-demand, similar
to Spark 1.x, but more aggressively.

To put this into perspective, a 4-month cycle means we will release Spark
2.1.0 at the end of Nov or early Dec (and branch cut / code freeze at the
end of Oct).

I am curious what others think.