RE: 3.0 and the Cassandra release process
Oh yeah, and BGL4 is now green, without any impending risks. Additionally, the other yellow projects, LWR05 and MTV05, are on a path that will lead to green in the coming weeks. That's all, folks.

-----Original Message-----
From: Jonathan Ellis [mailto:jbel...@gmail.com]
Sent: Wednesday, April 15, 2015 3:40 AM
To: dev
Subject: Re: 3.0 and the Cassandra release process
Re: 3.0 and the Cassandra release process
Short answer: yes.

Longer answer, pasted from my reply to Jon Haddad elsewhere in the thread:

We are moving away from designating major releases like 3.0 as special, other than as a marker of compatibility. In fact, we are moving away from major releases entirely, with each release being a much smaller, digestible unit of change, and the ultimate goal of every even release being production quality. This means that bugs won't pile up and compound each other, and the bugs that do slip through will affect fewer users. As 3.x stabilizes, more people will try out the releases, yielding better quality, yielding even more people trying them out, in a virtuous cycle.

This won't just happen by wishing for it. I am very serious about investing the energy we would have spent on backporting fixes to a stable branch into improving our QA process and test coverage. Other than a very short list of in-progress features that may not make the 3.0 cutoff (#6477 and #6696 come to mind), I'm willing to virtually pause new feature development entirely to make this happen.

On Tue, Apr 14, 2015 at 11:53 PM, Phil Yang ud1...@gmail.com wrote:

Hi Jonathan, how long will tick-tock releases be maintained? Do users have to upgrade to a new even release, with new features, to fix the bugs in an older even release?

--
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder, http://www.datastax.com
@spyced
Re: 3.0 and the Cassandra release process
Hi Jonathan, how long will tick-tock releases be maintained? Do users have to upgrade to a new even release, with new features, to fix the bugs in an older even release?

2015-04-14 6:28 GMT+08:00 Jonathan Ellis jbel...@gmail.com:

--
Thanks,
Phil Yang
Re: 3.0 and the Cassandra release process
On Tue, Mar 17, 2015 at 4:06 PM, Jonathan Ellis jbel...@gmail.com wrote:

I'm optimistic that as we improve our process this way, our even releases will become increasingly stable. If so, we can skip sub-minor releases (3.2.x) entirely and focus on keeping the release train moving. In the meantime, we will continue delivering 2.1.x stability releases.

The weak point of this plan is the transition from the big-release development methodology culminating in 3.0 to the monthly tick-tock releases. Since 3.0 needs to go through a beta/release-candidate phase, during which we're going to be serious about not adding new features, 3.1 will come with multiple months' worth of features, so right off the bat we're starting at a disadvantage from a stability standpoint.

Recognizing that it will take several months for the tick-tock releases to stabilize, I would like to ship 3.0.x stability releases concurrently with 3.y tick-tock releases. This should stabilize 3.0.x faster than tick-tock, while at the same time hedging our bets: if we assess tick-tock in six months and decide it's not delivering on its goals, we're not six months behind in having a usable set of the features that we shipped in 3.0.

So, to summarize:

- New features will *only* go into tick-tock releases.
- Bug fixes will go into tick-tock releases and a 3.0.x branch, which will be maintained for at least a year.

--
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder, http://www.datastax.com
@spyced
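The version-numbering convention implied by this plan can be illustrated with a small sketch. This is not project tooling; the even/odd reading (even = production-quality goal, odd = development preview, as Phil Yang frames it later in the thread) and the labels below are assumptions drawn from the discussion, not an official policy statement:

```python
# Toy classifier for the 3.x scheme discussed in this thread.
# Assumptions (not project policy): 3.0.x is the bug-fix-only
# stability branch, even 3.y releases aim for production quality,
# and odd 3.y releases are development previews.

def parse_version(v: str):
    """Split 'major.minor' or 'major.minor.patch' into an int 3-tuple."""
    parts = [int(p) for p in v.split(".")]
    while len(parts) < 3:
        parts.append(0)  # pad missing patch (and minor) with 0
    return tuple(parts[:3])

def classify(v: str) -> str:
    major, minor, patch = parse_version(v)
    if major == 3 and minor == 0 and patch > 0:
        return "stability release (3.0.x branch, bug fixes only)"
    if minor % 2 == 0:
        return "even tick-tock release (production-quality goal)"
    return "odd tick-tock release (development preview)"

print(classify("3.0.4"))  # stability release (3.0.x branch, bug fixes only)
print(classify("3.2"))    # even tick-tock release (production-quality goal)
print(classify("3.3"))    # odd tick-tock release (development preview)
```

Under this reading, 3.0.1, 3.0.2, ... carry only bug fixes for at least a year, while the monthly 3.y train carries new features.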
Re: 3.0 and the Cassandra release process
In this tick-tock cycle, is there still a long-term release that's maintained, meant for production? Will bug fixes be backported to 3.0 (stable), with new stuff going forward to 3.x?

On Thu, Mar 26, 2015 at 6:50 AM, Aleksey Yeschenko alek...@apache.org wrote:
Re: 3.0 and the Cassandra release process
We are moving away from designating major releases like 3.0 as special, other than as a marker of compatibility. In fact, we are moving away from major releases entirely, with each release being a much smaller, digestible unit of change, and the ultimate goal of every even release being production quality. This means that bugs won't pile up and compound each other, and the bugs that do slip through will affect fewer users. As 3.x stabilizes, more people will try out the releases, yielding better quality, yielding even more people trying them out, in a virtuous cycle.

This won't just happen by wishing for it. I am very serious about investing the energy we would have spent on backporting fixes to a stable branch into improving our QA process and test coverage. Other than a very short list of in-progress features that may not make the 3.0 cutoff (#6477 and #6696 come to mind), I'm willing to virtually pause new feature development entirely to make this happen.

Some patience will be necessary with the first few releases. But at this point, people are used to about six months of waiting for a new major to stabilize. So let's give this a try until 3.6. If things still haven't materially stabilized by then, we need to go back to the drawing board. But I'm optimistic that they will.

On Thu, Apr 2, 2015 at 5:04 PM, Jonathan Haddad j...@jonhaddad.com wrote:

In this tick-tock cycle, is there still a long-term release that's maintained, meant for production? Will bug fixes be backported to 3.0 (stable), with new stuff going forward to 3.x?

--
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder, http://www.datastax.com
@spyced
Re: 3.0 and the Cassandra release process
Hey Jonathan, I have been hoping for this approach for years now; one of the reasons I left DataStax was my feeling that quality was always on the back burner and never really taken seriously versus marketing-driven releases. I sincerely hope this approach reverses that perceived trend.

--
Colin
+1 612 859 6129
Skype colin.p.clark

On Apr 2, 2015, at 5:54 PM, Jonathan Ellis jbel...@gmail.com wrote:
Re: 3.0 and the Cassandra release process
Hey Jason. I think pretty much everybody is on board with:

1) A monthly release cycle
2) Keeping trunk releasable at all times

And that's what my personal +1 was for. The tick-tock mechanism details and the bug-fix policy for the maintained stable lines should be fleshed out before we proceed. I believe that once they are explained better, the concerns will mostly, or entirely, go away.

-- AY

On Mon, Mar 23, 2015 at 11:15 PM, Jason Brown jasedbr...@gmail.com wrote:
Re: 3.0 and the Cassandra release process
Broadly, as a contributor and operator, I like the idea of more frequent releases off of an always-stable master. First-customer-ship quality all the time [1]! I'm a little concerned that the specific tick-tock proposal could devolve into a 'devodd' style, where the 'feature release' becomes a thing no one wants to run in production. However, if master is always stable, it doesn't really matter when releases are cut; and if master is *not* stable, that is a larger problem than the details of the release cadence. I say give it a shot.

[1] http://wiki.illumos.org/display/illumos/On+the+Quality+Death+Spiral
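On the QA side of keeping master always stable, Ariel Weisberg describes elsewhere in this thread a harness that sweeps the configuration space while developers plug in only the functionality to exercise and validate. A minimal sketch of that idea follows; every name, and the toy failure mode, is invented for illustration and is not the project's actual test tooling:

```python
# Hypothetical sketch of a configuration-space test harness: QA owns
# the sweep machinery, developers supply an exercise() and validate()
# pair for the feature under test.
import itertools

def sweep(config_space, exercise, validate):
    """Run exercise/validate over the cross product of all options;
    return the list of configurations that failed validation."""
    failures = []
    keys = sorted(config_space)
    for values in itertools.product(*(config_space[k] for k in keys)):
        config = dict(zip(keys, values))
        result = exercise(config)
        if not validate(config, result):
            failures.append(config)
    return failures

# Example: pretend "compression on" silently drops batches larger than 2.
space = {"compression": [False, True], "batch_size": [1, 2, 4]}
exercise = lambda cfg: 0 if (cfg["compression"] and cfg["batch_size"] > 2) else cfg["batch_size"]
validate = lambda cfg, result: result == cfg["batch_size"]
print(sweep(space, exercise, validate))
# -> [{'batch_size': 4, 'compression': True}]
```

The point of the design is the division of labor: the cross-product enumeration, reporting, and reliability work live in the harness once, so each new feature only costs its two plug-in functions.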
Re: 3.0 and the Cassandra release process
+1

--
Ranger Tsao

2015-03-20 22:57 GMT+08:00 Ryan McGuire r...@datastax.com:
Re: 3.0 and the Cassandra release process
Hey all, I had a hallway conversation with some folks here last week, and they expressed some concerns with this proposal. I will not attempt to summarize their arguments, as I don't believe I could do them ample justice, but I strongly encouraged those individuals to speak up and be heard on this thread (I know they are watching!).

Thanks,
-Jason

On Mon, Mar 23, 2015 at 6:32 AM, 曹志富 cao.zh...@gmail.com wrote:
Re: 3.0 and the Cassandra release process
I'm taking notes from the infrastructure doc and wrote down some action items for my team: https://gist.github.com/EnigmaCurry/d53eccb55f5d0986c976 -- [image: datastax_logo.png] http://www.datastax.com/ Ryan McGuire Software Engineering Manager in Test | r...@datastax.com [image: linkedin.png] https://www.linkedin.com/in/enigmacurry [image: twitter.png] http://twitter.com/enigmacurry http://github.com/enigmacurry On Thu, Mar 19, 2015 at 1:08 PM, Ariel Weisberg ariel.weisb...@datastax.com wrote: Hi, I realized one of the documents we didn't send out was the infrastructure side changes I am looking for. This one is maybe a little rougher as it was the first one I wrote on the subject. https://docs.google.com/document/d/1Seku0vPwChbnH3uYYxon0UO-b6LDtSqluZiH--sWWi0/edit?usp=sharing The goal is to have infrastructure that gives developers as close to immediate feedback as possible on their code before they merge. Feedback that is delayed to after merging to trunk should come in a day or two and there is a product owner (Michael Shuler) responsible for making sure that issues are addressed quickly. QA is going to help by providing developers with a better tools for writing higher level functional tests that explore all of the functions together along with the configuration space without developers having to do any work other then plugging in functionality to exercise and then validate something specific. This kind of harness is hard to get right and make reliable and expressive so they have their work cut out for them. It's going to be an iterative process where the tests improve as new work introduces missing coverage and as bugs/regressions drive the introduction of new tests. The monthly retrospective (planning on doing that first of the month) is also going to help us refine the testing and development process. Ariel On Thu, Mar 19, 2015 at 7:23 AM, Jason Brown jasedbr...@gmail.com wrote: +1 to this general proposal. 
I think the time has finally come for us to try something new, and this sounds legit. Thanks!
Re: 3.0 and the Cassandra release process
Can I regard the odd version as the development preview and the even version as production ready? IMO, for a database infrastructure project, stability is more important than for other kinds of projects. LTS is a good idea, but if we don't support non-LTS releases for enough time to fix their bugs, users on a non-LTS release may have to upgrade to a new major release to fix the bugs, and may then have to handle some new bugs from the new features. I'm afraid that eventually people would only think about the LTS one.

2015-03-19 8:48 GMT+08:00 Pavel Yaskevich pove...@gmail.com: +1

On Wed, Mar 18, 2015 at 3:50 PM, Michael Kjellman mkjell...@internalcircle.com wrote: For most of my life I’ve lived on the software bleeding edge, both personally and professionally. Maybe it’s a personal weakness, but I guess I get a thrill out of the problem-solving aspect? Recently I came to a bit of an epiphany — the closer I keep to the daily build, the happier I generally am on a daily basis. Bugs happen, but for the most part (aside from show-stopper bugs), pain points for myself in a given daily build can generally be debugged to 1 or maybe 2 root causes, fixed in ~24 hours, and then life is better the next day again. In comparison, the old waterfall model generally means taking an “official” release at some point and waiting for some poor soul (or developer) to actually run the thing. No matter how good the QA team is, until it’s actually used in the real world, most bugs aren’t found. If you and your organization can wait 24 hours * the number of bugs discovered after people actually start using the thing, you end up with a “usable build” around the holy-grail minor X.X.5 release of Cassandra. I love the idea of the LTS model Jonathan describes because it means more code can get real testing and “bake” for longer instead of sitting largely unused in some git repository in a datacenter far far away. A lot of code has changed between 2.0 and trunk today.
The code has diverged to the point that if you write something for 2.0 (as the most stable major branch currently available), merging it forward to 3.0 or later generally means rewriting it. If the only thing that comes out of this is a smaller delta of LOC between the deployable version/branch, what we can develop against, and what QA is focused on, I think that’s a massive win. Something like CASSANDRA-8099 will need 2x the baking time of even many of the riskier changes the project has made. While I wouldn’t want to run a build with CASSANDRA-8099 in it anytime soon, there are now hundreds of other changes blocked, most likely many containing new bugs of their own, but with no exposure at all to even the most involved C* developers. I really think this will be a huge win for the project and I’m super thankful to Sylvain, Ariel, Jonathan, Aleksey, and Jake for guiding this change to a much more sustainable release model for the entire community. best, kjellman

On Mar 18, 2015, at 3:02 PM, Ariel Weisberg ariel.weisb...@datastax.com wrote: Hi, Keep in mind it is a bug-fix release every month and a feature release every two months. For development that is really a two-month cycle, with all bug fixes being backported one release. As a developer, if you want to get something into a release you have two months, and you should be sizing pieces of large tasks so they ship at least every two months. Ariel

On Mar 18, 2015, at 5:58 PM, Terrance Shepherd tscana...@gmail.com wrote: I like the idea, but I agree that every month is a bit aggressive. I have no say, but I would suggest 4 releases a year instead of 12, with 2 months of new features and 1 month of bug squashing per release, and the 4th quarter just bugs. I would also propose 2-year LTS releases for the releases after the 4th quarter. That way everyone could get a new feature release every quarter and the stability of super-major versions for 2 years.
On Wed, Mar 18, 2015 at 2:34 PM, Dave Brosius dbros...@mebigfatguy.com wrote: It would seem the practical implication of this is that there would be significantly more development on branches, with potentially more significant delays in merging those branches. This would imply to me that more Jenkins servers would need to be set up to handle auto-testing of more branches; if feature work spends more time on external branches, it is then likely to be less tested (even if by accident), as fewer developers would be working on that branch. Only when a feature was blessed to make it to the release-tracked branch would it become exposed to the majority of developers/testers, etc., doing normal running/playing/testing. This isn't to knock the idea in any way, just wanted to mention what I think the outcome would be. dave
Re: 3.0 and the Cassandra release process
+1 to this general proposal. I think the time has finally come for us to try something new, and this sounds legit. Thanks!
Re: 3.0 and the Cassandra release process
+1

On Tue, Mar 17, 2015 at 10:06 PM, Jonathan Ellis jbel...@gmail.com wrote: Cassandra 2.1 was released in September, which means that if we were on track with our stated goal of six-month releases, 3.0 would be done about now. Instead, we haven't even delivered a beta. The immediate cause this time is blocking for 8099 https://issues.apache.org/jira/browse/CASSANDRA-8099, but the reality is that nobody should really be surprised. Something always comes up -- we've averaged about nine months since 1.0, with 2.1 taking an entire year. We could make theory align with reality by acknowledging that if nine months is our 'natural' release schedule, then so be it. But I think we can do better. Broadly speaking, we have two constituencies for Cassandra releases: First, we have the users who are building or porting an application on Cassandra. These users want the newest features to make their job easier. If 2.1.0 has a few bugs, it's not the end of the world. They have time to wait for 2.1.x to stabilize while they write their code. They would like to see us deliver on our six-month schedule or even faster. Second, we have the users who have an application in production. These users, or their bosses, want Cassandra to be as stable as possible. Assuming they deploy on a stable release like 2.0.12, they don't want to touch it. They would like to see us release *less* often. (Because that means they have to do fewer upgrades while remaining in our backwards compatibility window.) With our current "big release every X months" model, these users' needs are in tension. We discussed this six months ago, and ended up with this: What if we tried a [four month] release cycle, BUT we would guarantee that you could do a rolling upgrade until we bump the supermajor version? So 2.0 could upgrade to 3.0 without having to go through 2.1. (But to go to 3.1 or 4.0 you would have to go through 3.0.) Crucially, I added: Whether this is reasonable depends on how fast we can stabilize releases.
2.1.0 will be a good test of this. Unfortunately, even after DataStax hired half a dozen full-time test engineers, 2.1.0 continued the proud tradition of being unready for production use, with “wait for .5 before upgrading” once again looking like a good guideline. I’m starting to think that the entire model of “write a bunch of new features all at once and then try to stabilize it for release” is broken. We’ve been trying that for years and, empirically speaking, the evidence is that it just doesn’t work, either from a stability standpoint or even just shipping on time. A big reason that it takes us so long to stabilize new releases now is that, because our major release cycle is so long, it’s super tempting to slip “just one” new feature into bugfix releases, and I’m as guilty of that as anyone. For similar reasons, it’s difficult to do a meaningful freeze with big feature releases. A look at 3.0 shows why: we have 8099 coming, but we also have significant work done (but not finished) on 6230, 7970, 6696, and 6477, all of which are meaningful improvements that address demonstrated user pain. So if we keep doing what we’ve been doing, our choices are to either delay 3.0 further while we finish and stabilize these, or wait nine months to a year for the next release. Either way, one of our constituencies gets disappointed. So, I’d like to try something different. I think we were on the right track with shorter releases and more compatibility. But I’d like to throw in a twist. Intel cuts down on risk with a “tick-tock” schedule, alternating new architectures and process shrinks instead of trying to do both at once. We can do something similar here: One-month releases. Period. If it’s not done, it can wait.
*Every other release only accepts bug fixes.* By themselves, one-month releases are going to dramatically reduce the complexity of testing and debugging new releases -- and bugs that do slip past us will only affect a smaller percentage of users, avoiding the “big release has a bunch of bugs no one has seen before and pretty much everyone is hit by something” scenario. But by adding the second rule, I think we have a real chance to make a quantum leap here: stable, production-ready releases every two months. So here is my proposal for 3.0: We’re just about ready to start serious review of 8099. When that’s done, we branch 3.0 and cut a beta and then release candidates. Whatever isn’t done by then has to wait; unlike prior betas, we will only accept bug fixes into 3.0 after branching. One month after 3.0, we will ship 3.1 (with new features). At the same time, we will branch 3.2. New features in trunk will go into 3.3. The 3.2 branch will only get bug fixes. We will maintain backwards compatibility for all of 3.x; eventually (no less than a year out) we will pick a release to be 4.0, and drop deprecated features and old backwards compatibility.
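As a reader's aside, not something from the thread itself: the even/odd tick-tock numbering and the rolling-upgrade rule quoted above can be sketched as a small policy check. The function names and the exact encoding of the rules are my own assumptions for illustration only.

```python
def is_feature_release(version: str) -> bool:
    """Under the proposed tick-tock scheme, odd minor versions (3.1, 3.3, ...)
    carry new features; even minors (3.0, 3.2, ...) accept only bug fixes."""
    major, minor = (int(x) for x in version.split(".")[:2])
    return minor % 2 == 1

def rolling_upgrade_ok(src: str, dst: str) -> bool:
    """Rolling upgrades are supported within a major series, and from the
    previous major straight to X.0 (e.g. 2.x -> 3.0); reaching 3.1 or 4.0
    from 2.x requires passing through 3.0 first."""
    s_major, s_minor = (int(x) for x in src.split(".")[:2])
    d_major, d_minor = (int(x) for x in dst.split(".")[:2])
    if d_major == s_major:
        return d_minor >= s_minor          # forward within the same major
    if d_major == s_major + 1:
        return d_minor == 0                # must land on X.0 first
    return False                           # skipping a supermajor is out

assert is_feature_release("3.1") and not is_feature_release("3.2")
assert rolling_upgrade_ok("2.0", "3.0")
assert not rolling_upgrade_ok("2.0", "3.1")
```

The point of the encoding is that operators on an even release always have a bug-fix-only target available without crossing a compatibility boundary.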
Re: 3.0 and the Cassandra release process
+1 I also appreciate Ariel’s effort. The improved CI integration is great - being able to run a huge number of tests on different platforms against one's development branch is a huge improvement.
Re: 3.0 and the Cassandra release process
+1. This sounds like a step in a better direction. Gary.
Re: 3.0 and the Cassandra release process
+1

On Tue, Mar 17, 2015 at 5:06 PM, Jonathan Ellis jbel...@gmail.com wrote:

Cassandra 2.1 was released in September, which means that if we were on track with our stated goal of six-month releases, 3.0 would be done about now. Instead, we haven't even delivered a beta. The immediate cause this time is blocking for 8099 https://issues.apache.org/jira/browse/CASSANDRA-8099, but the reality is that nobody should really be surprised. Something always comes up -- we've averaged about nine months since 1.0, with 2.1 taking an entire year.

We could make theory align with reality by acknowledging that if nine months is our "natural" release schedule, then so be it. But I think we can do better.

Broadly speaking, we have two constituencies with Cassandra releases:

First, we have the users who are building or porting an application on Cassandra. These users want the newest features to make their job easier. If 2.1.0 has a few bugs, it's not the end of the world; they have time to wait for 2.1.x to stabilize while they write their code. They would like to see us deliver on our six-month schedule, or even faster.

Second, we have the users who have an application in production. These users, or their bosses, want Cassandra to be as stable as possible. Assuming they deploy on a stable release like 2.0.12, they don't want to touch it. They would like to see us release *less* often, because that means fewer upgrades while remaining in our backwards-compatibility window.

With our current big-release-every-X-months model, these users' needs are in tension.

We discussed this six months ago, and ended up with this: "What if we tried a [four month] release cycle, BUT we would guarantee that you could do a rolling upgrade until we bump the supermajor version? So 2.0 could upgrade to 3.0 without having to go through 2.1. (But to go to 3.1 or 4.0 you would have to go through 3.0.)"

Crucially, I added: "Whether this is reasonable depends on how fast we can stabilize releases. 2.1.0 will be a good test of this."

Unfortunately, even after DataStax hired half a dozen full-time test engineers, 2.1.0 continued the proud tradition of being unready for production use, with "wait for .5 before upgrading" once again looking like a good guideline.

I'm starting to think that the entire model of "write a bunch of new features all at once and then try to stabilize it for release" is broken. We've been trying that for years, and empirically speaking the evidence is that it just doesn't work, either from a stability standpoint or even just shipping on time.

A big reason that it takes us so long to stabilize new releases now is that, because our major release cycle is so long, it's super tempting to slip "just one" new feature into bugfix releases, and I'm as guilty of that as anyone. For similar reasons, it's difficult to do a meaningful freeze with big feature releases. A look at 3.0 shows why: we have 8099 coming, but we also have significant work done (but not finished) on 6230, 7970, 6696, and 6477, all of which are meaningful improvements that address demonstrated user pain. So if we keep doing what we've been doing, our choices are to either delay 3.0 further while we finish and stabilize these, or wait nine months to a year for the next release. Either way, one of our constituencies gets disappointed.

So, I'd like to try something different. I think we were on the right track with shorter releases and more compatibility, but I'd like to throw in a twist. Intel cuts down on risk with a "tick-tock" schedule, alternating new architectures and process shrinks instead of trying to do both at once. We can do something similar here:

One-month releases. Period. If it's not done, it can wait.

*Every other release only accepts bug fixes.*

By themselves, one-month releases are going to dramatically reduce the complexity of testing and debugging new releases -- and bugs that do slip past us will only affect a smaller percentage of users, avoiding the "big release has a bunch of bugs no one has seen before and pretty much everyone is hit by something" scenario. But by adding the second rule, I think we have a real chance to make a quantum leap here: stable, production-ready releases every two months.

So here is my proposal for 3.0: We're just about ready to start serious review of 8099. When that's done, we branch 3.0 and cut a beta and then release candidates. Whatever isn't done by then has to wait; unlike prior betas, we will only accept bug fixes into 3.0 after branching. One month after 3.0, we will ship 3.1 (with new features). At the same time, we will branch 3.2. New features in trunk will go into 3.3. The 3.2 branch will only get bug fixes. We will maintain backwards compatibility for all of 3.x; eventually (no less than a year out) we will pick a release to be 4.0, and drop deprecated features and old backwards compatibility.
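The tick-tock numbering proposed above (3.1 features, 3.2 bug fixes only, 3.3 features, and so on) can be sketched as a small helper. This is purely illustrative, not Cassandra code; the function name and output strings are made up for the example:

```python
def release_kind(version: str) -> str:
    """Classify a 3.x release under the proposed tick-tock scheme:
    odd minors (3.1, 3.3, ...) accept new features, even minors
    (3.2, 3.4, ...) accept only bug fixes."""
    major, minor = (int(p) for p in version.split(".")[:2])
    if major != 3:
        raise ValueError("scheme described in the proposal covers the 3.x series")
    if minor == 0:
        return "initial release"
    return "feature release" if minor % 2 == 1 else "bug-fix-only release"

for v in ["3.0", "3.1", "3.2", "3.3"]:
    print(v, "->", release_kind(v))
```

Under this reading, a stable, production-ready line arrives every two months: each feature release is followed one month later by a release that only hardens it.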
Re: 3.0 and the Cassandra release process
+1

-- AY

On March 17, 2015 at 14:07:03, Jonathan Ellis (jbel...@gmail.com) wrote:

Cassandra 2.1 was released in September, which means that if we were on track with our stated goal of six month releases, 3.0 would be done about now. [...]
Re: 3.0 and the Cassandra release process
+1

On Wed, Mar 18, 2015 at 7:54 AM, Jake Luciani jak...@gmail.com wrote:

+1

On Tue, Mar 17, 2015 at 5:06 PM, Jonathan Ellis jbel...@gmail.com wrote:

Cassandra 2.1 was released in September, which means that if we were on track with our stated goal of six month releases, 3.0 would be done about now. [...]
Re: 3.0 and the Cassandra release process
For most of my life I've lived on the software bleeding edge, both personally and professionally. Maybe it's a personal weakness, but I guess I get a thrill out of the problem-solving aspect? Recently I came to a bit of an epiphany: the closer I keep to the daily build, the happier I generally am on a daily basis. Bugs happen, but for the most part (aside from show-stopper bugs), pain points in a given daily build can generally be debugged to one or maybe two root causes, fixed in ~24 hours, and then life is better again the next day.

In comparison, the old waterfall model generally means taking an "official" release at some point and waiting for some poor soul (or developer) to actually run the thing. No matter how good the QA team is, until it's actually used in the real world, most bugs aren't found. If you and your organization can wait 24 hours times the number of bugs discovered after people actually started using the thing, you end up with a "usable build" around the holy-grail minor X.X.5 release of Cassandra.

I love the idea of the LTS model Jonathan describes, because it means more code can get real testing and "bake" for longer instead of sitting largely unused in some git repository in a datacenter far, far away. A lot of code has changed between 2.0 and trunk today. The code has diverged to the point that if you write something for 2.0 (as the most stable major branch currently available), merging it forward to 3.0 or later generally means rewriting it. If the only thing that comes out of this is a smaller delta of LOC between the deployable version/branch, what we can develop against, and what QA is focused on, I think that's a massive win.

Something like CASSANDRA-8099 will need 2x the baking time of even many of the riskier changes the project has made. While I wouldn't want to run a build with CASSANDRA-8099 in it anytime soon, there are now hundreds of other changes blocked behind it, most likely many containing new bugs of their own, that have had no exposure at all to even the most involved C* developers. I really think this will be a huge win for the project, and I'm super thankful to Sylvain, Ariel, Jonathan, Aleksey, and Jake for guiding this change to a much more sustainable release model for the entire community.

best,
kjellman

On Mar 18, 2015, at 3:02 PM, Ariel Weisberg ariel.weisb...@datastax.com wrote:

Hi,

Keep in mind it is a bug fix release every month and a feature release every two months. For development that is really a two-month cycle, with all bug fixes being backported one release. As a developer, if you want to get something into a release you have two months, and you should be sizing pieces of large tasks so they ship at least every two months.

Ariel

On Mar 18, 2015, at 5:58 PM, Terrance Shepherd tscana...@gmail.com wrote:

I like the idea but I agree that every month is a bit aggressive. [...]
Re: 3.0 and the Cassandra release process
+1

On Wed, Mar 18, 2015 at 3:50 PM, Michael Kjellman mkjell...@internalcircle.com wrote:

For most of my life I’ve lived on the software bleeding edge both personally and professionally. [...]
Re: 3.0 and the Cassandra release process
If every other release is a bug fix release, would the versioning go:

3.1.0 -- feature release
3.1.1 -- bug fix release

Eventually it seems like it might be possible to push out a bug fix release more frequently than once a month?

On Wed, Mar 18, 2015 at 7:59 AM Josh McKenzie josh.mcken...@datastax.com wrote:

+1

On Wed, Mar 18, 2015 at 7:54 AM, Jake Luciani jak...@gmail.com wrote:

+1

On Tue, Mar 17, 2015 at 5:06 PM, Jonathan Ellis jbel...@gmail.com wrote:

Cassandra 2.1 was released in September, which means that if we were on track with our stated goal of six month releases, 3.0 would be done about now. [...]
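The rolling-upgrade guarantee discussed earlier in the thread (2.0 can go straight to 3.0, but reaching 3.1 or 4.0 requires passing through 3.0) can be sketched as a hypothetical version check. This is an illustrative model of the stated policy, not actual Cassandra upgrade logic, and the function name is invented:

```python
def rolling_upgrade_supported(src: str, dst: str) -> bool:
    """Model of the proposed guarantee: rolling upgrades are supported
    within a supermajor series, and to the next supermajor's .0 release,
    but not across a supermajor boundary to anything later than .0."""
    s_major = int(src.split(".")[0])
    d_major, d_minor = (int(p) for p in dst.split(".")[:2])
    if d_major == s_major:
        return True           # e.g. 3.0 -> 3.3: stays within the series
    if d_major == s_major + 1:
        return d_minor == 0   # e.g. 2.0 -> 3.0 works without going through 2.1
    return False              # e.g. 2.0 -> 4.0 must step through 3.0 first
```

For example, `rolling_upgrade_supported("2.0", "3.0")` would be true under this model, while `rolling_upgrade_supported("2.0", "3.1")` would not.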
Re: 3.0 and the Cassandra release process
I like the idea, but I agree that every month is a bit aggressive. I have no say, but I would suggest 4 releases a year instead of 12, with 2 months of new features and 1 month of bug squashing per release, and the 4th quarter just bugs. I would also propose 2-year LTS status for the releases after the 4th quarter. That way everyone could get a new feature release every quarter, plus the stability of supermajor versions for 2 years.

On Wed, Mar 18, 2015 at 2:34 PM, Dave Brosius dbros...@mebigfatguy.com wrote:

It would seem the practical implication of this is that there would be significantly more development on branches, with potentially more significant delays in merging these branches. [...]
Re: 3.0 and the Cassandra release process
Hi, Long lived feature branches are already a thing and orthogonal IMO to release frequency. The goal is that developers will implement larger features as smaller tested components that have already shipped. Some times this means working in a less destructive fashion so you can always ship a working implementation of everything (which is a mixed bag). Developers should be able to put their work on trunk faster because they will know before the merge what the impact of their changes will be. That is why we are emphasizing have Jenkin’s run on all commits (trunk and branch). We want the testing that is performed on branches to be as close to the testing performed on trunk. Once something is merged to trunk we want it to be about as tested as it is going to get within a day or two. Part of releasing more frequently is getting away from relying on developers/testers running things and moving towards automated testing that exercises the database the same way users do with the same expectations of correctness. We also have to address the process issues that are causing the tests we have to demonstrate that trunk is not releasable on a regular basis. Ariel On Mar 18, 2015, at 5:34 PM, Dave Brosius dbros...@mebigfatguy.com wrote: It would seem the practical implications of this is that there would be significantly more development on branches, with potentially more significant delays on merging these branches. This would imply to me that more Jenkins servers would need to be set up to handle auto-testing of more branches, as if feature work spends more time on external branches, it is then likely to be be less tested (even if by accident) as less developers would be working on that branch. Only when a feature was blessed to make it to the release-tracked branch, would it become exposed to the majority of developers/testers, etc doing normal running/playing/testing. This isn't to knock the idea in anyway, just wanted to mention what i think the outcome would be. 
dave On Tue, Mar 17, 2015 at 5:06 PM, Jonathan Ellis jbel...@gmail.com wrote: [...]
Re: 3.0 and the Cassandra release process
Hi, Keep in mind it is a bug-fix release every month and a feature release every two months. For development that is really a two-month cycle, with all bug fixes being backported one release. As a developer, if you want to get something into a release you have two months, and you should be sizing pieces of large tasks so they ship at least every two months. Ariel On Mar 18, 2015, at 5:58 PM, Terrance Shepherd tscana...@gmail.com wrote: I like the idea, but I agree that every month is a bit aggressive. I have no say, but: I would say 4 releases a year instead of 12, with 2 months of new features and 1 month of bug squashing per release, and the 4th quarter just bugs. I would also propose 2-year LTS releases for the releases after the 4th quarter. So everyone could get a new feature release every quarter and the stability of super-major versions for 2 years. On Wed, Mar 18, 2015 at 2:34 PM, Dave Brosius dbros...@mebigfatguy.com wrote: [...]
Re: 3.0 and the Cassandra release process
It would seem the practical implication of this is that there would be significantly more development on branches, with potentially more significant delays in merging these branches. This would imply to me that more Jenkins servers would need to be set up to handle auto-testing of more branches, as if feature work spends more time on external branches, it is then likely to be less tested (even if by accident), since fewer developers would be working on that branch. Only when a feature was blessed to make it to the release-tracked branch would it become exposed to the majority of developers/testers, etc. doing normal running/playing/testing. This isn't to knock the idea in any way, just wanted to mention what I think the outcome would be. dave On Tue, Mar 17, 2015 at 5:06 PM, Jonathan Ellis jbel...@gmail.com wrote: [...]
3.0 and the Cassandra release process
Cassandra 2.1 was released in September, which means that if we were on track with our stated goal of six-month releases, 3.0 would be done about now. Instead, we haven't even delivered a beta. The immediate cause this time is blocking for 8099 https://issues.apache.org/jira/browse/CASSANDRA-8099, but the reality is that nobody should really be surprised. Something always comes up -- we've averaged about nine months since 1.0, with 2.1 taking an entire year. We could make theory align with reality by acknowledging that if nine months is our 'natural' release schedule, then so be it. But I think we can do better. Broadly speaking, we have two constituencies with Cassandra releases: First, we have the users who are building or porting an application on Cassandra. These users want the newest features to make their job easier. If 2.1.0 has a few bugs, it's not the end of the world. They have time to wait for 2.1.x to stabilize while they write their code. They would like to see us deliver on our six-month schedule or even faster. Second, we have the users who have an application in production. These users, or their bosses, want Cassandra to be as stable as possible. Assuming they deploy on a stable release like 2.0.12, they don't want to touch it. They would like to see us release *less* often. (Because that means they have to do fewer upgrades while remaining in our backwards compatibility window.) With our current "big release every X months" model, these users' needs are in tension. We discussed this six months ago, and ended up with this: What if we tried a [four month] release cycle, BUT we would guarantee that you could do a rolling upgrade until we bump the supermajor version? So 2.0 could upgrade to 3.0 without having to go through 2.1. (But to go to 3.1 or 4.0 you would have to go through 3.0.) Crucially, I added: "Whether this is reasonable depends on how fast we can stabilize releases. 2.1.0 will be a good test of this."
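The rolling-upgrade guarantee described above reduces to a simple rule: upgrades flow freely within a supermajor line, and you cross into the next line only by landing on its .0 release. A toy sketch of that rule, assuming versions as (supermajor, minor) tuples -- the function is purely illustrative, not anything from the Cassandra codebase:

```python
def can_roll_upgrade(src, dst):
    """Illustrative sketch of the proposed rolling-upgrade rule:
    free movement within a supermajor line; crossing to the next
    line is allowed only onto its .0 release."""
    (smaj, smin), (dmaj, dmin) = src, dst
    if dmaj == smaj:
        return dmin >= smin   # e.g. 3.0 -> 3.1 is fine
    if dmaj == smaj + 1:
        return dmin == 0      # e.g. 2.0 -> 3.0, but not 2.0 -> 3.1
    return False              # e.g. 2.0 -> 4.0 must hop through 3.0

assert can_roll_upgrade((2, 0), (3, 0))
assert not can_roll_upgrade((2, 0), (3, 1))
assert not can_roll_upgrade((2, 0), (4, 0))
```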
Unfortunately, even after DataStax hired half a dozen full-time test engineers, 2.1.0 continued the proud tradition of being unready for production use, with "wait for .5 before upgrading" once again looking like a good guideline. I’m starting to think that the entire model of “write a bunch of new features all at once and then try to stabilize it for release” is broken. We’ve been trying that for years, and empirically speaking, the evidence is that it just doesn’t work, either from a stability standpoint or even just shipping on time. A big reason that it takes us so long to stabilize new releases now is that, because our major release cycle is so long, it’s super tempting to slip “just one” new feature into bugfix releases, and I’m as guilty of that as anyone. For similar reasons, it’s difficult to do a meaningful freeze with big feature releases. A look at 3.0 shows why: we have 8099 coming, but we also have significant work done (but not finished) on 6230, 7970, 6696, and 6477, all of which are meaningful improvements that address demonstrated user pain. So if we keep doing what we’ve been doing, our choices are either to delay 3.0 further while we finish and stabilize these, or to wait nine months to a year for the next release. Either way, one of our constituencies gets disappointed. So I’d like to try something different. I think we were on the right track with shorter releases and more compatibility, but I’d like to throw in a twist. Intel cuts down on risk with a “tick-tock” schedule: new architectures and process shrinks alternate instead of being attempted both at once. We can do something similar here: One-month releases. Period. If it’s not done, it can wait.
*Every other release only accepts bug fixes.* By itself, one-month releases are going to dramatically reduce the complexity of testing and debugging new releases -- and bugs that do slip past us will only affect a smaller percentage of users, avoiding the “big release has a bunch of bugs no one has seen before and pretty much everyone is hit by something” scenario. But by adding in the second rule, I think we have a real chance to make a quantum leap here: stable, production-ready releases every two months. So here is my proposal for 3.0: We’re just about ready to start serious review of 8099. When that’s done, we branch 3.0 and cut a beta and then release candidates. Whatever isn’t done by then has to wait; unlike prior betas, we will only accept bug fixes into 3.0 after branching. One month after 3.0, we will ship 3.1 (with new features). At the same time, we will branch 3.2. New features in trunk will go into 3.3. The 3.2 branch will only get bug fixes. We will maintain backwards compatibility for all of 3.x; eventually (no less than a year out) we will pick a release to be 4.0, and drop deprecated features and old backwards compatibilities. Otherwise there will be nothing special about the 4.0 designation. (Note that with an “odd releases have new features, even releases only
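The “odd releases have new features, even releases only bug fixes” cadence in the proposal comes down to a parity check on the minor version. A minimal illustration -- the function name and version tuples are mine, not project code:

```python
def accepts_new_features(version):
    # Sketch of the proposed tick-tock rule: odd minors (3.1, 3.3, ...)
    # take new features; even minors (3.0, 3.2, ...) take only bug fixes.
    _major, minor = version
    return minor % 2 == 1

# Under the proposal, trunk features land in 3.1 and 3.3,
# while the 3.0 and 3.2 branches accept only bug fixes.
assert accepts_new_features((3, 1))
assert not accepts_new_features((3, 0))
assert not accepts_new_features((3, 2))
```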
Re: 3.0 and the Cassandra release process
Thanks for everyone's hard work and perseverance; Cassandra is truly amazing. It really does make redundancy so much easier, making my life far less stressful (: It surely is this awesomeness that creates the demand for features in the first place, so this is a great problem to have. Certainly, having a product where the user base continually encourages people not to use the current major version is a situation that could be improved. Doing something to attempt to improve the current process is better than (for example) doing nothing. Modelling a process on another company's proven strategy seems better than making it up as you go. I suggest anyone who would minus-one this should also include an alternate proposal to change the status quo. Thanks, Jacob __ Sent from iPhone On 18 Mar 2015, at 8:06 am, Jonathan Ellis jbel...@gmail.com wrote: [...]
Re: 3.0 and the Cassandra release process
❤️ it. +1 -kjellman On Mar 17, 2015, at 2:06 PM, Jonathan Ellis jbel...@gmail.com wrote: [...]