Re: [discuss] Modernization of Cassandra build system
There are three distinct problems you raise: code structure, documentation, and build system.

The build system, as far as I can tell, is a matter of personal preference. I personally dislike the few interactions I've had with maven, but thankfully my interactions with build system innards have been fairly limited. I mostly just use them. Unless a concrete and significant benefit is delivered by maven, though, it just doesn't seem worth the upheaval to me. If you can make the argument that it actually improves the project in a way that justifies the upheaval, it will certainly be considered, but so far no justification has been made.

The documentation problem is common to many projects, though: out-of-codebase documentation gets stale very rapidly. When we say to read the code we mean read the code and its inline documentation. The quality of this documentation has itself generally been substandard, but it has been improving significantly over the past year or so, and we are endeavouring to improve it with every change. In the meantime, there are videos from a recent bootcamp we've run for both internal and external contributors: http://www.datastax.com/dev/blog/deep-into-cassandra-internals

The code structure would be great to modularise, but the reality is that it is not currently modular. There are no good clear dividing lines for much of the project. The problem with refactoring the entire codebase to create separate projects is that it is a significant undertaking that makes maintenance of the project across versions significantly more costly. This creates a net drag on all productivity in the project. Such a major change requires strong consensus, and strong evidence justifying it. So the question is: would this create more new work than it loses? The evidence isn't there that it would. It might, but I personally guess that it would not, judging by the results of our other attempts to drive up contributions to the project.
Perhaps we can have a wider dialogue about the endeavour, though, and see if a consensus can in fact be built.

On Thu, Apr 2, 2015 at 9:31 AM, Pierre Devops pierredev...@gmail.com wrote:

Hi all,

Not a cassandra contributor here, but I'm working on the cassandra sources too. This big cassandra source root caused me trouble too. Firstly, it was not easy to import into an IDE: try to import the cassandra sources in netbeans, it's a headache. It would be great if we had more small modules/projects in separate POMs. It would be easier to work on a small part of the project and, as a consequence, I'm sure you would get more external contributions to this project.

I know cassandra devs are used to the ant build model, but it's like a thread I opened about updated and more complete documentation about sstable structures. The answer I got was that it was not needed to understand how to use Cassandra, and that the only way to learn about it is to rtfcode. Because people working on cassandra already know what the sstable structures are, there is no need to provide up-to-date documentation. So it will take me a very long time to read and understand all the serialization code in cassandra to understand the sstable structure before I can work on the code. Up-to-date documentation about internals would have given me the knowledge I need to contribute much more quickly.

Here we have the same problem: we have a complex non-modular build system, and core cassandra devs are used to it, so there is no need to make something more flexible, even if it could facilitate external contributions.

2015-03-31 23:42 GMT+02:00 Benedict Elliott Smith belliottsm...@datastax.com:

I think the problem is everyone currently contributing is comfortable with ant, and as much as it is imperfect, it isn't clear maven is going to be better. Having the requisite maven functionality linked under the hood doesn't seem particularly preferable to the inverse.
The status quo has the bonus of zero upheaval for the project and its contributors, though, so it would have to be a very clear win to justify the change in my opinion.

On Tue, Mar 31, 2015 at 10:24 PM, Łukasz Dywicki l...@code-house.org wrote:

Hey Tyler,

Thank you very much for coming back. I had already lost faith that I would get a reply. :-)

I am fine with code relocations. Moving constants into one place where they cause no circular dependencies is cool; I’m all for doing such a thing.

Currently Cassandra uses ant for some maven functionality (such as deploying POM.xml into repositories with dependency information), and it also uses maven-type artifact repositories. This can be easily flipped. Maven can call ant tasks for the parts which cannot be done with existing maven plugins. Here is the simplest example: http://docs.codehaus.org/display/MAVENUSER/Antrun+Plugin - you can see an ant task definition embedded in a maven pom.xml. Most of
Re: 3.0 and the Cassandra release process
In this tick tock cycle, is there still a long term release that's maintained, meant for production? Will bug fixes be back ported to 3.0 (stable) with new stuff going forward to 3.x?

On Thu, Mar 26, 2015 at 6:50 AM Aleksey Yeschenko alek...@apache.org wrote:

Hey Jason.

I think pretty much everybody is on board with:

1) A monthly release cycle
2) Keeping trunk releasable at all times

And that’s what my personal +1 was for. The tick-tock mechanism details and bug fix policy for the maintained stable lines should be fleshed out before we proceed. I believe that once they are explained better, the concerns will mostly, or entirely, go away.

-- AY

On Mon, Mar 23, 2015 at 11:15 PM, Jason Brown jasedbr...@gmail.com wrote:

Hey all,

I had a hallway conversation with some folks here last week, and they expressed some concerns with this proposal. I will not attempt to summarize their arguments as I don't believe I could do them ample justice, but I strongly encouraged those individuals to speak up and be heard on this thread (I know they are watching!).

Thanks, -Jason

On Mon, Mar 23, 2015 at 6:32 AM, 曹志富 cao.zh...@gmail.com wrote:

+1

-- Ranger Tsao

2015-03-20 22:57 GMT+08:00 Ryan McGuire r...@datastax.com:

I'm taking notes from the infrastructure doc and wrote down some action items for my team: https://gist.github.com/EnigmaCurry/d53eccb55f5d0986c976

-- Ryan McGuire
http://www.datastax.com/
Software Engineering Manager in Test | r...@datastax.com
https://www.linkedin.com/in/enigmacurry
http://twitter.com/enigmacurry
http://github.com/enigmacurry

On Thu, Mar 19, 2015 at 1:08 PM, Ariel Weisberg ariel.weisb...@datastax.com wrote:

Hi,

I realized one of the documents we didn't send out was the infrastructure side changes I am looking for. This one is maybe a little rougher as it was the first one I wrote on the subject.
https://docs.google.com/document/d/1Seku0vPwChbnH3uYYxon0UO-b6LDtSqluZiH--sWWi0/edit?usp=sharing

The goal is to have infrastructure that gives developers as close to immediate feedback as possible on their code before they merge. Feedback that is delayed until after merging to trunk should come in a day or two, and there is a product owner (Michael Shuler) responsible for making sure that issues are addressed quickly.

QA is going to help by providing developers with better tools for writing higher level functional tests that explore all of the functions together along with the configuration space, without developers having to do any work other than plugging in functionality to exercise and then validating something specific. This kind of harness is hard to get right and make reliable and expressive, so they have their work cut out for them.

It's going to be an iterative process where the tests improve as new work introduces missing coverage and as bugs/regressions drive the introduction of new tests. The monthly retrospective (planning on doing that first of the month) is also going to help us refine the testing and development process.

Ariel

On Thu, Mar 19, 2015 at 7:23 AM, Jason Brown jasedbr...@gmail.com wrote:

+1 to this general proposal. I think the time has finally come for us to try something new, and this sounds legit. Thanks!

On Thu, Mar 19, 2015 at 12:49 AM, Phil Yang ud1...@gmail.com wrote:

Can I regard the odd version as the development preview and the even version as the production ready one? IMO, for a database infrastructure project, stability is more important than for other kinds of projects. LTS is a good idea, but if we don't support non-LTS releases for enough time to fix their bugs, users on a non-LTS release may have to upgrade to a new major release to fix the bugs and may then have to handle new bugs introduced by the new features. I'm afraid that eventually people would only think about the LTS one.
2015-03-19 8:48 GMT+08:00 Pavel Yaskevich pove...@gmail.com: +1 On Wed, Mar 18, 2015 at 3:50 PM, Michael Kjellman mkjell...@internalcircle.com wrote: For most of my life I’ve lived on the software bleeding edge both personally and professionally. Maybe it’s a personal weakness, but I guess I get a thrill out of the problem solving aspect? Recently I came to a bit of an epiphany — the closer I keep to the daily build — generally the happier
Re: March 2015 QA retrospective
To add to this:

*Went well*
Tyler Hobbs has reduced failing dtests on trunk by ~90%. By next month, test results should be at 100% pass.

*Went poorly*
We've failed to make progress on running the full test suite across all contributor branches. By the end of this month, I assume we will at least have limited functionality in this area.

On Wed, Apr 1, 2015 at 3:57 PM, Ariel Weisberg ariel.weisb...@datastax.com wrote:

Hi all,

It’s time for the first retrospective. For those not familiar, this is the part of the development process where we discuss what is and isn’t working when it comes to making reliable releases. We go over the things that worked, the things that didn’t work, and what changes we are going to make. This is not a forum for discussing individual bugs (or bugs fixed before release due to successful process), although you can cite one and we can discuss what we could have done differently to catch it. Even if a bug wasn’t released, if it was caught the wrong way (blind luck) and you think our process wouldn’t have caught it, you can bring that up as well.

I don’t expect this retrospective to be the most productive because we already know we are far behind in several areas (passing utests and dtests, running utests and dtests on each commit, running a larger black box system test) and many issues will circle back around to being addressed by one of those three.

If you’re a developer you can review all the things you have committed (or reviewed) in the past month and ask yourself if each met the criteria of done that we agreed on, including adding tests for existing untested code (usually the thing missed). Better to do it now than after discovering your definition of done was flawed because it released a preventable bug.
For this one retrospective you can reach back further to something already released that you feel passionate about, and if you can point to a utest or dtest that should have caught it and is still missing, we can add that to the list of things to test. That would go under CASSANDRA-9012 (Triage missing test coverage) https://issues.apache.org/jira/browse/CASSANDRA-9012.

There is a root JIRA https://issues.apache.org/jira/browse/CASSANDRA-9042 for making trunk always releasable. A lot falls under CASSANDRA-9007 (Run stress nightly against trunk in a way that validates) https://issues.apache.org/jira/browse/CASSANDRA-9007, which is the root for a new kitchen sink style test that validates the entire feature set together in a black box fashion. Philip Thompson has a basic job running, so we are close to (or at) the tipping point where the doneness criteria for every ticket need to include making sure this job covers the thing you added/changed. If you aren’t going to add the coverage you need to justify (to yourself and your reviewer) breaking it out into something separate, and file a JIRA indicating the coverage was missing (if one doesn’t already exist). Make sure to link it to 9007 so we can see what has already been reported.

The reason I say we might not be at the tipping point is that while we have the job, we haven’t ironed out how stress (or something new) will act as a container for validating multiple features, especially in an environment where things like cluster/node failures and topology changes occur.

Retrospectives aren’t supposed to include the preceding paragraphs; we should funnel discussion about them into a separate email thread.

On to the retrospective. This is more for me to solicit information from you than for me to push information to you.
Went well
- Positive response to the definition of done
- Lots of manpower from QA and progress on test infrastructure

Went poorly
- Some wanting to add validation to a kitchen sink style test, but not being able to yet
- Not having a way to know if we are effectively implementing the definition of done without waiting for bugs as feedback

Changes
- Coordinate with Philip Thompson to see how we can get to having developers able to add validation to the kitchen sink style test

Regards, Ariel
Re: 3.0 and the Cassandra release process
We are moving away from designating major releases like 3.0 as special, other than as a marker of compatibility. In fact we are moving away from major releases entirely, with each release being a much smaller, digestible unit of change, and the ultimate goal of every even release being production-quality.

This means that bugs won't pile up and compound each other. And bugs that do slip through will affect fewer users. As 3.x stabilizes, more people will try out the releases, yielding better quality, yielding even more people trying them out in a virtuous cycle.

This won't just happen by wishing for it. I am very serious about investing the energy we would have spent on backporting fixes to a stable branch into improving our QA process and test coverage. After a very short list of in-progress features that may not make the 3.0 cutoff (#6477, #6696 come to mind) I'm willing to virtually pause new feature development entirely to make this happen.

Some patience will be necessary with the first few releases. But at this point, people are used to about six months of waiting for a new major to stabilize. So, let's give this a try until 3.6. If that still hasn't materially stabilized, then we need to go back to the drawing board. But I'm optimistic that it will.

On Thu, Apr 2, 2015 at 5:04 PM, Jonathan Haddad j...@jonhaddad.com wrote:

In this tick tock cycle, is there still a long term release that's maintained, meant for production? Will bug fixes be back ported to 3.0 (stable) with new stuff going forward to 3.x?

-- Jonathan Ellis
Project Chair, Apache Cassandra
co-founder, http://www.datastax.com
@spyced
Re: 3.0 and the Cassandra release process
Hey Jonathan,

I have been hoping for this approach for years now. One of the reasons I left Datastax was my feeling that quality was always on the backburner and never really taken seriously vs marketing driven releases. I sincerely hope this approach reverses that perceived trend.

-- Colin
+1 612 859 6129
Skype colin.p.clark
Re: [discuss] Modernization of Cassandra build system
TL;DR - Benedict is right.

IMO Maven is a nice, straightforward tool if you know what you’re doing and start on a _new_ project. But Maven easily becomes a pita if you want to do something that’s not supported out-of-the-box. I bet that Maven would just not work for the C* source tree with all the nice little features that C*’s build.xml offers (just look at the scripted stuff in build.xml).

Eventually gradle would be an option; I proposed switching to gradle several months ago. Same story (although gradle is better than Maven ;) ).

But… you need to know that build.xml is not just used to build the code and artifacts. It is also used in CI, ccm, cstar-perf and some other custom systems that exist and just work. So if we exchanged ant for something else, it would take a lot of effort to change several tools and systems. And there must be a guarantee that everything works like it did before.

Regarding IDEs: I’m using IDEA every day and it works like a charm with C*. Eclipse is "supported natively" by "ant generate-eclipse-files". TBH I don’t know NetBeans.

As Benedict pointed out, the code has improved and still improves a lot - in structure, in inline-doc, in nomenclature and whatever else. As soon as we can get rid of Thrift in the tree, there’s another big opportunity to clean up more stuff.

TBH I don’t think that (besides the tools) there would be a need to generate multiple artifacts for the C* daemon - you can do "separation of concerns" (via packages) even with discipline and then measure it. IMO the only artifact worth extracting out of the C* tree, and useful for a (limited) set of 3rd party code, is something like "cassandra-jmx-interfaces.jar".

Robert