Re: Committing `CASSANDRA-13701 Lower default num_tokens` and the dtest slowdown…
> Now that we have CASSANDRA-16079 created and work is proceeding, can we please commit CASSANDRA-13701?

Closing the loop on this. CASSANDRA-16079 is ready to be committed. It involves a patch to ccm and to cassandra-dtest, and takes advantage of CASSANDRA-16205. DTest run times are on par (when run with and without CASSANDRA-13701).

A spin-off ticket will be created for some additional dtest performance improvements that have been identified by making use of `ring_delay_ms`, but it will not block 13701.

Speak up if you spot anything you think we may have missed.
Re: Committing `CASSANDRA-13701 Lower default num_tokens` and the dtest slowdown…
Thanks Adam.

Now that we have CASSANDRA-16079 created and work is proceeding, can we please commit CASSANDRA-13701? There are many clusters out there that use the default of 256 num_tokens and migrating to something more sane is a lot of work. It would also be very helpful to have reasonable defaults tested for the release. It's been a bad default for a long, long time and we need to get this in before we do more testing for the release.

> On Aug 28, 2020, at 5:25 AM, Adam Holmberg wrote:
>
> After discussing with a few stakeholders it sounds like folks agree that optimizing dtest speed is a worthy endeavor. What is less clear are concrete things that should be done. Since a brainstorming session failed to materialize on CASSANDRA-13701, we thought it would make sense to start with an open analysis.
>
> A ticket[1] has been created for analysis and further follow-up. We welcome any concrete ideas people already have in mind.
>
> Kind regards,
> Adam Holmberg
>
> [1] https://issues.apache.org/jira/browse/CASSANDRA-16079
>
> On Mon, Aug 24, 2020 at 12:17 PM Mick Semb Wever wrote:
>
> > > I believe the speed of our CI runs is of common interest. What do you think? Does this sound feasible? Who is in?
> >
> > I agree. I'm in. I have some ideas to add to the mix. Thanks Ekaterina.
>
> --
> Adam Holmberg
> e. adam.holmb...@datastax.com
> w. www.datastax.com
Re: Committing `CASSANDRA-13701 Lower default num_tokens` and the dtest slowdown…
After discussing with a few stakeholders it sounds like folks agree that optimizing dtest speed is a worthy endeavor. What is less clear are concrete things that should be done. Since a brainstorming session failed to materialize on CASSANDRA-13701, we thought it would make sense to start with an open analysis.

A ticket[1] has been created for analysis and further follow-up. We welcome any concrete ideas people already have in mind.

Kind regards,
Adam Holmberg

[1] https://issues.apache.org/jira/browse/CASSANDRA-16079

On Mon, Aug 24, 2020 at 12:17 PM Mick Semb Wever wrote:

> > I believe the speed of our CI runs is of common interest. What do you think? Does this sound feasible? Who is in?
>
> I agree. I'm in. I have some ideas to add to the mix. Thanks Ekaterina.

--
Adam Holmberg
e. adam.holmb...@datastax.com
w. www.datastax.com
Re: Committing `CASSANDRA-13701 Lower default num_tokens` and the dtest slowdown…
> I believe the speed of our CI runs is of common interest. What do you think? Does this sound feasible? Who is in?

I agree. I'm in. I have some ideas to add to the mix. Thanks Ekaterina.
Re: Committing `CASSANDRA-13701 Lower default num_tokens` and the dtest slowdown…
Hello everyone,

I was thinking about the latest discussions on the topic. During our last Cassandra Contributor meeting we agreed it would be good to include discussions and brainstorming on different topics in our agenda. So I would like to suggest organizing a brainstorming session on CASSANDRA-13701 among everyone who is interested in the topic. I believe the speed of our CI runs is of common interest. What do you think? Does this sound feasible? Who is in?

Best regards,
Ekaterina

On Sat, 22 Aug 2020 at 9:21, Joshua McKenzie wrote:

> I think what Jordan is exploring, and I agree on, is that we need clear next steps to help reduce the 75% ish increase in dtest runtime. For sponsored contributors using circle to run the entire suites, throwing more money at the problem through parallelization isn't a long-term solution.
>
> On Sat, Aug 22, 2020 at 8:40 AM Jeremy Hanna wrote:
>
> > I know the dtests take a long time and this will make them longer. As a counterpoint most people run individual dtests locally and the full set on dedicated test infrastructure. For the dedicated test infrastructure Mick also improved the wall clock runtime when parallelizing the dtests on https://issues.apache.org/jira/browse/CASSANDRA-16006.
> >
> > Even with the longer dtest full runtime, I firmly believe that for the sake of new users and how hard it is to change num_tokens once data is written, this change to the default of num_tokens is long overdue. Another hidden benefit of this change is that the dtests will now run bootstraps the way operators should run them in practice with the new defaults. That will make the more common default case much more tested and hopefully catch regressions in that execution path faster.
> >
> > So while it is not a trivial change in full dtest runtime, the benefits to the community and project are also not trivial. I’m really grateful to all who have put in effort to make this a reality and know that new users in 4.0 will benefit from these improved defaults.
> >
> > In other words my non-binding vote is to merge this and look to improve execution time separately with that effort not being as urgent for the reasons stated above.
> >
> > Jeremy
> >
> > > On Aug 20, 2020, at 2:49 AM, Mick Semb Wever wrote:
> > >
> > > It was agreed¹ that 4.0 should have the new configuration defaults of
> > >     num_tokens: 16
> > >     allocate_tokens_for_local_replication_factor: 3
> > >
> > > 13701's patches: against cassandra, cassandra-builds, cassandra-dtest, ccm; are reviewed, tested, and ready to commit. But the ccm and dtest patches required ccm having to now start nodes sequentially, and adding some longer timeout values in the dtests.
> > >
> > > The consequence of this is CI runs now take longer. ci-cassandra.a.o's dtests take ~30% longer, and circleci's dtests (with vnodes) have gone from ~22 to ~43 minutes. The general opinion (on slack²) is to commit, and work on improving ccm and dtest startup times in a subsequent ticket.
> > >
> > > 13701 was intended to be committed before the first beta release because of its user-facing changes. But these numbers are significant enough it makes sense to touch base with dev@
> > >
> > > Does anyone (strongly) object to the "commit + follow up ticket" approach?
> > >
> > > regards,
> > > Mick
> > >
> > > ¹ – https://lists.apache.org/thread.html/ra829084fcf344e9e96fa5c61cb31e909c8629091993471594b65ea89%40%3Cdev.cassandra.apache.org%3E
> > > ² – https://the-asf.slack.com/archives/CK23JSY2K/p1597747395032600 and https://the-asf.slack.com/archives/CK23JSY2K/p1597849774078200?thread_ts=1597762085.048300&cid=CK23JSY2K
Re: Committing `CASSANDRA-13701 Lower default num_tokens` and the dtest slowdown…
I think what Jordan is exploring, and I agree on, is that we need clear next steps to help reduce the 75% ish increase in dtest runtime. For sponsored contributors using circle to run the entire suites, throwing more money at the problem through parallelization isn't a long-term solution.

On Sat, Aug 22, 2020 at 8:40 AM Jeremy Hanna wrote:

> I know the dtests take a long time and this will make them longer. As a counterpoint most people run individual dtests locally and the full set on dedicated test infrastructure. For the dedicated test infrastructure Mick also improved the wall clock runtime when parallelizing the dtests on https://issues.apache.org/jira/browse/CASSANDRA-16006.
>
> Even with the longer dtest full runtime, I firmly believe that for the sake of new users and how hard it is to change num_tokens once data is written, this change to the default of num_tokens is long overdue. Another hidden benefit of this change is that the dtests will now run bootstraps the way operators should run them in practice with the new defaults. That will make the more common default case much more tested and hopefully catch regressions in that execution path faster.
>
> So while it is not a trivial change in full dtest runtime, the benefits to the community and project are also not trivial. I’m really grateful to all who have put in effort to make this a reality and know that new users in 4.0 will benefit from these improved defaults.
>
> In other words my non-binding vote is to merge this and look to improve execution time separately with that effort not being as urgent for the reasons stated above.
>
> Jeremy
>
> > On Aug 20, 2020, at 2:49 AM, Mick Semb Wever wrote:
> >
> > It was agreed¹ that 4.0 should have the new configuration defaults of
> >     num_tokens: 16
> >     allocate_tokens_for_local_replication_factor: 3
> >
> > 13701's patches: against cassandra, cassandra-builds, cassandra-dtest, ccm; are reviewed, tested, and ready to commit. But the ccm and dtest patches required ccm having to now start nodes sequentially, and adding some longer timeout values in the dtests.
> >
> > The consequence of this is CI runs now take longer. ci-cassandra.a.o's dtests take ~30% longer, and circleci's dtests (with vnodes) have gone from ~22 to ~43 minutes. The general opinion (on slack²) is to commit, and work on improving ccm and dtest startup times in a subsequent ticket.
> >
> > 13701 was intended to be committed before the first beta release because of its user-facing changes. But these numbers are significant enough it makes sense to touch base with dev@
> >
> > Does anyone (strongly) object to the "commit + follow up ticket" approach?
> >
> > regards,
> > Mick
> >
> > ¹ – https://lists.apache.org/thread.html/ra829084fcf344e9e96fa5c61cb31e909c8629091993471594b65ea89%40%3Cdev.cassandra.apache.org%3E
> > ² – https://the-asf.slack.com/archives/CK23JSY2K/p1597747395032600 and https://the-asf.slack.com/archives/CK23JSY2K/p1597849774078200?thread_ts=1597762085.048300&cid=CK23JSY2K
Re: Committing `CASSANDRA-13701 Lower default num_tokens` and the dtest slowdown…
I know the dtests take a long time and this will make them longer. As a counterpoint most people run individual dtests locally and the full set on dedicated test infrastructure. For the dedicated test infrastructure Mick also improved the wall clock runtime when parallelizing the dtests on https://issues.apache.org/jira/browse/CASSANDRA-16006.

Even with the longer dtest full runtime, I firmly believe that for the sake of new users and how hard it is to change num_tokens once data is written, this change to the default of num_tokens is long overdue. Another hidden benefit of this change is that the dtests will now run bootstraps the way operators should run them in practice with the new defaults. That will make the more common default case much more tested and hopefully catch regressions in that execution path faster.

So while it is not a trivial change in full dtest runtime, the benefits to the community and project are also not trivial. I’m really grateful to all who have put in effort to make this a reality and know that new users in 4.0 will benefit from these improved defaults.

In other words my non-binding vote is to merge this and look to improve execution time separately with that effort not being as urgent for the reasons stated above.

Jeremy

> On Aug 20, 2020, at 2:49 AM, Mick Semb Wever wrote:
>
> It was agreed¹ that 4.0 should have the new configuration defaults of
>     num_tokens: 16
>     allocate_tokens_for_local_replication_factor: 3
>
> 13701's patches: against cassandra, cassandra-builds, cassandra-dtest, ccm; are reviewed, tested, and ready to commit. But the ccm and dtest patches required ccm having to now start nodes sequentially, and adding some longer timeout values in the dtests.
>
> The consequence of this is CI runs now take longer. ci-cassandra.a.o's dtests take ~30% longer, and circleci's dtests (with vnodes) have gone from ~22 to ~43 minutes. The general opinion (on slack²) is to commit, and work on improving ccm and dtest startup times in a subsequent ticket.
>
> 13701 was intended to be committed before the first beta release because of its user-facing changes. But these numbers are significant enough it makes sense to touch base with dev@
>
> Does anyone (strongly) object to the "commit + follow up ticket" approach?
>
> regards,
> Mick
>
> ¹ – https://lists.apache.org/thread.html/ra829084fcf344e9e96fa5c61cb31e909c8629091993471594b65ea89%40%3Cdev.cassandra.apache.org%3E
> ² – https://the-asf.slack.com/archives/CK23JSY2K/p1597747395032600 and https://the-asf.slack.com/archives/CK23JSY2K/p1597849774078200?thread_ts=1597762085.048300&cid=CK23JSY2K
Re: Committing `CASSANDRA-13701 Lower default num_tokens` and the dtest slowdown…
What sort of commitment is there to the follow-up tickets? Are the follow-ups "make this faster" or are there specific tasks we know will help? I'm concerned by the increase in testing run times on circle but don't think that should prevent a good/decided upon default from merging.

Jordan

On Wed, Aug 19, 2020 at 9:49 AM Mick Semb Wever wrote:

> It was agreed¹ that 4.0 should have the new configuration defaults of
>     num_tokens: 16
>     allocate_tokens_for_local_replication_factor: 3
>
> 13701's patches: against cassandra, cassandra-builds, cassandra-dtest, ccm; are reviewed, tested, and ready to commit. But the ccm and dtest patches required ccm having to now start nodes sequentially, and adding some longer timeout values in the dtests.
>
> The consequence of this is CI runs now take longer. ci-cassandra.a.o's dtests take ~30% longer, and circleci's dtests (with vnodes) have gone from ~22 to ~43 minutes. The general opinion (on slack²) is to commit, and work on improving ccm and dtest startup times in a subsequent ticket.
>
> 13701 was intended to be committed before the first beta release because of its user-facing changes. But these numbers are significant enough it makes sense to touch base with dev@
>
> Does anyone (strongly) object to the "commit + follow up ticket" approach?
>
> regards,
> Mick
>
> ¹ – https://lists.apache.org/thread.html/ra829084fcf344e9e96fa5c61cb31e909c8629091993471594b65ea89%40%3Cdev.cassandra.apache.org%3E
> ² – https://the-asf.slack.com/archives/CK23JSY2K/p1597747395032600 and https://the-asf.slack.com/archives/CK23JSY2K/p1597849774078200?thread_ts=1597762085.048300&cid=CK23JSY2K
Re: Committing `CASSANDRA-13701 Lower default num_tokens` and the dtest slowdown…
Hi Mick,

No objections from me. It will be good to get this change into the 4.0 release.

Whilst the slow down of the dtests is annoying, I am happy to see this change committed. As long as there are no regressions in the number of tests that pass then it should be fine. The proposal to raise a follow up ticket to accompany the commit is a good idea.

Regards,
Anthony

On Thu, 20 Aug 2020 at 02:49, Mick Semb Wever wrote:

> It was agreed¹ that 4.0 should have the new configuration defaults of
>     num_tokens: 16
>     allocate_tokens_for_local_replication_factor: 3
>
> 13701's patches: against cassandra, cassandra-builds, cassandra-dtest, ccm; are reviewed, tested, and ready to commit. But the ccm and dtest patches required ccm having to now start nodes sequentially, and adding some longer timeout values in the dtests.
>
> The consequence of this is CI runs now take longer. ci-cassandra.a.o's dtests take ~30% longer, and circleci's dtests (with vnodes) have gone from ~22 to ~43 minutes. The general opinion (on slack²) is to commit, and work on improving ccm and dtest startup times in a subsequent ticket.
>
> 13701 was intended to be committed before the first beta release because of its user-facing changes. But these numbers are significant enough it makes sense to touch base with dev@
>
> Does anyone (strongly) object to the "commit + follow up ticket" approach?
>
> regards,
> Mick
>
> ¹ – https://lists.apache.org/thread.html/ra829084fcf344e9e96fa5c61cb31e909c8629091993471594b65ea89%40%3Cdev.cassandra.apache.org%3E
> ² – https://the-asf.slack.com/archives/CK23JSY2K/p1597747395032600 and https://the-asf.slack.com/archives/CK23JSY2K/p1597849774078200?thread_ts=1597762085.048300&cid=CK23JSY2K
Committing `CASSANDRA-13701 Lower default num_tokens` and the dtest slowdown…
It was agreed¹ that 4.0 should have the new configuration defaults of

    num_tokens: 16
    allocate_tokens_for_local_replication_factor: 3

13701's patches: against cassandra, cassandra-builds, cassandra-dtest, ccm; are reviewed, tested, and ready to commit. But the ccm and dtest patches required ccm having to now start nodes sequentially, and adding some longer timeout values in the dtests.

The consequence of this is CI runs now take longer. ci-cassandra.a.o's dtests take ~30% longer, and circleci's dtests (with vnodes) have gone from ~22 to ~43 minutes. The general opinion (on slack²) is to commit, and work on improving ccm and dtest startup times in a subsequent ticket.

13701 was intended to be committed before the first beta release because of its user-facing changes. But these numbers are significant enough it makes sense to touch base with dev@

Does anyone (strongly) object to the "commit + follow up ticket" approach?

regards,
Mick

¹ – https://lists.apache.org/thread.html/ra829084fcf344e9e96fa5c61cb31e909c8629091993471594b65ea89%40%3Cdev.cassandra.apache.org%3E
² – https://the-asf.slack.com/archives/CK23JSY2K/p1597747395032600 and https://the-asf.slack.com/archives/CK23JSY2K/p1597849774078200?thread_ts=1597762085.048300&cid=CK23JSY2K
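
For readers comparing the two slowdown figures in the message above, here is a minimal sketch of the arithmetic. It is illustrative only: the function name `percent_increase` and the variable names are hypothetical, and the inputs are the approximate CircleCI numbers quoted in the message (~22 and ~43 minutes), so the result is only as precise as those estimates.

```python
def percent_increase(before: float, after: float) -> float:
    """Return the relative increase from `before` to `after`, in percent."""
    if before <= 0:
        raise ValueError("before must be positive")
    return (after - before) / before * 100.0


# Approximate CircleCI dtest (with vnodes) wall-clock times from the message.
circleci_before_min = 22.0
circleci_after_min = 43.0

circleci_increase = percent_increase(circleci_before_min, circleci_after_min)
print(f"CircleCI dtests (with vnodes): ~{circleci_increase:.0f}% longer")
```

Under these inputs the CircleCI run time roughly doubles, which is a noticeably larger relative slowdown than the ~30% reported for ci-cassandra.a.o.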