Re: Committing `CASSANDRA-13701 Lower default num_tokens` and the dtest slowdown…

2020-08-22 Thread Joshua McKenzie
I think what Jordan is exploring, and what I agree with, is that we need clear
next steps to help reduce the roughly 75% increase in dtest runtime. For
sponsored contributors using circle to run the entire suites, throwing more
money at the problem through parallelization isn't a long-term solution.

On Sat, Aug 22, 2020 at 8:40 AM Jeremy Hanna wrote:

> I know the dtests take a long time and this will make them longer. As a
> counterpoint, most people run individual dtests locally and the full set on
> dedicated test infrastructure. For the dedicated test infrastructure, Mick
> also improved the wall clock runtime when parallelizing the dtests on
> https://issues.apache.org/jira/browse/CASSANDRA-16006.
>
> Even with the longer dtest full runtime, I firmly believe that for the
> sake of new users and how hard it is to change num_tokens once data is
> written, this change to the default of num_tokens is long overdue. Another
> hidden benefit of this change is that the dtests will now run bootstraps
> the way operators should run them in practice with the new defaults. That
> will make the more common default case much more tested and hopefully catch
> regressions in that execution path faster.
>
> So while it is not a trivial change in full dtest runtime, the benefits to
> the community and project are also not trivial. I’m really grateful to all
> who have put in effort to make this a reality and know that new users in
> 4.0 will benefit from these improved defaults.
>
> In other words, my non-binding vote is to merge this and look to improve
> execution time separately, with that effort not being as urgent for the
> reasons stated above.
>
> Jeremy
>
> > On Aug 20, 2020, at 2:49 AM, Mick Semb Wever wrote:
> >
> > It was agreed¹ that 4.0 should have the new configuration defaults of
> >  num_tokens: 16
> >  allocate_tokens_for_local_replication_factor: 3
> >
> > 13701's patches: against cassandra, cassandra-builds, cassandra-dtest, ccm;
> > are reviewed, tested, and ready to commit. But the ccm and dtest patches
> > required ccm having to now start nodes sequentially, and adding some longer
> > timeout values in the dtests.
> >
> > The consequence of this is CI runs now take longer. ci-cassandra.a.o's
> > dtests take ~30% longer, and circleci's dtests (with vnodes) have gone from
> > ~22 to ~43 minutes. The general opinion (on slack²) is to commit, and work
> > on improving ccm and dtest startup times in a subsequent ticket.
> >
> > 13701 was intended to be committed before the first beta release because of
> > its user-facing changes. But these numbers are significant enough it makes
> > sense to touch base with dev@
> >
> > Does anyone (strongly) object to the "commit + follow up ticket" approach?
> >
> > regards,
> > Mick
> >
> >
> > ¹ –
> > https://lists.apache.org/thread.html/ra829084fcf344e9e96fa5c61cb31e909c8629091993471594b65ea89%40%3Cdev.cassandra.apache.org%3E
> > ² – https://the-asf.slack.com/archives/CK23JSY2K/p1597747395032600 and
> > https://the-asf.slack.com/archives/CK23JSY2K/p1597849774078200?thread_ts=1597762085.048300&cid=CK23JSY2K
>


Re: Committing `CASSANDRA-13701 Lower default num_tokens` and the dtest slowdown…

2020-08-22 Thread Jeremy Hanna
I know the dtests take a long time and this will make them longer. As a
counterpoint, most people run individual dtests locally and the full set on
dedicated test infrastructure. For the dedicated test infrastructure, Mick also
improved the wall clock runtime when parallelizing the dtests on
https://issues.apache.org/jira/browse/CASSANDRA-16006.

Even with the longer dtest full runtime, I firmly believe that for the sake of 
new users and how hard it is to change num_tokens once data is written, this 
change to the default of num_tokens is long overdue. Another hidden benefit of 
this change is that the dtests will now run bootstraps the way operators should 
run them in practice with the new defaults. That will make the more common 
default case much more tested and hopefully catch regressions in that execution 
path faster.

So while it is not a trivial change in full dtest runtime, the benefits to the 
community and project are also not trivial. I’m really grateful to all who have 
put in effort to make this a reality and know that new users in 4.0 will 
benefit from these improved defaults.

In other words, my non-binding vote is to merge this and look to improve
execution time separately, with that effort not being as urgent for the reasons
stated above.

Jeremy

> On Aug 20, 2020, at 2:49 AM, Mick Semb Wever wrote:
> 
> It was agreed¹ that 4.0 should have the new configuration defaults of
>  num_tokens: 16
>  allocate_tokens_for_local_replication_factor: 3
> 
> 13701's patches: against cassandra, cassandra-builds, cassandra-dtest, ccm;
> are reviewed, tested, and ready to commit. But the ccm and dtest patches
> required ccm having to now start nodes sequentially, and adding some longer
> timeout values in the dtests.
> 
> The consequence of this is CI runs now take longer. ci-cassandra.a.o's
> dtests take ~30% longer, and circleci's dtests (with vnodes) have gone from
> ~22 to ~43 minutes. The general opinion (on slack²) is to commit, and work
> on improving ccm and dtest startup times in a subsequent ticket.
> 
> 13701 was intended to be committed before the first beta release because of
> its user-facing changes. But these numbers are significant enough it makes
> sense to touch base with dev@
> 
> Does anyone (strongly) object to the "commit + follow up ticket" approach?
> 
> regards,
> Mick
> 
> 
> ¹ –
> https://lists.apache.org/thread.html/ra829084fcf344e9e96fa5c61cb31e909c8629091993471594b65ea89%40%3Cdev.cassandra.apache.org%3E
> ² – https://the-asf.slack.com/archives/CK23JSY2K/p1597747395032600 and
> https://the-asf.slack.com/archives/CK23JSY2K/p1597849774078200?thread_ts=1597762085.048300&cid=CK23JSY2K
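
For reference, the relative slowdown implied by the circleci figures quoted above (~22 to ~43 minutes) can be worked out with a short sketch. The function name `percent_increase` is hypothetical, and the inputs are only the approximate numbers from the thread, so the result is a rough estimate rather than a measured value.

```python
def percent_increase(before: float, after: float) -> float:
    """Return the relative increase from `before` to `after`, in percent."""
    if before <= 0:
        raise ValueError("before must be positive")
    return (after - before) / before * 100.0


# Approximate circleci dtest (with vnodes) runtimes from the thread, in minutes.
circleci_before, circleci_after = 22, 43
increase = percent_increase(circleci_before, circleci_after)
print(f"circleci dtests: ~{increase:.0f}% longer")
```

The same helper can be applied to any other before/after pair of CI runtimes to compare environments on a common scale.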