Re: [Discuss] num_tokens default in Cassandra 4.0

2020-01-31 Thread Mick Semb Wever
> TLDR, based on availability concerns, skew concerns, operational
> concerns, and based on the fact that the new allocation algorithm can
> be configured fairly simply now, this is a proposal to go with 4 as the
> new default and the allocate_tokens_for_local_replication_factor set to
> 3.
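
For reference, a minimal sketch of what the quoted proposal would look like in cassandra.yaml. Only the two setting names and their values come from the proposal above; the comments are explanatory assumptions, and nothing here is the shipped 4.0 default, which this thread is still debating.

```yaml
# Sketch of the proposed defaults (not the final 4.0 values).
# Number of vnode tokens each node owns.
num_tokens: 4
# Ask the token allocation algorithm to optimise ownership for this
# replication factor in the local datacenter.
allocate_tokens_for_local_replication_factor: 3
```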

Re: [Discuss] num_tokens default in Cassandra 4.0

2020-01-31 Thread Alexander Dejanovski
While I (mostly) understand the maths behind using 4 vnodes as a default (which really is a question of extreme availability), I don't think they provide noticeable performance improvements over using 16, while 16 vnodes will protect folks from imbalances. It is very hard to deal with unbalanced

Re: [Discuss] num_tokens default in Cassandra 4.0

2020-01-31 Thread Joshua McKenzie
> We should be using the default value that benefits the most people, rather
> than an arbitrary compromise.

I'd caution we're talking about the default value *we believe* will benefit the most people according to our respective understandings of C* usage. Most clusters don't shrink, they

Re: [Discuss] num_tokens default in Cassandra 4.0

2020-01-31 Thread Dimitar Dimitrov
Hey all, At some point not too long ago I spent some time trying to make the token allocation algorithm the default. I didn't foresee it, although it might be obvious for many of you, but one corollary of the way the algorithm works (or more precisely might not work) with multiple seeds or

Re: [Discuss] num_tokens default in Cassandra 4.0

2020-01-31 Thread Carl Mueller
So why even have virtual nodes at all, why not work on improving single token approaches so that we can support cluster doubling, which IMO would enable cassandra to more quickly scale for volatile loads? It's my guess/understanding that vnodes eliminate the token rebalancing that existed back in

Re: [Discuss] num_tokens default in Cassandra 4.0

2020-01-31 Thread Carl Mueller
"large/giant clusters and admins are the target audience for the value we select" There are reasons aside from massive scale to pick cassandra, but the primary reason cassandra is selected technically is to support vertically scaling to large clusters. Why pick a value that once you reach scale

Re: [Discuss] num_tokens default in Cassandra 4.0

2020-01-31 Thread Carl Mueller
edit: 4 is bad at small cluster sizes and could scare off adoption

On Fri, Jan 31, 2020 at 12:15 PM Carl Mueller wrote:
> "large/giant clusters and admins are the target audience for the value we
> select"
>
> There are reasons aside from massive scale to pick cassandra, but the
> primary

Re: [Discuss] num_tokens default in Cassandra 4.0

2020-01-31 Thread Jeff Jirsa
On Fri, Jan 31, 2020 at 11:25 AM Joseph Lynch wrote:
> I think that we might be bikeshedding this number a bit because it is easy
> to debate and there is not yet one right answer.

https://www.youtube.com/watch?v=v465T5u9UKo

Re: [Discuss] num_tokens default in Cassandra 4.0

2020-01-31 Thread Michael Shuler
On 1/31/20 9:58 AM, Dimitar Dimitrov wrote:
one corollary of the way the algorithm works (or more precisely might not work) with multiple seeds or simultaneous multi-node bootstraps or decommissions, is that a lot of dtests start failing due to deterministic token conflicts. I wasn't able to fix

Re: [Discuss] num_tokens default in Cassandra 4.0

2020-01-31 Thread Joseph Lynch
I think that we might be bikeshedding this number a bit because it is easy to debate and there is not yet one right answer. I hope we recognize either choice (4 or 16) is fine in that users can always override us and we can always change our minds later or better yet improve allocation so users

Re: [Discuss] num_tokens default in Cassandra 4.0

2020-01-31 Thread Jeremy Hanna
I think Mick and Anthony make some valid operational and skew points for smaller/starting clusters with 4 num_tokens. There’s an arbitrary line between small and large clusters but I think most would agree that most clusters are on the small to medium side. (A small nuance is afaict the

Re: [VOTE] Release Apache Cassandra 4.0-alpha3

2020-01-31 Thread Yuji Ito
+1 (non-binding)

I've briefly tested the build with Jepsen.
https://github.com/scalar-labs/scalar-jepsen

On Fri, Jan 31, 2020 at 13:37, Anthony Grasso wrote:
> +1 (non-binding)
>
> On Fri, 31 Jan 2020 at 08:48, Joshua McKenzie wrote:
> > +1
> >
> > On Thu, Jan 30, 2020 at 4:31 PM Brandon Williams