Re: [Discuss] num_tokens default in Cassandra 4.0

2020-02-04 Thread Carl Mueller
Your case seems to argue for completely eliminating vnodes, which the Priam
people have been preaching for a long time.

There is not, certainly for a Cassandra user-level person, good
documentation on the pros and cons of vnodes vs single tokens, and as we
see here, the impact of various vnode counts isn't an obvious/trivial
concept in a database system already loaded with nontrivial concepts.

I've never seen a good, honest discussion of why vnodes made their way into
Cassandra as the default, and 256 as the token count. They immediately broke
secondary indexes, and as Max says, they muddle the data distribution for
resiliency and for scaling more than one node at once.

The only advantages seem to be "no manual token management" (which to me was
just laziness at the tooling/nodetool level) and better streaming impacts
on node standup, although you seem completely limited to scaling one node
at a time, which is a huge restriction.

Also, there is no ability to change the vnode count, which seems really
strange given that vnodes are supposed to be able to address heterogeneous
node hardware and subsegment the data to a finer grain. I get that changing
the vnode count would be a "hard problem", but the current solution is
basically "spin up a new datacenter with a new vnode count", which is a
sucky solution.

RE: the racks point by Jeremiah: we do have rack alignment, and I
understand theoretically that I should be able to do things with
rack-aligned quorum safety (and did, in some extreme instances of an LCS -->
STCS --> LCS local recompaction to force purges of tombstones that were in
inaccessible sections of the LCS tree). But between the current warnings
about scaling simultaneously, the lack of discussion on how we can use racks
to do so in the case of vnodes, and all the tickets about problems with
multiple node scaling, we're kind of stuck.

I get that the ideal case for Cassandra is gradually growing data with
balanced load growth.

But for more chaotic loads, with things like IoT fleets coming online at
once and misbehaving networks of IoT devices, it would be really nice to
increase our load scaling abilities. We are kind of stuck with vertical
node scaling, which has rapidly diminishing returns, and with spinning up
nodes one at a time.

Vnode count seems to impact all of this in various, and often opaque, ways.

Anyway, I'm fine with 16 and agree that token selection should be improved,
but I think the ability to change vnode counts online should be explored as
a priority, even if it involves slowly picking off 1 vnode at a time from
one machine. Vnode evolution would be very rare, rarer than version
upgrades.

On Mon, Feb 3, 2020 at 11:07 PM Max C. wrote:

> Let’s say you have a 6 node cluster, with RF=3, and no vnodes. [...]

Re: [Discuss] num_tokens default in Cassandra 4.0

2020-02-04 Thread Jeremiah D Jordan
Just FYI: if you want to be able to operationally do things to many nodes at a
time, you should look at setting up racks.  With num racks = RF you can take
down all nodes in a given rack at once without affecting LOCAL_QUORUM.  Your
single token example has the same functionality in this respect as a vnodes
cluster using racks (and in fact, if you set up a single token cluster using
racks, you would have put nodes N1 and N4 in the same rack).
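
To make that concrete, here's a minimal sketch (my own illustration, not
Cassandra's placement code; the node names, rack layout, and ring walk are
assumptions) of why a rack-aligned layout lets you drop a whole rack and
keep quorum:

    # Illustrative only: 6 single-token nodes, RF=3, racks assigned round-robin
    # so that walking the ring always yields one replica per rack.
    RF = 3
    nodes = ["N1", "N2", "N3", "N4", "N5", "N6"]
    rack = {"N1": "A", "N2": "B", "N3": "C", "N4": "A", "N5": "B", "N6": "C"}

    def replicas(i):
        # Walk the ring from node i, taking the first RF nodes in distinct
        # racks (roughly what rack-aware placement aims for).
        chosen, seen = [], set()
        for j in range(len(nodes)):
            n = nodes[(i + j) % len(nodes)]
            if rack[n] not in seen:
                chosen.append(n)
                seen.add(rack[n])
            if len(chosen) == RF:
                break
        return chosen

    down = {n for n in nodes if rack[n] == "A"}  # kill every node in rack A
    for i in range(len(nodes)):
        alive = [n for n in replicas(i) if n not in down]
        assert len(alive) >= RF // 2 + 1  # every range keeps 2 of 3 replicas
    print("rack A down; every range still satisfies LOCAL_QUORUM")

Note that N1 and N4 land in the same rack here, which is exactly the pairing
in Max's single-token example.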

> On Feb 3, 2020, at 11:07 PM, Max C. wrote:
>
> Let’s say you have a 6 node cluster, with RF=3, and no vnodes. [...]

Fwd: Re: [Discuss] num_tokens default in Cassandra 4.0

2020-02-03 Thread onmstester onmstester
Thank you so much

Sent using https://www.zoho.com/mail/

 Forwarded message 
From: Max C. 
To: 
Date: Tue, 04 Feb 2020 08:37:21 +0330
Subject: Re: [Discuss] num_tokens default in Cassandra 4.0
 Forwarded message 

Let’s say you have a 6 node cluster, with RF=3, and no vnodes. [...]

Re: [Discuss] num_tokens default in Cassandra 4.0

2020-02-03 Thread Max C.
Let’s say you have a 6 node cluster, with RF=3, and no vnodes.  In that case 
each piece of data is stored as follows:

Node: Replicas
N1: N2 N3
N2: N3 N4
N3: N4 N5
N4: N5 N6
N5: N6 N1
N6: N1 N2

With this setup, there are some circumstances where you could lose 2 nodes (ex: 
N1 & N4) and still be able to maintain CL=quorum.  If your cluster is very 
large, then you could lose even more — and that’s a good thing, because if you 
have hundreds/thousands of nodes then you don’t want the world to come tumbling 
down if > 1 node is down.  Or maybe you want to upgrade the OS on your nodes, 
and want to (with very careful planning!) do it by taking down more than 1 node 
at a time.

… but if you have a large number of vnodes, then a given node will share a 
small segment of data with LOTS of other nodes, which destroys this property.  
The more vnodes, the less likely you’re able to handle > 1 node down.

For example, see this diagram in the Datastax docs —

https://docs.datastax.com/en/dse/5.1/dse-arch/datastax_enterprise/dbArch/archDataDistributeVnodesUsing.html#Distributingdatausingvnodes

In that bottom picture, you can’t knock out 2 nodes and still maintain 
CL=quorum.  Ex:  If you knock out node 1 & 4, then ranges B & L would no longer 
meet CL=quorum;  but you can do that in the top diagram, since there are no 
ranges shared between node 1 & 4.
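
If it helps, here's a small Python sketch (my own illustration, using the
simple "node plus next two on the ring" placement described above) that
enumerates which pairs of nodes this 6-node layout can lose while every
range keeps quorum:

    # Illustrative only: 6 nodes, RF=3, single tokens; replicas of range i
    # are node i plus the next two nodes clockwise.
    from itertools import combinations

    N, RF = 6, 3
    quorum = RF // 2 + 1

    def replicas(i):
        return {(i + k) % N for k in range(RF)}

    survivable = []
    for a, b in combinations(range(N), 2):
        down = {a, b}
        if all(len(replicas(i) - down) >= quorum for i in range(N)):
            survivable.append((a + 1, b + 1))  # 1-based, to match N1..N6

    print(survivable)  # -> [(1, 4), (2, 5), (3, 6)]: only pairs sharing no ranges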

Hope that helps.

- Max


> On Feb 3, 2020, at 8:39 pm, onmstester onmstester wrote:
> 
> Sorry if it's trivial, but I do not understand how num_tokens affects
> availability [...]

Re: Re: [Discuss] num_tokens default in Cassandra 4.0

2020-02-03 Thread Jeff Jirsa
The more vnodes you have on each host, the more likely it becomes that any
2 hosts are adjacent/neighbors/replicas.
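
A rough Monte Carlo sketch of that effect (purely illustrative: random token
placement, a SimpleStrategy-style ring walk, and an arbitrary 24-host
cluster):

    # Estimate how often hosts 0 and 1 replicate at least one common range.
    import random

    def share_data(n_hosts, tokens_per_host, rf=3, trials=100):
        hits = 0
        for _ in range(trials):
            ring = sorted((random.random(), h)
                          for h in range(n_hosts)
                          for _ in range(tokens_per_host))
            owners = [h for _, h in ring]
            shared = False
            for i in range(len(owners)):
                reps, j = [], 0
                while len(reps) < rf:  # owner plus next rf-1 distinct hosts
                    h = owners[(i + j) % len(owners)]
                    if h not in reps:
                        reps.append(h)
                    j += 1
                if 0 in reps and 1 in reps:
                    shared = True
                    break
            hits += shared
        return hits / trials

    for vnodes in (1, 4, 16, 256):
        print(vnodes, share_data(24, vnodes))
    # the probability climbs toward 1.0 as the vnode count grows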


On Mon, Feb 3, 2020 at 8:39 PM onmstester onmstester wrote:

> Sorry if it's trivial, but I do not understand how num_tokens affects
> availability [...]
Fwd: Re: [Discuss] num_tokens default in Cassandra 4.0

2020-02-03 Thread onmstester onmstester
Sorry if it's trivial, but I do not understand how num_tokens affects 
availability. With RF=3 and CLW,CLR=quorum, the cluster could tolerate losing 
at most one node, and all of the tokens assigned to that node would also be 
assigned to two other nodes no matter what num_tokens is, right?

Sent using https://www.zoho.com/mail/

 Forwarded message 
From: Jon Haddad <j...@jonhaddad.com>
To: <d...@cassandra.apache.org>
Date: Tue, 04 Feb 2020 01:15:21 +0330
Subject: Re: [Discuss] num_tokens default in Cassandra 4.0
 Forwarded message 


I think it's a good idea to take a step back and get a high level view of 
the problem we're trying to solve. 
 
First, high token counts result in decreased availability, as each node has 
data overlap with more nodes in the cluster.  Specifically, a node can 
share data with up to (RF-1) * 2 * num_tokens other nodes.  So a 256 token 
cluster at RF=3 is going to almost always share data with every other node in 
the cluster that isn't in the same rack, unless you're doing something wild 
like using more than a thousand nodes in a cluster.  We advertise 
 
With 16 tokens, that is vastly improved, but you still have up to 64 nodes 
that each node needs to query against, so you're again hitting every node 
unless you go above ~96 nodes in the cluster (assuming 3 racks / AZs).  I 
wouldn't use 16 here, and I doubt any of you would either.  I've advocated 
for 4 tokens because you'd have overlap with only 16 nodes, which works 
well for small clusters as well as large.  Assuming I was creating a new 
cluster for myself (in a hypothetical brand new application I'm building) I 
would put this in production.  I have worked with several teams where I 
helped them put 4 token clusters in prod and it has worked very well.  We 
didn't see any wild imbalance issues. 
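
For concreteness, the 16- and 64-node overlap figures above fall straight out
of that (RF-1) * 2 * num_tokens bound (back-of-envelope only):

    # Upper bound on how many other nodes one node shares data with, per the
    # formula above; actual overlap is capped by cluster/rack size.
    RF = 3
    for num_tokens in (4, 16, 256):
        print(num_tokens, "tokens ->", (RF - 1) * 2 * num_tokens, "nodes")
    # 4 -> 16, 16 -> 64, 256 -> 1024 (i.e. effectively the whole cluster)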
 
As Mick's pointed out, our current method of using random token assignment 
for the default number is problematic for 4 tokens.  I fully agree with 
this, and I think if we were to try to use 4 tokens, we'd want to address 
this in tandem.  We can discuss how to better allocate tokens by default 
(something more predictable than random), but I'd like to avoid the 
specifics of that for the sake of this email. 
 
To Alex's point, repairs are problematic with lower token counts due to 
over streaming.  I think this is a pretty serious issue and we'd have to 
address it before going all the way down to 4.  This, in my opinion, is a 
more complex problem to solve and I think trying to fix it here could make 
shipping 4.0 take even longer, something none of us want. 
 
For the sake of shipping 4.0 without adding extra overhead and time, I'm ok 
with moving to 16 tokens, and in the process adding extensive documentation 
outlining what we recommend for production use.  I think we should also try 
to figure out something better than random as the default to fix the data 
imbalance issues.  I've got a few ideas here I've been noodling on. 
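
One hypothetical shape such a default could take (strictly a sketch of my
own, not anything proposed or ticketed here): deterministic, evenly spaced,
interleaved tokens over the Murmur3 range for a fresh cluster:

    # Illustrative only: evenly spaced tokens, interleaved so each node owns
    # every num_nodes-th slot of the Murmur3 token range [-2**63, 2**63).
    def even_tokens(num_nodes, tokens_per_node, node_index):
        total = num_nodes * tokens_per_node
        span = 2**64
        return [-2**63 + (i * num_nodes + node_index) * span // total
                for i in range(tokens_per_node)]

    # e.g. an initial_token-style list for node 0 of a 3-node, 4-token cluster
    print(",".join(str(t) for t in even_tokens(3, 4, 0)))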
 
As long as folks are fine with potentially changing the default again in C* 
5.0 (after another discussion / debate), 16 is enough of an improvement 
that I'm OK with the change, and willing to author the docs to help people 
set up their first cluster.  For folks that go into production with the 
defaults, we're at least not setting them up for total failure once their 
clusters get large like we are now. 
 
In future versions, we'll probably want to address the issue of data 
imbalance by building something in that shifts individual tokens around.  I 
don't think we should try to do this in 4.0 either. 
 
Jon 
 
 
 
On Fri, Jan 31, 2020 at 2:04 PM Jeremy Hanna <jeremy.hanna1...@gmail.com> 
wrote: 
 
> I think Mick and Anthony make some valid operational and skew points for 
> smaller/starting clusters with 4 num_tokens. There’s an arbitrary line 
> between small and large clusters but I think most would agree that most 
> clusters are on the small to medium side. (A small nuance is afaict the 
> probabilities have to do with quorum on a full token range, ie it has to do 
> with the size of a datacenter, not the full cluster.) 
> 
> As I read this discussion I’m personally more inclined to go with 16 for 
> now. It’s true that if we could fix the skew and topology gotchas for those 
> starting things up, 4 would be ideal from an availability perspective. 
> However we’re still in the brainstorming stage for how to address those 
> challenges. I think we should create tickets for those issues and go with 
> 16 for 4.0. 
> 
> This is about an out of the box experience. It balances availability, 
> operations (such as skew and general bootstrap friendliness and 
> streaming/repair), and cluster sizing. Balancing all of those, I think for 
> now I’m more comfortable with 16 as the default with docs on considerations 
> and tickets to unblock 4 as the default for all users