date:20200128

Re: [Discuss] num_tokens default in Cassandra 4.0

2020-01-28 Thread Dinesh Joshi

Thanks for restarting this discussion, Jeremy. I personally think 4 is a good
number as a default. I think whatever we pick, we should have enough
documentation for operators to make sense of the new defaults in 4.0.

Dinesh

> On Jan 28, 2020, at 9:25 PM, Jeremy Hanna wrote:
> 
> I wanted to start a discussion about the default for num_tokens that we'd
> like for people starting in Cassandra 4.0. This is for ticket
> CASSANDRA-13701 (which has been duplicated a number of times, most
> recently by me).
> 
> TL;DR: based on availability, skew, and operational concerns, and on the
> fact that the new allocation algorithm can now be configured fairly simply,
> this is a proposal to go with 4 as the new num_tokens default, with
> allocate_tokens_for_local_replication_factor set to 3. That gives people a
> good out-of-the-box experience and is the most conservative option. It does
> assume that racks and DCs have been configured correctly. We would, of
> course, go into some detail in NEWS.txt.
> 
> Joey Lynch and Josh Snyder did an extensive analysis of availability concerns
> with high num_tokens/virtual nodes in their paper. The availability risk
> worsens as clusters grow larger. I won't quote the paper here, but given the
> goal of a conservative default, and with the accompanying new allocation
> algorithm, I think 4 makes sense as a default.
> 
> The difficulty has always been that virtual nodes are beneficial for
> operations, but 256 is too high for the purposes of repair and, as Joey and
> Josh cover, for availability. Going lower with the original allocation
> algorithm has produced skew in allocation because of its naive, random token
> distribution. Enter CASSANDRA-7032 and the new token allocation algorithm.
> CASSANDRA-15260 makes the new algorithm operationally simpler.
> 
> One other item of note: since Joey and Josh's analysis, there have been
> improvements in streaming, among other things, that can reduce the
> probability of more than one replica of a given token range being
> unavailable at the same time, but it would still be good to be conservative.
> 
> Please chime in with any concerns about num_tokens=4 and
> allocate_tokens_for_local_replication_factor=3 or the accompanying rationale,
> so we can improve the experience for all users.
> 
> Other resources:
> https://thelastpickle.com/blog/2019/02/21/set-up-a-cluster-with-even-token-distribution.html
> https://docs.datastax.com/en/dse/6.7/dse-admin/datastax_enterprise/config/configVnodes.html
> https://www.datastax.com/blog/2016/01/new-token-allocation-algorithm-cassandra-30
> 


-
To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
For additional commands, e-mail: dev-h...@cassandra.apache.org

[Discuss] num_tokens default in Cassandra 4.0

2020-01-28 Thread Jeremy Hanna

I wanted to start a discussion about the default for num_tokens that we'd like
for people starting in Cassandra 4.0. This is for ticket CASSANDRA-13701
(which has been duplicated a number of times, most recently by me).

TL;DR: based on availability, skew, and operational concerns, and on the fact
that the new allocation algorithm can now be configured fairly simply, this is
a proposal to go with 4 as the new num_tokens default, with
allocate_tokens_for_local_replication_factor set to 3. That gives people a good
out-of-the-box experience and is the most conservative option. It does assume
that racks and DCs have been configured correctly. We would, of course, go into
some detail in NEWS.txt.

Joey Lynch and Josh Snyder did an extensive analysis of availability concerns
with high num_tokens/virtual nodes in their paper. The availability risk worsens
as clusters grow larger. I won't quote the paper here, but given the goal of a
conservative default, and with the accompanying new allocation algorithm, I
think 4 makes sense as a default.
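To make the availability argument concrete, here is a rough sketch (not taken from Joey and Josh's paper) that estimates what fraction of two-node outages leave at least one token range without QUORUM under the original random token allocation. It assumes a single DC, RF=3, no racks, and a SimpleStrategy-like clockwise ring walk; the function names are hypothetical, and the new CASSANDRA-7032 allocator is not modeled.

```python
import itertools
import random

RF = 3  # assumed replication factor; QUORUM is then 2 of 3 replicas


def random_ring(num_nodes, num_tokens, rng):
    """Original-style allocation: every node picks num_tokens random tokens
    in [0, 1). Returns the ring as a sorted list of (token, node) pairs."""
    return sorted(
        (rng.random(), node) for node in range(num_nodes) for _ in range(num_tokens)
    )


def replica_sets(ring):
    """For each token range, walk clockwise from its owning token and take the
    first RF distinct nodes (SimpleStrategy-like: one DC, no racks)."""
    n = len(ring)
    sets = []
    for i in range(n):
        replicas = []
        j = i
        while len(replicas) < RF:
            node = ring[j % n][1]
            if node not in replicas:
                replicas.append(node)
            j += 1
        sets.append(replicas)
    return sets


def fraction_of_fatal_pairs(num_nodes, num_tokens, seed=0):
    """Fraction of node pairs that share at least one token range. Losing both
    nodes of such a pair leaves that range with 1 of 3 replicas, below QUORUM."""
    if num_nodes < RF:
        raise ValueError("need at least RF nodes")
    rng = random.Random(seed)
    ring = random_ring(num_nodes, num_tokens, rng)
    shared = set()
    for replicas in replica_sets(ring):
        shared.update(itertools.combinations(sorted(replicas), 2))
    total_pairs = num_nodes * (num_nodes - 1) // 2
    return len(shared) / total_pairs


if __name__ == "__main__":
    for tokens in (4, 16, 256):
        print(tokens, round(fraction_of_fatal_pairs(30, tokens), 3))
```

Comparing fraction_of_fatal_pairs(30, 4) with fraction_of_fatal_pairs(30, 256) should show the higher token count making almost every node pair "fatal" for some range, which is the intuition behind the availability concern; exact numbers depend on the seed and cluster size.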

The difficulty has always been that virtual nodes are beneficial for
operations, but 256 is too high for the purposes of repair and, as Joey and
Josh cover, for availability. Going lower with the original allocation
algorithm has produced skew in allocation because of its naive, random token
distribution. Enter CASSANDRA-7032 and the new token allocation algorithm.
CASSANDRA-15260 makes the new algorithm operationally simpler.
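The skew point can be illustrated with an equally rough sketch: under the original random allocation, each node's share of the ring is a sum of random gaps, so with few tokens per node some nodes end up owning noticeably more than their fair share. The function names are hypothetical, only primary ownership is counted (no replication, racks, or DCs), and the new allocation algorithm is not modeled.

```python
import random


def ownership(num_nodes, num_tokens, seed=0):
    """Share of the ring each node owns as primary replica when every node
    picks num_tokens random tokens in [0, 1). A token owns the range
    (previous token, token], wrapping around the ring."""
    rng = random.Random(seed)
    ring = sorted(
        (rng.random(), node) for node in range(num_nodes) for _ in range(num_tokens)
    )
    owned = [0.0] * num_nodes
    for i, (token, node) in enumerate(ring):
        prev_token = ring[i - 1][0]  # for i == 0 this is the last token (wrap)
        gap = (token - prev_token) % 1.0
        # A lone token on the ring owns the whole ring.
        owned[node] += gap if len(ring) > 1 else 1.0
    return owned


def skew(num_nodes, num_tokens, seed=0):
    """Most-loaded node's share divided by the ideal share of 1/num_nodes."""
    return max(ownership(num_nodes, num_tokens, seed)) * num_nodes


if __name__ == "__main__":
    for tokens in (1, 4, 16, 256):
        print(tokens, round(skew(12, tokens), 2))
```

For a dozen nodes, the worst node's share should shrink toward the ideal as num_tokens grows. That is the trade-off the new allocation algorithm is meant to get around: low token counts without the skew of random allocation.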

One other item of note: since Joey and Josh's analysis, there have been
improvements in streaming, among other things, that can reduce the probability
of more than one replica of a given token range being unavailable at the same
time, but it would still be good to be conservative.

Please chime in with any concerns about num_tokens=4 and
allocate_tokens_for_local_replication_factor=3 or the accompanying rationale,
so we can improve the experience for all users.

Other resources:
https://thelastpickle.com/blog/2019/02/21/set-up-a-cluster-with-even-token-distribution.html
https://docs.datastax.com/en/dse/6.7/dse-admin/datastax_enterprise/config/configVnodes.html
https://www.datastax.com/blog/2016/01/new-token-allocation-algorithm-cassandra-30

Re: Next round of releases…

2020-01-28 Thread Mick Semb Wever




> I think there is also an issue with 3.0.19+ that prevents Cassandra from
> starting on Windows at all, so ideally a fix in that area would be
> included as well.


The ticket is CASSANDRA-15426, and we'll wait until a fix is in.


RE: Next round of releases…

2020-01-28 Thread Steinmaurer, Thomas

I think there is also an issue with 3.0.19+ that prevents Cassandra from
starting on Windows at all, so ideally a fix in that area would be included
as well.

Regards,
Thomas

-----Original Message-----
From: Mick Semb Wever 
Sent: Tuesday, 28 January 2020 07:40
To: dev@cassandra.apache.org
Subject: Next round of releases…


Yesterday on #cassandra-dev, Jeff brought up the need for a round of releases
because of CASSANDRA-15400.

Does anyone object, or know of anything in the works that needs to go into the
next releases of 2.2, 3.0, 3.11, or trunk?

regards,
Mick





Re: Next round of releases…

2020-01-28 Thread Mick Semb Wever



> Yesterday on #cassandra-dev, Jeff brought up the need for a round of
> releases because of CASSANDRA-15400.
>
> Does anyone object, or know of anything in the works that needs to go
> into the next releases of 2.2, 3.0, 3.11, or trunk?


I'll start the release process over the next few hours. Speak up if there's 
anything. 

