Re: [Discuss] num_tokens default in Cassandra 4.0

2020-01-29 Thread Jeremy Hanna
The new default wouldn't be applied retroactively to 3.x, but the same principles 
apply.  The new allocation algorithm is in 3.x, as is the simplified 
configuration, so there's no reason not to use the same configuration on 3.x.

> On Jan 30, 2020, at 4:34 AM, Chen-Becker, Derek wrote:
> 
> Does the same guidance apply to 3.x clusters? I read through the JIRA ticket 
> linked below, along with tickets that it links to, but it's not clear that 
> the new allocation algorithm is available in 3.x or if there are other 
> reasons that this would be problematic.
> 
> Thanks,
> 
> Derek


-
To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
For additional commands, e-mail: dev-h...@cassandra.apache.org



Re: [Discuss] num_tokens default in Cassandra 4.0

2020-01-29 Thread Chen-Becker, Derek
Does the same guidance apply to 3.x clusters? I read through the JIRA ticket 
linked below, along with tickets that it links to, but it's not clear that the 
new allocation algorithm is available in 3.x or if there are other reasons that 
this would be problematic.

Thanks,

Derek

On 1/29/20, 9:54 AM, "Jon Haddad"  wrote:

I've put a lot of my previous clients on 4 tokens, and in every case it
resulted in a major improvement.

I wouldn't use any more than 4 except under some pretty unusual
circumstances.

Jon




Re: [Discuss] num_tokens default in Cassandra 4.0

2020-01-29 Thread Jon Haddad
I've put a lot of my previous clients on 4 tokens, and in every case it
resulted in a major improvement.

I wouldn't use any more than 4 except under some pretty unusual
circumstances.

Jon

On Wed, Jan 29, 2020, 11:18 AM Ben Bromhead  wrote:

> +1 to reducing the number of tokens as low as possible for availability
> issues. 4 lgtm


Re: [Discuss] num_tokens default in Cassandra 4.0

2020-01-29 Thread Ben Bromhead
+1 to reducing the number of tokens as low as possible for availability
issues. 4 lgtm
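To make the availability argument concrete, here is a rough Python sketch of a deliberately simplified model; it is not the methodology of the Lynch/Snyder analysis cited in the proposal. It assumes a single DC with one rack, RF=3, tokens drawn at random (as with the original allocator), and SimpleStrategy-style placement on the owning node plus the next distinct nodes clockwise. It counts what share of all possible two-node outages leaves at least one token range below QUORUM. The function names are hypothetical.

```python
import itertools
import random


def replica_groups(num_nodes, num_tokens, rf=3, seed=0):
    """Replica sets (frozensets of node ids) for every token range on the ring.

    Simplified model, an assumption rather than Cassandra's code: tokens are
    drawn uniformly at random (like the original allocator), and each range is
    stored on its owning node plus the next rf - 1 distinct nodes clockwise,
    similar to SimpleStrategy with a single rack.
    """
    if num_nodes < rf:
        raise ValueError("need at least rf nodes")
    rng = random.Random(seed)
    ring = sorted((rng.random(), node)
                  for node in range(num_nodes) for _ in range(num_tokens))
    owners = [node for _, node in ring]
    groups = set()
    for start in range(len(owners)):
        replicas = []
        step = 0
        # Walk clockwise from the owning token, skipping repeats of a node.
        while len(replicas) < rf:
            node = owners[(start + step) % len(owners)]
            if node not in replicas:
                replicas.append(node)
            step += 1
        groups.add(frozenset(replicas))
    return groups


def fraction_of_pairs_breaking_quorum(num_nodes, num_tokens, seed=0):
    """Share of all two-node outages that leave at least one range below QUORUM.

    With RF=3, QUORUM needs 2 live replicas, so a range becomes unavailable at
    QUORUM exactly when both down nodes are among its 3 replicas.
    """
    pairs = set()
    for group in replica_groups(num_nodes, num_tokens, rf=3, seed=seed):
        pairs.update(itertools.combinations(sorted(group), 2))
    return len(pairs) / (num_nodes * (num_nodes - 1) // 2)


if __name__ == "__main__":
    # 12-node, single-DC cluster; compare different tokens-per-node settings.
    for tokens in (1, 4, 16, 256):
        print(tokens, round(fraction_of_pairs_breaking_quorum(12, tokens), 3))
```

In this model, with one token per node only nearby nodes on the ring share ranges, so most two-node outages are harmless; as tokens per node grow, nearly every pair of nodes ends up sharing some range, so almost any two simultaneous failures can make a range unavailable at QUORUM.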

On Wed, Jan 29, 2020 at 1:14 AM Dinesh Joshi  wrote:

> Thanks for restarting this discussion Jeremy. I personally think 4 is a
> good number as a default. I think whatever we pick, we should have enough
> documentation for operators to make sense of the new defaults in 4.0.
>
> Dinesh
>
> > On Jan 28, 2020, at 9:25 PM, Jeremy Hanna wrote:
> >
> > I wanted to start a discussion about the default for num_tokens that
> we'd like for people starting in Cassandra 4.0.  This is for ticket
> CASSANDRA-13701 
> (which has been duplicated a number of times, most recently by me).
> >
> > TLDR, based on availability concerns, skew concerns, operational
> concerns, and based on the fact that the new allocation algorithm can be
> configured fairly simply now, this is a proposal to go with 4 as the new
> default and the allocate_tokens_for_local_replication_factor set to 3.
> That gives a good experience out of the box for people and is the most
> conservative.  It does assume that racks and DCs have been configured
> correctly.  We would, of course, go into some detail in the NEWS.txt.
> >
> > Joey Lynch and Josh Snyder did an extensive analysis of availability
> concerns with high num_tokens/virtual nodes in their paper <
> http://mail-archives.apache.org/mod_mbox/cassandra-dev/201804.mbox/%3CCALShVHcz5PixXFO_4bZZZNnKcrpph-=5QmCyb0M=w-mhdyl...@mail.gmail.com%3E>.
> This worsens as clusters grow larger.  I won't quote the paper here, but
> given the need for a conservative default and the accompanying new
> allocation algorithm, I think 4 makes sense as a default.
> >
> > The difficulties have always been that virtual nodes have been
> beneficial for operations but that 256 is too high for the purposes of
> repair and, as Joey and Josh cover, for availability.  Going lower with the
> original allocation algorithm has produced skewed ownership because of its
> naive (random) token distribution.  Enter CASSANDRA-7032 <
> https://issues.apache.org/jira/browse/CASSANDRA-7032> and the new token
> allocation algorithm.  CASSANDRA-15260 <
> https://issues.apache.org/jira/browse/CASSANDRA-15260> makes the new
> algorithm operationally simpler.
> >
> > One other item of note - since Joey and Josh's analysis, there have been
> improvements in streaming and other considerations that can reduce the
> probability of more than one node representing some token range being
> unavailable, but it would still be good to be conservative.
> >
> > Please chime in with any concerns with having num_tokens=4 and
> allocate_tokens_for_local_replication_factor=3 and the accompanying
> rationale so we can improve the experience for all users.
> >
> > Other resources:
> >
> https://thelastpickle.com/blog/2019/02/21/set-up-a-cluster-with-even-token-distribution.html
> >
> https://docs.datastax.com/en/dse/6.7/dse-admin/datastax_enterprise/config/configVnodes.html
> >
> https://www.datastax.com/blog/2016/01/new-token-allocation-algorithm-cassandra-30

-- 

Ben Bromhead

Instaclustr | www.instaclustr.com | @instaclustr
 | (650) 284 9692
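The skew concern in the original proposal (going lower with the original allocator's random token placement) can be sketched in the same simplified way. These hypothetical helpers model only random placement, not the CASSANDRA-7032 algorithm, and measure primary ownership only, ignoring replication: they report how much more of the ring the most-loaded node owns than the average node.

```python
import random
import statistics


def primary_ownership(num_nodes, num_tokens, seed=0):
    """Fraction of the ring each node owns as primary under random token placement.

    Assumption: models only the original random allocator on a ring normalised
    to [0, 1); each token owns the range from the previous token up to itself.
    """
    if num_nodes * num_tokens < 2:
        raise ValueError("need at least two tokens on the ring")
    rng = random.Random(seed)
    ring = sorted((rng.random(), node)
                  for node in range(num_nodes) for _ in range(num_tokens))
    owned = [0.0] * num_nodes
    for i, (token, node) in enumerate(ring):
        prev_token = ring[i - 1][0]  # for i == 0 this wraps to the last token
        owned[node] += (token - prev_token) % 1.0
    return owned


def skew(num_nodes, num_tokens, seed=0):
    """Most-loaded node's ownership divided by the mean (1.0 means perfectly even)."""
    owned = primary_ownership(num_nodes, num_tokens, seed)
    return max(owned) / statistics.mean(owned)


if __name__ == "__main__":
    for tokens in (1, 4, 16, 256):
        print(tokens, round(skew(12, tokens), 2))
```

Under random placement, few tokens per node tends to leave some nodes owning noticeably more than their share, while a large token count averages this out. That tension is why a low num_tokens default leans on the new allocation algorithm rather than on random placement.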