Re: Update defaults for 4.0?

2020-07-08 Thread Jeremy Hanna

Re: Update defaults for 4.0?

2020-01-24 Thread Alexander Dejanovski
I support changing the default GC settings. The ones we have now drive me
nuts.
We should raise the max heap size for CMS to 16G instead of the current 8G.
We should still not go higher than half the available RAM.
Also, we should set a new gen size between 40% and 50% of the heap size.
The 100MB per core rule for computing the new gen size doesn't make any
sense IMO (at least in the context of Cassandra).
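
To make the sizing rule above concrete, here is a minimal sketch, assuming the
numbers suggested in this message (a 16G cap, at most half the RAM, and a new
gen of 40% to 50% of the heap). The function name and parameters are
hypothetical and not part of any Cassandra script.

```python
def suggest_cms_heap(ram_gb, max_heap_cap_gb=16.0, new_gen_fraction=0.4):
    """Return (max_heap_gb, new_gen_gb) for the rule sketched in this message.

    Hypothetical helper: the 16G cap, the half-of-RAM limit, and the
    40%-50% new gen range come from the email; nothing here is taken
    from cassandra-env.sh.
    """
    if not 0.4 <= new_gen_fraction <= 0.5:
        raise ValueError("new_gen_fraction should be between 0.4 and 0.5")
    # Never exceed the cap, and never exceed half of the available RAM.
    max_heap_gb = min(max_heap_cap_gb, ram_gb / 2)
    # New gen is a fraction of the heap rather than 100MB per core.
    new_gen_gb = max_heap_gb * new_gen_fraction
    return max_heap_gb, new_gen_gb


if __name__ == "__main__":
    for ram in (16, 32, 64):
        heap, new_gen = suggest_cms_heap(ram)
        print(f"RAM {ram}G -> max heap {heap:.1f}G, new gen {new_gen:.1f}G")
```

On a 16G machine the half-of-RAM limit wins (8G heap); on 32G and above the
16G cap applies.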

This is one of the most common optimizations we make on clusters, and most
peeps that run Cassandra aren't GC experts (and shouldn't be).

-
Alexander Dejanovski
France
@alexanderdeja

Consultant
Apache Cassandra Consulting
http://www.thelastpickle.com



Re: Update defaults for 4.0?

2020-01-24 Thread Joshua McKenzie
>
> I'm unable to create an epic in the project - not sure if that has to do
> with project permissions.  Could someone create an epic and link these
> tickets as subtasks?


Just realized I can no longer create epics (or the "new" JIRA UI is
just so obtuse I can't figure it out; I give it 50/50 odds). Thinking this
may have been due to the transition to LDAP.

Since I planned on experimenting with the whole "what does the confluent
testing page look like in an epic" today, I'll ping Nate once it's not
godawfully early in NZ about this. Or he'll read this email, either way. :)


Re: Update defaults for 4.0?

2020-01-23 Thread Jeremy Hanna
I've previously created
https://issues.apache.org/jira/browse/CASSANDRA-14902 for updating the
compaction_throughput_in_mb default.  I created
https://issues.apache.org/jira/browse/CASSANDRA-15521 for updating the
num_tokens default, https://issues.apache.org/jira/browse/CASSANDRA-15522
for updating the [roles|permissions|credentials]_validity_in_ms defaults,
and https://issues.apache.org/jira/browse/CASSANDRA-15523 for updating the
default snitch to GossipingPropertyFileSnitch.
I'm unable to create an epic in the project - not sure if that has to do
with project permissions.  Could someone create an epic and link these
tickets as subtasks?
Jon - would you mind creating the ticket around JVM defaults?  Are you
thinking of the default GC and settings for a better out of the box
experience?
Thanks all,
Jeremy


Re: Update defaults for 4.0?

2020-01-23 Thread Jon Haddad
Yes, please do. We should also update our JVM defaults.


Re: Update defaults for 4.0?

2020-01-23 Thread Jeremy Hanna
To summarize this thread, I think people are generally okay with updating
certain defaults for 4.0 provided we make sure it doesn't unpleasantly
surprise cluster operators.  For num_tokens and
compaction_throughput_in_mb, I think a release note is enough, for the reasons
in my last email.  I also agree that we should consider bumping
roles_validity_in_ms, permissions_validity_in_ms, and
credentials_validity_in_ms, along with changing the default snitch to GPFS,
as that gives people a DC-aware default at least to start.

Is everyone okay if I create tickets for each of these and link them with
an epic so that we can discuss them separately?

Thanks,

Jeremy


Re: Update defaults for 4.0?

2020-01-22 Thread Alex Ott
In addition to these, maybe we could consider changing some others as well?
Like:

1. Bump roles_validity_in_ms, permissions_validity_in_ms, and
   credentials_validity_in_ms as well - maybe to at least a minute or two. I
   have seen multiple times that authentication failed under heavy load
   because queries to system tables were timing out. With these defaults,
   people could still get updates to roles/credentials faster by specifying
   the _update_interval_ variants of these settings.
2. Change the default snitch from SimpleSnitch to GossipingPropertyFileSnitch -
   we already say that SimpleSnitch is only appropriate for
   single-datacenter deployments, and that real production deployments need
   GossipingPropertyFileSnitch - so why not make it the default?


Jeremy Hanna  at "Wed, 22 Jan 2020 11:22:36 +1100" wrote:
 JH> I mentioned this in the contributor meeting as a topic to bring up on the
 JH> list - should we take the opportunity to update defaults for Cassandra 4.0?

 JH> The rationale is two-fold:
 JH> 1) There are best practices and tribal knowledge around certain properties
 JH> where people just know to update those properties immediately as a
 JH> starting point.  If it's pretty much a given that we set something as a
 JH> starting point different than the current defaults, why not make that the
 JH> new default?
 JH> 2) We should align the defaults with what we test with.  There may be
 JH> exceptions if we have one-off tests but on the whole, we should be testing
 JH> with defaults.

 JH> As a starting point, compaction throughput and number of vnodes seem like
 JH> good candidates but it would be great to get feedback for any others.

 JH> For compaction throughput
 JH> (https://jira.apache.org/jira/browse/CASSANDRA-14902), I've made a basic
 JH> case on the ticket to default to 64 just as a starting point because the
 JH> decision for 16 was made when spinning disk was most common.  Hence most
 JH> people I know change that and I think without too much bikeshedding, 64 is
 JH> a reasonable starting point.  A case could be made that empirically the
 JH> compaction throughput throttle may have less effect than many people
 JH> think, but I still think an updated default would make sense.

 JH> For number of vnodes, Michael Shuler made the point in the discussion that
 JH> we already test with 32, which is a far better number than the 256
 JH> default.  I know many new users that just leave the 256 default and then
 JH> discover later that it's better to go lower.  I think 32 is a good
 JH> balance.  One could go lower with the new algorithm but I think 32 is much
 JH> better than 256 without being too skewed, and it's what we currently test.

 JH> Jeff brought up a good point that we want to be careful with defaults
 JH> since changing them could come as an unpleasant surprise to people who
 JH> don't explicitly set them.  As a general rule, we should always update
 JH> release notes to clearly state that a default has changed.  For these two
 JH> defaults in particular, I think it's safe.  For compaction throughput I
 JH> think a release note is sufficient in case they want to modify it.  For
 JH> number of vnodes, it won't affect existing deployments with data - it
 JH> would be for new clusters, which would honestly benefit from this anyway.

 JH> The other point is whether it's too late to go into 4.0.  For these two
 JH> changes, I think significant testing can still be done with these new
 JH> defaults before release and I think testing more explicitly with 32 vnodes
 JH> in particular will give people more confidence in the lower number with a
 JH> wider array of testing (where we don't already use 32 explicitly).

 JH> In summary, are people okay with considering updating these defaults and
 JH> possibly others in the alpha stage of a new major release?  Are there
 JH> other properties to consider?

 JH> Jeremy
 JH> -
 JH> To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
 JH> For additional commands, e-mail: dev-h...@cassandra.apache.org



-- 
With best wishes,Alex Ott
Principal Architect, DataStax
http://datastax.com/

-
To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
For additional commands, e-mail: dev-h...@cassandra.apache.org



Re: Update defaults for 4.0?

2020-01-21 Thread Jeremy Hanna
I think what he means is that if users have existing clusters with
num_tokens=256 (current default) and the default changes to 32, the node
won't ignore the value, it will fail to start with an error that you cannot
change from one num_tokens value to another:

ERROR [main] 2020-01-22 17:10:53,159 CassandraDaemon.java:759 - Fatal configuration error
org.apache.cassandra.exceptions.ConfigurationException: Cannot change the number of tokens from 256 to 32
    at org.apache.cassandra.service.StorageService.joinTokenRing(StorageService.java:1035) ~[apache-cassandra-3.11.5.jar:3.11.5]
    at org.apache.cassandra.service.StorageService.initServer(StorageService.java:717) ~[apache-cassandra-3.11.5.jar:3.11.5]
    at org.apache.cassandra.service.StorageService.initServer(StorageService.java:651) ~[apache-cassandra-3.11.5.jar:3.11.5]
    at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:388) [apache-cassandra-3.11.5.jar:3.11.5]
    at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:620) [apache-cassandra-3.11.5.jar:3.11.5]
    at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:742) [apache-cassandra-3.11.5.jar:3.11.5]

For that, he's correct.  I was thinking of when you change initial_token in
the yaml.  If you change the initial_token value(s) and there's already
data on disk, it will happily start and just tell you that it is using the
saved tokens it already has, thank you very much.  So it doesn't fail to
start, it just ignores the change.  I had thought changing the num_tokens
would do something but I was wrong.
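
The difference between the two settings can be modelled roughly as follows.
This is only a toy sketch of the behaviour described here, not Cassandra's
actual code: the function names and the ConfigurationError class are
hypothetical, and only the error message text is taken from the log above.

```python
class ConfigurationError(Exception):
    """Hypothetical stand-in for a fatal configuration error at startup."""


def check_num_tokens(saved_token_count, configured_num_tokens):
    """Model of the num_tokens behaviour: a mismatch is fatal."""
    if saved_token_count is None:
        # Fresh node with no saved tokens: the configured value is used.
        return configured_num_tokens
    if saved_token_count != configured_num_tokens:
        raise ConfigurationError(
            "Cannot change the number of tokens from "
            f"{saved_token_count} to {configured_num_tokens}"
        )
    return saved_token_count


def effective_initial_tokens(saved_tokens, configured_initial_tokens):
    """Model of the initial_token behaviour: saved tokens silently win."""
    if saved_tokens:
        return saved_tokens
    return configured_initial_tokens


if __name__ == "__main__":
    # Changing initial_token on a node with data is ignored, not fatal.
    print(effective_initial_tokens(["-100", "200"], ["0"]))
    # Changing num_tokens on a node with data fails fast.
    try:
        check_num_tokens(256, 32)
    except ConfigurationError as err:
        print(err)
```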

So there are two scenarios for changing the num_tokens default:
1) people need to be made aware of the change because if they try to
upgrade their existing nodes to 4.0+ and they don't change from the
defaults, then it will fail to start based on this change.
2) as Jeff mentions, if they add a new node into their cluster with the new
default and they were previously using 256, then the new node will claim
just 1/8th as much data as each of the other nodes.
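
The 1/8th figure in scenario 2 is just 32/256. A small sketch of the
arithmetic, assuming a node's expected share of the ring is proportional to
its token count (a simplification that holds on average for randomly assigned
tokens); the node names and cluster size are made up for illustration.

```python
from fractions import Fraction


def expected_ownership(tokens_per_node):
    """Expected ring share per node, assuming share is proportional to tokens."""
    total = sum(tokens_per_node.values())
    return {node: Fraction(count, total) for node, count in tokens_per_node.items()}


if __name__ == "__main__":
    # Three existing nodes on the old default, one new node on the new default.
    cluster = {"node1": 256, "node2": 256, "node3": 256, "new_node": 32}
    shares = expected_ownership(cluster)
    for node, share in shares.items():
        print(f"{node}: {float(share):.2%}")
    # The new node holds 1/8th as much as each existing node.
    print(shares["new_node"] / shares["node1"])
```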

For 1, I would hope especially for all of the new changes in 4.0 that
people would read the release notes and do their due diligence to say "I'm
going to diff my config against the defaults of the version it corresponds to
and then modify the new config as seems appropriate."  If not and it's
different, they'll find out quickly because it will fail fast.  I'm not as
worried about this one.
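
As a rough illustration of that kind of config diff, here is a small sketch
that compares the top-level settings of two yaml files.  It only handles
flat "key: value" lines (a real cassandra.yaml needs a proper YAML parser),
and the sample values are the old defaults and the ones proposed in this
thread, not necessarily what any release ships with.

```python
def parse_flat_yaml(text):
    """Parse top-level 'key: value' lines; skip comments and nested blocks."""
    settings = {}
    for line in text.splitlines():
        if not line or line[0] in " \t#-":
            continue
        key, sep, value = line.partition(":")
        if sep:
            # Drop a trailing inline comment, if any.
            settings[key.strip()] = value.split(" #")[0].strip()
    return settings


def changed_defaults(old_text, new_text):
    """Return {setting: (old, new)} for settings present in both files."""
    old, new = parse_flat_yaml(old_text), parse_flat_yaml(new_text)
    shared = sorted(old.keys() & new.keys())
    return {k: (old[k], new[k]) for k in shared if old[k] != new[k]}


OLD_DEFAULTS = """\
# illustrative excerpt only
num_tokens: 256
compaction_throughput_mb_per_sec: 16
"""

PROPOSED_DEFAULTS = """\
# values proposed in this thread
num_tokens: 32
compaction_throughput_mb_per_sec: 64
"""

for name, (old, new) in changed_defaults(OLD_DEFAULTS, PROPOSED_DEFAULTS).items():
    print(f"{name}: {old} -> {new}")
```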

For 2, it's a good point and again, hopefully people do the same due
diligence when adding new nodes to their clusters on the new version so
that they aren't surprised by the data density.  In a practical sense,
presumably the ops person has already upgraded their cluster to 4.0 before
adding a new node running 4.0 to it.  So if they weren't aware of the
change in the default in the yaml file, they would have hit the error
mentioned previously and would then know to set num_tokens for new nodes
added to their cluster.  Also, num_tokens is set explicitly in the default
yaml, to 256.  So it isn't a matter of having it commented out in the yaml
and getting whatever the code determines as the default for the common
case.  In that sense, I'm happy that it's not something that changes
underneath you because you don't set it.

So while I agree that the consequence can be severe when adding a new node,
the cluster operator will already be aware of the change from upgrading
their existing nodes, even if they didn't read the release notes, didn't do
their config due diligence, or simply missed it, which happens as well -
it's a big upgrade.  So if that's all there is, I don't think the change
will be disruptive beyond the surprise of a fail-fast startup error if they
hadn't noticed the change.

On Wed, Jan 22, 2020 at 5:02 PM Jeff Jirsa  wrote:

> On Tue, Jan 21, 2020 at 7:41 PM Jonathan Koppenhofer 
> wrote:
>
> > If someone isn't explicitly setting vnodes, and the default changes, it
> > will vary from the number of assigned tokens for existing clusters, right?
> > Won't this cause the node to fail to start?
> >
>
> Nope. You can have 32 tokens on some instances and 256 in other instances
> in the same dc/cluster. No error. The hosts with 256 tokens will just have
> 8x as much data as the hosts with 32 tokens. And that's why changing
> defaults is hard.
>
>
>
> >
> > I am in favor of changing these defaults, but we should provide very
> > clear guidance on vnodes (unless I am wrong).
> >
> > I'm sure there are others that would be safe to change. I'll review our
> > defaults we typically set and report back tomorrow.
> >
> > On Tue, Jan 21, 2020, 7:22 PM Jeremy Hanna 
> > wrote:
> >
> > > > I mentioned this in the contributor meeting as a topic to bring up on
> > > > the list - should we take the opportunity to update defaults for
> > > > Cassandra 4.0?
> > >
> > > The rationale is two-f

Re: Update defaults for 4.0?

2020-01-21 Thread Jeff Jirsa
On Tue, Jan 21, 2020 at 7:41 PM Jonathan Koppenhofer 
wrote:

> If someone isn't explicitly setting vnodes, and the default changes, it
> will vary from the number of assigned tokens for existing clusters, right?
> Won't this cause the node to fail to start?
>

Nope. You can have 32 tokens on some instances and 256 in other instances
in the same dc/cluster. No error. The hosts with 256 tokens will just have
8x as much data as the hosts with 32 tokens. And that's why changing
defaults is hard.



>
> I am in favor of changing these defaults, but we should provide very clear
> guidance on vnodes (unless I am wrong).
>
> I'm sure there are others that would be safe to change. I'll review our
> defaults we typically set and report back tomorrow.
>
> On Tue, Jan 21, 2020, 7:22 PM Jeremy Hanna 
> wrote:
>
> > I mentioned this in the contributor meeting as a topic to bring up on the
> > list - should we take the opportunity to update defaults for Cassandra
> > 4.0?
> >
> > The rationale is two-fold:
> > 1) There are best practices and tribal knowledge around certain
> > properties where people just know to update those properties immediately
> > as a starting point.  If it's pretty much a given that we set something
> > as a starting point different than the current defaults, why not make
> > that the new default?
> > 2) We should align the defaults with what we test with.  There may be
> > exceptions if we have one-off tests but on the whole, we should be
> > testing with defaults.
> >
> > As a starting point, compaction throughput and number of vnodes seem like
> > good candidates but it would be great to get feedback for any others.
> >
> > For compaction throughput (
> > https://jira.apache.org/jira/browse/CASSANDRA-14902), I've made a basic
> > case on the ticket to default to 64 just as a starting point because the
> > decision for 16 was made when spinning disk was most common.  Hence most
> > people I know change that and I think without too much bikeshedding, 64
> > is a reasonable starting point.  A case could be made that empirically
> > the compaction throughput throttle may have less effect than many people
> > think, but I still think an updated default would make sense.
> >
> > For number of vnodes, Michael Shuler made the point in the discussion
> > that we already test with 32, which is a far better number than the 256
> > default.  I know many new users that just leave the 256 default and then
> > discover later that it's better to go lower.  I think 32 is a good
> > balance.  One could go lower with the new algorithm but I think 32 is
> > much better than 256 without being too skewed, and it's what we currently
> > test.
> >
> > Jeff brought up a good point that we want to be careful with defaults
> > since changing them could come as an unpleasant surprise to people who
> > don't explicitly set them.  As a general rule, we should always update
> > release notes to clearly state that a default has changed.  For these two
> > defaults in particular, I think it's safe.  For compaction throughput I
> > think a release note is sufficient in case they want to modify it.  For
> > number of vnodes, it won't affect existing deployments with data - it
> > would be for new clusters, which would honestly benefit from this anyway.
> >
> > The other point is whether it's too late to go into 4.0.  For these two
> > changes, I think significant testing can still be done with these new
> > defaults before release and I think testing more explicitly with 32
> > vnodes in particular will give people more confidence in the lower number
> > with a wider array of testing (where we don't already use 32 explicitly).
> >
> > In summary, are people okay with considering updating these defaults and
> > possibly others in the alpha stage of a new major release?  Are there
> > other properties to consider?
> >
> > Jeremy
> > -
> > To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
> > For additional commands, e-mail: dev-h...@cassandra.apache.org
> >
> >
>


Re: Update defaults for 4.0?

2020-01-21 Thread Jonathan Koppenhofer
If someone isn't explicitly setting vnodes, and the default changes, it
will vary from the number of assigned tokens for existing clusters, right?
Won't this cause the node to fail to start?

I am in favor of changing these defaults, but we should provide very clear
guidance on vnodes (unless I am wrong).

I'm sure there are others that would be safe to change. I'll review our
defaults we typically set and report back tomorrow.

On Tue, Jan 21, 2020, 7:22 PM Jeremy Hanna 
wrote:

> I mentioned this in the contributor meeting as a topic to bring up on the
> list - should we take the opportunity to update defaults for Cassandra 4.0?
>
> The rationale is two-fold:
> 1) There are best practices and tribal knowledge around certain properties
> where people just know to update those properties immediately as a starting
> point.  If it's pretty much a given that we set something as a starting
> point different than the current defaults, why not make that the new
> default?
> 2) We should align the defaults with what we test with.  There may be
> exceptions if we have one-off tests but on the whole, we should be testing
> with defaults.
>
> As a starting point, compaction throughput and number of vnodes seem like
> good candidates but it would be great to get feedback for any others.
>
> For compaction throughput (
> https://jira.apache.org/jira/browse/CASSANDRA-14902), I've made a basic
> case on the ticket to default to 64 just as a starting point because the
> decision for 16 was made when spinning disk was most common.  Hence most
> people I know change that and I think without too much bikeshedding, 64 is
> a reasonable starting point.  A case could be made that empirically the
> compaction throughput throttle may have less effect than many people think,
> but I still think an updated default would make sense.
>
> For number of vnodes, Michael Shuler made the point in the discussion that
> we already test with 32, which is a far better number than the 256
> default.  I know many new users that just leave the 256 default and then
> discover later that it's better to go lower.  I think 32 is a good
> balance.  One could go lower with the new algorithm but I think 32 is much
> better than 256 without being too skewed, and it's what we currently test.
>
> Jeff brought up a good point that we want to be careful with defaults
> since changing them could come as an unpleasant surprise to people who
> don't explicitly set them.  As a general rule, we should always update
> release notes to clearly state that a default has changed.  For these two
> defaults in particular, I think it's safe.  For compaction throughput I
> think a release note is sufficient in case they want to modify it.  For
> number of vnodes, it won't affect existing deployments with data - it would
> be for new clusters, which would honestly benefit from this anyway.
>
> The other point is whether it's too late to go into 4.0.  For these two
> changes, I think significant testing can still be done with these new
> defaults before release and I think testing more explicitly with 32 vnodes
> in particular will give people more confidence in the lower number with a
> wider array of testing (where we don't already use 32 explicitly).
>
> In summary, are people okay with considering updating these defaults and
> possibly others in the alpha stage of a new major release?  Are there other
> properties to consider?
>
> Jeremy


Update defaults for 4.0?

2020-01-21 Thread Jeremy Hanna
I mentioned this in the contributor meeting as a topic to bring up on the list 
- should we take the opportunity to update defaults for Cassandra 4.0?

The rationale is two-fold:
1) There are best practices and tribal knowledge around certain properties 
where people just know to update those properties immediately as a starting 
point.  If it's pretty much a given that we set something as a starting point 
different than the current defaults, why not make that the new default?
2) We should align the defaults with what we test with.  There may be 
exceptions if we have one-off tests but on the whole, we should be testing with 
defaults.

As a starting point, compaction throughput and number of vnodes seem like good 
candidates but it would be great to get feedback for any others.

For compaction throughput 
(https://jira.apache.org/jira/browse/CASSANDRA-14902), I've made a basic case 
on the ticket to default to 64 just as a starting point because the decision 
for 16 was made when spinning disk was most common.  Hence most people I know 
change that and I think without too much bikeshedding, 64 is a reasonable 
starting point.  A case could be made that empirically the compaction 
throughput throttle may have less effect than many people think, but I still 
think an updated default would make sense.
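
For a sense of scale, here is a back-of-the-envelope sketch of what the
throttle means for a single large compaction.  It assumes the throttle is
the only bottleneck, which (as noted above) is often not the case in
practice, and the 100 GB figure is an arbitrary example.

```python
def min_compaction_seconds(data_mb, throttle_mb_per_sec):
    """Lower bound on the time to compact data_mb at a given throttle.

    Only meaningful when the throttle, not disk or CPU, is the bottleneck.
    """
    if throttle_mb_per_sec <= 0:
        raise ValueError("this sketch only models a positive throttle")
    return data_mb / throttle_mb_per_sec


# Compare the current default (16) with the proposed one (64) for 100 GB.
for throttle in (16, 64):
    seconds = min_compaction_seconds(100 * 1024, throttle)
    print(f"{throttle} MB/s -> at least {seconds / 60:.0f} minutes")
```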

For number of vnodes, Michael Shuler made the point in the discussion that we 
already test with 32, which is a far better number than the 256 default.  I 
know many new users that just leave the 256 default and then discover later 
that it's better to go lower.  I think 32 is a good balance.  One could go 
lower with the new algorithm but I think 32 is much better than 256 without 
being too skewed, and it's what we currently test.
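
To get a feel for the skew trade-off, here is a toy simulation that places
tokens uniformly at random on a unit ring and reports how uneven the
resulting ownership is.  It models purely random token selection only, not
the newer allocation algorithm mentioned above, and ownership_spread is a
made-up helper rather than anything in Cassandra.

```python
import random


def ownership_spread(num_nodes, tokens_per_node, seed=0):
    """Return max/min ring ownership across nodes for random tokens."""
    rng = random.Random(seed)
    ring = sorted(
        (rng.random(), node)
        for node in range(num_nodes)
        for _ in range(tokens_per_node)
    )
    owned = [0.0] * num_nodes
    # Each token owns the range between the previous token and itself;
    # the first token wraps around from the last one.
    prev = ring[-1][0] - 1.0
    for position, node in ring:
        owned[node] += position - prev
        prev = position
    return max(owned) / min(owned)


for tokens in (1, 4, 32, 256):
    spread = ownership_spread(6, tokens)
    print(f"{tokens:>3} tokens/node: max/min ownership = {spread:.2f}")
```

A value close to 1.0 means the nodes own nearly equal shares of the ring.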

Jeff brought up a good point that we want to be careful with defaults since 
changing them could come as an unpleasant surprise to people who don't 
explicitly set them.  As a general rule, we should always update release notes 
to clearly state that a default has changed.  For these two defaults in 
particular, I think it's safe.  For compaction throughput I think a release note 
is sufficient in case they want to modify it.  For number of vnodes, it won't 
affect existing deployments with data - it would be for new clusters, which 
would honestly benefit from this anyway.

The other point is whether it's too late to go into 4.0.  For these two 
changes, I think significant testing can still be done with these new defaults 
before release and I think testing more explicitly with 32 vnodes in particular 
will give people more confidence in the lower number with a wider array of 
testing (where we don't already use 32 explicitly).

In summary, are people okay with considering updating these defaults and 
possibly others in the alpha stage of a new major release?  Are there other 
properties to consider?

Jeremy