Re: Initializing a multiple node cluster (multiple datacenters)

2018-02-22 Thread Oleksandr Shulgin
On Thu, Feb 22, 2018 at 8:11 PM, Jon Haddad  wrote:

> Great question.  Unfortunately, our OSS docs lack a step by step process
> on how to add a DC, I’ve created a JIRA to do that:
> https://issues.apache.org/jira/browse/CASSANDRA-14254
>

Thanks.  I'd love to contribute as well, just need some questions to be
clarified, maybe even on this thread.

> The datastax docs are pretty good for this though:
> https://docs.datastax.com/en/cassandra/latest/cassandra/operations/opsAddDCToCluster.html
>

I don't understand why this should be relevant if the question was about
creating a multi-DC cluster *from scratch*.  There is no need to care about
auto_bootstrap (as discussed above) or to use nodetool rebuild.

The only detail is that you might want to use NetworkTopologyStrategy for
the system keyspaces as well, which is a prerequisite when using rebuild, but
not required when creating from scratch.
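If you do want NetworkTopologyStrategy for the system keyspaces from the start, it might look like the following, run once via cqlsh after the cluster is up.  The DC names and replication factors here are assumptions for illustration — use the DC names that `nodetool status` reports:

```sql
-- Hypothetical DC names 'dc1'/'dc2' and RFs; adjust to your topology.
ALTER KEYSPACE system_auth
  WITH replication = {'class': 'NetworkTopologyStrategy', 'dc1': 3, 'dc2': 3};
ALTER KEYSPACE system_distributed
  WITH replication = {'class': 'NetworkTopologyStrategy', 'dc1': 3, 'dc2': 3};
ALTER KEYSPACE system_traces
  WITH replication = {'class': 'NetworkTopologyStrategy', 'dc1': 2, 'dc2': 2};
```

If any data was written before the change, repair these keyspaces afterwards.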

Regarding token allocation, it was random prior to 3.0.  In 3.0 and up, it
> is calculated a little more intelligently.
>

But not by default.  To use the new token allocation you need to specify
allocate_tokens_for_keyspace in cassandra.yaml (or in JVM_OPTS).

The thing is, if it's a new cluster, there are no user keyspaces yet.  So
you'll have to work around this by starting at least one node (which
incidentally gets random tokens), then creating your data keyspace, and
only then continuing to add more nodes with the setting
allocate_tokens_for_keyspace=mydata_ks.
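A sketch of that order of operations as a cassandra.yaml fragment — the keyspace name, DC name, and RF are placeholders for illustration:

```yaml
# Node 1: start WITHOUT the allocator setting -- it gets random tokens.
# Then create the data keyspace once, e.g. via cqlsh:
#   CREATE KEYSPACE mydata_ks WITH replication =
#     {'class': 'NetworkTopologyStrategy', 'dc1': 3};

# Nodes 2..N: enable the allocator BEFORE their first start, so their
# tokens are computed against mydata_ks's replication factor:
num_tokens: 256
allocate_tokens_for_keyspace: mydata_ks
```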

This is a bit unfortunate, since the only information the token allocator
actually needs from the keyspace is the replication factors (it doesn't
care about name, replication strategy or actual load on the existing nodes).

The folks at DataStax realized this soon enough, so in DSE the setting is
now called 'allocate_tokens_for_local_replication_factor' and the other one
is deprecated:
https://docs.datastax.com/en/cassandra/3.0/cassandra/configuration/configCassandra_yaml.html#configCassandra_yaml__allocate_tokens_for_local_replication_factor

I believe it would make sense for Apache Cassandra to adopt this change, but
I don't see a JIRA for that.  Should I open one?

in 3.11.2, which was just released, CASSANDRA-13080 was backported which
> will help out when you add your second DC.  If you go this route, you can
> drop your token count down to 16 and get all the benefits with no drawbacks.
>

This is important, because if you would like to use it on 3.0, it will not
work unless you make sure that auto_bootstrap is *not* set to false.  This
is not critical when creating DCs from scratch, but requires you to jump
through quite a few hoops if you already have some data and you want to add
a new DC.  Full details in this email thread:

https://lists.apache.org/thread.html/396f2d20397c36b9cff88a0c2c5523154d420ece24a4dafc9fde3d1f@%3Cuser.cassandra.apache.org%3E

Cheers,
--
Alex


Re: Initializing a multiple node cluster (multiple datacenters)

2018-02-22 Thread Oleksandr Shulgin
On Thu, Feb 22, 2018 at 5:42 PM, Jonathan Haddad  wrote:

> If it's a new cluster, there's no need to disable auto_bootstrap.
>

True.


> That setting prevents the first node in the second DC from being a replica
> for all the data in the first DC.
>

Not sure where you got that from?  Whether a node in a new DC will become a
replica for any data or not is controlled by the RFs of the relevant
keyspaces, not by the auto_bootstrap setting.
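To illustrate (keyspace and DC names are hypothetical): replica placement in a new DC follows the keyspace's per-DC replication factors, whatever auto_bootstrap was set to:

```sql
-- Nodes in dc2 own no replicas of this keyspace, however they were started:
CREATE KEYSPACE mydata_ks
  WITH replication = {'class': 'NetworkTopologyStrategy', 'dc1': 3};

-- Only this makes dc2 a replica target (followed by nodetool rebuild
-- on the dc2 nodes if dc1 already holds data):
ALTER KEYSPACE mydata_ks
  WITH replication = {'class': 'NetworkTopologyStrategy', 'dc1': 3, 'dc2': 3};
```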

Seeds don't bootstrap by the way, changing the setting on those nodes
> doesn't do anything.
>

Yes, seeds don't bootstrap.  But why?  I don't think I've ever seen a
comprehensive explanation of this.

Thanks,
--
Alex


Re: Initializing a multiple node cluster (multiple datacenters)

2018-02-22 Thread Oleksandr Shulgin
On Thu, Feb 22, 2018 at 5:36 PM, Jean Carlo 
wrote:

> Hello
>
> I would like to clarify this,
>
> In order to initialize a Cassandra multi-DC cluster, without data, if I
> follow the DataStax documentation:
>
> https://docs.datastax.com/en/cassandra/2.1/cassandra/initialize/initializeMultipleDS.html
>
> It says:
>
>- auto_bootstrap: false (Add this setting *only* when initializing a
>clean node with no data.)
>
> But I don't understand the way this works with regard to auto_bootstrap.
>
> If all the machines generate their own tokens in a random way using
> Murmur3Partitioner and vnodes, isn't it probable that two nodes will have
> tokens in common?
>

The key is not to start all nodes at the same time.  Token allocation is
random by default, but every node checks with the rest of the cluster to
see whether a token has already been taken, and generates another random
one if needed.

It is recommended (where?) to wait ~2 minutes between starting nodes in
a new cluster/DC.
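The staggered start matters so each node can see the tokens already claimed, not because random 64-bit collisions are at all likely.  A back-of-the-envelope birthday-problem estimate (my own sketch, not from the thread) puts the chance of any collision over the Murmur3 token range at negligible, even for large clusters:

```python
import math

# Murmur3Partitioner tokens span a 64-bit range.
TOKEN_SPACE = 2 ** 64

def collision_probability(nodes: int, vnodes: int = 256) -> float:
    """Birthday-problem estimate that any two randomly chosen
    tokens in the cluster coincide."""
    k = nodes * vnodes  # total tokens drawn at random
    return 1.0 - math.exp(-k * (k - 1) / (2 * TOKEN_SPACE))

# 100 nodes x 256 vnodes -> on the order of 1e-11: vanishingly small.
print(collision_probability(100))
```

So the per-node check is a cheap safety net; the real reason to pause between starts is to let gossip settle so that check, and the cluster's view of membership, is accurate.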

> Is it not better to bootstrap the seeds first with auto_bootstrap: false
> and then the rest of the nodes with auto_bootstrap: true?
>
The auto_bootstrap setting has no influence on token allocation (unless you
want to use the new token allocation algorithm on version 3.0).  It only
allows you to skip streaming from the rest of the nodes, but since there is
no data in a brand new cluster, there is no practical difference.

Regards,
--
Alex


Re: Initializing a multiple node cluster (multiple datacenters)

2018-02-22 Thread Jon Haddad
In 2.1 token allocation is random, and the distribution doesn’t work as
nicely.  Everything else is the same.

Do not use 3.1.  Under any circumstances.  Guessing that’s a typo but I just 
want to be sure.

Jon

> On Feb 22, 2018, at 1:45 PM, Jean Carlo  wrote:
> 
> Hi Jonathan
> 
> Yes I do think this is a good idea about the doc. 
> 
> About the clarification, is this still true for 2.1?  We are planning on 
> upgrading to the 3.1 but not in the next months.  We will stick for a few 
> more months on 2.1. 
> 
> I believe this is true also for 2.1 but I would like to confirm that I am 
> not missing something. 
> 
> 
> Saludos
> 
> Jean Carlo
> 
> "The best way to predict the future is to invent it" Alan Kay

Re: Initializing a multiple node cluster (multiple datacenters)

2018-02-22 Thread Jean Carlo
Hi Jonathan

Yes, I do think this is a good idea about the doc.

About the clarification, is this still true for 2.1?  We are planning on
upgrading to the 3.1 but not in the next months.  We will stick for a few
more months on 2.1.

I believe this is true also for 2.1 but I would like to confirm that I am
not missing something.


Saludos

Jean Carlo

"The best way to predict the future is to invent it" Alan Kay

On Thu, Feb 22, 2018 at 10:28 PM, Kenneth Brotman <
kenbrot...@yahoo.com.invalid> wrote:

> I will heavy lift the docs for a while, do my Slender Cassandra reference
> project and then I’ll try to find one or two areas where I can contribute
> code to get going on that.  I have read the section on contributing before
> I start.  I’ll self-assign the JIRA right now.
>
>
>
> Kenneth Brotman


RE: Initializing a multiple node cluster (multiple datacenters)

2018-02-22 Thread Kenneth Brotman
I will heavy lift the docs for a while, do my Slender Cassandra reference 
project and then I’ll try to find one or two areas where I can contribute code 
to get going on that.  I have read the section on contributing before I start.  
I’ll self-assign the JIRA right now.

 

Kenneth Brotman

 

From: Jonathan Haddad [mailto:j...@jonhaddad.com] 
Sent: Thursday, February 22, 2018 1:21 PM
To: user@cassandra.apache.org
Subject: Re: Initializing a multiple node cluster (multiple datacenters)

 

Kenneth, if you want to take the JIRA, feel free to self-assign it to yourself 
and put up a pull request or patch, and I'll review.  I'd be very happy to get 
more people involved in the docs.

 


Re: Initializing a multiple node cluster (multiple datacenters)

2018-02-22 Thread Jonathan Haddad
Kenneth, if you want to take the JIRA, feel free to self-assign it to
yourself and put up a pull request or patch, and I'll review.  I'd be very
happy to get more people involved in the docs.

On Thu, Feb 22, 2018 at 12:56 PM Kenneth Brotman
 wrote:

> That information would have saved me time too.  Thanks for making a JIRA
> for it Jon.  Perhaps this is a good JIRA for me to begin with.
>
>
>
> Kenneth Brotman


RE: Initializing a multiple node cluster (multiple datacenters)

2018-02-22 Thread Kenneth Brotman
That information would have saved me time too.  Thanks for making a JIRA for it 
Jon.  Perhaps this is a good JIRA for me to begin with.

 

Kenneth Brotman  

 

From: Jon Haddad [mailto:jonathan.had...@gmail.com] On Behalf Of Jon Haddad
Sent: Thursday, February 22, 2018 11:11 AM
To: user
Subject: Re: Initializing a multiple node cluster (multiple datacenters)

 

Great question.  Unfortunately, our OSS docs lack a step by step process on how 
to add a DC, I’ve created a JIRA to do that: 
https://issues.apache.org/jira/browse/CASSANDRA-14254

 

The datastax docs are pretty good for this though: 
https://docs.datastax.com/en/cassandra/latest/cassandra/operations/opsAddDCToCluster.html

 

Regarding token allocation, it was random prior to 3.0.  In 3.0 and up, it is 
calculated a little more intelligently.  in 3.11.2, which was just released, 
CASSANDRA-13080 was backported which will help out when you add your second DC. 
 If you go this route, you can drop your token count down to 16 and get all the 
benefits with no drawbacks.  

 

At this point I would go straight to 3.11.2 and skip 3.0 as there were quite a 
few improvements that make it worthwhile along the way, in my opinion.  We work 
with several customers that are running 3.11 and are pretty happy with it. 

 

Yes, if there’s no data, you can initialize the cluster with auto_bootstrap: 
true.  Be sure to change any keyspaces using SimpleStrategy to NTS first, and 
replicate them to the new DC as well. 

 

Jon

 







Re: Initializing a multiple node cluster (multiple datacenters)

2018-02-22 Thread Jon Haddad
Great question.  Unfortunately, our OSS docs lack a step by step process on how 
to add a DC, I’ve created a JIRA to do that: 
https://issues.apache.org/jira/browse/CASSANDRA-14254 


The datastax docs are pretty good for this though: 
https://docs.datastax.com/en/cassandra/latest/cassandra/operations/opsAddDCToCluster.html
 


Regarding token allocation, it was random prior to 3.0.  In 3.0 and up, it is 
calculated a little more intelligently.  In 3.11.2, which was just released, 
CASSANDRA-13080 was backported, which will help out when you add your second DC. 
If you go this route, you can drop your token count down to 16 and get all the 
benefits with no drawbacks. 

At this point I would go straight to 3.11.2 and skip 3.0, as there were quite a 
few improvements along the way that make it worthwhile, in my opinion.  We work 
with several customers that are running 3.11 and are pretty happy with it. 

Yes, if there’s no data, you can initialize the cluster with auto_bootstrap: 
true.  Be sure to change any keyspaces using SimpleStrategy to NTS first, and 
replicate them to the new DC as well. 
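For the 3.11.2 route, the settings on each node of the new DC might look like this.  The keyspace name is a placeholder, and the keyspace must already exist with its final replication factors:

```yaml
# cassandra.yaml -- nodes joining the new DC on 3.11.2+
num_tokens: 16
allocate_tokens_for_keyspace: mydata_ks
# Leave auto_bootstrap at its default (true): as discussed elsewhere in
# this thread, the allocator is skipped when auto_bootstrap is false on 3.0.
```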

Jon


> On Feb 22, 2018, at 10:53 AM, Jean Carlo  wrote:
> 
> Hi Jonathan
> 
> Thank you for the answer.  Do you know where to look to understand why this 
> works?  As I understood, all the nodes will then choose random tokens.  How 
> can I assure the correctness of the ring?
> 
> So as you said, under the condition that there is no data in the cluster, I 
> can initialize a multi-DC cluster without disabling auto bootstrap?
> 



Re: Initializing a multiple node cluster (multiple datacenters)

2018-02-22 Thread Jean Carlo
Hi Jonathan

Thank you for the answer.  Do you know where to look to understand why this
works?  As I understood, all the nodes will then choose random tokens.  How
can I assure the correctness of the ring?

So as you said, under the condition that there is no data in the cluster, I
can initialize a multi-DC cluster without disabling auto bootstrap?

On Feb 22, 2018 5:43 PM, "Jonathan Haddad"  wrote:

If it's a new cluster, there's no need to disable auto_bootstrap.  That
setting prevents the first node in the second DC from being a replica for
all the data in the first DC.  If there's no data in the first DC, you can
skip a couple steps and just leave it on.

Leave it on, and enjoy your afternoon.

Seeds don't bootstrap by the way, changing the setting on those nodes
doesn't do anything.

On Thu, Feb 22, 2018 at 8:36 AM Jean Carlo 
wrote:

> Hello
>
> I would like to clarify this,
>
> In order to initialize a Cassandra multi-DC cluster, without data, if I
> follow the DataStax documentation:
>
> https://docs.datastax.com/en/cassandra/2.1/cassandra/initialize/initializeMultipleDS.html
>
>
> It says:
>
>- auto_bootstrap: false (Add this setting *only* when initializing a
>clean node with no data.)
>
> But I don't understand the way this works with regard to auto_bootstrap.
>
> If all the machines generate their own tokens in a random way using
> Murmur3Partitioner and vnodes, isn't it probable that two nodes will have
> tokens in common?
>
> Is it not better to bootstrap the seeds first with auto_bootstrap: false
> and then the rest of the nodes with auto_bootstrap: true?
>
> Thank you for the help
>
> Jean Carlo
>
> "The best way to predict the future is to invent it" Alan Kay
>


Re: Secondary Indexes C* 3.0

2018-02-22 Thread DuyHai Doan
Read this: http://www.doanduyhai.com/blog/?p=13191




On Thu, Feb 22, 2018 at 6:44 PM, Akash Gangil  wrote:

> To provide more context, I was going through this
> https://docs.datastax.com/en/cql/3.3/cql/cql_using/useWhenIndex.html#
> useWhenIndex__highCardCol
>


Re: Secondary Indexes C* 3.0

2018-02-22 Thread Akash Gangil
To provide more context, I was going through this
https://docs.datastax.com/en/cql/3.3/cql/cql_using/useWhenIndex.html#useWhenIndex__highCardCol

On Thu, Feb 22, 2018 at 9:35 AM, Akash Gangil  wrote:

> Hi,
>
> I was wondering if there are recommendations around the cardinality of
> secondary indexes.
>
> As I understand, an index on a column with many distinct values will be
> inefficient. Is it because the index would only direct me to the specific
> sstable, but then it sequentially searches for the target records? So a
> wide range of the index could lead to a lot of sstable options to traverse?
>
> Though what's unclear is the recommended (or benchmarked?) limit: must the
> index have 100 distinct values, or can it have up to 1000 or
> 5 distinct values?
>
> thanks!
>
>
>
>
> --
> Akash
>



-- 
Akash


Secondary Indexes C* 3.0

2018-02-22 Thread Akash Gangil
Hi,

I was wondering if there are recommendations around the cardinality of
secondary indexes.

As I understand, an index on a column with many distinct values will be
inefficient. Is it because the index would only direct me to the specific
sstable, but then it sequentially searches for the target records? So a
wide range of the index could lead to a lot of sstable options to traverse?

Though what's unclear is the recommended (or benchmarked?) limit: must the
index have 100 distinct values, or can it have up to 1000 or
5 distinct values?
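One common way to see the high-cardinality problem is node fan-out rather than sstable count: Cassandra's secondary indexes are local to each node, so a query by indexed value must contact every node no matter how rare the value is. A toy model of that (plain Python, not Cassandra code; node count, row count and cardinalities are invented numbers):

```python
# Toy model of Cassandra's *local* secondary indexes -- not real Cassandra
# code. Node count, row count and cardinalities are made-up numbers.
NODES = 6
ROWS = 10_000

def build_cluster(cardinality):
    """Spread rows over nodes; each node keeps an index of its own rows only."""
    nodes = [{} for _ in range(NODES)]
    for row in range(ROWS):
        value = row % cardinality         # the indexed column's value
        local_index = nodes[row % NODES]  # the partitioner decides placement
        local_index.setdefault(value, []).append(row)
    return nodes

def query_by_indexed_value(nodes, value):
    """Every node must be contacted, since each index is local to its node."""
    nodes_contacted = len(nodes)
    rows_found = sum(len(n.get(value, [])) for n in nodes)
    return nodes_contacted, rows_found

low = build_cluster(cardinality=10)       # low cardinality, e.g. 'country'
high = build_cluster(cardinality=5_000)   # high cardinality, near-unique
print(query_by_indexed_value(low, 3))     # (6, 1000): each contact pays off
print(query_by_indexed_value(high, 3))    # (6, 2): 6 nodes queried for 2 rows
```

With low cardinality each contacted node returns many matching rows; with high cardinality all nodes are still contacted but return almost nothing, which is the inefficiency the DataStax page warns about.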

thanks!




-- 
Akash


Re: Initializing a multiple node cluster (multiple datacenters)

2018-02-22 Thread Jonathan Haddad
If it's a new cluster, there's no need to disable auto_bootstrap.  That
setting prevents the first node in the second DC from being a replica for
all the data in the first DC.  If there's no data in the first DC, you can
skip a couple steps and just leave it on.

Leave it on, and enjoy your afternoon.

Seeds don't bootstrap, by the way; changing the setting on those nodes
doesn't do anything.
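For a brand-new cluster this translates into very little configuration; an illustrative cassandra.yaml excerpt (cluster name, snitch choice and seed addresses are placeholders, not from the thread):

```yaml
# Illustrative cassandra.yaml excerpt -- all names/addresses are placeholders.
cluster_name: 'my_cluster'
endpoint_snitch: GossipingPropertyFileSnitch
seed_provider:
  - class_name: org.apache.cassandra.locator.SimpleSeedProvider
    parameters:
      - seeds: "10.0.1.1,10.0.2.1"
# auto_bootstrap defaults to true and is normally absent from the file.
# Setting 'auto_bootstrap: false' only matters when adding a new DC to a
# cluster that already holds data; for an empty cluster, leave the default.
```

With GossipingPropertyFileSnitch, the DC and rack names themselves go in cassandra-rackdc.properties on each node.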

On Thu, Feb 22, 2018 at 8:36 AM Jean Carlo 
wrote:

> Hello
>
> I would like to clarify this,
>
> In order to initialize a Cassandra multi-DC cluster without data, I
> follow the DataStax documentation:
>
>
> https://docs.datastax.com/en/cassandra/2.1/cassandra/initialize/initializeMultipleDS.html
>
>
> It says
>
>- auto_bootstrap: false (Add this setting *only* when initializing a
>clean node with no data.)
>
> But I don't understand how this works with regard to auto_bootstrap.
>
> If all the machines pick their own tokens randomly, using
> Murmur3Partitioner and vnodes, isn't it probable that two nodes will
> end up with tokens in common?
>
> Isn't it better to bootstrap the seeds first with auto_bootstrap: false
> and then the rest of the nodes with auto_bootstrap: true?
>
> Thank you for the help
>
> Jean Carlo
>
> "The best way to predict the future is to invent it" Alan Kay
>


Initializing a multiple node cluster (multiple datacenters)

2018-02-22 Thread Jean Carlo
Hello

I would like to clarify this,

In order to initialize a Cassandra multi-DC cluster without data, I
follow the DataStax documentation:

https://docs.datastax.com/en/cassandra/2.1/cassandra/initialize/initializeMultipleDS.html


It says

   - auto_bootstrap: false (Add this setting *only* when initializing a
   clean node with no data.)

But I don't understand how this works with regard to auto_bootstrap.

If all the machines pick their own tokens randomly, using
Murmur3Partitioner and vnodes, isn't it probable that two nodes will
end up with tokens in common?

Isn't it better to bootstrap the seeds first with auto_bootstrap: false
and then the rest of the nodes with auto_bootstrap: true?

Thank you for the help

Jean Carlo

"The best way to predict the future is to invent it" Alan Kay


Re: Cluster Repairs 'nodetool repair -pr' Cause Severe IncreaseinRead Latency After Shrinking Cluster

2018-02-22 Thread Carl Mueller
Your partition sizes aren't ridiculous... kinda big cells if there are 4
cells and 12 MB partitions, but still I don't think that is ludicrous.

Whelp, I'm out of ideas above my "pay grade". Honestly, with AZ/racks you
theoretically might have been able to take the nodes off simultaneously,
but (disclaimer) I've never done that.

?Rolling Restart? <-- definitely indicates I have no ideas :-)

On Thu, Feb 22, 2018 at 8:15 AM, Fd Habash  wrote:

> One more observation …
>
>
>
> When we compare read latencies between the non-prod (where nodes were
> removed) and prod clusters, even though the node load, as measured by the
> size of the /data dir, is similar, the read latencies are 5 times slower
> in the downsized non-prod cluster.
>
>
>
> The only difference we see is that prod reads from 4 sstables whereas
> non-prod reads from 5, per cfhistograms.
>
>
>
> Non-prod /data size
>
> -
>
> Filesystem  Size  Used Avail Use% Mounted on
>
> /dev/nvme0n1  885G  454G  432G  52% /data
>
> Filesystem  Size  Used Avail Use% Mounted on
>
> /dev/nvme0n1  885G  439G  446G  50% /data
>
> Filesystem  Size  Used Avail Use% Mounted on
>
> /dev/nvme0n1  885G  368G  518G  42% /data
>
> Filesystem  Size  Used Avail Use% Mounted on
>
> /dev/nvme0n1  885G  431G  455G  49% /data
>
> Filesystem  Size  Used Avail Use% Mounted on
>
> /dev/nvme0n1  885G  463G  423G  53% /data
>
> Filesystem  Size  Used Avail Use% Mounted on
>
> /dev/nvme0n1  885G  406G  479G  46% /data
>
> Filesystem  Size  Used Avail Use% Mounted on
>
> /dev/nvme0n1  885G  419G  466G  48% /data
>
> Filesystem  Size  Used Avail Use% Mounted on
>
>
>
> Prod /data size
>
> 
>
> Filesystem  Size  Used Avail Use% Mounted on
>
> /dev/nvme0n1  885G  352G  534G  40% /data
>
> Filesystem  Size  Used Avail Use% Mounted on
>
> /dev/nvme0n1  885G  423G  462G  48% /data
>
> Filesystem  Size  Used Avail Use% Mounted on
>
> /dev/nvme0n1  885G  431G  454G  49% /data
>
> Filesystem  Size  Used Avail Use% Mounted on
>
> /dev/nvme0n1  885G  442G  443G  50% /data
>
> Filesystem  Size  Used Avail Use% Mounted on
>
> /dev/nvme0n1  885G  454G  431G  52% /data
>
>
>
>
>
> Cfhistograms: comparing prod to non-prod
>
> -
>
>
>
> Non-prod
>
> --
>
> 08:21:38  Percentile  SSTables  Write Latency  Read Latency  Partition Size  Cell Count
> 08:21:38                        (micros)       (micros)      (bytes)
> 08:21:38  50%         1.00      24.60          4055.27       11864           4
> 08:21:38  75%         2.00      35.43          14530.76      17084           4
> 08:21:38  95%         4.00      126.93         89970.66      35425           4
> 08:21:38  98%         5.00      219.34         155469.30     73457           4
> 08:21:38  99%         5.00      219.34         186563.16     105778          4
> 08:21:38  Min         0.00      5.72           17.09         87              3
> 08:21:38  Max         7.00      20924.30       1386179.89    14530764        4
>
>
>
> Prod
>
> ---
>
> 07:41:42  Percentile  SSTables  Write Latency  Read Latency  Partition Size  Cell Count
> 07:41:42                        (micros)       (micros)      (bytes)
> 07:41:42  50%         1.00      24.60          2346.80       11864           4
> 07:41:42  75%         2.00      29.52          4866.32       17084           4
> 07:41:42  95%         3.00      73.46          14530.76      29521           4
> 07:41:42  98%         4.00      182.79         25109.16      61214           4
> 07:41:42  99%         4.00      182.79         36157.19      88148           4
> 07:41:42  Min         0.00      9.89           20.50         87              0
> 07:41:42  Max         5.00      219.34         155469.30     12108970        4
>
>
>
>
>
> 
> Thank you
>
>
>
> *From: *Fd Habash 
> *Sent: *Thursday, February 22, 2018 9:00 AM
> *To: *user@cassandra.apache.org
> *Subject: *RE: Cluster Repairs 'nodetool repair -pr' Cause Severe
> IncreaseinRead Latency After Shrinking Cluster
>
>
>
>
>
> “ data was allowed to fully rebalance/repair/drain before the next node
> was taken off?”
>
> --
>
> Judging by the messages, the decomm was healthy. As an example
>
>
>
>   StorageService.java:3425 - Announcing that I have left the ring for
> 3ms
>
> …
>
> INFO  [RMI TCP Connection(4)-127.0.0.1] 2016-01-07 06:00:52,662
> 

RE: Cluster Repairs 'nodetool repair -pr' Cause Severe IncreaseinRead Latency After Shrinking Cluster

2018-02-22 Thread Fd Habash
One more observation …

When we compare read latencies between the non-prod (where nodes were removed)
and prod clusters, even though the node load, as measured by the size of the
/data dir, is similar, the read latencies are 5 times slower in the downsized
non-prod cluster.

The only difference we see is that prod reads from 4 sstables whereas
non-prod reads from 5, per cfhistograms.

Non-prod /data size
-
Filesystem  Size  Used Avail Use% Mounted on
/dev/nvme0n1  885G  454G  432G  52% /data
Filesystem  Size  Used Avail Use% Mounted on
/dev/nvme0n1  885G  439G  446G  50% /data
Filesystem  Size  Used Avail Use% Mounted on
/dev/nvme0n1  885G  368G  518G  42% /data
Filesystem  Size  Used Avail Use% Mounted on
/dev/nvme0n1  885G  431G  455G  49% /data
Filesystem  Size  Used Avail Use% Mounted on
/dev/nvme0n1  885G  463G  423G  53% /data
Filesystem  Size  Used Avail Use% Mounted on
/dev/nvme0n1  885G  406G  479G  46% /data
Filesystem  Size  Used Avail Use% Mounted on
/dev/nvme0n1  885G  419G  466G  48% /data
Filesystem  Size  Used Avail Use% Mounted on

Prod /data size

Filesystem  Size  Used Avail Use% Mounted on
/dev/nvme0n1  885G  352G  534G  40% /data
Filesystem  Size  Used Avail Use% Mounted on
/dev/nvme0n1  885G  423G  462G  48% /data
Filesystem  Size  Used Avail Use% Mounted on
/dev/nvme0n1  885G  431G  454G  49% /data
Filesystem  Size  Used Avail Use% Mounted on
/dev/nvme0n1  885G  442G  443G  50% /data
Filesystem  Size  Used Avail Use% Mounted on
/dev/nvme0n1  885G  454G  431G  52% /data


Cfhistograms: comparing prod to non-prod
-

Non-prod
--
08:21:38  Percentile  SSTables  Write Latency  Read Latency  Partition Size  Cell Count
08:21:38                        (micros)       (micros)      (bytes)
08:21:38  50%         1.00      24.60          4055.27       11864           4
08:21:38  75%         2.00      35.43          14530.76      17084           4
08:21:38  95%         4.00      126.93         89970.66      35425           4
08:21:38  98%         5.00      219.34         155469.30     73457           4
08:21:38  99%         5.00      219.34         186563.16     105778          4
08:21:38  Min         0.00      5.72           17.09         87              3
08:21:38  Max         7.00      20924.30       1386179.89    14530764        4

Prod
--- 
07:41:42  Percentile  SSTables  Write Latency  Read Latency  Partition Size  Cell Count
07:41:42                        (micros)       (micros)      (bytes)
07:41:42  50%         1.00      24.60          2346.80       11864           4
07:41:42  75%         2.00      29.52          4866.32       17084           4
07:41:42  95%         3.00      73.46          14530.76      29521           4
07:41:42  98%         4.00      182.79         25109.16      61214           4
07:41:42  99%         4.00      182.79         36157.19      88148           4
07:41:42  Min         0.00      9.89           20.50         87              0
07:41:42  Max         5.00      219.34         155469.30     12108970        4



Thank you

From: Fd Habash
Sent: Thursday, February 22, 2018 9:00 AM
To: user@cassandra.apache.org
Subject: RE: Cluster Repairs 'nodetool repair -pr' Cause Severe IncreaseinRead 
Latency After Shrinking Cluster


“ data was allowed to fully rebalance/repair/drain before the next node was 
taken off?”
--
Judging by the messages, the decomm was healthy. As an example 

  StorageService.java:3425 - Announcing that I have left the ring for 3ms   
…
INFO  [RMI TCP Connection(4)-127.0.0.1] 2016-01-07 06:00:52,662 
StorageService.java:1191 – DECOMMISSIONED

I do not believe repairs were run after each node removal. I’ll double-check. 

I’m not sure what you mean by ‘rebalance’? How do you check if a node is 
balanced? Load/size of data dir? 

As for the drain, there was no need to drain and I believe it is not something 
you do as part of decomm’ing a node. 

did you take 1 off per rack/AZ?
--
We removed 3 nodes, one from each AZ in sequence

These are some 

RE: Cluster Repairs 'nodetool repair -pr' Cause Severe Increase inRead Latency After Shrinking Cluster

2018-02-22 Thread Fd Habash

“ data was allowed to fully rebalance/repair/drain before the next node was 
taken off?”
--
Judging by the messages, the decomm was healthy. As an example 

  StorageService.java:3425 - Announcing that I have left the ring for 3ms   
 
…
INFO  [RMI TCP Connection(4)-127.0.0.1] 2016-01-07 06:00:52,662 
StorageService.java:1191 – DECOMMISSIONED

I do not believe repairs were run after each node removal. I’ll double-check. 

I’m not sure what you mean by ‘rebalance’? How do you check if a node is 
balanced? Load/size of data dir? 

As for the drain, there was no need to drain and I believe it is not something 
you do as part of decomm’ing a node. 

did you take 1 off per rack/AZ?
--
We removed 3 nodes, one from each AZ in sequence

These are some of the cfhistogram metrics. Read latencies are high after the 
removal of the nodes
--
You can see reads of 186ms at the 99th percentile, from 5 sstables. These
are awfully high numbers given that these metrics measure C* storage-layer
read performance.

Does this mean removing the nodes undersized the cluster? 

key_space_01/cf_01 histograms
Percentile  SSTables  Write Latency  Read Latency  Partition Size  Cell Count
                      (micros)       (micros)      (bytes)
50%         1.00      24.60          4055.27       11864           4
75%         2.00      35.43          14530.76      17084           4
95%         4.00      126.93        89970.66       35425           4
98%         5.00      219.34        155469.30      73457           4
99%         5.00      219.34        186563.16      105778          4
Min         0.00      5.72          17.09          87              3
Max         7.00      20924.30      1386179.89     14530764        4

key_space_01/cf_01 histograms
Percentile  SSTables  Write Latency  Read Latency  Partition Size  Cell Count
                      (micros)       (micros)      (bytes)
50%         1.00      29.52          4055.27       11864           4
75%         2.00      42.51          10090.81      17084           4
95%         4.00      152.32         52066.35      35425           4
98%         4.00      219.34         89970.66      73457           4
99%         5.00      219.34         155469.30     88148           4
Min         0.00      9.89           24.60         87              0
Max         6.00      1955.67        557074.61     14530764        4


Thank you

From: Carl Mueller
Sent: Wednesday, February 21, 2018 4:33 PM
To: user@cassandra.apache.org
Subject: Re: Cluster Repairs 'nodetool repair -pr' Cause Severe Increase inRead 
Latency After Shrinking Cluster

Hm, nodetool decommission performs the stream-out of the replicated data, and
you said that was apparently without error...

But if you dropped three nodes in one AZ/rack on a five-node cluster with RF3,
then we have a missing RF factor unless NetworkTopologyStrategy fails over to
another AZ. But that would also entail cross-AZ streaming, queries and repair.

On Wed, Feb 21, 2018 at 3:30 PM, Carl Mueller  
wrote:
sorry for the idiot questions... 

data was allowed to fully rebalance/repair/drain before the next node was taken 
off?

did you take 1 off per rack/AZ?


On Wed, Feb 21, 2018 at 12:29 PM, Fred Habash  wrote:
One node at a time 

On Feb 21, 2018 10:23 AM, "Carl Mueller"  wrote:
What is your replication factor? 
Single datacenter, three availability zones, is that right?
You removed one node at a time or three at once?

On Wed, Feb 21, 2018 at 10:20 AM, Fd Habash  wrote:
We have had a 15 node cluster across three zones and cluster repairs using 
‘nodetool repair -pr’ took about 3 hours to finish. Lately, we shrank the
cluster to 12. Since then, the same repair job has taken up to 12 hours to
finish, and most times it never does.
 
More importantly, at some point during the repair cycle, we see read latencies 
jumping to 1-2 seconds and applications immediately notice the impact.
 
stream_throughput_outbound_megabits_per_sec is set at 200 and 
compaction_throughput_mb_per_sec at 64. The /data dir on the nodes is around 
~500GB at 44% usage. 
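A quick back-of-envelope check on the shrink itself (pure arithmetic; even data balance before and after is an assumption):

```python
# Removing 3 of 15 nodes, with total data unchanged, grows each surviving
# node's share by 15/12 - 1 = 25%. Node counts are from the thread; even
# balance before and after the shrink is an assumption.
nodes_before, nodes_after = 15, 12
per_node_growth = nodes_before / nodes_after - 1
print(f"per-node data growth: {per_node_growth:.0%}")  # per-node data growth: 25%
```

Note that a 25% larger per-node dataset alone would not explain repairs going from ~3 hours to 12+, so something beyond raw data volume (e.g. overstreaming, or the extra sstable touched per read) is worth investigating.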
 
When shrinking the cluster, the ‘nodetool decommission’ was uneventful. It
completed successfully with no issues.
 
What could possibly cause repairs to cause this impact following cluster 

Re: Tracing cql code being run through the drive

2018-02-22 Thread Lucas Benevides
I don't know if it will help you, but when the debug log is turned on, it
shows the slow queries.
To decide what counts as slow, the parameter read_request_timeout_in_ms is
used. Maybe if you decrease it, you can monitor your queries with
$ tail -F debug.log

Just an idea, I've never done it myself. It should certainly be tried in a
development environment first.

Lucas B. Dias
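A sketch of that approach: recent Cassandra versions log slow queries to debug.log via MonitoringTask, but the exact line format below is an assumption modeled on that output, so treat the regex as something to adapt to your version. (Server-side, `nodetool settraceprobability` can also trace a sample of live requests.)

```python
# Minimal sketch of filtering a Cassandra debug.log for slow-query entries.
# The sample lines mimic MonitoringTask output; the exact format varies by
# Cassandra version, so the pattern below is an assumption to adapt.
import re

SLOW = re.compile(r"operations were slow|time \d+ msec - slow timeout")

def slow_query_lines(lines):
    """Return only the lines that look like slow-query reports."""
    return [line for line in lines if SLOW.search(line)]

sample_log = [
    "DEBUG [ScheduledTasks:1] 2018-02-22 10:00:00,000 MonitoringTask.java:173"
    " - 1 operations were slow in the last 5000 msecs:",
    "<SELECT * FROM ks.t WHERE id = 1>, time 612 msec - slow timeout 500 msec",
    "INFO  [main] 2018-02-22 10:00:01,000 StorageService.java:600 - unrelated",
]
for line in slow_query_lines(sample_log):
    print(line)
```

In practice you would feed this from `tail -F debug.log` instead of a hard-coded list.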

2018-02-22 8:27 GMT-03:00 Jonathan Baynes :

> Hi Community,
>
>
>
> Can anyone help me understand which classes I'd need to set logging on, if
> I want to capture the CQL commands being run through the driver, similar to
> how Profiler (MSSQL) would work? I need to see what's being run, and whether
> the query is actually getting to Cassandra.
>
>
>
> Has anyone had any experience in doing this?
>
>
>
> Thanks in advance.
>
>
>
> J
>
>
>
> *Jonathan Baynes*
>
> DBA
> Tradeweb Europe Limited
>
> Moor Place  •  1 Fore Street Avenue  •  London EC2Y 9DT
>
> P +44 (0)20 77760988  •  F +44 (0)20 7776 3201  •  M +44 (0)7884111546
>
> jonathan.bay...@tradeweb.com
>
> —
>
> A leading marketplace  for
> electronic fixed income, derivatives and ETF trading
>
>
>
> 
>
> This e-mail may contain confidential and/or privileged information. If you
> are not the intended recipient (or have received this e-mail in error)
> please notify the sender immediately and destroy it. Any unauthorized
> copying, disclosure or distribution of the material in this e-mail is
> strictly forbidden. Tradeweb reserves the right to monitor all e-mail
> communications through its networks. If you do not wish to receive
> marketing emails about our products / services, please let us know by
> contacting us, either by email at contac...@tradeweb.com or by writing to
> us at the registered office of Tradeweb in the UK, which is: Tradeweb
> Europe Limited (company number 3912826), 1 Fore Street Avenue London EC2Y
> 9DT
> .
> To see our privacy policy, visit our website @ www.tradeweb.com.
>


Tracing cql code being run through the drive

2018-02-22 Thread Jonathan Baynes
Hi Community,

Can anyone help me understand which classes I'd need to set logging on, if I
want to capture the CQL commands being run through the driver, similar to how
Profiler (MSSQL) would work? I need to see what's being run, and whether the
query is actually getting to Cassandra.

Has anyone had any experience in doing this?

Thanks in advance.

J

Jonathan Baynes
DBA
Tradeweb Europe Limited
Moor Place  *  1 Fore Street Avenue  *  London EC2Y 9DT
P +44 (0)20 77760988  *  F +44 (0)20 7776 3201  *  M +44 (0)7884111546
jonathan.bay...@tradeweb.com

-
A leading marketplace for electronic 
fixed income, derivatives and ETF trading






Re: Cassandra Needs to Grow Up by Version Five!

2018-02-22 Thread Oleksandr Shulgin
On Thu, Feb 22, 2018 at 9:50 AM, Eric Plowe  wrote:

> Cassandra, hard to use? I disagree completely. With that said, there are
> definitely deficiencies in certain parts of the documentation, but nothing
> that is a show stopper.


True, there are no show-stoppers from the docs side, it's just all those
little things--they add up.

We’ve been using Cassandra since the sub 1.0 days and have had nothing but
> great things to say about it.
>
> With that said, its an open source project; you get from it what you’re
> willing to put in. If you just expect something that installs, asks a
> couple of questions and you’re off to the races, Cassandra might not be for
> you.
>
> If you’re willing to put in the time to understand how Cassandra works,
> and how it fits into your use case, and if it is the right fit for your use
> case, you’ll be more than happy, I bet.
>

We have been using Cassandra since v2.1, for more than 2 years now, and
installing was never a problem.  It does work and allows us to sleep well,
which should not be underappreciated.

The problems begin when you need to do operations.  You never know what
exactly will happen when you start a certain repair command, or how the
streaming will happen in case of bootstrap/rebuild, and the docs just
aren't detailed enough, so you have to go the trial-and-error path most of
the time.

Regards,
--
Alex


Re: Cassandra Needs to Grow Up by Version Five!

2018-02-22 Thread Sylvain Lebresne
>
> I have to disagree with people here and point out that just creating
> JIRA's and (trying to) have discussions about these issues will not lead to
> change in any reasonable timeframe, because everyone who could do the work
> has an endless list of bigger fish to fry. I strongly encourage you to get
> involved and write some code, or pay someone to do it, because to put it
> bluntly, it's *very* unlikely your JIRA's will get actioned unless you
> contribute significantly to them yourself.
>

Though I don't truly disagree with the overall point that getting into code
is the surest way to see something you care about progress, I'd hate for
this to be understood as "we don't care about your idea unless you bring
code". There have been tons of JIRA tickets in the past suggesting
improvements where some contributor said "you know what, that's a good
idea" and implemented it. I've certainly seen it happen numerous times, and
trust that I did it a lot as well (and sure, it happens disproportionately
more for small improvements than for let's-rewrite-the-whole-database ones,
for obvious reasons hopefully).

So if you have a relatively concrete idea for an improvement, I'd say,
please, share it. Don't get me wrong though, please do your homework first
and take a few minutes googling/JIRA searching to see if that hasn't been
discussed first; don't assume your time is more valuable than that of other
contributors. It's rude to assume so (I'd say in general, but even more so
because it's a free-as-in-beer software).

That said, and to paraphrase what others have said, one should always come
to this with a few understandings:
- For all that people may like your idea and have the time to help it get
in, there is no guarantee here. And yes, more often than not, contributors
already have a list of things they want to fix and only a finite amount of
time for contributions, so the bar for your idea to make it onto some other
contributor's "list" is probably high. And remember that behavioral science
strongly suggests that thinking your own ideas are obviously the most
important ones likely involves a fair amount of bias. That's why
contributing the code yourself, if possible, definitely helps a lot.
- A distributed database is not exactly simple software. In particular,
Cassandra makes the choice to be fully distributed, which is a clear
trade-off: it gives it very interesting properties (scalability, fault
tolerance, ...) almost for free, but it makes some things quite a bit more
challenging. My point being, some things may look like easy problems to
solve on the surface, but are in fact more complex than they appear (which
in turn means solving them takes much more time than it seems, and we get
back to contribution time/effort not being infinite). So it's imo a good
idea to seek first to understand why things are a certain way rather than
assume that contributors don't care.
- Cassandra is not perfect, no software is, but don't assume contributors
are not aware of the weaknesses. We are, for the most part. So if those
weaknesses are still there, it's generally (there are of course exceptions)
due to some combination of 1) a lack of time, 2) the difficulty of solving
those weaknesses (without creating new, worse ones) and 3) some actually
well-thought-out trade-off (we accept that weakness as the price for other
strengths). As such, if you come simply pointing out deficiencies, you may
feel like you are pointing out things nobody knows, but chances are, you
aren't. You're probably just reminding contributors how frustrating it is
that they don't have time to solve everything. Pointing out deficiencies is
ok, but unless you take the time to offer some constructive steps to
improve as well, it's often useless, to be honest.

--
Sylvain


RE: Cassandra Needs to Grow Up by Version Five!

2018-02-22 Thread Jacques-Henri Berthemet
Hi Kenneth,

As a Cassandra user I value usability, but since it's a database I value
consistency and performance even more. If you want usability and
documentation you can use DataStax DSE; after all, that's where they add
value on top of Cassandra. Since DataStax actually paid devs to work on
Cassandra internals, it's understandable that they kept some part
(usability) for their own product. We all notice that when you google for
some CQL command you always end up on the DataStax site; it would be great
if that were not the case, but it would take a lot of time.

Also, as a manager you're not supposed to fight with devs but to allocate
tasks/time. If you have to choose between enhancing documentation and
fixing a bad race condition that corrupts data, I hope you'd choose the
latter.

As for filing JIRAs, if you create one like "I want a UI to set up TLS", it
would be the kind of JIRA nobody would implement: it takes a lot of time,
touches security, and may not be that useful in the end.

Last point on usability for Cassandra: as an end user it's very difficult
to see the progress on it, but since I'm using Cassandra internals for my
custom secondary index I can tell you that there was a huge rework between
Cassandra 2.2 and 3.x. PartitionIterators are a very elegant solution and
really helpful in my case. Great work guys :)
--
Jacques-Henri Berthemet

-Original Message-
From: Kenneth Brotman [mailto:kenbrot...@yahoo.com.INVALID] 
Sent: Wednesday, February 21, 2018 11:54 PM
To: d...@cassandra.apache.org
Cc: user@cassandra.apache.org
Subject: RE: Cassandra Needs to Grow Up by Version Five!

Hi Akash,

I get the part about outside work which is why in replying to Jeff Jirsa I was 
suggesting the big companies could justify taking it on easy enough and you 
know actually pay the people who would be working at it so those people could 
have a life.

The part I don't get is the aversion to usability.  Isn't that what you
think about when you are coding?  "Am I making this thing I'm building easy
to use?"  If you were programming for me, we would be constantly talking
about what we are building and how we can make things easier for users.  If
I had to fight with a developer, architect or engineer about usability all
the time, they would be gone, and quick.  How do you approach programming
if you aren't trying to make things easy?

Kenneth Brotman

-Original Message-
From: Akash Gangil [mailto:akashg1...@gmail.com]
Sent: Wednesday, February 21, 2018 2:24 PM
To: d...@cassandra.apache.org
Cc: user@cassandra.apache.org
Subject: Re: Cassandra Needs to Grow Up by Version Five!

I would second Jon in the arguments he made. Contributing outside work is
draining and really requires a lot of commitment. If someone requires
features around usability etc., just pay for it, period.

On Wed, Feb 21, 2018 at 2:20 PM, Kenneth Brotman < 
kenbrot...@yahoo.com.invalid> wrote:

> Jon,
>
> Very sorry that you don't see the value of the time I'm taking for this.
> I don't have demands; I do have a stern warning, and I'm right, Jon.
> Please be very careful not to mischaracterize my words, Jon.
>
> You suggest I put things in JIRAs, then seem to suggest that I'd be lucky
> if anyone looked at them and did anything.  That's what I figured too.
>
> I don't appreciate the hostility.  You will understand more fully in the
> next post where I'm coming from.  Try to keep the conversation civilized.
> I'm trying, or at least want you to understand that I think what I'm
> doing is saving your gig and mine.  I really like a lot of people in this
> group.
>
> I've come to a preliminary assessment on things.  Soon the cloud will 
> clear or I'll be gone.  Don't worry.  I'm a very peaceful person and 
> like you I am driven by real important projects that I feel compelled 
> to work on for the good of others.  I don't have time to hand-hold a
> database and I can't get stuck with my projects on the wrong stuff.
>
> Kenneth Brotman
>
>
> -Original Message-
> From: Jon Haddad [mailto:jonathan.had...@gmail.com] On Behalf Of Jon 
> Haddad
> Sent: Wednesday, February 21, 2018 12:44 PM
> To: user@cassandra.apache.org
> Cc: d...@cassandra.apache.org
> Subject: Re: Cassandra Needs to Grow Up by Version Five!
>
> Ken,
>
> Maybe it’s not clear how open source projects work, so let me try to 
> explain.  There’s a bunch of us who either get paid by someone or 
> volunteer on our free time.  The folks that get paid, (yay!) usually 
> take direction on what the priorities are, and work on projects that 
> directly affect our jobs.  That means that someone needs to care 
> enough about the features you want to work on them, if you’re not going to do 
> it yourself.
>
> Now as others have said already, please put your list of demands in 
> JIRA, if someone is interested, they will work on it.  You may need to 
> contribute a little more than you’ve done already, be prepared to get 
> involved if you actually want to to 

Re: Cassandra Needs to Grow Up by Version Five!

2018-02-22 Thread Eric Plowe
Cassandra, hard to use? I disagree completely. With that said, there are
definitely deficiencies in certain parts of the documentation, but nothing
that is a show stopper. We’ve been using Cassandra since the sub 1.0 days
and have had nothing but great things to say about it.

With that said, its an open source project; you get from it what you’re
willing to put in. If you just expect something that installs, asks a
couple of questions and you’re off to the races, Cassandra might not be for
you.

If you’re willing to put in the time to understand how Cassandra works, and
how it fits into your use case, and if it is the right fit for your use
case, you’ll be more than happy, I bet.

If there are things that are lacking, that you can’t find a work around
for, submit a PR! That’s the beauty of open source projects.

On Thu, Feb 22, 2018 at 2:55 AM Oleksandr Shulgin <
oleksandr.shul...@zalando.de> wrote:

> On Wed, Feb 21, 2018 at 7:54 PM, Durity, Sean R <
> sean_r_dur...@homedepot.com> wrote:
>
>>
>>
>> However, I think the shots at Cassandra are generally unfair. When I
>> started working with it, the DataStax documentation was some of the best
>> documentation I had seen on any project, especially an open source one.
>>
>
> Oh, don't get me started on documentation, especially the DataStax one.  I
> come from Postgres.  In comparison, Cassandra documentation is mostly
> non-existent (and this is just a way to avoid listing other uncomfortable
> epithets).
>
> Not sure if I would be able to submit patches to improve that, however,
> since most of the time it would require me to already know the answer to my
> questions when the doc is incomplete.
>
> The move from DataStax to Apache.org for docs is actually good, IMO, since
> the docs were maintained very poorly and there was no real leverage to
> influence that.
>
> Cheers,
> --
> Alex
>
>