Re: Seed nodes and bootstrap (was: Re: Initializing a multiple node cluster (multiple datacenters))

2018-02-26 Thread Oleksandr Shulgin
On Mon, Feb 26, 2018 at 7:05 PM, Jeff Jirsa  wrote:

>
> I'll happily click the re-open button (you could have, too), but I'm not
> sure what the 'right' fix is. Feel free to move discussion to 5836.
>

Thanks, Jeff.   Somehow, I don't see any control elements to change issue
status, even though I'm logged in, so I assume only project members / devs
can do that.

--
Alex


Re: Seed nodes and bootstrap (was: Re: Initializing a multiple node cluster (multiple datacenters))

2018-02-26 Thread Jeff Jirsa
That ticket was before I was really active contributing, but I tend to
agree with your assessment: clearly there's a pain point there, and we can do
better than the status quo.

The problem (as Jonathan notes) is that it's a complicated subsystem, and
the "obvious" fix probably isn't as obvious as it seems.

I'll happily click the re-open button (you could have, too), but I'm not
sure what the 'right' fix is. Feel free to move discussion to 5836.




On Mon, Feb 26, 2018 at 12:51 AM, Oleksandr Shulgin <
oleksandr.shul...@zalando.de> wrote:

> On Fri, Feb 23, 2018 at 7:35 PM, Jeff Jirsa  wrote:
>
>> It comes up from time to time.  Rob Coli spent years arguing that this
>> behavior was confusing (
>> https://issues.apache.org/jira/browse/CASSANDRA-5836 ), especially in the
>> "I'm replacing a failed seed" sense. It also comes up when you're adding
>> the first few hosts to a new DC (where they're new, but they're definitely
>> going to be the seeds for the new DC).
>>
>
> Jeff,
>
> I find the response on this ticket quite terrible: with a number of
> independent reports of significant problems caused by this behavior, the
> "Won't Fix" status is not justified, IMO.
>
> We were also hit by this one time when the expected location of the data
> directory changed in our Docker image.  We were performing a rolling
> update of the cluster and the first two nodes that we updated happened
> to be seeds.  They started happily with a blank data directory and were
> serving read requests.  Ouch.  We only realized there was a problem when
> the next node that we updated failed to start.  It failed only because
> it *did* try to bootstrap.
>
> People like to repeat "seed nodes are not different from non-seeds" and
> it's true from the perspective of a client application.  The same people
> would repeat "seeds don't bootstrap" as some kind of magical incantation,
> so seeds *are* different, in a subtle way, for the operator.  But I don't
> believe that this difference is justified.  When creating a brand new
> cluster there is no practical difference between auto_bootstrap=true and
> false, because there is no data or clients, so the seed nodes behave
> exactly the same way as non-seeds.  When adding a new DC you are supposed
> to set auto_bootstrap=false explicitly, so again no difference.
>
> Where it matters, however, is node behavior in *unexpected* circumstances.
> If seed nodes were truly not different from non-seeds in this regard,
> there would be fewer surprises, because of the total node uniformity within
> the cluster.
>
> Therefore, I argue that the ticket should be reopened.
>
> Regards,
> --
> Alex
>
>


Seed nodes and bootstrap (was: Re: Initializing a multiple node cluster (multiple datacenters))

2018-02-26 Thread Oleksandr Shulgin
On Fri, Feb 23, 2018 at 7:35 PM, Jeff Jirsa  wrote:

> It comes up from time to time.  Rob Coli spent years arguing that this
> behavior was confusing (
> https://issues.apache.org/jira/browse/CASSANDRA-5836 ), especially in the
> "I'm replacing a failed seed" sense. It also comes up when you're adding
> the first few hosts to a new DC (where they're new, but they're definitely
> going to be the seeds for the new DC).
>

Jeff,

I find the response on this ticket quite terrible: with a number of
independent reports of significant problems caused by this behavior, the
"Won't Fix" status is not justified, IMO.

We were also hit by this one time when the expected location of the data
directory changed in our Docker image.  We were performing a rolling
update of the cluster and the first two nodes that we updated happened
to be seeds.  They started happily with a blank data directory and were
serving read requests.  Ouch.  We only realized there was a problem when
the next node that we updated failed to start.  It failed only because
it *did* try to bootstrap.
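
For illustration only: a pre-start guard of roughly this shape could catch a seed coming up with an unexpectedly blank data directory before it starts serving reads. This is a hypothetical sketch, not part of Cassandra; the function names and the expect_existing_data flag are assumptions made for the example.

```python
import os
import tempfile


def data_dir_is_empty(path):
    """True if the directory is missing or contains no entries."""
    return not os.path.isdir(path) or not os.listdir(path)


def safe_to_start(path, expect_existing_data):
    """Refuse to start a node that should already hold data but finds none."""
    return not (expect_existing_data and data_dir_is_empty(path))


# Usage: a node taking part in a rolling update is expected to have data.
with tempfile.TemporaryDirectory() as empty_dir:
    print(safe_to_start(empty_dir, expect_existing_data=True))   # False
    print(safe_to_start(empty_dir, expect_existing_data=False))  # True
```

Such a check would run identically on seeds and non-seeds, which is the uniformity argued for here.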

People like to repeat "seed nodes are not different from non-seeds" and it's
true from the perspective of a client application.  The same people would
repeat "seeds don't bootstrap" as some kind of magical incantation, so
seeds *are* different, in a subtle way, for the operator.  But I don't
believe that this difference is justified.  When creating a brand new
cluster there is no practical difference between auto_bootstrap=true and
false, because there is no data or clients, so the seed nodes behave
exactly the same way as non-seeds.  When adding a new DC you are supposed
to set auto_bootstrap=false explicitly, so again no difference.

Where it matters, however, is node behavior in *unexpected* circumstances.
If seed nodes were truly not different from non-seeds in this regard,
there would be fewer surprises, because of the total node uniformity within
the cluster.

Therefore, I argue that the ticket should be reopened.

Regards,
--
Alex


Re: Initializing a multiple node cluster (multiple datacenters)

2018-02-23 Thread Jeff Jirsa
It comes up from time to time.  Rob Coli spent years arguing that this
behavior was confusing (
https://issues.apache.org/jira/browse/CASSANDRA-5836 ) , especially in the
"I'm replacing a failed seed" sense. It also comes up when you're adding
the first few hosts to a new DC (where they're new, but they're definitely
going to be the seeds for the new DC).




On Fri, Feb 23, 2018 at 10:22 AM, Jon Haddad  wrote:

> In my opinion and experience, this isn’t a real problem, since you define
> a list of seeds as the first few nodes you add to a cluster.  When would
> you add a node to an existing cluster and mark itself as a seed?  It’s
> neither practical or something you’d do by accident.
>
> On Feb 23, 2018, at 10:17 AM, Jeff Jirsa  wrote:
>
>
> On Fri, Feb 23, 2018 at 10:12 AM, Oleksandr Shulgin <
> oleksandr.shul...@zalando.de> wrote:
>
>> On Fri, Feb 23, 2018 at 7:02 PM, Jeff Jirsa  wrote:
>>>
>>>> Yes, seeds don't bootstrap.  But why?  I don't think I've ever seen a
>>>> comprehensive explanation of this.
>>>
>>> The meaning of seed in the most common sense is "connect to this host,
>>> and use it as the starting point for adding this node to the cluster".
>>>
>>> If you specify that a joining node is the seed, the implication is that
>>> it's already a member of the cluster (or, alternatively, authoritative on
>>> the cluster's state).  Given that implication, why would it make sense to
>>> then proceed to bootstrap? By setting it as a seed, you've told it that it
>>> already knows what the cluster is.
>>>
>>
>> Well, there is certain logic in that.  However, bootstrap is about
>> streaming in the data, isn't it?  And being seed is about knowing the
>> topology, i.e. which nodes exist in the cluster.  There is actually 0
>> overlap of these two concerns, so I don't really see why a seed node
>> shouldn't be able to bootstrap.  Would it break anything if it could, e.g.
>> if you're explicit about it and request auto_bootstrap=true?
>>
>>
> I don't *think* it would break anything, but the more obvious answer is
> just not to list the node as a seed if it needs to bootstrap.
>
> This comes up a lot, and it's certainly one of those rough operator edges
> that we can do better with. There's no strict requirement for every node
> in a cluster to have exactly the same seed list, so if you need to
> bootstrap a new seed, just join it while it is not listed as a seed, then
> bounce it to make it think it's a seed after it has joined.
>
> The easier answer is probably "give people a way to change seeds after
> they're running", and it sorta exists, but it's hard to invoke
> intentionally. We should just make that easier, and the rough edges will
> get a little less rough.
>
>
>


Re: Initializing a multiple node cluster (multiple datacenters)

2018-02-23 Thread Jon Haddad
In my opinion and experience, this isn’t a real problem, since you define a 
list of seeds as the first few nodes you add to a cluster.  When would you add 
a node to an existing cluster and mark itself as a seed?  It’s neither 
practical or something you’d do by accident.   

> On Feb 23, 2018, at 10:17 AM, Jeff Jirsa  wrote:
> 
> 
> On Fri, Feb 23, 2018 at 10:12 AM, Oleksandr Shulgin wrote:
> On Fri, Feb 23, 2018 at 7:02 PM, Jeff Jirsa wrote:
> Yes, seeds don't bootstrap.  But why?  I don't think I've ever seen a
> comprehensive explanation of this.
> 
> The meaning of seed in the most common sense is "connect to this host, and 
> use it as the starting point for adding this node to the cluster".
> 
> If you specify that a joining node is the seed, the implication is that it's 
> already a member of the cluster (or, alternatively, authoritative on the 
> cluster's state).  Given that implication, why would it make sense to then 
> proceed to bootstrap? By setting it as a seed, you've told it that it already 
> knows what the cluster is. 
> 
> Well, there is certain logic in that.  However, bootstrap is about streaming 
> in the data, isn't it?  And being seed is about knowing the topology, i.e. 
> which nodes exist in the cluster.  There is actually 0 overlap of these two 
> concerns, so I don't really see why a seed node shouldn't be able to 
> bootstrap.  Would it break anything if it could, e.g. if you're explicit 
> about it and request auto_bootstrap=true?
> 
> 
> I don't *think* it would break anything, but the more obvious answer is just
> not to list the node as a seed if it needs to bootstrap.
>
> This comes up a lot, and it's certainly one of those rough operator edges
> that we can do better with. There's no strict requirement for every node in
> a cluster to have exactly the same seed list, so if you need to bootstrap a
> new seed, just join it while it is not listed as a seed, then bounce it to
> make it think it's a seed after it has joined.
> 
> The easier answer is probably "give people a way to change seeds after 
> they're running", and it sorta exists, but it's hard to invoke intentionally. 
> We should just make that easier, and the rough edges will get a little less 
> rough.



Re: Initializing a multiple node cluster (multiple datacenters)

2018-02-23 Thread Jeff Jirsa
On Fri, Feb 23, 2018 at 10:12 AM, Oleksandr Shulgin <
oleksandr.shul...@zalando.de> wrote:

> On Fri, Feb 23, 2018 at 7:02 PM, Jeff Jirsa  wrote:
>>
>>> Yes, seeds don't bootstrap.  But why?  I don't think I've ever seen a
>>> comprehensive explanation of this.
>>
>> The meaning of seed in the most common sense is "connect to this host,
>> and use it as the starting point for adding this node to the cluster".
>>
>> If you specify that a joining node is the seed, the implication is that
>> it's already a member of the cluster (or, alternatively, authoritative on
>> the cluster's state).  Given that implication, why would it make sense to
>> then proceed to bootstrap? By setting it as a seed, you've told it that it
>> already knows what the cluster is.
>>
>
> Well, there is certain logic in that.  However, bootstrap is about
> streaming in the data, isn't it?  And being seed is about knowing the
> topology, i.e. which nodes exist in the cluster.  There is actually 0
> overlap of these two concerns, so I don't really see why a seed node
> shouldn't be able to bootstrap.  Would it break anything if it could, e.g.
> if you're explicit about it and request auto_bootstrap=true?
>
>
I don't *think* it would break anything, but the more obvious answer is just
not to list the node as a seed if it needs to bootstrap.

This comes up a lot, and it's certainly one of those rough operator edges
that we can do better with. There's no strict requirement for every node in
a cluster to have exactly the same seed list, so if you need to bootstrap a
new seed, just join it while it is not listed as a seed, then bounce it to
make it think it's a seed after it has joined.
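
The join-then-bounce workaround can be sketched as follows. This is only an illustration: the helper name and the example addresses are hypothetical, and in practice each list would be written into the node's seed configuration by hand before the corresponding start.

```python
def seed_lists_for_new_seed(new_node, current_seeds):
    """Return the seed list to use on the new node in each phase."""
    # Phase 1: the node must not list itself, so that it bootstraps normally.
    first_start = [s for s in current_seeds if s != new_node]
    if not first_start:
        raise ValueError("need at least one existing seed to join through")
    # Phase 2: after it has joined, restart it with itself added as a seed.
    after_bounce = first_start + [new_node]
    return {"first_start": first_start, "after_bounce": after_bounce}


plan = seed_lists_for_new_seed("10.0.0.3", ["10.0.0.1", "10.0.0.2"])
print(plan["first_start"])   # ['10.0.0.1', '10.0.0.2']
print(plan["after_bounce"])  # ['10.0.0.1', '10.0.0.2', '10.0.0.3']
```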

The easier answer is probably "give people a way to change seeds after
they're running", and it sorta exists, but it's hard to invoke
intentionally. We should just make that easier, and the rough edges will
get a little less rough.


Re: Initializing a multiple node cluster (multiple datacenters)

2018-02-23 Thread Oleksandr Shulgin
On Fri, Feb 23, 2018 at 7:02 PM, Jeff Jirsa  wrote:
>
>> Yes, seeds don't bootstrap.  But why?  I don't think I've ever seen a
>> comprehensive explanation of this.
>
> The meaning of seed in the most common sense is "connect to this host,
> and use it as the starting point for adding this node to the cluster".
>
> If you specify that a joining node is the seed, the implication is that
> it's already a member of the cluster (or, alternatively, authoritative on
> the cluster's state).  Given that implication, why would it make sense to
> then proceed to bootstrap? By setting it as a seed, you've told it that it
> already knows what the cluster is.
>

Well, there is a certain logic in that.  However, bootstrap is about
streaming in the data, isn't it?  And being a seed is about knowing the
topology, i.e. which nodes exist in the cluster.  There is actually 0
overlap of these two concerns, so I don't really see why a seed node
shouldn't be able to bootstrap.  Would it break anything if it could, e.g.
if you're explicit about it and request auto_bootstrap=true?

--
Alex


Re: Initializing a multiple node cluster (multiple datacenters)

2018-02-23 Thread Jeff Jirsa
On Thu, Feb 22, 2018 at 11:06 PM, Oleksandr Shulgin <
oleksandr.shul...@zalando.de> wrote:

> On Thu, Feb 22, 2018 at 5:42 PM, Jonathan Haddad 
> wrote:
>
>> If it's a new cluster, there's no need to disable auto_bootstrap.
>>
>
> True.
>
>
>> That setting prevents the first node in the second DC from being a
>> replica for all the data in the first DC.
>>
>
> Not sure where you got that from.  Whether a node in a new DC would
> become a replica for any data or not is controlled by RFs of the relevant
> keyspaces, and not by the auto_bootstrap setting.
>
>> Seeds don't bootstrap by the way, changing the setting on those nodes
>> doesn't do anything.
>>
>
> Yes, seeds don't bootstrap.  But why?  I don't think I've ever seen a
> comprehensive explanation of this.
>
>
The meaning of seed in the most common sense is "connect to this host, and
use it as the starting point for adding this node to the cluster".

If you specify that a joining node is the seed, the implication is that
it's already a member of the cluster (or, alternatively, authoritative on
the cluster's state).  Given that implication, why would it make sense to
then proceed to bootstrap? By setting it as a seed, you've told it that it
already knows what the cluster is.
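
As a minimal sketch, the rule being discussed amounts to the predicate below. The function and parameter names are hypothetical and chosen for illustration; this is not quoted from the Cassandra source.

```python
def should_bootstrap(auto_bootstrap, is_seed, bootstrap_complete):
    """Return True only if the node would stream data in before joining."""
    # A node that lists itself as a seed skips bootstrap regardless of the
    # auto_bootstrap setting, which is the behavior questioned in this thread.
    return auto_bootstrap and not is_seed and not bootstrap_complete


# A brand-new non-seed streams data in; a node listed as a seed does not.
print(should_bootstrap(auto_bootstrap=True, is_seed=False, bootstrap_complete=False))  # True
print(should_bootstrap(auto_bootstrap=True, is_seed=True, bootstrap_complete=False))   # False
```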


Re: Initializing a multiple node cluster (multiple datacenters)

2018-02-23 Thread Oleksandr Shulgin
On Fri, Feb 23, 2018 at 8:32 AM, Oleksandr Shulgin <
oleksandr.shul...@zalando.de> wrote:

>
> This is important, because if you would like to use it on 3.0, it will not
> work unless you make sure that auto_bootstrap is *not* set to false.  This
> is not critical when creating DCs from scratch, but requires you to jump
> through quite a few hoops if you already have some data and you want to add
> a new DC.  Full details in this email thread:
>
> https://lists.apache.org/thread.html/396f2d20397c36b9cff88a0c2c5523154d420ece24a4dafc9fde3d1f@%3Cuser.cassandra.apache.org%3E
>

Hm, I remember from our experience on that thread that bootstrapping was
always from the local DC only.  But now, out of curiosity, I've tried to add a
non-seed node to a new DC with auto_bootstrap=true and I see that it started
streaming.

Did this not happen to us in previous experiments because we started adding
the new DC by adding seed nodes first?  So when we then added non-seeds with
auto_bootstrap=true, they didn't stream from the remote DC because they
believed all replicas were already available in the local DC (although the
seeds were actually empty)?

Could someone please explain what the rules are for actual
bootstrapping/streaming in the data?

Regards,
--
Alex


Re: Initializing a multiple node cluster (multiple datacenters)

2018-02-22 Thread Oleksandr Shulgin
On Thu, Feb 22, 2018 at 8:11 PM, Jon Haddad  wrote:

> Great question.  Unfortunately, our OSS docs lack a step by step process
> on how to add a DC, I’ve created a JIRA to do that:
> https://issues.apache.org/jira/browse/CASSANDRA-14254
>

Thanks.  I'd love to contribute as well, just need some questions to be
clarified, maybe even on this thread.

> The datastax docs are pretty good for this though:
> https://docs.datastax.com/en/cassandra/latest/cassandra/operations/opsAddDCToCluster.html
>

I don't understand why this should be relevant if the question was about
creating a multi-DC cluster *from scratch*.  There is no need to care about
auto_bootstrap (as discussed above) or use nodetool rebuild.

The only detail is that you might want to use NetworkTopologyStrategy for
system keyspaces as well, which is a prerequisite when using rebuild, but
not required when creating from scratch.

> Regarding token allocation, it was random prior to 3.0.  In 3.0 and up, it
> is calculated a little more intelligently.
>

But not by default.  To use the new token allocation you need to specify
allocate_tokens_for_keyspace in cassandra.yaml (or in JVM_OPTS).

The thing is, if it's a new cluster, there are no user keyspaces yet.  So
you'll have to work around by starting at least one node (which
incidentally gets random tokens), then creating your data keyspace, and
only then continue to add more nodes with the setting
allocate_tokens_for_keyspace=mydata_ks.

This is a bit unfortunate, since the only information the token allocator
actually needs from the keyspace is the replication factors (it doesn't
care about name, replication strategy or actual load on the existing nodes).

The folks at DataStax realized that soon enough, so in DSE the setting is
now called 'allocate_tokens_for_local_replication_factor' and the other one
is deprecated:
https://docs.datastax.com/en/cassandra/3.0/cassandra/configuration/configCassandra_yaml.html#configCassandra_yaml__allocate_tokens_for_local_replication_factor

I believe it would make sense for Apache Cassandra to adopt this change, but
I don't see a JIRA for that.  Should I open one?

> In 3.11.2, which was just released, CASSANDRA-13080 was backported which
> will help out when you add your second DC.  If you go this route, you can
> drop your token count down to 16 and get all the benefits with no drawbacks.
>

This is important, because if you would like to use it on 3.0, it will not
work unless you make sure that auto_bootstrap is *not* set to false.  This
is not critical when creating DCs from scratch, but requires you to jump
through quite a few hoops if you already have some data and you want to add
a new DC.  Full details in this email thread:

https://lists.apache.org/thread.html/396f2d20397c36b9cff88a0c2c5523154d420ece24a4dafc9fde3d1f@%3Cuser.cassandra.apache.org%3E

Cheers,
--
Alex


Re: Initializing a multiple node cluster (multiple datacenters)

2018-02-22 Thread Oleksandr Shulgin
On Thu, Feb 22, 2018 at 5:42 PM, Jonathan Haddad  wrote:

> If it's a new cluster, there's no need to disable auto_bootstrap.
>

True.


> That setting prevents the first node in the second DC from being a replica
> for all the data in the first DC.
>

Not sure where you got that from.  Whether a node in a new DC would
become a replica for any data or not is controlled by RFs of the relevant
keyspaces, and not by the auto_bootstrap setting.
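
To make that concrete, here is a small sketch of how per-DC replica counts follow from a NetworkTopologyStrategy replication map alone. The helper and the datacenter names dc1/dc2 are hypothetical; note that auto_bootstrap does not appear anywhere in the calculation.

```python
def replicas_per_dc(replication, datacenters):
    """Number of replicas each DC holds for a keyspace, given its replication map."""
    # A DC that is not named in the map holds no replicas for the keyspace.
    return {dc: int(replication.get(dc, 0)) for dc in datacenters}


keyspace_replication = {"class": "NetworkTopologyStrategy", "dc1": 3}
print(replicas_per_dc(keyspace_replication, ["dc1", "dc2"]))
# {'dc1': 3, 'dc2': 0}
```

Nodes in dc2 become replicas only once the keyspace's replication map is altered to include dc2.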

> Seeds don't bootstrap by the way, changing the setting on those nodes
> doesn't do anything.
>

Yes, seeds don't bootstrap.  But why?  I don't think I've ever seen a
comprehensive explanation of this.

Thanks,
--
Alex


Re: Initializing a multiple node cluster (multiple datacenters)

2018-02-22 Thread Oleksandr Shulgin
On Thu, Feb 22, 2018 at 5:36 PM, Jean Carlo 
wrote:

> Hello
>
> I would like to clarify this,
>
> To initialize a Cassandra multi-DC cluster without data, I am following
> the DataStax documentation:
>
> https://docs.datastax.com/en/cassandra/2.1/cassandra/initialize/initializeMultipleDS.html
>
> It says
>
>    - auto_bootstrap: false (Add this setting *only* when initializing a
>      clean node with no data.)
>
> But I don't understand how this works with regard to auto_bootstrap.
>
> If all the machines generate their own tokens randomly using
> Murmur3Partitioner and vnodes, isn't it probable that two nodes will have
> tokens in common?
>

The key is not to start all nodes at the same time.  The token allocation
is random by default, but every node checks with the rest of the cluster to
see whether a token has already been taken, and generates another random
one if needed.
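
A minimal sketch of that pick-and-retry idea, assuming the Murmur3Partitioner token range of -2^63 to 2^63-1. The function name and the in-memory `taken` set are hypothetical stand-ins for what a node learns from the rest of the cluster; this is not Cassandra's actual allocation code.

```python
import random

MIN_TOKEN = -(2**63)
MAX_TOKEN = 2**63 - 1


def pick_tokens(num_tokens, taken, rng=None):
    """Pick num_tokens random tokens, retrying any that are already taken."""
    rng = rng or random.Random()
    taken = set(taken)
    chosen = set()
    while len(chosen) < num_tokens:
        candidate = rng.randint(MIN_TOKEN, MAX_TOKEN)
        if candidate in taken or candidate in chosen:
            continue  # collision: draw another random token
        chosen.add(candidate)
    return sorted(chosen)


# Usage: a new node with 4 vnodes joining a ring where three tokens are in use.
print(pick_tokens(4, taken={-100, 0, 100}))
```

The check only helps if the new node can already see the tokens of the nodes started before it, which is why starting all nodes simultaneously is a problem.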

It is recommended (where?) to wait for ~2 minutes between starting nodes in
a new cluster/DC.

> Isn't it better to bootstrap the seeds first with auto_bootstrap: false
> and then the rest of the nodes with auto_bootstrap: true?
>
The auto_bootstrap setting has no influence on token allocation (unless you
want to use the new token allocation algorithm on version 3.0).  It only
allows you to skip streaming from the rest of the nodes, but since there is
no data in a brand new cluster, there is no practical difference.

Regards,
--
Alex


Re: Initializing a multiple node cluster (multiple datacenters)

2018-02-22 Thread Jon Haddad
In 2.1 token allocation is random, and the distribution doesn’t work as nicely.
Everything else is the same.

Do not use 3.1.  Under any circumstances.  Guessing that’s a typo but I just
want to be sure.

Jon

> On Feb 22, 2018, at 1:45 PM, Jean Carlo <jean.jeancar...@gmail.com> wrote:
> 
> Hi Jonathan
> 
> Yes I do think this is a good idea about the doc. 
> 
> About the clarification, is this still true for 2.1? We are planning to
> upgrade to 3.1, but not in the next few months. We will stay on 2.1 for a
> few more months.
>
> I believe this is also true for 2.1, but I would like to confirm that I am
> not missing something.
> 
> 
> Saludos
> 
> Jean Carlo
> 
> "The best way to predict the future is to invent it" Alan Kay
> 
> On Thu, Feb 22, 2018 at 10:28 PM, Kenneth Brotman
> <kenbrot...@yahoo.com.invalid> wrote:
> I will heavy lift the docs for a while, do my Slender Cassandra reference 
> project and then I’ll try to find one or two areas where I can contribute 
> code to get going on that.  I have read the section on contributing before I 
> start.  I’ll self-assign the JIRA right now.
> 
>  
> 
> Kenneth Brotman
> 
>  
> 
> From: Jonathan Haddad [mailto:j...@jonhaddad.com]
> Sent: Thursday, February 22, 2018 1:21 PM
> To: user@cassandra.apache.org
> Subject: Re: Initializing a multiple node cluster (multiple datacenters)
> 
>  
> 
> Kenneth, if you want to take the JIRA, feel free to self-assign it to 
> yourself and put up a pull request or patch, and I'll review.  I'd be very 
> happy to get more people involved in the docs.
> 
>  
> 
> On Thu, Feb 22, 2018 at 12:56 PM Kenneth Brotman
> <kenbrot...@yahoo.com.invalid> wrote:
> 
> That information would have saved me time too.  Thanks for making a JIRA for 
> it Jon.  Perhaps this is a good JIRA for me to begin with.
> 
>  
> 
> Kenneth Brotman 
> 
>  
> 
> From: Jon Haddad [mailto:jonathan.had...@gmail.com] On Behalf Of Jon Haddad
> Sent: Thursday, February 22, 2018 11:11 AM
> To: user
> Subject: Re: Initializing a multiple node cluster (multiple datacenters)
> 
>  
> 
> Great question.  Unfortunately, our OSS docs lack a step by step process on
> how to add a DC, I’ve created a JIRA to do that:
> https://issues.apache.org/jira/browse/CASSANDRA-14254
>
> The datastax docs are pretty good for this though:
> https://docs.datastax.com/en/cassandra/latest/cassandra/operations/opsAddDCToCluster.html
>
> Regarding token allocation, it was random prior to 3.0.  In 3.0 and up, it is
> calculated a little more intelligently.  In 3.11.2, which was just released,
> CASSANDRA-13080 was backported which will help out when you add your second
> DC.  If you go this route, you can drop your token count down to 16 and get
> all the benefits with no drawbacks.
> 
>  
> 
> At this point I would go straight to 3.11.2 and skip 3.0 as there were quite 
> a few improvements that make it worthwhile along the way, in my opinion.  We 
> work with several customers that are running 3.11 and are pretty happy with 
> it 
> 
>  
> 
> Yes, if there’s no data, you can initialize the cluster with auto_bootstrap:
> true.  Be sure to change any keyspaces using SimpleStrategy to NTS first,
> and replicate them to the new DC as well.
> 
>  
> 
> Jon
> 
>  
> 
>  
> 
> On Feb 22, 2018, at 10:53 AM, Jean Carlo <jean.jeancar...@gmail.com> wrote:
> 
>  
> 
> Hi jonathan
> 
>  
> 
> Thank you for the answer. Do you know where to look to understand why this
> works? As I understand it, all the nodes will then choose random tokens. How
> can I ensure the correctness of the ring?
>
> So, as you said, under the condition that there is no data in the cluster,
> I can initialize a multi-DC cluster without disabling auto_bootstrap?
> 
>  
> 
> On Feb 22, 2018 5:43 PM, "Jonathan Haddad" <j...@jonhaddad.com> wrote:
> 
> If it's a new cluster, there's no need to disable auto_bootstrap.  That 
> setting prevents the first node in the second DC from being a replica for all 
> the data in the first DC.  If there's no data in the first DC, 

Re: Initializing a multiple node cluster (multiple datacenters)

2018-02-22 Thread Jean Carlo
Hi Jonathan

Yes I do think this is a good idea about the doc.

About the clarification, is this still true for 2.1? We are planning to
upgrade to 3.1, but not in the next few months. We will stay on 2.1 for a
few more months.

I believe this is also true for 2.1, but I would like to confirm that I am
not missing something.


Saludos

Jean Carlo

"The best way to predict the future is to invent it" Alan Kay

On Thu, Feb 22, 2018 at 10:28 PM, Kenneth Brotman <
kenbrot...@yahoo.com.invalid> wrote:

> I will heavy lift the docs for a while, do my Slender Cassandra reference
> project and then I’ll try to find one or two areas where I can contribute
> code to get going on that.  I have read the section on contributing before
> I start.  I’ll self-assign the JIRA right now.
>
>
>
> Kenneth Brotman
>
>
>
> *From:* Jonathan Haddad [mailto:j...@jonhaddad.com]
> *Sent:* Thursday, February 22, 2018 1:21 PM
> *To:* user@cassandra.apache.org
> *Subject:* Re: Initializing a multiple node cluster (multiple datacenters)
>
>
>
> Kenneth, if you want to take the JIRA, feel free to self-assign it to
> yourself and put up a pull request or patch, and I'll review.  I'd be very
> happy to get more people involved in the docs.
>
>
>
> On Thu, Feb 22, 2018 at 12:56 PM Kenneth Brotman <
> kenbrot...@yahoo.com.invalid> wrote:
>
> That information would have saved me time too.  Thanks for making a JIRA
> for it Jon.  Perhaps this is a good JIRA for me to begin with.
>
>
>
> Kenneth Brotman
>
>
>
> *From:* Jon Haddad [mailto:jonathan.had...@gmail.com] *On Behalf Of *Jon
> Haddad
> *Sent:* Thursday, February 22, 2018 11:11 AM
> *To:* user
> *Subject:* Re: Initializing a multiple node cluster (multiple datacenters)
>
>
>
> Great question.  Unfortunately, our OSS docs lack a step by step process
> on how to add a DC, I’ve created a JIRA to do that:
> https://issues.apache.org/jira/browse/CASSANDRA-14254
>
>
>
> The datastax docs are pretty good for this though:
> https://docs.datastax.com/en/cassandra/latest/cassandra/operations/opsAddDCToCluster.html
>
>
>
> Regarding token allocation, it was random prior to 3.0.  In 3.0 and up, it
> is calculated a little more intelligently.  in 3.11.2, which was just
> released, CASSANDRA-13080 was backported which will help out when you add
> your second DC.  If you go this route, you can drop your token count down
> to 16 and get all the benefits with no drawbacks.
>
>
>
> At this point I would go straight to 3.11.2 and skip 3.0 as there were
> quite a few improvements that make it worthwhile along the way, in my
> opinion.  We work with several customers that are running 3.11 and are
> pretty happy with it
>
>
>
> Yes, if there’s no data, you can initialize the cluster with
> auto_bootstrap: true.  Be sure to change any keyspaces using SimpleStrategy
> to NTS first, and replicate them to the new DC as well.
>
>
>
> Jon
>
>
>
>
>
> On Feb 22, 2018, at 10:53 AM, Jean Carlo <jean.jeancar...@gmail.com>
> wrote:
>
>
>
> Hi jonathan
>
>
>
> Thank you for the answer. Do you know where to look to understand why this
> works? As I understand it, all the nodes will then choose random tokens. How
> can I ensure the correctness of the ring?
>
>
>
> So, as you said, under the condition that there is no data in the cluster,
> I can initialize a multi-DC cluster without disabling auto_bootstrap?
>
>
>
> On Feb 22, 2018 5:43 PM, "Jonathan Haddad" <j...@jonhaddad.com> wrote:
>
> If it's a new cluster, there's no need to disable auto_bootstrap.  That
> setting prevents the first node in the second DC from being a replica for
> all the data in the first DC.  If there's no data in the first DC, you can
> skip a couple steps and just leave it on.
>
>
>
> Leave it on, and enjoy your afternoon.
>
>
>
> Seeds don't bootstrap by the way, changing the setting on those nodes
> doesn't do anything.
>
>
>
> On Thu, Feb 22, 2018 at 8:36 AM Jean Carlo <jean.jeancar...@gmail.com>
> wrote:
>
> Hello
>
> I would like to clarify this,
>
>
>
> To initialize a Cassandra multi-DC cluster without data, I am following
> the DataStax documentation:
>
> https://docs.datastax.com/en/cassandra/2.1/cassandra/initialize/initializeMultipleDS.html
>
> It says
>
>    - auto_bootstrap: false (Add this setting *only* when initializing a
>      clean node with no data.)
>
> But I don't understand how this works with regard to auto_bootstrap.
>
> If all the machines generate their own tokens randomly using
> Murmur3Partitioner and vnodes, isn't it probable that two nodes will have
> tokens in common?
>
> Isn't it better to bootstrap the seeds first with auto_bootstrap: false
> and then the rest of the nodes with auto_bootstrap: true?
>
>
>
> Thank you for the help
>
>
>
> Jean Carlo
>
>
> "The best way to predict the future is to invent it" Alan Kay
>
>
>
>
>
>


RE: Initializing a multiple node cluster (multiple datacenters)

2018-02-22 Thread Kenneth Brotman
I will heavy lift the docs for a while, do my Slender Cassandra reference 
project and then I’ll try to find one or two areas where I can contribute code 
to get going on that.  I have read the section on contributing before I start.  
I’ll self-assign the JIRA right now.

 

Kenneth Brotman

 

From: Jonathan Haddad [mailto:j...@jonhaddad.com] 
Sent: Thursday, February 22, 2018 1:21 PM
To: user@cassandra.apache.org
Subject: Re: Initializing a multiple node cluster (multiple datacenters)

 

Kenneth, if you want to take the JIRA, feel free to self-assign it to yourself 
and put up a pull request or patch, and I'll review.  I'd be very happy to get 
more people involved in the docs.

 

On Thu, Feb 22, 2018 at 12:56 PM Kenneth Brotman <kenbrot...@yahoo.com.invalid> 
wrote:

That information would have saved me time too.  Thanks for making a JIRA for it 
Jon.  Perhaps this is a good JIRA for me to begin with.

 

Kenneth Brotman  

 

From: Jon Haddad [mailto:jonathan.had...@gmail.com] On Behalf Of Jon Haddad
Sent: Thursday, February 22, 2018 11:11 AM
To: user
Subject: Re: Initializing a multiple node cluster (multiple datacenters)

 

Great question.  Unfortunately, our OSS docs lack a step by step process on how 
to add a DC, I’ve created a JIRA to do that: 
https://issues.apache.org/jira/browse/CASSANDRA-14254

 

The datastax docs are pretty good for this though: 
https://docs.datastax.com/en/cassandra/latest/cassandra/operations/opsAddDCToCluster.html

 

Regarding token allocation, it was random prior to 3.0.  In 3.0 and up, it is 
calculated a little more intelligently.  in 3.11.2, which was just released, 
CASSANDRA-13080 was backported which will help out when you add your second DC. 
 If you go this route, you can drop your token count down to 16 and get all the 
benefits with no drawbacks.  

 

At this point I would go straight to 3.11.2 and skip 3.0 as there were quite a 
few improvements that make it worthwhile along the way, in my opinion.  We work 
with several customers that are running 3.11 and are pretty happy with it 

 

Yes, if there’s no data, you can initialize the cluster with auto_bootstrap: 
true.  Be sure to change any keyspaces using SimpleStrategy to NTS 
(NetworkTopologyStrategy) first, and replicate them to the new DC as well. 

 

Jon

 

 

On Feb 22, 2018, at 10:53 AM, Jean Carlo <jean.jeancar...@gmail.com> wrote:

 

Hi jonathan

 

Thank you for the answer. Do you know where I can look to understand why this 
works? As I understand it, every node will then choose random tokens. How can I 
ensure the correctness of the ring?

So, as you said, provided that there is no data in the cluster, I can 
initialize a multi-DC cluster without disabling auto bootstrap?

 

On Feb 22, 2018 5:43 PM, "Jonathan Haddad" <j...@jonhaddad.com> wrote:

If it's a new cluster, there's no need to disable auto_bootstrap.  That setting 
prevents the first node in the second DC from being a replica for all the data 
in the first DC.  If there's no data in the first DC, you can skip a couple 
steps and just leave it on.

 

Leave it on, and enjoy your afternoon.

 

Seeds don't bootstrap by the way, changing the setting on those nodes doesn't 
do anything.

 

On Thu, Feb 22, 2018 at 8:36 AM Jean Carlo <jean.jeancar...@gmail.com> wrote:

Hello

I would like to clarify this,

 

In order to initialize  a  cassandra multi dc cluster, without data. If I  
follow the documentation datastax




https://docs.datastax.com/en/cassandra/2.1/cassandra/initialize/initializeMultipleDS.html

It says

*   auto_bootstrap: false (Add this setting only when initializing a clean 
node with no data.) 

But I dont understand the way this works regarding to the auto_bootstraps. 

If all the machines make their own tokens in a ramdon way using 
murmur3partitioner and vnodes , it isn't probable that two nodes will have the 
tokens in common ?

It is not better to bootstrap first the seeds with auto_bootstrap: false and 
then the rest of the nodes with auto_bootstrap: true ?

 

Thank you for the help

 

Jean Carlo


"The best way to predict the future is to invent it" Alan Kay

 

 



Re: Initializing a multiple node cluster (multiple datacenters)

2018-02-22 Thread Jonathan Haddad
Kenneth, if you want to take the JIRA, feel free to self-assign it to
yourself and put up a pull request or patch, and I'll review.  I'd be very
happy to get more people involved in the docs.

On Thu, Feb 22, 2018 at 12:56 PM Kenneth Brotman
<kenbrot...@yahoo.com.invalid> wrote:

> That information would have saved me time too.  Thanks for making a JIRA
> for it Jon.  Perhaps this is a good JIRA for me to begin with.
>
>
>
> Kenneth Brotman
>
>
>
> *From:* Jon Haddad [mailto:jonathan.had...@gmail.com] *On Behalf Of *Jon
> Haddad
> *Sent:* Thursday, February 22, 2018 11:11 AM
> *To:* user
> *Subject:* Re: Initializing a multiple node cluster (multiple datacenters)
>
>
>
> Great question.  Unfortunately, our OSS docs lack a step by step process
> on how to add a DC, I’ve created a JIRA to do that:
> https://issues.apache.org/jira/browse/CASSANDRA-14254
>
>
>
> The datastax docs are pretty good for this though:
> https://docs.datastax.com/en/cassandra/latest/cassandra/operations/opsAddDCToCluster.html
>
>
>
> Regarding token allocation, it was random prior to 3.0.  In 3.0 and up, it
> is calculated a little more intelligently.  in 3.11.2, which was just
> released, CASSANDRA-13080 was backported which will help out when you add
> your second DC.  If you go this route, you can drop your token count down
> to 16 and get all the benefits with no drawbacks.
>
>
>
> At this point I would go straight to 3.11.2 and skip 3.0 as there were
> quite a few improvements that make it worthwhile along the way, in my
> opinion.  We work with several customers that are running 3.11 and are
> pretty happy with it.
>
>
>
> Yes, if there’s no data, you can initialize the cluster with
> auto_bootstrap: true.  Be sure to change any keyspaces using SimpleStrategy
> to NTS (NetworkTopologyStrategy) first, and replicate them to the new DC as
> well.
>
>
>
> Jon
>
>
>
>
>
> On Feb 22, 2018, at 10:53 AM, Jean Carlo <jean.jeancar...@gmail.com>
> wrote:
>
>
>
> Hi jonathan
>
>
>
> Thank you for the answer. Do you know where I can look to understand why
> this works? As I understand it, every node will then choose random tokens.
> How can I ensure the correctness of the ring?
>
>
>
> So, as you said, provided that there is no data in the cluster, I can
> initialize a multi-DC cluster without disabling auto bootstrap?
>
>
>
> On Feb 22, 2018 5:43 PM, "Jonathan Haddad" <j...@jonhaddad.com> wrote:
>
> If it's a new cluster, there's no need to disable auto_bootstrap.  That
> setting prevents the first node in the second DC from being a replica for
> all the data in the first DC.  If there's no data in the first DC, you can
> skip a couple steps and just leave it on.
>
>
>
> Leave it on, and enjoy your afternoon.
>
>
>
> Seeds don't bootstrap by the way, changing the setting on those nodes
> doesn't do anything.
>
>
>
> On Thu, Feb 22, 2018 at 8:36 AM Jean Carlo <jean.jeancar...@gmail.com>
> wrote:
>
> Hello
>
> I would like to clarify this,
>
>
>
> In order to initialize  a  cassandra multi dc cluster, without data. If I
> follow the documentation datastax
>
>
>
> https://docs.datastax.com/en/cassandra/2.1/cassandra/initialize/initializeMultipleDS.html
>
> It says
>
>    - auto_bootstrap: false (Add this setting *only* when initializing a
>    clean node with no data.)
>
> But I dont understand the way this works regarding to the auto_bootstraps.
>
> If all the machines make their own tokens in a ramdon way using
> murmur3partitioner and vnodes , it isn't probable that two nodes will have
> the tokens in common ?
>
> It is not better to bootstrap first the seeds with auto_bootstrap: false
> and then the rest of the nodes with auto_bootstrap: true ?
>
>
>
> Thank you for the help
>
>
>
> Jean Carlo
>
>
> "The best way to predict the future is to invent it" Alan Kay
>
>
>
>
>


RE: Initializing a multiple node cluster (multiple datacenters)

2018-02-22 Thread Kenneth Brotman
That information would have saved me time too.  Thanks for making a JIRA for it 
Jon.  Perhaps this is a good JIRA for me to begin with.

 

Kenneth Brotman  

 

From: Jon Haddad [mailto:jonathan.had...@gmail.com] On Behalf Of Jon Haddad
Sent: Thursday, February 22, 2018 11:11 AM
To: user
Subject: Re: Initializing a multiple node cluster (multiple datacenters)

 

Great question.  Unfortunately, our OSS docs lack a step by step process on how 
to add a DC, I’ve created a JIRA to do that: 
https://issues.apache.org/jira/browse/CASSANDRA-14254

 

The datastax docs are pretty good for this though: 
https://docs.datastax.com/en/cassandra/latest/cassandra/operations/opsAddDCToCluster.html

 

Regarding token allocation, it was random prior to 3.0.  In 3.0 and up, it is 
calculated a little more intelligently.  in 3.11.2, which was just released, 
CASSANDRA-13080 was backported which will help out when you add your second DC. 
 If you go this route, you can drop your token count down to 16 and get all the 
benefits with no drawbacks.  

 

At this point I would go straight to 3.11.2 and skip 3.0 as there were quite a 
few improvements that make it worthwhile along the way, in my opinion.  We work 
with several customers that are running 3.11 and are pretty happy with it. 

 

Yes, if there’s no data, you can initialize the cluster with auto_bootstrap: 
true.  Be sure to change any keyspaces using SimpleStrategy to NTS 
(NetworkTopologyStrategy) first, and replicate them to the new DC as well. 

 

Jon

 





On Feb 22, 2018, at 10:53 AM, Jean Carlo <jean.jeancar...@gmail.com> wrote:

 

Hi jonathan

 

Thank you for the answer. Do you know where I can look to understand why this 
works? As I understand it, every node will then choose random tokens. How can I 
ensure the correctness of the ring?

So, as you said, provided that there is no data in the cluster, I can 
initialize a multi-DC cluster without disabling auto bootstrap?

 

On Feb 22, 2018 5:43 PM, "Jonathan Haddad" <j...@jonhaddad.com> wrote:

If it's a new cluster, there's no need to disable auto_bootstrap.  That setting 
prevents the first node in the second DC from being a replica for all the data 
in the first DC.  If there's no data in the first DC, you can skip a couple 
steps and just leave it on.

 

Leave it on, and enjoy your afternoon.

 

Seeds don't bootstrap by the way, changing the setting on those nodes doesn't 
do anything.

 

On Thu, Feb 22, 2018 at 8:36 AM Jean Carlo <jean.jeancar...@gmail.com> wrote:

Hello

I would like to clarify this,

 

In order to initialize  a  cassandra multi dc cluster, without data. If I  
follow the documentation datastax




https://docs.datastax.com/en/cassandra/2.1/cassandra/initialize/initializeMultipleDS.html



It says

*   auto_bootstrap: false (Add this setting only when initializing a clean 
node with no data.) 

But I dont understand the way this works regarding to the auto_bootstraps. 

If all the machines make their own tokens in a ramdon way using 
murmur3partitioner and vnodes , it isn't probable that two nodes will have the 
tokens in common ?

It is not better to bootstrap first the seeds with auto_bootstrap: false and 
then the rest of the nodes with auto_bootstrap: true ?

 

Thank you for the help

 

Jean Carlo


"The best way to predict the future is to invent it" Alan Kay

 

 



Re: Initializing a multiple node cluster (multiple datacenters)

2018-02-22 Thread Jon Haddad
Great question.  Unfortunately, our OSS docs lack a step-by-step process for 
adding a DC.  I’ve created a JIRA to do that: 
https://issues.apache.org/jira/browse/CASSANDRA-14254


The datastax docs are pretty good for this though: 
https://docs.datastax.com/en/cassandra/latest/cassandra/operations/opsAddDCToCluster.html
 


Regarding token allocation, it was random prior to 3.0.  In 3.0 and up, it is 
calculated a little more intelligently.  In 3.11.2, which was just released, 
CASSANDRA-13080 was backported, which will help out when you add your second 
DC.  If you go this route, you can drop your token count down to 16 and get all 
the benefits with no drawbacks.
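
To illustrate why purely random token assignment needs many tokens per node to balance well, here is a minimal Python sketch. It is not Cassandra's allocation code: the helper names (random_ring, ownership), the 6-node cluster size and the fixed seed are assumptions made for the example. It only models primary-range ownership with randomly drawn Murmur3-style tokens.

```python
import random

MIN_TOKEN = -2 ** 63      # approximate lower bound of the Murmur3 token range
MAX_TOKEN = 2 ** 63 - 1
RING_SIZE = 2 ** 64


def random_ring(nodes, vnodes, seed=0):
    """Return a sorted list of (token, node_id) with random tokens per node."""
    rng = random.Random(seed)
    ring = [(rng.randint(MIN_TOKEN, MAX_TOKEN), node)
            for node in range(nodes)
            for _ in range(vnodes)]
    ring.sort()
    return ring


def ownership(ring, nodes):
    """Fraction of the token space each node owns as primary replica.

    A token owns the range from the previous token (exclusive) up to itself,
    wrapping around at the start of the ring. Assumes at least two distinct
    tokens in the ring.
    """
    owned = [0] * nodes
    for i, (token, node) in enumerate(ring):
        prev_token = ring[i - 1][0]          # i == 0 wraps to the last token
        owned[node] += (token - prev_token) % RING_SIZE
    return [o / RING_SIZE for o in owned]


if __name__ == "__main__":
    # Hypothetical 6-node cluster; compare the spread for 16 vs 256 tokens.
    for vnodes in (16, 256):
        shares = ownership(random_ring(nodes=6, vnodes=vnodes), nodes=6)
        print(vnodes, "tokens/node: min share", round(min(shares), 3),
              "max share", round(max(shares), 3))
```

Whatever tokens are drawn, the sorted list still forms a complete ring (the shares sum to 1); what varies is how evenly the ring is split, which is the part a smarter allocation is meant to improve.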

At this point I would go straight to 3.11.2 and skip 3.0 as there were quite a 
few improvements that make it worthwhile along the way, in my opinion.  We work 
with several customers that are running 3.11 and are pretty happy with it. 

Yes, if there’s no data, you can initialize the cluster with auto_bootstrap: 
true.  Be sure to change any keyspaces using SimpleStrategy to NTS 
(NetworkTopologyStrategy) first, and replicate them to the new DC as well. 
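
As a rough sketch of the keyspace change mentioned above, the following Python helper builds the CQL statement that switches a keyspace to NetworkTopologyStrategy with a replication factor per DC. The helper name, the keyspace name (my_ks) and the DC names (dc1, dc2) are hypothetical; the DC names you use must match what your snitch reports.

```python
def alter_keyspace_to_nts(keyspace: str, replication: dict) -> str:
    """Build an ALTER KEYSPACE statement using NetworkTopologyStrategy.

    `replication` maps datacenter name -> replication factor. The names are
    placeholders here and must match the DC names known to the cluster.
    """
    if not replication:
        raise ValueError("at least one datacenter is required")
    options = ", ".join(f"'{dc}': {rf}" for dc, rf in replication.items())
    return (f"ALTER KEYSPACE {keyspace} WITH replication = "
            f"{{'class': 'NetworkTopologyStrategy', {options}}};")


if __name__ == "__main__":
    # Hypothetical keyspace and DC names, for illustration only.
    print(alter_keyspace_to_nts("my_ks", {"dc1": 3, "dc2": 3}))
```

The generated statement can be run from cqlsh. On a cluster that already holds data, existing data is usually streamed to the new DC afterwards with nodetool rebuild; on an empty cluster there is nothing to stream.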

Jon


> On Feb 22, 2018, at 10:53 AM, Jean Carlo  wrote:
> 
> Hi jonathan
> 
> Thank you for the answer. Do you know where I can look to understand why 
> this works? As I understand it, every node will then choose random tokens. 
> How can I ensure the correctness of the ring?
> 
> So, as you said, provided that there is no data in the cluster, I can 
> initialize a multi-DC cluster without disabling auto bootstrap?
> 
> On Feb 22, 2018 5:43 PM, "Jonathan Haddad" wrote:
> If it's a new cluster, there's no need to disable auto_bootstrap.  That 
> setting prevents the first node in the second DC from being a replica for all 
> the data in the first DC.  If there's no data in the first DC, you can skip a 
> couple steps and just leave it on.
> 
> Leave it on, and enjoy your afternoon.
> 
> Seeds don't bootstrap by the way, changing the setting on those nodes doesn't 
> do anything.
> 
> On Thu, Feb 22, 2018 at 8:36 AM Jean Carlo wrote:
> Hello
> 
> I would like to clarify this,
> 
> In order to initialize  a  cassandra multi dc cluster, without data. If I  
> follow the documentation datastax
> 
> https://docs.datastax.com/en/cassandra/2.1/cassandra/initialize/initializeMultipleDS.html
>  
> 
> 
> 
> It says
> auto_bootstrap: false (Add this setting only when initializing a clean node 
> with no data.)
> But I dont understand the way this works regarding to the auto_bootstraps. 
> 
> If all the machines make their own tokens in a ramdon way using 
> murmur3partitioner and vnodes , it isn't probable that two nodes will have 
> the tokens in common ?
> It is not better to bootstrap first the seeds with auto_bootstrap: false and 
> then the rest of the nodes with auto_bootstrap: true ?
> 
> 
> Thank you for the help
> 
> Jean Carlo
> 
> "The best way to predict the future is to invent it" Alan Kay
> 



Re: Initializing a multiple node cluster (multiple datacenters)

2018-02-22 Thread Jean Carlo
Hi Jonathan

Thank you for the answer. Do you know where I can look to understand why
this works? As I understand it, every node will then choose random tokens.
How can I ensure the correctness of the ring?

So, as you said, provided that there is no data in the cluster, I can
initialize a multi-DC cluster without disabling auto bootstrap?

On Feb 22, 2018 5:43 PM, "Jonathan Haddad"  wrote:

If it's a new cluster, there's no need to disable auto_bootstrap.  That
setting prevents the first node in the second DC from being a replica for
all the data in the first DC.  If there's no data in the first DC, you can
skip a couple steps and just leave it on.

Leave it on, and enjoy your afternoon.

Seeds don't bootstrap by the way, changing the setting on those nodes
doesn't do anything.

On Thu, Feb 22, 2018 at 8:36 AM Jean Carlo 
wrote:

> Hello
>
> I would like to clarify this,
>
> In order to initialize  a  cassandra multi dc cluster, without data. If I
> follow the documentation datastax
>
> https://docs.datastax.com/en/cassandra/2.1/cassandra/initialize/initializeMultipleDS.html
>
>
> It says
>
>    - auto_bootstrap: false (Add this setting *only* when initializing a
>    clean node with no data.)
>
> But I dont understand the way this works regarding to the auto_bootstraps.
>
> If all the machines make their own tokens in a ramdon way using
> murmur3partitioner and vnodes , it isn't probable that two nodes will have
> the tokens in common ?
>
> It is not better to bootstrap first the seeds with auto_bootstrap: false
> and then the rest of the nodes with auto_bootstrap: true ?
>
> Thank you for the help
>
> Jean Carlo
>
> "The best way to predict the future is to invent it" Alan Kay
>


Re: Initializing a multiple node cluster (multiple datacenters)

2018-02-22 Thread Jonathan Haddad
If it's a new cluster, there's no need to disable auto_bootstrap.  That
setting prevents the first node in the second DC from being a replica for
all the data in the first DC.  If there's no data in the first DC, you can
skip a couple steps and just leave it on.

Leave it on, and enjoy your afternoon.

Seeds don't bootstrap, by the way; changing the setting on those nodes
doesn't do anything.
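
The point about seeds can be pictured with a small decision function. This is a simplified Python model, not Cassandra's actual code: the NodeState fields and the function name are assumptions chosen for illustration, and real startup logic covers more cases (for example, replacing a dead node).

```python
from dataclasses import dataclass


@dataclass
class NodeState:
    auto_bootstrap: bool        # value of auto_bootstrap in cassandra.yaml
    is_seed: bool               # the node's own address is in its seed list
    already_bootstrapped: bool  # the node has completed bootstrap before


def should_bootstrap(node: NodeState) -> bool:
    """Simplified model: a node streams data on startup only if bootstrap is
    enabled, it is not a seed, and it has not bootstrapped already."""
    return (node.auto_bootstrap
            and not node.is_seed
            and not node.already_bootstrapped)


if __name__ == "__main__":
    # A seed never bootstraps in this model, whatever the setting says.
    print(should_bootstrap(NodeState(True, True, False)))
    print(should_bootstrap(NodeState(False, True, False)))
    # A fresh non-seed node with the default setting does bootstrap.
    print(should_bootstrap(NodeState(True, False, False)))
```

In this model the auto_bootstrap value has no effect when is_seed is true, which is why changing it on seed nodes makes no difference.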

On Thu, Feb 22, 2018 at 8:36 AM Jean Carlo 
wrote:

> Hello
>
> I would like to clarify this,
>
> In order to initialize  a  cassandra multi dc cluster, without data. If I
> follow the documentation datastax
>
>
> https://docs.datastax.com/en/cassandra/2.1/cassandra/initialize/initializeMultipleDS.html
>
>
> It says
>
>    - auto_bootstrap: false (Add this setting *only* when initializing a
>    clean node with no data.)
>
> But I dont understand the way this works regarding to the auto_bootstraps.
>
> If all the machines make their own tokens in a ramdon way using
> murmur3partitioner and vnodes , it isn't probable that two nodes will have
> the tokens in common ?
>
> It is not better to bootstrap first the seeds with auto_bootstrap: false
> and then the rest of the nodes with auto_bootstrap: true ?
>
> Thank you for the help
>
> Jean Carlo
>
> "The best way to predict the future is to invent it" Alan Kay
>


Initializing a multiple node cluster (multiple datacenters)

2018-02-22 Thread Jean Carlo
Hello

I would like to clarify this,

I want to initialize a Cassandra multi-DC cluster without data, following
the DataStax documentation:

https://docs.datastax.com/en/cassandra/2.1/cassandra/initialize/initializeMultipleDS.html


It says

   - auto_bootstrap: false (Add this setting *only* when initializing a
   clean node with no data.)

But I don't understand how this works with regard to auto_bootstrap.

If all the machines generate their own tokens randomly using
Murmur3Partitioner and vnodes, isn't it likely that two nodes will end up
with tokens in common?
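
For a sense of scale on the collision question, here is a back-of-the-envelope Python calculation using the birthday bound. It assumes tokens are drawn uniformly at random from a space of about 2**64 values (the Murmur3Partitioner range); the function name and the cluster sizes are made up for the example.

```python
import math

TOKEN_SPACE = 2 ** 64  # approximate size of the Murmur3Partitioner token range


def collision_probability(nodes: int, vnodes_per_node: int) -> float:
    """Birthday-bound estimate that any two random tokens in the cluster coincide."""
    n = nodes * vnodes_per_node
    # P(collision) ~= 1 - exp(-n(n-1) / (2 * space)); expm1 keeps precision
    # when the exponent is tiny.
    return -math.expm1(-n * (n - 1) / (2 * TOKEN_SPACE))


if __name__ == "__main__":
    # Hypothetical cluster sizes, 256 vnodes each.
    for nodes in (6, 100, 1000):
        p = collision_probability(nodes, 256)
        print(f"{nodes} nodes x 256 vnodes: P(collision) ~ {p:.3e}")
```

Under these assumptions the estimate stays far below one in a million even for a thousand nodes, so a shared token is possible in principle but extremely unlikely.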

Isn't it better to start the seeds first with auto_bootstrap: false
and then the rest of the nodes with auto_bootstrap: true?

Thank you for the help

Jean Carlo

"The best way to predict the future is to invent it" Alan Kay