Re: node replacement failed

2018-09-22 Thread kurt greaves
I don't like your cunning plan. Don't drop the system_auth and system_distributed
keyspaces; instead, just change them to NetworkTopologyStrategy (NTS) and then do
your replacement for each down node.

 If you're actually using auth and are worried about consistency, I believe 3.11
lets you restrict a repair to specific live hosts (effectively excluding the down
ones), which you could use to repair just the auth keyspace.
But if you're not using auth, go ahead and change them; then doing all your
replaces is the best method of recovery here.
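Something along these lines (untested sketch; 'dc1' and the IPs are placeholders for your own DC name and live nodes):

    # switch the auth/distributed keyspaces to NTS (adjust DC name and RF to your cluster)
    cqlsh -e "ALTER KEYSPACE system_auth WITH replication = {'class': 'NetworkTopologyStrategy', 'dc1': 3};"
    cqlsh -e "ALTER KEYSPACE system_distributed WITH replication = {'class': 'NetworkTopologyStrategy', 'dc1': 3};"

    # if you do use auth, repair just that keyspace, restricting the repair to hosts that are up
    # (the node you run this from should be in the list)
    nodetool repair -hosts 10.0.0.1 -hosts 10.0.0.2 system_auth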

On Sun., 23 Sep. 2018, 00:33 onmstester onmstester, 
wrote:

> Another question:
> Is there a management tool to run nodetool cleanup one node at a time (wait
> until cleanup finishes on one node before starting it on the next node in
> the cluster)?
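One low-tech way to do that is a script that runs cleanup over SSH, one node at a time (untested sketch; it assumes SSH access and a hosts.txt with one node IP per line):

    # run-cleanup.sh -- hosts.txt lists one node IP per line
    while read -r host; do
        echo "cleaning up ${host}"
        # ssh blocks until 'nodetool cleanup' returns, so nodes are cleaned strictly one at a time
        ssh "${host}" nodetool cleanup || { echo "cleanup failed on ${host}" >&2; exit 1; }
    done < hosts.txt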
>  On Sat, 22 Sep 2018 16:02:17 +0330, onmstester onmstester wrote:
>
> I have a cunning plan (Baldrick wise) to solve this problem:
>
>- stop client application
>- run nodetool flush on all nodes to save memtables to disk
>- stop cassandra on all of the nodes
>- rename original Cassandra data directory to data-old
>- start cassandra on all the nodes to create a fresh cluster including
>the old dead nodes
>- again create the application-related keyspaces in cqlsh, and this
>time set RF=2 on the system keyspaces (to never encounter this problem again!)
>- move the sstables from the data-old dir back to the current data dirs and
>restart cassandra, or reload the sstables
>
>
> Should this work and solve my problem?
>
>
>  On Mon, 10 Sep 2018 17:12:48 +0430, onmstester onmstester wrote:
>
>
>
> Thanks Alain,
> First, here is more detail about my cluster:
>
>- 10 racks, with 3 nodes on each rack
>- nodetool status shows 27 nodes UN, and 3 nodes (all in a single
>rack) as DN
>- version 3.11.2
>
> *Option 1: (Change schema and) use replace method (preferred method)*
> * Did you try to let the replace go ahead, without any prior repairs,
> ignoring the fact that 'system_traces' might be inconsistent? You probably
> don't care about this table, so if Cassandra allows it with some of the
> nodes down, going this way is probably relatively safe. I really do not see
> what you could lose that matters in this table.
> * Another option, if the first schema change was accepted, is to make a
> second one to drop this table. You can always rebuild it if you need it, I
> assume.
>
> I would really love to let the replace go ahead, but it stops with this error:
>
> java.lang.IllegalStateException: unable to find sufficient sources for
> streaming range in keyspace system_traces
>
>
> Also, I could delete system_traces, which is empty anyway, but there are
> system_auth and system_distributed keyspaces too, and they are not empty.
> Could I delete them safely as well?
> If I could just somehow skip streaming the system keyspaces during the node
> replace phase, option 1 would be great.
>
> P.S.: It's clear to me that I should use at least RF=3 in production, but I
> have not managed to acquire enough resources yet (I hope this will be fixed
> in the near future).
>
> Again, thank you for your time.
>
>
>
>  On Mon, 10 Sep 2018 16:20:10 +0430, Alain RODRIGUEZ wrote:
>
>
>
> Hello,
>
> I am sorry it took us (the community) more than a day to respond to this
> rather critical situation. That being said, my recommendation at this point
> would be for you to make sure you understand the impact of whatever you
> try. Working on a broken cluster as an emergency might lead you to a second
> mistake, possibly more destructive than the first one. It has happened to
> me, and to others, on many clusters. As general advice, move forward even
> more carefully in these situations.
>
> Suddenly I lost all disks holding Cassandra data on one of my racks
>
>
> With RF=2, I guess operations use LOCAL_ONE consistency, so with your
> configuration you should still have all the data in the safe rack(s). You
> probably have not lost anything yet, and the service is only using the
> nodes that are up, which hold the right data.
>
>  I tried to replace the nodes with the same IP using this:
>
> https://blog.alteroot.org/articles/2014-03-12/replace-a-dead-node-in-cassandra.html
>
>
> As a side note, I would recommend you use 'replace_address_first_boot'
> instead of 'replace_address'. It does basically the same thing, but it will
> be ignored after the first bootstrap. A detail, but hey, it's there and it
> is somewhat safer, so I would use that one.
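For reference, both are JVM system properties set on the replacement node before it first starts; a sketch, with 10.0.0.5 standing in for the dead node's IP:

    # cassandra-env.sh on the replacement node, before its first start
    JVM_OPTS="$JVM_OPTS -Dcassandra.replace_address_first_boot=10.0.0.5"

    # the older form, which you must remember to remove once the node has bootstrapped
    # JVM_OPTS="$JVM_OPTS -Dcassandra.replace_address=10.0.0.5"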
>
> java.lang.IllegalStateException: unable to find sufficient sources for
> streaming range in keyspace system_traces
>
>
> By default, non-user keyspaces use 'SimpleStrategy' and a small RF.
> Ideally, this should be changed in a production cluster, and you are seeing
> an example of why.
>
> Now I altered the system_traces keyspace strategy to
> NetworkTopologyStrategy and RF=2,
> but then running nodetool repair failed: Endpoint not alive: /IP of the
> dead node that I'm trying to replace.
>
>
> Changing the replication strategy, you made the dead rack an owner of part
> of the token ranges, thus repairs just can't work: one of the nodes
> involved will always be down while the whole rack is down. Repair won't
> work, but you probably do not need it! 'system_traces' is a temporary /
> debug keyspace. It's probably empty or holds irrelevant data. Here are some
> thoughts:
> * It would be awesome at this point for us (and for you, if you have not
> done it) to see the status of the cluster:
> ** 'nodetool status'
> ** 'nodetool describecluster' --> This one will tell if the nodes (that are
> up) agree on the schema. I have seen schema changes with nodes down
> inducing some issues.
> ** Cassandra version
> ** Number of racks (I assume #racks >= 2 in this email)

Re: stuck with num_tokens 256

2018-09-22 Thread kurt greaves
No, that's not true.

On Sat., 22 Sep. 2018, 21:58 onmstester onmstester, 
wrote:

>
> If you have problems with balance you can add new nodes using the
> algorithm and it'll balance out the cluster. You probably want to stick to
> 256 tokens though.
>
>
> I read somewhere (I don't remember the reference) that all nodes of the
> cluster should use the same algorithm, so if my cluster suffers from
> imbalanced nodes using the random algorithm, I cannot add new nodes that
> use the allocation algorithm. Isn't that correct?
>
>
>


Re: stuck with num_tokens 256

2018-09-22 Thread kurt greaves
>
> But one more question: should I use num_tokens: 8 (I would follow the
> DataStax recommendation) and allocate_tokens_for_local_replication_factor=3
> (which is the max RF among my keyspaces) for new clusters which I'm going
> to set up?

16 is probably where it's at. Test beforehand though.
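For a new cluster that would look roughly like this in cassandra.yaml (sketch only; allocate_tokens_for_local_replication_factor exists from 4.0, while on 3.11 the equivalent knob is allocate_tokens_for_keyspace, and 'my_ks' is a placeholder):

    # cassandra.yaml on the new nodes
    num_tokens: 16

    # 4.0+: allocate tokens for a target replication factor
    allocate_tokens_for_local_replication_factor: 3

    # 3.x equivalent: allocate against a keyspace that already has the target RF
    # allocate_tokens_for_keyspace: my_ks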

> Is the allocation algorithm now the recommended algorithm, and is it mature
> enough to replace the random algorithm? If so, should it be the default in
> 4.0?

Let's leave that discussion to the other thread on the dev list.

On Sat, 22 Sep 2018 at 20:35, onmstester onmstester 
wrote:

> Thanks,
> Because all my clusters are already balanced, I won't change their config.
> But one more question: should I use num_tokens: 8 (I would follow the
> DataStax recommendation) and allocate_tokens_for_local_replication_factor=3
> (which is the max RF among my keyspaces) for new clusters which I'm going
> to set up?
> Is the allocation algorithm now the recommended algorithm, and is it mature
> enough to replace the random algorithm? If so, should it be the default in
> 4.0?
>
>
>  On Sat, 22 Sep 2018 13:41:47 +0330, kurt greaves wrote:
>
> If you have problems with balance you can add new nodes using the
> algorithm and it'll balance out the cluster. You probably want to stick to
> 256 tokens though.
> To reduce your # tokens you'll have to do a DC migration (best way). Spin
> up a new DC using the algorithm on the nodes and set a lower number of
> tokens. You'll want to test first, but if you create a new keyspace for the
> new DC prior to creating the new nodes, with the desired RF (i.e. a
> keyspace just in the "new" DC with your RF), and then add your nodes using
> that keyspace for allocation, tokens *should* be distributed evenly amongst
> that DC. Once you migrate, you can decommission the old DC and hopefully
> end up with a balanced cluster.
> Definitely test beforehand though because that was just me theorising...
>
> I'll note though that if your existing clusters don't have any major
> issues it's probably not worth the migration at this point.
>
> On Sat, 22 Sep 2018 at 17:40, onmstester onmstester 
> wrote:
>
>
> I noticed that there is currently a discussion on the ML with the
> subject: changing default token behavior for 4.0.
> Any recommendation for people like me who already have multiple clusters (>
> 30 nodes in each cluster) with the random partitioner and num_tokens = 256?
> I should also add some nodes to existing clusters; is that possible
> with num_tokens = 256?
> How could we fix this (reduce num_tokens in existing clusters)?
> Cassandra version: 3.11.2
>
>
>
>
>


Re: stuck with num_tokens 256

2018-09-22 Thread kurt greaves
If you have problems with balance you can add new nodes using the algorithm
and it'll balance out the cluster. You probably want to stick to 256 tokens
though.
To reduce your # tokens you'll have to do a DC migration (best way). Spin
up a new DC using the algorithm on the nodes and set a lower number of
tokens. You'll want to test first, but if you create a new keyspace for the
new DC prior to creating the new nodes, with the desired RF (i.e. a keyspace
just in the "new" DC with your RF), and then add your nodes using that
keyspace for allocation, tokens *should* be distributed evenly amongst that
DC. Once you migrate, you can decommission the old DC and hopefully end up
with a balanced cluster.
Definitely test beforehand though because that was just me theorising...
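Roughly, that sequence would look like the following, filling in the usual rebuild/alter steps (again untested; dc_old, dc_new, newdc_ks, my_ks and the RF values are placeholders):

    # 1. cassandra.yaml on each new-DC node, before its first start:
    #      num_tokens: 16
    #      allocate_tokens_for_keyspace: newdc_ks   # 3.x; 4.0+ can use the RF-based option instead
    # 2. create a keyspace replicated only in the new DC so token allocation has a target
    cqlsh -e "CREATE KEYSPACE newdc_ks WITH replication = {'class': 'NetworkTopologyStrategy', 'dc_new': 3};"
    # 3. bootstrap the new-DC nodes, extend the real keyspaces to the new DC, and stream the data
    cqlsh -e "ALTER KEYSPACE my_ks WITH replication = {'class': 'NetworkTopologyStrategy', 'dc_old': 3, 'dc_new': 3};"
    nodetool rebuild -- dc_old        # on each new-DC node
    # 4. point clients at the new DC, drop the old DC from replication, then decommission it
    cqlsh -e "ALTER KEYSPACE my_ks WITH replication = {'class': 'NetworkTopologyStrategy', 'dc_new': 3};"
    nodetool decommission             # on each old-DC node, one at a time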

I'll note though that if your existing clusters don't have any major issues
it's probably not worth the migration at this point.

On Sat, 22 Sep 2018 at 17:40, onmstester onmstester 
wrote:

> I noticed that there is currently a discussion on the ML with the
> subject: changing default token behavior for 4.0.
> Any recommendation for people like me who already have multiple clusters (>
> 30 nodes in each cluster) with the random partitioner and num_tokens = 256?
> I should also add some nodes to existing clusters; is that possible
> with num_tokens = 256?
> How could we fix this (reduce num_tokens in existing clusters)?
> Cassandra version: 3.11.2
>
>
>
>

