Re: Unreachable node, not in nodetool ring
Hi,

I finally removed the ghost node successfully, using unsafeAssassinateEndpoint() as described here: http://tumblr.doki-pen.org/post/22654515359/assinating-cassandra-nodes. I hope this can help more people.

nodetool gossipinfo now gives me the following info for the ghost node:

    /10.56.62.211
      RELEASE_VERSION:1.1.2
      RPC_ADDRESS:0.0.0.0
      REMOVAL_COORDINATOR:REMOVER,85070591730234615865843651857942052864
      SCHEMA:59adb24e-f3cd-3e02-97f0-5b395827453f
      STATUS:LEFT,42529904547457370790386101505459979624,1344611213445
      LOAD:11594.0
      DC:eu-west
      RACK:1b

Instead of:

    /10.56.62.211
      RELEASE_VERSION:1.1.2
      LOAD:11594.0
      RACK:1b
      SCHEMA:59adb24e-f3cd-3e02-97f0-5b395827453f
      DC:eu-west
      REMOVAL_COORDINATOR:REMOVER,85070591730234615865843651857942052864
      STATUS:removed,170141183460469231731687303715884105727,1342453967415
      RPC_ADDRESS:0.0.0.0

In cassandra-cli, describe cluster no longer shows any unreachable node.

The only issue that remains is that my nodes aren't well load balanced yet. After repairing, cleaning up and restarting all nodes, I still have the following ring:

    Address         DC       Rack  Status  State   Load       Owns    Token
                                                                      85070591730234615865843651857942052864
    10.59.21.241    eu-west  1b    Up      Normal  103.19 GB  50.00%  0
    10.58.83.109    eu-west  1b    Up      Normal  62.62 GB   50.00%  85070591730234615865843651857942052864

Any idea why I can't get the load well balanced in this cluster?

Alain
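As a rough sanity check on the ring above, the tokens can be compared with evenly spaced ones. This is only a sketch: it assumes RandomPartitioner's token space runs from 0 to 2**127 - 1, and the helper names (`balanced_tokens`, `ownership`) are illustrative, not part of nodetool or any Cassandra API.

```python
# Sketch only: assumes the RandomPartitioner token space is 0 .. 2**127 - 1.
RING_SIZE = 2 ** 127


def balanced_tokens(node_count):
    """Return node_count evenly spaced tokens, starting at token 0."""
    return [i * RING_SIZE // node_count for i in range(node_count)]


def ownership(tokens):
    """Map each token to the fraction of the ring between it and the previous token."""
    ordered = sorted(tokens)
    if len(ordered) == 1:
        return {ordered[0]: 1.0}
    shares = {}
    for index, token in enumerate(ordered):
        previous = ordered[index - 1]  # index -1 wraps around to the last token
        shares[token] = ((token - previous) % RING_SIZE) / RING_SIZE
    return shares


# Tokens taken from the nodetool ring output in the message above.
ring_tokens = [0, 85070591730234615865843651857942052864]

print(balanced_tokens(2) == ring_tokens)
print(ownership(ring_tokens))
# "Token 0 - 1" wraps around to the top of the ring.
print((0 - 1) % RING_SIZE)
```

If the tokens already match the evenly spaced ones, as the 50.00% Owns column suggests, the difference in Load probably has a cause other than token placement.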
Re: Unreachable node, not in nodetool ring
Hi again,

Nobody has a clue about this issue? I'm still facing this problem.

Alain
Re: Unreachable node, not in nodetool ring
Nope. My last ideas would be the following (and I am not sure these are the best):

- Try removetoken with the -f option. I do not believe it will change anything, but...
- Run nodetool ring on ALL nodes and check whether every node sees the unreachable node. If not, you could maybe just decommission the one(s) that see the unreachable node.
- If you are in test, you can delete the system folder (a subfolder of the directory where all your data are saved: data_file_directories in cassandra.yaml, by default /var/lib/cassandra/data). *But you will lose everything.*
- Snapshot the data and restore them in another cluster. This is not that simple, depending on data volume, traffic, etc.

From my side, I do not have more ideas... and once again, I am not sure these are the best ;)

I do not know whether Cassandra is able to definitively consider a node as dead after a certain amount of time.
Re: Unreachable node, not in nodetool ring
Does anyone know how to totally remove a dead node that only appears when running describe cluster from the CLI? I still have this issue in my production cluster.

Alain
Re: Unreachable node, not in nodetool ring
I would:

* run repair on 10.58.83.109
* run cleanup on 10.59.21.241 (I assume this was the first node).

It looks like 10.56.62.211 is out of the cluster.

Cheers

Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com
Re: Unreachable node, not in nodetool ring
Hi Aaron,

I have already repaired and cleaned up both nodes, and I did it after every change on my ring (it took me a while, btw :)).

The node *.211 is actually out of the ring and out of my control, because I don't have the server anymore (the EC2 instance was terminated a few days ago).

Alain
Re: Unreachable node, not in nodetool ring
I got that a couple of times (due to DNS issues in our infra).

What you could try:

- Check the Cassandra log. You may see a log line telling you that 10.56.62.211 and 10.59.21.241 or 10.58.83.109 share the same token.
- If 10.56.62.211 is up, try decommission (via nodetool).
- If not, move 10.59.21.241 or 10.58.83.109 to the current token + 1.
- Use removetoken (via nodetool) to remove the token associated with 10.56.62.211. In case of failure, you can use removetoken -f instead.

Then the unreachable IP should have disappeared.

HTH

On Thu, Jul 19, 2012 at 10:38 AM, Alain RODRIGUEZ arodr...@gmail.com wrote:

Hi,

I tried to add a node a few days ago and it failed. I finally made it work with another node, but now when I run describe cluster in the CLI I get this:

    Cluster Information:
       Snitch: org.apache.cassandra.locator.Ec2Snitch
       Partitioner: org.apache.cassandra.dht.RandomPartitioner
       Schema versions:
          UNREACHABLE: [10.56.62.211]
          e7e0ec6c-616e-32e7-ae29-40eae2b82ca8: [10.59.21.241, 10.58.83.109]

And nodetool ring gives me:

    Address         DC       Rack  Status  State   Load       Owns    Token
                                                                      85070591730234615865843651857942052864
    10.59.21.241    eu-west  1b    Up      Normal  101.17 GB  50.00%  0
    10.58.83.109    eu-west  1b    Up      Normal  55.27 GB   50.00%  85070591730234615865843651857942052864

The point, as you can see, is that one of my nodes holds about twice as much data as the other. I have RF = 2 defined.

My guess is that the token 0 node keeps data for the unreachable node.

The IP of the unreachable node doesn't belong to me anymore; I have no access to this ghost node.

Does someone know how to completely remove this ghost node from my cluster?

Thank you.

Alain

INFO:

On Ubuntu (AMI Datastax 2.1 and 2.2)
Cassandra 1.1.2 (upgraded from 1.0.9)
2 node cluster (+ the ghost one)
RF = 2

--
Olivier Mallassi
OCTO Technology
50, Avenue des Champs-Elysées
75008 Paris
Mobile: (33) 6 28 70 26 61
Tél: (33) 1 58 56 10 00
Fax: (33) 1 58 56 10 01
http://www.octo.com
Octo Talks! http://blog.octo.com
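To spot a leftover node without reading the output by eye, the Schema versions section of describe cluster could be parsed. The following is a minimal sketch, assuming the CLI prints one "version: [hosts]" entry per line (the archive shows the output flattened); `schema_versions` and `SAMPLE_OUTPUT` are illustrative names, not part of cassandra-cli.

```python
import re

# One schema entry per line, e.g. "UNREACHABLE: [10.56.62.211]" (assumed layout).
SCHEMA_LINE = re.compile(r"^\s*(?P<version>[\w-]+):\s*\[(?P<hosts>[^\]]*)\]\s*$")


def schema_versions(describe_output):
    """Map each schema version (or the UNREACHABLE marker) to its list of hosts."""
    versions = {}
    for line in describe_output.splitlines():
        match = SCHEMA_LINE.match(line)
        if match:
            hosts = [h.strip() for h in match.group("hosts").split(",") if h.strip()]
            versions[match.group("version")] = hosts
    return versions


# Output quoted in the thread, laid out one entry per line.
SAMPLE_OUTPUT = """\
Cluster Information:
   Snitch: org.apache.cassandra.locator.Ec2Snitch
   Partitioner: org.apache.cassandra.dht.RandomPartitioner
   Schema versions:
      UNREACHABLE: [10.56.62.211]
      e7e0ec6c-616e-32e7-ae29-40eae2b82ca8: [10.59.21.241, 10.58.83.109]
"""

versions = schema_versions(SAMPLE_OUTPUT)
print(versions.get("UNREACHABLE", []))
```

An empty UNREACHABLE list and a single remaining schema version would suggest that every live node agrees and no ghost node is left.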
Re: Unreachable node, not in nodetool ring
Hi,

I wasn't able to see the token currently used by 10.56.62.211 (the ghost node).

I already removed the token 6 days ago:

    - Removing token 170141183460469231731687303715884105727 for /10.56.62.211

"- check in cassandra log. It is possible you see a log line telling you 10.56.62.211 and 10.59.21.241 or 10.58.83.109 share the same token"

Nothing like that in the logs.

I tried the following without success:

    $ nodetool -h localhost removetoken 170141183460469231731687303715884105727
    Exception in thread "main" java.lang.UnsupportedOperationException: Token not found.
    ...

I really thought this was going to work :-). Any other ideas?

Alain

PS: I heard that Octo is a nice company, and you use Cassandra, so I guess you're fine there :-). I wish you the best, and thanks for your help.
Re: Unreachable node, not in nodetool ring
Not sure if this may help:

    nodetool -h localhost gossipinfo

    /10.58.83.109
      RELEASE_VERSION:1.1.2
      RACK:1b
      LOAD:5.9384978406E10
      SCHEMA:e7e0ec6c-616e-32e7-ae29-40eae2b82ca8
      DC:eu-west
      STATUS:NORMAL,85070591730234615865843651857942052864
      RPC_ADDRESS:0.0.0.0
    /10.248.10.94
      RELEASE_VERSION:1.1.2
      LOAD:3.0128207422E10
      SCHEMA:e7e0ec6c-616e-32e7-ae29-40eae2b82ca8
      STATUS:LEFT,0,1342866804032
      RPC_ADDRESS:0.0.0.0
    /10.56.62.211
      RELEASE_VERSION:1.1.2
      LOAD:11594.0
      RACK:1b
      SCHEMA:59adb24e-f3cd-3e02-97f0-5b395827453f
      DC:eu-west
      REMOVAL_COORDINATOR:REMOVER,85070591730234615865843651857942052864
      STATUS:removed,170141183460469231731687303715884105727,1342453967415
      RPC_ADDRESS:0.0.0.0
    /10.59.21.241
      RELEASE_VERSION:1.1.2
      RACK:1b
      LOAD:1.08667047094E11
      SCHEMA:e7e0ec6c-616e-32e7-ae29-40eae2b82ca8
      DC:eu-west
      STATUS:NORMAL,0
      RPC_ADDRESS:0.0.0.0

Story:

I had a 2-node cluster:

    10.248.10.94    Token 0
    10.59.21.241    Token 85070591730234615865843651857942052864

I had to replace node 10.248.10.94, so I added 10.56.62.211 on token 0 - 1 (170141183460469231731687303715884105727). This failed, and I removed the token.

I repeated the previous operation with the node 10.59.21.241 and it went fine.

Next I decommissioned the node 10.248.10.94 and moved 10.59.21.241 to token 0.

Now I am in the situation described before.

Alain
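The gossipinfo dump above can be scanned for endpoints whose STATUS is not NORMAL, which is how the ghost node stands out. This is a hedged sketch: it assumes each endpoint starts on a line beginning with "/" followed by one KEY:value pair per line, and `parse_gossipinfo` and `non_normal` are hypothetical helper names rather than anything nodetool provides.

```python
def parse_gossipinfo(text):
    """Parse gossipinfo-style text into {address: {KEY: value}} (assumed layout)."""
    endpoints = {}
    current = None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue
        if line.startswith("/"):
            # A new endpoint section, e.g. "/10.56.62.211".
            current = line[1:]
            endpoints[current] = {}
        elif current is not None and ":" in line:
            key, value = line.split(":", 1)
            endpoints[current][key] = value
    return endpoints


def non_normal(endpoints):
    """Return {address: STATUS} for endpoints whose STATUS does not start with NORMAL."""
    return {
        address: state.get("STATUS", "")
        for address, state in endpoints.items()
        if not state.get("STATUS", "").startswith("NORMAL")
    }


# Abridged from the gossipinfo output in the thread (only STATUS lines kept).
SAMPLE = """\
/10.58.83.109
  STATUS:NORMAL,85070591730234615865843651857942052864
/10.248.10.94
  STATUS:LEFT,0,1342866804032
/10.56.62.211
  STATUS:removed,170141183460469231731687303715884105727,1342453967415
/10.59.21.241
  STATUS:NORMAL,0
"""

suspects = non_normal(parse_gossipinfo(SAMPLE))
for address, status in sorted(suspects.items()):
    print(address, status)
```

In this thread the decommissioned node (LEFT) and the ghost node (removed) are the two endpoints such a scan would flag.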