Nope. My last ideas would be (and I am not sure these are the best...):

- Try removetoken with the -f option. I do not believe it will change anything, but...
- Try nodetool ring on ALL nodes and check that all nodes see the unreachable node. If not, you could maybe just decommission the one(s) that see the unreachable node.
- If you are in test, you can delete the system folder (a subfolder of where all your data are saved: data_file_directories in cassandra.yaml, by default /var/lib/cassandra/data). *But you will lose everything*...
- Snapshot the data and restore it in another cluster. Not that simple, depending on data volume, traffic, etc.
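The "delete the system folder" option above is destructive, so it is worth rehearsing against a throwaway directory first. The layout below is made up for illustration; on a real (stopped!) test node the root would be your data_file_directories path, by default /var/lib/cassandra/data:

```shell
#!/bin/sh
# Rehearsal only: build a fake data directory layout, then wipe the
# system keyspace the way you would on a stopped *test* node.
data_root=$(mktemp -d)
mkdir -p "$data_root/system" "$data_root/my_keyspace"   # hypothetical layout

# Deleting system/ erases the node's cluster membership state (tokens,
# schema, gossip history); the node will bootstrap as brand new.
# Never run this against a production data directory.
rm -rf "$data_root/system"

ls "$data_root"   # only my_keyspace remains
```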
From my side, I do not have more ideas... and once again, I am not sure these are the best ;) I do not know if Cassandra is able to definitively consider a node as dead after a certain amount of time.

On Fri, Jul 27, 2012 at 11:04 AM, Alain RODRIGUEZ <arodr...@gmail.com> wrote:

> Hi again,
>
> Nobody has a clue about this issue?
>
> I'm still facing this problem.
>
> Alain
>
> 2012/7/23 Alain RODRIGUEZ <arodr...@gmail.com>:
> > Does anyone know how to totally remove a dead node that only appears
> > when doing a "describe cluster" from the cli?
> >
> > I still have this issue in my production cluster.
> >
> > Alain
> >
> > 2012/7/20 Alain RODRIGUEZ <arodr...@gmail.com>:
> >> Hi Aaron,
> >>
> >> I have repaired and cleaned up both nodes already, and I did it after
> >> any change on my ring (it took me a while, btw :)).
> >>
> >> The node *.211 is actually out of the ring and out of my control
> >> because I don't have the server anymore (the EC2 instance was
> >> terminated a few days ago).
> >>
> >> Alain
> >>
> >> 2012/7/20 aaron morton <aa...@thelastpickle.com>:
> >>> I would:
> >>>
> >>> * run repair on 10.58.83.109
> >>> * run cleanup on 10.59.21.241 (I assume this was the first node).
> >>>
> >>> It looks like 10.56.62.211 is out of the cluster.
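As an aside, the per-endpoint STATUS lines buried in a long nodetool gossipinfo dump (like the one quoted further down in this thread) can be summarized mechanically. A rough sketch over a hard-coded, abridged sample of the thread's own output; on a live node you would pipe the real nodetool output into the awk instead:

```shell
#!/bin/sh
# Summarize per-endpoint STATUS from "nodetool gossipinfo" output.
# The sample below is an abridged copy of the dump quoted in this
# thread; on a live node replace the printf with:
#   nodetool -h localhost gossipinfo
gossip='/10.248.10.94
STATUS:LEFT,0,1342866804032
/10.56.62.211
STATUS:removed,170141183460469231731687303715884105727,1342453967415
/10.59.21.241
STATUS:NORMAL,0'

status_report=$(printf '%s\n' "$gossip" | awk '
  /^\//      { host = substr($0, 2) }                   # endpoint lines: /10.x.x.x
  /^STATUS:/ { split($0, f, "[:,]"); print host, f[2] } # STATUS:state,token,...
')
printf '%s\n' "$status_report"
```

Here the ghost node shows up as "10.56.62.211 removed" while healthy nodes report NORMAL.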
> >>>
> >>> Cheers
> >>>
> >>> -----------------
> >>> Aaron Morton
> >>> Freelance Developer
> >>> @aaronmorton
> >>> http://www.thelastpickle.com
> >>>
> >>> On 19/07/2012, at 9:37 PM, Alain RODRIGUEZ wrote:
> >>>
> >>> Not sure if this may help:
> >>>
> >>> nodetool -h localhost gossipinfo
> >>> /10.58.83.109
> >>> RELEASE_VERSION:1.1.2
> >>> RACK:1b
> >>> LOAD:5.9384978406E10
> >>> SCHEMA:e7e0ec6c-616e-32e7-ae29-40eae2b82ca8
> >>> DC:eu-west
> >>> STATUS:NORMAL,85070591730234615865843651857942052864
> >>> RPC_ADDRESS:0.0.0.0
> >>> /10.248.10.94
> >>> RELEASE_VERSION:1.1.2
> >>> LOAD:3.0128207422E10
> >>> SCHEMA:e7e0ec6c-616e-32e7-ae29-40eae2b82ca8
> >>> STATUS:LEFT,0,1342866804032
> >>> RPC_ADDRESS:0.0.0.0
> >>> /10.56.62.211
> >>> RELEASE_VERSION:1.1.2
> >>> LOAD:11594.0
> >>> RACK:1b
> >>> SCHEMA:59adb24e-f3cd-3e02-97f0-5b395827453f
> >>> DC:eu-west
> >>> REMOVAL_COORDINATOR:REMOVER,85070591730234615865843651857942052864
> >>> STATUS:removed,170141183460469231731687303715884105727,1342453967415
> >>> RPC_ADDRESS:0.0.0.0
> >>> /10.59.21.241
> >>> RELEASE_VERSION:1.1.2
> >>> RACK:1b
> >>> LOAD:1.08667047094E11
> >>> SCHEMA:e7e0ec6c-616e-32e7-ae29-40eae2b82ca8
> >>> DC:eu-west
> >>> STATUS:NORMAL,0
> >>> RPC_ADDRESS:0.0.0.0
> >>>
> >>> Story:
> >>>
> >>> I had a 2 node cluster:
> >>>
> >>> 10.248.10.94  Token 0
> >>> 10.59.21.241  Token 85070591730234615865843651857942052864
> >>>
> >>> I had to replace node 10.248.10.94, so I added 10.56.62.211 on token
> >>> 0 - 1 (170141183460469231731687303715884105727). This failed, and I
> >>> removed the token.
> >>>
> >>> I repeated the previous operation with the node 10.59.21.241 and it
> >>> went fine. Next I decommissioned the node 10.248.10.94 and moved
> >>> 10.59.21.241 to token 0.
> >>>
> >>> Now I am in the situation described before.
> >>>
> >>> Alain
> >>>
> >>> 2012/7/19 Alain RODRIGUEZ <arodr...@gmail.com>:
> >>>
> >>> Hi, I wasn't able to see the token currently used by 10.56.62.211
> >>> (the ghost node).
> >>> I already removed the token 6 days ago:
> >>>
> >>> -> "Removing token 170141183460469231731687303715884105727 for
> >>> /10.56.62.211"
> >>>
> >>> "- check in cassandra log. It is possible you see a log line telling
> >>> you 10.56.62.211 and 10.59.21.241 or 10.58.83.109 share the same
> >>> token"
> >>>
> >>> Nothing like that in the logs.
> >>>
> >>> I tried the following without success:
> >>>
> >>> $ nodetool -h localhost removetoken 170141183460469231731687303715884105727
> >>> Exception in thread "main" java.lang.UnsupportedOperationException:
> >>> Token not found.
> >>> ...
> >>>
> >>> I really thought this was going to work :-).
> >>>
> >>> Any other ideas?
> >>>
> >>> Alain
> >>>
> >>> PS: I heard that Octo is a nice company and you use Cassandra, so I
> >>> guess you're fine there :-). I wish you the best; thanks for your
> >>> help.
> >>>
> >>> 2012/7/19 Olivier Mallassi <omalla...@octo.com>:
> >>>
> >>> I got that a couple of times (due to DNS issues in our infra).
> >>>
> >>> What you could try:
> >>> - check in the cassandra log. It is possible you see a log line
> >>>   telling you 10.56.62.211 and 10.59.21.241 or 10.58.83.109 share
> >>>   the same token
> >>> - if 10.56.62.211 is up, try decommission (via nodetool)
> >>> - if not, move 10.59.21.241 or 10.58.83.109 to current token + 1
> >>> - use removetoken (via nodetool) to remove the token associated with
> >>>   10.56.62.211. In case of failure, you can use removetoken -f instead.
> >>>
> >>> Then, the unreachable IP should have disappeared.
> >>>
> >>> HTH
> >>>
> >>> On Thu, Jul 19, 2012 at 10:38 AM, Alain RODRIGUEZ
> >>> <arodr...@gmail.com> wrote:
> >>>
> >>> Hi,
> >>>
> >>> I tried to add a node a few days ago and it failed.
> >>> I finally made it work with another node, but now when I describe
> >>> cluster on the cli I get this:
> >>>
> >>> Cluster Information:
> >>>    Snitch: org.apache.cassandra.locator.Ec2Snitch
> >>>    Partitioner: org.apache.cassandra.dht.RandomPartitioner
> >>>    Schema versions:
> >>>      UNREACHABLE: [10.56.62.211]
> >>>      e7e0ec6c-616e-32e7-ae29-40eae2b82ca8: [10.59.21.241, 10.58.83.109]
> >>>
> >>> And nodetool ring gives me:
> >>>
> >>> Address       DC       Rack  Status  State   Load       Owns    Token
> >>>                                                                 85070591730234615865843651857942052864
> >>> 10.59.21.241  eu-west  1b    Up      Normal  101.17 GB  50.00%  0
> >>> 10.58.83.109  eu-west  1b    Up      Normal  55.27 GB   50.00%  85070591730234615865843651857942052864
> >>>
> >>> The point, as you can see, is that one of my nodes holds twice the
> >>> data of the other one, and I have RF = 2 defined.
> >>>
> >>> My guess is that the token 0 node keeps data for the unreachable node.
> >>>
> >>> The IP of the unreachable node doesn't belong to me anymore; I have
> >>> no access to this ghost node.
> >>>
> >>> Does someone know how to completely remove this ghost node from my
> >>> cluster?
> >>>
> >>> Thank you.
> >>>
> >>> Alain
> >>>
> >>> INFO:
> >>>
> >>> On Ubuntu (AMI DataStax 2.1 and 2.2)
> >>> Cassandra 1.1.2 (upgraded from 1.0.9)
> >>> 2 node cluster (+ the ghost one)
> >>> RF = 2
> >>>
> >>> --
> >>> ............................................................
> >>> Olivier Mallassi
> >>> OCTO Technology
> >>> ............................................................
> >>> 50, Avenue des Champs-Elysées
> >>> 75008 Paris
> >>>
> >>> Mobile: (33) 6 28 70 26 61
> >>> Tél: (33) 1 58 56 10 00
> >>> Fax: (33) 1 58 56 10 01
> >>>
> >>> http://www.octo.com
> >>> Octo Talks! http://blog.octo.com

--
Olivier Mallassi
OCTO Technology
http://www.octo.com
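Pulling the thread's suggestions together, here is a dry-run sketch of the removal sequence. The hosts and token are the ones from this thread; with DRY_RUN=1 each command is only printed, so nothing touches a cluster. Note that some nodetool versions spell the forced removal "removetoken force" rather than "-f":

```shell
#!/bin/sh
# Dry-run sketch of the ghost-node removal sequence discussed above.
# With DRY_RUN=1 each command is echoed instead of executed.
DRY_RUN=1
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "would run: $*"
  else
    "$@"
  fi
}

GHOST_TOKEN=170141183460469231731687303715884105727

run nodetool -h 10.56.62.211 decommission            # only if the ghost is still up
run nodetool -h 10.59.21.241 move 1                  # shift a live node off a shared token
run nodetool -h localhost removetoken "$GHOST_TOKEN" # remove the dead node's token
run nodetool -h localhost removetoken force          # last resort if the removal hangs
```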