Nope. My last ideas would be (and I am not sure these are the best...):

- Try removetoken with the -f option. I do not believe it will change anything, but...
- Try nodetool ring on ALL nodes and check that all nodes see the unreachable node. If not, you could maybe just decommission the one(s) that see the unreachable node.
- If you are in test, you can delete the system folder (a subfolder of where all your data are saved: data_file_directories in cassandra.yaml, by default /var/lib/cassandra/data). *But you will lose everything*...
- Snapshot the data and restore it in another cluster. Not that simple, depending on data volume, traffic, etc.
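The "delete the system folder" option above is destructive, so it is worth rehearsing against a throwaway directory first. The layout below is made up for illustration; on a real (stopped!) test node the root would be your data_file_directories path, by default /var/lib/cassandra/data:

```shell
#!/bin/sh
# Rehearsal only: build a fake data directory layout, then wipe the
# system keyspace the way you would on a stopped *test* node.
data_root=$(mktemp -d)
mkdir -p "$data_root/system" "$data_root/my_keyspace"   # hypothetical layout

# Deleting system/ erases the node's cluster membership state (tokens,
# schema, gossip history); the node will bootstrap as brand new.
# Never run this against a production data directory.
rm -rf "$data_root/system"

ls "$data_root"   # only my_keyspace remains
```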
From my side, I do not have more ideas... and once again, I am not sure these are the best ;) I do not know if Cassandra is able to definitively consider a node as dead after a certain amount of time.

On Fri, Jul 27, 2012 at 11:04 AM, Alain RODRIGUEZ <arodr...@gmail.com> wrote:

> Hi again,
>
> Nobody has a clue about this issue?
>
> I'm still facing this problem.
>
> Alain
>
> 2012/7/23 Alain RODRIGUEZ <arodr...@gmail.com>:
> > Does anyone know how to totally remove a dead node that only appears
> > when doing a "describe cluster" from the cli?
> >
> > I still have this issue in my production cluster.
> >
> > Alain
> >
> > 2012/7/20 Alain RODRIGUEZ <arodr...@gmail.com>:
> >> Hi Aaron,
> >>
> >> I have repaired and cleaned up both nodes already, and I did it after
> >> any change on my ring (it took me a while, btw :)).
> >>
> >> The node *.211 is actually out of the ring and out of my control
> >> because I don't have the server anymore (the EC2 instance was
> >> terminated a few days ago).
> >>
> >> Alain
> >>
> >> 2012/7/20 aaron morton <aa...@thelastpickle.com>:
> >>> I would:
> >>>
> >>> * run repair on 10.58.83.109
> >>> * run cleanup on 10.59.21.241 (I assume this was the first node).
> >>>
> >>> It looks like 10.56.62.211 is out of the cluster.
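As an aside, the per-endpoint STATUS lines buried in a long nodetool gossipinfo dump (like the one quoted further down in this thread) can be summarized mechanically. A rough sketch over a hard-coded, abridged sample of the thread's own output; on a live node you would pipe the real nodetool output into the awk instead:

```shell
#!/bin/sh
# Summarize per-endpoint STATUS from "nodetool gossipinfo" output.
# The sample below is an abridged copy of the dump quoted in this
# thread; on a live node replace the printf with:
#   nodetool -h localhost gossipinfo
gossip='/10.248.10.94
STATUS:LEFT,0,1342866804032
/10.56.62.211
STATUS:removed,170141183460469231731687303715884105727,1342453967415
/10.59.21.241
STATUS:NORMAL,0'

status_report=$(printf '%s\n' "$gossip" | awk '
  /^\//      { host = substr($0, 2) }                   # endpoint lines: /10.x.x.x
  /^STATUS:/ { split($0, f, "[:,]"); print host, f[2] } # STATUS:state,token,...
')
printf '%s\n' "$status_report"
```

Here the ghost node shows up as "10.56.62.211 removed" while healthy nodes report NORMAL.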
> >>>
> >>> Cheers
> >>>
> >>> -----------------
> >>> Aaron Morton
> >>> Freelance Developer
> >>> @aaronmorton
> >>> http://www.thelastpickle.com
> >>>
> >>> On 19/07/2012, at 9:37 PM, Alain RODRIGUEZ wrote:
> >>>
> >>> Not sure if this may help:
> >>>
> >>> nodetool -h localhost gossipinfo
> >>> /10.58.83.109
> >>> RELEASE_VERSION:1.1.2
> >>> RACK:1b
> >>> LOAD:5.9384978406E10
> >>> SCHEMA:e7e0ec6c-616e-32e7-ae29-40eae2b82ca8
> >>> DC:eu-west
> >>> STATUS:NORMAL,85070591730234615865843651857942052864
> >>> RPC_ADDRESS:0.0.0.0
> >>> /10.248.10.94
> >>> RELEASE_VERSION:1.1.2
> >>> LOAD:3.0128207422E10
> >>> SCHEMA:e7e0ec6c-616e-32e7-ae29-40eae2b82ca8
> >>> STATUS:LEFT,0,1342866804032
> >>> RPC_ADDRESS:0.0.0.0
> >>> /10.56.62.211
> >>> RELEASE_VERSION:1.1.2
> >>> LOAD:11594.0
> >>> RACK:1b
> >>> SCHEMA:59adb24e-f3cd-3e02-97f0-5b395827453f
> >>> DC:eu-west
> >>> REMOVAL_COORDINATOR:REMOVER,85070591730234615865843651857942052864
> >>> STATUS:removed,170141183460469231731687303715884105727,1342453967415
> >>> RPC_ADDRESS:0.0.0.0
> >>> /10.59.21.241
> >>> RELEASE_VERSION:1.1.2
> >>> RACK:1b
> >>> LOAD:1.08667047094E11
> >>> SCHEMA:e7e0ec6c-616e-32e7-ae29-40eae2b82ca8
> >>> DC:eu-west
> >>> STATUS:NORMAL,0
> >>> RPC_ADDRESS:0.0.0.0
> >>>
> >>> Story:
> >>>
> >>> I had a 2 node cluster:
> >>>
> >>> 10.248.10.94  Token 0
> >>> 10.59.21.241  Token 85070591730234615865843651857942052864
> >>>
> >>> I had to replace node 10.248.10.94, so I added 10.56.62.211 on token
> >>> 0 - 1 (170141183460469231731687303715884105727). This failed, and I
> >>> removed the token.
> >>>
> >>> I repeated the previous operation with the node 10.59.21.241 and it
> >>> went fine. Next I decommissioned the node 10.248.10.94 and moved
> >>> 10.59.21.241 to token 0.
> >>>
> >>> Now I am in the situation described before.
> >>>
> >>> Alain
> >>>
> >>> 2012/7/19 Alain RODRIGUEZ <arodr...@gmail.com>:
> >>>
> >>> Hi, I wasn't able to see the token currently used by 10.56.62.211
> >>> (the ghost node).
> >>> I already removed the token 6 days ago:
> >>>
> >>> -> "Removing token 170141183460469231731687303715884105727 for
> >>> /10.56.62.211"
> >>>
> >>> "- check in cassandra log. It is possible you see a log line telling
> >>> you 10.56.62.211 and 10.59.21.241 or 10.58.83.109 share the same
> >>> token"
> >>>
> >>> Nothing like that in the logs.
> >>>
> >>> I tried the following without success:
> >>>
> >>> $ nodetool -h localhost removetoken 170141183460469231731687303715884105727
> >>> Exception in thread "main" java.lang.UnsupportedOperationException:
> >>> Token not found.
> >>> ...
> >>>
> >>> I really thought this was going to work :-).
> >>>
> >>> Any other ideas?
> >>>
> >>> Alain
> >>>
> >>> PS: I heard that Octo is a nice company and you use Cassandra, so I
> >>> guess you're fine there :-). I wish you the best; thanks for your
> >>> help.
> >>>
> >>> 2012/7/19 Olivier Mallassi <omalla...@octo.com>:
> >>>
> >>> I got that a couple of times (due to DNS issues in our infra).
> >>>
> >>> What you could try:
> >>> - check in the cassandra log. It is possible you see a log line
> >>>   telling you 10.56.62.211 and 10.59.21.241 or 10.58.83.109 share
> >>>   the same token
> >>> - if 10.56.62.211 is up, try decommission (via nodetool)
> >>> - if not, move 10.59.21.241 or 10.58.83.109 to current token + 1
> >>> - use removetoken (via nodetool) to remove the token associated with
> >>>   10.56.62.211. In case of failure, you can use removetoken -f instead.
> >>>
> >>> Then, the unreachable IP should have disappeared.
> >>>
> >>> HTH
> >>>
> >>> On Thu, Jul 19, 2012 at 10:38 AM, Alain RODRIGUEZ
> >>> <arodr...@gmail.com> wrote:
> >>>
> >>> Hi,
> >>>
> >>> I tried to add a node a few days ago and it failed.
> >>> I finally made it work with another node, but now when I describe
> >>> cluster on the cli I get this:
> >>>
> >>> Cluster Information:
> >>>    Snitch: org.apache.cassandra.locator.Ec2Snitch
> >>>    Partitioner: org.apache.cassandra.dht.RandomPartitioner
> >>>    Schema versions:
> >>>      UNREACHABLE: [10.56.62.211]
> >>>      e7e0ec6c-616e-32e7-ae29-40eae2b82ca8: [10.59.21.241, 10.58.83.109]
> >>>
> >>> And nodetool ring gives me:
> >>>
> >>> Address       DC       Rack  Status  State   Load       Owns    Token
> >>>                                                                 85070591730234615865843651857942052864
> >>> 10.59.21.241  eu-west  1b    Up      Normal  101.17 GB  50.00%  0
> >>> 10.58.83.109  eu-west  1b    Up      Normal  55.27 GB   50.00%  85070591730234615865843651857942052864
> >>>
> >>> The point, as you can see, is that one of my nodes holds twice the
> >>> data of the other one, and I have RF = 2 defined.
> >>>
> >>> My guess is that the token 0 node keeps data for the unreachable node.
> >>>
> >>> The IP of the unreachable node doesn't belong to me anymore; I have
> >>> no access to this ghost node.
> >>>
> >>> Does someone know how to completely remove this ghost node from my
> >>> cluster?
> >>>
> >>> Thank you.
> >>>
> >>> Alain
> >>>
> >>> INFO:
> >>>
> >>> On Ubuntu (AMI DataStax 2.1 and 2.2)
> >>> Cassandra 1.1.2 (upgraded from 1.0.9)
> >>> 2 node cluster (+ the ghost one)
> >>> RF = 2
> >>>
> >>> --
> >>> ............................................................
> >>> Olivier Mallassi
> >>> OCTO Technology
> >>> ............................................................
> >>> 50, Avenue des Champs-Elysées
> >>> 75008 Paris
> >>>
> >>> Mobile: (33) 6 28 70 26 61
> >>> Tél: (33) 1 58 56 10 00
> >>> Fax: (33) 1 58 56 10 01
> >>>
> >>> http://www.octo.com
> >>> Octo Talks! http://blog.octo.com

--
Olivier Mallassi
OCTO Technology
http://www.octo.com
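Pulling the thread's suggestions together, here is a dry-run sketch of the removal sequence. The hosts and token are the ones from this thread; with DRY_RUN=1 each command is only printed, so nothing touches a cluster. Note that some nodetool versions spell the forced removal "removetoken force" rather than "-f":

```shell
#!/bin/sh
# Dry-run sketch of the ghost-node removal sequence discussed above.
# With DRY_RUN=1 each command is echoed instead of executed.
DRY_RUN=1
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "would run: $*"
  else
    "$@"
  fi
}

GHOST_TOKEN=170141183460469231731687303715884105727

run nodetool -h 10.56.62.211 decommission            # only if the ghost is still up
run nodetool -h 10.59.21.241 move 1                  # shift a live node off a shared token
run nodetool -h localhost removetoken "$GHOST_TOKEN" # remove the dead node's token
run nodetool -h localhost removetoken force          # last resort if the removal hangs
```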