Re: Unreachable node, not in nodetool ring

2012-08-08 Thread Alain RODRIGUEZ
Hi,

I finally managed to remove the ghost node using
unsafeAssassinateEndpoint(), as described here:
http://tumblr.doki-pen.org/post/22654515359/assinating-cassandra-nodes
I hope this can help other people.

nodetool gossipinfo now gives me the following info for the ghost node:

/10.56.62.211
  RELEASE_VERSION:1.1.2
  RPC_ADDRESS:0.0.0.0
  REMOVAL_COORDINATOR:REMOVER,85070591730234615865843651857942052864
  SCHEMA:59adb24e-f3cd-3e02-97f0-5b395827453f
  STATUS:LEFT,42529904547457370790386101505459979624,1344611213445
  LOAD:11594.0
  DC:eu-west
  RACK:1b

Instead of :

/10.56.62.211
  RELEASE_VERSION:1.1.2
  LOAD:11594.0
  RACK:1b
  SCHEMA:59adb24e-f3cd-3e02-97f0-5b395827453f
  DC:eu-west
  REMOVAL_COORDINATOR:REMOVER,85070591730234615865843651857942052864
  STATUS:removed,170141183460469231731687303715884105727,1342453967415
  RPC_ADDRESS:0.0.0.0

cassandra-cli describe cluster no longer shows any unreachable node.
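
To compare gossip snapshots like the two above without reading them by eye, a small parser can help. This is only a sketch: the function names are made up for illustration, and it assumes the output keeps the layout shown in this thread (an endpoint line starting with "/", followed by KEY:value lines).

```python
def parse_gossipinfo(text):
    """Parse `nodetool gossipinfo` text into {endpoint: {KEY: value}}.

    Assumes the layout shown in this thread: an endpoint line starting
    with "/" followed by indented KEY:value lines.
    """
    endpoints = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("/"):
            current = line[1:]
            endpoints[current] = {}
        elif current is not None and ":" in line:
            key, value = line.split(":", 1)
            endpoints[current][key] = value
    return endpoints


def non_normal(endpoints):
    """Return {endpoint: state} for every endpoint whose STATUS is not NORMAL."""
    result = {}
    for endpoint, fields in endpoints.items():
        # The state name is the first comma-separated part of STATUS.
        state = fields.get("STATUS", "UNKNOWN").split(",")[0]
        if state != "NORMAL":
            result[endpoint] = state
    return result


# Shortened sample built from values posted in this thread.
SAMPLE = """\
/10.56.62.211
  RELEASE_VERSION:1.1.2
  STATUS:LEFT,42529904547457370790386101505459979624,1344611213445
/10.59.21.241
  RELEASE_VERSION:1.1.2
  STATUS:NORMAL,0
"""

if __name__ == "__main__":
    print(non_normal(parse_gossipinfo(SAMPLE)))
```

Feeding it the real output of nodetool gossipinfo from each node makes it easy to see which nodes still gossip about an endpoint that has left.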

The only remaining issue is that the load still isn't well balanced
across my nodes. After repairing, cleaning up and restarting all nodes,
I still have the following ring:

Address         DC          Rack        Status State   Load            Owns    Token
                                                                               85070591730234615865843651857942052864
10.59.21.241    eu-west     1b          Up     Normal  103.19 GB       50.00%  0
10.58.83.109    eu-west     1b          Up     Normal  62.62 GB        50.00%  85070591730234615865843651857942052864

Any idea why I can't get the load balanced in this cluster?
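
For reference, here is a rough way to reason about what each node should hold. It is a sketch under stated assumptions, not a description of Cassandra internals: it assumes RandomPartitioner's 2**127 token space and a simple placement where each range is replicated on the next RF nodes clockwise (SimpleStrategy-like). The function name replica_share is hypothetical.

```python
from fractions import Fraction

RING_SIZE = 2 ** 127  # assumed size of the RandomPartitioner token space


def replica_share(tokens, rf):
    """Return {node: fraction of the ring the node stores a replica of}.

    tokens maps node -> token. Assumption: each primary range
    (previous token, token] is replicated on the owning node and on the
    next rf - 1 nodes clockwise, ignoring racks and datacenters.
    """
    ring = sorted(tokens.items(), key=lambda item: item[1])
    count = len(ring)
    share = {node: Fraction(0) for node, _ in ring}
    for i, (node, token) in enumerate(ring):
        previous = ring[i - 1][1]
        # A single node wraps onto itself and owns the whole ring.
        width = (token - previous) % RING_SIZE or RING_SIZE
        for step in range(min(rf, count)):
            replica = ring[(i + step) % count][0]
            share[replica] += Fraction(width, RING_SIZE)
    return share


if __name__ == "__main__":
    ring_tokens = {
        "10.59.21.241": 0,
        "10.58.83.109": 85070591730234615865843651857942052864,
    }
    for node, fraction in replica_share(ring_tokens, rf=2).items():
        print(node, float(fraction))
```

Under these assumptions, both nodes in a 2-node ring with RF = 2 come out at a share of 1, i.e. each should store a replica of everything. If that holds, the 50.00% "Owns" column cannot explain the difference in load, and the gap more likely comes from data that was not cleaned up or compacted away yet.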

Alain


Re: Unreachable node, not in nodetool ring

2012-07-27 Thread Alain RODRIGUEZ
Hi again,

Nobody has a clue about this issue ?

I'm still facing this problem.

Alain


Re: Unreachable node, not in nodetool ring

2012-07-27 Thread Olivier Mallassi
Nope.
My last ideas would be (and I am not sure these are the best):
- try removetoken with the -f option. I do not believe it will change
anything but...
- try nodetool ring on ALL nodes and check whether all nodes see the
unreachable node. If not, you could maybe just decommission the one(s)
that see the unreachable node.
- if you are in test, you can delete the system folder (a subfolder of
where all your data are saved: data_file_directories in cassandra.yaml,
by default /var/lib/cassandra/data).
*but you will lose everything*
- snapshot the data and restore them in another cluster. Not that simple,
depending on data volume, traffic, etc.

From my side, I do not have more ideas... and once again, I am not sure
these are the best ;)

I do not know if cassandra is able to definitively consider a node as dead
after a certain amount of time.



Re: Unreachable node, not in nodetool ring

2012-07-23 Thread Alain RODRIGUEZ
Does anyone know how to completely remove a dead node that only appears
when doing a describe cluster from the cli?

I still have this issue in my production cluster.

Alain


Re: Unreachable node, not in nodetool ring

2012-07-20 Thread aaron morton
I would:

* run repair on 10.58.83.109
* run cleanup on 10.59.21.241 (I assume this was the first node). 

It looks like 10.56.62.211 is out of the cluster.

Cheers

-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com




Re: Unreachable node, not in nodetool ring

2012-07-20 Thread Alain RODRIGUEZ
Hi Aaron,

I have already repaired and cleaned up both nodes, and I did it after
every change on my ring (it took me a while, btw :)).

The node *.211 is actually out of the ring and out of my control
because I don't have the server anymore (the EC2 instance was terminated
a few days ago).

Alain


Re: Unreachable node, not in nodetool ring

2012-07-19 Thread Olivier Mallassi
I got that a couple of times (due to DNS issues in our infra).

What you could try:
- check the cassandra log. You may see a log line telling you that
10.56.62.211 and 10.59.21.241 or 10.58.83.109 share the same token
- if 10.56.62.211 is up, try decommission (via nodetool)
- if not, move 10.59.21.241 or 10.58.83.109 to current token + 1
- use removetoken (via nodetool) to remove the token associated with
10.56.62.211. In case of failure, you can use removetoken -f instead.

Then the unreachable IP should have disappeared.
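
The first check in the list above (do two endpoints claim the same token?) can also be made from gossip state rather than from the logs. The snippet below is a sketch only: shared_tokens is a hypothetical helper, and it assumes STATUS values have the form STATE,token[,extra] as in the gossipinfo output posted elsewhere in this thread.

```python
from collections import defaultdict


def shared_tokens(status_by_endpoint):
    """Return {token: [(endpoint, state), ...]} for tokens claimed more than once.

    status_by_endpoint maps an endpoint to its gossip STATUS value, assumed
    to look like "STATE,token" or "STATE,token,extra".
    """
    claims = defaultdict(list)
    for endpoint, status in status_by_endpoint.items():
        parts = status.split(",")
        if len(parts) < 2:
            continue  # no token in this STATUS value
        state, token = parts[0], int(parts[1])
        claims[token].append((endpoint, state))
    return {token: sorted(c) for token, c in claims.items() if len(c) > 1}


if __name__ == "__main__":
    # STATUS values copied from the gossipinfo output in this thread.
    statuses = {
        "10.58.83.109": "NORMAL,85070591730234615865843651857942052864",
        "10.248.10.94": "LEFT,0,1342866804032",
        "10.56.62.211": "removed,170141183460469231731687303715884105727,1342453967415",
        "10.59.21.241": "NORMAL,0",
    }
    print(shared_tokens(statuses))
```

With the values from this thread, only token 0 shows up twice, once for the node that left (10.248.10.94) and once for the live node that was moved there (10.59.21.241), so the ghost node does not collide with a live one.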


HTH

On Thu, Jul 19, 2012 at 10:38 AM, Alain RODRIGUEZ arodr...@gmail.com wrote:

 Hi,

 I tried to add a node a few days ago and it failed. I finally made it
 work with another node, but now when I describe cluster on the cli I get
 this:

 Cluster Information:
    Snitch: org.apache.cassandra.locator.Ec2Snitch
    Partitioner: org.apache.cassandra.dht.RandomPartitioner
    Schema versions:
       UNREACHABLE: [10.56.62.211]
       e7e0ec6c-616e-32e7-ae29-40eae2b82ca8: [10.59.21.241, 10.58.83.109]
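
To watch for this condition automatically, the "Schema versions" part of the describe cluster output can be parsed. This is a sketch that assumes the text layout shown above (one "version: [host, host]" entry per line); schema_versions is a hypothetical helper name, not part of any Cassandra tool.

```python
import re

# Matches lines such as "UNREACHABLE: [10.56.62.211]" or "<uuid>: [a, b]".
ENTRY = re.compile(r"\s*([0-9a-fA-F-]{36}|UNREACHABLE):\s*\[(.*)\]\s*$")


def schema_versions(text):
    """Return {schema version or "UNREACHABLE": [hosts]} from describe cluster text."""
    versions = {}
    for line in text.splitlines():
        match = ENTRY.match(line)
        if match:
            hosts = [h.strip() for h in match.group(2).split(",") if h.strip()]
            versions[match.group(1)] = hosts
    return versions


SAMPLE = """\
Cluster Information:
   Snitch: org.apache.cassandra.locator.Ec2Snitch
   Partitioner: org.apache.cassandra.dht.RandomPartitioner
   Schema versions:
      UNREACHABLE: [10.56.62.211]
      e7e0ec6c-616e-32e7-ae29-40eae2b82ca8: [10.59.21.241, 10.58.83.109]
"""

if __name__ == "__main__":
    versions = schema_versions(SAMPLE)
    print("unreachable:", versions.get("UNREACHABLE", []))
```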

 And nodetool ring gives me :

 Address         DC          Rack        Status State   Load            Owns    Token
                                                                                85070591730234615865843651857942052864
 10.59.21.241    eu-west     1b          Up     Normal  101.17 GB       50.00%  0
 10.58.83.109    eu-west     1b          Up     Normal  55.27 GB        50.00%  85070591730234615865843651857942052864

 The point, as you can see, is that one of my nodes holds about twice as
 much data as the other one. I have RF = 2 defined.

 My guess is that the token 0 node keeps data for the unreachable node.

 The IP of the unreachable node doesn't belong to me anymore; I have no
 access to this ghost node.

 Does someone know how to completely remove this ghost node from my
 cluster?

 Thank you.

 Alain

 INFO :

 On ubuntu (AMI Datastax 2.1 and 2.2)
 Cassandra 1.1.2 (upgraded from 1.0.9)
 2 node cluster (+ the ghost one)
 RF = 2




-- 

Olivier Mallassi
OCTO Technology

50, Avenue des Champs-Elysées
75008 Paris

Mobile: (33) 6 28 70 26 61
Tél: (33) 1 58 56 10 00
Fax: (33) 1 58 56 10 01

http://www.octo.com
Octo Talks! http://blog.octo.com


Re: Unreachable node, not in nodetool ring

2012-07-19 Thread Alain RODRIGUEZ
Hi, I wasn't able to see the token currently used by 10.56.62.211
(the ghost node).

I already removed the token 6 days ago:

- Removing token 170141183460469231731687303715884105727 for /10.56.62.211

- check in cassandra log. It is possible you see a log line telling
you 10.56.62.211 and 10.59.21.241 or 10.58.83.109 share the same
token

Nothing like that in the logs.

I tried the following without success :

$ nodetool -h localhost removetoken 170141183460469231731687303715884105727
Exception in thread "main" java.lang.UnsupportedOperationException: Token not found.
...

I really thought this was going to work :-).

Any other ideas ?

Alain

PS: I heard that Octo is a nice company and you use Cassandra, so I
guess you're fine in there :-). I wish you the best, and thanks for your
help.





Re: Unreachable node, not in nodetool ring

2012-07-19 Thread Alain RODRIGUEZ
Not sure if this may help :

nodetool -h localhost gossipinfo
/10.58.83.109
  RELEASE_VERSION:1.1.2
  RACK:1b
  LOAD:5.9384978406E10
  SCHEMA:e7e0ec6c-616e-32e7-ae29-40eae2b82ca8
  DC:eu-west
  STATUS:NORMAL,85070591730234615865843651857942052864
  RPC_ADDRESS:0.0.0.0
/10.248.10.94
  RELEASE_VERSION:1.1.2
  LOAD:3.0128207422E10
  SCHEMA:e7e0ec6c-616e-32e7-ae29-40eae2b82ca8
  STATUS:LEFT,0,1342866804032
  RPC_ADDRESS:0.0.0.0
/10.56.62.211
  RELEASE_VERSION:1.1.2
  LOAD:11594.0
  RACK:1b
  SCHEMA:59adb24e-f3cd-3e02-97f0-5b395827453f
  DC:eu-west
  REMOVAL_COORDINATOR:REMOVER,85070591730234615865843651857942052864
  STATUS:removed,170141183460469231731687303715884105727,1342453967415
  RPC_ADDRESS:0.0.0.0
/10.59.21.241
  RELEASE_VERSION:1.1.2
  RACK:1b
  LOAD:1.08667047094E11
  SCHEMA:e7e0ec6c-616e-32e7-ae29-40eae2b82ca8
  DC:eu-west
  STATUS:NORMAL,0
  RPC_ADDRESS:0.0.0.0

Story :

I had a 2-node cluster:

10.248.10.94 Token 0
10.59.21.241 Token 85070591730234615865843651857942052864

I had to replace node 10.248.10.94, so I added 10.56.62.211 on token 0 - 1
(170141183460469231731687303715884105727). This failed, so I removed the
token.

I repeated the previous operation with the node 10.59.21.241 and it went
fine. Next I decommissioned the node 10.248.10.94 and moved
10.59.21.241 to token 0.

Now I am in the situation described before.

Alain
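
The token arithmetic in the story above can be checked in a few lines. This sketch assumes RandomPartitioner's token space wraps at 2**127; the helper names are made up for illustration.

```python
RING_SIZE = 2 ** 127  # assumed size of the RandomPartitioner token space


def balanced_tokens(node_count):
    """Evenly spaced initial tokens for node_count nodes."""
    return [i * RING_SIZE // node_count for i in range(node_count)]


def wrap(token):
    """Map any integer onto the ring, so that 0 - 1 becomes the last token."""
    return token % RING_SIZE


if __name__ == "__main__":
    print(balanced_tokens(2))
    print(wrap(0 - 1))
```

For two nodes this gives 0 and 85070591730234615865843651857942052864, the tokens used in the ring above, and "0 - 1" wraps to 170141183460469231731687303715884105727, the token the ghost node was given.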

