Hi all,

Following a discussion with our sysadmins, I have a very practical question.
We use Cassandra proxies (-Dcassandra.join_ring=false) as coordinators for PHP
clients (a loooooot of PHP clients).

Our problem is that restarting Cassandra on the proxies sometimes fails with
the following error:

ERROR [main] 2021-03-16 14:18:46,236 CassandraDaemon.java:803 - Exception 
encountered during startup
java.lang.RuntimeException: A node with address XXXXXXXXXXXXXXXX/10.120.1.XXX 
already exists, cancelling join. Use cassandra.replace_address if you want to 
replace this node.

The node mentioned in the ERROR is the very one we are restarting... and the
start fails. Of course, a manual start afterwards works fine.
This message doesn't make sense: the hostId didn't change for this proxy (I am
certain of it: system.local, IP, hostname... nothing changed, just the
restart).

What I suppose (we don't all agree on this) is that, as the proxies hold no
data, they start very quickly. Too quickly for the gossip protocol to notice
that the node was ever down.

Could this ERROR be explained by the node still being marked UP by the seed
servers, because the proxy's gossip state is not updated when the stop/start
happens too quickly?
If this hypothesis seems plausible, what delay (with technical arguments)
should we put between stop and start?
We have ~100 proxies and 12 regular Cassandra nodes (4 of them seeds)...
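One option we are considering, instead of guessing a fixed sleep: after stopping a proxy, poll a seed until its gossip view actually reports the proxy as down, and only then start it again. A minimal sketch in Python; the `is_seen_down` checker here is hypothetical (in practice it would shell out to `nodetool gossipinfo` or `nodetool status` on a seed and look at the proxy's status), and the fake checker at the bottom just simulates a seed that notices on the third poll:

```python
import time

def wait_until_seen_down(is_seen_down, timeout=60.0, interval=2.0):
    """Poll until the seed reports the proxy as down, or give up.

    is_seen_down: callable returning True once the seed's gossip view
    marks the proxy dead (e.g. by parsing `nodetool gossipinfo` output
    on a seed node -- not implemented here).
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if is_seen_down():
            return True  # safe to start the proxy again
        time.sleep(interval)
    return False  # seeds never noticed the stop within the timeout

# Simulation with a fake checker: the "seed" notices on the third poll.
calls = {"n": 0}
def fake_checker():
    calls["n"] += 1
    return calls["n"] >= 3

ok = wait_until_seen_down(fake_checker, timeout=30.0, interval=0.01)
```

This way the delay adapts to how long gossip actually takes to propagate the down state, rather than being a magic number we would have to tune per cluster.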

Thx in advance
