No. It transpires that, after seeing errors when running a start.yml for Ansible, I decided to start all the nodes again, and when doing so some of them assumed the same Host ID as others.

I resolved this by shutting down the service on the affected nodes, removing the data directories (these are all new nodes: no data), and restarting the service one node at a time, making sure each new node appeared in nodetool status before starting the next. All are now alive and kicking (copyright Simple Minds).
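The per-node reset amounted to something like the following (a rough sketch; the paths assume the default package layout for data_file_directories etc., and the service name assumes the stock systemd unit):

    # On an affected node - only safe here because these nodes hold no data
    sudo systemctl stop cassandra
    sudo rm -rf /var/lib/cassandra/data/* \
                /var/lib/cassandra/commitlog/* \
                /var/lib/cassandra/saved_caches/* \
                /var/lib/cassandra/hints/*
    sudo systemctl start cassandra
    # Wait until the node shows as UN (with a fresh Host ID) before starting the next one
    nodetool status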
Re seeds: given my setup only has a small number of nodes, I used 1 node out of 4 as a seed. I have seen folk suggesting every node (sounds excessive), 1 per datacentre (seems unreliable), and also 3 seeds per datacentre, which could be adequate if they are not all in the same rack (mine currently are). What is the suggested best practice: 2 per switch/rack for failover, or just a set number per datacentre?

For an automated install: how do you go about resolving dc & rack, and tokens per node (if the hardware varies)?
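(For concreteness, I assume this boils down to templating roughly the following per node; the snitch, the dc/rack names, the single seed address and the token count below are just illustrative placeholders, not what I currently have:)

    # conf/cassandra-rackdc.properties (read by GossipingPropertyFileSnitch)
    dc=BA
    rack=SSW09

    # conf/cassandra.yaml (excerpt)
    endpoint_snitch: GossipingPropertyFileSnitch
    num_tokens: 16        # could vary per node if the hardware does
    seed_provider:
        - class_name: org.apache.cassandra.locator.SimpleSeedProvider
          parameters:
              - seeds: "10.1.146.197"    # same seed list on every node

i.e. the dc/rack values and num_tokens would be the per-host variables the playbook has to fill in.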
Marc

-----Original Message-----
From: Bowen Song <bo...@bso.ng>
Sent: Saturday, June 4, 2022 3:10 PM
To: user@cassandra.apache.org
Subject: Re: Cluster & Nodetool

EXTERNAL

That sounds like something caused by duplicated node IDs (the Host ID column in `nodetool status`). Did you by any chance copy the Cassandra data directory between nodes? (e.g. spinning up a new node from a VM snapshot that contains a non-empty data directory)

On 03/06/2022 12:38, Marc Hoppins wrote:
> Hi all,
>
> Am new to Cassandra. Just finished installing on 22 nodes across 2 datacentres.
>
> If I run nodetool describecluster I get
>
> Stats for all nodes:
>         Live: 22
>         Joining: 0
>         Moving: 0
>         Leaving: 0
>         Unreachable: 0
>
> Data Centers:
>         BA #Nodes: 9 #Down: 0
>         DR1 #Nodes: 8 #Down: 0
>
> There should be 12 in BA and 10 in DR1. The service is running on these other nodes... yet nodetool status also only shows the above numbers.
>
> Datacenter: BA
> ==============
> Status=Up/Down
> |/ State=Normal/Leaving/Joining/Moving
> --  Address       Load        Tokens  Owns (effective)  Host ID                               Rack
> UN  10.1.146.197  304.72 KiB  16      11.4%             26d5a89c-aa8f-4249-b2b5-82341cc214bc  SSW09
> UN  10.1.146.186  245.02 KiB  16      9.0%              29f20519-51f9-493c-b891-930762d82231  SSW09
> UN  10.1.146.20   129.53 KiB  16      12.5%             f90dd318-1357-46ca-9870-807d988658b3  SSW09
> UN  10.1.146.200  150.31 KiB  16      11.1%             c544e85a-c2c5-4afd-aca8-1854a1723c2f  SSW09
> UN  10.1.146.17   185.9 KiB   16      11.7%             db9d9856-3082-44a8-b292-156da1a17d0a  SSW09
> UN  10.1.146.174  288.64 KiB  16      12.1%             03126eba-8b58-4a96-80ca-10cec2e18e69  SSW09
> UN  10.1.146.199  146.71 KiB  16      13.7%             860d6549-94ab-4a07-b665-70ea7e53f41a  SSW09
> UN  10.1.146.78   69.05 KiB   16      11.5%             7d9fdbab-40b0-4a9e-b0c9-4ffa822c42fd  SSW09
> UN  10.1.146.67   304.5 KiB   16      13.6%             48e9eba2-9112-4d91-8f26-8272cb5ce7bc  SSW09
>
> Datacenter: DR1
> ===============
> Status=Up/Down
> |/ State=Normal/Leaving/Joining/Moving
> --  Address       Load        Tokens  Owns (effective)  Host ID                               Rack
> UN  10.1.146.137  209.33 KiB  16      12.6%             f65c685f-048c-41de-85e4-308c4b84d047  SSW02
> UN  10.1.146.141  237.21 KiB  16      9.8%              847ad921-fceb-4cef-acec-1c918d2a6517  SSW02
> UN  10.1.146.131  311.05 KiB  16      11.7%             7263f6c6-c4d6-438e-8ee7-d07666242ba0  SSW02
> UN  10.1.146.139  283.33 KiB  16      11.5%             264cbe47-acb4-49cc-97d0-6f9e2cee6844  SSW02
> UN  10.1.146.140  258.46 KiB  16      11.6%             43dbbe91-5dac-4c3a-9df5-2f5ccf268eb6  SSW02
> UN  10.1.146.132  157.03 KiB  16      12.3%             1c0cb23c-af78-4fa2-bd92-20fa7d39ec30  SSW02
> UN  10.1.146.135  301.13 KiB  16      11.2%             26159fbe-cf78-4c94-88e0-54773bcf7bed  SSW02
> UN  10.1.146.130  305.16 KiB  16      12.5%             d6d6c490-551d-4a97-a93c-3b772b750d7d  SSW02
>
> So I restarted the service on one of the missing addresses. It appeared in the list but one other dropped off. I tried this several times. It seems I can only get 9 and 8, not 12 and 10.
>
> Anyone have an idea why this may be so?
>
> Thanks
>
> Marc