No. It transpires that, after seeing errors when running a start.yml for Ansible, I decided to start all the nodes again, and when doing so some of them assumed the same Host ID as others.

I resolved this by shutting down the service on the affected nodes, removing the data directories (these are all new nodes: no data), and restarting the service one node at a time, making sure each new node appeared in nodetool status before starting the next. All are now alive and kicking (copyright Simple Minds).
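The per-node reset amounted to something like the following (a rough sketch; the paths assume the default package layout for data_file_directories etc., and the service name assumes the stock systemd unit):

    # On an affected node - only safe here because these nodes hold no data
    sudo systemctl stop cassandra
    sudo rm -rf /var/lib/cassandra/data/* \
                /var/lib/cassandra/commitlog/* \
                /var/lib/cassandra/saved_caches/* \
                /var/lib/cassandra/hints/*
    sudo systemctl start cassandra
    # Wait until the node shows as UN (with a fresh Host ID) before starting the next one
    nodetool status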
Re seeds: given my setup only has a small number of nodes, I used 1 node out of 4 as a seed. I have seen folk suggesting every node (sounds excessive), 1 per datacentre (seems unreliable), and also 3 seeds per datacentre, which could be adequate if they are not all in the same rack (mine currently are). What is the suggested best practice: 2 per switch/rack for failover, or just a set number per datacentre?

For an automated install: how do you go about resolving dc & rack, and tokens per node (if the hardware varies)?
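(For concreteness, I assume this boils down to templating roughly the following per node; the snitch, the dc/rack names, the single seed address and the token count below are just illustrative placeholders, not what I currently have:)

    # conf/cassandra-rackdc.properties (read by GossipingPropertyFileSnitch)
    dc=BA
    rack=SSW09

    # conf/cassandra.yaml (excerpt)
    endpoint_snitch: GossipingPropertyFileSnitch
    num_tokens: 16        # could vary per node if the hardware does
    seed_provider:
        - class_name: org.apache.cassandra.locator.SimpleSeedProvider
          parameters:
              - seeds: "10.1.146.197"    # same seed list on every node

i.e. the dc/rack values and num_tokens would be the per-host variables the playbook has to fill in.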
Marc

-----Original Message-----
From: Bowen Song <bo...@bso.ng>
Sent: Saturday, June 4, 2022 3:10 PM
To: user@cassandra.apache.org
Subject: Re: Cluster & Nodetool

EXTERNAL

That sounds like something caused by duplicated node IDs (the Host ID column in `nodetool status`). Did you by any chance copy the Cassandra data directory between nodes? (e.g. spinning up a new node from a VM snapshot that contains a non-empty data directory)

On 03/06/2022 12:38, Marc Hoppins wrote:
> Hi all,
>
> Am new to Cassandra. Just finished installing on 22 nodes across 2 datacentres.
>
> If I run nodetool describecluster I get
>
> Stats for all nodes:
>         Live: 22
>         Joining: 0
>         Moving: 0
>         Leaving: 0
>         Unreachable: 0
>
> Data Centers:
>         BA #Nodes: 9 #Down: 0
>         DR1 #Nodes: 8 #Down: 0
>
> There should be 12 in BA and 10 in DR1. The service is running on these other nodes... yet nodetool status also only shows the above numbers.
>
> Datacenter: BA
> ==============
> Status=Up/Down
> |/ State=Normal/Leaving/Joining/Moving
> --  Address       Load        Tokens  Owns (effective)  Host ID                               Rack
> UN  10.1.146.197  304.72 KiB  16      11.4%             26d5a89c-aa8f-4249-b2b5-82341cc214bc  SSW09
> UN  10.1.146.186  245.02 KiB  16      9.0%              29f20519-51f9-493c-b891-930762d82231  SSW09
> UN  10.1.146.20   129.53 KiB  16      12.5%             f90dd318-1357-46ca-9870-807d988658b3  SSW09
> UN  10.1.146.200  150.31 KiB  16      11.1%             c544e85a-c2c5-4afd-aca8-1854a1723c2f  SSW09
> UN  10.1.146.17   185.9 KiB   16      11.7%             db9d9856-3082-44a8-b292-156da1a17d0a  SSW09
> UN  10.1.146.174  288.64 KiB  16      12.1%             03126eba-8b58-4a96-80ca-10cec2e18e69  SSW09
> UN  10.1.146.199  146.71 KiB  16      13.7%             860d6549-94ab-4a07-b665-70ea7e53f41a  SSW09
> UN  10.1.146.78   69.05 KiB   16      11.5%             7d9fdbab-40b0-4a9e-b0c9-4ffa822c42fd  SSW09
> UN  10.1.146.67   304.5 KiB   16      13.6%             48e9eba2-9112-4d91-8f26-8272cb5ce7bc  SSW09
>
> Datacenter: DR1
> ===============
> Status=Up/Down
> |/ State=Normal/Leaving/Joining/Moving
> --  Address       Load        Tokens  Owns (effective)  Host ID                               Rack
> UN  10.1.146.137  209.33 KiB  16      12.6%             f65c685f-048c-41de-85e4-308c4b84d047  SSW02
> UN  10.1.146.141  237.21 KiB  16      9.8%              847ad921-fceb-4cef-acec-1c918d2a6517  SSW02
> UN  10.1.146.131  311.05 KiB  16      11.7%             7263f6c6-c4d6-438e-8ee7-d07666242ba0  SSW02
> UN  10.1.146.139  283.33 KiB  16      11.5%             264cbe47-acb4-49cc-97d0-6f9e2cee6844  SSW02
> UN  10.1.146.140  258.46 KiB  16      11.6%             43dbbe91-5dac-4c3a-9df5-2f5ccf268eb6  SSW02
> UN  10.1.146.132  157.03 KiB  16      12.3%             1c0cb23c-af78-4fa2-bd92-20fa7d39ec30  SSW02
> UN  10.1.146.135  301.13 KiB  16      11.2%             26159fbe-cf78-4c94-88e0-54773bcf7bed  SSW02
> UN  10.1.146.130  305.16 KiB  16      12.5%             d6d6c490-551d-4a97-a93c-3b772b750d7d  SSW02
>
> So I restarted the service on one of the missing addresses. It appeared in the list but one other dropped off. I tried this several times. It seems I can only get 9 and 8, not 12 and 10.
>
> Anyone have an idea why this may be so?
>
> Thanks
>
> Marc