Hi, I have a cluster with only two nodes (I know this is not a proper setup, but it was built by a colleague who has since left), and its keyspace is defined as:
CREATE KEYSPACE my_keyspace WITH replication = {'class': 'SimpleStrategy', 'replication_factor': '3'} AND durable_writes = true;

One day my boss told me that one node had been down for a long time while the other was working normally, and asked me to restart the cluster.

First, I took a snapshot on the working node. Then I checked the row count with a "select count(*)" CQL statement; the result was more than 170000.

Next, I added two new nodes. Once the new nodes were up, I ran "select count(*)" several times, but the results were now inconsistent from run to run, and each result was less than 10000. I checked the node status with "./nodetool status": every node was UN, but the load on the two new nodes was far lower than on the original node.

When I stopped the two new nodes and ran "select count(*)" again, I got the correct result again.

I then built a new cluster in a sandbox environment from the snapshot files and saw the same behavior as above. After I ran "./nodetool repair", the cluster worked well, but I don't know why.

My guess is that running only two nodes with "replication = {'class': 'SimpleStrategy', 'replication_factor': '3'}" can cause a split brain so that the data is no longer consistent, or that the data files are corrupted, but I am not sure.

Why did this happen, why did I have to run "./nodetool repair", and when should it be used?

Thanks!

------------------
赵豫峰
环信即时通讯云 / R&D