Hi, I have a cluster with only two nodes (I know this is the wrong setup, but
it was built by a colleague who has since left), and its keyspace is defined
like this:


CREATE KEYSPACE my_keyspace WITH replication = {'class': 'SimpleStrategy', 
'replication_factor': '3'}  AND durable_writes = true;


One day my boss told me that one node had been down for a long time while the
other was working normally, and asked me to restart the cluster.


First, I took a snapshot on the working node.
Then I checked the row count with a "select count(*)" CQL statement; the
result was more than 170000.
Next, I added two new nodes. After the new nodes came up, I ran
"select count(*)" several times, but now the results were inconsistent, and
each result was less than 10000. I checked the node status with
"./nodetool status", and every node was UN, but the load of the two new nodes
was far less than that of the original node.
I stopped the two new nodes, ran "select count(*)" again, and got the right
result again.


I built a new cluster in a sandbox environment from the snapshot files and got
the same result as above. I then ran "./nodetool repair", and the cluster
worked well, but I don't know why.
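
For reference, the steps above correspond roughly to the commands below. This
is a sketch rather than an exact transcript: my_table is a placeholder table
name that is not from the real cluster, and the run wrapper is only there so
that the script prints each command unless RUN=1 is set.

```shell
#!/bin/sh
# Sketch only. KEYSPACE is the keyspace shown above; TABLE is a hypothetical placeholder.
KEYSPACE="my_keyspace"
TABLE="my_table"

# Print each command by default; set RUN=1 to actually execute it.
run() {
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    else
        echo "+ $*"
    fi
}

# 1. Take a snapshot of the keyspace on the working node.
run nodetool snapshot "$KEYSPACE"

# 2. Count the rows in one table.
run cqlsh -e "SELECT count(*) FROM $KEYSPACE.$TABLE;"

# 3. Check the state (UN, DN, ...) and load of every node.
run nodetool status

# 4. Repair the keyspace.
run nodetool repair "$KEYSPACE"
```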


My guess is that two nodes with "replication = {'class': 'SimpleStrategy',
'replication_factor': '3'}" can cause a split brain so that the data is not
consistent, or that the data files are broken, but I am not sure. Why did this
happen, why did I have to run "./nodetool repair", and when should it be used?


Thanks!





------------------


赵豫峰



Easemob (环信) Instant Messaging Cloud / R&D
