Hi,

I found stale data after truncating a table. It seems that the truncate starts while recovery is still running just after a node restarts. Does recovery still continue after the truncate finishes? Is that expected?

I use C* 2.2.8 and can reproduce it as follows.

==== [create table] ====
cqlsh $ip -e "drop keyspace testdb;"
cqlsh $ip -e "CREATE KEYSPACE testdb WITH replication = {'class': 'SimpleStrategy', 'replication_factor': '3'};"
cqlsh $ip -e "CREATE TABLE testdb.testtbl (key int PRIMARY KEY, val int);"

==== [script] ====
#!/bin/sh

node1_ip=<node1 IP address>
node2_ip=<node2 IP address>
node3_ip=<node3 IP address>
node3_user=<user name>
rows=10000

echo "consistency quorum;" > init_data.cql
for key in $(seq 0 $(expr $rows - 1))
do
    echo "insert into testdb.testtbl (key, val) values($key, 1111) IF NOT EXISTS;" >> init_data.cql
done

while true
do
    echo "truncate the table"
    cqlsh $node1_ip -e "truncate table testdb.testtbl"
    if [ $? -ne 0 ]; then
        echo "truncating failed"
        continue
    else
        break
    fi
done

echo "kill C* process on node3"
pdsh -l $node3_user -R ssh -w $node3_ip "ps auxww | grep CassandraDaemon | awk '{if (\$13 ~ /cassand/) print \$2}' | xargs sudo kill -9"

echo "insert $rows rows"
cqlsh $node1_ip -f init_data.cql > insert_log 2>&1

echo "restart C* process on node3"
pdsh -l $node3_user -R ssh -w $node3_ip "sudo /etc/init.d/cassandra start"

while true
do
    echo "truncate the table again"
    cqlsh $node1_ip -e "truncate table testdb.testtbl"
    if [ $? -ne 0 ]; then
        echo "truncating failed"
        continue
    else
        break
    fi
done

cqlsh $node1_ip --request-timeout 3600 -e "consistency serial; select count(*) from testdb.testtbl;"
sleep 10
cqlsh $node1_ip --request-timeout 3600 -e "consistency serial; select count(*) from testdb.testtbl;"

==== [result] ====
truncate the table
kill C* process on node3
insert 10000 rows
restart C* process on node3
10.91.145.27: Starting Cassandra: OK
truncate the table again
<stdin>:1:TruncateError: Error during truncate: Cannot achieve consistency level ALL
truncating failed
truncate the table again
<stdin>:1:TruncateError: Error during truncate: Cannot achieve consistency level ALL
truncating failed
truncate the table again
<stdin>:1:TruncateError: Error during truncate: Cannot achieve consistency level ALL
truncating failed
truncate the table again
<stdin>:1:TruncateError: Error during truncate: Cannot achieve consistency level ALL
truncating failed
truncate the table again
<stdin>:1:TruncateError: Error during truncate: Cannot achieve consistency level ALL
truncating failed
truncate the table again
<stdin>:1:TruncateError: Error during truncate: Cannot achieve consistency level ALL
truncating failed
truncate the table again
Consistency level set to SERIAL.

 count
-------
   300

(1 rows)

Warnings :
Aggregation query used without partition key

Consistency level set to SERIAL.

 count
-------
  2304

(1 rows)

Warnings :
Aggregation query used without partition key
====

I found this while investigating the data loss problem (see the "failure node rejoin" thread). I'm not sure whether this problem is related to the data loss.

Thanks,
yuji