Hi,

I found stale data after truncating a table.
It seems that the truncation starts while recovery is still being executed just
after a node restarts, and that the recovery continues even after the truncation
finishes. Is this expected?

I am using C* 2.2.8 and can reproduce it as follows.

==== [create table] ====
cqlsh $ip -e "drop keyspace testdb;"
cqlsh $ip -e "CREATE KEYSPACE testdb WITH replication = {'class': 'SimpleStrategy', 'replication_factor': '3'};"
cqlsh $ip -e "CREATE TABLE testdb.testtbl (key int PRIMARY KEY, val int);"

==== [script] ====
#!/bin/sh

node1_ip=<node1 IP address>
node2_ip=<node2 IP address>
node3_ip=<node3 IP address>
node3_user=<user name>
rows=10000

echo "consistency quorum;" > init_data.cql
for key in $(seq 0 $(expr $rows - 1))
do
    echo "insert into testdb.testtbl (key, val) values($key, 1111) IF NOT EXISTS;" >> init_data.cql
done

while true
do
    echo "truncate the table"
    if cqlsh $node1_ip -e "truncate table testdb.testtbl"; then
        break
    fi
    echo "truncating failed"
done

echo "kill C* process on node3"
pdsh -l $node3_user -R ssh -w $node3_ip \
    "ps auxww | grep CassandraDaemon | awk '{if (\$13 ~ /cassand/) print \$2}' | xargs sudo kill -9"

echo "insert $rows rows"
cqlsh $node1_ip -f init_data.cql > insert_log 2>&1

echo "restart C* process on node3"
pdsh -l $node3_user -R ssh -w $node3_ip "sudo /etc/init.d/cassandra start"

while true
do
    echo "truncate the table again"
    if cqlsh $node1_ip -e "truncate table testdb.testtbl"; then
        break
    fi
    echo "truncating failed"
done

cqlsh $node1_ip --request-timeout 3600 -e "consistency serial; select count(*) from testdb.testtbl;"
sleep 10
cqlsh $node1_ip --request-timeout 3600 -e "consistency serial; select count(*) from testdb.testtbl;"


==== [result] ====
truncate the table
kill C* process on node3
insert 10000 rows
restart C* process on node3
10.91.145.27: Starting Cassandra: OK
truncate the table again
<stdin>:1:TruncateError: Error during truncate: Cannot achieve consistency level ALL
truncating failed
truncate the table again
<stdin>:1:TruncateError: Error during truncate: Cannot achieve consistency level ALL
truncating failed
truncate the table again
<stdin>:1:TruncateError: Error during truncate: Cannot achieve consistency level ALL
truncating failed
truncate the table again
<stdin>:1:TruncateError: Error during truncate: Cannot achieve consistency level ALL
truncating failed
truncate the table again
<stdin>:1:TruncateError: Error during truncate: Cannot achieve consistency level ALL
truncating failed
truncate the table again
<stdin>:1:TruncateError: Error during truncate: Cannot achieve consistency level ALL
truncating failed
truncate the table again
Consistency level set to SERIAL.

 count
-------
   300

(1 rows)

Warnings :
Aggregation query used without partition key

Consistency level set to SERIAL.

 count
-------
  2304

(1 rows)

Warnings :
Aggregation query used without partition key
====
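If it helps to separate the second truncate from node3's startup in the script above, a helper like the one below could be called (for example `wait_for_node "$node1_ip" "$node3_ip"`) just before the second truncate loop. This is only a sketch under my own assumptions: the function names and polling limits are hypothetical, it assumes `nodetool` is on the PATH of the machine running the script, and it assumes the `status` output lists the state (e.g. `UN`) and the address in its first two columns. Note that `UN` only reflects node1's gossip view, so it may not mean that all recovery work on node3 has finished.

```shell
#!/bin/sh

# Hypothetical helper (not part of the original repro): succeeds if the
# given IP appears with state "UN" (Up/Normal) in `nodetool status`
# output read from stdin.
is_node_up() {
    # $1: node IP address to look for
    awk -v ip="$1" '$1 == "UN" && $2 == ip { found = 1 } END { exit !found }'
}

# Hypothetical helper: poll the ring as seen by node $1 until node $2 is
# Up/Normal, checking every 5 seconds for at most 60 attempts (both
# limits are arbitrary assumptions).
wait_for_node() {
    tries=0
    until nodetool -h "$1" status 2>/dev/null | is_node_up "$2"; do
        tries=$((tries + 1))
        if [ "$tries" -ge 60 ]; then
            echo "node $2 did not come up" >&2
            return 1
        fi
        sleep 5
    done
}
```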

I found this while investigating a data loss problem (see the "failure node
rejoin" thread). I'm not sure whether this problem is related to the data loss.

Thanks,
yuji
