Hey guys,
We're having a very strange issue: deleted columns get resurrected when
"repair" is run on a node.
Some info about the setup: Cassandra 2.0.13, multi-datacenter, with 12 nodes
in one datacenter and 6 nodes in the other. Schema:
cqlsh> describe keyspace blackbook;
CREATE KEYSPACE blackbook WITH replication = {
  'class': 'NetworkTopologyStrategy',
  'IAD': '3',
  'ORD': '3'
};
USE blackbook;
CREATE TABLE bounces (
  domainid text,
  address text,
  message text,
  "timestamp" bigint,
  PRIMARY KEY (domainid, address)
) WITH
  bloom_filter_fp_chance=0.100000 AND
  caching='KEYS_ONLY' AND
  comment='' AND
  dclocal_read_repair_chance=0.100000 AND
  gc_grace_seconds=864000 AND
  index_interval=128 AND
  read_repair_chance=0.000000 AND
  populate_io_cache_on_flush='false' AND
  default_time_to_live=0 AND
  speculative_retry='99.0PERCENTILE' AND
  memtable_flush_period_in_ms=0 AND
  compaction={'class': 'LeveledCompactionStrategy'} AND
  compression={'sstable_compression': 'LZ4Compressor'};
We're using wide rows for the "bounces" table, which can store hundreds of
thousands of addresses for each "domainid" (usually far fewer in practice,
but some rows may contain up to several million columns).
All queries are done using LOCAL_QUORUM consistency. Sometimes bounces are
deleted from the table using the following CQL3 statement:
delete from bounces where domainid = 'domain.com' and address = '[email protected]';
But the thing is, after "repair" is run on any node that owns the "domain.com"
key, the column is resurrected on all nodes, as if the tombstone had
disappeared. We checked this multiple times using cqlsh: issue a delete
statement and verify that the data is not returned; then run "repair", and
the deleted data is returned again.
Our gc_grace_seconds is at the default value (864000 seconds, i.e. 10 days),
and no node was ever down for anywhere close to 10 days, so that doesn't look
related. We also made sure all our servers are running ntpd, so time
synchronization should not be an issue either.
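To spell out the gc_grace_seconds reasoning: as we understand it, a tombstone
only becomes eligible for removal by compaction once gc_grace_seconds have
passed since the delete. Below is a minimal Python sketch of that arithmetic.
The helper names and the deletion time are made up for illustration; this is
not Cassandra code, just a toy model of the rule under that assumption.

```python
from datetime import datetime, timedelta, timezone

# Value taken from the table definition above.
GC_GRACE_SECONDS = 864000


def purgeable_after(deleted_at, gc_grace_seconds=GC_GRACE_SECONDS):
    """Time after which compaction may drop a tombstone written at deleted_at.

    Hypothetical helper: models the gc_grace rule only, nothing else.
    """
    return deleted_at + timedelta(seconds=gc_grace_seconds)


def is_purgeable(deleted_at, now, gc_grace_seconds=GC_GRACE_SECONDS):
    """True once the grace period has fully elapsed (toy model)."""
    return now > purgeable_after(deleted_at, gc_grace_seconds)


# 864000 seconds expressed as a timedelta, to confirm the "10 days" figure.
grace = timedelta(seconds=GC_GRACE_SECONDS)

# Made-up deletion time, purely for illustration.
deleted_at = datetime(2015, 3, 1, 12, 0, tzinfo=timezone.utc)

one_hour_later = deleted_at + timedelta(hours=1)
eleven_days_later = deleted_at + timedelta(days=11)

print(grace.days)
print(is_purgeable(deleted_at, one_hour_later))
print(is_purgeable(deleted_at, eleven_days_later))
```

In our case "repair" is run within minutes of the delete, which is far inside
the grace period, so under this model the tombstone should still be present.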
Have you guys ever seen anything like this, or have any idea what may be
causing this behavior? What could make a tombstone disappear during a
"repair" operation?
Thanks for your help. Let me know if I can provide more information.
Roman