Hi,
I am using Cassandra 2.0.3 and we use STCS for all CFs. We recently faced an issue where the sstable count of certain CFs went into the THOUSANDS. We realized that every week, when "repair -pr" ran on each node, it created 50+ tiny sstables of around 1 KB. These sstables were never compacted during minor compactions, so the sstable count kept increasing with each repair.
Our Root Cause Analysis is as follows: We were writing to 5 CFs simultaneously, and one CF had 3 secondary indexes. Our memtable_flush_writers was set to the default of 1, which created a bottleneck for writes across all CFs and led to DROPPED mutations. As a result, many vnode ranges were inconsistent at the time of every weekly repair. While repairing the inconsistent data, each "repair -pr" run created 50+ tiny sstables on the nodes.

Why weren't the tiny sstables created during repair compacted? 2.0.3 has a known issue ( https://issues.apache.org/jira/browse/CASSANDRA-6483 ) where cold sstables are not compacted even if you don't specify cold_reads_to_omit or set it to zero. We think this prevented these tiny sstables from participating in minor compactions.
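For context, cold_reads_to_omit is a per-table STCS subproperty; a minimal sketch of how it can be set (keyspace/table names are placeholders), although on 2.0.3 the linked bug means even a value of 0 did not bring cold sstables back into minor compactions for us:

    ALTER TABLE my_keyspace.my_table
    WITH compaction = { 'class': 'SizeTieredCompactionStrategy',
                        'cold_reads_to_omit': 0.0 };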
Moreover, we had long GC pauses leading to nodes being marked down. 

Our Fix:
1. Increased memtable_flush_writers to 3 so that mutations are no longer dropped and data stays consistent. With most of the data consistent, the usual repair should not create tiny sstables to repair vnode ranges.
2. Executed a major compaction on all CFs where the sstable count was in the thousands, to bring the situation under control.
3. Applied compaction throttling, reduced memtable_total_space_in_mb, and tuned the JVM to prevent long GC pauses (the relevant cassandra.yaml settings are sketched below).
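For reference, these are the kinds of cassandra.yaml changes involved; the values shown are illustrative placeholders, not a recommendation:

    # cassandra.yaml (2.0.x) - illustrative values only
    memtable_flush_writers: 3             # raised from the default of 1
    memtable_total_space_in_mb: 1024      # reduced to ease heap pressure
    compaction_throughput_mb_per_sec: 16  # throttle compaction I/O as needed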
Queries:
1. We have observed that after increasing memtable_flush_writers, throttling compaction, and tuning the JVM, the tiny sstables that were not getting compacted during repairs started participating in compactions, and the sstable count for a few CFs dropped considerably after repair (even though not all tiny sstables were compacted). Per our RCA, we understood that the tiny sstables created with every repair were not being compacted due to COLDNESS. What led to the compaction of these tiny sstables now? How did our changes affect minor compactions? Is there any gap in our Root Cause Analysis?
2. We thought that the tombstone_threshold compaction subproperty would help us after major compactions. This property should ensure that even if we have one huge sstable, once the 20% tombstone threshold is reached, the sstable gets compacted on its own and tombstones older than gc_grace_seconds are dropped (even if there are no similar-sized sstables, as STCS normally requires). But in our initial testing, a single huge sstable is not getting compacted even after we drop all rows in it and gc_grace_seconds has passed. Why is tombstone_threshold behaving like this? (A sketch of the table settings we used follows below.)
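
For reference, a minimal sketch of the kind of ALTER TABLE used in our testing (keyspace/table name and values are placeholders):

    ALTER TABLE my_keyspace.my_table
    WITH compaction = { 'class': 'SizeTieredCompactionStrategy',
                        'tombstone_threshold': 0.2 }
    AND gc_grace_seconds = 864000;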


Thanks
Anuj Wadehra

Summary of the issue and open questions:
1. High sstable count.
2. Damaged vnodes and the coldness issue.
3. Our solution: we tuned the database to stop dropped mutations (and thus the damaged vnodes that were creating small sstables during repair) and ran a major compaction.

How do we deal with the side effects of major compaction?
1. The tombstone_threshold compaction is not triggered (STCS never guarantees that a read goes through one sstable anyway).
2. If coldness was the issue, why did the data sync during repair lead to compaction of cold sstables now? Is data read during repair counted as a read, so that the data becomes hot and gets compacted?
