Wei Deng created CASSANDRA-12200:
------------------------------------

             Summary: Backlogged compactions can make repair on trivially small
tables wait a long time to finish
                 Key: CASSANDRA-12200
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-12200
             Project: Cassandra
          Issue Type: Bug
          Components: Core
            Reporter: Wei Deng


In C* 3.0 we started to use incremental repair by default. However, this seems
to create a repair performance problem if you have a relatively write-heavy
workload that keeps all available concurrent_compactors busy with active
compactions.
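
For context, the cap on simultaneous compactions comes from the
concurrent_compactors setting in cassandra.yaml. A minimal sketch of the
relevant setting (the value 2 matches the default observed in this test; the
commented default rule paraphrases the stock cassandra.yaml):

{noformat}
# cassandra.yaml
# Number of simultaneous compactions allowed to run at once (validation
# compactions are excluded; they run on their own executor). Defaults to
# the smaller of (number of disks, number of cores), with a minimum of 2.
concurrent_compactors: 2
{noformat}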

I was able to reproduce this issue with the following scenario:

1. On a three-node C* 3.0.7 cluster, use "cassandra-stress write n=100000000"
to generate 100GB of data in the keyspace1.standard1 table using LCS (ctrl+c
the stress client once the data size on each node reaches 35+GB).
2. At this point there will be hundreds of L0 SSTables on each node waiting
for LCS to digest, and with concurrent_compactors left at its default of 2,
both compaction threads are constantly busy processing the backlogged L0
SSTables.
3. Now create a new keyspace called "trivial_ks" with RF=3, create a small
two-column CQL table in it, and insert 6 records (a CQL sketch is shown after
the repair output below).
4. Start a "nodetool repair trivial_ks" session on one of the nodes, and watch 
the following behavior:

{noformat}
automaton@wdengdse50google-98425b985-3:~$ nodetool repair trivial_ks
[2016-07-13 01:57:28,364] Starting repair command #1, repairing keyspace 
trivial_ks with repair options (parallelism: parallel, primary range: false, 
incremental: true, job threads: 1, ColumnFamilies: [], dataCenters: [], hosts: 
[], # of ranges: 3)
[2016-07-13 01:57:31,027] Repair session 27212dd0-489d-11e6-a6d6-cd06faa0aaa2 
for range [(3074457345618258602,-9223372036854775808], 
(-9223372036854775808,-3074457345618258603], 
(-3074457345618258603,3074457345618258602]] finished (progress: 66%)
[2016-07-13 02:07:47,637] Repair completed successfully
[2016-07-13 02:07:47,657] Repair command #1 finished in 10 minutes 19 seconds
{noformat}
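
For step 3 the schema was intentionally trivial. A minimal CQL sketch of what
was created (the table name weitest comes from the debug.log below; column
names and types are assumptions):

{noformat}
CREATE KEYSPACE trivial_ks
  WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 3};

-- two-column table; the k/v names and types are illustrative
CREATE TABLE trivial_ks.weitest (k int PRIMARY KEY, v text);

-- 6 records, per the repro steps
INSERT INTO trivial_ks.weitest (k, v) VALUES (1, 'a');
INSERT INTO trivial_ks.weitest (k, v) VALUES (2, 'b');
INSERT INTO trivial_ks.weitest (k, v) VALUES (3, 'c');
INSERT INTO trivial_ks.weitest (k, v) VALUES (4, 'd');
INSERT INTO trivial_ks.weitest (k, v) VALUES (5, 'e');
INSERT INTO trivial_ks.weitest (k, v) VALUES (6, 'f');
{noformat}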

For such a small table, the repair took more than 10 minutes to finish.
Looking at debug.log for this particular repair session UUID, you will find
that all nodes passed through validation compaction within 15ms, but one node
got stuck waiting for a compaction slot: it has to run an anti-compaction step
before it can tell the initiating node that its part of the repair session is
done, and it took 10+ minutes for a compaction slot to free up, as shown in
the following debug.log entries:

{noformat}
DEBUG [AntiEntropyStage:1] 2016-07-13 01:57:30,956  
RepairMessageVerbHandler.java:149 - Got anticompaction request 
AnticompactionRequest{parentRepairSession=27103de0-489d-11e6-a6d6-cd06faa0aaa2} 
org.apache.cassandra.repair.messages.AnticompactionRequest@34449ff4
<...>
<snip>
<...>
DEBUG [CompactionExecutor:5] 2016-07-13 02:07:47,506  CompactionTask.java:217 - 
Compacted (286609e0-489d-11e6-9e03-1fd69c5ec46c) 32 sstables to 
[/var/lib/cassandra/data/keyspace1/standard1-9c02e9c1487c11e6b9161dbd340a212f/mb-499-big,]
 to level=0.  2,892,058,050 bytes to 2,874,333,820 (~99% of original) in 
616,880ms = 4.443617MB/s.  0 total partitions merged to 12,233,340.  Partition 
merge counts were {1:12086760, 2:146580, }
INFO  [CompactionExecutor:5] 2016-07-13 02:07:47,512  
CompactionManager.java:511 - Starting anticompaction for trivial_ks.weitest on 
1/[BigTableReader(path='/var/lib/cassandra/data/trivial_ks/weitest-538b07d1489b11e6a9ef61c6ff848952/mb-1-big-Data.db')]
 sstables
INFO  [CompactionExecutor:5] 2016-07-13 02:07:47,513  
CompactionManager.java:540 - SSTable 
BigTableReader(path='/var/lib/cassandra/data/trivial_ks/weitest-538b07d1489b11e6a9ef61c6ff848952/mb-1-big-Data.db')
 fully contained in range (-9223372036854775808,-9223372036854775808], mutating 
repairedAt instead of anticompacting
INFO  [CompactionExecutor:5] 2016-07-13 02:07:47,570  
CompactionManager.java:578 - Completed anticompaction successfully
{noformat}
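
While the repair session is blocked like this, the compaction backlog is
observable with the standard tooling, e.g.:

{noformat}
nodetool compactionstats
{noformat}

which reports the pending-task count and the compactions currently holding the
concurrent_compactors slots; the trivial_ks anticompaction should not appear
as active until one of the backlogged L0 compactions finishes and frees a
slot.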

Since validation compaction has its own threads outside of the regular
compaction thread pool restricted by concurrent_compactors, the validation
phase completed without any issue. If we treated anti-compaction the same way
(i.e., gave it its own thread pool), we could avoid this kind of repair
performance problem.
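
To make the proposal concrete, here is a minimal Java sketch of the idea,
assuming anti-compaction tasks are currently submitted to the shared pool
sized by concurrent_compactors. This is not the actual CompactionManager code;
all names below are illustrative:

{noformat}
// Sketch only -- not actual Cassandra code; class/field names are assumptions.
// Validation compaction already runs on a dedicated executor and therefore
// never waits on concurrent_compactors; the proposal is to give
// anti-compaction the same treatment.
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class AntiCompactionExecutorSketch
{
    // Shared pool, sized by concurrent_compactors; backlogged L0 compactions
    // can hold every thread here for minutes at a time.
    private final ExecutorService compactionExecutor = Executors.newFixedThreadPool(2);

    // Proposed: a dedicated pool, analogous to the validation executor.
    private final ExecutorService antiCompactionExecutor = Executors.newFixedThreadPool(1);

    public Future<?> submitAntiCompaction(Runnable antiCompactionTask)
    {
        // Today the task queues behind regular compactions on compactionExecutor;
        // on its own pool it can start immediately.
        return antiCompactionExecutor.submit(antiCompactionTask);
    }
}
{noformat}

With that separation, the anticompaction request in the log above (which
merely mutated repairedAt because the SSTable was fully contained in the
range, and took ~60ms once it got a slot) would have completed right away
instead of waiting 10+ minutes.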


