[ 
https://issues.apache.org/jira/browse/CASSANDRA-1881?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Ellis resolved CASSANDRA-1881.
---------------------------------------

    Resolution: Won't Fix

Concurrent compaction was added in CASSANDRA-2191.  I see little benefit (and a 
lot of complexity) to be gained by rewriting it as, essentially, a pool of async 
compaction threads.
                
> support concurrent "tiered" compaction
> --------------------------------------
>
>                 Key: CASSANDRA-1881
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1881
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Peter Schuller
>            Priority: Minor
>
> (This has been discussed on the mailing lists before; I am filing it now so 
> that there is a ticket to refer to on the wiki.)
> CASSANDRA-1876 is open to allow parallel compaction for the purpose of 
> throughput. However, that only addresses one aspect of why parallel 
> compaction is useful; the other half is ensuring that compaction is 
> proceeding in a timely fashion at each "size tier" (for lack of a better 
> term).
> Essentially, CASSANDRA-1876 is about CPU concurrency while this is about 
> functional concurrency. I propose that compaction be a process which performs 
> some amount of compaction work per second (I'm thinking ahead to future rate 
> limiting; that's another ticket to be filed). That work has to be spread out 
> over multiple compaction tiers in a way that is not coupled with CPU 
> concurrency.
> The suggested solution is to have N concurrent compaction threads running 
> at any given moment (CASSANDRA-1876), but to have those compaction threads 
> perform work for a variable number of compaction jobs. Compactions would be 
> triggered according to similarly sized sstables as before, but each such 
> compaction would be a compaction "job" that is independent of any actual 
> compaction thread.
> Compaction threads move between compaction jobs at a coarse granularity so 
> that synchronization overhead is irrelevant (for example, a thread might look 
> for other work to do every memtable_throughput_in_mb megabytes). Smaller 
> compaction jobs take priority over larger jobs. This is intended to keep 
> sstable counts down, and always leave the larger jobs as the ones having to 
> wait given that they are not latency sensitive anyway due to their size.
> The primary downside is that disk usage spikes would much more easily reach 
> "double cf size" levels when many compactions are running. This is probably 
> something that can be mitigated by CASSANDRA-1608 with its talk of limited 
> sstable sizes.
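
The scheduling scheme proposed in the quoted description can be illustrated 
with a small sketch. This is not Cassandra code and does not reflect how 
CASSANDRA-2191 works. It is a single-threaded simulation in which the names 
CompactionJob, run_compactions, num_threads and chunk_mb are all hypothetical. 
It assumes that "smaller jobs take priority" means ordering by remaining work, 
and that chunk_mb plays the role of the coarse switching granularity (the 
description suggests something like memtable_throughput_in_mb).

```python
import heapq
from dataclasses import dataclass, field


@dataclass(order=True)
class CompactionJob:
    # Hypothetical job record: ordering uses remaining work only,
    # so the smallest pending job is always at the top of the heap.
    remaining_mb: int
    name: str = field(compare=False)


def run_compactions(jobs, num_threads, chunk_mb):
    """Simulate num_threads workers shared across many compaction jobs.

    In each round every worker claims the smallest pending job, performs
    up to chunk_mb of work on it, and re-queues it if unfinished. Returns
    the job names in the order they complete.
    """
    queue = list(jobs)
    heapq.heapify(queue)
    finished = []
    while queue:
        # Each simulated worker claims a distinct job for this round.
        claimed = [heapq.heappop(queue)
                   for _ in range(min(num_threads, len(queue)))]
        for job in claimed:
            job.remaining_mb -= min(chunk_mb, job.remaining_mb)
            if job.remaining_mb == 0:
                finished.append(job.name)
            else:
                # Unfinished work goes back in the queue; the worker is
                # free to pick up a smaller job in the next round.
                heapq.heappush(queue, job)
    return finished


jobs = [
    CompactionJob(640, "large"),
    CompactionJob(64, "small"),
    CompactionJob(128, "medium"),
]
print(run_compactions(jobs, num_threads=2, chunk_mb=64))
# prints ['small', 'medium', 'large']
```

With two workers, the large job still makes progress while the small and 
medium jobs are pending, but it is always the one left waiting when a 
smaller job needs a worker, which matches the priority rule in the 
description.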
