[ https://issues.apache.org/jira/browse/CASSANDRA-1881?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jonathan Ellis resolved CASSANDRA-1881.
---------------------------------------

    Resolution: Won't Fix

Concurrent compactions was added in CASSANDRA-2191. I see small benefit (and a lot of complexity) to be gained by rewriting to basically a pool of async compaction threads.

> support concurrent "tiered" compaction
> --------------------------------------
>
>                 Key: CASSANDRA-1881
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1881
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Peter Schuller
>            Priority: Minor
>
> (this has been discussed on the ML:s before; I am filing it now so that there
> is a ticket to refer to on the wiki)
>
> CASSANDRA-1876 is open to allow parallel compaction for the purpose of
> throughput. However, that only addresses one aspect of why parallel
> compaction is useful; the other half is ensuring that compaction is
> proceeding in a timely fashion at each "size tier" (for lack of a better
> term).
>
> Essentially, CASSANDRA-1876 is about CPU concurrency while this is about
> functional concurrency. I propose that compaction be a process which performs
> some amount of compaction work per second (I'm thinking ahead to future rate
> limiting; that's another ticket to be filed). That work has to be spread out
> over multiple compaction tiers in a way that is not coupled with CPU
> concurrency.
>
> Suggested solution is to have N number of concurrent compaction threads going
> at any given moment (CASSANDRA-1876), but to have those compaction threads
> perform work for a variable number of compaction jobs. Compactions would be
> triggered according to similarly sized sstables as before, but each such
> compaction would be a compaction "job" that is independent of any actual
> compaction thread.
>
> Compaction threads move between compaction jobs at a coarse granularity so
> that synchronization overhead is irrelevant (for example it might go and look
> for other work to do every memtable_throughput_in_mb megabytes). Smaller
> compaction jobs take priority over larger jobs. This is intended to keep
> sstable counts down, and always leave the larger jobs as the ones having to
> wait given that they are not latency sensitive anyway due to their size.
>
> The primary downside is that disk usage spikes would much more easily reach
> "double cf size" levels when many compactions are running. This is probably
> something that can be mitigated by CASSANDRA-1608 with its talk of limited
> sstable sizes.
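Since the ticket was closed as Won't Fix, Cassandra never shipped this scheduler. What follows is only a minimal Java sketch of the proposed idea under stated assumptions, not Cassandra code. A fixed pool of worker threads shares a priority queue of compaction jobs ordered by job size, smallest first. Each worker does one bounded chunk of work on the smallest queued job and re-queues the job if it is unfinished, so threads move between jobs only at coarse boundaries. All names are hypothetical: `TieredCompactionSketch`, `Job`, `run`, and `chunkBytes`. `chunkBytes` stands in for the "every memtable_throughput_in_mb megabytes" granularity from the proposal, and "work" here only decrements a byte counter instead of merging sstables.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.PriorityBlockingQueue;

// Hypothetical sketch of the CASSANDRA-1881 proposal; not part of Cassandra.
public class TieredCompactionSketch {

    /** Hypothetical compaction job: tracks sizes only, does no real merging. */
    static final class Job implements Comparable<Job> {
        final String name;
        final long totalBytes;   // fixed job size, used as the priority key
        long remainingBytes;     // only touched by the one worker currently holding the job

        Job(String name, long totalBytes) {
            this.name = name;
            this.totalBytes = totalBytes;
            this.remainingBytes = totalBytes;
        }

        // Smaller jobs sort first, so small tiers are not stuck behind big ones.
        @Override
        public int compareTo(Job other) {
            return Long.compare(totalBytes, other.totalBytes);
        }
    }

    /**
     * Runs all jobs on `threads` workers. Each worker takes the smallest queued job,
     * does at most `chunkBytes` of work on it, and re-queues it if unfinished, so
     * priorities are re-checked only at chunk boundaries. Returns names in completion order.
     */
    static List<String> run(List<Job> jobs, int threads, long chunkBytes) throws InterruptedException {
        PriorityBlockingQueue<Job> queue = new PriorityBlockingQueue<>(jobs);
        List<String> completed = new CopyOnWriteArrayList<>();
        Runnable worker = () -> {
            Job job;
            // Simplification: a worker exits when the queue is momentarily empty;
            // any job still held by another worker is finished by that worker.
            while ((job = queue.poll()) != null) {
                long chunk = Math.min(chunkBytes, job.remainingBytes);
                job.remainingBytes -= chunk;  // stand-in for merging `chunk` bytes of sstables
                if (job.remainingBytes > 0) {
                    queue.add(job);           // back in line; smaller jobs get picked first
                } else {
                    completed.add(job.name);
                }
            }
        };
        Thread[] pool = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            pool[i] = new Thread(worker, "compaction-" + i);
            pool[i].start();
        }
        for (Thread t : pool) {
            t.join();
        }
        return completed;
    }

    public static void main(String[] args) throws InterruptedException {
        List<Job> jobs = List.of(new Job("large", 1000), new Job("small", 10), new Job("medium", 100));
        System.out.println(run(jobs, 1, 64)); // [small, medium, large]
    }
}
```

With a single worker this reduces to smallest-job-first, so the example prints `[small, medium, large]`. With several workers, re-queueing after every chunk is what lets a small job queued later overtake a large job at its next chunk boundary. That is the "functional concurrency" the proposal separates from CPU concurrency (the thread count).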