On Wed, Jun 22, 2011 at 12:35 PM, Jonathan Colby
<jonathan.co...@gmail.com> wrote:
>
> The way compaction works, "x" same-sized files are merged into a new
> SSTable.  This repeats itself and the SSTables get bigger and bigger.
>
> So what is the upper limit?  If you are not deleting stuff fast enough,
> wouldn't the SSTable sizes grow indefinitely?
>
> I ask because we have some rather large SSTable files (80-100 GB) and I'm 
> starting to worry about future compactions.
>
> Second, compacting such large files is an IO killer.    What can be tuned 
> other than compaction_threshold to help optimize this and prevent the files 
> from getting too big?
>
> Thanks!


Compaction is an iterative process.  It first takes the uncompacted
SSTables, merges several files into one SSTable, and removes
tombstones etc. along the way.  This repeats until you have
"compaction_threshold=X" similarly sized SSTables, at which point
those get re-compacted (merged) together.  The number and size of
SSTables produced by flushes is tuned through the memtable flush
limits: max size, number of records, or time.  Contrary to what you
might believe, having fewer, larger SSTables reduces IO compared to
compacting many small SSTables.  Also, the merge of previously
compacted SSTables is relatively fast.
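
To make the size growth concrete, here is a toy simulation of the
merge rule described above.  It is only a sketch, not Cassandra's
actual code: the function name, the 64 MB flush size and the threshold
of 4 are assumptions for illustration, "similarly sized" is simplified
to "exactly equal", and it assumes no overwrites or tombstones, so a
merged SSTable is as large as the sum of its inputs.

```python
from collections import Counter

def simulate_compaction(flushes, flush_size_mb=64, threshold=4):
    """Toy model: each flush adds one SSTable of flush_size_mb.
    Whenever `threshold` SSTables of equal size exist, they are
    merged into a single SSTable holding all of their data."""
    sstables = []  # sizes in MB
    for _ in range(flushes):
        sstables.append(flush_size_mb)
        merged = True
        while merged:  # one merge can make the next tier eligible
            merged = False
            for size, count in Counter(sstables).items():
                if count >= threshold:
                    for _ in range(threshold):
                        sstables.remove(size)
                    # Assumption: nothing is purged, so sizes add up.
                    sstables.append(size * threshold)
                    merged = True
                    break
    return sorted(sstables)

if __name__ == "__main__":
    for n in (4, 16, 21, 64):
        print(n, "flushes ->", simulate_compaction(n))
```

In this model the largest SSTable grows by a factor of `threshold`
each time a tier fills up, and nothing caps it.  Deletes and
overwrites would make the merged files smaller than the sum, but they
would not introduce a limit either.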

As far as I know, Cassandra will keep compacting SSTables into ever
larger SSTables, with no upper limit on size.  The tunable side of
things is when to flush a memtable to an SSTable, and the number of
similarly sized SSTables that must be present to trigger a compaction.

-Eric
