>  Is it possible to use scrub to accelerate the clean up of expired/deleted 
> data?
No.
Scrub and upgradesstables both re-write each file on disk. Scrub may remove 
some rows from a file because of corruption; upgradesstables will not. Neither 
one purges deleted or expired data ahead of normal compaction. 

If you have long-lived rows and a mixed workload of writes and deletes there 
are a couple of options. 

You can try levelled compaction, described at 
http://www.datastax.com/dev/blog/when-to-use-leveled-compaction
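
On the 1.0.x line you would switch a column family over with cassandra-cli. A 
rough sketch, with MyKeyspace / MyCF as placeholder names (check the option 
names and map syntax against your version before running it): 

  use MyKeyspace;
  update column family MyCF
    with compaction_strategy = 'LeveledCompactionStrategy'
    and compaction_strategy_options = {sstable_size_in_mb: 10};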

You can tune the default size-tiered compaction by increasing the 
min_compaction_threshold. This increases the number of files that must exist 
in each size tier before that tier is compacted, so rows will move into the 
higher tiers more slowly. 
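
You can change the thresholds on a live node with nodetool, e.g. raising the 
minimum from the default of 4 to 8 while leaving the maximum at 32 (keyspace 
and column family names are placeholders): 

  nodetool -h 127.0.0.1 setcompactionthreshold MyKeyspace MyCF 8 32

Note you may also need to make the same change via the schema for it to 
survive a restart. 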

Note that having lots of files may have a negative impact on read performance. 
You can measure this by looking at the SSTables-per-read metric in 
cfhistograms. 
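
For example (placeholder names again): 

  nodetool -h 127.0.0.1 cfhistograms MyKeyspace MyCF

The SSTables column shows the distribution of how many sstables were touched 
per read; if that count is creeping up, the extra files are hurting you. 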

Lastly you can run a user defined or major compaction. User defined compaction 
is available via JMX and allows you to compact any files you want. Manual / 
major compaction is available via nodetool. We usually discourage its use as 
it will create one big file that will not get compacted again for a long time. 
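
As a sketch of both: major compaction is just 

  nodetool -h 127.0.0.1 compact MyKeyspace MyCF

and for user defined compaction, in the 1.0 line the operation is 
forceUserDefinedCompaction on the org.apache.cassandra.db:type=CompactionManager 
MBean (verify the name and signature in jconsole for your version). With a 
generic JMX CLI such as jmxterm it looks roughly like this (the sstable file 
name below is made up): 

  java -jar jmxterm.jar -l localhost:7199
  bean org.apache.cassandra.db:type=CompactionManager
  run forceUserDefinedCompaction MyKeyspace MyCF-hc-1234-Data.db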


For background: the tombstones / expired columns for a row are only purged 
from the database when all fragments of the row are in the files being 
compacted (and, for tombstones, once gc_grace_seconds has passed). So if you 
have an old row that is spread out over many files it may not get purged. 

Hope that helps. 



-----------------
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 14/12/2012, at 3:01 AM, Mike Smith <m...@mailchannels.com> wrote:

> I'm using 1.0.12 and I find that large sstables tend to get compacted 
> infrequently. I've got data that gets deleted or expired frequently. Is it 
> possible to use scrub to accelerate the clean up of expired/deleted data?
> 
> -- 
> Mike Smith
> Director Development, MailChannels
> 
