Thanks for the great explanation. I'd just like some clarification on the last point. Is it the case that if I constantly add new columns to a row, while periodically trimming the row by deleting the oldest columns, the deleted columns won't get cleaned up until all fragments of the row exist in a single sstable and that sstable undergoes a compaction?
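To check that I'm reading the rule correctly, here is a toy Python model of my understanding. It is only a sketch, not Cassandra's actual compaction code: the `compact` function, the dict-based "sstables", and the sample data are all made up for illustration, and it ignores gc_grace_seconds entirely.

```python
# Toy model only: an "sstable" is a dict of row_key -> {column: (timestamp, value)}.
# A value of None stands for a tombstone. gc_grace_seconds is ignored.

def compact(selected, others):
    """Merge the selected sstables into one, purging tombstones only for
    rows that have no fragment in any sstable outside the compaction."""
    merged = {}
    for sstable in selected:
        for row_key, columns in sstable.items():
            row = merged.setdefault(row_key, {})
            for name, (ts, value) in columns.items():
                # Keep the newest version of each column.
                if name not in row or ts > row[name][0]:
                    row[name] = (ts, value)

    for row_key in list(merged):
        if any(row_key in sstable for sstable in others):
            continue  # fragments exist elsewhere, so tombstones are kept
        live = {n: c for n, c in merged[row_key].items() if c[1] is not None}
        if live:
            merged[row_key] = live
        else:
            del merged[row_key]
    return merged


# One row spread over three files; the newest file deletes column c1.
old = {"row1": {"c1": (1, "a")}}
mid = {"row1": {"c2": (2, "b")}}
new = {"row1": {"c1": (3, None), "c3": (3, "c")}}

partial = compact([mid, new], others=[old])  # old file left out
full = compact([old, mid, new], others=[])   # every fragment included
print(partial)
print(full)
```

In this model the tombstone for c1 survives the partial compaction because a fragment of the row still sits in the old file, and it only disappears once all three files are compacted together.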
If my understanding is correct, do you know if 1.2 will enable cleanup of columns in rows that have scattered fragments? Or should I take a different approach?

On Thu, Dec 13, 2012 at 5:52 PM, aaron morton <[email protected]> wrote:

> Is it possible to use scrub to accelerate the clean up of expired/deleted
> data?
>
> No.
> Scrub and upgradesstables are used to re-write each file on disk. Scrub
> may remove some rows from a file because of corruption; however,
> upgradesstables will not.
>
> If you have long-lived rows and a mixed workload of writes and deletes,
> there are a couple of options.
>
> You can try levelled compaction:
> http://www.datastax.com/dev/blog/when-to-use-leveled-compaction
>
> You can tune the default size-tiered compaction by increasing the
> min_compaction_threshold. This will increase the number of files that must
> exist in each size tier before it will be compacted. As a result, the speed
> at which rows move into the higher tiers will slow down.
>
> Note that having lots of files may have a negative impact on read
> performance. You can measure this by looking at the SSTables per read
> metric in cfhistograms.
>
> Lastly, you can run a user-defined or major compaction. User-defined
> compaction is available via JMX and allows you to compact any file you
> want. Manual / major compaction is available via nodetool. We usually
> discourage its use, as it will create one big file that will not get
> compacted for a while.
>
> For background, the tombstones / expired columns for a row are only purged
> from the database when all fragments of the row are in the files being
> compacted. So if you have an old row that is spread out over many files, it
> may not get purged.
>
> Hope that helps.
>
> -----------------
> Aaron Morton
> Freelance Cassandra Developer
> New Zealand
>
> @aaronmorton
> http://www.thelastpickle.com
>
> On 14/12/2012, at 3:01 AM, Mike Smith <[email protected]> wrote:
>
> I'm using 1.0.12 and I find that large sstables tend to get compacted
> infrequently. I've got data that gets deleted or expired frequently. Is it
> possible to use scrub to accelerate the clean up of expired/deleted data?
>
> --
> Mike Smith
> Director Development, MailChannels

--
Mike Smith
Director Development, MailChannels
