Expired columns are removed during compaction, both automatic (minor) and manual (major). The performance drop comes from having a lot of expired columns that have not yet been purged by compaction, as they must still be read and discarded during reads.
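
To illustrate the read cost, here is a toy sketch in Python. It is not Cassandra code: the `Column`, `read_row` and `compact` names are made up for illustration, and it ignores details such as tombstones and gc_grace_seconds. It only models the idea that a read has to scan and discard expired columns until a compaction purges them.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Column:
    """Hypothetical stand-in for a column with an optional TTL."""
    name: str
    value: str
    expires_at: Optional[float]  # None means the column has no TTL

    def is_expired(self, now: float) -> bool:
        return self.expires_at is not None and self.expires_at <= now


def read_row(row: List[Column], now: float) -> Tuple[List[Column], int]:
    """Return the live columns and how many expired ones were scanned and discarded."""
    live = []
    discarded = 0
    for col in row:
        if col.is_expired(now):
            discarded += 1  # wasted work: read from storage, then thrown away
        else:
            live.append(col)
    return live, discarded


def compact(row: List[Column], now: float) -> List[Column]:
    """Toy compaction: rewrite the row without the expired columns."""
    return [col for col in row if not col.is_expired(now)]


# 1000 columns that expired at t=10, plus one column without a TTL.
row = [Column(f"t{i}", "x", expires_at=10.0) for i in range(1000)]
row.append(Column("latest", "y", expires_at=None))

live, discarded = read_row(row, now=20.0)
print(len(live), discarded)  # 1 1000

row = compact(row, now=20.0)
live, discarded = read_row(row, now=20.0)
print(len(live), discarded)  # 1 0
```

Both reads return the same single live column; the difference is how many dead columns had to be skipped to get there.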

Cheers

-----------------
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 3/01/2012, at 10:38 AM, R. Verlangen wrote:

> @Aaron: Small side question, when do columns with a past TTL get removed? On
> a repair, (minor) compaction, or ..? Does it have a performance drop if
> that's happening?
>
> 2012/1/2 aaron morton <[email protected]>
> Even if you had compaction enforcing a limit on the number of columns in a
> row, there would still be issues with concurrent writes at the same time and
> with read-repair, i.e. node a says this is the first n columns but node b
> says something else; you only know who is correct at read time.
>
> Have you considered using a TTL on the columns?
>
> Depending on the use case you could also consider having writes periodically
> or randomly trim the data size, or trim on reads.
>
> It will also make sense to partition the time series data into different
> rows, and Viva la Standard Column Families!
>
> Hope that helps.
>
> -----------------
> Aaron Morton
> Freelance Developer
> @aaronmorton
> http://www.thelastpickle.com
>
> On 25/12/2011, at 7:48 PM, Praveen Baratam wrote:
>
>> Hello Everybody,
>>
>> Happy Christmas.
>>
>> I know that this topic has come up quite a few times on the Dev and User
>> lists but did not culminate in a solution.
>>
>> http://www.mail-archive.com/[email protected]/msg15367.html
>>
>> The above discussion on the User list talks about AbstractCompactionStrategy
>> but I could not find any relevant documentation as it's a fairly new feature
>> in Cassandra.
>>
>> Let me state this necessity and use-case again.
>>
>> I need a ColumnFamily (CF) wide or SuperColumn (SC) wide option to
>> approximately limit the number of columns to "n". "n" can vary a lot and the
>> intention is to throw away stale data and not to maintain any hard limit on
>> the CF or SC. It's very useful for storing time-series data where stale data
>> is not necessary. The goal is to achieve this with minimum overhead and
>> since compaction happens all the time it would be clever to implement it as
>> part of compaction.
>>
>> Thanks in advance.
>>
>> Praveen
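
The suggestion quoted above, partitioning time series data into different rows, could look roughly like the sketch below. This is only an assumption about one way to do it: the `bucketed_row_key` helper, the `series:YYYYMMDD` key layout and the one-row-per-day bucket size are all hypothetical and not taken from the thread.

```python
from datetime import datetime, timezone


def bucketed_row_key(series: str, ts: datetime, fmt: str = "%Y%m%d") -> str:
    """Build a row key that places each time bucket (one UTC day by default) in its own row.

    `ts` is expected to be a timezone-aware datetime.
    """
    bucket = ts.astimezone(timezone.utc).strftime(fmt)
    return f"{series}:{bucket}"


# Two samples on different days end up in different rows.
a = bucketed_row_key("sensor-1", datetime(2012, 1, 2, 23, 59, tzinfo=timezone.utc))
b = bucketed_row_key("sensor-1", datetime(2012, 1, 3, 10, 38, tzinfo=timezone.utc))
print(a)  # sensor-1:20120102
print(b)  # sensor-1:20120103
```

With keys like these, stale data is confined to old rows, so each row stays bounded in size and old buckets can be dropped or left to expire as a whole.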
