They are removed during compaction, both automatic (minor) and manual (major).

The performance drop comes from having a lot of expired columns that have not 
yet been purged by compaction, as they must be read and discarded during reads. 
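
To illustrate the point, here is a minimal toy sketch in Python. It is not 
Cassandra code: the names Column, read_live and compact are made up for this 
example, and it ignores details such as tombstones and the grace period. It 
only shows that a read has to scan and discard expired columns until a 
compaction has rewritten the row without them.

```python
from dataclasses import dataclass


@dataclass
class Column:
    name: str
    value: str
    expires_at: float  # absolute time (seconds) at which the TTL runs out


def read_live(columns, now):
    """Return the live columns and the number of expired ones discarded."""
    live = [c for c in columns if c.expires_at > now]
    return live, len(columns) - len(live)


def compact(columns, now):
    """Rewrite the row without expired columns (hypothetical purge step)."""
    return [c for c in columns if c.expires_at > now]


# Ten columns expiring at t = 0, 1, ..., 9; we read at t = 7.5.
row = [Column(f"t{i}", "x", expires_at=float(i)) for i in range(10)]
now = 7.5

live, discarded = read_live(row, now)
print(len(live), discarded)  # 2 live columns, 8 expired ones read and discarded

row = compact(row, now)
live, discarded = read_live(row, now)
print(len(live), discarded)  # 2 live columns, nothing left to discard
```

Before the purge the read touches all ten columns to return two; afterwards 
it touches only the two live ones.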

Cheers

-----------------
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 3/01/2012, at 10:38 AM, R. Verlangen wrote:

> @Aaron: Small side question, when do columns with a past TTL get removed? On 
> a repair, (minor) compaction, or .. ? Does it have a performance drop if 
> that's happening?
> 
> 2012/1/2 aaron morton <[email protected]>
> Even if you had compaction enforcing a limit on the number of columns in a 
> row, there would still be issues with concurrent writes at the same time and 
> with read-repair. i.e. node a says this is the first n columns but node b 
> says something else, you only know who is correct at read time.
> 
> Have you considered using a TTL on the columns ? 
> 
> Depending on the use case you could also consider having writes periodically or 
> randomly trim the data size, or trim on reads. 
> 
> It will also make sense to partition the time series data into different 
> rows, and Viva la Standard Column Families!
> 
> Hope that helps. 
>  
> -----------------
> Aaron Morton
> Freelance Developer
> @aaronmorton
> http://www.thelastpickle.com
> 
> On 25/12/2011, at 7:48 PM, Praveen Baratam wrote:
> 
>> Hello Everybody,
>> 
>> Happy Christmas.
>> 
>> I know that this topic has come up quite a few times on Dev and User lists 
>> but did not culminate in a solution.
>> 
>> http://www.mail-archive.com/[email protected]/msg15367.html
>> 
>> The above discussion on User list talks about AbstractCompactionStrategy but 
>> I could not find any relevant documentation as it's a fairly new feature in 
>> Cassandra.
>> 
>> Let me state this necessity and use-case again.
>> 
>> I need a ColumnFamily (CF) wide or SuperColumn (SC) wide option to 
>> approximately limit the number of columns to "n". "n" can vary a lot and the 
>> intention is to throw away stale data and not to maintain any hard limit on 
>> the CF or SC. It's very useful for storing time-series data where stale data 
>> is not necessary. The goal is to achieve this with minimum overhead and 
>> since compaction happens all the time it would be clever to implement it as 
>> part of compaction.
>> 
>> Thanks in advance.
>> 
>> Praveen
> 
> 
