Thanks Sylvain

Can you please point us to the interface that should be implemented in order to 
write our own custom compaction? And how is it supposed to be configured?

-----Original Message-----
From: Sylvain Lebresne [mailto:sylv...@datastax.com] 
Sent: Tuesday, July 19, 2011 11:40 AM
To: user@cassandra.apache.org
Subject: Re: How to keep only exactly column of key

On Tue, Jul 19, 2011 at 10:15 AM, Lior Golan <lio...@taboola.com> wrote:
> Can't this capping be done (approximately) during compaction?
> Something like:
>
> 1.       Ability to define for a column family that it's a "capped 
> collection" with at most N columns per row
>
> 2.       During write - just add the column
>
> 3.       During reads - get a slice with the most recent / top N 
> columns (in terms of column order)
>
> 4.       During compaction - if the number of columns in the row is 
> more than N, trim it to the top N columns (by replacing the rest of 
> the columns with a tombstone in the compacted row)
>
> Since I guess the purpose of this is for automated cleanup, and not 
> for enforcing exactly N columns, I think this would be sufficient

The problem with that is that we cannot enforce this on the query side.
Or more precisely, returning the first N columns is fine, but what about 
queries like "M columns starting from 'b'"? Or columns by name?
We cannot do those efficiently while enforcing that we won't return any columns 
after the N first ones. The only solution would be to always query the first N 
ones and then filter afterwards, but that's not efficient.
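To illustrate the point, here is a rough Python sketch (a toy model, not Cassandra code). The row is modelled as a sorted list of column names, and the function names and the `cap` parameter are made up for the example. Without a cap, a slice can seek straight to the start column; with a cap, the read has to walk the first `cap` columns and filter them.

```python
from bisect import bisect_left

def slice_from(columns, start, count):
    """Uncapped slice: seek straight to `start` and read `count` columns."""
    i = bisect_left(columns, start)
    return columns[i:i + count]

def capped_slice_from(columns, start, count, cap):
    """Capped slice: read the first `cap` columns, then filter them."""
    visible = columns[:cap]  # always pays for `cap` columns, whatever `start` is
    return [c for c in visible if c >= start][:count]

row = ["a", "b", "c", "d", "e", "f"]      # columns in comparator order
print(slice_from(row, "b", 3))            # ['b', 'c', 'd']
print(capped_slice_from(row, "b", 3, 3))  # ['b', 'c']
```

The second call returns fewer columns because 'd' falls outside the first 3, and the only way to know that is to count from the start of the row.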

What I mean here is that it is hard to add that as a column family option given 
the limitation it would entail. That being said, 1.0 will add pluggable 
compaction (it's already in trunk) and it will be very easy to have a 
compaction that just drops columns after the first N. It would then be on the 
client side to deal with the possibility of getting more than the first N ones, but 
as you said, if it is for automated cleanup, that will be enough.
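As a rough idea of what such a compaction would do, here is a toy Python model. It does not use the actual pluggable compaction interface; `compact_row`, `client_view` and the dict-based row layout are assumptions made for the sketch. Each fragment stands for the part of a row held by one sstable, mapping column name to (timestamp, value).

```python
def compact_row(fragments, cap):
    """Merge row fragments, keep the newest version of each column,
    then keep only the first `cap` columns in name order."""
    merged = {}
    for fragment in fragments:
        for name, (timestamp, value) in fragment.items():
            if name not in merged or timestamp > merged[name][0]:
                merged[name] = (timestamp, value)
    kept = sorted(merged)[:cap]
    return {name: merged[name] for name in kept}

def client_view(row, cap):
    """Client-side trim: a row that has not been compacted yet
    can still hold more than `cap` columns."""
    return {name: row[name] for name in sorted(row)[:cap]}

older = {"a": (1, "x"), "c": (1, "y")}
newer = {"a": (2, "x2"), "b": (2, "z"), "d": (2, "w")}

compacted = compact_row([older, newer], cap=3)
print(compacted)  # {'a': (2, 'x2'), 'b': (2, 'z'), 'c': (1, 'y')}
```

Until compaction runs, a read may still see column 'd', which is why the client applies `client_view` itself.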

--
Sylvain

> From: Tupshin Harper [mailto:tups...@tupshin.com]
> Sent: Tuesday, July 19, 2011 10:04 AM
> To: user@cassandra.apache.org
> Subject: Re: How to keep only exactly column of key
>
>
>
> Speaking from practical experience, it is possible to simulate this 
> feature by retrieving a slice of your row that only contains the most 
> recent 100 items. You can then prevent the rows from growing out of 
> control by checking the size of the row and pruning it back to 100 
> every N writes, where N is small enough to prevent excessive growth, 
> but large enough to prevent excessive overhead. A value of 50 or so 
> for N worked reasonably well for me. If you do go down this path, 
> though, keep in mind that rapid writes and deletes to a single column 
> are basically a Cassandra anti-pattern due to performance problems with huge 
> numbers of tombstones.
>
>
>
> I would love to see a feature added similar to MongoDB's "capped 
> collections", but I don't believe there is any easy way to retrofit it 
> into Cassandra's sstable approach.
> http://www.mongodb.org/display/DOCS/Capped+Collections
>
>
>
> -Tupshin
>
> On Mon, Jul 18, 2011 at 8:22 AM, JKnight JKnight <beukni...@gmail.com>
> wrote:
>
> Dear all,
>
>
>
> I want to keep only 100 columns per key: when I add a column to a 
> key that already has 100 columns, another column (by order) should be 
> deleted.
>
>
>
> Does Cassandra have a setting for that?
>
> --
> Best regards,
> JKnight
>
>
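For reference, the prune-every-N-writes scheme Tupshin describes in the quoted message can be sketched as below. This is an in-memory Python model only, not client code for Cassandra: the `CappedRow` class and its method names are hypothetical, and column names are assumed to sort by recency (for example, timestamps).

```python
class CappedRow:
    """Toy model of a row kept near `cap` columns by pruning every `prune_every` writes."""

    def __init__(self, cap=100, prune_every=50):
        self.cap = cap                  # assumed to be > 0
        self.prune_every = prune_every
        self.columns = {}               # column name -> value
        self.writes_since_prune = 0

    def write(self, name, value):
        self.columns[name] = value
        self.writes_since_prune += 1
        if self.writes_since_prune >= self.prune_every:
            self.prune()

    def prune(self):
        # Delete everything except the `cap` most recent columns.
        # In Cassandra each of these deletes would leave a tombstone.
        for name in sorted(self.columns)[:-self.cap]:
            del self.columns[name]
        self.writes_since_prune = 0

    def latest(self):
        # Reads always slice the most recent `cap` columns.
        return sorted(self.columns)[-self.cap:]

row = CappedRow(cap=100, prune_every=50)
for i in range(175):
    row.write(i, "v")
print(len(row.columns), len(row.latest()))  # 125 100
```

Between prunes the row holds at most cap + prune_every columns, which is the trade-off between growth and overhead mentioned in the quoted message.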

