RE: How to keep only exactly column of key

2011-07-20 Thread Lior Golan
Thanks, Sylvain.

Can you please point us to the interface that should be implemented in order to 
write our own custom compaction? And how is it supposed to be configured?

-----Original Message-----
From: Sylvain Lebresne [mailto:sylv...@datastax.com] 
Sent: Tuesday, July 19, 2011 11:40 AM
To: user@cassandra.apache.org
Subject: Re: How to keep only exactly column of key

On Tue, Jul 19, 2011 at 10:15 AM, Lior Golan lio...@taboola.com wrote:
 Can't this capping be done (approximately) during compaction? Something
 like:

 1.   Ability to define for a column family that it's a capped 
 collection with at most N columns per row

 2.   During write - just add the column

 3.   During reads - get a slice with the most recent / top N 
 columns (in terms of column order)

 4.   During compaction - if the number of columns in the row is 
 more than N, trim it to the top N columns (by replacing the rest of 
 the columns with a tombstone in the compacted row)

 Since I guess the purpose of this is automated cleanup, and not 
 enforcing exactly N columns, I think this would be sufficient.

The problem with that is that we cannot enforce this on the query side.
More precisely, returning the first N columns is fine, but what about a 
query like "M columns starting from 'b'"? Or columns by name?
We cannot do those efficiently while guaranteeing that we won't return any 
column beyond the first N. The only solution would be to always query the 
first N columns and then filter afterwards, but that's not efficient.

What I mean here is that it is hard to add this as a column family option 
given the limitations it would entail. That being said, 1.0 will add pluggable 
compaction (it's already in trunk), and it will be very easy to write a 
compaction that simply drops columns after the first N. It would then be up to 
the client to deal with possibly getting more than the first N columns, but 
as you said, if it is for automated cleanup, that will be enough.

--
Sylvain



Re: How to keep only exactly column of key

2011-07-19 Thread Tupshin Harper
Speaking from practical experience, it is possible to simulate this feature
by retrieving a slice of your row that only contains the most recent 100
items. You can then prevent the rows from growing out of control by checking
the size of the row and pruning it back to 100 every N writes, where N is
small enough to prevent excessive growth, but large enough to prevent
excessive overhead. A value of 50 or so for N worked reasonably well for
me. If you do go down this path, though, keep in mind that rapid writes and
deletes to a single column are basically a Cassandra anti-pattern due to
performance problems with huge numbers of tombstones.
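Tupshin's "slice the newest 100, prune every N writes" approach can be sketched as a small in-memory simulation. This is a hypothetical sketch, not a Cassandra client: the row is a plain dict, and `CappedRowClient`, `keep` and `prune_every` are illustrative names. Column names are assumed to sort oldest-to-newest (for example, timestamps), so "most recent" means "largest name".

```python
class CappedRowClient:
    """Hypothetical client-side helper; one Cassandra row is simulated with a dict.

    Column names are assumed to sort oldest-to-newest (e.g. timestamps),
    so the "most recent" columns are the ones with the largest names.
    """

    def __init__(self, keep=100, prune_every=50):
        self.row = {}                    # stands in for one Cassandra row
        self.keep = keep                 # columns to retain after pruning
        self.prune_every = prune_every   # the "N" from the message above
        self.writes_since_prune = 0

    def insert(self, name, value):
        # Writes never check the row size; they just add the column.
        self.row[name] = value
        self.writes_since_prune += 1
        if self.writes_since_prune >= self.prune_every:
            self.prune()

    def prune(self):
        # Check the row size and delete everything but the newest `keep`
        # columns. In Cassandra each delete leaves a tombstone, which is why
        # pruning is batched instead of done on every write.
        names = sorted(self.row)
        for name in names[: max(0, len(names) - self.keep)]:
            del self.row[name]
        self.writes_since_prune = 0

    def latest(self):
        # Reads only ask for the newest `keep` columns (a reversed slice), so
        # extras that have not been pruned yet are never returned.
        return sorted(self.row, reverse=True)[: self.keep]


client = CappedRowClient(keep=100, prune_every=50)
largest_seen = 0
for i in range(1000):
    client.insert(i, f"item-{i}")
    largest_seen = max(largest_seen, len(client.row))

print(len(client.row))      # 100: the 1000th write triggered a prune
print(largest_seen)         # 149: largest size observed between writes
print(client.latest()[:3])  # [999, 998, 997]
```

The trade-off Tupshin describes shows up directly in `prune_every`: a smaller value keeps the row closer to 100 columns but prunes (and writes tombstones) more often.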

I would love to see a feature added similar to MongoDB's capped
collections, but I don't believe there is any easy way to retrofit it into
Cassandra's sstable approach.
http://www.mongodb.org/display/DOCS/Capped+Collections

-Tupshin

On Mon, Jul 18, 2011 at 8:22 AM, JKnight JKnight beukni...@gmail.com wrote:

 Dear all,

 I want to keep only 100 columns per key: when I add a column to a key that
 already has 100 columns, another column (by order) is deleted.

 Does Cassandra have a setting for that?

 --
 Best regards,
 JKnight



RE: How to keep only exactly column of key

2011-07-19 Thread Lior Golan
Can't this capping be done (approximately) during compaction? Something like:

1.   Ability to define for a column family that it's a capped collection 
with at most N columns per row

2.   During write - just add the column

3.   During reads - get a slice with the most recent / top N columns (in 
terms of column order)

4.   During compaction - if the number of columns in the row is more than 
N, trim it to the top N columns (by replacing the rest of the columns with a 
tombstone in the compacted row)

Since I guess the purpose of this is automated cleanup, and not 
enforcing exactly N columns, I think this would be sufficient.
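Step 4 of this proposal can be sketched as a toy compaction over one row. This is a hypothetical model, not Cassandra's storage format or compaction API: each sstable's copy of the row is a dict of column name to `(timestamp, value)`, a value of `None` stands for a tombstone, and "top N" is assumed to mean the N largest column names. `compact_capped` is an illustrative name.

```python
from typing import Dict, List, Optional, Tuple

# One version of a column: (write timestamp, value). A value of None is a
# tombstone. This is a toy model, not Cassandra's storage format.
Version = Tuple[int, Optional[str]]
Fragment = Dict[str, Version]  # one row's columns as stored in one sstable


def compact_capped(fragments: List[Fragment], cap: int) -> Fragment:
    """Hypothetical compaction for a "capped" column family.

    Merges the row's fragments (newest timestamp wins per column), then
    replaces every live column beyond the top `cap` columns, by descending
    column name, with a tombstone.
    """
    merged: Fragment = {}
    for fragment in fragments:
        for name, (ts, value) in fragment.items():
            # Strictly greater: on a timestamp tie the earlier fragment wins
            # (a simplification; Cassandra has its own tie-breaking rules).
            if name not in merged or ts > merged[name][0]:
                merged[name] = (ts, value)

    live = sorted((n for n, (_, v) in merged.items() if v is not None), reverse=True)
    for name in live[cap:]:
        ts, _ = merged[name]
        merged[name] = (ts, None)  # trimmed: shadowed by a tombstone
    return merged


older = {"a": (1, "x"), "b": (1, "y"), "c": (1, "z")}
newer = {"b": (2, None), "d": (2, "w")}  # "b" was deleted, "d" added
result = compact_capped([older, newer], cap=2)
print(result)
# {'a': (1, None), 'b': (2, None), 'c': (1, 'z'), 'd': (2, 'w')}
```

In this model the trim only applies to the fragments passed in; data outside a given compaction is untouched until a later one, which is why the cap is approximate and reads still need the "top N" slice from step 3.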



Re: How to keep only exactly column of key

2011-07-19 Thread Sylvain Lebresne
On Tue, Jul 19, 2011 at 10:15 AM, Lior Golan lio...@taboola.com wrote:
 Can't this capping be done (approximately) during compaction? Something
 like:

 1.   Ability to define for a column family that it's a capped
 collection with at most N columns per row

 2.   During write – just add the column

 3.   During reads – get a slice with the most recent / top N columns (in
 terms of column order)

 4.   During compaction – if the number of columns in the row is more
 than N, trim it to the top N columns (by replacing the rest of the columns
 with a tombstone in the compacted row)

 Since I guess the purpose of this is automated cleanup, and not
 enforcing exactly N columns, I think this would be sufficient.

The problem with that is that we cannot enforce this on the query side.
More precisely, returning the first N columns is fine, but what about a
query like "M columns starting from 'b'"? Or columns by name?
We cannot do those efficiently while guaranteeing that we won't return any
column beyond the first N. The only solution would be to always query the
first N columns and then filter afterwards, but that's not efficient.

What I mean here is that it is hard to add this as a column family option
given the limitations it would entail. That being said, 1.0 will add pluggable
compaction (it's already in trunk), and it will be very easy to write a
compaction that simply drops columns after the first N. It would then be up to
the client to deal with possibly getting more than the first N columns, but as
you said, if it is for automated cleanup, that will be enough.
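Why a cap is hard to enforce on reads can be illustrated with a toy model in which a row is just a sorted list of column names that has not been trimmed yet. `capped_slice` and `capped_get` are hypothetical helpers, not Cassandra APIs; they implement the "always query the first N, then filter" approach described above.

```python
from bisect import bisect_left
from typing import List


def capped_slice(row: List[str], start: str, count: int, cap: int) -> List[str]:
    """Slice query ("up to `count` columns from `start`") that also enforces
    "only the first `cap` columns exist".

    `row` is a sorted list of column names (a toy model). The cap forces us
    to determine the first `cap` columns before we can answer, however small
    `count` is: this is the "query the first N, then filter" approach.
    """
    first_n = row[:cap]               # read the first N columns...
    i = bisect_left(first_n, start)   # ...find where the requested slice begins...
    return first_n[i:i + count]       # ...and keep at most `count` of them


def capped_get(row: List[str], name: str, cap: int) -> bool:
    """By-name lookup under the cap: the column must exist AND rank < cap.

    Here the rank falls out of a list index; on disk, knowing a column's rank
    means counting the columns that sort before it.
    """
    i = bisect_left(row, name)
    return i < len(row) and row[i] == name and i < cap


row = ["a", "b", "c", "d", "e"]  # a row that has not been trimmed yet
print(capped_slice(row, "b", 10, cap=3))  # ['b', 'c']
print(capped_get(row, "e", cap=3))        # False: "e" exists but is past the cap
```

Without the cap, the slice could seek straight to `start` and the lookup straight to `name`; with it, both depend on how many columns sort earlier in the row.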

--
Sylvain



Re: How to keep only exactly column of key

2011-07-18 Thread aaron morton
There is no support for a feature like that, and I doubt it will ever be 
supported. For one thing, there are no locks during a write, so it's not 
possible to say definitively that there are 100 columns at a particular 
instant in time. 

You would need to read all columns and delete the ones you no longer need.
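The missing-lock problem, and the "read everything, delete the extras" workaround, can be shown with a toy single-process simulation. The two "clients" are just interleaved statements rather than real concurrent Cassandra writes, and `CAP`, `add_if_room` and `trim_to_cap` are hypothetical names. Larger column names are assumed to be newer.

```python
from typing import Dict, List

CAP = 100


def add_if_room(observed_size: int, row: Dict[int, str], name: int, value: str) -> None:
    # The decision uses a count read earlier; with no locks, another client
    # may write between that read and this write.
    if observed_size < CAP:
        row[name] = value


def trim_to_cap(row: Dict[int, str]) -> List[int]:
    """Read all columns and delete all but the CAP highest-named ones."""
    excess = sorted(row)[: max(0, len(row) - CAP)]
    for name in excess:
        del row[name]
    return excess


row = {i: "old" for i in range(99)}  # 99 columns: room for one more
seen_by_a = len(row)                 # client A reads the count (99)
seen_by_b = len(row)                 # client B reads it too, before A writes
add_if_room(seen_by_a, row, 100, "from A")
add_if_room(seen_by_b, row, 101, "from B")
print(len(row))                      # 101: both clients saw room

print(trim_to_cap(row), len(row))    # [0] 100
```

Both checks pass because each client acted on a count that was already stale, so the cap can only be restored after the fact by a full read and delete.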

You could also try Redis. 

Cheers

  
-
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 19 Jul 2011, at 03:22, JKnight JKnight wrote:

 Dear all, 
 
 I want to keep only 100 columns per key: when I add a column to a key that 
 already has 100 columns, another column (by order) is deleted. 
 
 Does Cassandra have a setting for that?
 
 -- 
 Best regards,
 JKnight



Re: How to keep only exactly column of key

2011-07-18 Thread Jonathan Ellis
You can use expiring columns to keep only the last N seconds' worth of
data, but not a count of columns per se, for the reasons Aaron gave.
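The semantics of expiring columns can be sketched with a toy model in which each column is written with a time-to-live and reads simply skip expired columns. `write`, `read_live` and the `(value, expires_at)` layout are hypothetical illustrations, not Cassandra's API, and the clock is passed in explicitly so the example is deterministic.

```python
from typing import Dict, List, Tuple

# Toy model of expiring columns: name -> (value, expires_at).
Row = Dict[str, Tuple[str, float]]


def write(row: Row, name: str, value: str, now: float, ttl: float) -> None:
    """Store a column that stops being visible `ttl` seconds after `now`."""
    row[name] = (value, now + ttl)


def read_live(row: Row, now: float) -> List[str]:
    """Names of columns that have not expired at time `now`, in name order.
    (This model only hides expired columns; it never purges them.)"""
    return sorted(name for name, (_, expires_at) in row.items() if expires_at > now)


row: Row = {}
write(row, "e1", "first", now=0, ttl=60)
write(row, "e2", "second", now=30, ttl=60)
write(row, "e3", "third", now=70, ttl=60)
print(read_live(row, now=75))  # ['e2', 'e3']: e1 expired at t=60
```

The window bounds data by age, not by count: a burst of writes inside one TTL window can still leave more than 100 live columns in the row.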

On Mon, Jul 18, 2011 at 10:22 AM, JKnight JKnight beukni...@gmail.com wrote:
 Dear all,
 I want to keep only 100 columns per key: when I add a column to a key that
 already has 100 columns, another column (by order) is deleted.
 Does Cassandra have a setting for that?

 --
 Best regards,
 JKnight




-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com