RE: How to keep only exactly column of key
Thanks, Sylvain.

Can you please point us to the interface that should be implemented in order to write our own custom compaction? And how is it supposed to be configured?

-----Original Message-----
From: Sylvain Lebresne [mailto:sylv...@datastax.com]
Sent: Tuesday, July 19, 2011 11:40 AM
To: user@cassandra.apache.org
Subject: Re: How to keep only exactly column of key

> That being said, 1.0 will add pluggable compaction (it's already in trunk), and it will be very easy to have a compaction that just drops columns after the first N. It would then be up to the client to deal with the possibility of getting more than the first N, but as you said, if it is for automated cleanup, that will be enough.
Re: How to keep only exactly column of key
Speaking from practical experience, it is possible to simulate this feature by retrieving a slice of your row that only contains the most recent 100 items. You can then prevent the rows from growing out of control by checking the size of the row and pruning it back to 100 every N writes, where N is small enough to prevent excessive growth, but large enough to prevent excessive overhead. A value of 50 or so for N worked reasonably well for me.

If you do go down this path, though, keep in mind that rapid writes and deletes to a single column are basically a Cassandra anti-pattern due to performance problems with huge numbers of tombstones.

I would love to see a feature added similar to MongoDB's capped collections, but I don't believe there is any easy way to retrofit it into Cassandra's sstable approach.
http://www.mongodb.org/display/DOCS/Capped+Collections

-Tupshin

On Mon, Jul 18, 2011 at 8:22 AM, JKnight JKnight beukni...@gmail.com wrote:
> Dear all,
> I want to keep only 100 column of a key: when I add a column for a key, if the number column of key is 100, another column (by order) will be deleted.
> Does Cassandra have setting for that?
> --
> Best regards,
> JKnight
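The "slice the newest 100, prune every N writes" approach described above can be sketched as follows. This is an in-memory stand-in for a single wide row, not Cassandra client code: the class `CappedRow` and its method names are hypothetical, and column names are assumed to sort by recency (for example, timestamps).

```python
class CappedRow:
    """In-memory stand-in for one wide row (hypothetical, for illustration only).

    Column names are assumed to sort by recency, e.g. timestamps.
    """

    def __init__(self, cap=100, prune_every=50):
        self.cap = cap                  # columns to keep per row
        self.prune_every = prune_every  # the "N writes" between prunes
        self.columns = {}               # column name -> value
        self.writes_since_prune = 0
        self.deletes_issued = 0         # each delete would leave a tombstone

    def insert(self, name, value):
        # Writes never read first; they only count toward the next prune.
        self.columns[name] = value
        self.writes_since_prune += 1
        if self.writes_since_prune >= self.prune_every:
            self.prune()

    def latest(self):
        # Equivalent of a reversed slice limited to `cap` columns.
        names = sorted(self.columns, reverse=True)[: self.cap]
        return [(name, self.columns[name]) for name in names]

    def prune(self):
        # Delete everything older than the newest `cap` columns.
        names = sorted(self.columns, reverse=True)
        for old in names[self.cap:]:
            del self.columns[old]
            self.deletes_issued += 1
        self.writes_since_prune = 0


row = CappedRow(cap=100, prune_every=50)
for i in range(1000):
    row.insert(i, "value-%d" % i)
print(len(row.columns), row.deletes_issued)
```

Between prunes the row can hold up to cap + prune_every columns, which is why reads still ask for only the newest 100. The `deletes_issued` counter is there as a reminder of the tombstone cost mentioned above.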
RE: How to keep only exactly column of key
Can't this capping be done (approximately) during compaction? Something like:

1. Ability to define for a column family that it's a capped collection with at most N columns per row
2. During write - just add the column
3. During reads - get a slice with the most recent / top N columns (in terms of column order)
4. During compaction - if the number of columns in the row is more than N, trim it to the top N columns (by replacing the rest of the columns with a tombstone in the compacted row)

Since I guess the purpose of this is automated cleanup, and not enforcing exactly N columns, I think this would be sufficient.

From: Tupshin Harper [mailto:tups...@tupshin.com]
Sent: Tuesday, July 19, 2011 10:04 AM
To: user@cassandra.apache.org
Subject: Re: How to keep only exactly column of key

> Speaking from practical experience, it is possible to simulate this feature by retrieving a slice of your row that only contains the most recent 100 items. You can then prevent the rows from growing out of control by checking the size of the row and pruning it back to 100 every N writes, where N is small enough to prevent excessive growth, but large enough to prevent excessive overhead.
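Step 4 of the proposal above (trimming during compaction) might look roughly like this. It is a toy model, not Cassandra's compaction code: the function `compact_row`, the `TOMBSTONE` marker, and the fragment layout are all assumptions made for illustration, and "top N" is taken to mean the first N live columns in column order.

```python
TOMBSTONE = object()  # hypothetical marker for a deleted column


def compact_row(fragments, n, now):
    """Merge the per-sstable fragments of one row and cap it at n live columns.

    fragments: list of dicts mapping column name -> (timestamp, value)
    n:         maximum number of live columns to keep
    now:       timestamp to stamp on the tombstones written by the trim
    """
    merged = {}
    # Normal reconciliation: for each column, the newest timestamp wins.
    for fragment in fragments:
        for name, (ts, value) in fragment.items():
            if name not in merged or ts > merged[name][0]:
                merged[name] = (ts, value)

    # Trim: every live column after the first n (in column order)
    # is replaced with a tombstone in the compacted row.
    live = sorted(name for name, (_, value) in merged.items()
                  if value is not TOMBSTONE)
    for name in live[n:]:
        merged[name] = (now, TOMBSTONE)
    return merged


fragments = [
    {"a": (1, "x"), "c": (1, "y")},
    {"b": (2, "z"), "a": (3, "x2")},
    {"d": (4, "w")},
]
compacted = compact_row(fragments, n=2, now=10)
live_after = sorted(k for k, (_, v) in compacted.items() if v is not TOMBSTONE)
print(live_after)
```

Because the trim only happens when a row is compacted, a row can hold more than N columns between compactions, which matches the "approximately" in the proposal.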
Re: How to keep only exactly column of key
On Tue, Jul 19, 2011 at 10:15 AM, Lior Golan lio...@taboola.com wrote:
> Can't this capping be done (approximately) during compaction. Something like:
>
> 1. Ability to define for a column family that it's a capped collection with at most N columns per row
> 2. During write - just add the column
> 3. During reads - get a slice with the most recent / top N column (in terms of column order)
> 4. During compaction - if the number of columns in the row is more than N, trim it to the top N columns (by replacing the rest of the columns with a tombstone in the compacted row)
>
> Since I guess the purpose of this is for automated cleanup, and not for enforcing exactly N columns, I think this would be sufficient

The problem with that is that we cannot enforce this on the query side. Or more precisely, returning the first N columns is fine, but what about a query like "M columns starting from 'b'"? Or columns by name? We cannot do those efficiently while enforcing that we won't return any columns after the first N. The only solution would be to always query the first N and then filter afterwards, but that's not efficient.

What I mean here is that it is hard to add that as a column family option given the limitation it would entail.

That being said, 1.0 will add pluggable compaction (it's already in trunk), and it will be very easy to have a compaction that just drops columns after the first N. It would then be up to the client to deal with the possibility of getting more than the first N, but as you said, if it is for automated cleanup, that will be enough.

--
Sylvain
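To make the query-side problem concrete, here is a small sketch comparing a plain slice with the "query the first N, then filter" workaround. Both functions are hypothetical helpers operating on a plain list of column names, not part of any Cassandra API.

```python
def slice_from(columns, start, count):
    """Plain slice: up to `count` column names >= start, in column order."""
    return [c for c in sorted(columns) if c >= start][:count]


def capped_slice_from(columns, start, count, n):
    """Enforce the cap at read time: always read the first n columns, then filter."""
    first_n = sorted(columns)[:n]
    return [c for c in first_n if c >= start][:count]


row = ["a", "b", "c", "d", "e"]  # row not yet trimmed by compaction
cap = 3                          # only "a", "b", "c" are supposed to exist

print(slice_from(row, "b", 3))              # may include columns past the cap
print(capped_slice_from(row, "b", 3, cap))  # never goes past the cap
```

The plain slice can return "d", which lies beyond the first N columns, while the capped version has to read N columns from the start of the row no matter where the requested slice begins. That extra read is the inefficiency described above.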
Re: How to keep only exactly column of key
There is no support for a feature like that, and I doubt it would ever be supported. For one thing, there are no locks during a write, so it's not possible to definitively say there are 100 columns at a particular instant in time.

You would need to read all columns and delete the ones you no longer need.

You could also try Redis.

Cheers

Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 19 Jul 2011, at 03:22, JKnight JKnight wrote:
> Dear all,
> I want to keep only 100 column of a key: when I add a column for a key, if the number column of key is 100, another column (by order) will be deleted.
> Does Cassandra have setting for that?
> --
> Best regards,
> JKnight
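The point about the absence of locks can be illustrated with a deterministic toy interleaving of two writers. The generator `racy_capped_insert` is hypothetical, and the row is modeled as a Python set; each `next()` call advances one client by one step (first the read, then the write).

```python
def racy_capped_insert(row, name, cap):
    """Two-step 'read the count, then write' insert with no locking."""
    count_seen = len(row)      # step 1: read the current column count
    yield
    row.add(name)              # step 2: write the new column
    if count_seen >= cap:
        row.discard(min(row))  # trim only if the earlier read said the row was full
    yield


row = set(range(99))           # 99 columns, cap is 100
client_a = racy_capped_insert(row, 1000, cap=100)
client_b = racy_capped_insert(row, 1001, cap=100)

next(client_a)  # A reads: 99 columns, so no trim needed
next(client_b)  # B reads: 99 columns, so no trim needed
next(client_a)  # A writes
next(client_b)  # B writes

print(len(row))  # prints 101: both writers acted on a stale count
```

Neither client did anything wrong on its own; the cap is exceeded only because the count each one read was out of date by the time it wrote.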
Re: How to keep only exactly column of key
You can use expiring columns to keep only the last N seconds of data, but not counts per se, for the reasons Aaron gave.

On Mon, Jul 18, 2011 at 10:22 AM, JKnight JKnight beukni...@gmail.com wrote:
> Dear all,
> I want to keep only 100 column of a key: when I add a column for a key, if the number column of key is 100, another column (by order) will be deleted.
> Does Cassandra have setting for that?
> --
> Best regards,
> JKnight

--
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com