Re: Second Cassandra users survey
It took some time to gather our requirements and work out which needs matter most to us. Here they are:

* Column position range queries: We would like to access columns not by their name, but by their position in the row.
  Example: row(A:v1, B:v2, C:v3, D:v4), ordered by UTF8Type
  Example query: "Give all elements with position range 1..3" would return (B:v2, C:v3, D:v4)
* Arbitrary position queries: We would like to access arbitrary columns by their position in the row.
  Example query: "Give all elements with positions (0, 3, 1)" would return (A:v1, D:v4, B:v2)
* Security for client-server communication (Thrift): A big benefit for all Cassandra users who deploy their cluster into untrusted environments (Amazon EC2 etc.) would be the possibility to secure the client-server communication with SSL. This has already been implemented in Thrift (see https://issues.apache.org/jira/browse/THRIFT-106) and would probably have to be added to CassandraDaemon.ThriftServer.

Kind regards
Arne and Matthias

On 11/01/2011 11:59 PM, Jonathan Ellis wrote:
> Hi all,
>
> Two years ago I asked for Cassandra use cases and feature requests. [1] The results [2] have been extremely useful in setting and prioritizing goals for Cassandra development. But with the release of 1.0 we've accomplished basically everything from our original wish list. [3]
>
> I'd love to hear from modern Cassandra users again, especially if you're usually a quiet lurker. What does Cassandra do well? What are your pain points? What's your feature wish list?
>
> As before, if you're in stealth mode or don't want to say anything in public, feel free to reply to me privately and I will keep it off the record.
>
> [1] http://www.mail-archive.com/cassandra-dev@incubator.apache.org/msg01148.html
> [2] http://www.mail-archive.com/cassandra-user@incubator.apache.org/msg01446.html
> [3] http://www.mail-archive.com/dev@cassandra.apache.org/msg01524.html
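To make the position-based requests in this message concrete, here is a minimal client-side sketch in Python. It is not a Cassandra API: the function names are hypothetical, a plain dict stands in for a row, and Python's string ordering stands in for the UTF8Type comparator. Positions are assumed to be zero-based and ranges inclusive, which is what the 1..3 example implies.

```python
def columns_by_position_range(row, start, end):
    """Return the (name, value) pairs at positions start..end (zero-based, inclusive)."""
    ordered = sorted(row.items())  # stand-in for comparator ordering of column names
    return ordered[start:end + 1]


def columns_by_positions(row, positions):
    """Return the (name, value) pairs at the given positions, in the order requested."""
    ordered = sorted(row.items())
    return [ordered[p] for p in positions]


row = {"A": "v1", "B": "v2", "C": "v3", "D": "v4"}
print(columns_by_position_range(row, 1, 3))
print(columns_by_positions(row, [0, 3, 1]))
```

A server-side version would avoid shipping the whole row to the client, which is presumably the point of the request; the sketch only pins down the intended semantics.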
Re: Second Cassandra users survey
Ability to mix counter columns and normal columns in the same column family.

On Thu, Nov 17, 2011 at 6:46 PM, Boris Yen yulin...@gmail.com wrote:
> I was wondering if it is possible to provide a function like
> delete from cf where column='value'
> I think this should be useful for people who use secondary indexes a lot.
>
> On Nov 15, 2011 11:05 AM, Edward Ribeiro edward.ribe...@gmail.com wrote:
>> +1 on co-processors.
>> Edward
Re: Second Cassandra users survey
- It would be super cool if all of that counter work made it possible to support other atomic data types (sets? CAS? just pass an associative/commutative function to apply).
- Again with types: pluggable, type-specific compression.
- Wishy-washy wish: simpler elasticity. I would like to go from 6 -> 8 -> 7 nodes without each of those being an annoying fight with tokens.
- Gossip as a library. Gossip/failure detection is something C* seems to have gotten particularly right (or at least it's something that has not needed to change much). It would be cool to use Cassandra's gossip protocol as a distributed-systems building tool a la ZooKeeper.
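The first wish above (atomic types driven by an associative/commutative function) rests on one property: if the merge function is associative and commutative, replicas can apply the same updates in any order and still converge. The Python sketch below only illustrates that property. It is not how Cassandra counters are implemented, and `apply_updates`, `add` and `union` are hypothetical names.

```python
from functools import reduce
from itertools import permutations


def apply_updates(initial, updates, merge):
    """Fold a sequence of updates into a value using the supplied merge function."""
    return reduce(merge, updates, initial)


def add(a, b):
    """Counter-style merge: integer addition."""
    return a + b


def union(a, b):
    """Set-style merge: set union."""
    return a | b


counter_updates = [5, -2, 10]
set_updates = [frozenset({"x"}), frozenset({"y"}), frozenset({"x", "z"})]

# Every delivery order yields the same final value for both merge functions.
counter_results = {apply_updates(0, p, add) for p in permutations(counter_updates)}
set_results = {apply_updates(frozenset(), p, union) for p in permutations(set_updates)}
print(counter_results, set_results)
```

A non-commutative function (for example "overwrite with the latest value") would not pass this check without extra ordering information such as timestamps.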
Re: Second Cassandra users survey
Re Simpler elasticity: the latest OpsCenter will now rebalance the cluster optimally:
http://www.datastax.com/dev/blog/whats-new-in-opscenter-1-3
/plug

-Jake

On Mon, Nov 14, 2011 at 7:27 PM, Chris Burroughs chris.burrou...@gmail.com wrote:
> - Wishy washy wish: Simpler elasticity I would like to go from 6--8--7 nodes without each of those being an annoying fight with tokens.

-- http://twitter.com/tjake
Re: Second Cassandra users survey
On Mon, Nov 14, 2011 at 4:44 PM, Jake Luciani jak...@gmail.com wrote:
> Re Simpler elasticity: Latest opscenter will now rebalance cluster optimally
> http://www.datastax.com/dev/blog/whats-new-in-opscenter-1-3
> /plug

Does it have any impact on reads and writes while the rebalance is in progress? How is it handled on a live cluster?
Re: Second Cassandra users survey
+1 on coprocessors

On Mon, Nov 14, 2011 at 6:51 PM, Mohit Anchlia mohitanch...@gmail.com wrote:
> Does it cause any impact on reads and writes while re-balance is in progress? How is it handled on live cluster?
Re: Second Cassandra users survey
Oh yeah, one more BIG one: in-memory writes with async write-behind to disk, like Cassandra does for speed. So if you have atomic locking, it writes to the primary node (memory) and some other node (memory) and returns success to the client. An async process then writes to disk later. This proves to be very fast, two machines make it pretty reliable, and of course it is asynchronously writing to that third or fourth machine depending on the replication factor.

later,
Dean

On Mon, Nov 14, 2011 at 6:59 PM, Dean Hiller d...@alvazan.com wrote:
> +1 on coprocessors
Re: Second Cassandra users survey
+1 on co-processors. Edward
Re: Second Cassandra users survey
Lately I've been working on some data processing code in Cassandra and apparently I don't write bug-free code the very first time. :) Hence, while debugging, I often need to look at data in Cassandra to see what my code is doing/should be finding, etc. This turns out to be harder than it should be IMHO.

Anyways, what I'd like is a more powerful cqlsh or similar tool which can:

1. List the rows in a CF (no column data), optionally within a given range of keys
2. Count the number of columns in a row, and within a range of values for that row
3. Return only the column names for a given row (no values)
4. Support CentOS 5 (it currently uses Python 2.4, but cqlsh-1.0.5 requires >= 2.5; 1.0.3 worked fine on 2.4)
5. Support some basic transformations of data:
   * return up to the first X bytes of a given value
   * return the length of a value in bytes instead of the value
6. Alternatively, print data in a format like this to make it easier to read:
   RowKey1:
   \tname = value
   \tname = value
   ...
   RowKey2:
   ...
7. For BytesType, provide an option to print all values in hex, i.e. no mixed ASCII + \x encoding, just 0x0a3b...

Frankly, I know some of this isn't efficient for the server to do, but the client could do that. I really don't care too much about performance since this is a debugging/diagnostics tool.

--
Aaron Turner
http://synfin.net/ Twitter: @synfinatic
http://tcpreplay.synfin.net/ - Pcap editing and replay tools for Unix & Windows
Those who would give up essential Liberty, to purchase a little temporary Safety, deserve neither Liberty nor Safety. -- Benjamin Franklin
carpe diem quam minimum credula postero
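Several of the wishes in this message (the value transformations, the one-column-per-line layout and the all-hex output) can be done purely on the client once the data has been fetched. The following Python sketch shows one possible shape for that. It is not part of cqlsh; `format_value` and `format_rows` are hypothetical helpers, rows are assumed to arrive as a dict of dicts with `bytes` values, and it targets Python 3 rather than the Python 2.4 mentioned above.

```python
def format_value(value, max_bytes=None, length_only=False):
    """Render a bytes value as hex, optionally truncated or replaced by its length."""
    if length_only:
        return str(len(value))
    if max_bytes is not None:
        value = value[:max_bytes]
    return "0x" + value.hex()


def format_rows(rows, max_bytes=None, length_only=False):
    """Render rows as 'RowKey:' followed by one tab-indented 'name = value' line per column."""
    lines = []
    for key, columns in rows.items():
        lines.append(f"{key}:")
        for name, value in columns.items():
            rendered = format_value(value, max_bytes, length_only)
            lines.append(f"\t{name} = {rendered}")
    return "\n".join(lines)


rows = {
    "RowKey1": {"colA": b"\x0a\x3b", "colB": b"hello"},
    "RowKey2": {"colA": b"\x00"},
}
print(format_rows(rows))
print(format_rows(rows, max_bytes=2))
print(format_rows(rows, length_only=True))
```

Keeping the formatting in the client matches the point made above that performance matters little for a debugging tool.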
Re: Second Cassandra users survey
It seems like you could use a composite key partitioner to accomplish this.

On Monday, November 7, 2011, Daniel Doubleday daniel.double...@gmx.net wrote:
> Allow for deterministic / manual sharding of rows.
>
> Right now it seems that there is no way to force rows with different row keys to be stored on the same nodes in the ring. This is our number one reason why we get data inconsistencies when nodes fail.
>
> Sometimes a logical transaction requires writing rows with different row keys. If we could use something like this:
>
> prefix.uniquekey
>
> and let the partitioner use only the prefix, the probability that only part of the transaction would be written could be reduced considerably.
>
> On Nov 1, 2011, at 11:59 PM, Jonathan Ellis wrote:
>> Hi all,
>>
>> Two years ago I asked for Cassandra use cases and feature requests. [1] The results [2] have been extremely useful in setting and prioritizing goals for Cassandra development. But with the release of 1.0 we've accomplished basically everything from our original wish list. [3]
>>
>> I'd love to hear from modern Cassandra users again, especially if you're usually a quiet lurker. What does Cassandra do well? What are your pain points? What's your feature wish list?
>>
>> As before, if you're in stealth mode or don't want to say anything in public, feel free to reply to me privately and I will keep it off the record.
>>
>> [1] http://www.mail-archive.com/cassandra-dev@incubator.apache.org/msg01148.html
>> [2] http://www.mail-archive.com/cassandra-user@incubator.apache.org/msg01446.html
>> [3] http://www.mail-archive.com/dev@cassandra.apache.org/msg01524.html
>>
>> --
>> Jonathan Ellis
>> Project Chair, Apache Cassandra
>> co-founder of DataStax, the source for professional Cassandra support
>> http://www.datastax.com
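The prefix.uniquekey idea quoted in this message can be sketched in a few lines of Python. This is an illustration only, not Cassandra's partitioner interface or its exact token math: `token_for` and `node_for` are hypothetical names, the "." separator follows the example in the message, MD5 is used simply because RandomPartitioner is MD5-based, and the four-node ring is made up.

```python
import bisect
import hashlib


def token_for(key, separator="."):
    """Hash only the part of the key before the first separator."""
    prefix = key.split(separator, 1)[0]
    digest = hashlib.md5(prefix.encode("utf-8")).digest()
    return int.from_bytes(digest, "big")


def node_for(key, ring):
    """Pick the first node whose token is >= the key's token, wrapping around the ring.

    ring is a list of (token, node_name) pairs sorted by token.
    """
    tokens = [token for token, _ in ring]
    index = bisect.bisect_left(tokens, token_for(key)) % len(ring)
    return ring[index][1]


# Hypothetical ring: four evenly spaced tokens in the 128-bit MD5 space.
ring = [(i * (2**128 // 4), f"node{i}") for i in range(4)]

for key in ("order42.header", "order42.items", "order43.header"):
    print(key, "->", node_for(key, ring))
```

Rows that share a prefix always land on the same node, while the full key is still what gets stored. As raised elsewhere in the thread, hot prefixes can unbalance the ring.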
Re: Second Cassandra users survey
Hi Todd,

Entity Groups: https://issues.apache.org/jira/browse/CASSANDRA-1684

-Jake

On Wed, Nov 9, 2011 at 6:44 AM, Todd Burruss bburr...@expedia.com wrote:
> I believe I heard someone talk at the Cassandra SF conference about creating a partitioner that was a derivation of RandomPartitioner. It essentially would look for keys that adhere to a certain pattern, like key:subkey. The key portion would be used for determining the location on the ring, but key:subkey for actually storing. This would allow groups of data (all having the same key) to reside on the same node, while still maintaining uniqueness across the entire keyspace.
>
> Unbalanced nodes could still occur, but I don't think any worse than wide/large rows can cause.
>
> On 11/8/11 1:29 AM, Daniel Doubleday daniel.double...@gmx.net wrote:
>> Ah cool - thanks for the pointer!
>>
>> On Nov 7, 2011, at 5:25 PM, Ed Anuff wrote:
>>> This is basically what entity groups are about - https://issues.apache.org/jira/browse/CASSANDRA-1684
>>>
>>> On Mon, Nov 7, 2011 at 5:26 AM, Peter Lin wool...@gmail.com wrote:
>>>> This feature interests me, so I thought I'd add some comments. Having used partition features in existing databases like DB2, Oracle and manual partitioning, one of the biggest challenges is keeping the partitions balanced. What I've seen with manual partitioning is that often the partitions get unbalanced. Usually the developers take a best guess and hope it ends up balanced.
>>>>
>>>> Some of the approaches I've used in the past were zip code, area code, state and some kind of hash.
>>>>
>>>> So my question related to deterministic sharding is this: what rebalance feature(s) would be useful or needed once the partitions get unbalanced? Without a decent plan for rebalancing, it often ends up being a very painful problem to solve in production.
>>>>
>>>> Back when I worked on mobile apps, we saw issues with how OpenWave WAP servers partitioned the accounts. The early versions randomly assigned a phone to a server when it was provisioned the first time. Once the phone was associated to that server, it was stuck on that server. If the load on that server was heavier than the others, the only choice was to scale up the hardware.
>>>>
>>>> My understanding of Cassandra's current sharding is consistent and random. Does the new feature sit somewhere in-between? Are you thinking of a pluggable API so that you can provide your own hash algorithm for Cassandra to use?

-- http://twitter.com/tjake
Re: Second Cassandra users survey
I think this was already asked for, but you can add my vote for TTL support for Counters.

--
Aaron Turner
Re: Second Cassandra users survey
Thx Jake for the JIRA, but there was someone at the conference that had already implemented what I mentioned. It didn't offer any atomicity, just co-locating a family of data on the same node.

On Wed, 9 Nov 2011 02:53:20 -0800, Jake Luciani jak...@gmail.com wrote:
> Hi Todd,
> Entity Groups: https://issues.apache.org/jira/browse/CASSANDRA-1684
> -Jake
Re: Second Cassandra users survey
Solandra does this:
https://github.com/tjake/Solandra/blob/solandra/src/lucandra/dht/RandomPartitioner.java

But Row Groups is going to be the official way.

-Jake

On Wed, Nov 9, 2011 at 5:53 PM, Todd Burruss bburr...@expedia.com wrote:
> Thx jake for the JIRA, but there was someone at the conference that had already implemented what I mentioned. It didn't offer any atomicity, just co-locating a family of data on the same node.

-- http://twitter.com/tjake
Re: Second Cassandra users survey
My wish list:

1) Conditional updates: if a column has a value, then put the column in the column family atomically, else fail.
2) getAndSet on counters, as a separate API.
3) Revert the count when the client disconnects or receives an exception (so it can safely retry).
4) Something like a freeze API for updates to a row/CF (this could be used as a lock).

Regards,
/VJ
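One reading of the conditional-update wish in this message is a compare-and-set: write a column only if some column currently holds an expected value, and fail otherwise. The Python sketch below shows those semantics for a single in-process row guarded by a lock. It is an assumption about what was meant, `ColumnStore` and `put_if` are hypothetical names, and it says nothing about how such a check could be made atomic across replicas.

```python
import threading


class ColumnStore:
    """A single in-memory row with a compare-and-set style conditional put."""

    def __init__(self):
        self._columns = {}
        self._lock = threading.Lock()

    def put(self, name, value):
        """Unconditional write."""
        with self._lock:
            self._columns[name] = value

    def get(self, name):
        """Read a column, or None if it is absent."""
        with self._lock:
            return self._columns.get(name)

    def put_if(self, check_name, expected, name, value):
        """Write name=value only if check_name currently equals expected.

        The check and the write happen under one lock, so no other writer
        can slip in between them. Returns True on success, False otherwise.
        """
        with self._lock:
            if self._columns.get(check_name) != expected:
                return False
            self._columns[name] = value
            return True


store = ColumnStore()
store.put("state", "open")
print(store.put_if("state", "open", "owner", "vj"))       # condition holds
print(store.put_if("state", "closed", "owner", "other"))  # condition fails
print(store.get("owner"))
```

The freeze/lock wish in item 4 could be layered on the same primitive, for example by conditionally writing a lock column only when it is absent.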
Re: Second Cassandra users survey
Ah cool - thanks for the pointer! On Nov 7, 2011, at 5:25 PM, Ed Anuff wrote: This is basically what entity groups are about - https://issues.apache.org/jira/browse/CASSANDRA-1684 On Mon, Nov 7, 2011 at 5:26 AM, Peter Lin wool...@gmail.com wrote: This feature interests me, so I thought I'd add some comments. Having used partition features in existing databases like DB2, Oracle and manual partitioning, one of the biggest challenges is keeping the partitions balanced. What I've seen with manual partitioning is that often the partitions get unbalanced. Usually the developers take a best guess and hope it ends up balanced. Some of the approaches I've used in the past were zip code, area code, state and some kind of hash. So my question related deterministic sharding is this, what rebalance feature(s) would be useful or needed once the partitions get unbalanced? Without a decent plan for rebalancing, it often ends up being a very painful problem to solve in production. Back when I worked mobile apps, we saw issues with how OpenWave WAP servers partitioned the accounts. The early versions randomly assigned a phone to a server when it is provisioned the first time. Once the phone was associated to that server, it was stuck on that server. If the load on that server was heavier than the others, the only choice was to scale up the hardware. My understanding of Cassandra's current sharding is consistent and random. Does the new feature sit some where in-between? Are you thinking of a pluggable API so that you can provide your own hash algorithm for cassandra to use? On Mon, Nov 7, 2011 at 7:54 AM, Daniel Doubleday daniel.double...@gmx.net wrote: Allow for deterministic / manual sharding of rows. Right now it seems that there is no way to force rows with different row keys will be stored on the same nodes in the ring. This is our number one reason why we get data inconsistencies when nodes fail. Sometimes a logical transaction requires writing rows with different row keys. 
If we could use something like this: prefix.uniquekey and let the partitioner use only the prefix the probability that only part of the transaction would be written could be reduced considerably. On Nov 1, 2011, at 11:59 PM, Jonathan Ellis wrote: Hi all, Two years ago I asked for Cassandra use cases and feature requests. [1] The results [2] have been extremely useful in setting and prioritizing goals for Cassandra development. But with the release of 1.0 we've accomplished basically everything from our original wish list. [3] I'd love to hear from modern Cassandra users again, especially if you're usually a quiet lurker. What does Cassandra do well? What are your pain points? What's your feature wish list? As before, if you're in stealth mode or don't want to say anything in public, feel free to reply to me privately and I will keep it off the record. [1] http://www.mail-archive.com/cassandra-dev@incubator.apache.org/msg01148.html [2] http://www.mail-archive.com/cassandra-user@incubator.apache.org/msg01446.html [3] http://www.mail-archive.com/dev@cassandra.apache.org/msg01524.html -- Jonathan Ellis Project Chair, Apache Cassandra co-founder of DataStax, the source for professional Cassandra support http://www.datastax.com
Re: Second Cassandra users survey
A use case that could use this (but isn't in my top requests) is usage history for a given user. I use a single row to save history per user, each column is a user action with name a TimeUUID and value is a blob. I use the TimeUUID to sort the actions, but I don't really care about exact time. after the number of user actions exceeds a threshold, I want to remove enough to bring the actions back below the threshold. I could model this as you say and remove chunks of actions by deleting a row, but that is more cumbersome for the client. The reason this isn't in my top requests is regardless if the client or the server performs this sort of delete, the row must first be read. By having the server do it, a network hop is saved as well as implementing a common usage pattern. On 11/5/11 2:45 PM, Brandon Williams dri...@gmail.com wrote: On Fri, Nov 4, 2011 at 9:50 PM, Jim Newsham jnews...@referentia.com wrote: Our use case is time-series data (such as sampled sensor data). Each row describes a particular statistic over time, the column name is a time, and the column value is the sample. So it makes perfect sense to want to delete columns for a given time range. I'm sure there must be numerous other use cases for which using a range of column names makes sense. Assuming you are bucketing your rows at some interval (as in http://rubyscale.com/2011/basic-time-series-with-cassandra/), why is deleting the entire row for the interval not acceptable? -Brandon
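For readers following the bucketing suggestion in this exchange, here is a minimal sketch of how time-bucketed row keys might be computed on the client, so that dropping an old time range becomes a whole-row delete. The one-day bucket width, the `stat:YYYYMMDDHHMM` key layout, and the helper names are all assumptions made for illustration; none of them come from the thread or from any Cassandra API.

```python
from datetime import datetime, timedelta, timezone

BUCKET = timedelta(days=1)  # assumed bucket width, not from the thread
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def bucket_start(ts, width=BUCKET):
    """Round a timestamp down to the start of its bucket."""
    return EPOCH + ((ts - EPOCH) // width) * width


def row_key(stat, ts, width=BUCKET):
    """Hypothetical row key layout: statistic name plus bucket start."""
    return f"{stat}:{bucket_start(ts, width):%Y%m%d%H%M}"


def expired_row_keys(stat, oldest, cutoff, width=BUCKET):
    """Row keys of buckets that end at or before `cutoff`.

    Each returned key can be removed with a single row delete; a bucket
    that straddles the cutoff is kept whole.
    """
    keys = []
    start = bucket_start(oldest, width)
    while start + width <= cutoff:
        keys.append(row_key(stat, start, width))
        start += width
    return keys
```

The trade-off raised in the reply above still applies: this only deletes at bucket granularity, so a threshold that falls inside a bucket leaves that bucket untouched.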
Re: Second Cassandra users survey
Take a look at this: http://www.oracle.com/technetwork/database/nosqldb/overview/index.html I understand the limitation/advantages of the architecture. Read this http://en.wikipedia.org/wiki/CAP_theorem
Re: Second Cassandra users survey
Allow for deterministic / manual sharding of rows. Right now it seems that there is no way to force rows with different row keys will be stored on the same nodes in the ring. This is our number one reason why we get data inconsistencies when nodes fail. Sometimes a logical transaction requires writing rows with different row keys. If we could use something like this: prefix.uniquekey and let the partitioner use only the prefix the probability that only part of the transaction would be written could be reduced considerably. On Nov 1, 2011, at 11:59 PM, Jonathan Ellis wrote: Hi all, Two years ago I asked for Cassandra use cases and feature requests. [1] The results [2] have been extremely useful in setting and prioritizing goals for Cassandra development. But with the release of 1.0 we've accomplished basically everything from our original wish list. [3] I'd love to hear from modern Cassandra users again, especially if you're usually a quiet lurker. What does Cassandra do well? What are your pain points? What's your feature wish list? As before, if you're in stealth mode or don't want to say anything in public, feel free to reply to me privately and I will keep it off the record. [1] http://www.mail-archive.com/cassandra-dev@incubator.apache.org/msg01148.html [2] http://www.mail-archive.com/cassandra-user@incubator.apache.org/msg01446.html [3] http://www.mail-archive.com/dev@cassandra.apache.org/msg01524.html -- Jonathan Ellis Project Chair, Apache Cassandra co-founder of DataStax, the source for professional Cassandra support http://www.datastax.com
Re: Second Cassandra users survey
This feature interests me, so I thought I'd add some comments. Having used partition features in existing databases like DB2, Oracle and manual partitioning, one of the biggest challenges is keeping the partitions balanced. What I've seen with manual partitioning is that often the partitions get unbalanced. Usually the developers take a best guess and hope it ends up balanced. Some of the approaches I've used in the past were zip code, area code, state and some kind of hash. So my question related deterministic sharding is this, what rebalance feature(s) would be useful or needed once the partitions get unbalanced? Without a decent plan for rebalancing, it often ends up being a very painful problem to solve in production. Back when I worked mobile apps, we saw issues with how OpenWave WAP servers partitioned the accounts. The early versions randomly assigned a phone to a server when it is provisioned the first time. Once the phone was associated to that server, it was stuck on that server. If the load on that server was heavier than the others, the only choice was to scale up the hardware. My understanding of Cassandra's current sharding is consistent and random. Does the new feature sit some where in-between? Are you thinking of a pluggable API so that you can provide your own hash algorithm for cassandra to use? On Mon, Nov 7, 2011 at 7:54 AM, Daniel Doubleday daniel.double...@gmx.net wrote: Allow for deterministic / manual sharding of rows. Right now it seems that there is no way to force rows with different row keys will be stored on the same nodes in the ring. This is our number one reason why we get data inconsistencies when nodes fail. Sometimes a logical transaction requires writing rows with different row keys. If we could use something like this: prefix.uniquekey and let the partitioner use only the prefix the probability that only part of the transaction would be written could be reduced considerably. 
On Nov 1, 2011, at 11:59 PM, Jonathan Ellis wrote: Hi all, Two years ago I asked for Cassandra use cases and feature requests. [1] The results [2] have been extremely useful in setting and prioritizing goals for Cassandra development. But with the release of 1.0 we've accomplished basically everything from our original wish list. [3] I'd love to hear from modern Cassandra users again, especially if you're usually a quiet lurker. What does Cassandra do well? What are your pain points? What's your feature wish list? As before, if you're in stealth mode or don't want to say anything in public, feel free to reply to me privately and I will keep it off the record. [1] http://www.mail-archive.com/cassandra-dev@incubator.apache.org/msg01148.html [2] http://www.mail-archive.com/cassandra-user@incubator.apache.org/msg01446.html [3] http://www.mail-archive.com/dev@cassandra.apache.org/msg01524.html -- Jonathan Ellis Project Chair, Apache Cassandra co-founder of DataStax, the source for professional Cassandra support http://www.datastax.com
Re: Second Cassandra users survey
We are using Cassandra for time series storage. Strong points: write performance. Pain points: dynamically adding column families as new time series come in. This caused a lot of headaches, mismatches between nodes, etc. In the end we just put everything together in a single (huge) column family. Wish list: A decent GUI to explore data kept in Cassandra would be very valuable. It should also be extensible to provide viewers for custom data. On 11/1/2011 23:59 PM, Jonathan Ellis wrote: Hi all, Two years ago I asked for Cassandra use cases and feature requests. [1] The results [2] have been extremely useful in setting and prioritizing goals for Cassandra development. But with the release of 1.0 we've accomplished basically everything from our original wish list. [3] I'd love to hear from modern Cassandra users again, especially if you're usually a quiet lurker. What does Cassandra do well? What are your pain points? What's your feature wish list? As before, if you're in stealth mode or don't want to say anything in public, feel free to reply to me privately and I will keep it off the record. [1] http://www.mail-archive.com/cassandra-dev@incubator.apache.org/msg01148.html [2] http://www.mail-archive.com/cassandra-user@incubator.apache.org/msg01446.html [3] http://www.mail-archive.com/dev@cassandra.apache.org/msg01524.html
Re: Second Cassandra users survey
So my question related deterministic sharding is this, what rebalance feature(s) would be useful or needed once the partitions get unbalanced? In current Cassandra you can use nodetool move for rebalancing. It's a fast operation; only a portion of the existing data is moved to the new server.
Re: Second Cassandra users survey
Actually, the data will be visible at QUORUM as well if you can see it with ONE. QUORUM actually gives you a higher chance of seeing the new value than ONE does. In the case of R=3 you have 2/3 chance of seeing the new value with QUORUM, with ONE you have 1/3... And this JIRA fixed an issue where two QUORUM reads in a row could give you the NEW value and then the OLD value. https://issues.apache.org/jira/browse/CASSANDRA-2494 So quorum read on fail for a single row always gives consistent results now. For multiple rows you still have issues, but you can always mitigate that in the app with something like giving all of the changes the same timestamp, and then on read checking to make sure the timestamps match, and reading the data again if they don't. I'm not arguing against atomic batch operations, they would be nice =). Just clarifying how things work now. -Jeremiah On 11/06/2011 02:05 PM, Pierre Chalamet wrote: - support for atomic operations or batches (if QUORUM fails, data should not be visible with ONE) zookeeper is solving that. I might have screwed up a little bit since I didn't talk about isolation; let's reformulate: support for read committed (using DB terminology). Cassandra is more like read uncommitted. Even if row mutations in one CF for one key are atomic on one server , stuff is not rolled back when the CL can't be satisfied at the coordinator level. Data won't be visible at QUORUM level, but when using weaker CL, invalid data can appear imho. Also it should be possible to tell which operations failed with batch_mutate but unfortunately it is not
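The 1/3 versus 2/3 figures in the message above can be checked by brute-force enumeration. This is only a sketch under stated assumptions: "R=3" is read as three replicas, the failed write is assumed to have reached exactly one of them, and the read is assumed to contact a uniformly random set of replicas (a real cluster picks replicas by its own rules). The function name is made up for the illustration.

```python
from fractions import Fraction
from itertools import combinations


def chance_of_seeing_new_value(replicas, updated, read_count):
    """Probability that a read touching `read_count` distinct replicas
    includes at least one of the `updated` replicas holding the new value.

    Assumes every set of `read_count` replicas is equally likely.
    """
    has_new = set(range(updated))  # replicas 0..updated-1 got the write
    reads = list(combinations(range(replicas), read_count))
    hits = sum(1 for read in reads if has_new.intersection(read))
    return Fraction(hits, len(reads))


# Three replicas, the write landed on one of them:
print(chance_of_seeing_new_value(3, 1, 1))  # read at ONE
print(chance_of_seeing_new_value(3, 1, 2))  # read at QUORUM (2 of 3)
```

Under those assumptions the two calls give 1/3 and 2/3, matching the numbers quoted in the message.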
Re: Second Cassandra users survey
- Batch read/slice from multiple column families. On 11/01/2011 05:59 PM, Jonathan Ellis wrote: Hi all, Two years ago I asked for Cassandra use cases and feature requests. [1] The results [2] have been extremely useful in setting and prioritizing goals for Cassandra development. But with the release of 1.0 we've accomplished basically everything from our original wish list. [3] I'd love to hear from modern Cassandra users again, especially if you're usually a quiet lurker. What does Cassandra do well? What are your pain points? What's your feature wish list? As before, if you're in stealth mode or don't want to say anything in public, feel free to reply to me privately and I will keep it off the record. [1] http://www.mail-archive.com/cassandra-dev@incubator.apache.org/msg01148.html [2] http://www.mail-archive.com/cassandra-user@incubator.apache.org/msg01446.html [3] http://www.mail-archive.com/dev@cassandra.apache.org/msg01524.html
Re: Second Cassandra users survey
This is basically what entity groups are about - https://issues.apache.org/jira/browse/CASSANDRA-1684 On Mon, Nov 7, 2011 at 5:26 AM, Peter Lin wool...@gmail.com wrote: This feature interests me, so I thought I'd add some comments. Having used partition features in existing databases like DB2, Oracle and manual partitioning, one of the biggest challenges is keeping the partitions balanced. What I've seen with manual partitioning is that often the partitions get unbalanced. Usually the developers take a best guess and hope it ends up balanced. Some of the approaches I've used in the past were zip code, area code, state and some kind of hash. So my question related deterministic sharding is this, what rebalance feature(s) would be useful or needed once the partitions get unbalanced? Without a decent plan for rebalancing, it often ends up being a very painful problem to solve in production. Back when I worked mobile apps, we saw issues with how OpenWave WAP servers partitioned the accounts. The early versions randomly assigned a phone to a server when it is provisioned the first time. Once the phone was associated to that server, it was stuck on that server. If the load on that server was heavier than the others, the only choice was to scale up the hardware. My understanding of Cassandra's current sharding is consistent and random. Does the new feature sit some where in-between? Are you thinking of a pluggable API so that you can provide your own hash algorithm for cassandra to use? On Mon, Nov 7, 2011 at 7:54 AM, Daniel Doubleday daniel.double...@gmx.net wrote: Allow for deterministic / manual sharding of rows. Right now it seems that there is no way to force rows with different row keys will be stored on the same nodes in the ring. This is our number one reason why we get data inconsistencies when nodes fail. Sometimes a logical transaction requires writing rows with different row keys. 
If we could use something like this: prefix.uniquekey and let the partitioner use only the prefix the probability that only part of the transaction would be written could be reduced considerably. On Nov 1, 2011, at 11:59 PM, Jonathan Ellis wrote: Hi all, Two years ago I asked for Cassandra use cases and feature requests. [1] The results [2] have been extremely useful in setting and prioritizing goals for Cassandra development. But with the release of 1.0 we've accomplished basically everything from our original wish list. [3] I'd love to hear from modern Cassandra users again, especially if you're usually a quiet lurker. What does Cassandra do well? What are your pain points? What's your feature wish list? As before, if you're in stealth mode or don't want to say anything in public, feel free to reply to me privately and I will keep it off the record. [1] http://www.mail-archive.com/cassandra-dev@incubator.apache.org/msg01148.html [2] http://www.mail-archive.com/cassandra-user@incubator.apache.org/msg01446.html [3] http://www.mail-archive.com/dev@cassandra.apache.org/msg01524.html -- Jonathan Ellis Project Chair, Apache Cassandra co-founder of DataStax, the source for professional Cassandra support http://www.datastax.com
RE: Second Cassandra users survey
I second transparent disk encryption. Also: Matching column names via 'like' and %wildcards Parameterized CQL plus Support for 'AND' and 'OR' Bulk row deletion. Also, more clarification on various parameters and configuration - If you are doing this, change Thanks for the opportunity, -Derek -- Derek Deeter, Sr. Software Engineer Intuit Financial Services (818) 597-5932 (x76932)5601 Lindero Canyon Rd. derek.dee...@digitalinsight.com Westlake, CA 91362 -Original Message- From: Mohit Anchlia [mailto:mohitanch...@gmail.com] Sent: Sunday, November 06, 2011 10:58 AM To: user@cassandra.apache.org Subject: Re: Second Cassandra users survey Transparent on disk encryption with pluggable keyprovider will also be really helpful to secure sensitive information. On Sun, Nov 6, 2011 at 9:42 AM, Aaron Turner synfina...@gmail.com wrote: The intent was to have a lighter solution for common problems then having to go with Hadoop or streaming large quantities of data back to the client. Is this feature creep? Yeah, prolly. Is it useful? Yes. If it can't be done well, then it probably shouldn't be done, but it never hurts to ask. :) On Sun, Nov 6, 2011 at 9:13 AM, Sarah Baker sba...@mspot.com wrote: Isn't this sort of heading on the slippery slope of things that weigh you down? It was my understanding that Cassandra was stick to your core competency sort of database that really wanted to leave such utilities external. At its core was get and put. Did I miss something in my reading of intent? -Sarah -Original Message- From: Aaron Turner [mailto:synfina...@gmail.com] Sent: Sunday, November 06, 2011 8:25 AM To: user@cassandra.apache.org Subject: Re: Second Cassandra users survey 1. 
Basic SQL-like summary transforms for both CQL and Thrift API clients like: SUM AVG MIN MAX -- Aaron Turner http://synfin.net/ Twitter: @synfinatic http://tcpreplay.synfin.net/ - Pcap editing and replay tools for Unix Windows Those who would give up essential Liberty, to purchase a little temporary Safety, deserve neither Liberty nor Safety. -- Benjamin Franklin carpe diem quam minimum credula postero
Re: Second Cassandra users survey
Wish list: A decent GUI to explore data kept in Cassandra would be much valuable. It should also be extendable to provide viewers for custom data. +1 to that. @jonathan - This is what google moderator is really good at. Perhaps start one and move the idea creation / voting there.
Re: Second Cassandra users survey
Well - given the example in our case the prefix that determines the endpoints where a token should be routed to could be something like a user-id so with key = userid + . + userthingid; instead of // this is happening right now getEndpoints(hash(key)) you would have getEndpoints(userid) Since count(users) is much larger than number of nodes in the ring we would still have a balanced cluster. I guess what we would need is something like a compound row key You could almost do something like this with the current code base but I remember that there are certain assumptions about how keys translate to tokens on the ring make this impossible. But in essence this would result in another partitioner implementation. So you'd have OrderPreserverPartitioner, RandomPartitioner and maybe ShardedPartitioner On Nov 7, 2011, at 2:26 PM, Peter Lin wrote: This feature interests me, so I thought I'd add some comments. Having used partition features in existing databases like DB2, Oracle and manual partitioning, one of the biggest challenges is keeping the partitions balanced. What I've seen with manual partitioning is that often the partitions get unbalanced. Usually the developers take a best guess and hope it ends up balanced. Some of the approaches I've used in the past were zip code, area code, state and some kind of hash. So my question related deterministic sharding is this, what rebalance feature(s) would be useful or needed once the partitions get unbalanced? Without a decent plan for rebalancing, it often ends up being a very painful problem to solve in production. Back when I worked mobile apps, we saw issues with how OpenWave WAP servers partitioned the accounts. The early versions randomly assigned a phone to a server when it is provisioned the first time. Once the phone was associated to that server, it was stuck on that server. If the load on that server was heavier than the others, the only choice was to scale up the hardware. 
My understanding of Cassandra's current sharding is consistent and random. Does the new feature sit some where in-between? Are you thinking of a pluggable API so that you can provide your own hash algorithm for cassandra to use? On Mon, Nov 7, 2011 at 7:54 AM, Daniel Doubleday daniel.double...@gmx.net wrote: Allow for deterministic / manual sharding of rows. Right now it seems that there is no way to force rows with different row keys will be stored on the same nodes in the ring. This is our number one reason why we get data inconsistencies when nodes fail. Sometimes a logical transaction requires writing rows with different row keys. If we could use something like this: prefix.uniquekey and let the partitioner use only the prefix the probability that only part of the transaction would be written could be reduced considerably. On Nov 1, 2011, at 11:59 PM, Jonathan Ellis wrote: Hi all, Two years ago I asked for Cassandra use cases and feature requests. [1] The results [2] have been extremely useful in setting and prioritizing goals for Cassandra development. But with the release of 1.0 we've accomplished basically everything from our original wish list. [3] I'd love to hear from modern Cassandra users again, especially if you're usually a quiet lurker. What does Cassandra do well? What are your pain points? What's your feature wish list? As before, if you're in stealth mode or don't want to say anything in public, feel free to reply to me privately and I will keep it off the record. [1] http://www.mail-archive.com/cassandra-dev@incubator.apache.org/msg01148.html [2] http://www.mail-archive.com/cassandra-user@incubator.apache.org/msg01446.html [3] http://www.mail-archive.com/dev@cassandra.apache.org/msg01524.html -- Jonathan Ellis Project Chair, Apache Cassandra co-founder of DataStax, the source for professional Cassandra support http://www.datastax.com
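The difference between `getEndpoints(hash(key))` and `getEndpoints(userid)` described at the top of this message can be sketched with a toy token ring. This is not Cassandra's partitioner code: the `Ring` class, the MD5-based `token` helper, the node tokens derived from node names, and the `.` separator are all assumptions made for the illustration.

```python
import hashlib
from bisect import bisect_left


def token(data: str) -> int:
    """Toy token: the MD5 digest read as a big integer."""
    return int.from_bytes(hashlib.md5(data.encode("utf-8")).digest(), "big")


class Ring:
    """Minimal consistent-hash ring with one token per node."""

    def __init__(self, nodes):
        # Hypothetical: node tokens come from hashing the node name.
        self._ring = sorted((token(n), n) for n in nodes)
        self._tokens = [t for t, _ in self._ring]

    def endpoint(self, tok: int) -> str:
        # First node whose token is >= tok, wrapping around the ring.
        i = bisect_left(self._tokens, tok) % len(self._ring)
        return self._ring[i][1]


def endpoint_full_key(ring: Ring, key: str) -> str:
    """Placement when the whole row key is hashed."""
    return ring.endpoint(token(key))


def endpoint_prefix(ring: Ring, key: str, sep: str = ".") -> str:
    """Placement when only the part before the separator is hashed."""
    prefix = key.split(sep, 1)[0]
    return ring.endpoint(token(prefix))
```

With `endpoint_prefix`, every key of the form `user42.<anything>` lands on the same node, which is the co-location property being asked for; balance then depends on there being many more prefixes than nodes, as the message argues.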
Re: Second Cassandra users survey
It should be dead-simple to build a slick GUI on the REST layer (Virgil: http://code.google.com/a/apache-extras.org/p/virgil/ ). I had planned to crank one out this week (using ExtJS) that mimicked the Squirrel/Toad look and feel. The UI would have a tree-panel of keyspaces and column families on the left. Then the main panel would be partitioned into two. The top of the main panel would allow a user to type in CQL/Pig, etc. The bottom of the main panel would show the data contained in the column family / result set. Any other thoughts on design before I get started? If we build this based on the JSON/REST interface, it should be pretty easy to embed in other applications. -brian On Mon, Nov 7, 2011 at 2:36 PM, Ian Danforth idanfo...@numenta.com wrote: Wish list: A decent GUI to explore data kept in Cassandra would be much valuable. It should also be extendable to provide viewers for custom data. +1 to that. @jonathan - This is what google moderator is really good at. Perhaps start one and move the idea creation / voting there. -- Brian ONeill Lead Architect, Health Market Science (http://healthmarketscience.com) mobile:215.588.6024 blog: http://weblogs.java.net/blog/boneill42/ blog: http://brianoneill.blogspot.com/
Re: Second Cassandra users survey
Decompression without compression (for lack of a better name). We store in Cassandra log batches that come in over HTTP as uncompressed, deflate, or snappy. We just add a 'magic' prefix, e.g. \0 \s \n \a \p \p \y, to the column value so we can decode it when we serve it back up. It seems like Cassandra could detect data with the appropriate magic, store it as is, and decode it for us automatically on the way back. Colin.
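A rough client-side sketch of the magic-prefix scheme described above, for anyone who wants to see the shape of it. The magic byte strings, the `store`/`load` names, and the encoding labels are invented for this example; only deflate is handled, via the standard-library `zlib` module, since snappy would need a third-party package.

```python
import zlib

# Hypothetical magic prefixes, keyed by the encoding the batch arrived in.
MAGIC = {
    "identity": b"\x00raw\x00",
    "deflate": b"\x00deflate\x00",
}


def store(body: bytes, content_encoding: str) -> bytes:
    """Tag the body with a magic prefix; the body bytes are kept as received."""
    return MAGIC[content_encoding] + body


def load(value: bytes) -> bytes:
    """Strip the magic prefix and decode the body if it was compressed."""
    for encoding, magic in MAGIC.items():
        if value.startswith(magic):
            body = value[len(magic):]
            return zlib.decompress(body) if encoding == "deflate" else body
    return value  # no known magic: return the value untouched


batch = b"GET /index.html 200\n" * 50
column_value = store(zlib.compress(batch), "deflate")
assert load(column_value) == batch
```

The point of the design is that the write path never recompresses anything: the already-compressed bytes are stored verbatim, and only the read path pays for decoding.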
Re: Second Cassandra users survey
1. Basic SQL-like summary transforms for both CQL and Thrift API clients like: SUM AVG MIN MAX 2. Native 64bit UNsigned datatype 3. Add support for matching column names via LIKE (% and _ wildcards) for ascii type -- Aaron Turner http://synfin.net/ Twitter: @synfinatic http://tcpreplay.synfin.net/ - Pcap editing and replay tools for Unix Windows Those who would give up essential Liberty, to purchase a little temporary Safety, deserve neither Liberty nor Safety. -- Benjamin Franklin carpe diem quam minimum credula postero
RE: Second Cassandra users survey
Isn't this sort of heading on the slippery slope of things that weigh you down? It was my understanding that Cassandra was stick to your core competency sort of database that really wanted to leave such utilities external. At its core was get and put. Did I miss something in my reading of intent? -Sarah -Original Message- From: Aaron Turner [mailto:synfina...@gmail.com] Sent: Sunday, November 06, 2011 8:25 AM To: user@cassandra.apache.org Subject: Re: Second Cassandra users survey 1. Basic SQL-like summary transforms for both CQL and Thrift API clients like: SUM AVG MIN MAX
Re: Second Cassandra users survey
The intent was to have a lighter solution for common problems than having to go with Hadoop or streaming large quantities of data back to the client. Is this feature creep? Yeah, prolly. Is it useful? Yes. If it can't be done well, then it probably shouldn't be done, but it never hurts to ask. :) On Sun, Nov 6, 2011 at 9:13 AM, Sarah Baker sba...@mspot.com wrote: Isn't this sort of heading on the slippery slope of things that weigh you down? It was my understanding that Cassandra was stick to your core competency sort of database that really wanted to leave such utilities external. At its core was get and put. Did I miss something in my reading of intent? -Sarah -Original Message- From: Aaron Turner [mailto:synfina...@gmail.com] Sent: Sunday, November 06, 2011 8:25 AM To: user@cassandra.apache.org Subject: Re: Second Cassandra users survey 1. Basic SQL-like summary transforms for both CQL and Thrift API clients like: SUM AVG MIN MAX -- Aaron Turner http://synfin.net/ Twitter: @synfinatic http://tcpreplay.synfin.net/ - Pcap editing and replay tools for Unix Windows Those who would give up essential Liberty, to purchase a little temporary Safety, deserve neither Liberty nor Safety. -- Benjamin Franklin carpe diem quam minimum credula postero
Re: Second Cassandra users survey
Transparent on disk encryption with pluggable keyprovider will also be really helpful to secure sensitive information. On Sun, Nov 6, 2011 at 9:42 AM, Aaron Turner synfina...@gmail.com wrote: The intent was to have a lighter solution for common problems then having to go with Hadoop or streaming large quantities of data back to the client. Is this feature creep? Yeah, prolly. Is it useful? Yes. If it can't be done well, then it probably shouldn't be done, but it never hurts to ask. :) On Sun, Nov 6, 2011 at 9:13 AM, Sarah Baker sba...@mspot.com wrote: Isn't this sort of heading on the slippery slope of things that weigh you down? It was my understanding that Cassandra was stick to your core competency sort of database that really wanted to leave such utilities external. At its core was get and put. Did I miss something in my reading of intent? -Sarah -Original Message- From: Aaron Turner [mailto:synfina...@gmail.com] Sent: Sunday, November 06, 2011 8:25 AM To: user@cassandra.apache.org Subject: Re: Second Cassandra users survey 1. Basic SQL-like summary transforms for both CQL and Thrift API clients like: SUM AVG MIN MAX -- Aaron Turner http://synfin.net/ Twitter: @synfinatic http://tcpreplay.synfin.net/ - Pcap editing and replay tools for Unix Windows Those who would give up essential Liberty, to purchase a little temporary Safety, deserve neither Liberty nor Safety. -- Benjamin Franklin carpe diem quam minimum credula postero
RE: Second Cassandra users survey
- support for atomic operations or batches (if QUORUM fails, data should not be visible with ONE) zookeeper is solving that. Yeah, I can use HBase too. I might have screwed up a little bit since I didn't talk about isolation; let's reformulate: support for read committed (using DB terminology). Cassandra is more like read uncommitted. Even if row mutations in one CF for one key are atomic on one server, stuff is not rolled back when the CL can't be satisfied at the coordinator level. Data won't be visible at QUORUM level, but when using weaker CL, invalid data can appear imho. Also it should be possible to tell which operations failed with batch_mutate but unfortunately it is not. HBase has clear semantics on mutations (http://hbase.apache.org/acid-semantics.html); maybe a similar page could be written to describe how Cassandra handles this. I don't know if some material already exists in the wiki, but I can try to write something about this. Anyway, all in all, I'd like to see read committed appear at some point in Cassandra. The way Cassandra works might make this a bit hard to introduce - but anyway, I'm just expressing my own feedback on this even if this is far away from the weak consistency Cassandra offers. - TTL on CF, rows and counters TTL on counters will be nice, but i am good with rest as it is Actually, counters have to be handled specifically to work around the lack of TTL support. It would be nice to unify storage capabilities. While talking about counters, I will add to the wish list the capability to mix normal columns with counter columns in the same CF, allowing easy dropping of a row or atomic mutations on one row.
Re: Second Cassandra users survey
On Sun, Nov 6, 2011 at 12:52 AM, Radim Kolar h...@sendmail.cz wrote: - support for atomic operations or batches (if QUORUM fails, data should not be visible with ONE) zookeeper is solving that. I'd like to see official support for Zookeeper inside of Cassandra. I'd like it to be something that can be optionally configured. I'd like to be able to make batch mutations atomic using it.
Re: Second Cassandra users survey
On Nov 6, 2011, at 3:41 PM, Ed Anuff e...@anuff.com wrote: I'd like to see official support for Zookeeper inside of Cassandra. I'd like it to be something that can be optionally configured. I'd like to be able to make batch mutations atomic using it. Not sure how possible this is, but we are forced to use a Zookeeper component in some of our applications due to the need for a small number of atomic updates to multiple rows or CFs. I would also like to see the ability to store counter columns alongside regular columns. It would definitely alleviate some of the need for the transactional qualities of Zookeeper as it would remove one of the bigger reasons we have to update two CFs at a time. (Inserting new columns into a regular CF and incrementing a counter of said columns.) Another possible solution is to allow some sort of efficient counting of columns for a given row key. To take that even further, if we could couple that ability with the current composite column functionality then we could get an efficient count of columns containing a particular prefix. Robert Jackson
Re: Second Cassandra users survey
Yeah, I can use HBase too. But why are you not using HBase if its feature set fits your needs better, instead of wanting the same functionality in Cassandra? It's good that the two projects differ in this area. From the rest of your post it looks like you want Cassandra to be ACID compliant, which is against its design ideas. If you want an ACID-compliant NoSQL engine, there are a few others besides HBase. Cassandra is more like read uncommitted. Yes. Even if row mutations in one CF for one key are atomic on one server, stuff is not rolled back when the CL can't be satisfied at the coordinator level. Data won't be visible at QUORUM level, but when using weaker CL, invalid data can appear imho. That's right. It's the responsibility of the application designer to code the application that way: use the correct CL. In SQL databases it's the server's responsibility to deal with inconsistent data, but in NoSQL it's the client's responsibility. In reality it's not a problem because you have your applications under control. This problem might be worked around by Cassandra core if additional settings were added to the CF: minimum CL levels for read/write. Submit a feature request to JIRA if you are interested in that.
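As a rough illustration of what "use the correct CL" means in practice: the usual rule of thumb is that a read is guaranteed to overlap an acknowledged write when the number of replicas read plus the number of replicas written exceeds the replication factor. The Python sketch below only does that arithmetic; the function names are made up for illustration and are not part of any Cassandra client, and only the ONE, QUORUM and ALL levels are modeled.

```python
def replicas_for(level: str, replication_factor: int) -> int:
    """Number of replicas that must respond for a consistency level."""
    if level == "ONE":
        return 1
    if level == "QUORUM":
        return replication_factor // 2 + 1
    if level == "ALL":
        return replication_factor
    raise ValueError(f"unsupported consistency level: {level}")


def reads_overlap_writes(write_cl: str, read_cl: str, replication_factor: int) -> bool:
    """True when every read must touch at least one replica that
    acknowledged the write, i.e. R + W > N."""
    w = replicas_for(write_cl, replication_factor)
    r = replicas_for(read_cl, replication_factor)
    return r + w > replication_factor
```

With a replication factor of 3, writing and reading at QUORUM overlaps (2 + 2 > 3), while writing at QUORUM and reading at ONE does not (2 + 1 = 3). Note that this only covers writes that succeeded; it says nothing about a write that failed at the coordinator and was not rolled back, which is the case raised in this thread.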
RE: Second Cassandra users survey
I do not want to use HBase because Cassandra is far easier to deploy and it is working pretty well, and for 99% of our apps the model fits perfectly. The other 1% has a workaround: ordering writes. I accept the trade-off anyway :) Don't miss the point: I love Cassandra and the way it works, and I understand the limitations and advantages of the architecture. I'm just saying it would be nice to have something stronger (meaning more guarantees) when updating several columns; the pain point is updating a regular column and a counter while keeping everything consistent for one key. Actually, everything is under control, as you say, so it's not a real problem and we can live without it. But since this is a discussion about wishes, I'm not shy about asking for the moon :) -Original Message- From: Radim Kolar [mailto:h...@sendmail.cz] Sent: Monday, November 07, 2011 8:02 AM To: user@cassandra.apache.org Subject: Re: Second Cassandra users survey
RE: Second Cassandra users survey
Dear Santa, here is my wish list :) - support for atomic operations or batches (if QUORUM fails, data should not be visible with ONE) - TTL on CF, rows and counters - restart the TTL when a row, column or CF is touched - streamed data transfer (both send and receive). At least for receive (multiget), streaming is not supported (probably a Thrift limitation). - the client could be more involved. Currently, it just waits for a response from the coordinator and does nothing interesting in the meantime. It would be nice to let the client act as the coordinator (network usage improvement, tunable consistency, sending read/write requests directly to one of the natural endpoints...). Probably already doable with the storage API (? - Java only, though). - CQL is cool but the API is far too simple (it's text, suited for the CLI). CQL queries should support parameters (like JDBC '?'); parameters could be set in binary format. It would consume less network bandwidth and could make client support easier. CQL responses should be streamed too (Thrift again...). - support for authentication / dynamic security configuration. Allow access to all CFs of a KS (to support dynamically created CFs in a KS), or I missed something. - Pierre
Re: Second Cassandra users survey
On Fri, Nov 4, 2011 at 9:50 PM, Jim Newsham jnews...@referentia.com wrote: Our use case is time-series data (such as sampled sensor data). Each row describes a particular statistic over time, the column name is a time, and the column value is the sample. So it makes perfect sense to want to delete columns for a given time range. I'm sure there must be numerous other use cases for which using a range of column names makes sense. Assuming you are bucketing your rows at some interval (as in http://rubyscale.com/2011/basic-time-series-with-cassandra/), why is deleting the entire row for the interval not acceptable? -Brandon
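To make the bucketing suggestion concrete, here is a minimal Python sketch of deriving row keys from a statistic name and a time bucket. The one-day bucket size, the key format and the function names are assumptions for illustration only; they are not taken from the linked article or from any Cassandra API.

```python
from datetime import datetime, timezone

BUCKET_SECONDS = 24 * 60 * 60  # assumed bucket size: one row per day


def bucket_start(ts: datetime) -> int:
    """Epoch second at which the bucket containing `ts` begins."""
    epoch = int(ts.timestamp())
    return epoch - epoch % BUCKET_SECONDS


def row_key(statistic: str, ts: datetime) -> str:
    """Row key for one statistic and one time bucket (hypothetical format)."""
    return f"{statistic}:{bucket_start(ts)}"


def row_keys_in_range(statistic: str, start: datetime, end: datetime) -> list:
    """Row keys of every bucket that overlaps the interval [start, end]."""
    first, last = bucket_start(start), bucket_start(end)
    return [f"{statistic}:{b}"
            for b in range(first, last + BUCKET_SECONDS, BUCKET_SECONDS)]
```

Under this layout, dropping a time range that lines up with bucket boundaries becomes a handful of whole-row deletes. A range that starts or ends in the middle of a bucket would still need per-column deletes at the edges.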
Re: Second Cassandra users survey
- Bulk column deletion by (column name) range. Without this feature, we are forced to perform a range query and iterate over all of the columns, deleting them one by one (we do this in a batch, but it's still a very slow approach). See CASSANDRA-494/3448. If anyone else needs this feature, please raise your voice, as it has been tabled due to lack of interest. On 11/3/2011 11:44 AM, Todd Burruss wrote: - Better performance when accessing random columns in a wide row - caching subsets of wide rows - possibly on the same boundaries as the index - some sort of notification architecture when data is inserted. This could be co-processors, triggers, plugins, etc - auto load balance when adding new nodes
Re: Second Cassandra users survey
On Fri, Nov 4, 2011 at 9:19 PM, Jim Newsham jnews...@referentia.com wrote: - Bulk column deletion by (column name) range. Without this feature, we are forced to perform a range query and iterate over all of the columns, deleting them one by one (we do this in a batch, but it's still a very slow approach). See CASSANDRA-494/3448. If anyone else has a need for this issue, please raise your voice, as the feature has been tabled due to lack of interest. I think the lack of interest here has been this: it's unusual to want to delete columns for which you do not know the names, but also not want to delete the entire row. Is there any chance you're trying to delete the entire row, or is it truly the case I just described? -Brandon
Re: Second Cassandra users survey
On 11/4/2011 4:32 PM, Brandon Williams wrote: I think the lack of interest here has been this: it's unusual to want to delete columns for which you do not know the names, but also not want to delete the entire row. Is there any chance you're trying to delete the entire row, or is it truly the case I just described? -Brandon Our use case is time-series data (such as sampled sensor data). Each row describes a particular statistic over time, the column name is a time, and the column value is the sample. So it makes perfect sense to want to delete columns for a given time range. I'm sure there must be numerous other use cases for which using a range of column names makes sense. Regards, Jim
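For readers unfamiliar with the workaround described in this exchange (range query, then delete the returned columns one by one in batches), the following Python sketch models it on an in-memory row. The dict is only a stand-in for a wide row whose column names are timestamps; the function names and the batch size are hypothetical and no Cassandra client call is shown.

```python
from bisect import bisect_left, bisect_right


def columns_in_range(sorted_names, start, end):
    """Column names with start <= name <= end, given names in sorted order."""
    lo = bisect_left(sorted_names, start)
    hi = bisect_right(sorted_names, end)
    return sorted_names[lo:hi]


def delete_range(row, start, end, batch_size=100):
    """Delete every column in [start, end] from `row`, batch by batch.

    Returns the number of batches issued. Each column still costs one
    individual deletion, so the work grows with the size of the range.
    """
    names = columns_in_range(sorted(row), start, end)
    batches = 0
    for i in range(0, len(names), batch_size):
        for name in names[i:i + batch_size]:
            del row[name]  # one deletion per column
        batches += 1
    return batches
```

The point of the sketch is the cost model: the client has to read the column names first and then issue one deletion per column, which is why a single server-side range deletion would be cheaper for this time-series case.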
Re: Second Cassandra users survey
I'm using Cassandra as a big graph database, loading large volumes of data live and linking on the fly. The number of edges grows geometrically with the data added, and the edges need to be read to continue linking the graph on the fly. Consequently, my problem is constrained by: * Predominantly read - especially when data gets large and reads are quasi-random * I have lots of data to plow in, to be read * Although the problem could scale out and possibly all be in RAM, that requires too much kit to be viable So, my findings with Cassandra are: * Compaction is expensive, I need it but 1) It takes away disk IO from my reads 2) It destroys the file cache I've not had a chance to do extensive tests with the LevelDB-style compaction * Compaction has been too hard to configure historically * Memory hungry So for me the biggest features would be * Cheaper compaction * Lower memory usage * Indexing dynamic colnames (eg Lucene TermEnum against rowkey:colkey) I do a lot of checking against dynamic colnames The great features are that redundancy and live addition of shards are available out of the box. I've also experimented with GoldenOrb and triggered updates; I think there is a fair bit that can be achieved in my problem with local data access. Through GoldenOrb and Hadoop writables I managed to get both a BigTable and a Pregel access model onto my Cassandra data. It was schema specific, but provided a local compute model. p
Re: Second Cassandra users survey
* Compaction is expensive Yes, it is. That's why I decided not to go with Hadoop HDFS backed by Cassandra.
Re: Second Cassandra users survey
On Thu, Nov 3, 2011 at 5:46 AM, Peter Tillotson slatem...@yahoo.co.uk wrote: I'm using Cassandra as a big graph database, loading large volumes of data live and linking on the fly. Not sure Cassandra is the right fit for modeling complex vertices and edges. * Indexing dynamic colnames (eg Lucene TermEnum against rowkey:colkey) I do a lot of checking against dynamic colnames I agree, some kind of integration with a search engine is required to support ad hoc queries as well as searching on column names. This would be really helpful. Currently, one of the options is to write in two places: Cassandra + search engine.
Re: Second Cassandra users survey
* Indexing dynamic colnames (eg Lucene TermEnum against rowkey:colkey) I do a lot of checking against dynamic colnames I agree, some kind of integration with search engine is required to support adhoc queries as well and searching on column names. This will be really helpful. Currently, one of the options is to write in 2 places. Cassandra + search engine. I was thinking of a disk-backed skiplist, with every nth rowkey:colkey pulled into memory per sstable, as in Lucene's TermEnum. From: Mohit Anchlia mohitanch...@gmail.com To: user@cassandra.apache.org; Peter Tillotson slatem...@yahoo.co.uk Sent: Thursday, 3 November 2011, 14:15 Subject: Re: Second Cassandra users survey
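The idea of keeping every nth rowkey:colkey in memory can be sketched roughly as follows in Python. This is only an illustration under stated assumptions: a sorted in-memory list stands in for the on-disk sorted file, binary search over the samples stands in for the skiplist, and the class name and the `every_nth` parameter are hypothetical. It is not how Cassandra or Lucene implement their indexes.

```python
from bisect import bisect_right


class SparseKeyIndex:
    """Keeps every nth key of a sorted key file in memory."""

    def __init__(self, sorted_keys, every_nth=128):
        self._keys = sorted_keys                  # stand-in for the on-disk file
        self._step = every_nth
        self._samples = sorted_keys[::every_nth]  # the only part held in memory

    def contains(self, key):
        # Locate the last sampled key that is <= the key we are looking for.
        i = bisect_right(self._samples, key) - 1
        if i < 0:
            return False  # key sorts before everything in the file
        # Scan a single block of at most `every_nth` keys ("one disk read").
        start = i * self._step
        return key in self._keys[start:start + self._step]
```

Memory use is roughly 1/n of the full key set, and each lookup costs one binary search in memory plus one bounded scan, which is the trade-off the reply above is pointing at for checks against dynamic column names.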
Re: Second Cassandra users survey
Provide an option to sort columns by timestamp, i.e., in the order they were added to the row, while still allowing arbitrary column names.
Re: Second Cassandra users survey
I realize that it is not realistic to expect it, but it would be good to have a Partitioner that supports both range slices and automatic load balancing. On Thu, Nov 3, 2011 at 13:57, Ertio Lew ertio...@gmail.com wrote: Provide an option to sort columns by timestamp i.e, in the order they have been added to the row, with the facility to use any column names.
Re: Second Cassandra users survey
- Better performance when accessing random columns in a wide row - caching subsets of wide rows - possibly on the same boundaries as the index - some sort of notification architecture when data is inserted. This could be co-processors, triggers, plugins, etc - auto load balance when adding new nodes
Re: Second Cassandra users survey
- entity groups - co-processors - materialized views - CQL support directly in cassandra-cli
Re: Second Cassandra users survey
1. entity groups 2. cql support in cassandra-cli. 3. offset support in slice_range. 4. more sophisticated secondary index implementation. On Wed, Nov 2, 2011 at 8:38 PM, Patrick Julien pjul...@gmail.com wrote: - entity groups - co-processors - materialized views - CQL support directly in cassandra-cli
Re: Second Cassandra users survey
Here is my wish list - I would love Cassandra to - provide an efficient method to retrieve the count of columns for a given row key without having to read all of the columns and count them - support auto-increment column names - column slice queries don't take advantage of the column Bloom filter, and it is not always easy to enumerate the column names in a deterministic manner - provide JNA support for the key cache - remove the dependency on running nodetool repair when any column is deleted (so that deleted columns don't get resurrected) thanks Ramesh