Re: Second Cassandra users survey

2011-12-06 Thread Matthias Pfau
It took some time to gather our requirements and determine which of them
matter most. Here they are:


* Column position range queries: We would like to access columns not by 
their name, but by their position in the row.

Example: row(A:v1, B:v2, C:v3, D:v4), ordered by UTF8Type; positions are zero-based
Example query: "Return all columns in position range 1..3" would return
(B:v2, C:v3, D:v4)


* Arbitrary position queries: We would like to access arbitrary
columns by their positions in the row.
Example query: "Return the columns at positions (0, 3, 1)" would return
(A:v1, D:v4, B:v2)


* Security for client-server communication (Thrift): A big benefit for
all users of Cassandra who deploy the cluster into untrusted
environments (Amazon EC2 etc.) would be the ability to secure the
client-server communication with SSL. This has already been implemented
in Thrift (see https://issues.apache.org/jira/browse/THRIFT-106) and
would probably need to be added to CassandraDaemon.ThriftServer.
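The two position-based queries above can be emulated on the client by reading the row, whose columns come back in comparator order, and indexing into the result. A minimal sketch, assuming the whole row fits in memory; `SortedRow` and its methods are hypothetical names, not a Cassandra or Thrift API. Plain Python string ordering compares code points, which for UTF-8 names matches the order of their encoded bytes.

```python
from bisect import insort


class SortedRow:
    """Hypothetical client-side view of one row, with column names kept in
    comparator order (plain string order here, standing in for UTF8Type)."""

    def __init__(self, columns=None):
        self._names = []   # column names in sorted order
        self._values = {}  # column name -> value
        for name, value in (columns or {}).items():
            self.insert(name, value)

    def insert(self, name, value):
        if name not in self._values:
            insort(self._names, name)  # keep names sorted on insert
        self._values[name] = value

    def slice_by_position(self, start, end):
        """Columns at zero-based positions start..end (inclusive)."""
        return [(n, self._values[n]) for n in self._names[start:end + 1]]

    def get_positions(self, positions):
        """Columns at arbitrary zero-based positions, in the requested order."""
        return [(self._names[p], self._values[self._names[p]]) for p in positions]


row = SortedRow({"A": "v1", "B": "v2", "C": "v3", "D": "v4"})
print(row.slice_by_position(1, 3))   # [('B', 'v2'), ('C', 'v3'), ('D', 'v4')]
print(row.get_positions([0, 3, 1]))  # [('A', 'v1'), ('D', 'v4'), ('B', 'v2')]
```

Reading the full row costs time proportional to the row width on every query, so this is only practical for narrow rows; a server-side version would avoid shipping the unused columns.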


Kind regards
Arne and Matthias

On 11/01/2011 11:59 PM, Jonathan Ellis wrote:

Hi all,

Two years ago I asked for Cassandra use cases and feature requests.
[1]  The results [2] have been extremely useful in setting and
prioritizing goals for Cassandra development.  But with the release of
1.0 we've accomplished basically everything from our original wish
list. [3]

I'd love to hear from modern Cassandra users again, especially if
you're usually a quiet lurker.  What does Cassandra do well?  What are
your pain points?  What's your feature wish list?

As before, if you're in stealth mode or don't want to say anything in
public, feel free to reply to me privately and I will keep it off the
record.

[1] http://www.mail-archive.com/cassandra-dev@incubator.apache.org/msg01148.html
[2] http://www.mail-archive.com/cassandra-user@incubator.apache.org/msg01446.html
[3] http://www.mail-archive.com/dev@cassandra.apache.org/msg01524.html





Re: Second Cassandra users survey

2011-11-28 Thread Aditya
Ability to mix counter columns and normal columns in the same column family.

On Thu, Nov 17, 2011 at 6:46 PM, Boris Yen yulin...@gmail.com wrote:

 I was wondering if it is possible to provide a function like "delete from
 cf where column='value'".
 
 I think this would be useful for people who use secondary indexes a lot.

 On Nov 15, 2011 11:05 AM, Edward Ribeiro edward.ribe...@gmail.com
 wrote:
 
  +1 on co-processors.
 
 
  Edward
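Until something like this exists, the delete Boris describes can be emulated client-side: look up the matching row keys in the secondary index, then delete each row. A minimal sketch, with in-memory dicts standing in for the column family and its index; `build_index` and `delete_where` are hypothetical helpers. It is not atomic: rows written between the lookup and the deletes are missed.

```python
from collections import defaultdict


def build_index(cf, column):
    """Hypothetical secondary index on one column: value -> set of row keys."""
    index = defaultdict(set)
    for key, cols in cf.items():
        if column in cols:
            index[cols[column]].add(key)
    return index


def delete_where(cf, indexes, column, value):
    """Emulate "delete from cf where column='value'": find matching row keys
    via the index on `column` (which must exist), then delete each row."""
    keys = sorted(indexes[column].get(value, set()))
    for key in keys:
        row = cf.pop(key, {})  # tolerate index entries for already-deleted rows
        # Remove the deleted row from every index that references it.
        for col, val in row.items():
            if col in indexes:
                indexes[col][val].discard(key)
    return keys


cf = {
    "u1": {"state": "CA", "name": "ann"},
    "u2": {"state": "NY", "name": "bob"},
    "u3": {"state": "CA", "name": "cy"},
}
indexes = {"state": build_index(cf, "state")}
print(delete_where(cf, indexes, "state", "CA"))  # ['u1', 'u3']
print(sorted(cf))                                # ['u2']
```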



Re: Second Cassandra users survey

2011-11-14 Thread Chris Burroughs
 - It would be super cool if all of that counter work made it possible
to support other atomic data types (sets? CAS? just pass an
associative/commutative function to apply).
 - Again with types: pluggable, type-specific compression.
 - Wishy-washy wish: simpler elasticity. I would like to go from
6 -> 8 -> 7 nodes without each of those being an annoying fight with tokens.
 - Gossip as a library. Gossip/failure detection is something C* seems to
have gotten particularly right (or at least it's something that has not
needed to change much). It would be cool to use Cassandra's gossip
protocol as a distributed-systems building tool à la ZooKeeper.
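On the elasticity point: with RandomPartitioner, tokens lie between 0 and 2**127, and a balanced ring for N nodes is commonly computed as token_i = i * 2**127 / N. The sketch below (helper names are hypothetical) shows why resizing is a fight: few tokens survive a change in node count, so most existing nodes must be moved, for example with `nodetool move`.

```python
RING_SIZE = 2 ** 127  # RandomPartitioner tokens fall in [0, 2**127]


def balanced_tokens(node_count):
    """Evenly spaced tokens: token_i = i * 2**127 // node_count."""
    return [i * RING_SIZE // node_count for i in range(node_count)]


def tokens_to_assign(old_count, new_count):
    """Tokens in the new balanced layout that no node currently holds,
    i.e. how many nodes (new or moved) need a fresh token."""
    old = set(balanced_tokens(old_count))
    return sum(1 for t in balanced_tokens(new_count) if t not in old)


print(tokens_to_assign(6, 8))  # 6: 2 new nodes + 4 existing nodes that must move
print(tokens_to_assign(8, 7))  # 6: every remaining node except the one at token 0
```

For 6 -> 8 nodes only the tokens 0 and 2**126 carry over; for 8 -> 7 only token 0 survives, so 6 of the 7 remaining nodes must move.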




Re: Second Cassandra users survey

2011-11-14 Thread Jake Luciani
Re: Simpler elasticity:

The latest OpsCenter will now rebalance the cluster optimally:
http://www.datastax.com/dev/blog/whats-new-in-opscenter-1-3

/plug

-Jake





-- 
http://twitter.com/tjake


Re: Second Cassandra users survey

2011-11-14 Thread Mohit Anchlia
On Mon, Nov 14, 2011 at 4:44 PM, Jake Luciani jak...@gmail.com wrote:
 Re  Simpler elasticity:
 Latest opscenter will now rebalance cluster optimally
 http://www.datastax.com/dev/blog/whats-new-in-opscenter-1-3
 /plug

Does it have any impact on reads and writes while the rebalance is in
progress? How is it handled on a live cluster?




Re: Second Cassandra users survey

2011-11-14 Thread Dean Hiller
+1 on coprocessors

 



Re: Second Cassandra users survey

2011-11-14 Thread Dean Hiller
Oh yeah, one more BIG one: in-memory writes with asynchronous write-behind
to disk, like Cassandra does for speed.

So if you have atomic locking, it writes to the primary node (memory) and
some other node (memory) and returns success to the client. It then
writes to disk asynchronously later. This proves to be very fast, two
machines make it pretty reliable, and of course it is asynchronously
writing to the third or fourth machine depending on the replication factor.
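Dean's write-behind idea can be sketched in a single process: a write is acknowledged once it sits in two in-memory copies, and a background thread persists it later. `WriteBehindStore`, its dict "replicas" and the `disk` dict are hypothetical stand-ins, not Cassandra components. The trade-off is visible in `write`: it returns before the data is durable, so acknowledged writes not yet flushed would be lost if every in-memory copy failed.

```python
import queue
import threading


class WriteBehindStore:
    """Hypothetical sketch: acknowledge a write once it is held by
    `replica_count` in-memory copies; a background thread later persists it
    to `disk` (a dict standing in for durable storage)."""

    def __init__(self, replica_count=2):
        self.memory = [{} for _ in range(replica_count)]
        self.disk = {}
        self._pending = queue.Queue()
        self._lock = threading.Lock()
        threading.Thread(target=self._flush_loop, daemon=True).start()

    def write(self, key, value):
        with self._lock:
            for replica in self.memory:      # synchronous, in-memory copies
                replica[key] = value
            self._pending.put((key, value))  # queued under the lock to keep order
        return True                          # ack before the data is durable

    def _flush_loop(self):
        while True:
            key, value = self._pending.get()
            self.disk[key] = value
            self._pending.task_done()

    def wait_for_flush(self):
        """Block until every queued write has reached `disk`."""
        self._pending.join()


store = WriteBehindStore()
store.write("user:1", "dean")
store.wait_for_flush()
print(store.disk)  # {'user:1': 'dean'}
```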

later,
Dean

 





Re: Second Cassandra users survey

2011-11-14 Thread Edward Ribeiro
+1 on co-processors.


Edward


Re: Second Cassandra users survey

2011-11-11 Thread Aaron Turner
Lately I've been working on some data processing code in Cassandra and
apparently I don't write bug-free code the very first time. :)  Hence,
while debugging, I often need to look at data in Cassandra to see what
my code is doing/should be finding, etc. This turns out to be harder
than it should be IMHO.

Anyways, what I'd like is a more powerful cqlsh or similar tool which can:

1. List the rows in a CF (no column data), optionally within a given
range of keys
2. Count the number of columns in a row and within a range of values
for that row
3. Return only the column names for a given row (no values)
4. Support CentOS 5 (currently uses Python 2.4, but cqlsh-1.0.5
requires >= 2.5; 1.0.3 worked fine on 2.4)
5. Support some basic transformations of data:
  * return up to the first X bytes of a given value
  * return length of value in bytes instead of value
6. Alternatively, print data in a format like the following to
make it easier to read:

RowKey1:
\tname = value
\tname = value
...

RowKey2:
...

7. For BytesType, provide an option to print all values in hex,
i.e. no mixed ASCII + \x encoding, just 0x0a3b...


Frankly, I know some of this isn't efficient for the server to do, but
the client could do that.  I really don't care too much about
performance since this is a debugging/diagnostics tool.
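Items 5 to 7 are pure client-side formatting, so they can be prototyped without server changes. A minimal sketch of the output Aaron describes; `format_value` and `format_rows` are hypothetical helpers, fetching the rows is not shown, and it is written for Python 3, so unlike the cqlsh versions mentioned it would not run on CentOS 5's Python 2.4.

```python
def format_value(value, max_bytes=None, length_only=False, hex_only=False):
    """Render one column value (bytes) according to the requested options."""
    if length_only:
        return str(len(value))          # length in bytes instead of the value
    if max_bytes is not None:
        value = value[:max_bytes]       # only the first max_bytes bytes
    if hex_only:
        return "0x" + value.hex()       # e.g. 0x0a3b, no mixed encoding
    # Default: printable ASCII as-is, everything else as \xNN escapes.
    return "".join(chr(b) if 32 <= b < 127 else "\\x%02x" % b for b in value)


def format_rows(rows, **options):
    """rows: iterable of (row_key, [(column_name, value_bytes), ...])."""
    lines = []
    for row_key, columns in rows:
        lines.append("%s:" % row_key)
        for name, value in columns:
            lines.append("\t%s = %s" % (name, format_value(value, **options)))
        lines.append("")                # blank line between rows
    return "\n".join(lines)


rows = [("RowKey1", [("name", b"value"), ("raw", b"\x0a\x3b")])]
print(format_rows(rows))
print(format_rows(rows, hex_only=True))
```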

-- 
Aaron Turner
http://synfin.net/         Twitter: @synfinatic
http://tcpreplay.synfin.net/ - Pcap editing and replay tools for Unix & Windows
Those who would give up essential Liberty, to purchase a little temporary
Safety, deserve neither Liberty nor Safety.
    -- Benjamin Franklin
carpe diem quam minimum credula postero


Re: Second Cassandra users survey

2011-11-11 Thread Edward Capriolo
It seems like you could use a composite key partitioner to accomplish this.

On Monday, November 7, 2011, Daniel Doubleday daniel.double...@gmx.net
wrote:
 Allow for deterministic / manual sharding of rows.

 Right now there seems to be no way to force rows with different row
keys to be stored on the same nodes in the ring.
 This is our number one source of data inconsistencies when nodes
fail.

 Sometimes a logical transaction requires writing rows with different row
keys. If we could use keys of the form

 prefix.uniquekey and let the partitioner use only the prefix, the
probability that only part of the transaction is written could be
reduced considerably.
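Ed's composite-key partitioner and Daniel's prefix.uniquekey idea both amount to computing the token from only part of the key. RandomPartitioner derives a row's token from an MD5 hash of the row key; the sketch below approximates that token computation (digest read as a signed integer, then made non-negative) and adds a hypothetical prefix-only variant. It is not Cassandra's Java `IPartitioner` interface, and co-location only places the rows on the same replicas; it does not make a multi-row write atomic.

```python
import hashlib


def md5_token(key: bytes) -> int:
    """Token in the style of RandomPartitioner: the MD5 digest of the key read
    as a signed big-endian integer, then made non-negative."""
    digest = hashlib.md5(key).digest()
    return abs(int.from_bytes(digest, "big", signed=True))


def prefix_token(key: bytes, separator: bytes = b".") -> int:
    """Hypothetical prefix-aware variant: hash only the part of the key before
    the first separator, so every 'prefix.uniquekey' row gets the same token."""
    return md5_token(key.split(separator, 1)[0])


# Rows of one logical transaction share a prefix, so they share a token and
# therefore the same replicas.
print(prefix_token(b"txn42.header") == prefix_token(b"txn42.line1"))  # True
```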



 On Nov 1, 2011, at 11:59 PM, Jonathan Ellis wrote:

 Hi all,

 Two years ago I asked for Cassandra use cases and feature requests.
 [1]  The results [2] have been extremely useful in setting and
 prioritizing goals for Cassandra development.  But with the release of
 1.0 we've accomplished basically everything from our original wish
 list. [3]

 I'd love to hear from modern Cassandra users again, especially if
 you're usually a quiet lurker.  What does Cassandra do well?  What are
 your pain points?  What's your feature wish list?

 As before, if you're in stealth mode or don't want to say anything in
 public, feel free to reply to me privately and I will keep it off the
 record.

 [1]
http://www.mail-archive.com/cassandra-dev@incubator.apache.org/msg01148.html
 [2]
http://www.mail-archive.com/cassandra-user@incubator.apache.org/msg01446.html
 [3] http://www.mail-archive.com/dev@cassandra.apache.org/msg01524.html

 --
 Jonathan Ellis
 Project Chair, Apache Cassandra
 co-founder of DataStax, the source for professional Cassandra support
 http://www.datastax.com




Re: Second Cassandra users survey

2011-11-09 Thread Jake Luciani
Hi Todd,

Entity Groups : https://issues.apache.org/jira/browse/CASSANDRA-1684

-Jake

On Wed, Nov 9, 2011 at 6:44 AM, Todd Burruss bburr...@expedia.com wrote:

 I believe I heard someone talk at the Cassandra SF conference about creating a
 partitioner that was a derivation of RandomPartitioner.  It essentially
 would look for keys that adhere to a certain pattern, like key:subkey.
  The key portion would be used for determining the location on the ring,
 but key:subkey for actually storing.  This would allow groups of data
 (all having the same key) to reside on the same node, while still
 maintaining uniqueness across the entire keyspace.

 Unbalanced nodes could still occur, but I don't think any worse than what
 wide/large rows can cause.


 On 11/8/11 1:29 AM, Daniel Doubleday daniel.double...@gmx.net wrote:

 Ah cool - thanks for the pointer!
 
 On Nov 7, 2011, at 5:25 PM, Ed Anuff wrote:
 
  This is basically what entity groups are about -
  https://issues.apache.org/jira/browse/CASSANDRA-1684
 
  On Mon, Nov 7, 2011 at 5:26 AM, Peter Lin wool...@gmail.com wrote:
  This feature interests me, so I thought I'd add some comments.
 
  Having used partition features in existing databases like DB2, Oracle
  and manual partitioning, one of the biggest challenges is keeping the
  partitions balanced. What I've seen with manual partitioning is that
  often the partitions get unbalanced. Usually the developers take a
  best guess and hope it ends up balanced.
 
  Some of the approaches I've used in the past were zip code, area code,
  state and some kind of hash.
 
  So my question related to deterministic sharding is this: what rebalance
  feature(s) would be useful or needed once the partitions get
  unbalanced?
 
  Without a decent plan for rebalancing, it often ends up being a very
  painful problem to solve in production. Back when I worked on mobile
  apps, we saw issues with how OpenWave WAP servers partitioned the
  accounts. The early versions randomly assigned a phone to a server
  when it was provisioned the first time. Once the phone was associated
  with that server, it was stuck on that server. If the load on that
  server was heavier than the others, the only choice was to scale up
  the hardware.
 
  My understanding is that Cassandra's current sharding is consistent and
  random. Does the new feature sit somewhere in between? Are you
  thinking of a pluggable API so that you can provide your own hash
  algorithm for Cassandra to use?
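Todd and Peter both flag balance as the cost of co-location. One way to check it is to map each key's token to the node that owns it and count keys per node. A simplified sketch: it uses a toy ring with small integer tokens instead of real 0..2**127 tokens, ignores replication, and the helper names are hypothetical.

```python
from bisect import bisect_left
from collections import Counter


def primary_node(sorted_tokens, token):
    """Node owning `token`: the first node token >= it, wrapping around the
    ring, since each node owns the range (previous token, its own token]."""
    i = bisect_left(sorted_tokens, token)
    return sorted_tokens[i % len(sorted_tokens)]


def load_by_node(sorted_tokens, key_tokens):
    """Count keys per node's primary range (replication ignored)."""
    return Counter(primary_node(sorted_tokens, t) for t in key_tokens)


nodes = [0, 100, 200, 300]  # toy ring
# Eight rows share one prefix (hence one token); two other rows are spread out.
keys = [150] * 8 + [250, 50]
print(sorted(load_by_node(nodes, keys).items()))  # [(100, 1), (200, 8), (300, 1)]
```

A heavily used prefix concentrates its rows on one replica set, which is the same hot-spot risk Peter describes for manual partitioning.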
 
 
 


Re: Second Cassandra users survey

2011-11-09 Thread Aaron Turner
I think this was already asked for, but you can add my vote for TTL
support for Counters.



Re: Second Cassandra users survey

2011-11-09 Thread Todd Burruss
Thanks, Jake, for the JIRA, but someone at the conference had already
implemented what I mentioned. It didn't offer any atomicity, just co-location
of a family of data on the same node.



Re: Second Cassandra users survey

2011-11-09 Thread Jake Luciani
Solandra does this
https://github.com/tjake/Solandra/blob/solandra/src/lucandra/dht/RandomPartitioner.java

But Row Groups is going to be the official way.

-Jake



Re: Second Cassandra users survey

2011-11-09 Thread Vijay
My wish list:

1) Conditional updates: if a column has a value, then put the column in the
column family atomically; otherwise fail.
2) getAndSet on counters, as a separate API.
3) Revert the count when the client disconnects or receives an exception (so
it can safely retry).
4) Something like a freeze API for updates to a row/CF (this can be used as
a lock).
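To make item 1 concrete, here is a minimal single-process sketch of the compare-and-set semantics being asked for, assuming "has a value" means "currently holds an expected value". It uses `ConcurrentHashMap.replace(key, expected, newValue)` as a stand-in for one row; it says nothing about how Cassandra would coordinate this across replicas, and `ConditionalRow` and `putIfEquals` are hypothetical names.

```java
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical single-process stand-in for one row: column name -> value.
// It illustrates the requested semantics only, not distributed coordination.
final class ConditionalRow {
    private final ConcurrentHashMap<String, String> columns = new ConcurrentHashMap<>();

    // Ordinary (unconditional) write.
    void put(String column, String value) {
        columns.put(column, value);
    }

    // Conditional update: write newValue only if the column currently holds
    // `expected`; otherwise fail and leave the row unchanged.
    // ConcurrentHashMap.replace(key, oldValue, newValue) performs the check and
    // the write as one atomic step and returns whether it succeeded.
    boolean putIfEquals(String column, String expected, String newValue) {
        return columns.replace(column, expected, newValue);
    }

    String get(String column) {
        return columns.get(column);
    }

    public static void main(String[] args) {
        ConditionalRow row = new ConditionalRow();
        row.put("lock", "free");
        System.out.println(row.putIfEquals("lock", "free", "client-a")); // true
        System.out.println(row.putIfEquals("lock", "free", "client-b")); // false
        System.out.println(row.get("lock"));                             // client-a
    }
}
```

The `lock` column in `main` hints at how item 4 could sit on the same primitive: whichever client swaps the value away from `free` holds the lock.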

Regards,
/VJ






Re: Second Cassandra users survey

2011-11-08 Thread Daniel Doubleday
Ah cool - thanks for the pointer!

On Nov 7, 2011, at 5:25 PM, Ed Anuff wrote:

 This is basically what entity groups are about -
 https://issues.apache.org/jira/browse/CASSANDRA-1684
 
 On Mon, Nov 7, 2011 at 5:26 AM, Peter Lin wool...@gmail.com wrote:
 This feature interests me, so I thought I'd add some comments.
 
 Having used partition features in existing databases like DB2, Oracle
 and manual partitioning, one of the biggest challenges is keeping the
 partitions balanced. What I've seen with manual partitioning is that
 often the partitions get unbalanced. Usually the developers take a
 best guess and hope it ends up balanced.
 
 Some of the approaches I've used in the past were zip code, area code,
 state and some kind of hash.
 
 So my question related to deterministic sharding is this: what rebalance
 feature(s) would be useful or needed once the partitions get
 unbalanced?
 
 Without a decent plan for rebalancing, it often ends up being a very
 painful problem to solve in production. Back when I worked on mobile
 apps, we saw issues with how OpenWave WAP servers partitioned the
 accounts. The early versions randomly assigned a phone to a server
 when it was first provisioned. Once the phone was associated
 with that server, it was stuck on that server. If the load on that
 server was heavier than the others, the only choice was to scale up
 the hardware.
 
 My understanding is that Cassandra's current sharding is consistent and
 random. Does the new feature sit somewhere in between? Are you
 thinking of a pluggable API so that you can provide your own hash
 algorithm for Cassandra to use?
 
 
 
 On Mon, Nov 7, 2011 at 7:54 AM, Daniel Doubleday
 daniel.double...@gmx.net wrote:
 Allow for deterministic / manual sharding of rows.
 
 Right now it seems that there is no way to force rows with different row
 keys to be stored on the same nodes in the ring.
 This is the number one reason we get data inconsistencies when nodes
 fail.
 
 Sometimes a logical transaction requires writing rows with different row
 keys. If we could use something like this:
 
 prefix.uniquekey, and let the partitioner use only the prefix, the
 probability that only part of the transaction would be written could be
 reduced considerably.
 
 
 
 
 
 



Re: Second Cassandra users survey

2011-11-08 Thread Todd Burruss
A use case that could use this (but isn't in my top requests) is usage
history for a given user.  I use a single row to save history per user;
each column is a user action whose name is a TimeUUID and whose value is a
blob.  I use the TimeUUID to sort the actions, but I don't really care about
the exact time.  After the number of user actions exceeds a threshold, I want
to remove enough of them to bring the actions back below the threshold.  I
could model this as you say and remove chunks of actions by deleting a row,
but that is more cumbersome for the client.

The reason this isn't in my top requests is that, regardless of whether the
client or the server performs this sort of delete, the row must first be
read.  Having the server do it saves a network hop and implements a common
usage pattern.
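The trimming described above amounts to "keep the newest N columns, delete the rest". A minimal sketch, assuming the row is modelled as a sorted map and using plain `long` values as a stand-in for TimeUUID column names; `HistoryTrimmer` and `trimToThreshold` are hypothetical names, not a Cassandra API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

final class HistoryTrimmer {
    // Removes the oldest columns so that at most `threshold` remain.
    // Column names are longs standing in for TimeUUIDs; the map's natural
    // ordering plays the role of the row's column comparator.
    // Returns the deleted column names, oldest first.
    static List<Long> trimToThreshold(NavigableMap<Long, byte[]> row, int threshold) {
        List<Long> deleted = new ArrayList<>();
        while (row.size() > threshold) {
            deleted.add(row.pollFirstEntry().getKey()); // drop the oldest action
        }
        return deleted;
    }

    public static void main(String[] args) {
        NavigableMap<Long, byte[]> row = new TreeMap<>();
        for (long t = 1; t <= 5; t++) {
            row.put(t, new byte[0]); // five user actions
        }
        System.out.println(trimToThreshold(row, 3)); // [1, 2]
        System.out.println(row.keySet());            // [3, 4, 5]
    }
}
```

Done on the client, this needs the row (or at least its column names) to be read first, which is the extra round trip the post would like the server to absorb.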

On 11/5/11 2:45 PM, Brandon Williams dri...@gmail.com wrote:

On Fri, Nov 4, 2011 at 9:50 PM, Jim Newsham jnews...@referentia.com
wrote:
 Our use case is time-series data (such as sampled sensor data).  Each row
 describes a particular statistic over time, the column name is a time, and
 the column value is the sample.  So it makes perfect sense to want to delete
 columns for a given time range.  I'm sure there must be numerous other use
 cases for which using a range of column names makes sense.

Assuming you are bucketing your rows at some interval (as in
http://rubyscale.com/2011/basic-time-series-with-cassandra/), why is
deleting the entire row for the interval not acceptable?

-Brandon
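The bucketing Brandon refers to gives each row one sensor's samples for one fixed interval, so a time range that lines up with bucket boundaries can be deleted by deleting whole rows. A minimal sketch, assuming a `sensorId:bucketStart` row-key format and a one-day bucket (both hypothetical choices, not something the linked post prescribes):

```java
import java.util.ArrayList;
import java.util.List;

final class TimeBuckets {
    static final long BUCKET_MILLIS = 24L * 60 * 60 * 1000; // assumed: one day per row

    // Row key for the bucket containing timestampMillis, e.g. "sensor42:86400000".
    static String rowKey(String sensorId, long timestampMillis) {
        long bucketStart = Math.floorDiv(timestampMillis, BUCKET_MILLIS) * BUCKET_MILLIS;
        return sensorId + ":" + bucketStart;
    }

    // Row keys whose whole bucket lies inside [fromMillis, toMillis).
    // Deleting these rows removes that part of the range; partial buckets at
    // either end would still need column-level deletes.
    static List<String> fullyCoveredRows(String sensorId, long fromMillis, long toMillis) {
        List<String> keys = new ArrayList<>();
        // First bucket boundary at or after fromMillis (ceiling division).
        long start = Math.floorDiv(fromMillis + BUCKET_MILLIS - 1, BUCKET_MILLIS) * BUCKET_MILLIS;
        for (long b = start; b + BUCKET_MILLIS <= toMillis; b += BUCKET_MILLIS) {
            keys.add(sensorId + ":" + b);
        }
        return keys;
    }

    public static void main(String[] args) {
        long day = BUCKET_MILLIS;
        System.out.println(rowKey("sensor42", day + 5_000));           // sensor42:86400000
        System.out.println(fullyCoveredRows("sensor42", 1, 3 * day)); // [sensor42:86400000, sensor42:172800000]
    }
}
```

The partial-bucket comment is where the two positions meet: ranges aligned to buckets reduce to row deletes, while arbitrary ranges still need the column-range delete Jim asked for.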



Re: Second Cassandra users survey

2011-11-07 Thread Radim Kolar

Take a look at this:

http://www.oracle.com/technetwork/database/nosqldb/overview/index.html

 I understand the limitations/advantages of the architecture.
Read this http://en.wikipedia.org/wiki/CAP_theorem



Re: Second Cassandra users survey

2011-11-07 Thread Daniel Doubleday
Allow for deterministic / manual sharding of rows.

Right now it seems that there is no way to force rows with different row keys
to be stored on the same nodes in the ring.
This is the number one reason we get data inconsistencies when nodes fail.

Sometimes a logical transaction requires writing rows with different row keys.
If we could use something like this:

prefix.uniquekey, and let the partitioner use only the prefix, the probability
that only part of the transaction would be written could be reduced
considerably.






Re: Second Cassandra users survey

2011-11-07 Thread Peter Lin
This feature interests me, so I thought I'd add some comments.

Having used partition features in existing databases like DB2, Oracle
and manual partitioning, one of the biggest challenges is keeping the
partitions balanced. What I've seen with manual partitioning is that
often the partitions get unbalanced. Usually the developers take a
best guess and hope it ends up balanced.

Some of the approaches I've used in the past were zip code, area code,
state and some kind of hash.

So my question related to deterministic sharding is this: what rebalance
feature(s) would be useful or needed once the partitions get
unbalanced?

Without a decent plan for rebalancing, it often ends up being a very
painful problem to solve in production. Back when I worked on mobile
apps, we saw issues with how OpenWave WAP servers partitioned the
accounts. The early versions randomly assigned a phone to a server
when it was first provisioned. Once the phone was associated
with that server, it was stuck on that server. If the load on that
server was heavier than the others, the only choice was to scale up
the hardware.

My understanding is that Cassandra's current sharding is consistent and
random. Does the new feature sit somewhere in between? Are you
thinking of a pluggable API so that you can provide your own hash
algorithm for Cassandra to use?



On Mon, Nov 7, 2011 at 7:54 AM, Daniel Doubleday
daniel.double...@gmx.net wrote:
 Allow for deterministic / manual sharding of rows.

 Right now it seems that there is no way to force rows with different row keys
 to be stored on the same nodes in the ring.
 This is the number one reason we get data inconsistencies when nodes fail.

 Sometimes a logical transaction requires writing rows with different row
 keys. If we could use something like this:

 prefix.uniquekey, and let the partitioner use only the prefix, the probability
 that only part of the transaction would be written could be reduced
 considerably.







Re: Second Cassandra users survey

2011-11-07 Thread Flavio Baronti

We are using Cassandra for time series storage.
Strong points: write performance.
Pain points: dynamically adding column families as new time series come in. This caused a lot of headaches, mismatches
between nodes, etc. In the end we just put everything together in a single (huge) column family.
Wish list: A decent GUI to explore data kept in Cassandra would be very valuable. It should also be extensible to
provide viewers for custom data.








Re: Second Cassandra users survey

2011-11-07 Thread Radim Kolar
 So my question related to deterministic sharding is this: what
 rebalance feature(s) would be useful or needed once the partitions get
 unbalanced?


In current Cassandra you can use nodetool move for rebalancing. It's a
fast operation; a portion of the existing data is moved to the new server.
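For reference, `nodetool move` takes the node's new token as its argument. With RandomPartitioner, whose tokens lie between 0 and 2^127, a commonly used way to pick evenly spaced targets for N nodes is token_i = i * 2^127 / N. A minimal sketch of that arithmetic only, assuming RandomPartitioner; it does not estimate how much data each move would stream, and `BalancedTokens` is a hypothetical helper name.

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.List;

final class BalancedTokens {
    // RandomPartitioner tokens lie in [0, 2^127).
    static final BigInteger RING_SIZE = BigInteger.ONE.shiftLeft(127);

    // Evenly spaced tokens for nodeCount nodes: i * 2^127 / nodeCount.
    static List<BigInteger> tokens(int nodeCount) {
        List<BigInteger> result = new ArrayList<>();
        BigInteger n = BigInteger.valueOf(nodeCount);
        for (int i = 0; i < nodeCount; i++) {
            result.add(RING_SIZE.multiply(BigInteger.valueOf(i)).divide(n));
        }
        return result;
    }

    public static void main(String[] args) {
        // One target per node; each value would then be passed to
        // "nodetool -h <host> move <token>" on the corresponding node.
        List<BigInteger> t = tokens(4);
        for (int i = 0; i < t.size(); i++) {
            System.out.println("node " + i + ": " + t.get(i));
        }
    }
}
```

Note that evenly spaced tokens only balance the ring when keys hash uniformly; with prefix-based placement of the kind discussed earlier in this thread, a hot prefix would still overload its replicas.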




Re: Second Cassandra users survey

2011-11-07 Thread Jeremiah Jordan
Actually, if you can see the data with ONE, it will be visible at QUORUM
as well.  QUORUM actually gives you a higher chance of seeing the new
value than ONE does.  With a replication factor of 3 and the new value on
only one replica, you have a 2/3 chance of seeing the new value with
QUORUM; with ONE you have 1/3.  And this JIRA fixed an issue where two
QUORUM reads in a row could give you the NEW value and then the OLD value.


https://issues.apache.org/jira/browse/CASSANDRA-2494
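The 2/3 vs 1/3 figures can be checked with a little combinatorics. A minimal sketch, assuming the replicas a read touches are chosen uniformly at random (in practice the snitch decides, so treat this as an idealization): with RF replicas, k of which hold the new value, a read touching r replicas misses it only if all r come from the RF - k stale ones, so P(see new) = 1 - C(RF-k, r) / C(RF, r). `ReadVisibility` is a hypothetical name.

```java
final class ReadVisibility {
    // Binomial coefficient C(n, k), exact for the small n used here.
    static long choose(int n, int k) {
        if (k < 0 || k > n) return 0;
        long result = 1;
        for (int i = 1; i <= k; i++) {
            result = result * (n - k + i) / i; // stays integral: equals C(n-k+i, i)
        }
        return result;
    }

    // Probability that a read touching r of rf replicas (chosen uniformly at
    // random) includes at least one of the k replicas holding the new value.
    static double probabilitySeesNewValue(int rf, int k, int r) {
        return 1.0 - (double) choose(rf - k, r) / choose(rf, r);
    }

    public static void main(String[] args) {
        int rf = 3, k = 1;       // the write reached only one replica
        int quorum = rf / 2 + 1; // 2 when rf = 3
        System.out.println(probabilitySeesNewValue(rf, k, quorum)); // about 0.667
        System.out.println(probabilitySeesNewValue(rf, k, 1));      // about 0.333
    }
}
```

With k = 2 (a successful QUORUM write) and r = 2, the formula gives exactly 1, which is the usual R + W > RF overlap argument.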

So, for a single row, QUORUM reads after a failed write now always give
consistent results.  For multiple rows you still have issues, but you can
always mitigate that in the application with something like giving all of
the changes the same timestamp, then on read checking that the timestamps
match, and reading the data again if they don't.
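The multi-row mitigation described above can be sketched as a read, verify, retry loop. This assumes every row written by one logical change carries the same client-supplied timestamp; `RowReader` is a hypothetical interface standing in for whatever client call returns a row's write timestamp, not a Cassandra or client-library API, and a real reader would return the row data along with it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

final class ConsistentMultiRead {
    // Hypothetical: returns the client-supplied write timestamp stored with a row.
    interface RowReader {
        long readTimestamp(String rowKey);
    }

    // Reads every row; if the timestamps disagree (a batch is only partly
    // visible), reads again, up to maxAttempts. Returns the common timestamp,
    // or empty if the rows never agreed.
    static Optional<Long> readAgreeing(RowReader reader, List<String> keys, int maxAttempts) {
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            List<Long> stamps = new ArrayList<>();
            for (String key : keys) {
                stamps.add(reader.readTimestamp(key));
            }
            if (stamps.stream().distinct().count() == 1) {
                return Optional.of(stamps.get(0));
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Simulated reads: row "b" still shows the old timestamp on the first pass.
        int[] calls = {0};
        RowReader reader = key -> {
            calls[0]++;
            return key.equals("b") && calls[0] <= 2 ? 100L : 200L;
        };
        System.out.println(readAgreeing(reader, List.of("a", "b"), 3)); // Optional[200]
    }
}
```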


I'm not arguing against atomic batch operations; they would be nice =).
I'm just clarifying how things work now.


-Jeremiah

On 11/06/2011 02:05 PM, Pierre Chalamet wrote:

- support for atomic operations or batches (if QUORUM fails, data should
not be visible with ONE)

ZooKeeper is solving that.

I might have screwed up a little bit since I didn't talk about isolation;
let's reformulate: support for read committed (using DB terminology).
Cassandra is more like read uncommitted.
Even if row mutations in one CF for one key are atomic on one server, stuff
is not rolled back when the CL can't be satisfied at the coordinator level.
Data won't be visible at QUORUM level, but when using a weaker CL, invalid
data can appear IMHO.
Also, it should be possible to tell which operations failed with batch_mutate,
but unfortunately it is not.


Re: Second Cassandra users survey

2011-11-07 Thread Jeremiah Jordan

- Batch read/slice from multiple column families.





Re: Second Cassandra users survey

2011-11-07 Thread Ed Anuff
This is basically what entity groups are about -
https://issues.apache.org/jira/browse/CASSANDRA-1684

On Mon, Nov 7, 2011 at 5:26 AM, Peter Lin wool...@gmail.com wrote:
 This feature interests me, so I thought I'd add some comments.

 Having used partition features in existing databases like DB2, Oracle
 and manual partitioning, one of the biggest challenges is keeping the
 partitions balanced. What I've seen with manual partitioning is that
 often the partitions get unbalanced. Usually the developers take a
 best guess and hope it ends up balanced.

 Some of the approaches I've used in the past were zip code, area code,
 state and some kind of hash.

 So my question related to deterministic sharding is this: what rebalance
 feature(s) would be useful or needed once the partitions get
 unbalanced?

 Without a decent plan for rebalancing, it often ends up being a very
 painful problem to solve in production. Back when I worked on mobile
 apps, we saw issues with how OpenWave WAP servers partitioned the
 accounts. The early versions randomly assigned a phone to a server
 when it was first provisioned. Once the phone was associated
 with that server, it was stuck on that server. If the load on that
 server was heavier than the others, the only choice was to scale up
 the hardware.

 My understanding is that Cassandra's current sharding is consistent and
 random. Does the new feature sit somewhere in between? Are you
 thinking of a pluggable API so that you can provide your own hash
 algorithm for Cassandra to use?



 On Mon, Nov 7, 2011 at 7:54 AM, Daniel Doubleday
 daniel.double...@gmx.net wrote:
 Allow for deterministic / manual sharding of rows.

 Right now it seems that there is no way to force rows with different row
 keys to be stored on the same nodes in the ring.
 This is the number one reason we get data inconsistencies when nodes
 fail.

 Sometimes a logical transaction requires writing rows with different row
 keys. If we could use something like this:

 prefix.uniquekey, and let the partitioner use only the prefix, the probability
 that only part of the transaction would be written could be reduced
 considerably.








RE: Second Cassandra users survey

2011-11-07 Thread Deeter, Derek
I second transparent disk encryption.
Also:
- Matching column names via 'like' and % wildcards
- Parameterized CQL, plus support for 'AND' and 'OR'
- Bulk row deletion
Also, more clarification on various parameters and configuration - If you are 
doing this, change 

Thanks for the opportunity,
-Derek

--
Derek Deeter, Sr. Software Engineer
Intuit Financial Services
5601 Lindero Canyon Rd., Westlake, CA 91362
(818) 597-5932  (x76932)
derek.dee...@digitalinsight.com
 


-Original Message-
From: Mohit Anchlia [mailto:mohitanch...@gmail.com] 
Sent: Sunday, November 06, 2011 10:58 AM
To: user@cassandra.apache.org
Subject: Re: Second Cassandra users survey

Transparent on-disk encryption with a pluggable key provider would also be
really helpful for securing sensitive information.

On Sun, Nov 6, 2011 at 9:42 AM, Aaron Turner synfina...@gmail.com wrote:
 The intent was to have a lighter solution for common problems than
 having to go with Hadoop or streaming large quantities of data back to
 the client.  Is this feature creep?  Yeah, prolly.  Is it useful?
 Yes.  If it can't be done well, then it probably shouldn't be done,
 but it never hurts to ask. :)

 On Sun, Nov 6, 2011 at 9:13 AM, Sarah Baker sba...@mspot.com wrote:
 Isn't this sort of heading down the slippery slope of things that weigh you down?
 It was my understanding that Cassandra was a stick-to-your-core-competency sort
 of database that really wanted to leave such utilities external.  At its core
 were get and put.
 Did I miss something in my reading of intent?
 -Sarah

 -Original Message-
 From: Aaron Turner [mailto:synfina...@gmail.com]
 Sent: Sunday, November 06, 2011 8:25 AM
 To: user@cassandra.apache.org
 Subject: Re: Second Cassandra users survey

 1. Basic SQL-like summary transforms for both CQL and Thrift API clients like:

 SUM
 AVG
 MIN
 MAX






 --
 Aaron Turner
 http://synfin.net/         Twitter: @synfinatic
 http://tcpreplay.synfin.net/ - Pcap editing and replay tools for Unix & Windows
 Those who would give up essential Liberty, to purchase a little temporary
 Safety, deserve neither Liberty nor Safety.
     -- Benjamin Franklin
 carpe diem quam minimum credula postero



Re: Second Cassandra users survey

2011-11-07 Thread Ian Danforth


 Wish list: A decent GUI to explore data kept in Cassandra would be very
 valuable. It should also be extensible to
 provide viewers for custom data.


+1 to that.

@jonathan - This is what Google Moderator is really good at. Perhaps start
one and move the idea creation / voting there.


Re: Second Cassandra users survey

2011-11-07 Thread Daniel Doubleday
Well, given the example: in our case the prefix that determines the endpoints
a token should be routed to could be something like a user ID,

so with

key = userid + "." + userthingid;

instead of

// this is happening right now
getEndpoints(hash(key))

you would have

getEndpoints(userid)

Since count(users) is much larger than the number of nodes in the ring, we would
still have a balanced cluster.

I guess what we would need is something like a compound row key.

You could almost do something like this with the current code base, but I
remember that certain assumptions about how keys translate to tokens on the
ring make this impossible.

But in essence this would result in another partitioner implementation.
So you'd have OrderPreservingPartitioner, RandomPartitioner and maybe
ShardedPartitioner.
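A minimal sketch of the routing described above, assuming keys of the form `prefix.uniquekey`, an MD5-based token similar to RandomPartitioner's, and a simplified ring walk. `ShardedPartitioner`, `tokenFor` and `getEndpoints` here are hypothetical illustrations, not Cassandra's partitioner interface; the point is only that every key sharing a prefix maps to the same token and therefore to the same replicas.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class ShardedPartitioner {
    // Token computed from the part before the first '.' only, so
    // "user1.order17" and "user1.invoice9" always get the same token.
    static BigInteger tokenFor(String key) {
        int dot = key.indexOf('.');
        String prefix = dot >= 0 ? key.substring(0, dot) : key;
        try {
            byte[] digest = MessageDigest.getInstance("MD5")
                    .digest(prefix.getBytes(StandardCharsets.UTF_8));
            return new BigInteger(digest).abs(); // non-negative token
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    // Simplified ring walk: the first node whose token is >= the key's token
    // (wrapping around) is the primary replica; the following nodes clockwise
    // hold the remaining copies.
    static List<String> getEndpoints(TreeMap<BigInteger, String> ring, String key, int replicationFactor) {
        List<String> endpoints = new ArrayList<>();
        Map.Entry<BigInteger, String> entry = ring.ceilingEntry(tokenFor(key));
        if (entry == null) entry = ring.firstEntry();
        for (int i = 0; i < Math.min(replicationFactor, ring.size()); i++) {
            endpoints.add(entry.getValue());
            entry = ring.higherEntry(entry.getKey());
            if (entry == null) entry = ring.firstEntry();
        }
        return endpoints;
    }

    public static void main(String[] args) {
        // Four nodes with evenly spaced tokens (i * 2^125 for i = 0..3).
        TreeMap<BigInteger, String> ring = new TreeMap<>();
        for (int i = 0; i < 4; i++) {
            ring.put(BigInteger.ONE.shiftLeft(125).multiply(BigInteger.valueOf(i)), "node" + i);
        }
        // Both rows of user1's logical transaction land on the same replicas.
        System.out.println(getEndpoints(ring, "user1.order17", 3));
        System.out.println(getEndpoints(ring, "user1.invoice9", 3));
    }
}
```

Balance then depends on having many more prefixes than nodes and on no single prefix being disproportionately large, which is the rebalancing concern raised in the reply below.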


On Nov 7, 2011, at 2:26 PM, Peter Lin wrote:

 This feature interests me, so I thought I'd add some comments.
 
 Having used partition features in existing databases like DB2, Oracle
 and manual partitioning, one of the biggest challenges is keeping the
 partitions balanced. What I've seen with manual partitioning is that
 often the partitions get unbalanced. Usually the developers take a
 best guess and hope it ends up balanced.
 
 Some of the approaches I've used in the past were zip code, area code,
 state and some kind of hash.
 
 So my question related to deterministic sharding is this: what rebalance
 feature(s) would be useful or needed once the partitions get
 unbalanced?
 
 Without a decent plan for rebalancing, it often ends up being a very
 painful problem to solve in production. Back when I worked on mobile
 apps, we saw issues with how OpenWave WAP servers partitioned the
 accounts. The early versions randomly assigned a phone to a server
 when it was first provisioned. Once the phone was associated
 with that server, it was stuck on that server. If the load on that
 server was heavier than the others, the only choice was to scale up
 the hardware.
 
 My understanding is that Cassandra's current sharding is consistent and
 random. Does the new feature sit somewhere in between? Are you
 thinking of a pluggable API so that you can provide your own hash
 algorithm for Cassandra to use?
 
 
 
 On Mon, Nov 7, 2011 at 7:54 AM, Daniel Doubleday
 daniel.double...@gmx.net wrote:
 Allow for deterministic / manual sharding of rows.
 
 Right now it seems that there is no way to force rows with different row
 keys to be stored on the same nodes in the ring.
 This is the number one reason we get data inconsistencies when nodes
 fail.
 
 Sometimes a logical transaction requires writing rows with different row
 keys. If we could use something like this:
 
 prefix.uniquekey, and let the partitioner use only the prefix, the probability
 that only part of the transaction would be written could be reduced
 considerably.
 
 
 
 
 



Re: Second Cassandra users survey

2011-11-07 Thread Brian O'Neill
It should be dead-simple to build a slick GUI on the REST layer
(@Virgil: http://code.google.com/a/apache-extras.org/p/virgil/).

I had planned to crank one out this week (using ExtJS) that mimicked the
Squirrel/Toad look and feel.  The UI would have a tree panel of keyspaces
and column families on the left. Then the main panel would be partitioned
into two.  The top of the main panel would allow a user to type in
CQL/Pig, etc.  The bottom of the main panel would show the data contained
in the column family / result set.  Any other thoughts on design before I
get started?

If we build this based on the JSON/REST interface, it should be pretty easy
to embed in other applications.

-brian

On Mon, Nov 7, 2011 at 2:36 PM, Ian Danforth idanfo...@numenta.com wrote:


 Wish list: A decent GUI to explore data kept in Cassandra would be very
 valuable. It should also be extensible to
 provide viewers for custom data.


 +1 to that.

 @jonathan - This is what Google Moderator is really good at. Perhaps start
 one and move the idea creation / voting there.





-- 
Brian O'Neill
Lead Architect, Health Market Science (http://healthmarketscience.com)
mobile:215.588.6024
blog: http://weblogs.java.net/blog/boneill42/
blog: http://brianoneill.blogspot.com/


Re: Second Cassandra users survey

2011-11-07 Thread Colin Taylor
Decompression without compression (for lack of a better name).

We store log batches in Cassandra that come in over HTTP either
uncompressed, deflated, or snappy-compressed. We just add a 'magic' prefix
(e.g. \0 \s \n \a \p \p \y) to the column value so we can decode it when we
serve it back up.

Seems like Cassandra could detect data with the appropriate magic,
store it as is, and decode it for us automatically on the way back.
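A minimal sketch of the "store as-is, decode on the way back" idea, using only the JDK's deflate support. The five-byte marker is a hypothetical choice for illustration (it is not the prefix Colin's application uses), and snappy is left out because it is not in the standard library; a real scheme would need a marker that cannot collide with legitimate uncompressed values.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

final class MagicCodec {
    // Hypothetical magic prefix marking a deflate-compressed value.
    static final byte[] DEFLATE_MAGIC = {0, 'd', 'e', 'f', 'l'};

    // Client side: compress with deflate and prepend the magic prefix.
    static byte[] encodeDeflate(byte[] payload) {
        Deflater deflater = new Deflater();
        deflater.setInput(payload);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(DEFLATE_MAGIC, 0, DEFLATE_MAGIC.length);
        byte[] buf = new byte[1024];
        while (!deflater.finished()) {
            int n = deflater.deflate(buf);
            out.write(buf, 0, n);
        }
        deflater.end();
        return out.toByteArray();
    }

    static boolean hasMagic(byte[] value) {
        return value.length >= DEFLATE_MAGIC.length
                && Arrays.equals(Arrays.copyOf(value, DEFLATE_MAGIC.length), DEFLATE_MAGIC);
    }

    // Read path: inflate values that carry the magic prefix; return anything
    // else unchanged (e.g. batches that arrived uncompressed).
    static byte[] decode(byte[] stored) {
        if (!hasMagic(stored)) {
            return stored;
        }
        Inflater inflater = new Inflater();
        inflater.setInput(stored, DEFLATE_MAGIC.length, stored.length - DEFLATE_MAGIC.length);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[1024];
        try {
            while (!inflater.finished()) {
                int n = inflater.inflate(buf);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new IllegalArgumentException("truncated or unsupported deflate data");
                }
                out.write(buf, 0, n);
            }
        } catch (DataFormatException e) {
            throw new IllegalArgumentException("corrupt deflate data", e);
        } finally {
            inflater.end();
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] batch = "GET /a 200\nGET /b 404\n".getBytes(StandardCharsets.UTF_8);
        byte[] stored = encodeDeflate(batch);
        System.out.println(Arrays.equals(decode(stored), batch)); // true
        System.out.println(decode(batch) == batch);               // true: no magic, returned as-is
    }
}
```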

Colin.


Re: Second Cassandra users survey

2011-11-06 Thread Aaron Turner
1. Basic SQL-like summary transforms for both CQL and Thrift API clients like:

SUM
AVG
MIN
MAX

2. Native 64-bit unsigned datatype

3. Add support for matching column names via LIKE (% and _ wildcards)
for the ascii type
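For item 3 (and Derek's similar request), a minimal sketch of the requested matching semantics, assuming SQL-style rules where `%` matches any run of characters and `_` matches exactly one. It translates the pattern into a `java.util.regex` pattern and filters column names client-side, which is the work the request would move to the server. No escape character is supported, and `LikeMatcher` is a hypothetical name.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

final class LikeMatcher {
    // Translates a LIKE pattern into a regex: '%' -> ".*", '_' -> ".",
    // every other character matched literally (via Pattern.quote).
    static Pattern compile(String likePattern) {
        StringBuilder regex = new StringBuilder();
        StringBuilder literal = new StringBuilder();
        for (char c : likePattern.toCharArray()) {
            if (c == '%' || c == '_') {
                if (literal.length() > 0) {
                    regex.append(Pattern.quote(literal.toString()));
                    literal.setLength(0);
                }
                regex.append(c == '%' ? ".*" : ".");
            } else {
                literal.append(c);
            }
        }
        if (literal.length() > 0) {
            regex.append(Pattern.quote(literal.toString()));
        }
        return Pattern.compile(regex.toString(), Pattern.DOTALL);
    }

    // Keeps the column names that match the whole pattern.
    static List<String> filter(List<String> columnNames, String likePattern) {
        Pattern p = compile(likePattern);
        return columnNames.stream()
                .filter(name -> p.matcher(name).matches())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> names = List.of("temp_2011", "temp_2012", "tempX2011", "humidity");
        System.out.println(filter(names, "temp%"));     // [temp_2011, temp_2012, tempX2011]
        System.out.println(filter(names, "temp_2011")); // [temp_2011, tempX2011]
    }
}
```

The second call shows why an escape character would matter in practice: `_` in the pattern is a wildcard, so it also matches `tempX2011`.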




-- 
Aaron Turner
http://synfin.net/         Twitter: @synfinatic
http://tcpreplay.synfin.net/ - Pcap editing and replay tools for Unix & Windows
Those who would give up essential Liberty, to purchase a little temporary
Safety, deserve neither Liberty nor Safety.
    -- Benjamin Franklin
carpe diem quam minimum credula postero


RE: Second Cassandra users survey

2011-11-06 Thread Sarah Baker
Isn't this sort of heading down the slippery slope of things that weigh you down?
It was my understanding that Cassandra was a stick-to-your-core-competency sort
of database that really wanted to leave such utilities external.  At its core
were get and put.
Did I miss something in my reading of intent?
-Sarah

-Original Message-
From: Aaron Turner [mailto:synfina...@gmail.com] 
Sent: Sunday, November 06, 2011 8:25 AM
To: user@cassandra.apache.org
Subject: Re: Second Cassandra users survey

1. Basic SQL-like summary transforms for both CQL and Thrift API clients like:

SUM
AVG
MIN
MAX




Re: Second Cassandra users survey

2011-11-06 Thread Aaron Turner
The intent was to have a lighter solution for common problems than
having to go with Hadoop or streaming large quantities of data back to
the client.  Is this feature creep?  Yeah, prolly.  Is it useful?
Yes.  If it can't be done well, then it probably shouldn't be done,
but it never hurts to ask. :)

On Sun, Nov 6, 2011 at 9:13 AM, Sarah Baker sba...@mspot.com wrote:
 Isn't this sort of heading down the slippery slope of things that weigh you down?
 It was my understanding that Cassandra was a stick-to-your-core-competency sort
 of database that really wanted to leave such utilities external.  At its core
 were get and put.
 Did I miss something in my reading of intent?
 -Sarah

 -Original Message-
 From: Aaron Turner [mailto:synfina...@gmail.com]
 Sent: Sunday, November 06, 2011 8:25 AM
 To: user@cassandra.apache.org
 Subject: Re: Second Cassandra users survey

 1. Basic SQL-like summary transforms for both CQL and Thrift API clients like:

 SUM
 AVG
 MIN
 MAX






-- 
Aaron Turner
http://synfin.net/         Twitter: @synfinatic
http://tcpreplay.synfin.net/ - Pcap editing and replay tools for Unix & Windows
Those who would give up essential Liberty, to purchase a little temporary
Safety, deserve neither Liberty nor Safety.
    -- Benjamin Franklin
carpe diem quam minimum credula postero


Re: Second Cassandra users survey

2011-11-06 Thread Mohit Anchlia
Transparent on-disk encryption with a pluggable key provider would also be
really helpful for securing sensitive information.
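A minimal sketch of what a "pluggable key provider" could look like, assuming AES-GCM from the JDK's `javax.crypto` and a per-keyspace key lookup. `KeyProvider` and `ValueEncryptor` are hypothetical names, not Cassandra interfaces; this encrypts individual values rather than whole on-disk files, and key storage and rotation are deliberately left out.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;

final class ValueEncryptor {
    // Hypothetical plug-in point: returns the key to use for a keyspace.
    interface KeyProvider {
        SecretKey keyFor(String keyspace);
    }

    private static final int IV_BYTES = 12;  // 96-bit nonce, the usual size for GCM
    private static final int TAG_BITS = 128; // authentication tag length
    private final KeyProvider keys;
    private final SecureRandom random = new SecureRandom();

    ValueEncryptor(KeyProvider keys) {
        this.keys = keys;
    }

    // Stored layout: random IV followed by the ciphertext (which ends with the tag).
    byte[] encrypt(String keyspace, byte[] plaintext) {
        byte[] iv = new byte[IV_BYTES];
        random.nextBytes(iv);
        try {
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.ENCRYPT_MODE, keys.keyFor(keyspace), new GCMParameterSpec(TAG_BITS, iv));
            byte[] ciphertext = cipher.doFinal(plaintext);
            return ByteBuffer.allocate(IV_BYTES + ciphertext.length).put(iv).put(ciphertext).array();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("encryption failed", e);
        }
    }

    // Reverses encrypt(); fails if the value was altered or the key is wrong.
    byte[] decrypt(String keyspace, byte[] stored) {
        try {
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.DECRYPT_MODE, keys.keyFor(keyspace),
                    new GCMParameterSpec(TAG_BITS, stored, 0, IV_BYTES));
            return cipher.doFinal(stored, IV_BYTES, stored.length - IV_BYTES);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("decryption failed", e);
        }
    }

    static SecretKey newAesKey() {
        try {
            KeyGenerator generator = KeyGenerator.getInstance("AES");
            generator.init(128);
            return generator.generateKey();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        SecretKey key = newAesKey();
        // Simplest possible provider: the same key for every keyspace.
        ValueEncryptor encryptor = new ValueEncryptor(keyspace -> key);
        byte[] stored = encryptor.encrypt("users", "sensitive".getBytes(StandardCharsets.UTF_8));
        System.out.println(new String(encryptor.decrypt("users", stored), StandardCharsets.UTF_8)); // sensitive
    }
}
```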

On Sun, Nov 6, 2011 at 9:42 AM, Aaron Turner synfina...@gmail.com wrote:
 The intent was to have a lighter solution for common problems than
 having to go with Hadoop or streaming large quantities of data back to
 the client.  Is this feature creep?  Yeah, prolly.  Is it useful?
 Yes.  If it can't be done well, then it probably shouldn't be done,
 but it never hurts to ask. :)

 On Sun, Nov 6, 2011 at 9:13 AM, Sarah Baker sba...@mspot.com wrote:
 Isn't this sort of heading on the slippery slope of things that weigh you 
 down?
 It was my understanding that Cassandra was stick to your core competency 
 sort of database
 that really wanted to leave such utilities external.  At its core was get 
 and put.
 Did I miss something in my reading of intent?
 -Sarah

 -Original Message-
 From: Aaron Turner [mailto:synfina...@gmail.com]
 Sent: Sunday, November 06, 2011 8:25 AM
 To: user@cassandra.apache.org
 Subject: Re: Second Cassandra users survey

 1. Basic SQL-like summary transforms for both CQL and Thrift API clients 
 like:

 SUM
 AVG
 MIN
 MAX






 --
 Aaron Turner
 http://synfin.net/         Twitter: @synfinatic
 http://tcpreplay.synfin.net/ - Pcap editing and replay tools for Unix  
 Windows
 Those who would give up essential Liberty, to purchase a little temporary
 Safety, deserve neither Liberty nor Safety.
     -- Benjamin Franklin
 carpe diem quam minimum credula postero



RE: Second Cassandra users survey

2011-11-06 Thread Pierre Chalamet

 - support for atomic operations or batches (if QUORUM fails, data should
 not be visible with ONE)
 zookeeper is solving that.

Yeah, I can use HBase too.

I might have screwed up a little bit since I didn't talk about isolation;
let me rephrase: support for read committed (using DB terminology).
Cassandra is more like read uncommitted.
Even if row mutations in one CF for one key are atomic on one server, changes
are not rolled back when the CL can't be satisfied at the coordinator level.
Data won't be visible at QUORUM level, but when using a weaker CL, invalid
data can appear imho.
It should also be possible to tell which operations failed in a batch_mutate,
but unfortunately it is not.
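
The "read uncommitted" behaviour described above can be illustrated with a toy model: three replicas, a write that reaches only one of them before the coordinator gives up, and no rollback. This is a hypothetical in-memory simulation, not Cassandra's write path. It ignores hinted handoff and read repair, and the fact that Cassandra rejects a write up front (UnavailableException) when it already knows too few replicas are alive; the partial write here stands in for the timeout case.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Replica:
    # column name -> (timestamp, value); the newest timestamp wins on reads
    data: Dict[str, Tuple[int, str]] = field(default_factory=dict)
    # whether a mutation reaches this replica before the coordinator times out
    reachable: bool = True


def write(replicas: List[Replica], name: str, value: str, ts: int, required: int) -> bool:
    """Send the mutation to every replica; report success only if `required` replicas acked.

    Replicas that applied the mutation keep it even when the write is reported as failed.
    """
    acks = 0
    for replica in replicas:
        if replica.reachable:
            replica.data[name] = (ts, value)
            acks += 1
    return acks >= required


def read(contacted: List[Replica], name: str) -> Optional[str]:
    """Return the newest value among the contacted replicas (last write wins)."""
    versions = [r.data[name] for r in contacted if name in r.data]
    return max(versions)[1] if versions else None


replicas = [Replica(), Replica(reachable=False), Replica(reachable=False)]
ok = write(replicas, "balance", "v1", ts=1, required=2)  # QUORUM of RF=3
print(ok)                              # False: the client is told the write failed
print(read([replicas[0]], "balance"))  # v1: a CL.ONE read served by replica 0 sees it anyway
print(read([replicas[1]], "balance"))  # None: a CL.ONE read served by replica 1 does not
```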

HBase has clear semantics for mutations
(http://hbase.apache.org/acid-semantics.html); maybe a similar page could be
written describing how Cassandra handles this. I don't know if any such
material already exists in the wiki, but I can try to write something about
it.

Anyway, all in all, I'd like to see read committed appear at some point in
Cassandra. The way Cassandra works might make this a bit hard to
introduce, but I'm just expressing my own feedback, even if
it is far from the weak consistency Cassandra offers.

 - TTL on CF, rows and counters
 TTL on counters will be nice, but I am good with the rest as it is

Actually, counters have to be handled specially to work around the lack of
TTL support. It would be nice to unify storage capabilities.

While we're talking about counters, I will add to the wish list the ability to
mix normal columns with counter columns in the same CF, allowing easy
dropping of a row or atomic mutations on one row.




Re: Second Cassandra users survey

2011-11-06 Thread Ed Anuff
On Sun, Nov 6, 2011 at 12:52 AM, Radim Kolar h...@sendmail.cz wrote:
 - support for atomic operations or batches (if QUORUM fails, data should not
 be visible with ONE)
 zookeeper is solving that.

I'd like to see official support for Zookeeper inside of Cassandra.
I'd like it to be something that can be optionally configured.  I'd
like to be able to make batch mutations atomic using it.


Re: Second Cassandra users survey

2011-11-06 Thread Robert Jackson
On Nov 6, 2011, at 3:41 PM, Ed Anuff e...@anuff.com wrote:

 I'd like to see official support for Zookeeper inside of Cassandra.
 I'd like it to be something that can be optionally configured.  I'd
 like to be able to make batch mutations atomic using it.

Not sure how possible this is, but we are forced to use a Zookeeper component
in some of our applications due to the need for a small number of atomic updates
to multiple rows or CFs.

I would also like to see the ability to store counter columns alongside
regular columns. It would definitely alleviate some of the need for the
transactional qualities of Zookeeper, as it would remove one of the bigger
reasons we have to update two CFs at a time (inserting new columns into a
regular CF and incrementing a counter of those columns).

Another possible solution is to allow some sort of efficient counting of
columns for a given row key. To take that even further, if we could couple that
ability with the current composite column functionality, then we could get an
efficient count of columns containing a particular prefix.
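
Because columns within a row are kept sorted by the comparator, counting the names that share a composite prefix only needs the two ends of a contiguous run. A minimal in-memory sketch, assuming composite names modelled as Python tuples of strings; `count_with_prefix` is hypothetical, and a real server-side version would also have to deal with on-disk layout and tombstones, which this ignores:

```python
from bisect import bisect_left
from typing import List, Tuple


def count_with_prefix(columns: List[Tuple[str, str]], prefix: str) -> int:
    """Count composite column names whose first component equals `prefix`.

    `columns` must be sorted, as a row's columns are. Two binary searches find
    the boundaries of the matching run, so the matching columns are never visited.
    """
    # (prefix,) sorts before every (prefix, x); (prefix + "\x00",) sorts after all
    # of them and before any name whose first component is greater than prefix.
    start = bisect_left(columns, (prefix,))
    end = bisect_left(columns, (prefix + "\x00",))
    return end - start


columns = sorted([("apple", "a"), ("banana", "x"), ("apple", "c"), ("cherry", "y"), ("apple", "b")])
print(count_with_prefix(columns, "apple"))   # 3
print(count_with_prefix(columns, "banana"))  # 1
print(count_with_prefix(columns, "date"))    # 0
```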

Robert Jackson



Re: Second Cassandra users survey

2011-11-06 Thread Radim Kolar

 Yeah, I can use HBase too.
But why aren't you using HBase if its feature set fits your needs
better, rather than wanting the same functionality in Cassandra? It's good that
the two projects differ in this area. From the rest of your post it
looks like you want Cassandra to be ACID compliant, which is against
its design ideas. If you want an ACID-compliant NoSQL engine, there are
a few others, not only HBase.



 Cassandra is more like read uncommitted.

Yes.

 Even if row mutations in one CF for one key are atomic on one server, stuff
 is not rolled back when the CL can't be satisfied at the coordinator level.
 Data won't be visible at QUORUM level, but when using weaker CL, invalid
 data can appear imho.
That's right. It's the application designer's responsibility to code the
application accordingly, i.e. to use the correct CL. In SQL databases it's the
server's responsibility to deal with inconsistent data, but in NoSQL it's the
client's responsibility. In practice it's not a problem, because you have your
applications under control. This problem could be worked around in the
Cassandra core if additional settings were added to CFs: minimum CL
levels for reads/writes. Submit a feature request to JIRA if you are
interested in that.
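
Until per-CF minimum consistency levels exist, an application can enforce them in its own client wrapper. A minimal sketch; the `MIN_CL` table, `check_cl` and the exception are hypothetical application code, and the three-level ordering assumes a replication factor of 3, where ONE < QUORUM < ALL in the number of replicas that must respond:

```python
from enum import IntEnum
from typing import Dict


class CL(IntEnum):
    # Replicas that must respond with RF=3; only the levels used here are modelled.
    ONE = 1
    QUORUM = 2
    ALL = 3


class ConsistencyTooWeak(Exception):
    pass


# Hypothetical application-side policy: column family -> operation -> minimum level.
MIN_CL: Dict[str, Dict[str, CL]] = {
    "accounts": {"read": CL.QUORUM, "write": CL.QUORUM},
}


def check_cl(column_family: str, op: str, level: CL) -> CL:
    """Return `level` if it meets the configured minimum, otherwise raise."""
    minimum = MIN_CL.get(column_family, {}).get(op, CL.ONE)
    if level < minimum:
        raise ConsistencyTooWeak(
            f"{op} on {column_family} at {level.name}, minimum is {minimum.name}"
        )
    return level


print(check_cl("accounts", "write", CL.ALL).name)  # ALL
print(check_cl("logs", "read", CL.ONE).name)       # ONE: no policy for this CF
```

Routing every read and write through such a check makes a too-weak level fail loudly in the application rather than quietly.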


RE: Second Cassandra users survey

2011-11-06 Thread Pierre Chalamet
I do not want to use HBase because Cassandra is far easier to deploy and
it works pretty well - and for 99% of our apps the model fits
perfectly. The other 1% has a workaround based on ordering writes. I accept the
trade-off anyway :)

Don't get me wrong: I love Cassandra and the way it works, and I
understand the limitations/advantages of the architecture. I'm just saying
it would be nice to have something stronger (read: more guarantees) when
updating several columns. The pain point is updating a regular column and a
counter while keeping everything consistent for one key. Actually,
everything is under control, as you say, so it's not a real problem and we
can live without it.

But since this is a discussion about wishes, I'm not shy about asking for the moon
:)





RE: Second Cassandra users survey

2011-11-05 Thread Pierre Chalamet
Dear Santa, here is my wish list :)

- support for atomic operations or batches (if QUORUM fails, data should not
be visible with ONE)

- TTL on CF, rows and counters

- restart the TTL when a row, column or CF is touched

- streamed data transfer (both send & receive). At least for receive
(multiget), streaming is not supported (probably a Thrift limitation).

- the client could be more involved. Currently, it just waits for a response from
the coordinator and does nothing useful in the meantime. It would be nice to let
the client act as the coordinator (better network usage, tunable
consistency, sending read/write requests directly to one of the natural
endpoints...). This is probably already doable with the storage API (? - Java
only, though).

- CQL is cool but the API is far too simple (it's text, suited to the CLI). CQL
queries should support parameters (like JDBC '?'); parameters could be sent
in binary format. This would consume less network bandwidth and could make
client support easier. CQL responses should be streamed too (Thrift again...).

- support for authentication / dynamic security configuration: allow access
to all CFs of a KS (to support CFs created dynamically in a KS), unless I missed
something.
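
The request to restart the TTL when data is touched amounts to a sliding expiry. A Cassandra TTL is set when the column is written, so today extending it means rewriting the column with a new TTL. The following in-memory sketch only models the requested semantics; `SlidingTTLStore` is hypothetical, and the injectable clock exists so the behaviour can be tested deterministically:

```python
import time
from typing import Any, Callable, Dict, Hashable, Optional


class SlidingTTLStore:
    """Entries expire `ttl` seconds after they were last written or read."""

    def __init__(self, ttl: float, clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl = ttl
        self.clock = clock
        self._values: Dict[Hashable, Any] = {}
        self._deadlines: Dict[Hashable, float] = {}

    def put(self, key: Hashable, value: Any) -> None:
        self._values[key] = value
        self._deadlines[key] = self.clock() + self.ttl

    def get(self, key: Hashable) -> Optional[Any]:
        deadline = self._deadlines.get(key)
        if deadline is None:
            return None
        now = self.clock()
        if now >= deadline:
            # Expired: drop it lazily on the first access after the deadline.
            del self._values[key], self._deadlines[key]
            return None
        self._deadlines[key] = now + self.ttl  # touching the entry restarts its TTL
        return self._values[key]


fake_now = [0.0]
store = SlidingTTLStore(ttl=10, clock=lambda: fake_now[0])
store.put("row1", "v")
fake_now[0] = 8.0
print(store.get("row1"))  # v     (deadline moves to 18)
fake_now[0] = 15.0
print(store.get("row1"))  # v     (a fixed TTL would have expired at 10)
fake_now[0] = 26.0
print(store.get("row1"))  # None  (last touch at 15, so it expired at 25)
```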

- Pierre
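
For the item about letting the client act as the coordinator, the core step is working out which nodes hold a key. A minimal sketch, assuming RandomPartitioner-style tokens (the absolute value of the key's MD5 digest read as a signed 128-bit integer) and SimpleStrategy-style placement; the four-node ring and its addresses are hypothetical, and a real client would need the live ring from the cluster (for example via the Thrift `describe_ring` call) and would have to handle other replication strategies:

```python
import hashlib
from bisect import bisect_left
from typing import List


def token_for(key: bytes) -> int:
    """RandomPartitioner-style token: abs() of the MD5 digest as a signed big-endian int."""
    return abs(int.from_bytes(hashlib.md5(key).digest(), "big", signed=True))


def replicas_for(key: bytes, ring_tokens: List[int], nodes: List[str], rf: int) -> List[str]:
    """SimpleStrategy-style placement on a sorted ring.

    The first node whose token is >= the key's token owns the key; the next
    rf - 1 nodes clockwise hold the other replicas.
    """
    start = bisect_left(ring_tokens, token_for(key)) % len(ring_tokens)  # wrap past the top
    return [nodes[(start + i) % len(nodes)] for i in range(min(rf, len(nodes)))]


# Hypothetical 4-node ring with evenly spaced tokens over [0, 2**127).
NODES = ["10.0.0.1", "10.0.0.2", "10.0.0.3", "10.0.0.4"]
RING = [i * 2**127 // 4 for i in range(4)]

print(replicas_for(b"user:42", RING, NODES, rf=3))
```

Sending a request straight to one of these replicas saves a hop; the coordinator's other duties (waiting for the required acks, read repair) would still have to live somewhere.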

-----Original Message-----
From: Jonathan Ellis [mailto:jbel...@gmail.com] 
Sent: Wednesday, 2 November 2011 00:00
To: user
Subject: Second Cassandra users survey

Hi all,

Two years ago I asked for Cassandra use cases and feature requests.
[1]  The results [2] have been extremely useful in setting and prioritizing
goals for Cassandra development.  But with the release of
1.0 we've accomplished basically everything from our original wish list. [3]

I'd love to hear from modern Cassandra users again, especially if you're
usually a quiet lurker.  What does Cassandra do well?  What are your pain
points?  What's your feature wish list?

As before, if you're in stealth mode or don't want to say anything in
public, feel free to reply to me privately and I will keep it off the
record.

[1] http://www.mail-archive.com/cassandra-dev@incubator.apache.org/msg01148.html
[2] http://www.mail-archive.com/cassandra-user@incubator.apache.org/msg01446.html
[3] http://www.mail-archive.com/dev@cassandra.apache.org/msg01524.html

--
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com



Re: Second Cassandra users survey

2011-11-05 Thread Brandon Williams
On Fri, Nov 4, 2011 at 9:50 PM, Jim Newsham jnews...@referentia.com wrote:
 Our use case is time-series data (such as sampled sensor data).  Each row
 describes a particular statistic over time, the column name is a time, and
 the column value is the sample.  So it makes perfect sense to want to delete
 columns for a given time range.  I'm sure there must be numerous other use
 cases for which using a range of column names makes sense.

Assuming you are bucketing your rows at some interval (as in
http://rubyscale.com/2011/basic-time-series-with-cassandra/), why is
deleting the entire row for the interval not acceptable?
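
With rows bucketed per interval, most of a time-range delete becomes whole-row deletes; only the buckets at the edges of the range still need per-column deletes. A minimal sketch, assuming one row per sensor per UTC day and hypothetical `sensor:YYYY-MM-DD` row keys:

```python
from datetime import datetime, timedelta, timezone
from typing import List, Tuple

BUCKET = timedelta(days=1)  # assumed bucket width: one row per sensor per day
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def bucket_start(ts: datetime) -> datetime:
    """Start of the bucket containing `ts` (ts must be timezone-aware)."""
    return EPOCH + ((ts - EPOCH) // BUCKET) * BUCKET


def row_key(sensor: str, ts: datetime) -> str:
    return f"{sensor}:{bucket_start(ts):%Y-%m-%d}"


def plan_delete(sensor: str, start: datetime, end: datetime) -> Tuple[List[str], List[str]]:
    """Split [start, end) into row keys that can be deleted whole and row keys
    that still need column-level deletes because the range only partly covers them."""
    whole: List[str] = []
    partial: List[str] = []
    bucket = bucket_start(start)
    while bucket < end:
        if start <= bucket and bucket + BUCKET <= end:
            whole.append(row_key(sensor, bucket))
        else:
            partial.append(row_key(sensor, bucket))
        bucket += BUCKET
    return whole, partial


utc = timezone.utc
print(plan_delete("temp", datetime(2011, 11, 1, 6, tzinfo=utc), datetime(2011, 11, 4, tzinfo=utc)))
# (['temp:2011-11-02', 'temp:2011-11-03'], ['temp:2011-11-01'])
```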

-Brandon


Re: Second Cassandra users survey

2011-11-04 Thread Jim Newsham


- Bulk column deletion by (column name) range.  Without this feature, we 
are forced to perform a range query and iterate over all of the columns, 
deleting them one by one (we do this in a batch, but it's still a very 
slow approach).  See CASSANDRA-494/3448.  If anyone else needs this
feature, please raise your voice, as it has been tabled due
to lack of interest.
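
The workaround described above (slice the range, then delete each returned column by name in a batch) looks roughly like this. `read_slice` and `delete_batch` are hypothetical stand-ins for a client's `get_slice` and `batch_mutate` calls, and column names are assumed to be integer timestamps. Each deleted column becomes its own tombstone, which is part of why this gets slow for large ranges.

```python
from typing import Callable, List, Sequence, Tuple

Column = Tuple[int, float]  # (column name as an integer timestamp, sample value)


def delete_range_by_iteration(
    read_slice: Callable[[int, int, int], Sequence[Column]],
    delete_batch: Callable[[List[int]], None],
    start: int,
    finish: int,
    page_size: int = 100,
) -> int:
    """Delete every column named in [start, finish], one page at a time.

    Returns the number of columns deleted.
    """
    deleted = 0
    while True:
        page = read_slice(start, finish, page_size)
        if not page:
            return deleted
        names = [name for name, _ in page]
        delete_batch(names)  # one deletion per column name
        deleted += len(names)
        if len(page) < page_size:
            return deleted
        start = names[-1] + 1  # integer names: resume just past the last one read


# In-memory stand-in for one wide row, for illustration only.
row = {t: float(t) for t in range(0, 1000, 10)}


def read_slice(s: int, f: int, n: int) -> List[Column]:
    return sorted((k, v) for k, v in row.items() if s <= k <= f)[:n]


def delete_batch(names: List[int]) -> None:
    for name in names:
        del row[name]


print(delete_range_by_iteration(read_slice, delete_batch, 200, 499, page_size=7))  # 30
```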






Re: Second Cassandra users survey

2011-11-04 Thread Brandon Williams
On Fri, Nov 4, 2011 at 9:19 PM, Jim Newsham jnews...@referentia.com wrote:
 - Bulk column deletion by (column name) range.  Without this feature, we are
 forced to perform a range query and iterate over all of the columns,
 deleting them one by one (we do this in a batch, but it's still a very slow
 approach).  See CASSANDRA-494/3448.  If anyone else has a need for this
 issue, please raise your voice, as the feature has been tabled due to lack
 of interest.

I think the lack of interest here has been this: it's unusual to want
to delete columns for which you do not know the names, but also not
want to delete the entire row.  Is there any chance you're trying to
delete the entire row, or is it truly the case I just described?

-Brandon


Re: Second Cassandra users survey

2011-11-04 Thread Jim Newsham

On 11/4/2011 4:32 PM, Brandon Williams wrote:

On Fri, Nov 4, 2011 at 9:19 PM, Jim Newsham jnews...@referentia.com wrote:

- Bulk column deletion by (column name) range.  Without this feature, we are
forced to perform a range query and iterate over all of the columns,
deleting them one by one (we do this in a batch, but it's still a very slow
approach).  See CASSANDRA-494/3448.  If anyone else has a need for this
issue, please raise your voice, as the feature has been tabled due to lack
of interest.

I think the lack of interest here has been this: it's unusual to want
to delete columns for which you do not know the names, but also not
want to delete the entire row.  Is there any chance you're trying to
delete the entire row, or is it truly the case I just described?

-Brandon


Our use case is time-series data (such as sampled sensor data).  Each 
row describes a particular statistic over time, the column name is a 
time, and the column value is the sample.  So it makes perfect sense to 
want to delete columns for a given time range.  I'm sure there must be 
numerous other use cases for which using a range of column names makes 
sense.


Regards,
Jim



Re: Second Cassandra users survey

2011-11-03 Thread Peter Tillotson
I'm using Cassandra as a big graph database, loading large volumes of data live
and linking on the fly.
The number of edges grows geometrically with the data added, and the edges need
to be read to continue linking the graph on the fly.


Consequently, my problem is constrained by:
 * Predominantly reads - especially when the data gets large and reads are
   quasi-random
 * I have lots of data to plow in, to be read
 * Although the problem scales out and could possibly all be held in RAM, that
   requires too much kit to be viable

So, my findings with Cassandra are:
 * Compaction is expensive; I need it, but
   1) it takes disk IO away from my reads
   2) it destroys the file cache
   I've not had a chance to do extensive tests with leveled (LevelDB-style) compaction
 * Compaction has historically been too hard to configure
 * Memory hungry

So for me the biggest features would be
 * Cheaper compaction
 * Lower memory usage
 * Indexing dynamic colnames (eg Lucene TermEnum against rowkey:colkey)
   I do a lot of checking against dynamic colnames  
 
The great features are that redundancy and live addition of shards are
available out of the box.


I've also experimented with GoldenOrb and triggered updates; I think there is
a fair bit that can be achieved in my problem with local data access. Through
GoldenOrb and Hadoop Writables I managed to get both a BigTable and a Pregel
access model onto my Cassandra data. It was schema specific, but provided a
local compute model.

p 




Re: Second Cassandra users survey

2011-11-03 Thread Radim Kolar

 * Compaction is expensive
Yes, it is. That's why I decided not to go with Hadoop HDFS backed by
Cassandra.


Re: Second Cassandra users survey

2011-11-03 Thread Mohit Anchlia
On Thu, Nov 3, 2011 at 5:46 AM, Peter Tillotson slatem...@yahoo.co.uk wrote:
 I'm using Cassandra as a big graph database, loading large volumes of data
 live and linking on the fly.

Not sure if Cassandra is the right fit for modeling complex vertices and edges.

 The number of edges grow geometrically with data added, and need to be read
 to continue linking the graph on the fly.

 Consequently, my problem is constrained by:
  * Predominantly read - especially when data gets large and reads are quasi
 random
  * I have lots of data to plow in, to be read
  * Although the problem scale out and possibly all be in RAM, it requires
 too much kit for the to be viable
 So, my findings with Cassandra are:
  * Compaction is expensive, I need it but
    1) It takes away disk IO from my reads
    2) Destroys the file cache
    I've not had chance to do extensive tests with the Level db compaction
  * Compaction has been too hard to configure historically
  * Memory hungry
 So for me the biggest features would be
  * Cheaper compaction -
  * Lower memory usage
  * Indexing dynamic colnames (eg Lucene TermEnum against rowkey:colkey)
    I do a lot of checking against dynamic colnames

I agree, some kind of integration with a search engine is required to
support ad hoc queries as well as searching on column names. This would
be really helpful.

Currently, one of the options is to write in 2 places: Cassandra +
search engine.

 The great features are that redundancy, and live addition of shards is
 available out of the box.

 I've also experimented with Golden Orb and Triggered updates, I think there
 is a fair bit that can be achieved in my problem with local data access.
 Through GoldenOrb and Hadoop writables a managed to get both a BigTable and
 Pregel access model onto my Cassandra data. It was schema specific, but
 provided a local compute model.
 p
 





Re: Second Cassandra users survey

2011-11-03 Thread Peter Tillotson
  * Indexing dynamic colnames (eg Lucene TermEnum against rowkey:colkey)
    I do a lot of checking against dynamic colnames

 I agree, some kind of integration with search engine is required to
 support adhoc queries as well and searching on column names. This will
 be really helpful.

 Currently, one of the options is to write in 2 places. Cassandra +
 search engine.


I was thinking of a disk-backed skip list, with every nth rowkey:colkey pulled
into memory per SSTable, as with Lucene's TermEnum.
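
The idea can be sketched as a sparse index: keep every nth key of a sorted run in memory, binary-search that sample, then scan at most n entries of the run. Cassandra's own row index sampling (the `index_interval` setting) follows a similar pattern, but for row keys only. A minimal in-memory sketch, with a Python list standing in for the on-disk `rowkey:colkey` entries of one SSTable; `SampledIndex` and the key format are hypothetical:

```python
from bisect import bisect_right
from typing import List


class SampledIndex:
    """Sparse index over one sorted run of 'rowkey:colkey' strings."""

    def __init__(self, sorted_keys: List[str], interval: int) -> None:
        self.run = sorted_keys  # stand-in for the sorted entries on disk
        self.interval = interval
        # Every `interval`-th key is held in memory; sample i sits at run[i * interval].
        self.sample = sorted_keys[::interval]

    def contains(self, key: str) -> bool:
        i = bisect_right(self.sample, key) - 1
        if i < 0:
            return False  # sorts before every key in the run
        start = i * self.interval
        # Scan at most `interval` entries, like reading one block of the on-disk index.
        for candidate in self.run[start:start + self.interval]:
            if candidate == key:
                return True
            if candidate > key:
                return False
        return False


keys = sorted(f"node{n:03d}:edge{e:02d}" for n in range(50) for e in range(20))
index = SampledIndex(keys, interval=128)
print(len(index.sample))                 # 8 keys held in memory for 1000 on "disk"
print(index.contains("node007:edge03"))  # True
print(index.contains("node007:edge99"))  # False
```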







Re: Second Cassandra users survey

2011-11-03 Thread Ertio Lew
Provide an option to sort columns by timestamp, i.e. in the order they have
been added to the row, while still allowing any column names.
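
Today the only way to get that order for arbitrary column names is to read the row and sort on the client by each column's write timestamp, which is what the requested option would avoid. A minimal sketch, assuming the row has been read into a dict of name -> (timestamp, value); `columns_by_write_time` is hypothetical:

```python
from typing import Dict, List, Tuple


def columns_by_write_time(row: Dict[str, Tuple[int, str]]) -> List[str]:
    """Column names ordered by write timestamp, oldest first.

    Ties are broken by name so the result is deterministic.
    """
    return [name for name, _ in sorted(row.items(), key=lambda item: (item[1][0], item[0]))]


row = {"zeta": (1320300000, "x"), "alpha": (1320300002, "y"), "mid": (1320300001, "z")}
print(columns_by_write_time(row))  # ['zeta', 'mid', 'alpha']
```

Cassandra column timestamps are supplied by the client, so this order is only as meaningful as the clients' clocks.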




Re: Second Cassandra users survey

2011-11-03 Thread Konstantin Naryshkin
I realize that it is not realistic to expect it, but it would be good
to have a Partitioner that supports both range slices and automatic
load balancing.





Re: Second Cassandra users survey

2011-11-03 Thread Todd Burruss
- Better performance when accessing random columns in a wide row
- caching subsets of wide rows - possibly on the same boundaries as the
index
- some sort of notification architecture when data is inserted.  This
could be co-processors, triggers, plugins, etc.
- auto load balancing when adding new nodes




Re: Second Cassandra users survey

2011-11-02 Thread Patrick Julien
- entity groups
- co-processors
- materialized views
- CQL support directly in cassandra-cli




Re: Second Cassandra users survey

2011-11-02 Thread Boris Yen
1. entity groups
2. cql support in cassandra-cli.
3. offset support in slice_range.
4. more sophisticated secondary index implementation.
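
Offset support in slice_range (item 3) is usually emulated on the client by paging with start/count and discarding everything before the offset. A minimal sketch; `get_slice(start, n)` is a hypothetical wrapper returning up to n column names >= start in comparator order (an empty start meaning the beginning of the row), and every page after the first repeats its start column, so that column is dropped:

```python
from typing import Callable, List, Sequence


def slice_from_offset(
    get_slice: Callable[[str, int], Sequence[str]],
    offset: int,
    count: int,
    page_size: int = 100,
) -> List[str]:
    """Return up to `count` column names starting at position `offset` in the row."""
    result: List[str] = []
    seen = 0     # columns walked past so far, including skipped ones
    start = ""   # empty start: beginning of the row
    first_page = True
    while len(result) < count:
        if first_page:
            page = list(get_slice(start, page_size))
        else:
            # Ask for one extra column and drop the repeated start column.
            page = list(get_slice(start, page_size + 1))[1:]
        if not page:
            break
        for name in page:
            if seen >= offset and len(result) < count:
                result.append(name)
            seen += 1
        if len(page) < page_size:
            break  # reached the end of the row
        start = page[-1]
        first_page = False
    return result


columns = [f"c{i:03d}" for i in range(250)]


def get_slice(start: str, n: int) -> List[str]:
    return [c for c in columns if c >= start][:n]


print(slice_from_offset(get_slice, offset=120, count=5, page_size=50))
# ['c120', 'c121', 'c122', 'c123', 'c124']
```

Every column before the offset is still read and transferred, which is the cost a server-side offset would remove.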




Re: Second Cassandra users survey

2011-11-01 Thread Ramesh Natarajan
Here is my wish list -

I would love Cassandra to

- provide an efficient method to retrieve the count of columns for a
given row key without having to read all the columns and calculate the
count.

- support auto-increment column names - column-slice-based queries
don't take advantage of the column Bloom filter, and it is not
always easy to enumerate the column names in a deterministic manner.

- provide JNA support for the key cache

- remove the dependency on running nodetool repair when columns are
deleted (so that deleted data doesn't get resurrected)
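
The repair dependency exists because tombstones are only kept for `gc_grace_seconds` (10 days by default); if a replica missed a delete and repair has not run within that window, compaction can purge the tombstone elsewhere and the deleted data can reappear. Until that dependency goes away, a scheduled job can at least flag nodes that are due. A minimal sketch; where `last_repair` comes from, and the two-day safety margin, are assumptions:

```python
from datetime import datetime, timedelta
from typing import Dict, List

GC_GRACE_SECONDS = 864000  # Cassandra's default gc_grace_seconds (10 days)


def nodes_needing_repair(
    last_repair: Dict[str, datetime],
    now: datetime,
    gc_grace_seconds: int = GC_GRACE_SECONDS,
    safety_margin: timedelta = timedelta(days=2),
) -> List[str]:
    """Nodes whose last completed repair is close to (or past) gc_grace_seconds ago."""
    limit = timedelta(seconds=gc_grace_seconds) - safety_margin
    return sorted(node for node, finished in last_repair.items() if now - finished >= limit)


last_repair = {  # hypothetical record of when `nodetool repair` last finished per node
    "10.0.0.1": datetime(2011, 11, 15),
    "10.0.0.2": datetime(2011, 11, 11),
    "10.0.0.3": datetime(2011, 11, 1),
}
print(nodes_needing_repair(last_repair, now=datetime(2011, 11, 20)))
# ['10.0.0.2', '10.0.0.3']
```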

thanks
Ramesh



