schema design question

2010-03-08 Thread Matteo Caprari
Hi. We have a collection operation that generates documents like this: item: { id: unique item id, title: ..., liked_by: [user_2, user_3, ...] } The liked_by list contains on average 100 unique users. Users may also appear in other items. Our database contains a few million entries and is

Use Case scenario: Keeping a window of data + online analytics

2010-03-08 Thread Aníbal Rojas
Hello, Have been testing alternatives for MySQL / Postgres based app with the following characteristics: - A high rate of inserts. Heavy bursts are expected. - A high rate of deletes to remove old data. We keep a window, as old data is not relevant. - Online analytics based on

Re: Use Case scenario: Keeping a window of data + online analytics

2010-03-08 Thread Daniel Lundin
A few comments on building a time-series store in Cassandra... Using the timestamp dimension of columns, reusing columns, could prove quite useful. This allows simple use of batch_mutate deletes (new in 0.6) to purge old data outside the active time window. Otherwise, performance wise, deletes

Reason for not allowing null values for in Column

2010-03-08 Thread Erik Holstad
Hey! Been looking at the src and have a couple of questions: Why is it that null column values are not allowed? What is the reason for using a ConcurrentSkipListMap<byte[], IColumn> for columns_ in ColumnFamily compared to using the set version and use the comparator to sort on the name field in

Re: Reason for not allowing null values for in Column

2010-03-08 Thread Erik Holstad
On Mon, Mar 8, 2010 at 9:10 AM, Jonathan Ellis jbel...@gmail.com wrote: On Mon, Mar 8, 2010 at 11:07 AM, Erik Holstad erikhols...@gmail.com wrote: Why is it that null column values are not allowed? It's semantically unnecessary and potentially harmful at an implementation level. (Many

Re: Reason for not allowing null values for in Column

2010-03-08 Thread Jonathan Ellis
On Mon, Mar 8, 2010 at 11:22 AM, Erik Holstad erikhols...@gmail.com wrote: I was probably a little bit unclear here. I'm wondering about the two byte[] in Column. One for name and one for value. I was under the impression that the skiplistmap wraps the Columns, not that the name and the value

Re: Reason for not allowing null values for in Column

2010-03-08 Thread Erik Holstad
On Mon, Mar 8, 2010 at 10:14 AM, Jonathan Ellis jbel...@gmail.com wrote: On Mon, Mar 8, 2010 at 12:07 PM, Erik Holstad erikhols...@gmail.com wrote: So why is it again that the value field in the Column cannot be null if it is not the value field in the map, but just a part of the value

RE: Testing row cache feature in trunk: write should put record in cache

2010-03-08 Thread Daniel Kluesing
This is interesting for the use cases I'm looking at Cassandra for, so if that offer still stands I'll take you up on it. I took a crack at it in https://issues.apache.org/jira/browse/CASSANDRA-860 - also in large part to get my feet wet with the code. -Original Message- From:

RE: Latest check-in to trunk/ is broken

2010-03-08 Thread Stu Hood
Run `ant clean` before building. A few files moved around. -Original Message- From: Cool BSD c...@coolbsd.com Sent: Monday, March 8, 2010 5:18pm To: cassandra-user cassandra-user@incubator.apache.org Subject: Latest check-in to trunk/ is broken version info: $ svn info Path: . URL:

Re: DigestMismatchException

2010-03-08 Thread Jonathan Ellis
It means that you're doing a lot of reads that saw multiple versions of the answer, which depending on your workload may be normal On Mon, Mar 8, 2010 at 5:31 PM, B. Todd Burruss bburr...@real.com wrote: i am seeing a lot of these INFO level messages in cassandra server's logs: 2010-03-08

Re: DigestMismatchException

2010-03-08 Thread B. Todd Burruss
i'm doing quorum reads and quorum writes with N=3 and 4 node cluster. i am updating values in cassandra cluster at a fairly high rate. so does this mean that a read is obtaining its two values (because of quorum) and one of them must have been from the third replica that may not have been
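
The quorum arithmetic in this question can be sketched in a few lines of plain Python. This is only a model of replica sets, not Cassandra code: the replica numbering and the choice of which two replicas acknowledged the write are assumptions made for illustration. With N=3, a quorum is 2, so every quorum read overlaps every quorum write, but a read can still touch the one replica that has not yet received the latest write, which is the situation where the replicas' answers differ.

```python
from itertools import combinations

def quorum(n):
    """Smallest majority of n replicas."""
    return n // 2 + 1

N = 3                      # replication factor from the message
R = W = quorum(N)          # quorum reads and quorum writes
replicas = range(N)        # hypothetical replica ids 0, 1, 2

# Assumption for illustration: the latest write has reached replicas 0 and 1 only.
write_set = {0, 1}
read_sets = [set(r) for r in combinations(replicas, R)]

# R + W > N, so every possible read set shares a replica with the write set.
always_overlaps = all(r & write_set for r in read_sets)

# Read sets that also include the replica still holding the old value.
stale_reads = [r for r in read_sets if r - write_set]

print(R, always_overlaps, len(stale_reads), len(read_sets))
```

In this model, two of the three possible read sets include the lagging replica, so seeing differing answers frequently under a high update rate is consistent with the read still returning the newest value.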

Cassandra latency question

2010-03-08 Thread David Dabbs
Hello. I've been running the vPork load generator against two Cassandra nodes running in VMs. I'm running a trunk build with W=2 and R=1 and out-of-the-box JVM_OPTS which should be fine, or so I thought. Throughput is lower than I expected. Are my expectations out-of-line? Thanks, David

Re: Cassandra latency question

2010-03-08 Thread Jonathan Ellis
something is screwed up if writes are 10x slower than reads On Mon, Mar 8, 2010 at 5:52 PM, David Dabbs dmda...@gmail.com wrote: Hello. I've been running the vPork load generator against two Cassandra nodes running in VMs. I'm running a trunk build with W=2 and R=1 and out-of-the-box JVM_OPTS

Re: schema design question

2010-03-08 Thread Jonathan Ellis
On Mon, Mar 8, 2010 at 6:18 AM, Matteo Caprari matteo.capr...@gmail.com wrote: The 'key' queries are: These map straightforwardly to one CF per query. - list all the items a user liked row key is user id, columns names are timeuuid of when the like-ing occurred, column value is either item
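
A rough model of the first layout described above (row key is the user id, column names order the likes by time, column value is the item) might look like the following. It uses plain Python dictionaries rather than any Cassandra client, and integer timestamps stand in for timeuuid column names; the function names and sample ids are hypothetical.

```python
from collections import defaultdict

# One "column family" per query: row key -> {column name -> column value}.
likes_by_user = defaultdict(dict)

def record_like(user_id, liked_at, item_id):
    """Store one like; liked_at is an integer stand-in for a timeuuid."""
    likes_by_user[user_id][liked_at] = item_id

def items_liked_by(user_id):
    """Return the user's liked items in the order the likes occurred."""
    row = likes_by_user.get(user_id, {})
    return [row[name] for name in sorted(row)]

record_like("user_2", 1268050000, "item_9")
record_like("user_2", 1268040000, "item_4")
record_like("user_3", 1268045000, "item_9")

print(items_liked_by("user_2"))
```

The point of the layout is that the query "list all the items a user liked" becomes a single row read, with the column ordering doing the sorting.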

Re: DigestMismatchException

2010-03-08 Thread Jonathan Ellis
yes. On Mon, Mar 8, 2010 at 5:40 PM, B. Todd Burruss bburr...@real.com wrote: i'm doing quorum reads and quorum writes with N=3 and 4 node cluster.  i am updating values in cassandra cluster at a fairly high rate. so does this mean that a read is obtaining its two values (because of quorum)

Re: schema design question

2010-03-08 Thread Keith Thornhill
jonathan, wouldn't using Long values as the column names for the 3rd CF cause potential conflicts if 2 users liked the same # of items? (only saving one user for any given value) was thinking about this same problem (sorted lists of top N user activity) and thought that was a roadblock for that
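
The collision raised here can be reproduced with a small dictionary model of a row, where writing to an existing column name replaces its value. This is not Cassandra code. The second half shows one possible workaround, a composite column name of (count, user id); that workaround is an assumption for illustration and not necessarily what the thread goes on to recommend.

```python
# Hypothetical like counts; alice and bob are tied.
like_counts = {"alice": 7, "bob": 7, "carol": 3}

# Column name = count only: the second user with the same count
# overwrites the first, so one of the tied users is lost.
by_count = {}
for user, count in like_counts.items():
    by_count[count] = user

# Column name = (count, user): names stay unique, ordering is still by count.
by_count_and_user = {}
for user, count in like_counts.items():
    by_count_and_user[(count, user)] = ""

# Highest counts first; ties are broken by user id.
top_users = [user for (_, user) in sorted(by_count_and_user, reverse=True)]

print(len(by_count), len(by_count_and_user), top_users)
```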