Re: Cassandra range scans

2010-02-22 Thread Peter Schüller
  1) would you consider Cassandra (0.5+) safe enough for a primary data store? 

 Yes.  Several companies are deploying 0.5 in production.  It's pretty
 solid.  (We'll have a 0.5.1 fixing some minor issues RSN, and a 0.6
 beta.)  And I agree that it's significantly simpler to deploy (and
 keep running) than HBase.

Is there any public information on the production installations you
mention above? So far, from Googling and browsing, we have mostly (I
think only) seen production cases where the data could be re-generated
in the event of loss. As we are evaluating Cassandra for important
data, we would be interested in hearing any success or failure
stories.

/ Peter Schuller aka scode

Re: Cassandra range scans

2010-02-21 Thread Jonathan Ellis
[replying to list, with permission]

On Mon, Feb 22, 2010 at 12:05 AM, wrote:
 I'm looking for a very scalable primary data store for a large web/API 
 application. Our data consists largely of lists of things, per user. So a 
 user has a bunch (dozens to hundreds) of thing A, some of thing B, a few of 
 thing C, etc. There are social elements to the app w/ shared data, so that 
 could be modeled with each user having a list of pointers, but with writes 
 being super cheap I'm more inclined to write everything everywhere (that's a 
 side issue, but it's in the back of my mind). Users number in the millions.

 So basically I'm looking for something scalable, available, fast, and with 
 native support for range scans (given that almost every query is fetching 
 some list of things). This is where my questions lie... I'm pretty familiar 
 with the Bigtable model and it suits my needs quite well, I would store thing 
 A under a row key of userid.thingid (or similar) and then a range scan over 
 userid. will pick them all up at once.
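A minimal sketch of that key scheme (plain Python standing in for the partitioner; the user/thing names are hypothetical): with order-preserved row keys, every "userid.thingid" key for one user is contiguous, so a single range scan returns them all:

```python
# Simulate OrderPreservingPartitioner behavior: row keys kept in sorted
# order, so a prefix range scan is one contiguous slice of the keyspace.
from bisect import bisect_left

rows = sorted([
    "alice.thing1", "alice.thing2", "alice.thing3",
    "bob.thing1", "carol.thing1",
])

def range_scan(prefix):
    """Return all row keys starting with `prefix` as one contiguous slice."""
    start = bisect_left(rows, prefix)
    # "\xff" sorts after any printable char, bounding the end of the scan.
    end = bisect_left(rows, prefix + "\xff")
    return rows[start:end]

print(range_scan("alice."))  # ['alice.thing1', 'alice.thing2', 'alice.thing3']
```

This is exactly why the question below hinges on OrderPreservingPartitioner: with a random (hash) partitioner, a user's keys would be scattered and no single scan could collect them.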

 HBase has been top of my list in terms of data model, but I ran across a 
 performance study which suggested it's questionable and the complexity of 
 components gives me some pause. So Cassandra seems the other obvious choice. 
 However, the data model isn't as clear to me (at least, not yet, which is 
 probably just a terminology problem).

 My questions:

  1) would you consider Cassandra (0.5+) safe enough for a primary data store? 

Yes.  Several companies are deploying 0.5 in production.  It's pretty
solid.  (We'll have a 0.5.1 fixing some minor issues RSN, and a 0.6
beta.)  And I agree that it's significantly simpler to deploy (and
keep running) than HBase.

  2) is the row key model I suggested above the best approach in Cassandra, or 
 is there something better? My testing so far has been using get_range_slice 
 with a ColumnParent of just the CF and SlicePredicate listing the columns I 
 want (though really I want all columns, is there a shorthand for that?)

Cassandra deals fine with millions of columns per row, and allows
prefix queries on columns too.  So an alternate model would be to have
userX as row key, and column keys A:1, A:2, A:3, ..., B:1, B:2, B:3,
and so on.  This will be marginally faster than splitting by row, and has
the added advantage of not requiring OPP.
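To make that concrete, here is a toy sketch in plain Python (not the actual Thrift API; the row, the "A:"/"B:" column names, and the values are all made up).  Columns within a Cassandra row are kept sorted by name, so a bounded slice on a name prefix fetches all of one thing-type in a single call:

```python
# One row per user; column names encode "type:id".  Because column names
# are sorted, a slice bounded by [prefix + ":", prefix + ";") covers
# exactly the "<type>:..." names (";" is the next ASCII char after ":").
user_row = {
    "A:1": "...", "A:2": "...", "A:3": "...",
    "B:1": "...", "C:1": "...",
}

def slice_columns(row, type_prefix):
    """Simulate a column slice: all names for one thing type, in order."""
    names = sorted(row)
    return [n for n in names if type_prefix + ":" <= n < type_prefix + ";"]

print(slice_columns(user_row, "A"))  # ['A:1', 'A:2', 'A:3']
```

The same bounded-slice trick is what a SlicePredicate with start/finish column names would express against the real API.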

You could use supercolumns here too (where the supercolumn name is the
thing type).  If you always want to retrieve all things of type A at a
time per user, then that is a more natural fit.  (Otherwise, the lack
of subcolumn indexing could be a performance gotcha for you.)

  3) schema changes (i.e. adding a new CF)... seems like currently you take 
 the whole cluster down to accomplish this... is that likely to change in the 
 future?

You have to take each node down, but a rolling restart is fine.  No
reason for the whole cluster to be down at once.

We're planning to make CF changes doable against live nodes for 0.7.

  4) any tuning suggestions for this kind of setup? (primary data store using 
 OrderPreservingPartitioner doing lots of range scans, etc.)

Nothing unusual -- just the typical "try to have enough RAM to cache
your 'hot' data set."

  5) I noticed mention in some discussion that the OrderPreserving mode is not 
 as well utilized and is probably in need of optimizations... how serious is 
 that, and are there people working on that, or is help needed?

We have range queries in our stress testing tool now, and with Hadoop
integration coming in 0.6 I expect it will get a lot more testing.
Certainly anyone who wants to get their hands dirty is welcome. :)

  6) hardware... we could certainly choose to go with pretty beefy hardware, 
 especially in terms of RAM... is there a point where it just isn't useful?

There are some hardware recommendations on the project wiki.  In general, don't
go beyond the knee of the price/performance curve, since you can
always add more nodes instead.

Past what's needed for your memtables, RAM is only useful for caching
reads; it won't help write performance.  So that's the main factor in
"how much do I need."
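As a back-of-envelope illustration of that budgeting (every number below is hypothetical, and the per-CF memtable threshold is just a stand-in for whatever your configuration sets):

```python
# Rough RAM budget: memtables get a fixed slice; JVM overhead gets another;
# whatever is left over is what actually caches the 'hot' read set.
memtable_mb_per_cf = 128    # hypothetical per-CF memtable threshold
column_families    = 4
jvm_overhead_mb    = 1024   # hypothetical headroom for heap, compaction, GC
total_ram_mb       = 8192

memtable_budget = memtable_mb_per_cf * column_families              # 512 MB
read_cache_mb   = total_ram_mb - memtable_budget - jvm_overhead_mb

print(read_cache_mb)  # 6656 -> MB left for caching reads
```

The point of the arithmetic: doubling total RAM roughly doubles the read cache, but (per the above) does nothing for write throughput, which is why very large boxes stop paying off.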