[replying to list, with permission]
On Mon, Feb 22, 2010 at 12:05 AM, jeremey.barr...@nokia.com wrote:
> I'm looking for a very scalable primary data store for a large web/API
> application. Our data consists largely of lists of things, per user. So a
> user has a bunch (dozens to hundreds) of thing A, some of thing B, a few of
> thing C, etc. There are social elements to the app w/ shared data, so that
> could be modeled with each user having a list of pointers, but with writes
> being super cheap I'm more inclined to write everything everywhere (that's a
> side issue, but it's in the back of my mind). Users number in the millions.
> So basically I'm looking for something scalable, available, fast, and with
> native support for range scans (given that almost every query is fetching
> some list of things). This is where my questions lie... I'm pretty familiar
> with the Bigtable model, and it suits my needs quite well: I would store
> thing A under a row key of userid.thingid (or similar), and then a range
> scan over the userid. prefix will pick them all up at once.
> HBase has been at the top of my list in terms of data model, but I ran
> across a performance study suggesting its performance is questionable, and
> the complexity of its components gives me some pause. So Cassandra seems
> the other obvious choice. However, the data model isn't as clear to me (at
> least, not yet, which is probably just a terminology problem).
> My questions:
> 1) would you consider Cassandra (0.5+) safe enough for a primary data
> store?
Yes. Several companies are deploying 0.5 in production. It's pretty
solid. (We'll have a 0.5.1 fixing some minor issues RSN, and a 0.6
beta.) And I agree that it's significantly simpler to deploy (and
keep running) than HBase.
> 2) is the row key model I suggested above the best approach in Cassandra,
> or is there something better? My testing so far has been using
> get_range_slice with a ColumnParent of just the CF and a SlicePredicate
> listing the columns I want (though really I want all columns, is there a
> shorthand for that?)
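Yes, there's a shorthand: a SlicePredicate whose SliceRange has empty
start and finish strings means "all columns." Here's an untested sketch of
your row-per-thing model against the 0.5-era Thrift Python bindings
(keyspace, CF, and key names are all made up for illustration):

    from thrift.transport import TSocket, TTransport
    from thrift.protocol import TBinaryProtocol
    from cassandra import Cassandra
    from cassandra.ttypes import (ColumnParent, SlicePredicate, SliceRange,
                                  ConsistencyLevel)

    socket = TSocket.TSocket('localhost', 9160)
    transport = TTransport.TBufferedTransport(socket)
    transport.open()
    client = Cassandra.Client(TBinaryProtocol.TBinaryProtocol(transport))

    # "All columns" shorthand: empty start/finish in the SliceRange.
    all_columns = SlicePredicate(
        slice_range=SliceRange(start='', finish='', reversed=False,
                               count=1000))

    # Row-per-thing model: fetch every row whose key starts with 'user123.'.
    # Requires OrderPreservingPartitioner; '/' is the byte after '.', so it
    # closes off the prefix.
    key_slices = client.get_range_slice(
        'Keyspace1',                            # keyspace
        ColumnParent(column_family='ThingsA'),
        all_columns,
        'user123.',                             # start_key
        'user123/',                             # finish_key
        1000,                                   # row_count
        ConsistencyLevel.QUORUM)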
Cassandra deals fine with millions of columns per row, and allows
prefix queries on columns too. So an alternate model would be to have
userX as the row key, and column keys A:1, A:2, A:3, ..., B:1, B:2, B:3,
etc. This will be marginally faster than splitting by row, and has
the added advantage of not requiring OPP.
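Continuing the sketch above (same client; 'UserThings' is another made-up
CF name), a column-prefix query for all of one user's type-A things would
look roughly like:

    # One row per user, column names like 'A:1'. ';' is the byte after ':',
    # so it closes off the 'A:' prefix.
    prefix_a = SlicePredicate(
        slice_range=SliceRange(start='A:', finish='A;', reversed=False,
                               count=1000))

    things_a = client.get_slice('Keyspace1',
                                'user123',    # plain row key, no OPP needed
                                ColumnParent(column_family='UserThings'),
                                prefix_a,
                                ConsistencyLevel.QUORUM)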
You could use supercolumns here too (where the supercolumn name is the
thing type). If you always want to retrieve all things of type A at a
time per user, then that is a more natural fit. (Otherwise, the lack
of subcolumn indexing could be a performance gotcha for you:
http://issues.apache.org/jira/browse/CASSANDRA-598).
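A rough sketch of the supercolumn variant (same caveats; 'SuperThings' is
made up, and that CF would be declared with ColumnType="Super"):

    # Supercolumn model: supercolumn name = thing type, subcolumns = things.
    # One call fetches every subcolumn of supercolumn 'A' for one user.
    things_a = client.get_slice(
        'Keyspace1',
        'user123',
        ColumnParent(column_family='SuperThings', super_column='A'),
        SlicePredicate(slice_range=SliceRange(start='', finish='',
                                              reversed=False, count=1000)),
        ConsistencyLevel.QUORUM)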
> 3) schema changes (e.g. adding a new CF)... it seems like currently you
> take the whole cluster down to accomplish this... is that likely to change
> in the future?
You have to take each node down, but a rolling restart is fine. No
reason for the whole cluster to be down at once.
We're planning to make CF changes doable against live nodes for 0.7,
in https://issues.apache.org/jira/browse/CASSANDRA-44.
> 4) any tuning suggestions for this kind of setup? (primary data store using
> OrderPreservingPartitioner doing lots of range scans, etc.)
Nothing unusual -- just the usual advice: try to have enough RAM to cache
your 'hot' data set.
> 5) I noticed mention in some discussion that the OrderPreserving mode is
> not as well utilized and is probably in need of optimizations... how
> serious is that, and are there people working on that, or is help needed?
We have range queries in our stress testing tool now, and with Hadoop
integration coming in 0.6 I expect it will get a lot more testing.
Certainly anyone who wants to get their hands dirty is welcome. :)
> 6) hardware... we could certainly choose to go with pretty beefy hardware,
> especially in terms of RAM... is there a point where it just isn't useful?
Some recommendations are in
http://wiki.apache.org/cassandra/CassandraHardware. In general, don't
go beyond the knee of the price/performance curve, since you can
always add more nodes instead.
Past what your memtables need
(http://wiki.apache.org/cassandra/MemtableSSTable), RAM is only useful
for caching reads; it won't help write performance. So read caching is
the main factor in deciding how much you need.
-Jonathan