Re: Understanding atomicity in Cassandra

Jonathan Ellis Fri, 09 Jul 2010 11:51:34 -0700

Typically you will update both as part of a single batch_mutate, and if
it fails, retry the whole operation.  Re-writing any part that already
succeeded is harmless, because the writes are idempotent.
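
A rough sketch of that retry pattern, for illustration only. FlakyStore is
a hypothetical in-memory stand-in, not a real Cassandra client, and its
batch_mutate signature, the column family names (FilesByUuid, FilesByPath)
and the add_file helper are all assumptions made up for this example. It
simulates a batch that fails after the first write has landed, then shows
that resending the same batch leaves both indexes consistent.

```python
class FlakyStore:
    """Hypothetical in-memory stand-in for a Cassandra client (not a real API)."""

    def __init__(self, fail_first_n=1):
        # Two "column families": one keyed by uuid, one keyed by path.
        self.data = {"FilesByUuid": {}, "FilesByPath": {}}
        self._failures_left = fail_first_n

    def batch_mutate(self, mutations):
        """Apply (column_family, row_key, columns) writes in order.

        Simulates a partial failure: while failures remain, the call raises
        after the first mutation has already been applied.
        """
        for i, (cf, key, columns) in enumerate(mutations):
            if i == 1 and self._failures_left > 0:
                self._failures_left -= 1
                raise IOError("simulated failure after partial write")
            # Writing the same columns twice yields the same row, so a
            # retried mutation is harmless.
            self.data[cf].setdefault(key, {}).update(columns)


def add_file(store, uuid, path, size, max_attempts=3):
    """Write both index rows as one batch; retry the whole batch on failure.

    Returns the number of attempts that were needed.
    """
    mutations = [
        ("FilesByUuid", uuid, {"path": path, "size": size}),
        ("FilesByPath", path, {"uuid": uuid, "size": size}),
    ]
    for attempt in range(1, max_attempts + 1):
        try:
            store.batch_mutate(mutations)
            return attempt
        except IOError:
            if attempt == max_attempts:
                raise  # give up and let the caller decide what to do


store = FlakyStore(fail_first_n=1)
attempts = add_file(store, "some-uuid-1", "/a/b/c", 100000)
```

The first attempt writes the uuid row and then fails; the second attempt
rewrites that row unchanged and adds the path row. If the client crashes
before any retry succeeds, the two indexes can still disagree, so this
covers transient failures rather than giving a transactional guarantee.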


On Thu, Jul 8, 2010 at 11:13 AM, Stuart Langridge
<[email protected]> wrote:
> Hi, Cassandra people!
>
> We're looking at Cassandra as a possible replacement for some parts of
> our database structures, and on an early look I'm a bit confused about
> atomicity guarantees and rollbacks and such, so I wanted to ask what
> standard practice is for dealing with the sorts of situation I outline
> below.
>
> Imagine that we're storing information about files. Each file has a path
> and a uuid, and sometimes we need to look up stuff about a file by its
> path and sometimes by its uuid. The best way to do this, as I understand
> it, is to store the data in Cassandra twice: once indexed by uuid and
> once by path. So, I have two ColumnFamilies, one indexed by uuid:
>
> {
>  "some-uuid-1": {
>    "path": "/a/b/c",
>    "size": 100000
>  },
>  "some-uuid-2": {
>    ...
>  },
>  ...
> }
>
> and one indexed by path
>
> {
>  "/a/b/c": {
>    "uuid": "some-uuid-1",
>    "size": 100000
>  },
>  "/d/e/f": {
>    ...
>  },
>  ...
> }
>
> So, first, do please correct me if I've misunderstood the terminology
> here (and I've shown a "short form" of ColumnFamily here, as per
> http://arin.me/blog/wtf-is-a-supercolumn-cassandra-data-model).
>
> The thing I don't quite get is: what happens when I want to add a new
> file? I need to add it to both these ColumnFamilies, but there's no "add
> it to both" atomic operation. What's the way that people handle the
> situation where I add to the first CF and then my program crashes, so I
> never added to the second? (Assume that there is lots more data than
> I've outlined above, so that "put it all in one SuperColumnFamily,
> because that can be updated atomically" won't work because it would end
> up with our entire database in one SCF). Should we add to one, and then
> if we fail to add to the other for some reason continually retry until
> it works? Have a "garbage collection" procedure which finds
> discrepancies between indexes like this and fixes them up and run it
> from cron? We'd love to hear some advice on how to do this, or if we're
> modelling the data in the wrong way and there's a better way which
> avoids these problems!
>
> sil
>
>
>



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com
