Typically you will update both as part of a single batch_mutate and, if it fails, retry the whole operation. Rewriting any part that already succeeded is harmless, since it just overwrites the same columns with the same values.
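
A rough sketch of that retry pattern follows. It does not use a real Cassandra client: `FlakyStore`, `write_with_retry`, and the ColumnFamily names `FilesByUuid` and `FilesByPath` are all hypothetical, and the store is an in-memory stand-in that fails once after a partial write. Reusing one timestamp across retries is an assumption of this sketch, chosen so that a retry is literally the same write.

```python
class FlakyStore:
    """Hypothetical in-memory stand-in for a Cassandra client (not a real API)."""

    def __init__(self, failures=1):
        # (column_family, row_key) -> {column: (value, timestamp)}
        self.rows = {}
        self._failures = failures

    def batch_mutate(self, mutations, timestamp):
        # mutations: {column_family: {row_key: {column: value}}}
        items = [(cf, key, cols)
                 for cf, rows in mutations.items()
                 for key, cols in rows.items()]
        for i, (cf, key, cols) in enumerate(items):
            # Simulate a crash after the first row has already been written.
            if i == 1 and self._failures > 0:
                self._failures -= 1
                raise IOError("simulated failure after a partial write")
            row = self.rows.setdefault((cf, key), {})
            for col, value in cols.items():
                # A column is only replaced by a write with an equal or newer
                # timestamp, so repeating the same write changes nothing.
                if col not in row or row[col][1] <= timestamp:
                    row[col] = (value, timestamp)


def write_with_retry(store, mutations, timestamp, attempts=5):
    """Retry the whole batch until it succeeds; return the number of tries."""
    for attempt in range(attempts):
        try:
            store.batch_mutate(mutations, timestamp)
            return attempt + 1
        except IOError:
            continue
    raise IOError("batch still failing after %d attempts" % attempts)


store = FlakyStore(failures=1)
ts = 1278612780000000  # one timestamp, reused for every retry
mutations = {
    "FilesByUuid": {"some-uuid-1": {"path": "/a/b/c", "size": 100000}},
    "FilesByPath": {"/a/b/c": {"uuid": "some-uuid-1", "size": 100000}},
}
tries = write_with_retry(store, mutations, ts)
```

Here the first attempt writes the uuid-indexed row and then fails; the second attempt rewrites that row unchanged and adds the path-indexed row, so both end up consistent.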

On Thu, Jul 8, 2010 at 11:13 AM, Stuart Langridge <[email protected]> wrote:
> Hi, Cassandra people!
>
> We're looking at Cassandra as a possible replacement for some parts of
> our database structures, and on an early look I'm a bit confused about
> atomicity guarantees and rollbacks and such, so I wanted to ask what
> standard practice is for dealing with the sorts of situation I outline
> below.
>
> Imagine that we're storing information about files. Each file has a path
> and a uuid, and sometimes we need to look up stuff about a file by its
> path and sometimes by its uuid. The best way to do this, as I understand
> it, is to store the data in Cassandra twice: once indexed by nodeid and
> once by path. So, I have two ColumnFamilies, one indexed by uuid:
>
> {
>   "some-uuid-1": {
>     "path": "/a/b/c",
>     "size": 100000
>   },
>   "some-uuid-2" {
>     ...
>   },
>   ...
> }
>
> and one indexed by path
>
> {
>   "/a/b/c": {
>     "uuid": "some-uuid-1",
>     "size": 100000
>   },
>   "/d/e/f" {
>     ...
>   },
>   ...
> }
>
> So, first, do please correct me if I've misunderstood the terminology
> here (and I've shown a "short form" of ColumnFamily here, as per
> http://arin.me/blog/wtf-is-a-supercolumn-cassandra-data-model).
>
> The thing I don't quite get is: what happens when I want to add a new
> file? I need to add it to both these ColumnFamilies, but there's no "add
> it to both" atomic operation. What's the way that people handle the
> situation where I add to the first CF and then my program crashes, so I
> never added to the second? (Assume that there is lots more data than
> I've outlined above, so that "put it all in one SuperColumnFamily,
> because that can be updated atomically" won't work because it would end
> up with our entire database in one SCF). Should we add to one, and then
> if we fail to add to the other for some reason continually retry until
> it works? Have a "garbage collection" procedure which finds
> discrepancies between indexes like this and fixes them up and run it
> from cron? We'd love to hear some advice on how to do this, or if we're
> modelling the data in the wrong way and there's a better way which
> avoids these problems!
>
> sil
>
>

-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com
