On Thu, Oct 29, 2009 at 01:51:57PM -0400, Damien Katz wrote:
>> Is this a sensible API? You decide. I've given my opinion previously.
>
> This api seems weird, but it's the closest thing we can have to
> multi-document transactions in CouchDB and be a distributed,
> partitioned database. This is because it's pretty much impossible to
> support all-or-nothing conflict checking transactions with partitioned
> database without some sort of double-lock checking, which is slow and
> expensive.

I don't want to prevent conflicts, nor do I want transactions. As you
say, introducing conflicting revisions is a fact of life in a
distributed-master system.

However, I believe that CouchDB's API actively discourages people from
writing apps which deal with conflicts properly, by (a) hiding them, and
(b) making resolve-on-read a multi-step process (e.g. readA, readB,
readC, writeA, deleteB, deleteC) which itself is race-prone and may lead
to more conflicts and odd intermediate states (*).

What I would like to see is the following.

1. When you request document X, you get *all* conflicting revisions in
   one go. That is, they are treated as equal peers; none is promoted to
   winner. (However, the list can be sorted in a deterministic order, so
   you could get the current behaviour by just picking the first
   revision from the list.)

2. When you perform this request, you get a single "context" tag which
   identifies this particular *set* of revisions.

3. When you write back the new document, you supply the context tag, and
   this simultaneously supersedes all the other documents. Effectively
   this would be like the _rev you use today, but it would refer to the
   set. It could actually just be an array of _revs, but the user should
   treat it as an opaque tag.

4. Views get to see the whole set of revisions too. Again, if they want
   today's behaviour they can just use docs[0] and ignore the others;
   but if they want to resolve conflicts they can too.

5. If two clients replace a document or set of conflicts with a new
   document, and the new documents are identical, then they are not
   treated as conflicts.

In the papers I've read on systems like Dynamo, they all seem to have
properties (1)-(3). That is: it's treated as natural that conflicts
should arise; that these are fully exposed to the client; and the client
is given the opportunity to resolve them in a single step.

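To make points 1, 2, 3 and 5 concrete, here is a rough in-memory sketch
of how such an API might behave. It is a toy model, not CouchDB code:
the class ConflictAwareStore, the helper rev_id, and the choice to
derive revision ids from a hash of the body are all assumptions made
purely for illustration.

```python
import hashlib
import json


def rev_id(body):
    """Hypothetical revision id: a hash of the canonical JSON body."""
    canonical = json.dumps(body, sort_keys=True).encode("utf-8")
    return hashlib.sha1(canonical).hexdigest()


class ConflictAwareStore:
    """Toy model of the proposal above; not how CouchDB actually works."""

    def __init__(self):
        # doc id -> {rev id: body}; every live revision is an equal peer
        self._docs = {}

    def get(self, doc_id):
        """Points 1 and 2: return all live revisions plus one context tag."""
        revs = self._docs.get(doc_id, {})
        ordered = sorted(revs)      # deterministic order
        context = tuple(ordered)    # callers should treat this as opaque
        return [revs[r] for r in ordered], context

    def put(self, doc_id, body, context=()):
        """Point 3: one write supersedes every revision named in context."""
        revs = self._docs.setdefault(doc_id, {})
        for r in context:
            revs.pop(r, None)       # already-superseded revs are ignored
        revs[rev_id(body)] = body   # point 5: identical bodies collapse
        return self.get(doc_id)[1]


store = ConflictAwareStore()
ctx = store.put("x", {"v": 1})

# Two clients both read the same context and write different documents:
# the store ends up holding two peer revisions, i.e. a conflict.
store.put("x", {"v": 2}, ctx)
store.put("x", {"v": 3}, ctx)
docs, ctx = store.get("x")

# A single write carrying the context tag resolves the whole set.
merged = {"v": max(d["v"] for d in docs)}
store.put("x", merged, ctx)
```

The point of the sketch is only that resolution becomes one read and one
write; how a real server would compute revision ids or the context tag
is left open.
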
> If you want an easier API for saving documents into a conflicted state
> (something like ?conflict=ok), that would be a fairly easy patch to
> make. But I'm not sure why users would want that for a single document.

I think that ultimately the 409 behaviour could be dropped if conflicts
were handled as above, but that's not my number one concern. My concern
is this:

* Someone writes an application

* They use the "obvious" API: i.e. simple GET and PUT for reading and
  updating documents. They code to the 409 for avoiding conflicts. It
  all works fine and they are delighted with CouchDB.

* They switch to multi-master

* All hell breaks loose. Users see their docs vanishing. Application
  writer finally works out how to do conflict management properly, and
  has to rewrite the app entirely so that (for example) one GET becomes
  a GET with ?conflicts=true, followed by multiple GETs for the
  additional versions, followed by conflict resolution, followed by a
  POST to _bulk_docs to replace the original document and conflicts.

* Application writer curses CouchDB, and curses the person who wrote
  "Most applications require no special planning to take advantage of
  distributed updates and replication".

What I propose above could be introduced incrementally with suitable
flags, but care would be needed to do this everywhere [e.g. not just
simple GET but also multi-key fetches]. Of course there's a lot of
detail that would need to be worked through.

Yes, I know patches are welcome. The reason I'm not contributing code
for this right now is that I have higher priorities - I'm happy to keep
my app 409-tied while I work on other things. But at the back of my
mind, I know that I won't be going multi-master for a long time, if
ever.

Regards,

Brian.

(*) Yes, I know that *with care* you can do the writes and deletes
together as a single _bulk_docs operation, and even bind them together
using "all_or_nothing":true. But this is not obvious. And there are
still races.
For example, I'm not sure that you can use a multi-key fetch for getting
all the conflicting revisions in one hit, so you have a series of GETs,
and you may find that the revs you're GETting have vanished by the time
you read them.
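
For reference, the final write step of that resolve-on-read flow might
be assembled roughly like this. It is only a sketch of the request body
for a POST to _bulk_docs, assuming the conflicting revisions have
already been fetched one by one; the function name
build_resolution_payload and the sample revision strings are made up for
illustration, and the application-specific merge is left to the caller.

```python
import json


def build_resolution_payload(winner, losers, merged_body):
    """Build a _bulk_docs body: update one revision, delete the others.

    winner and losers are documents as fetched, each carrying _id and
    _rev; merged_body is the resolved content without _id or _rev.
    """
    # The merged content is written on top of the revision we keep.
    docs = [dict(merged_body, _id=winner["_id"], _rev=winner["_rev"])]
    # Every other conflicting revision is deleted in the same request.
    for loser in losers:
        docs.append(
            {"_id": loser["_id"], "_rev": loser["_rev"], "_deleted": True}
        )
    return {"all_or_nothing": True, "docs": docs}


# Made-up sample revisions of one document, as separate GETs might
# return them.
rev_a = {"_id": "x", "_rev": "2-aaa", "tags": ["red"]}
rev_b = {"_id": "x", "_rev": "2-bbb", "tags": ["blue"]}

payload = build_resolution_payload(
    rev_a, [rev_b], {"tags": sorted(rev_a["tags"] + rev_b["tags"])}
)
print(json.dumps(payload, indent=2))
```

Even with the write bundled like this, the reads that feed it are still
separate requests, which is where the races described above come from.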
