    In a single thread, using a single database session, a read after a
    successful commit is guaranteed to read a version of the database
    that existed after that commit.

Ah, I'm relieved to hear this clarification - thanks.

    I'd like to see actual examples where that will matter. Meanwhile making
    all selects wait for the cluster will basically just ruin responsiveness
    and waste tons of time, so we should be careful to think this through
    before making any blanket policy.

Matthew's example earlier in the thread is simply a user issuing two
related commands in succession:

$ nova aggregate-create
$ nova aggregate-details

Once that fails a few times, the user will put a poorly commented "sleep
2" in between the two statements, and this will "fix" the problem most
of the time.  A "better" fix would repeat the aggregate-details query
multiple times until it looks like it has found the previous create.
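
To make that "better" fix concrete, here is a minimal sketch of the polling
variant in Python. Everything in it is an assumption for illustration only:
poll_until and LaggingReplica are hypothetical names, not part of nova or
any client library, and the replica class just simulates a node that does
not see the write for its first few reads.

```python
import time


def poll_until(fetch, is_ready, timeout=5.0, interval=0.1, backoff=2.0,
               sleep=time.sleep, clock=time.monotonic):
    """Call fetch() until is_ready(result) is true or the timeout expires.

    Hypothetical helper: the names and defaults are illustrative only.
    """
    deadline = clock() + timeout
    while True:
        result = fetch()
        if is_ready(result):
            return result
        if clock() >= deadline:
            raise TimeoutError("write not visible after %.1fs" % timeout)
        # Back off between attempts so we do not hammer the database.
        sleep(interval)
        interval *= backoff


class LaggingReplica:
    """Hypothetical stand-in for a replica that lags behind the writer."""

    def __init__(self, lag_reads):
        self.lag_reads = lag_reads  # number of reads that miss the write
        self.reads = 0

    def get(self, key):
        self.reads += 1
        # The record becomes visible only after lag_reads stale reads.
        return {"id": key} if self.reads > self.lag_reads else None


if __name__ == "__main__":
    replica = LaggingReplica(lag_reads=3)
    found = poll_until(lambda: replica.get("agg-1"),
                       lambda r: r is not None)
    print(found, "after", replica.reads, "reads")
```

Note that this still only "fixes" the problem most of the time: the caller
has to guess a timeout, and every client ends up reimplementing the loop.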

Now, that sleep or poll is of course a poor version of something you
could do at a lower level, by waiting for reads+writes to propagate to a
majority quorum.

    I'd also like to see consideration given to systems that handle
    distributed consistency in a more active manner. etcd and Zookeeper are
    both such systems, and might serve as efficient guards for critical
    sections without raising latency.

+1 for moving to such systems.  Then we can have a repeat of the above
conversation without the added complications of SQL semantics ;)

So just an fyi: exists.


It provides a locking API that plugs into the various backends; there is also a WIP driver being worked on for etcd.


