We've had a couple of brief discussions during the OVN meeting about locks in OVSDB. As I understand it, a few services use OVSDB locks to avoid duplicating work. The question is whether and how to extend OVSDB locks to a distributed context.
First, I think it's worth reviewing how OVSDB locks work, filling in some of the implications that aren't covered by RFC 7047.

OVSDB locks are server-level (not database-level) objects that can be owned by at most one client at a time. Clients can obtain them either through a "lock" operation, in which case they are queued to obtain the lock once no one else owns it, or through a "steal" operation, which always succeeds immediately, kicking out whoever (if anyone) previously owned the lock. A client loses a lock whenever it releases it with an "unlock" operation or whenever its connection to the server drops. The server notifies a client whenever it acquires a lock and whenever a lock it holds is stolen by another client.

This scheme works perfectly for one particular scenario: where the resource protected by the lock is an OVSDB database (or part of one) on the same server as the lock. This is because OVSDB transactions include an "assert" operation that names a lock and aborts the transaction if the client does not hold the lock. Since the server is both the lock manager and the implementer of the transaction, it can always make the correct decision. This scenario could be extended to distributed locks with the same guarantee.

Another scenario that could work acceptably with distributed OVSDB locks is one where the lock guards against duplicated work. For example, suppose a couple of ovn-northd instances both try to grab a lock, with only the winner actually running, to avoid having both of them spend a lot of CPU time recomputing the southbound flow table. A distributed version of OVSDB locks would probably work fine in practice for this, although occasionally, due to network propagation delays, "steal" operations, or disagreement between client and server about when a session has dropped, both ovn-northd instances might think they have the lock.
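To make the single-server guarantee concrete, here is a toy, in-memory Python model of the semantics described above. Everything in it is hypothetical: `LockServer`, its methods, the lock name "ovn_northd", the client names and the "not owner" return value are illustrations, not the ovsdb-server API or the RFC 7047 wire protocol. There is no networking, and after a "steal" the evicted owner is simply dropped (whether a real server re-queues it is not modeled). The point it shows is that the lock table and the "assert" check live in the same server, so a client with a stale view of the lock cannot commit.

```python
from collections import deque


class LockServer:
    """Hypothetical toy model of server-level OVSDB lock semantics.

    Not the ovsdb-server API: no networking, no JSON-RPC, one process.
    Each named lock has at most one owner plus a FIFO queue of waiters.
    Notifications a real server would send are recorded as
    (client, message, lock_name) tuples in self.notifications.
    """

    def __init__(self):
        self.owner = {}          # lock name -> client that owns it
        self.waiters = {}        # lock name -> deque of queued clients
        self.notifications = []  # (client, "locked" | "stolen", lock name)
        self.db = {}             # stand-in for the protected database

    def _grant(self, name, client):
        self.owner[name] = client
        self.notifications.append((client, "locked", name))

    def lock(self, client, name):
        """Take the lock if free, else queue; True if granted right away."""
        if name not in self.owner:
            self._grant(name, client)
            return True
        self.waiters.setdefault(name, deque()).append(client)
        return False

    def steal(self, client, name):
        """Always succeeds immediately, evicting any previous owner."""
        queue = self.waiters.get(name)
        if queue and client in queue:
            queue.remove(client)
        previous = self.owner.get(name)
        if previous is not None and previous != client:
            # Assumption: the evicted owner is dropped, not re-queued.
            self.notifications.append((previous, "stolen", name))
        self._grant(name, client)

    def unlock(self, client, name):
        """Release or withdraw a request; the next waiter (if any) gets it."""
        queue = self.waiters.get(name, deque())
        if client in queue:
            queue.remove(client)
        if self.owner.get(name) == client:
            del self.owner[name]
            if queue:
                self._grant(name, queue.popleft())

    def disconnect(self, client):
        """A dropped connection gives up everything the client held or awaited."""
        for name in set(self.owner) | set(self.waiters):
            self.unlock(client, name)

    def transact(self, client, assert_lock, updates):
        """Commit `updates` only if `client` owns `assert_lock` right now.

        The lock check and the write happen in the same server, like an
        "assert" operation inside an OVSDB transaction, so they cannot disagree.
        """
        if self.owner.get(assert_lock) != client:
            return "not owner"  # transaction aborted, nothing written
        self.db.update(updates)
        return "ok"


server = LockServer()
server.lock("northd-a", "ovn_northd")   # granted immediately
server.lock("northd-b", "ovn_northd")   # queued behind northd-a
server.steal("northd-b", "ovn_northd")  # northd-b evicts northd-a

# northd-a may not have processed its "stolen" notification yet and still
# believes it owns the lock, but its asserted commit is refused.
print(server.transact("northd-a", "ovn_northd", {"flows": "from a"}))  # not owner
print(server.transact("northd-b", "ovn_northd", {"flows": "from b"}))  # ok
print(server.db)  # {'flows': 'from b'}
```

The last few lines correspond to the ovn-northd case: the two instances can briefly disagree about who holds the lock, but only the current owner's asserted commit reaches the database. The sketch says nothing about the distributed case; it only shows why combining the lock manager and the transaction engine in one server is safe.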
(If, however, the ovn-northd instances also used "assert" when committing their changes to the southbound database, they would never actually interfere with each other's database commits.)

A scenario that would not work acceptably with distributed OVSDB locks, without a change to the model, is one where the lock ensures correctness, that is, where bad things happen if two clients both think they hold the lock. I believe that this requires clients to understand a concept of leases, which OVSDB doesn't currently have. The "steal" operation is also problematic in this model, since it would require canceling a lease. (This scenario also does not work acceptably with single-server OVSDB locks.)

I'd appreciate anyone's thoughts on the topic. This webpage is good reading:
https://martin.kleppmann.com/2016/02/08/how-to-do-distributed-locking.html

Thanks,

Ben.
_______________________________________________
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev