On 5/1/21 2:55 AM, Ilya Maximets wrote:
> Replication can be used to scale out read-only access to the database.
> But there are clients that are not read-only, but read-mostly.
> One of the main examples is ovn-controller that mostly monitors
> updates from the Southbound DB, but needs to claim ports by sending
> transactions that change some database tables.
>
> Southbound database serves lots of connections: all connections
> from ovn-controllers and some service connections from cloud
> infrastructure, e.g. some OpenStack agents are monitoring updates.
> At a high scale and with a big size of the database ovsdb-server
> spends too much time processing monitor updates and it's required
> to move this load somewhere else.  This patch-set aims to introduce
> required functionality to scale out read-mostly connections by
> replication.
>
> Replication mode natively supports replication of standalone and
> clustered databases, so it will work for any type of OVN deployment.
>
> There are 3 missing parts for existing replication mode:
>
> 1. Ability to handle transactions that aim to modify the data.
>    Obviously, replica is not allowed to execute this kind of
>    transactions.  Solution is to implement transaction forwarding,
>    i.e. allow replication server to act as a proxy by forwarding
>    transactions to the primary server and forwarding replies back
>    to the client.  All read-only transactions and monitors are
>    still fully served by the replica itself.
>
> 2. In case where replica replicates a member of a raft cluster,
>    client needs to know the state of this cluster member in order
>    to make a decision about re-connection to another server.
>    This is solved by replicating a Database table of _Server database
>    from the replication source, so clients are able to check the
>    clustered database state as usual.
>
>    ** Another solution for this problem is to allow the replication
>    server itself to have multiple remotes and re-connect as client
>    will do.  However, this would be a significant behavioral change
>    for the current implementation of the active-backup schema where
>    backup stays connected no matter what.  This will also require
>    a huge rewrite of the replication state machine and will likely
>    bring lots of code duplication with ovsdb-cs module.  We might
>    end up re-writing replication code on top of ovsdb-cs (which
>    might be a good thing, though) and refactoring ovsdb-cs itself,
>    but that would be much more work.
>
> 3. Client will need to know if replica is currently connected
>    to the replication source.  For example, for the case where one
>    of the replicas lost connection with the primary server, client
>    should be able to re-connect to another replica.
>    This is implemented by reflecting the connection state in the
>    'connected' field of the row in Database table in _Server database.
>    Currently for active-backup it's always set to 'true'.
>
> This patch set consists of 4 parts:
>
>   Patch #1 - Implementation of a transaction forwarding.  Fully
>              independent from the rest of the series and it's the only
>              mandatory change for a 2-tier deployment.  The rest of the
>              set is to propagate status fields and have correct failover
>              on a client side.
>
>   Patches #2-5 - Solution for the missing part #2: Replication of a
>                  _Server database and handling on a client side.
>
>   Patch #6 - Solution for the problem #3.
>
>   Patch #7 - Slightly unrelated fix.  Bringing one missing re-connection
>              fix from C version to python IDL.  Mostly to add more
>              tests.
>
> Note: in order to replicate a clustered Sb DB, ephemeral columns from
> the ovn-sb schema should be manually converted to persistent ones before
> creating a database file for the replica, otherwise there will be schema
> mismatch and replication will fail.
>
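
As a rough illustration of the three missing parts listed above, here is a minimal Python sketch. It is not code from the patch set, and all function names are hypothetical. It assumes a transaction is a list of RFC 7047 operation objects, that "insert", "update", "mutate" and "delete" are the operations that modify data, and that the replicated row of the Database table in _Server is available as a dict with "model", "connected" and "leader" keys. The real ovsdb-server may decide what to forward differently.

```python
# Hypothetical sketch only; none of these helpers exist in OVS.

# RFC 7047 operations that modify data (assumption: all other
# operations, e.g. "select" and "wait", are treated as read-only).
WRITE_OPS = {"insert", "update", "mutate", "delete"}


def is_read_only(operations):
    """Return True if no operation in a transaction modifies data."""
    return all(op.get("op") not in WRITE_OPS for op in operations)


def route_transaction(operations):
    """Part 1: serve read-only transactions on the replica itself and
    forward everything else to the primary server."""
    return "local" if is_read_only(operations) else "forward"


def client_should_reconnect(db_row, leader_only=False):
    """Parts 2 and 3: decide from the replicated _Server 'Database' row
    whether a client should try another server."""
    # Part 3: the replica lost its connection to the replication source.
    if not db_row.get("connected", False):
        return True
    # Part 2: the replication source is a raft member that is not the
    # leader, and this client insists on following the leader.
    if leader_only and db_row.get("model") == "clustered":
        return not db_row.get("leader", False)
    return False


if __name__ == "__main__":
    read_txn = [{"op": "select", "table": "Port_Binding", "where": []}]
    write_txn = read_txn + [{"op": "update", "table": "Port_Binding",
                             "where": [], "row": {}}]
    print(route_transaction(read_txn))
    print(route_transaction(write_txn))
    print(client_should_reconnect({"model": "clustered",
                                   "connected": False,
                                   "leader": True}))
```

The point of the sketch is that the client-side check stays the same whether the client talks to a cluster member directly or to a replica of one, which is why replicating the _Server row is enough for failover.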
Hi Ilya,

I had a look at the series and the changes look good to me and I acked
most of the patches.  However, I don't feel confident enough on the
ovsdb-server side, so I hope other reviewers will share their opinions
on this feature before it's accepted.

Regards,
Dumitru
_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev
