On 12/11/20 9:54 PM, Ilya Maximets wrote:
> Currently, ovsdb-server stores complete value for the column in a database
> file and in a raft log in case this column changed.  This means that
> transaction that adds, for example, one new acl to a port group creates
> a log entry with all UUIDs of all existing acls + one new.  Same for
> ports in logical switches and routers and more other columns with sets
> in Northbound DB.
>
> There could be thousands of acls in one port group or thousands of ports
> in a single logical switch.  And the typical use case is to add one new
> if we're starting a new service/VM/container or adding one new node in a
> kubernetes or OpenStack cluster.  This generates huge amount of traffic
> within ovsdb raft cluster, grows overall memory consumption and hurts
> performance since all these UUIDs are parsed and formatted to/from json
> several times and stored on disks.  And more values we have in a set -
> more space a single log entry will occupy and more time it will take to
> process by ovsdb-server cluster members.
>
> Simple test:
>
> 1. Start OVN sandbox with clustered DBs:
>    # make sandbox SANDBOXFLAGS='--nbdb-model=clustered --sbdb-model=clustered'
>
> 2. Run a script that creates one port group and adds 4000 acls into it:
>    # cat ../memory-test.sh
>    pg_name=my_port_group
>    export OVN_NB_DAEMON=$(ovn-nbctl --pidfile --detach --log-file -vsocket_util:off)
>    ovn-nbctl pg-add $pg_name
>    for i in $(seq 1 4000); do
>      echo "Iteration: $i"
>      ovn-nbctl --log acl-add $pg_name from-lport $i udp drop
>    done
>    ovn-nbctl acl-del $pg_name
>    ovn-nbctl pg-del $pg_name
>    ovs-appctl -t $(pwd)/sandbox/nb1 memory/show
>    ovn-appctl -t ovn-nbctl exit
>    ---
>
> 4. Check the current memory consumption of ovsdb-server processes and
>    space occupied by database files:
>    # ls sandbox/[ns]b*.db -alh
>    # ps -eo vsz,rss,comm,cmd | egrep '=[ns]b[123].pid'
>
> Test results with current ovsdb log format:
>
>   On-disk Nb DB size     :  ~369 MB
>   RSS of Nb ovsdb-servers:  ~2.7 GB
>   Time to finish the test:  ~2m
>
> In order to mitigate memory consumption issues and reduce computational
> load on ovsdb-servers let's store diff between old and new values
> instead.  This will make size of each log entry that adds single acl to
> port group (or port to logical switch or anything else like that) very
> small and independent from the number of already existing acls (ports,
> etc.).
>
> Added a new marker '_is_diff' into a file transaction to specify that
> this transaction contains diffs instead of replacements for the existing
> data.
>
> One side effect is that this change will actually increase the size of
> file transaction that removes more than a half of entries from the set,
> because diff will be larger than the resulted new value.  However, such
> operations are rare.
>
> Test results with change applied:
>
>   On-disk Nb DB size     :  ~2.7 MB  ---> reduced by 99%
>   RSS of Nb ovsdb-servers:  ~580 MB  ---> reduced by 78%
>   Time to finish the test:  ~1m27s   ---> reduced by 27%
>
> After this change new ovsdb-server is still able to read old databases,
> but old ovsdb-server will not be able to read new ones.
> Since new servers could join ovsdb cluster dynamically it's hard to
> implement any runtime mechanism to handle cases where different
> versions of ovsdb-server joins the cluster.  However we still need to
> handle cluster upgrades.  For this case added special command line
> argument to disable new functionality.  Documentation updated with the
> recommended way to upgrade the ovsdb cluster.
>
> Signed-off-by: Ilya Maximets <[email protected]>
> ---
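
For anyone following along, the size argument above can be sketched roughly as follows. This is only an illustration, not the code from the patch: it assumes the diff of a set column is the symmetric difference of the old and new values (the way set diffs are described for OVSDB "update2" notifications), and the helper names set_diff, apply_diff and entry_size are hypothetical.

```python
import json
import uuid


def set_diff(old, new):
    """Symmetric difference: elements present in exactly one of the two sets."""
    return old ^ new


def apply_diff(old, diff):
    """Applying the same symmetric difference to the old value gives the new one."""
    return old ^ diff


def entry_size(values):
    """Bytes needed to JSON-encode a set of UUIDs, OVSDB-style (illustrative only)."""
    return len(json.dumps(["set", [["uuid", v] for v in sorted(values)]]))


# A port group that already references 4000 ACLs.
acls = {str(uuid.uuid4()) for _ in range(4000)}

# Case 1: add one ACL. The diff holds a single UUID regardless of the set size.
new_acls = acls | {str(uuid.uuid4())}
diff = set_diff(acls, new_acls)
assert apply_diff(acls, diff) == new_acls
full_size, diff_size = entry_size(new_acls), entry_size(diff)

# Case 2: remove 3000 of the 4000 ACLs (more than half of the set).
# Here the diff (3000 UUIDs) is larger than the resulting value (1000 UUIDs).
kept = set(sorted(acls)[:1000])
big_diff = set_diff(acls, kept)
assert apply_diff(acls, big_diff) == kept
kept_size, big_diff_size = entry_size(kept), entry_size(big_diff)

print("add one ACL:  full value", full_size, "bytes, diff", diff_size, "bytes")
print("remove 3000:  new value", kept_size, "bytes, diff", big_diff_size, "bytes")
```

The first case corresponds to the common "add one acl" transaction, where the entry size no longer depends on the number of existing acls. The second corresponds to the side effect mentioned in the commit message, where removing more than half of the entries makes the diff larger than the new value.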

Hi Ilya,

As mentioned by Ben during the IRC meeting earlier, regarding
upgrade/downgrade, a more dynamic approach of having cluster members
agree on the database format would be ideal.

Nevertheless, I'm not knowledgeable enough to suggest an alternative
that would work like that and I think the upgrade recommendation you
added in the documentation is fine.

FWIW, the rest of the code looks good to me:

Acked-by: Dumitru Ceara <[email protected]>

Regards,
Dumitru
