On 12/11/20 9:54 PM, Ilya Maximets wrote:
> Currently, ovsdb-server stores the complete value of a column in the
> database file and in the raft log whenever that column changes.  This
> means that a transaction that adds, for example, one new acl to a port
> group creates a log entry with the UUIDs of all existing acls plus the
> new one.  The same holds for ports in logical switches and routers and
> for many other columns with sets in the Northbound DB.
> 
> There could be thousands of acls in one port group or thousands of ports
> in a single logical switch.  The typical use case is to add one new entry
> when starting a new service/VM/container or adding one new node to a
> Kubernetes or OpenStack cluster.  This generates a huge amount of traffic
> within the ovsdb raft cluster, grows overall memory consumption and hurts
> performance, since all these UUIDs are parsed from and formatted to JSON
> several times and stored on disk.  The more values a set holds, the more
> space a single log entry occupies and the longer it takes ovsdb-server
> cluster members to process it.
> 
> Simple test:
> 
> 1. Start OVN sandbox with clustered DBs:
>    # make sandbox SANDBOXFLAGS='--nbdb-model=clustered --sbdb-model=clustered'
> 
> 2. Run a script that creates one port group and adds 4000 acls into it:
>    # cat ../memory-test.sh
>    pg_name=my_port_group
>    export OVN_NB_DAEMON=$(ovn-nbctl --pidfile --detach --log-file -vsocket_util:off)
>    ovn-nbctl pg-add $pg_name
>    for i in $(seq 1 4000); do
>      echo "Iteration: $i"
>      ovn-nbctl --log acl-add $pg_name from-lport $i udp drop
>    done
>    ovn-nbctl acl-del $pg_name
>    ovn-nbctl pg-del $pg_name
>    ovs-appctl -t $(pwd)/sandbox/nb1 memory/show
>    ovn-appctl -t ovn-nbctl exit
>    ---
> 
> 4. Check the current memory consumption of ovsdb-server processes and
>    space occupied by database files:
>    # ls sandbox/[ns]b*.db -alh
>    # ps -eo vsz,rss,comm,cmd | egrep '=[ns]b[123].pid'
> 
> Test results with current ovsdb log format:
> 
>    On-disk Nb DB size     :  ~369 MB
>    RSS of Nb ovsdb-servers:  ~2.7 GB
>    Time to finish the test:  ~2m
> 
> In order to mitigate memory consumption issues and reduce computational
> load on ovsdb-servers, let's store the diff between old and new values
> instead.  This makes the size of each log entry that adds a single acl to
> a port group (or a port to a logical switch, or anything else like that)
> very small and independent of the number of already existing acls (ports,
> etc.).
> 
> Added a new marker '_is_diff' into a file transaction to specify that
> this transaction contains diffs instead of replacements for the existing
> data.
> 
> One side effect is that this change will actually increase the size of a
> file transaction that removes more than half of the entries from a set,
> because the diff will be larger than the resulting new value.  However,
> such operations are rare.
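
To illustrate the trade-off described above, here is a minimal Python
sketch.  It assumes that the diff for a set column can be modeled as the
symmetric difference of the old and new values; that model, and the names
column_diff and apply_diff, are hypothetical and not taken from the patch.

```python
# Sketch only: models a set-column diff as a symmetric difference.
# This is an assumption for illustration, not the patch's actual code.

def column_diff(old: set, new: set) -> set:
    """Return the elements present in exactly one of old and new."""
    return old ^ new


def apply_diff(old: set, diff: set) -> set:
    """Rebuild the new value from the old value and a stored diff."""
    return old ^ diff


# A port group that already holds 4000 acls (placeholder names, not UUIDs).
existing = {f"acl-{i}" for i in range(4000)}

# Adding one acl: the diff holds 1 element instead of 4001.
added = existing | {"acl-4000"}
diff_add = column_diff(existing, added)
assert apply_diff(existing, diff_add) == added
print(len(diff_add), "entries in diff vs", len(added), "in full value")

# Removing more than half of the entries: the diff (3000 elements) is
# larger than the resulting value (1000 elements).
trimmed = {f"acl-{i}" for i in range(1000)}
diff_del = column_diff(existing, trimmed)
assert apply_diff(existing, diff_del) == trimmed
print(len(diff_del), "entries in diff vs", len(trimmed), "in full value")
```

Under this model a single addition costs one element regardless of the set
size, while a bulk removal of more than half of the set costs more than
storing the new value outright.
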
> 
> Test results with change applied:
> 
>    On-disk Nb DB size     :  ~2.7 MB  ---> reduced by 99%
>    RSS of Nb ovsdb-servers:  ~580 MB  ---> reduced by 78%
>    Time to finish the test:  ~1m27s   ---> reduced by 27%
> 
> After this change a new ovsdb-server is still able to read old databases,
> but an old ovsdb-server will not be able to read new ones.
> Since new servers could join an ovsdb cluster dynamically, it's hard to
> implement any runtime mechanism to handle cases where different
> versions of ovsdb-server join the cluster.  However, we still need to
> handle cluster upgrades.  For this case, a special command line
> argument was added to disable the new functionality.  The documentation
> is updated with the recommended way to upgrade the ovsdb cluster.
> 
> Signed-off-by: Ilya Maximets <[email protected]>
> ---

Hi Ilya,

As mentioned by Ben during the IRC meeting earlier, regarding
upgrade/downgrade, a more dynamic approach of having cluster members
agree on the database format would be ideal.

Nevertheless, I'm not knowledgeable enough to suggest an alternative
that would work like that and I think the upgrade recommendation you
added in the documentation is fine.

FWIW, the rest of the code looks good to me:

Acked-by: Dumitru Ceara <[email protected]>

Regards,
Dumitru
