On 03/08/2018 02:54 PM, Daniel Alvarez Sanchez wrote:
I agree with you Mark. I tried to check how much it would shrink with 1800 ports in the system:

[stack@ovn ovs]$ sudo ovn-nbctl list Logical_Switch_Port | grep uuid | wc -l
1809
[stack@ovn ovs]$ sudo ovn-sbctl list Logical_Flow | grep uuid | wc -l
50780
[stack@ovn ovs]$ ls -alh ovn*.db
-rw-r--r--. 1 stack stack 15M Mar  8 15:56 ovnnb_db.db
-rw-r--r--. 1 stack stack 61M Mar  8 15:56 ovnsb_db.db
[stack@ovn ovs]$ sudo ovs-appctl -t /usr/local/var/run/openvswitch/ovnsb_db.ctl ovsdb-server/compact
[stack@ovn ovs]$ sudo ovs-appctl -t /usr/local/var/run/openvswitch/ovnnb_db.ctl ovsdb-server/compact
[stack@ovn ovs]$ ls -alh ovn*.db
-rw-r--r--. 1 stack stack 5.8M Mar  8 20:45 ovnnb_db.db
-rw-r--r--. 1 stack stack  23M Mar  8 20:45 ovnsb_db.db

As you can see, with ~50K lflows, the minimum size of the SB database would be ~23M, while the NB database is much smaller. Still, I think we need to do something so that the compact task is not delayed this much unnecessarily. Or maybe we want some sort of configuration (i.e. normal, aggressive, ...) for this, since in some situations it may help to have the full log of the DB (although this can be achieved through periodic backups :?). That said, I'm not a big fan of such configs but...
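The numbers above can be turned into a quick back-of-the-envelope calculation. This is a minimal sketch, assuming the compacted file size stands in for the snapshot size and that the 4x growth rule described elsewhere in this thread applies; the function names are made up for illustration and are not part of OVS.

```python
# Sizes in MB, read from the `ls -alh` output above: (before, after) compaction.
SIZES_MB = {"ovnnb_db.db": (15.0, 5.8), "ovnsb_db.db": (61.0, 23.0)}


def shrink_ratio(before_mb: float, after_mb: float) -> float:
    """How many times smaller the file became after a manual compact."""
    return before_mb / after_mb


def next_auto_compact_mb(snapshot_mb: float, growth_factor: float = 4.0) -> float:
    """Size the file must reach before automatic compaction is considered.

    Assumes the rule quoted in this thread: the file must grow to
    `growth_factor` times the size of the previous snapshot.
    """
    return snapshot_mb * growth_factor


if __name__ == "__main__":
    for name, (before, after) in SIZES_MB.items():
        print(f"{name}: shrank {shrink_ratio(before, after):.2f}x, "
              f"next automatic compaction near {next_auto_compact_mb(after):.1f}M")
```

Under those assumptions, the ~23M SB database would have to grow to roughly 92M before being compacted again automatically, and to roughly 46M if the factor were lowered to 2x.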


I'm also not a big fan of that sort of configuration. Based on Ben's replies here, I like the idea of being more aggressive with the compacting. The two ideas proposed here, compact at double the size instead of 4x and ensure a compact happens once every 24 hours, sound like good mitigations to me.



On Thu, Mar 8, 2018 at 9:31 PM, Mark Michelson <[email protected] <mailto:[email protected]>> wrote:

    Most of the data in this thread has been pretty easily explainable
    based on what I've seen in the code compared with the nature of the
    data in the southbound database.

    The southbound database tends to have more data in it than other
    databases in OVS, due especially to the Logical_Flow table. The
    result is that auto shrinking of the database does not shrink it
    down by as much as other databases. You can see in Daniel's graphs
    that each time the southbound database is shrunk, its "base" size
    ends up noticeably larger than it previously was.

    Couple that with the fact that the database has to increase to 4x
    its previous snapshot size in order to be shrunk, and you can end up
    with a situation after a while where the "shrunk" southbound
    database is 750MB, and it won't shrink again until it exceeds 3GB.

    To fix this, I think there are a few things that can be done:

    * Somehow make the southbound database have less data in it. I don't
    have any real good ideas for how to do this, and doing this in a
    backwards-compatible way will be difficult.

    * Ease the requirements for shrinking a database. For instance, once
    the database reaches a certain size, maybe it doesn't need to grow
    by 4x in order to be a candidate for shrinking. Maybe it only needs
    to double in size. Or, there could be some time cutoff where the
    database always will be shrunk. So for instance, every hour, always
    shrink the database, no matter how much activity has occurred in it
    (okay, maybe not if there have been 0 transactions).
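The relaxed shrinking rules in the last bullet can be sketched as a small decision function. This is an illustration only, not the ovsdb-server implementation: `CompactionPolicy` and `should_compact` are hypothetical names, and the only values taken from this thread are the 4x growth factor, the "double in size" alternative, the hourly cutoff, and the 750MB/3GB example.

```python
from dataclasses import dataclass
from typing import Optional

MB = 1024 * 1024


@dataclass
class CompactionPolicy:
    # The log must reach this multiple of the last snapshot size.
    growth_factor: float = 4.0
    # Hypothetical time cutoff: compact anyway once the log is this old.
    max_age_seconds: Optional[float] = None


def should_compact(policy: CompactionPolicy, log_bytes: int, snapshot_bytes: int,
                   age_seconds: float, n_transactions: int) -> bool:
    """Decide whether the database is a candidate for shrinking."""
    if n_transactions == 0:
        # Nothing was written since the last snapshot, so nothing to gain.
        return False
    if policy.max_age_seconds is not None and age_seconds >= policy.max_age_seconds:
        # Time cutoff reached: shrink regardless of how much the log grew.
        return True
    return log_bytes >= policy.growth_factor * snapshot_bytes


if __name__ == "__main__":
    current = CompactionPolicy()                    # 4x rule
    relaxed = CompactionPolicy(growth_factor=2.0)   # "only needs to double"
    hourly = CompactionPolicy(max_age_seconds=3600)

    # A 750MB snapshot whose log has grown to 2000MB.
    print(should_compact(current, 2000 * MB, 750 * MB, 600, 500))
    print(should_compact(relaxed, 2000 * MB, 750 * MB, 600, 500))
    print(should_compact(hourly, 800 * MB, 750 * MB, 3600, 1))
```

With a 750MB snapshot, the 4x rule waits until the log reaches 3000MB, while the 2x variant already triggers at 1500MB; the time cutoff triggers independently of size as long as at least one transaction has occurred.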


Maybe we can just do the shrink if the last compact took place >24h ago, regardless of the other conditions. I can send a patch for this if you guys like the idea. It's some sort of "cleanup task" just in case and seems harmless.
What do you say?



    On 03/07/2018 02:50 PM, Ben Pfaff wrote:

        OK.

        I guess we need to investigate this issue from the basics.

        On Wed, Mar 07, 2018 at 09:02:02PM +0100, Daniel Alvarez Sanchez
        wrote:

            With the OVS 2.8 branch it never shrank when I started to
            delete the ports, since the DB sizes didn't grow, which
            makes sense to me. The conditions weren't met for further
            compaction. See attached image.

            NB:
            2018-03-07T18:25:49.269Z|00009|ovsdb_file|INFO|/opt/stack/data/ovs/ovnnb_db.db: compacting database online (647.317 seconds old, 436 transactions, 10505382 bytes)
            2018-03-07T18:35:51.414Z|00012|ovsdb_file|INFO|/opt/stack/data/ovs/ovnnb_db.db: compacting database online (602.089 seconds old, 431 transactions, 29551917 bytes)
            2018-03-07T18:45:52.263Z|00015|ovsdb_file|INFO|/opt/stack/data/ovs/ovnnb_db.db: compacting database online (600.563 seconds old, 463 transactions, 52843231 bytes)
            2018-03-07T18:55:53.810Z|00016|ovsdb_file|INFO|/opt/stack/data/ovs/ovnnb_db.db: compacting database online (601.128 seconds old, 365 transactions, 57618931 bytes)


            SB:
            2018-03-07T18:33:24.927Z|00009|ovsdb_file|INFO|/opt/stack/data/ovs/ovnsb_db.db: compacting database online (1102.840 seconds old, 775 transactions, 10505486 bytes)
            2018-03-07T18:43:27.569Z|00012|ovsdb_file|INFO|/opt/stack/data/ovs/ovnsb_db.db: compacting database online (602.394 seconds old, 445 transactions, 15293972 bytes)
            2018-03-07T18:53:31.664Z|00015|ovsdb_file|INFO|/opt/stack/data/ovs/ovnsb_db.db: compacting database online (603.605 seconds old, 385 transactions, 19282371 bytes)
            2018-03-07T19:03:42.116Z|00031|ovsdb_file|INFO|/opt/stack/data/ovs/ovnsb_db.db: compacting database online (607.542 seconds old, 371 transactions, 23538784 bytes)
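To compare compaction events like the ones logged above, the age, transaction count, and size can be pulled out of each message. The following is a rough sketch that assumes the message layout is exactly as shown in these traces; `parse_compaction` is a made-up helper, not an OVS tool, and it collapses whitespace first so that entries wrapped by a mail client still parse.

```python
import re
from typing import Optional

# Layout assumed from the traces in this thread:
# <timestamp>|<seq>|<module>|<level>|<path>: compacting database online
# (<age> seconds old, <n> transactions, <size> bytes)
COMPACT_RE = re.compile(
    r"^(?P<timestamp>[^|]+)\|(?P<seq>\d+)\|(?P<module>[^|]+)\|(?P<level>[^|]+)\|"
    r"(?P<path>[^:]+):\s*compacting database online \("
    r"(?P<age>[\d.]+) seconds old,\s*"
    r"(?P<txns>\d+) transactions,\s*"
    r"(?P<bytes>\d+) bytes\)"
)


def parse_compaction(raw: str) -> Optional[dict]:
    """Parse one compaction log entry; return None if it does not match."""
    line = " ".join(raw.split())  # undo line wrapping and extra indentation
    match = COMPACT_RE.match(line)
    if match is None:
        return None
    return {
        "timestamp": match["timestamp"],
        "path": match["path"],
        "age_seconds": float(match["age"]),
        "transactions": int(match["txns"]),
        "bytes": int(match["bytes"]),
    }


if __name__ == "__main__":
    entry = """2018-03-07T18:25:49.269Z|00009|ovsdb_file|INFO|/opt/stack/data/ovs/ovnnb_db.db:
        compacting database online (647.317 seconds old, 436
        transactions, 10505382
        bytes)"""
    print(parse_compaction(entry))
```

Feeding every entry through this helper makes it easy to see, for example, that most compactions in these traces happen roughly 600 seconds after the previous one.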




            On Wed, Mar 7, 2018 at 7:18 PM, Daniel Alvarez Sanchez
            <[email protected] <mailto:[email protected]>>
            wrote:

                No worries, I just triggered the test now running OVS
                compiled out of
                2.8 branch (2.8.3). I'll post the results and
                investigate too.

                I have just sent a patch to fix the timing issue we can
                see in the traces I posted. I applied it and it works. I
                believe it's good to fix, as it gives us an idea of how
                frequent the compact is, and also to backport if you
                agree with it.

                Thanks!

                On Wed, Mar 7, 2018 at 7:13 PM, Ben Pfaff <[email protected]
                <mailto:[email protected]>> wrote:

                    OK, thanks.

                    If this is a lot of trouble, let me know and I'll
                    investigate directly
                    instead of on the basis of a suspected regression.

                    On Wed, Mar 07, 2018 at 07:06:50PM +0100, Daniel
                    Alvarez Sanchez wrote:

                        All right, I'll repeat it with code in branch-2.8.
                        Will post the results once the test finishes.
                        Daniel

                        On Wed, Mar 7, 2018 at 7:03 PM, Ben Pfaff
                        <[email protected] <mailto:[email protected]>> wrote:

                            On Wed, Mar 07, 2018 at 05:53:15PM +0100,
                            Daniel Alvarez Sanchez wrote:

                                Repeated the test with 1000 ports this
                                time. See attached image. For some
                                reason, the sizes grow while deleting
                                the ports (the deletion task starts at
                                around x=2500). The weird thing is that
                                they keep growing and the online compact
                                doesn't work the way it does when I
                                trigger it through the ovs-appctl tool.

                                I suspect this is a bug and eventually
                                it will grow and grow unless we manually
                                compact the db.


                            Would you mind trying out an older
                            ovsdb-server, for example the one
                            from OVS 2.8?  Some of the logic in
                            ovsdb-server around compaction
                            changed in OVS 2.9, so it would be nice to
                            know whether this was a
                            regression or an existing bug.









_______________________________________________
discuss mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss
