Reviewed:  https://review.opendev.org/704530
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=647b7f63f9dafedfa9fb6e09e3d92d66fb512f0b
Submitter: Zuul
Branch:    master

commit 647b7f63f9dafedfa9fb6e09e3d92d66fb512f0b
Author: Lucas Alvares Gomes <[email protected]>
Date:   Tue Jan 28 10:46:35 2020 +0000

    [OVN] Add an interval between agents health checks

    This patch adds a minimum interval between agent health checks.
    The way OVN checks agents' liveness is by increasing a value in the
    NB DB and waiting for it to be propagated to the SB DB, but this can
    be costly if done many times too quickly. Therefore, a minimum
    interval between checks is being added.

    Closes-Bug: #1861092
    Change-Id: If1f2d97e3a3a17f6744d546b3e8903bde55e83b9
    Signed-off-by: Lucas Alvares Gomes <[email protected]>

** Changed in: neutron
       Status: In Progress => Fix Released

--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1861092

Title:
  [OVN] Too frequent agent health-checks causes stress on ovsdb-server

Status in neutron:
  Fix Released

Bug description:
  Reported at: https://bugzilla.redhat.com/show_bug.cgi?id=1795198

  It looks like neutron-server is pinging agents too frequently, as
  observed in the logs.
  nb_cfg is being bumped at a non-fixed rate. For example, in this part
  of the log I found 11 updates in less than 2 minutes:

  2020-01-27 12:23:04.247 43567 DEBUG ovsdbapp.backend.ovs_idl.event [-] Matched UPDATE: SbGlobalUpdateEvent(events=('update',), table='SB_Global', conditions=None, old_conditions=None) to row=SB_Global(ipsec=False, ssl=[], nb_cfg=49008, options={'mac_prefix': 'b2:64:0d'}, external_ids={}) old=SB_Global(nb_cfg=49007) matches /usr/lib/python3.6/site-packages/ovsdbapp/backend/ovs_idl/event.py:44
  2020-01-27 12:23:05.179 43567 DEBUG ovsdbapp.backend.ovs_idl.event [-] Matched UPDATE: SbGlobalUpdateEvent(events=('update',), table='SB_Global', conditions=None, old_conditions=None) to row=SB_Global(ipsec=False, ssl=[], nb_cfg=49009, options={'mac_prefix': 'b2:64:0d'}, external_ids={}) old=SB_Global(nb_cfg=49008) matches /usr/lib/python3.6/site-packages/ovsdbapp/backend/ovs_idl/event.py:44
  2020-01-27 12:23:32.216 43567 DEBUG ovsdbapp.backend.ovs_idl.event [-] Matched UPDATE: SbGlobalUpdateEvent(events=('update',), table='SB_Global', conditions=None, old_conditions=None) to row=SB_Global(ipsec=False, ssl=[], nb_cfg=49010, options={'mac_prefix': 'b2:64:0d'}, external_ids={}) old=SB_Global(nb_cfg=49009) matches /usr/lib/python3.6/site-packages/ovsdbapp/backend/ovs_idl/event.py:44
  2020-01-27 12:23:41.248 43567 DEBUG ovsdbapp.backend.ovs_idl.event [-] Matched UPDATE: SbGlobalUpdateEvent(events=('update',), table='SB_Global', conditions=None, old_conditions=None) to row=SB_Global(ipsec=False, ssl=[], nb_cfg=49011, options={'mac_prefix': 'b2:64:0d'}, external_ids={}) old=SB_Global(nb_cfg=49010) matches /usr/lib/python3.6/site-packages/ovsdbapp/backend/ovs_idl/event.py:44
  2020-01-27 12:23:42.183 43567 DEBUG ovsdbapp.backend.ovs_idl.event [-] Matched UPDATE: SbGlobalUpdateEvent(events=('update',), table='SB_Global', conditions=None, old_conditions=None) to row=SB_Global(ipsec=False, ssl=[], nb_cfg=49012, options={'mac_prefix': 'b2:64:0d'}, external_ids={}) old=SB_Global(nb_cfg=49011) matches /usr/lib/python3.6/site-packages/ovsdbapp/backend/ovs_idl/event.py:44
  2020-01-27 12:24:09.210 43567 DEBUG ovsdbapp.backend.ovs_idl.event [-] Matched UPDATE: SbGlobalUpdateEvent(events=('update',), table='SB_Global', conditions=None, old_conditions=None) to row=SB_Global(ipsec=False, ssl=[], nb_cfg=49013, options={'mac_prefix': 'b2:64:0d'}, external_ids={}) old=SB_Global(nb_cfg=49012) matches /usr/lib/python3.6/site-packages/ovsdbapp/backend/ovs_idl/event.py:44
  2020-01-27 12:24:18.252 43567 DEBUG ovsdbapp.backend.ovs_idl.event [-] Matched UPDATE: SbGlobalUpdateEvent(events=('update',), table='SB_Global', conditions=None, old_conditions=None) to row=SB_Global(ipsec=False, ssl=[], nb_cfg=49014, options={'mac_prefix': 'b2:64:0d'}, external_ids={}) old=SB_Global(nb_cfg=49013) matches /usr/lib/python3.6/site-packages/ovsdbapp/backend/ovs_idl/event.py:44
  2020-01-27 12:24:19.179 43567 DEBUG ovsdbapp.backend.ovs_idl.event [-] Matched UPDATE: SbGlobalUpdateEvent(events=('update',), table='SB_Global', conditions=None, old_conditions=None) to row=SB_Global(ipsec=False, ssl=[], nb_cfg=49015, options={'mac_prefix': 'b2:64:0d'}, external_ids={}) old=SB_Global(nb_cfg=49014) matches /usr/lib/python3.6/site-packages/ovsdbapp/backend/ovs_idl/event.py:44
  2020-01-27 12:24:46.205 43567 DEBUG ovsdbapp.backend.ovs_idl.event [-] Matched UPDATE: SbGlobalUpdateEvent(events=('update',), table='SB_Global', conditions=None, old_conditions=None) to row=SB_Global(ipsec=False, ssl=[], nb_cfg=49016, options={'mac_prefix': 'b2:64:0d'}, external_ids={}) old=SB_Global(nb_cfg=49015) matches /usr/lib/python3.6/site-packages/ovsdbapp/backend/ovs_idl/event.py:44
  2020-01-27 12:24:55.254 43567 DEBUG ovsdbapp.backend.ovs_idl.event [-] Matched UPDATE: SbGlobalUpdateEvent(events=('update',), table='SB_Global', conditions=None, old_conditions=None) to row=SB_Global(ipsec=False, ssl=[], nb_cfg=49017, options={'mac_prefix': 'b2:64:0d'}, external_ids={}) old=SB_Global(nb_cfg=49016) matches /usr/lib/python3.6/site-packages/ovsdbapp/backend/ovs_idl/event.py:44
  2020-01-27 12:24:56.177 43567 DEBUG ovsdbapp.backend.ovs_idl.event [-] Matched UPDATE: SbGlobalUpdateEvent(events=('update',), table='SB_Global', conditions=None, old_conditions=None) to row=SB_Global(ipsec=False, ssl=[], nb_cfg=49018, options={'mac_prefix': 'b2:64:0d'}, external_ids={}) old=SB_Global(nb_cfg=49017) matches /usr/lib/python3.6/site-packages/ovsdbapp/backend/ovs_idl/event.py:44

  This triggers too-frequent writes from *all* metadata agents and
  ovn-controllers in the cloud, which creates a lot of traffic. At
  scale, this can be a problem. Imagine a 500-node deployment with one
  update every 10 seconds, as in the example above. That translates
  into 1K write transactions (1 metadata agent + 1 ovn-controller per
  node) into the SB database every 10 seconds, i.e. 100 transactions
  per second, each of which triggers a JSON-RPC update notification to
  every single client in the cloud.
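To make the load estimate in the bug description concrete, and to illustrate how a minimum interval between health checks bounds it, here is a minimal Python sketch. It is not taken from the neutron patch: `sb_writes_per_second` and `MinIntervalGate` are hypothetical names, and the "2 writers per node" default is just the bug report's assumption (1 metadata agent + 1 ovn-controller per node).

```python
import time


def sb_writes_per_second(nodes, writers_per_node=2, check_interval_s=10.0):
    """Estimate SB DB write transactions per second caused by liveness checks.

    Hypothetical helper: every nb_cfg bump makes each writer on each node
    (by default 1 metadata agent + 1 ovn-controller) write back once.
    """
    writes_per_check = nodes * writers_per_node
    return writes_per_check / check_interval_s


class MinIntervalGate:
    """Hypothetical throttle: permit a health check (an nb_cfg bump) only if
    at least ``min_interval`` seconds have passed since the last permitted one.
    """

    def __init__(self, min_interval, clock=time.monotonic):
        self.min_interval = min_interval
        self._clock = clock  # injectable so the behaviour can be tested
        self._last = None    # time of the last permitted check

    def try_acquire(self):
        now = self._clock()
        if self._last is not None and now - self._last < self.min_interval:
            return False  # too soon: the caller should skip the nb_cfg bump
        self._last = now
        return True


if __name__ == "__main__":
    # Figures from the bug report: 500 nodes, one bump every 10 seconds.
    print(sb_writes_per_second(500))  # 100.0

    # Simulated clock: check requests at t=0, 1, 5, 10, 11 and 20 seconds.
    ticks = iter([0, 1, 5, 10, 11, 20])
    gate = MinIntervalGate(10, clock=lambda: next(ticks))
    print([gate.try_acquire() for _ in range(6)])
    # [True, False, False, True, False, True]
```

With such a gate in front of the nb_cfg bump, the SB write rate is bounded by nodes × writers / min_interval no matter how often a health check is requested. For instance, a hypothetical 60-second minimum would cap the 500-node example at roughly 16.7 transactions per second.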

