Reviewed:  https://review.opendev.org/c/openstack/neutron/+/858542
Committed: https://opendev.org/openstack/neutron/commit/819a1bb3e6f3b10a1887e2ef836c138e02f8b996
Submitter: "Zuul (22348)"
Branch:    master

commit 819a1bb3e6f3b10a1887e2ef836c138e02f8b996
Author: Rodolfo Alonso Hernandez <[email protected]>
Date:   Tue Sep 20 13:32:04 2022 +0200

    Move the "ovn_hash_ring" clean up to maintenance worker

    The "ovn_hash_ring" procedure to clean up the stale/old registers is
    now executed in the ``HashRingHealthCheckPeriodics`` class, which runs
    in the ``MaintenanceWorker`` process.

    In an HA scenario, if several servers are rebooted at the same time,
    the "ovn_hash_ring" clean up operation can clash with the API worker
    method "_load_hash_ring", which executes a SQL read from this table.
    In some highly loaded environments, if the OVN database takes time to
    be locally cached, this read operation is executed thousands of times;
    basically any time an OVN database event occurs.

    In order to avoid a deadlock when cleaning up the "ovn_hash_ring"
    table, this clean up is executed in a periodic task. If this task
    succeeds, the task is stopped. If the task raises a database
    exception, it is processed again.

    Now the "ovn_hash_ring" registers are retrieved using the "created_at"
    time as a filter. The initial time is taken when the OVN mechanism
    driver is initialized, before any API worker is spawned and any new
    "ovn_hash_ring" register has been created (an API worker, when
    started, will create a new "ovn_hash_ring" register). Any stale/old
    register stored in this table, that is, any register created before
    the OVN mechanism driver was started, will be ignored.

    Closes-Bug: #1990174
    Change-Id: I07c4cb6e20b8a84e4ace7a8e34555aced5b5da9f

** Changed in: neutron
       Status: In Progress => Fix Released

https://bugs.launchpad.net/bugs/1990174

Title:
  [OVN] Deadlock when starting neutron server, during the OVN hash ring
  deletion

Status in neutron:
  Fix Released

Bug description:
  Related bugzilla:
  https://bugzilla.redhat.com/show_bug.cgi?id=2125842

  Description of problem:
  Neutron server often fails to start and systemd needs to restart it.
  This is a problem at scale because all workers need to reconnect
  again to the OVN DBs.

  How reproducible:
  50%

  Steps to Reproduce:
  1. Start neutron server

  Error log:
  https://paste.opendev.org/show/bm3jZZ1oWX7ihK8JXzdE/
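
For illustration only, the behaviour described in the commit message (a periodic clean up that is rescheduled on a database exception and stopped on success, plus a "created_at" filter that ignores registers older than the driver start time) might be sketched as below. This is not the neutron code: TransientDBError, live_registers, run_until_success and make_cleanup are hypothetical names, the table is a plain in-memory list, and the loop does not wait between runs as a real periodic scheduler would.

```python
import datetime as dt


class TransientDBError(Exception):
    """Hypothetical stand-in for a database deadlock/retry error."""


def live_registers(rows, started_at):
    """Return only registers created at or after the driver start time."""
    return [row for row in rows if row["created_at"] >= started_at]


def run_until_success(task, max_runs):
    """Call task once per simulated period until it succeeds.

    Returns the number of runs used. A database error leaves the task
    scheduled for the next period; success stops it.
    """
    for run in range(1, max_runs + 1):
        try:
            task()
        except TransientDBError:
            continue  # processed again on the next period
        return run  # success: the task is stopped
    raise RuntimeError("clean up did not succeed within max_runs")


def make_cleanup(rows, started_at, failures):
    """Build a clean up task that fails `failures` times, then succeeds."""
    remaining = {"failures": failures}

    def cleanup():
        if remaining["failures"] > 0:
            remaining["failures"] -= 1
            raise TransientDBError("simulated deadlock")
        # Drop every register created before the driver was started.
        rows[:] = live_registers(rows, started_at)

    return cleanup


# Assumed sample data: one stale register and one created after start-up.
start = dt.datetime(2022, 9, 20, 13, 0, 0)
table = [
    {"node": "old", "created_at": start - dt.timedelta(hours=1)},
    {"node": "new", "created_at": start + dt.timedelta(seconds=5)},
]
runs = run_until_success(make_cleanup(table, start, failures=2), max_runs=5)
```

In this sketch the two simulated deadlocks only delay the clean up; readers that apply live_registers never see the stale register, whether or not the clean up has run yet.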

