Reviewed:  https://review.opendev.org/c/openstack/neutron/+/804218
Committed: https://opendev.org/openstack/neutron/commit/668b1cc652f076e555ef1fc1289684367159186a
Submitter: "Zuul (22348)"
Branch:    master

commit 668b1cc652f076e555ef1fc1289684367159186a
Author: Rodolfo Alonso Hernandez
Date:   Wed Aug 11 09:13:55 2021 +0000

    Do not fail if the agent load is not bumped

    When a new network and its first subnet are created, the DHCP agent
    bumps the "load" parameter to reflect the number of networks handled.
    This "load" parameter is modified when:
    - As commented, when the first subnet of a network is created. The
      "load" value is bumped.
    - When the DHCP agent periodically sends its status, informing about
      the current number of networks handled.

    If this "load" value is not updated during the subnet creation, it
    will be updated in the next periodic update of the agent.

    This "load" value is used by the scheduler to equally distribute the
    objects to be managed by any agent type (DHCP agents manage networks).

    The bug refers to DHCP but is valid for any other agent.

    Change-Id: Ief402048d99d40b64d81fcf58eb2e39b1ba7ebbb
    Closes-Bug: #1939432

** Changed in: neutron
       Status: In Progress => Fix Released

https://bugs.launchpad.net/bugs/1939432

Title:
  Concurrent DHCP agent updates can result in a DB lock

Status in neutron:
  Fix Released

Bug description:
  Bugzilla reference:
  https://bugzilla.redhat.com/show_bug.cgi?id=1982981

  When a new network and its first subnet are created, the DHCP agent is
  updated. The agent scheduler increases the "load" field [1] of the DHCP
  agent register; this field will be used to schedule new networks into
  the same agent.

  If multiple networks (each with its first subnet) are created
  concurrently, the agent "load" will be modified concurrently. The DB
  guarantees that only one transaction can increase the agent "load"
  parameter at once; the other transactions will fail and be retried.
  E.g.: https://paste.opendev.org/show/807984/

  NOTE: I say "a network and its first subnet" because that is what
  triggers the spawn of a new dnsmasq process. This is the event that
  increases the "load" value by one. Any other new subnet added to this
  network will modify the dnsmasq config but won't increase the "load"
  value.

  As commented in the "BaseResourceFilter.bind" method [2], "the resource
  being bound might or might not be of the same type which is accounted
  for the load. It isn't a problem because "+ 1" here does not meant to
  predict precisely what the load of the agent will be. The value will be
  corrected by the agent on the next report interval."

  In other words, when the DHCP agent reports its status, it accurately
  updates the number of resources (networks) that it is handling.

  This bug proposes to catch the DB errors in the
  "BaseResourceFilter.bind" method [2] to avoid the DB retry action. The
  retry is unnecessary because the DHCP agent, as commented, will update
  the "load" value. By avoiding this retry, we avoid unnecessary Neutron
  server and DB operations and command delays (for example when creating
  a subnet).

  [1] https://github.com/openstack/neutron/blob/0ccfed0ae13182f820e6a8c11a2fa801506f3a3a/neutron/db/models/agent.py#L55
  [2] https://github.com/openstack/neutron/blob/0ccfed0ae13182f820e6a8c11a2fa801506f3a3a/neutron/scheduler/base_resource_filter.py#L35-L39
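
The idea described in this bug can be illustrated with a minimal, self-contained sketch. This is not the Neutron patch and does not use the Neutron or oslo.db APIs: DBWriteConflict, FakeAgent, bump_load and report_state are hypothetical names invented for the illustration, and the "DB" is just an in-memory object. It only shows the pattern of treating the "+ 1" as best effort and relying on the periodic report to set the accurate value.

```python
import logging

LOG = logging.getLogger(__name__)


class DBWriteConflict(Exception):
    """Hypothetical stand-in for the DB error raised when two
    transactions try to update the same agent row concurrently."""


class FakeAgent:
    """Hypothetical in-memory stand-in for the agent DB register."""

    def __init__(self, agent_id, load=0):
        self.id = agent_id
        self.load = load


def bump_load(agent, update_fn):
    """Best-effort "+ 1" of the agent load.

    update_fn performs the actual write and may raise DBWriteConflict.
    The error is logged and swallowed instead of being propagated, so
    the caller does not retry the whole operation.
    Returns True if the load was bumped, False if the bump was skipped.
    """
    try:
        update_fn(agent)
        return True
    except DBWriteConflict:
        LOG.debug("Load of agent %s not bumped; the next periodic "
                  "report will correct it", agent.id)
        return False


def report_state(agent, networks_handled):
    """Periodic agent report: overwrite the load with the real count."""
    agent.load = networks_handled


def increment(agent):
    """A write that succeeds."""
    agent.load += 1


def conflicting_increment(agent):
    """A write that loses the race against a concurrent transaction."""
    raise DBWriteConflict()
```

In this sketch, a call such as bump_load(agent, conflicting_increment) leaves the load unchanged and returns False, and a later report_state(agent, networks_handled=2) sets the load to the real number of networks, which mirrors the "corrected by the agent on the next report interval" behaviour quoted above.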

