Public bug reported: Bugzilla reference: https://bugzilla.redhat.com/show_bug.cgi?id=1982981
When a new network and the first subnet are created, the DHCP agent is updated. The agent scheduler increases the DHCP agent register "load" [1] field that will be used to schedule new networks into the same agent. If multiple concurrent networks (and the first subnet) are created, the agent "load" will be modified concurrently. The DB guarantees that only one transaction can increase the agent "load" parameter at once; the other transactions will fail and retried again. E.g.: https://paste.opendev.org/show/807984/ NOTE: when I say network and the first subnet is because that will trigger the spawn of a new dnsmasq process. This is the event that increases +1 the "load" value. Any other new subnet added to this network will modify the dnsmasq config but won't increase the "load" value. As commented in the "BaseResourceFilter.bind" method [2], "the resource being bound might or might not be of the same type which is accounted for the load. It isn't a problem because "+ 1" here does not meant to predict precisely what the load of the agent will be. The value will be corrected by the agent on the next report interval." In other words, when the DHCP agent reports the status, accurately updates the number of resources (networks) that is handling. This bug proposes to catch the DB errors in "BaseResourceFilter.bind" method [2] to avoid the DB retry action. That is unnecessary because the DHCP agent, as commented, will update the "load" value. By avoiding this retry, we avoid unnecessary Neutron server and DB operations and command delays (for example when creating a subnet). [1]https://github.com/openstack/neutron/blob/0ccfed0ae13182f820e6a8c11a2fa801506f3a3a/neutron/db/models/agent.py#L55 [2]https://github.com/openstack/neutron/blob/0ccfed0ae13182f820e6a8c11a2fa801506f3a3a/neutron/scheduler/base_resource_filter.py#L35-L39 ** Affects: neutron Importance: Undecided Assignee: Rodolfo Alonso (rodolfo-alonso-hernandez) Status: New ** Changed in: neutron Assignee: (unassigned) => Rodolfo Alonso (rodolfo-alonso-hernandez) ** Description changed: + Bugzilla reference: https://bugzilla.redhat.com/show_bug.cgi?id=1982981 + When a new network and the first subnet are created, the DHCP agent is updated. The agent scheduler increases the DHCP agent register "load" [1] field that will be used to schedule new networks into the same agent. If multiple concurrent networks (and the first subnet) are created, the agent "load" will be modified concurrently. The DB guarantees that only one transaction can increase the agent "load" parameter at once; the other transactions will fail and retried again. E.g.: https://paste.opendev.org/show/807984/ NOTE: when I say network and the first subnet is because that will trigger the spawn of a new dnsmasq process. This is the event that increases +1 the "load" value. Any other new subnet added to this network will modify the dnsmasq config but won't increase the "load" value. As commented in the "BaseResourceFilter.bind" method [2], "the resource being bound might or might not be of the same type which is accounted for the load. It isn't a problem because "+ 1" here does not meant to predict precisely what the load of the agent will be. The value will be corrected by the agent on the next report interval." In other words, when the DHCP agent reports the status, accurately updates the number of resources (networks) that is handling. This bug proposes to catch the DB errors in "BaseResourceFilter.bind" method [2] to avoid the DB retry action. That is unnecessary because the DHCP agent, as commented, will update the "load" value. By avoiding this retry, we avoid unnecessary Neutron server and DB operations and command delays (for example when creating a subnet). [1]https://github.com/openstack/neutron/blob/0ccfed0ae13182f820e6a8c11a2fa801506f3a3a/neutron/db/models/agent.py#L55 [2]https://github.com/openstack/neutron/blob/0ccfed0ae13182f820e6a8c11a2fa801506f3a3a/neutron/scheduler/base_resource_filter.py#L35-L39 -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to neutron. https://bugs.launchpad.net/bugs/1939432 Title: Concurrent DHCP agent updates can result in a DB lock Status in neutron: New Bug description: Bugzilla reference: https://bugzilla.redhat.com/show_bug.cgi?id=1982981 When a new network and the first subnet are created, the DHCP agent is updated. The agent scheduler increases the DHCP agent register "load" [1] field that will be used to schedule new networks into the same agent. If multiple concurrent networks (and the first subnet) are created, the agent "load" will be modified concurrently. The DB guarantees that only one transaction can increase the agent "load" parameter at once; the other transactions will fail and retried again. E.g.: https://paste.opendev.org/show/807984/ NOTE: when I say network and the first subnet is because that will trigger the spawn of a new dnsmasq process. This is the event that increases +1 the "load" value. Any other new subnet added to this network will modify the dnsmasq config but won't increase the "load" value. As commented in the "BaseResourceFilter.bind" method [2], "the resource being bound might or might not be of the same type which is accounted for the load. It isn't a problem because "+ 1" here does not meant to predict precisely what the load of the agent will be. The value will be corrected by the agent on the next report interval." In other words, when the DHCP agent reports the status, accurately updates the number of resources (networks) that is handling. This bug proposes to catch the DB errors in "BaseResourceFilter.bind" method [2] to avoid the DB retry action. That is unnecessary because the DHCP agent, as commented, will update the "load" value. By avoiding this retry, we avoid unnecessary Neutron server and DB operations and command delays (for example when creating a subnet). [1]https://github.com/openstack/neutron/blob/0ccfed0ae13182f820e6a8c11a2fa801506f3a3a/neutron/db/models/agent.py#L55 [2]https://github.com/openstack/neutron/blob/0ccfed0ae13182f820e6a8c11a2fa801506f3a3a/neutron/scheduler/base_resource_filter.py#L35-L39 To manage notifications about this bug go to: https://bugs.launchpad.net/neutron/+bug/1939432/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : [email protected] Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp

