Public bug reported:

We have 2 DHCP servers per network. After network outages, when hosts come back online, the number of ACTIVE DHCP servers grows. This happened again after further outages, and some networks now have 9-10 or more DHCP ports, many of them ACTIVE, even though neutron-server's neutron.conf only sets dhcp_agents_per_network = 2.
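One way to quantify the symptom is to count the DHCP ports on each network and compare the total with dhcp_agents_per_network. The following is a minimal sketch, not part of Neutron: it assumes port records are dicts using the Neutron API port field names (network_id, device_owner, device_id), and the helper name networks_over_limit is hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Mapping, Tuple

DHCP_OWNER = "network:dhcp"                # device_owner used for DHCP ports
RESERVED_DEVICE_ID = "reserved_dhcp_port"  # device_id seen on reserved ports (per this report)


def networks_over_limit(
    ports: Iterable[Mapping], dhcp_agents_per_network: int = 2
) -> Dict[str, Tuple[int, int]]:
    """Hypothetical helper: map network_id -> (dhcp_ports, reserved_ports)
    for every network whose DHCP port count exceeds dhcp_agents_per_network."""
    total: Counter = Counter()
    reserved: Counter = Counter()
    for port in ports:
        if port["device_owner"] != DHCP_OWNER:
            continue  # ignore VM, router and other non-DHCP ports
        total[port["network_id"]] += 1
        if port["device_id"] == RESERVED_DEVICE_ID:
            reserved[port["network_id"]] += 1
    # Only report networks that have more DHCP ports than agents configured.
    return {
        net: (count, reserved[net])
        for net, count in total.items()
        if count > dhcp_agents_per_network
    }
```

Fed the port list shown further down, the affected network would report 12 DHCP ports against a limit of 2.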
It turns out these are "reserved_dhcp_port" ports, as indicated by their device_id. As the code here shows (https://github.com/openstack/neutron/blob/master/neutron/db/agentschedulers_db.py#L399), when a network is rescheduled to a new DHCP agent, the old port is neither deleted nor marked DOWN. All that happens is that it is marked as reserved and the port is updated.

However, VMs on the network are still advertised every DHCP port on the network as an internal DNS server, which in our case leaves several stale entries in /etc/resolv.conf. The problem is that some of these DHCP agents have been unscheduled, so those DNS servers don't actually exist. Also, inside the VMs, entries beyond the first 3 are never queried. Here is /etc/resolv.conf on a VM:

[root@arjunpmk-master ~]# vim /etc/resolv.conf
# Generated by NetworkManager
search mpt1.pf9.io
nameserver 10.128.144.16
nameserver 10.128.144.23
nameserver 10.128.144.15
# NOTE: the libc resolver may not support more than 3 nameservers.
# The nameservers listed below may not be recognized.
nameserver 10.128.144.7
nameserver 10.128.144.4
nameserver 10.128.144.8
nameserver 10.128.144.9
nameserver 10.128.144.17
nameserver 10.128.144.12
nameserver 10.128.144.45
nameserver 10.128.144.46
nameserver 10.128.144.51

Here are all the DHCP ports on this VM's network:

[root@df-us-mpt1-kvm arjun(admin)]# openstack port list --network ead88ed3-f1e0-4498-8c1e-6d091083ae33 --device-owner network:dhcp
+--------------------------------------+------+-------------------+------------------------------------------------------------------------------+--------+
| ID                                   | Name | MAC Address       | Fixed IP Addresses                                                           | Status |
+--------------------------------------+------+-------------------+------------------------------------------------------------------------------+--------+
| 02ff0f4c-f39d-4207-90b4-2a69585f4c8a |      | fa:16:3e:a9:36:82 | ip_address='10.128.144.16', subnet_id='9757ae4a-ccfb-49b0-a9cc-53b8664631a6' | ACTIVE |
| 0b612f86-ad06-4bce-a333-bc18f3e9e7b1 |      | fa:16:3e:bb:d8:3d | ip_address='10.128.144.23', subnet_id='9757ae4a-ccfb-49b0-a9cc-53b8664631a6' | DOWN   |
| 402338ac-2ca6-4312-a2df-a306fc589f10 |      | fa:16:3e:a3:a8:57 | ip_address='10.128.144.15', subnet_id='9757ae4a-ccfb-49b0-a9cc-53b8664631a6' | ACTIVE |
| 5d2edc73-4eff-44c0-8993-125636973384 |      | fa:16:3e:6c:cd:2b | ip_address='10.128.144.7', subnet_id='9757ae4a-ccfb-49b0-a9cc-53b8664631a6'  | ACTIVE |
| 78241da3-9674-479a-8b45-a580c7f8b117 |      | fa:16:3e:d0:9d:ef | ip_address='10.128.144.4', subnet_id='9757ae4a-ccfb-49b0-a9cc-53b8664631a6'  | ACTIVE |
| 7b41bf47-d4d4-434a-b704-4c67182ffcaa |      | fa:16:3e:4c:cf:54 | ip_address='10.128.144.8', subnet_id='9757ae4a-ccfb-49b0-a9cc-53b8664631a6'  | ACTIVE |
| 96897190-1aa8-4c17-a7d1-c3744f1bf962 |      | fa:16:3e:e8:55:29 | ip_address='10.128.144.45', subnet_id='9757ae4a-ccfb-49b0-a9cc-53b8664631a6' | ACTIVE |
| af87dde6-fb46-4516-9569-e46496398b64 |      | fa:16:3e:0e:61:14 | ip_address='10.128.144.9', subnet_id='9757ae4a-ccfb-49b0-a9cc-53b8664631a6'  | ACTIVE |
| c2a2112d-c6ef-4411-a415-1a453d74a838 |      | fa:16:3e:d0:39:67 | ip_address='10.128.144.46', subnet_id='9757ae4a-ccfb-49b0-a9cc-53b8664631a6' | DOWN   |
| c8298fbd-06e7-4488-a3e1-874e9341d4cf |      | fa:16:3e:d6:3c:ac | ip_address='10.128.144.51', subnet_id='9757ae4a-ccfb-49b0-a9cc-53b8664631a6' | DOWN   |
| d6f0206f-ae3c-4ebf-95cb-104dad786724 |      | fa:16:3e:ab:ab:22 | ip_address='10.128.144.17', subnet_id='9757ae4a-ccfb-49b0-a9cc-53b8664631a6' | ACTIVE |
| e2be0f98-3333-4645-b58a-435e5513a4d3 |      | fa:16:3e:b4:ba:c0 | ip_address='10.128.144.12', subnet_id='9757ae4a-ccfb-49b0-a9cc-53b8664631a6' | DOWN   |
+--------------------------------------+------+-------------------+------------------------------------------------------------------------------+--------+

The first nameserver in the VM's resolv.conf (10.128.144.16) shows as ACTIVE, but it is actually a reserved port. The same is true of the 2nd nameserver entry. Luckily the 3rd entry is valid, but because the first two fail, every DNS lookup first times out on both of them and takes 10 seconds (two timeouts at the resolver's default of 5 seconds each). VMs on other networks aren't so lucky: all 3 of their nameservers are reserved ports.

Expectation: only DHCP ports that are actually scheduled (not reserved) should be advertised as DNS nameservers. I don't know whether this means marking the port DOWN or deleting it when the network is unscheduled. Maybe the status also needs to be updated here? https://github.com/openstack/neutron/blob/master/neutron/db/agentschedulers_db.py#L417

** Affects: neutron
     Importance: Undecided
         Status: New

** Tags: dns

https://bugs.launchpad.net/bugs/1852504
Title: DHCP reserved ports that were unscheduled are advertised as DNS servers
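The expected behaviour can be sketched as a filter that skips reserved ports when collecting DNS servers, together with a rough model of what the VM's resolver does with the result. This is a hypothetical illustration, not Neutron's implementation: it assumes port dicts with the Neutron API field names (device_owner, device_id, fixed_ips), uses the reserved_dhcp_port device_id from this report as the marker, and the helper names are made up. The resolver model assumes glibc defaults (at most 3 nameservers, 5 second timeout per server, one attempt each) and ignores options such as rotate and attempts.

```python
from typing import Iterable, List, Mapping, Optional, Set

DHCP_OWNER = "network:dhcp"
RESERVED_DEVICE_ID = "reserved_dhcp_port"  # marker described in this bug report
MAXNS = 3              # glibc resolver only uses the first 3 nameservers
DEFAULT_TIMEOUT_S = 5  # glibc default per-nameserver timeout


def advertised_dns_servers(ports: Iterable[Mapping], skip_reserved: bool = True) -> List[str]:
    """Hypothetical helper: collect DHCP port IPs to hand out as DNS servers.

    skip_reserved=False mimics the reported behaviour (every DHCP port is
    advertised); skip_reserved=True applies the expected fix.
    """
    servers: List[str] = []
    for port in ports:
        if port["device_owner"] != DHCP_OWNER:
            continue  # only DHCP ports act as internal DNS servers
        if skip_reserved and port["device_id"] == RESERVED_DEVICE_ID:
            continue  # unscheduled port: nothing is listening behind it
        servers.extend(ip["ip_address"] for ip in port["fixed_ips"])
    return servers


def worst_case_lookup_delay(nameservers: List[str], live: Set[str],
                            timeout_s: int = DEFAULT_TIMEOUT_S) -> Optional[int]:
    """Seconds spent timing out on dead servers before a live one answers.

    Simplified model: servers are tried in order, one attempt each, and only
    the first MAXNS entries count. Returns None if none of them is live.
    """
    delay = 0
    for ns in nameservers[:MAXNS]:
        if ns in live:
            return delay
        delay += timeout_s  # dead server: wait out the full timeout
    return None
```

Filtering on device_id rather than status is deliberate: 10.128.144.16 is a reserved port that still reports ACTIVE, so a status check alone would keep advertising it. With the first three entries of the resolv.conf above (two reserved, one valid), the model reproduces the reported 10 second delay; with the filter applied, the valid server comes first and the delay drops to 0.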

