Public bug reported:

When the DHCP agent reloads after port assignments are updated, it compares the existing dnsmasq leases against the valid host entries to determine whether any existing lease is invalid (no longer defined) and should be expired.
As currently written, the neutron code performs this comparison using the tuple of MAC, IP and client_id. By default, clients build the client_id (option 61) that they send in DHCPDISCOVER/DHCPREQUEST messages by concatenating their hardware type (0x01 for Ethernet in this case) with their MAC address:

https://www.rfc-editor.org/rfc/rfc2132.html#section-9.14

By default, dnsmasq uses this client_id internally and also writes it to the leases file. When the neutron DHCP agent compares the host and lease data, the lease entry contains a client_id but the host entry does not (its value is None), so neutron concludes that the lease should be expired. As such, neutron issues a command to NAK the lease.

When the client receives this NAK, it must start a new DORA (Discover, Offer, Request, Acknowledge) exchange to reinitialize its DHCP lease. For an OS such as Windows this is very impactful: upon receipt of the NAK, the Windows DHCP client concludes that it is no longer authorized to use the IP and immediately releases it before performing the new DORA exchange. As a result, all connectivity to and from the VM is lost. Active connections are dropped while the exchange is in progress, and once the VM has reacquired a lease for the same IP it had previously, all connections must be re-established.

Example host/leases files:

cat /var/lib/neutron/dhcp/bae80a38-1f4c-4b51-ab4b-1a0df7f79933/leases
  1756993657 fa:16:3e:e9:50:dd 172.16.1.77 test 01:fa:16:3e:e9:50:dd
  1756993640 fa:16:3e:d5:f8:1b 172.16.2.30 test2 01:fa:16:3e:d5:f8:1b

cat /var/lib/neutron/dhcp/bae80a38-1f4c-4b51-ab4b-1a0df7f79933/host
  fa:16:3e:e9:50:dd,set:66e5668b6d354f38bd80ed7c2a2fb9fe,test,172.16.1.77,set:port-5c9388b8-60f2-4e93-8a7a-954acf662bc5
  fa:16:3e:d5:f8:1b,set:66e5668b6d354f38bd80ed7c2a2fb9fe,test2,172.16.2.30,set:port-c23dcf52-1404-4a5a-b7ee-7cd9f7530837

I am conflicted on the best/proper way to resolve this.
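The mismatch described above can be reproduced outside neutron with a small sketch. This is not neutron's actual code: `Entry`, `parse_leases`, `parse_hosts` and `stale_strict` are hypothetical names, and the parsing assumes only the field layout visible in the example files (plus dnsmasq's `id:<client-id>` host field and its `*` placeholder for a lease without a client-id).

```python
import ipaddress
from typing import NamedTuple, Optional, Set


class Entry(NamedTuple):
    """Comparison key from the report: (mac, ip, client_id)."""
    mac: str
    ip: str
    client_id: Optional[str]


# Sample data copied from the example files in this report.
LEASES = """\
1756993657 fa:16:3e:e9:50:dd 172.16.1.77 test 01:fa:16:3e:e9:50:dd
1756993640 fa:16:3e:d5:f8:1b 172.16.2.30 test2 01:fa:16:3e:d5:f8:1b
"""

HOSTS = """\
fa:16:3e:e9:50:dd,set:66e5668b6d354f38bd80ed7c2a2fb9fe,test,172.16.1.77,set:port-5c9388b8-60f2-4e93-8a7a-954acf662bc5
fa:16:3e:d5:f8:1b,set:66e5668b6d354f38bd80ed7c2a2fb9fe,test2,172.16.2.30,set:port-c23dcf52-1404-4a5a-b7ee-7cd9f7530837
"""


def _is_ip(field: str) -> bool:
    """Return True if the field parses as an IPv4/IPv6 address."""
    try:
        ipaddress.ip_address(field.strip("[]"))
        return True
    except ValueError:
        return False


def parse_leases(text: str) -> Set[Entry]:
    """Parse lines of the form: expiry mac ip hostname client-id."""
    entries = set()
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 5:
            continue
        _expiry, mac, ip, _hostname, client_id = fields[:5]
        # dnsmasq writes '*' when it has no client-id for the lease.
        entries.add(Entry(mac, ip, None if client_id == "*" else client_id))
    return entries


def parse_hosts(text: str) -> Set[Entry]:
    """Parse comma-separated host lines; client_id stays None without 'id:'."""
    entries = set()
    for line in text.splitlines():
        fields = line.split(",")
        ip = next((f for f in fields[1:] if _is_ip(f)), None)
        if ip is None:
            continue
        client_id = next((f[3:] for f in fields if f.startswith("id:")), None)
        entries.add(Entry(fields[0], ip, client_id))
    return entries


def stale_strict(leases: Set[Entry], hosts: Set[Entry]) -> Set[Entry]:
    """Leases with no exactly matching host entry (strict tuple comparison)."""
    return leases - hosts


if __name__ == "__main__":
    for entry in sorted(stale_strict(parse_leases(LEASES), parse_hosts(HOSTS))):
        print("would be released:", entry)
```

With the sample data, both leases are reported as stale, because each lease carries a client_id of the form `01:<mac>` while the matching host entry has none. A lease line whose last field is `*` parses to a client_id of None and therefore matches its host entry under the same strict comparison.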
During local testing, if the '--dhcp-ignore-clid' option is passed to dnsmasq so that it ignores the client-provided client-id and instead writes a '*' to the leases file for client-id, the existing neutron code works as expected when parsing the host/leases files. Existing leases are retained when the agent reloads upon port updates.

Given that the current neutron code explicitly checks client-id, bypassing the check in this manner seems undesirable. However, since an individual client can send any value it likes for client-id, the current neutron code does not appear to behave as expected in that case either. It seems that the code compares the host/leases file entries correctly only when extra_dhcp_opts are specified on the port and the client uses the expected client-id.

** Affects: neutron
     Importance: Undecided
         Status: New

https://bugs.launchpad.net/bugs/2122079

Title:
  DHCP agent incorrectly releasing leases due to missing client_id

