Public bug reported:

Issue:

The Neutron DHCP agent bootstraps the DHCP leases file for a network using all associated subnets[1]. In a multi-segment environment, however, a DHCP agent can only service a single segment/subnet of a given network. The DHCP namespace, then, is configured with an interface containing a single IP address for the respective segment/subnet it's servicing. When a VM from the same network but different segment/subnet is deleted, the DHCP release packet that would be issued by dhcp_release isn't sent due to a mismatch between client IP and local addr.

Brian Haley patched dhcp_release.c recently to fix a similar issue here:

http://thekelleys.org.uk/gitweb/?p=dnsmasq.git;a=commitdiff;h=d9f882bea2806799bf3d1f73937f5e72d0bfc650;hp=fef2f1c75eba56b7355cbe729e4362474d558aa4;ds=sidebyside

We can probably update dnsmasq-utils in the short term, but maybe making the DHCP agent segment aware is a better long-term solution?

Here are the steps to reproduce:

-=-=-=-=-

Network: rpn_multisegment
Segment 1: VLAN 106 10.106.0.0/24 Provider Mapping: physnet1:bond1
Segment 2: VLAN 206 10.206.0.0/24 Provider Mapping: physnet2:bond1

Two VMs:

🌕OpenStack Lab % openstack server list
+--------------------------------------+---------+--------+------------------------------+-------------------+----------------+
| ID                                   | Name    | Status | Networks                     | Image             | Flavor         |
+--------------------------------------+---------+--------+------------------------------+-------------------+----------------+
| 40f94b68-7e38-45b6-855d-792399c2a9ff | vm-seg2 | ACTIVE | rpn_multisegment=10.206.0.53 | bionic-osa-master | osa-dev-8-8-60 |
| 34f8ff53-e505-4267-a13a-b881dfcec240 | vm-seg1 | ACTIVE | rpn_multisegment=10.106.0.98 | bionic-osa-master | osa-dev-8-8-60 |
+--------------------------------------+---------+--------+------------------------------+-------------------+----------------+

On compute01, we can see the host file populated with entries for each subnet associated with the network:

root@lab-compute01:~# cat /var/lib/neutron/dhcp/0e4fa560-1483-4ac5-be44-0542503f1e5a/host
fa:16:3e:07:f7:af,host-10-206-0-2.openstacklocal,10.206.0.2
fa:16:3e:2c:da:6d,host-10-106-0-2.openstacklocal,10.106.0.2
fa:16:3e:46:7b:d1,host-10-106-0-98.openstacklocal,10.106.0.98
fa:16:3e:ce:b1:b5,host-10-206-0-53.openstacklocal,10.206.0.53

Same on compute02:

root@lab-compute02:~# cat /var/lib/neutron/dhcp/0e4fa560-1483-4ac5-be44-0542503f1e5a/host
fa:16:3e:07:f7:af,host-10-206-0-2.openstacklocal,10.206.0.2
fa:16:3e:2c:da:6d,host-10-106-0-2.openstacklocal,10.106.0.2
fa:16:3e:46:7b:d1,host-10-106-0-98.openstacklocal,10.106.0.98
fa:16:3e:ce:b1:b5,host-10-206-0-53.openstacklocal,10.206.0.53

The leases file, however, contains only those hosts that have obtained leases (expected):

root@lab-compute01:~# cat /var/lib/neutron/dhcp/0e4fa560-1483-4ac5-be44-0542503f1e5a/leases
1606916842 fa:16:3e:46:7b:d1 10.106.0.98 host-10-106-0-98 ff:b5:5e:67:ff:00:02:00:00:ab:11:9e:a5:86:fd:ae:2f:49:ad
1606916738 fa:16:3e:2c:da:6d 10.106.0.2 host-10-106-0-2 *
1606916738 fa:16:3e:07:f7:af 10.206.0.2 host-10-206-0-2 *

root@lab-compute02:~# cat /var/lib/neutron/dhcp/0e4fa560-1483-4ac5-be44-0542503f1e5a/leases
1606916917 fa:16:3e:ce:b1:b5 10.206.0.53 host-10-206-0-53 ff:b5:5e:67:ff:00:02:00:00:ab:11:9e:a5:86:fd:ae:2f:49:ad
1606916626 fa:16:3e:07:f7:af 10.206.0.2 host-10-206-0-2 *

Everything looks OK so far.

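For anyone who wants to inspect these files programmatically, here is a minimal sketch of a parser for the leases lines shown above (expiry epoch, MAC, IP, hostname, client ID). The `Lease` type, `parse_leases` function, and `SAMPLE` constant are names made up for this illustration; they are not part of Neutron or dnsmasq, and the field layout is assumed from the output in this report.

```python
from typing import List, NamedTuple


class Lease(NamedTuple):
    """One line of a dnsmasq leases file, as seen in the output above."""
    expiry: int      # lease expiry as epoch seconds
    mac: str
    ip: str
    hostname: str
    client_id: str   # "*" when no client ID was recorded


def parse_leases(text: str) -> List[Lease]:
    """Parse leases-file text; lines that do not match the layout are skipped."""
    leases = []
    for line in text.splitlines():
        fields = line.split()
        # Expect at least five whitespace-separated fields, starting with an epoch.
        if len(fields) < 5 or not fields[0].isdigit():
            continue
        leases.append(Lease(int(fields[0]), fields[1], fields[2],
                            fields[3], fields[4]))
    return leases


# Leases file from lab-compute02 before the agent restart (copied from above).
SAMPLE = """\
1606916917 fa:16:3e:ce:b1:b5 10.206.0.53 host-10-206-0-53 ff:b5:5e:67:ff:00:02:00:00:ab:11:9e:a5:86:fd:ae:2f:49:ad
1606916626 fa:16:3e:07:f7:af 10.206.0.2 host-10-206-0-2 *
"""

if __name__ == "__main__":
    for lease in parse_leases(SAMPLE):
        print(lease.ip, lease.mac, lease.hostname)
```

Before the restart, every parsed entry on compute02 belongs to 10.206.0.0/24, which matches the segment that host services.
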
When restarting the neutron-dhcp-agent, however, the leases file is bootstrapped and contains entries for all subnets associated with the network:

root@lab-compute01:~# cat /var/lib/neutron/dhcp/0e4fa560-1483-4ac5-be44-0542503f1e5a/leases
1606917246 fa:16:3e:46:7b:d1 10.106.0.98 host-10-106-0-98 *
1606917246 fa:16:3e:2c:da:6d 10.106.0.2 host-10-106-0-2 *
1606917246 fa:16:3e:ce:b1:b5 10.206.0.53 host-10-206-0-53 *
1606917246 fa:16:3e:07:f7:af 10.206.0.2 host-10-206-0-2 *

root@lab-compute02:~# cat /var/lib/neutron/dhcp/0e4fa560-1483-4ac5-be44-0542503f1e5a/leases
1606917254 fa:16:3e:46:7b:d1 10.106.0.98 host-10-106-0-98 *
1606917254 fa:16:3e:2c:da:6d 10.106.0.2 host-10-106-0-2 *
1606917254 fa:16:3e:ce:b1:b5 10.206.0.53 host-10-206-0-53 *
1606917254 fa:16:3e:07:f7:af 10.206.0.2 host-10-206-0-2 *

This configuration becomes a problem when a VM is deleted and dhcp_release is executed, as the namespaces on each host only have an IP from their respective segment and will not be able to delete a lease for what is essentially a non-connected subnet:

root@lab-compute01:~# ip netns exec qdhcp-0e4fa560-1483-4ac5-be44-0542503f1e5a ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: ns-5ccc6426-59@if102: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether fa:16:3e:2c:da:6d brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 169.254.169.254/16 brd 169.254.255.255 scope global ns-5ccc6426-59
       valid_lft forever preferred_lft forever
    inet 10.106.0.2/24 brd 10.106.0.255 scope global ns-5ccc6426-59
       valid_lft forever preferred_lft forever
    inet6 fe80::f816:3eff:fe2c:da6d/64 scope link
       valid_lft forever preferred_lft forever

root@lab-compute02:~# ip netns exec qdhcp-0e4fa560-1483-4ac5-be44-0542503f1e5a ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: ns-0c51acd3-60@if85: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether fa:16:3e:07:f7:af brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 10.206.0.2/24 brd 10.206.0.255 scope global ns-0c51acd3-60
       valid_lft forever preferred_lft forever
    inet 169.254.169.254/16 brd 169.254.255.255 scope global ns-0c51acd3-60
       valid_lft forever preferred_lft forever
    inet6 fe80::f816:3eff:fe07:f7af/64 scope link
       valid_lft forever preferred_lft forever

Example:

🌕OpenStack Lab % openstack server delete vm-seg1

lab-compute01:

Dec 01 13:58:12 lab-compute01 dnsmasq-dhcp[56028]: DHCPRELEASE(ns-5ccc6426-59) 10.106.0.98 fa:16:3e:46:7b:d1
Dec 01 13:58:13 lab-compute01 dnsmasq[56028]: read /var/lib/neutron/dhcp/0e4fa560-1483-4ac5-be44-0542503f1e5a/addn_hosts - 3 addresses
Dec 01 13:58:13 lab-compute01 dnsmasq-dhcp[56028]: read /var/lib/neutron/dhcp/0e4fa560-1483-4ac5-be44-0542503f1e5a/host
Dec 01 13:58:13 lab-compute01 dnsmasq-dhcp[56028]: read /var/lib/neutron/dhcp/0e4fa560-1483-4ac5-be44-0542503f1e5a/opts

root@lab-compute01:~# cat /var/lib/neutron/dhcp/0e4fa560-1483-4ac5-be44-0542503f1e5a/leases
1606917246 fa:16:3e:2c:da:6d 10.106.0.2 host-10-106-0-2 *
1606917246 fa:16:3e:ce:b1:b5 10.206.0.53 host-10-206-0-53 *
1606917246 fa:16:3e:07:f7:af 10.206.0.2 host-10-206-0-2 *

lab-compute02:

Dec 01 13:58:13 lab-compute02 neutron-dhcp-agent[48564]: 2020-12-01 13:58:13.946 48564 WARNING neutron.agent.linux.dhcp [-] Could not release DHCP leases for these IP addresses after 3 tries: 10.106.0.98
Dec 01 13:58:14 lab-compute02 dnsmasq[589]: read /var/lib/neutron/dhcp/0e4fa560-1483-4ac5-be44-0542503f1e5a/addn_hosts - 3 addresses
Dec 01 13:58:14 lab-compute02 dnsmasq-dhcp[589]: read /var/lib/neutron/dhcp/0e4fa560-1483-4ac5-be44-0542503f1e5a/host
Dec 01 13:58:14 lab-compute02 dnsmasq-dhcp[589]: read /var/lib/neutron/dhcp/0e4fa560-1483-4ac5-be44-0542503f1e5a/opts

root@lab-compute02:~# cat /var/lib/neutron/dhcp/0e4fa560-1483-4ac5-be44-0542503f1e5a/leases
1606917254 fa:16:3e:46:7b:d1 10.106.0.98 host-10-106-0-98 *
1606917254 fa:16:3e:2c:da:6d 10.106.0.2 host-10-106-0-2 *
1606917254 fa:16:3e:ce:b1:b5 10.206.0.53 host-10-206-0-53 *
1606917254 fa:16:3e:07:f7:af 10.206.0.2 host-10-206-0-2 *

As you can see, the lease for 10.106.0.98 was not deleted on compute02, as that segment/subnet is not configured on ns-0c51acd3-60 in the DHCP namespace like it would be in an ordinary provider network.

[1] https://github.com/openstack/neutron/blob/5529b2f5cc6b451c771bc5134018e9dbd2cb6598/neutron/agent/linux/dhcp.py#L758

** Affects: neutron
     Importance: Undecided
         Status: New

https://bugs.launchpad.net/bugs/1906406

Title:
  [segments] dnsmasq can't delete lease for instance due to mismatch between client ip and local addr

Status in neutron:
  New
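
To make the mismatch concrete, here is a rough model of the situation described in this report, using only Python's standard `ipaddress` module. It treats a lease as releasable when its IP falls inside a subnet configured on the namespace interface, and it shows what a segment-aware bootstrap might keep. The function names and constants are hypothetical; the actual checks live in dhcp_release.c and neutron/agent/linux/dhcp.py and may differ in detail.

```python
import ipaddress
from typing import Iterable, List


def releasable(lease_ip: str, local_addrs: Iterable[str]) -> bool:
    """True if lease_ip is on a subnet of one of the local interface addresses.

    Simplified stand-in for the "client IP vs. local addr" check; not the
    real dhcp_release logic.
    """
    ip = ipaddress.ip_address(lease_ip)
    return any(ip in ipaddress.ip_interface(addr).network
               for addr in local_addrs)


def filter_leases_for_segment(lease_ips: Iterable[str],
                              local_addrs: Iterable[str]) -> List[str]:
    """Keep only lease IPs the local namespace could release (hypothetical)."""
    addrs = list(local_addrs)
    return [ip for ip in lease_ips if releasable(ip, addrs)]


# Addresses on ns-0c51acd3-60 in the compute02 namespace (from the ip addr output).
COMPUTE02_ADDRS = ["10.206.0.2/24", "169.254.169.254/16"]

# IPs in the compute02 leases file after the agent restart.
LEASE_IPS = ["10.106.0.98", "10.106.0.2", "10.206.0.53", "10.206.0.2"]

if __name__ == "__main__":
    for ip in LEASE_IPS:
        state = "releasable" if releasable(ip, COMPUTE02_ADDRS) else "stuck"
        print(ip, state)
    print(filter_leases_for_segment(LEASE_IPS, COMPUTE02_ADDRS))
```

Under this model, the two 10.106.0.x entries on compute02 have no matching local address, which is consistent with the "Could not release DHCP leases" warning for 10.106.0.98. A segment-aware agent would presumably never write those entries in the first place.
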

