Public bug reported:

Env: pike + ovs + vxlan + l2pop + iptables_hybrid. The dhcp agent runs on a different node than the compute.
Steps:
1. Boot 4 or more VMs on the same compute node and the same vxlan network.
2. Wait until they are fully running, then reboot the compute node.
3. After boot the VMs are in status SHUTOFF. Start the VMs.

The VMs don't get an IP address from neutron dhcp.

The flood-to-tunnels flow (br-tun table 22) for the network is missing, so broadcasts such as dhcp requests never reach a tunnel to the node with the dhcp agent. Neutron server did not send the flooding entry to the agent. It only does that for the first or second active port, or if the agent is restarted.

After the compute node boots, neutron-ovs-cleanup runs first and deletes the qvo ports from br-int [4]. Then the ovs-agent starts, and nova-compute after it. Nova-compute destroys the domains and moves the VMs to SHUTOFF status. It also (for some reason) recreates the qbr linux bridges and the qvb/qvo veths connected to br-int. So neutron continues to see the ports as ACTIVE even though the VMs are SHUTOFF, and agent_active_ports [1] never drops below 3.

Also, nova-compute might start a short time after the ovs-agent, so the new ports are not detected in the first iteration of the ovs agent loop and agent_restarted will be false here [2]. Before [3], agent_restarted was true if the agent had been running for less than agent_boot_time (default 180 sec), and the problem did not show.

The problem does not happen if neutron-ovs-cleanup is disabled. In that case the ovs agent first treats the ports as skipped_devices and they get status DOWN.
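
To make the condition described above easier to follow, here is a minimal sketch in Python. It is a simplified model of the behaviour as reported in this bug, not the actual code in mech_driver.py: the function names, the constant FULL_FDB_PORT_LIMIT and the "fewer than 3 active ports" threshold are assumptions taken from the wording of this report ("first or second active port", "never drops below 3").

```python
# Simplified model of when the neutron server sends the flooding entry
# to an agent, as described in this report. All names are hypothetical.

# Assumption from the report: only the first or second active port
# on the agent triggers sending the flooding entry.
FULL_FDB_PORT_LIMIT = 3


def agent_restarted_by_uptime(uptime_seconds, agent_boot_time=180):
    """Pre-[3] behaviour as described: the agent counts as restarted
    while it has been running for less than agent_boot_time."""
    return uptime_seconds < agent_boot_time


def sends_flooding_entry(agent_active_ports, agent_restarted):
    """Return True if the server would send the flooding entry."""
    return agent_active_ports < FULL_FDB_PORT_LIMIT or agent_restarted


# Scenario from this bug: 4 ports stay ACTIVE across the reboot and the
# ports are picked up after the first agent loop iteration.
print(sends_flooding_entry(agent_active_ports=4, agent_restarted=False))

# Same scenario with the old uptime-based check, 30 sec after agent start.
print(sends_flooding_entry(4, agent_restarted_by_uptime(30)))
```

With 4 ports still counted as ACTIVE and agent_restarted false, the model never sends the flooding entry, which matches the missing table 22 flow. With the uptime-based check the same scenario would have sent it.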
[1] https://github.com/openstack/neutron/blob/21a52f7ae597f7992f32ff41cedff0c31e35c762/neutron/plugins/ml2/drivers/l2pop/mech_driver.py#L306
[2] https://github.com/openstack/neutron/blob/21a52f7ae597f7992f32ff41cedff0c31e35c762/neutron/plugins/ml2/drivers/l2pop/mech_driver.py#L310
[3] https://opendev.org/openstack/neutron/commit/62fe7852bbd70a24174853997096c52ee015e269
[4] https://bugs.launchpad.net/neutron/+bug/1853582

** Affects: neutron
     Importance: Undecided
         Status: New

** Tags: l2-pop ovs

** Tags added: l2-pop ovs

--
You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1853613

Title:
  VMs don't get ip from dhcp after compute restart

Status in neutron:
  New

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1853613/+subscriptions

--
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to     : [email protected]
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp

