Reviewed:  https://review.openstack.org/609440
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=020d745f5b859f93f0c550be221c350bc14e8d23
Submitter: Zuul
Branch:    stable/pike
commit 020d745f5b859f93f0c550be221c350bc14e8d23
Author: Swaminathan Vasudevan <[email protected]>
Date:   Thu Aug 23 05:54:17 2018 +0000

    Revert "DVR: Inter Tenant Traffic between networks not possible with shared net"

    This reverts commit d019790fe436b72cb05b8d0ff1f3a62ebd9e9bee.

    Closes-Bug: #1783654
    Change-Id: I4fd2610e185fb60cae62693cd4032ab700209b5f
    (cherry picked from commit fd72643a61f726145288b2a468b044e84d02c88e)
    (cherry picked from commit b70afb50138f9588a5165e1ca986f83856d5399d)

** Changed in: cloud-archive/pike
       Status: Invalid => Fix Committed

--
You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1783654

Title:
  DVR process flow not installed on physical bridge for shared tenant network

Status in Ubuntu Cloud Archive:
  Fix Released
Status in Ubuntu Cloud Archive pike series:
  Fix Committed
Status in Ubuntu Cloud Archive queens series:
  Fix Committed
Status in Ubuntu Cloud Archive rocky series:
  Fix Released
Status in neutron:
  Fix Released
Status in neutron package in Ubuntu:
  Fix Released
Status in neutron source package in Bionic:
  Fix Committed
Status in neutron source package in Cosmic:
  Fix Released

Bug description:
  Seems like collateral from https://bugs.launchpad.net/neutron/+bug/1751396

  In DVR, the distributed gateway port's IP and MAC are shared in the
  qrouter across all hosts.

  The dvr_process_flow on the physical bridge (which replaces the shared
  router_distributed MAC address with the unique per-host MAC when it's
  the source) is missing, and so is the drop rule which instructs the
  bridge to drop all traffic destined for the shared distributed MAC.
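What the two missing flows are supposed to achieve can be illustrated with a small Python model (Python because the Neutron agent linked further down is written in it). This is a simplified sketch, not Neutron's OpenFlow implementation: a frame leaving a host through its physical bridge is dropped if it is destined for the shared router MAC, and has its source rewritten to the host's unique DVR MAC if it comes from the shared MAC; br-int is reduced to a dictionary mapping each MAC to the port it was last learned on. Only the shared MAC and the port numbers 1 and 11 come from this report; the per-host DVR MAC, the frame destinations and the function names are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, replace
from typing import Optional

# Shared router MAC and port numbers are taken from the report; the per-host
# DVR MAC and the destination MACs used below are hypothetical placeholders.
SHARED_ROUTER_MAC = "fa:16:3e:42:a2:ec"
REMOTE_HOST_DVR_MAC = "fa:16:3f:00:00:02"  # hypothetical unique MAC of another host
QR_PORT, PHY_PORT = 11, 1  # local qr- interface and phy-br-vlan on br-int


@dataclass(frozen=True)
class Frame:
    src: str
    dst: str


def dvr_process_egress(frame: Frame, host_dvr_mac: str,
                       flows_installed: bool) -> Optional[Frame]:
    """Simplified model of the DVR process step on a host's physical bridge.

    Applied to frames leaving the host; returns the frame that reaches the
    wire, or None if the frame is dropped.
    """
    if not flows_installed:
        return frame  # the bug: the shared MAC leaks onto the wire unchanged
    if frame.dst == SHARED_ROUTER_MAC:
        return None  # drop rule: traffic for the shared MAC must stay local
    if frame.src == SHARED_ROUTER_MAC:
        return replace(frame, src=host_dvr_mac)  # rewrite to the per-host MAC
    return frame


def simulate_br_int(flows_installed: bool, rounds: int = 3) -> list[int]:
    """Port a MAC-learning br-int records for the shared MAC after each frame."""
    fdb: dict[str, int] = {}
    history: list[int] = []
    for _ in range(rounds):
        # The local qrouter sends a frame out of its qr- interface.
        fdb[SHARED_ROUTER_MAC] = QR_PORT
        history.append(fdb[SHARED_ROUTER_MAC])
        # Another host's qrouter sends a frame with the same source MAC; it
        # crosses that host's physical bridge and arrives here via phy-br-vlan.
        wire = dvr_process_egress(Frame(SHARED_ROUTER_MAC, "fa:16:3e:00:00:bb"),
                                  REMOTE_HOST_DVR_MAC, flows_installed)
        if wire is not None:
            fdb[wire.src] = PHY_PORT  # learn the source MAC on the ingress port
        history.append(fdb[SHARED_ROUTER_MAC])
    return history


print(simulate_br_int(flows_installed=False))  # [11, 1, 11, 1, 11, 1]
print(simulate_br_int(flows_installed=True))   # [11, 11, 11, 11, 11, 11]
```

Without the flows, the learned port for the shared MAC alternates between the local qr- port and phy-br-vlan, which is the flapping pattern in the report; with the rewrite in place, remote frames carry a distinct per-host MAC and the shared MAC stays on port 11.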
  Because these flows are missing, we are seeing the router MAC on the
  network infrastructure, causing it to flap on br-int on every compute
  host:

  root@milhouse:~# ovs-appctl fdb/show br-int | grep fa:16:3e:42:a2:ec
     11     4  fa:16:3e:42:a2:ec    1
  root@milhouse:~# ovs-appctl fdb/show br-int | grep fa:16:3e:42:a2:ec
     11     4  fa:16:3e:42:a2:ec    2
  root@milhouse:~# ovs-appctl fdb/show br-int | grep fa:16:3e:42:a2:ec
      1     4  fa:16:3e:42:a2:ec    0
  root@milhouse:~# ovs-appctl fdb/show br-int | grep fa:16:3e:42:a2:ec
     11     4  fa:16:3e:42:a2:ec    0
  root@milhouse:~# ovs-appctl fdb/show br-int | grep fa:16:3e:42:a2:ec
     11     4  fa:16:3e:42:a2:ec    0
  root@milhouse:~# ovs-appctl fdb/show br-int | grep fa:16:3e:42:a2:ec
      1     4  fa:16:3e:42:a2:ec    0
  root@milhouse:~# ovs-appctl fdb/show br-int | grep fa:16:3e:42:a2:ec
      1     4  fa:16:3e:42:a2:ec    0
  root@milhouse:~# ovs-appctl fdb/show br-int | grep fa:16:3e:42:a2:ec
      1     4  fa:16:3e:42:a2:ec    0
  root@milhouse:~# ovs-appctl fdb/show br-int | grep fa:16:3e:42:a2:ec
      1     4  fa:16:3e:42:a2:ec    1
  root@milhouse:~# ovs-appctl fdb/show br-int | grep fa:16:3e:42:a2:ec
     11     4  fa:16:3e:42:a2:ec    0
  root@milhouse:~# ovs-appctl fdb/show br-int | grep fa:16:3e:42:a2:ec
     11     4  fa:16:3e:42:a2:ec    0
  root@milhouse:~# ovs-appctl fdb/show br-int | grep fa:16:3e:42:a2:ec
     11     4  fa:16:3e:42:a2:ec    0

  Here port 1 is phy-br-vlan, which connects to the physical bridge, and
  port 11 is the correct local qr- interface. Because these DVR flows are
  missing on br-vlan, packets with the shared router MAC as their source
  ingress into the host, and br-int learns the MAC on the upstream port.

  The symptom is that when pinging a VM's floating IP, we see occasional
  packet loss (10-30%), and sometimes br-int sends the responses upstream
  instead of to the qrouter. The ICMP replies then come back with the
  fixed IP of the replier, since no NAT took place, and on the tenant
  network rather than the external network.
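The flapping in the fdb/show output above can be quantified by parsing repeated `ovs-appctl fdb/show br-int` samples and counting how often the learned port for the router MAC changes. A minimal sketch, assuming the usual whitespace-separated port/VLAN/MAC/Age rows; the helper names are hypothetical, and the samples are rebuilt from the twelve outputs in this report rather than captured live.

```python
from __future__ import annotations

from typing import Iterable


def parse_fdb_rows(text: str) -> list[tuple[int, int, str]]:
    """Parse `ovs-appctl fdb/show` output into (port, vlan, mac) tuples.

    Assumes the whitespace-separated "port VLAN MAC Age" layout; the header
    line and anything else that does not match are skipped.
    """
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 4 and fields[0].isdigit() and fields[1].isdigit():
            rows.append((int(fields[0]), int(fields[1]), fields[2].lower()))
    return rows


def learned_ports(samples: Iterable[str], mac: str) -> list[int]:
    """Port on which `mac` is learned in each captured sample."""
    mac = mac.lower()
    ports = []
    for sample in samples:
        for port, _vlan, row_mac in parse_fdb_rows(sample):
            if row_mac == mac:
                ports.append(port)
                break  # take the first matching row, whatever its VLAN
    return ports


def count_moves(ports: list[int]) -> int:
    """How many times the learned port changed between consecutive samples."""
    return sum(1 for prev, cur in zip(ports, ports[1:]) if prev != cur)


# The twelve samples from the report, reduced to (port, age) pairs.
observed = [(11, 1), (11, 2), (1, 0), (11, 0), (11, 0), (1, 0),
            (1, 0), (1, 0), (1, 1), (11, 0), (11, 0), (11, 0)]
samples = [f"{port:>5}     4  fa:16:3e:42:a2:ec    {age}" for port, age in observed]
ports = learned_ports(samples, "fa:16:3e:42:a2:ec")
print(ports)               # [11, 11, 1, 11, 11, 1, 1, 1, 1, 11, 11, 11]
print(count_moves(ports))  # 4
```

On a live host, each sample could be captured with `subprocess.run(["ovs-appctl", "fdb/show", "br-int"], capture_output=True, text=True).stdout`. With the DVR process flows in place, the router MAC should stay on the local qr- port and the move count should remain zero.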
  When I force net_shared_only to False here, the problem goes away:
  https://github.com/openstack/neutron/blob/stable/pike/neutron/plugins/ml2/drivers/openvswitch/agent/ovs_dvr_neutron_agent.py#L436

  It should be noted that we *ONLY* need to do this on our dvr_snat host.
  The DVR process flows are missing on every compute host, but if we shut
  down the qrouter on the SNAT host, FIP functionality works and the DVR
  MAC stops flapping on the other hosts. Likewise, applying the fix only
  to the SNAT host makes it work. Perhaps there is something unique about
  the SNAT node.

  Ubuntu SRU details:
  -------------------
  [Impact]
  See above.

  [Test Case]
  Deploy OpenStack with DVR enabled and then follow the steps above.

  [Regression Potential]
  The patches that are backported have already landed upstream in the
  corresponding stable branches, helping to minimize any regression
  potential.

