I'm reopening this because I believe the committed fix addresses only part of the problem. With firewall_driver=noop the unnecessary ingress flooding on br-int is gone. However, we still see the same unnecessary flooding with firewall_driver=openvswitch. For details and a full reproduction, please see the comments on bug #2048785:
https://bugs.launchpad.net/neutron/+bug/2048785/comments/2
https://bugs.launchpad.net/neutron/+bug/2048785/comments/6

** Changed in: neutron
       Status: Fix Released => New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1884708

Title:
  explicity_egress_direct prevents learning of local MACs and causes
  flooding of ingress packets

Status in neutron:
  New

Bug description:
  We took this bug fix: https://bugs.launchpad.net/neutron/+bug/1732067
  and then also backported https://bugs.launchpad.net/neutron/+bug/1866445
  ourselves. The latter is for the iptables-based firewall.

  We have VLAN-based networks and are seeing ingress packets destined to
  local MACs being flooded. We are not seeing any local MACs present
  under ovs-appctl fdb/show br-int.

  Consider the following example:

  HOST 1:
  MAC A = fa:16:3e:c1:01:43
  MAC B = fa:16:3e:de:0b:8a

  HOST 2:
  MAC C = fa:16:3e:d6:3f:31

  A is talking to C. Snooping on the qvo interface of B, we are seeing
  all the traffic destined to MAC A (along with other unicast traffic
  not destined to or sourced from MAC B). Neither MAC A nor MAC B is
  present in the br-int FDB, despite sending heavy traffic.

  Here is the ofproto trace for such a packet. in_port 8313 is the qvo
  of MAC A:

  sudo ovs-appctl ofproto/trace br-int in_port=8313,tcp,dl_src=fa:16:3e:c1:01:43,dl_dst=fa:16:3e:d6:3f:31
  Flow: tcp,in_port=8313,vlan_tci=0x0000,dl_src=fa:16:3e:c1:01:43,dl_dst=fa:16:3e:d6:3f:31,nw_src=0.0.0.0,nw_dst=0.0.0.0,nw_tos=0,nw_ecn=0,nw_ttl=0,tp_src=0,tp_dst=0,tcp_flags=0

  bridge("br-int")
  ----------------
   0. in_port=8313, priority 9, cookie 0x9a67096130ac45c2
      goto_table:25
  25. in_port=8313,dl_src=fa:16:3e:c1:01:43, priority 2, cookie 0x9a67096130ac45c2
      goto_table:60
  60. in_port=8313,dl_src=fa:16:3e:c1:01:43, priority 9, cookie 0x9a67096130ac45c2
      resubmit(,61)
  61. in_port=8313,dl_src=fa:16:3e:c1:01:43,dl_dst=00:00:00:00:00:00/01:00:00:00:00:00, priority 10, cookie 0x9a67096130ac45c2
      push_vlan:0x8100
      set_field:4098->vlan_vid
      output:1

  bridge("br-ext")
  ----------------
   0. in_port=2, priority 2, cookie 0xab09adf2af892674
      goto_table:1
   1. priority 0, cookie 0xab09adf2af892674
      goto_table:2
   2. in_port=2,dl_vlan=2, priority 4, cookie 0xab09adf2af892674
      set_field:4240->vlan_vid
      NORMAL
       -> forwarding to learned port

  bridge("br-vlan")
  -----------------
   0. priority 1, cookie 0x651552fc69601a2d
      goto_table:3
   3. priority 1, cookie 0x651552fc69601a2d
      NORMAL
       -> forwarding to learned port

  Final flow: tcp,in_port=8313,dl_vlan=2,dl_vlan_pcp=0,vlan_tci1=0x0000,dl_src=fa:16:3e:c1:01:43,dl_dst=fa:16:3e:d6:3f:31,nw_src=0.0.0.0,nw_dst=0.0.0.0,nw_tos=0,nw_ecn=0,nw_ttl=0,tp_src=0,tp_dst=0,tcp_flags=0
  Megaflow: recirc_id=0,eth,ip,in_port=8313,vlan_tci=0x0000/0x1fff,dl_src=fa:16:3e:c1:01:43,dl_dst=fa:16:3e:d6:3f:31,nw_frag=no
  Datapath actions: push_vlan(vid=144,pcp=0),51

  Because the packet took the output: action from table=61, added by the
  explicitly_egress_direct fix, the local MAC is not learned. But on
  ingress, the packet hits table=60's NORMAL action, which floods it
  because the bridge never learned where the local MAC lives.

  sudo ovs-appctl ofproto/trace br-int in_port=1,dl_vlan=144,dl_src=fa:16:3e:d6:3f:31,dl_dst=fa:16:3e:c1:01:43
  Flow: in_port=1,dl_vlan=144,dl_vlan_pcp=0,vlan_tci1=0x0000,dl_src=fa:16:3e:d6:3f:31,dl_dst=fa:16:3e:c1:01:43,dl_type=0x0000

  bridge("br-int")
  ----------------
   0. in_port=1,dl_vlan=144, priority 3, cookie 0x9a67096130ac45c2
      set_field:4098->vlan_vid
      goto_table:60
  60. priority 3, cookie 0x9a67096130ac45c2
      NORMAL
       -> no learned MAC for destination, flooding

  bridge("br-vlan")
  -----------------
   0. in_port=4, priority 2, cookie 0x651552fc69601a2d
      goto_table:1
   1. priority 0, cookie 0x651552fc69601a2d
      goto_table:2
   2. in_port=4, priority 2, cookie 0x651552fc69601a2d
      drop

  bridge("br-tun")
  ----------------
   0. in_port=1, priority 1, cookie 0xf1baf24d000c6f7c
      goto_table:1
   1. priority 0, cookie 0xf1baf24d000c6f7c
      goto_table:2
   2. dl_dst=00:00:00:00:00:00/01:00:00:00:00:00, priority 0, cookie 0xf1baf24d000c6f7c
      goto_table:20
  20. priority 0, cookie 0xf1baf24d000c6f7c
      goto_table:22
  22. priority 0, cookie 0xf1baf24d000c6f7c
      drop

  Final flow: in_port=1,dl_vlan=2,dl_vlan_pcp=0,vlan_tci1=0x0000,dl_src=fa:16:3e:d6:3f:31,dl_dst=fa:16:3e:c1:01:43,dl_type=0x0000
  Megaflow: recirc_id=0,eth,in_port=1,dl_vlan=144,dl_vlan_pcp=0,dl_src=fa:16:3e:d6:3f:31,dl_dst=fa:16:3e:c1:01:43,dl_type=0x0000
  Datapath actions: pop_vlan,push_vlan(vid=2,pcp=0),7,pop_vlan,46,26,57,58,13,6,61,66,68,22,23,72,78,79,34,81,83,2,18,87,33,88,90,91,94,95,99,100,101,102,103,106,108,113,115,116,125,132,133,134,144,145,146,147,165,168,169,170,173,174,175,178,201,203,204,205,216,222,148,150,200,160,181,54,159,151,110,182,114,233,241,212,238,154,11,213,70,29,37,131,45,93,14,139,48,105,152,129,28,12,107,172,196,3,4,62,40,183,124,20,32,67,82,135,153,84,98,109,111,123,5,65,119,120,104,122,128,130,137,142,143,121,141,176,177,179,184,186,190

  dump-flows br-int indicates it first hits this rule:

  cookie=0x6832197111786c03, duration=107845.507s, table=0, n_packets=98500552445, n_bytes=66585173373354, idle_age=0, hard_age=65534, priority=3,in_port=1,dl_vlan=144 actions=mod_vlan_vid:1,resubmit(,60)

  Then, at table=60, the only rule it matches is the final NORMAL rule:

  cookie=0x6832197111786c03, duration=107949.777s, table=60, n_packets=245019667777, n_bytes=135203331684577, idle_age=0, hard_age=65534, priority=3 actions=NORMAL

  I tried both attaching the subnet to a DVR router and detaching it.
  If I attach it to a DVR router, I *DO* see a bunch of table=60 output
  actions for my local VMs. The problem, however, is that they match on
  the *external VLAN ID*. Here is an example:

  cookie=0x6832197111786c03, duration=107840.054s, table=60, n_packets=0, n_bytes=0, idle_age=65534, hard_age=65534, priority=20,dl_vlan=144,dl_dst=fa:16:3e:59:d2:b1 actions=strip_vlan,output:5663

  But as we saw, the ingress packet first hits the table=0
  mod_vlan_vid:1,resubmit(,60) rule, which changes VLAN 144 to the
  internal VLAN of 1, so the dl_vlan=144 match in table=60 never applies.

  For a network not attached to a DVR router, there is a similar table=0
  rule that changes the external VLAN to the internal VLAN:

  cookie=0xbab0a875dbcda4a0, duration=25949.321s, table=0, n_packets=2618258, n_bytes=2851837213, idle_age=0, priority=3,in_port=1,dl_vlan=2505 actions=mod_vlan_vid:83,resubmit(,60)

  And because this is a provider network, there are no local DVR MAC
  rules at table=60, so it always hits the NORMAL action.

  So, how do we cover all bases and keep the fixes that prevent egress
  flooding (https://bugs.launchpad.net/neutron/+bug/1732067 and
  https://bugs.launchpad.net/neutron/+bug/1866445) while also
  preventing ingress flooding? The fix for one direction seems to cause
  breakage in the other.
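  The learning asymmetry described above can be illustrated with a toy
  model. This is only a sketch under stated assumptions: ToyBridge, its
  methods, and the port names "uplink", "qvo-A" and "qvo-B" are
  hypothetical and are not OVS or neutron code. It assumes that a
  NORMAL-style action learns the source MAC and floods unknown
  destinations, while a direct output: action forwards without learning,
  and it ignores VLAN translation entirely.

```python
class ToyBridge:
    """Hypothetical toy learning bridge; not OVS or neutron code."""

    def __init__(self, ports):
        self.ports = set(ports)
        self.fdb = {}  # (vlan, mac) -> port on which the MAC was learned

    def normal(self, in_port, vlan, src, dst):
        """NORMAL-like action: learn the source, then forward or flood."""
        self.fdb[(vlan, src)] = in_port
        out = self.fdb.get((vlan, dst))
        if out is not None:
            return [out]
        # Unknown destination: flood to every port except the ingress one.
        return sorted(self.ports - {in_port})

    def direct_output(self, in_port, out_port):
        """output:-like action: forward to a fixed port, learn nothing."""
        return [out_port]


MAC_A = "fa:16:3e:c1:01:43"
MAC_C = "fa:16:3e:d6:3f:31"
PORTS = ["uplink", "qvo-A", "qvo-B"]
VLAN = 2

# Case 1: egress uses a direct output, as in the table=61 rule.
direct = ToyBridge(PORTS)
direct.direct_output("qvo-A", "uplink")
ingress_after_direct = direct.normal("uplink", VLAN, MAC_C, MAC_A)

# Case 2: egress goes through the NORMAL-like action, so MAC A is learned.
learned = ToyBridge(PORTS)
learned.normal("qvo-A", VLAN, MAC_A, MAC_C)
ingress_after_normal = learned.normal("uplink", VLAN, MAC_C, MAC_A)

print("ingress after direct egress:", ingress_after_direct)
print("ingress after NORMAL egress:", ingress_after_normal)
```

  In the first case the return traffic for MAC A is flooded to both
  local ports, which matches the snooping result on B's qvo interface.
  In the second case it is delivered only to A's port, but the egress
  packet itself is flooded, which is the behaviour the egress fixes were
  meant to remove.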

