When a packet enters LR pipeline from a distributed gateway port with destination IP being a SNAT IP, it goes through the unSNAT stage and it is possible that the unSNAT fails to convert the dst IP when no conntrack entries are accociated with the packet. In this case, the packet is rerouted to the same DGP, and results in recirc loop in datapath. The packet would finally be dropped either due to ttl or the recirc limit, but it would have created unnecessary cost.
To reproduce the problem, simply configure SNAT on a LR with the SNAT IP being the DGP's IP, and then send a packet from external (DGP's LS) to the SNAT IP. Kernel logs like below will be seen: openvswitch: ovs-system: deferred action limit reached, drop recirc action DP flow dump would also show plenty of flows related to this packet, each with a different ttl match, indicating the packet has been looped many times. Commit 802f9275ac (ovn-northd: Drop IP packets destined to router owned IPs (after NAT)) already added flows to drop packets failed unSNAT for Gateway Routers. It added flows with a low priority (2) to drop the packets that fail ARP resolve, to avoid triggering ARP request for the SNAT IPs. However, for the DGP case, to support E/W NAT, ARP resolve flows are added for thoses NAT IPs so that the packets can continue the pipeline and possibly redirect to redirect chassis. So, because of these ARP resolve flows, even the packets failed unSNAT would continue the pipeline and won't hit the low priority (2) flows, thus not get dropped. To fix the problem, for each of the ARP resolve flow added for the DGP NAT IPs, a higher priority (150) flow is added to check if the packet's inport is the DGP (same as the outport), then drop the packet directly. Test cases are updated to cover both Gateway Router and DGP scenarios, with packets from both directions (uplink and downlink). Reported-by: Krzysztof Klimonda <[email protected]> Reported-at: https://patchwork.ozlabs.org/project/ovn/patch/[email protected]/ Reported-by: Frode Nordahl <[email protected]> Reported-at: https://bugs.launchpad.net/ubuntu/+source/ovn/+bug/1967718 Reported-by: Roberto Bartzen Acosta <[email protected]> Reported-at: https://github.com/ovn-org/ovn/issues/134 Reported-by: Syed Ammad Ali <[email protected]> Reported-at: https://github.com/ovn-org/ovn/issues/153 Reported-at: https://mail.openvswitch.org/pipermail/ovs-discuss/2021-August/051340.html Signed-off-by: Han Zhou <[email protected]> --- northd/northd.c | 17 ++++++++ northd/ovn-northd.8.xml | 29 +++++++++---- tests/ovn.at | 92 ++++++++++++++++++++++++++++++++++------- 3 files changed, 116 insertions(+), 22 deletions(-) diff --git a/northd/northd.c b/northd/northd.c index 4751feab4f3c..ebd153b020fa 100644 --- a/northd/northd.c +++ b/northd/northd.c @@ -14341,6 +14341,23 @@ build_lrouter_nat_defrag_and_lb(struct ovn_datapath *od, struct hmap *lflows, sset_add(&nat_entries, nat->external_ip); } else { if (!sset_contains(&nat_entries, nat->external_ip)) { + /* Drop packets coming in from external but still has + * destination IP equals to the NAT external IP, to avoid loop. + * The packets must have went through DNAT/unSNAT stage but + * failed to convert the destination. */ + ds_clear(match); + ds_put_format( + match, "inport == %s && outport == %s && ip%s.dst == %s", + l3dgw_port->json_key, l3dgw_port->json_key, + is_v6 ? "6" : "4", nat->external_ip); + ovn_lflow_add_with_hint(lflows, od, + S_ROUTER_IN_ARP_RESOLVE, + 150, ds_cstr(match), + "drop;", + &nat->header_); + /* Now for packets coming from other (downlink) LRPs, allow ARP + * resolve for the NAT IP, so that such packets can be + * forwarded for E/W NAT. */ ds_clear(match); ds_put_format( match, "outport == %s && %s == %s", diff --git a/northd/ovn-northd.8.xml b/northd/ovn-northd.8.xml index 25a742c904c6..e06f49b26ea4 100644 --- a/northd/ovn-northd.8.xml +++ b/northd/ovn-northd.8.xml @@ -4262,13 +4262,17 @@ outport = <var>P</var> For each row in the <code>NAT</code> table with IPv4 address <var>A</var> in the <ref column="external_ip" table="NAT" db="OVN_Northbound"/> column of - <ref table="NAT" db="OVN_Northbound"/> table, a priority-100 - flow with the match <code>outport === <var>P</var> && - reg0 == <var>A</var></code> has actions <code>eth.dst = <var>E</var>; - next;</code>, where <code>P</code> is the distributed logical router - port, <var>E</var> is the Ethernet address if set in the - <ref column="external_mac" table="NAT" db="OVN_Northbound"/> column - of <ref table="NAT" db="OVN_Northbound"/> table for of type + <ref table="NAT" db="OVN_Northbound"/> table, below two flows are + programmed: + </p> + + <p> + A priority-100 flow with the match <code>outport == <var>P</var> + && reg0 == <var>A</var></code> has actions <code>eth.dst = + <var>E</var>; next;</code>, where <code>P</code> is the distributed + logical router port, <var>E</var> is the Ethernet address if set in + the <ref column="external_mac" table="NAT" db="OVN_Northbound"/> + column of <ref table="NAT" db="OVN_Northbound"/> table for of type <code>dnat_and_snat</code>, otherwise the Ethernet address of the distributed logical router port. Note that if the <ref column="external_ip" table="NAT" db="OVN_Northbound"/> is not @@ -4278,9 +4282,18 @@ outport = <var>P</var> will be added. </p> + <p> + Corresponding to the above flow, a priority-150 flow with the match + <code>inport == <var>P</var> && outport == <var>P</var> + && ip4.dst == <var>A</var></code> has actions + <code>drop;</code> to exclude packets that have went through + DNAT/unSNAT stage but failed to convert the destination, to avoid + loop. + </p> + <p> For IPv6 NAT entries, same flows are added, but using the register - <code>xxreg0</code> for the match. + <code>xxreg0</code> and field <code>ip6</code> for the match. </p> </li> diff --git a/tests/ovn.at b/tests/ovn.at index d2163d87d823..dc47efdfd934 100644 --- a/tests/ovn.at +++ b/tests/ovn.at @@ -28419,24 +28419,39 @@ wait_row_count Port_Binding 1 logical_port=lsp-cont1 chassis=$ch OVN_CLEANUP([hv1]) AT_CLEANUP +# TEST_LR_DROP_TRAFFIC_FOR_OWN_IPS [ DGP | GR ] # Test dropping traffic destined to router owned IPs. -OVN_FOR_EACH_NORTHD([ -AT_SETUP([gateway router drop traffic for own IPs]) +m4_define([TEST_LR_DROP_TRAFFIC_FOR_OWN_IPS], [ ovn_start -ovn-nbctl lr-add r1 -- set logical_router r1 options:chassis=hv1 -ovn-nbctl ls-add s1 +ovn-nbctl lr-add r1 # Gateway router or LR with DGP on the ext side +ovn-nbctl ls-add ext # simulate external LS +ovn-nbctl ls-add s2 # simulate internal LS -# Connnect r1 to s1. -ovn-nbctl lrp-add r1 lrp-r1-s1 00:00:00:00:01:01 10.0.1.1/24 -ovn-nbctl lsp-add s1 lsp-s1-r1 -- set Logical_Switch_Port lsp-s1-r1 type=router \ - options:router-port=lrp-r1-s1 addresses=router +# Connnect r1 to ext. +ovn-nbctl lrp-add r1 lrp-r1-ext 00:00:00:00:01:01 10.0.1.1/24 +if test X"$1" = X"DGP"; then + ovn-nbctl lrp-set-gateway-chassis lrp-r1-ext hv1 1 +else + ovn-nbctl set logical_router r1 options:chassis=hv1 +fi +ovn-nbctl lsp-add ext lsp-ext-r1 -- set Logical_Switch_Port lsp-ext-r1 type=router \ + options:router-port=lrp-r1-ext addresses=router -# Create logical port p1 in s1 -ovn-nbctl lsp-add s1 p1 \ +# Connnect r1 to s2. +ovn-nbctl lrp-add r1 lrp-r1-s2 00:00:00:00:02:01 10.0.2.1/24 +ovn-nbctl lsp-add s2 lsp-s2-r1 -- set Logical_Switch_Port lsp-s2-r1 type=router \ + options:router-port=lrp-r1-s2 addresses=router + +# Create logical port p1 in ext +ovn-nbctl lsp-add ext p1 \ -- lsp-set-addresses p1 "f0:00:00:00:01:02 10.0.1.2" \ -- lsp-set-port-security p1 "f0:00:00:00:01:02 10.0.1.2" +# Create logical port p2 in s2 +ovn-nbctl lsp-add s2 p2 \ +-- lsp-set-addresses p2 "f0:00:00:00:02:02 10.0.2.2" + # Create two hypervisor and create OVS ports corresponding to logical ports. net_add n1 @@ -28450,6 +28465,12 @@ ovs-vsctl -- add-port br-int hv1-vif1 -- \ options:rxq_pcap=hv1/vif1-rx.pcap \ ofport-request=1 +ovs-vsctl -- add-port br-int hv1-vif2 -- \ + set interface hv1-vif2 external-ids:iface-id=p2 \ + options:tx_pcap=hv1/vif2-tx.pcap \ + options:rxq_pcap=hv1/vif2-rx.pcap \ + ofport-request=2 + # Pre-populate the hypervisors' ARP tables so that we don't lose any # packets for ARP resolution (native tunneling doesn't queue packets # for ARP resolution). @@ -28460,9 +28481,10 @@ ovn-nbctl --wait=hv sync sw_key=$(ovn-sbctl --bare --columns tunnel_key list datapath_binding r1) +echo sw_key: $sw_key AT_CHECK([ovn-sbctl lflow-list | grep lr_in_arp_resolve | grep 10.0.1.1], [1], []) -# Send ip packets from p1 to lrp-r1-s1 +# Send ip packets from p1 to lrp-r1-ext src_mac="f00000000102" dst_mac="000000000101" src_ip=`ip_to_hex 10 0 1 2` @@ -28481,10 +28503,10 @@ AT_CHECK([as hv1 ovs-ofctl dump-flows br-int | grep -E "table=11, n_packets=1,.* ]) # Use the router IP as SNAT IP. -ovn-nbctl set logical_router r1 options:lb_force_snat_ip=10.0.1.1 +ovn-nbctl lr-nat-add r1 snat 10.0.1.1 10.8.8.0/24 ovn-nbctl --wait=hv sync -# Send ip packets from p1 to lrp-r1-s1 +# Send ip packets from p1 to lrp-r1-ext src_mac="f00000000102" dst_mac="000000000101" src_ip=`ip_to_hex 10 0 1 2` @@ -28499,11 +28521,53 @@ AT_CHECK([as hv1 ovs-ofctl dump-flows br-int | grep "actions=controller" | grep ]) # The packet should've been dropped in the lr_in_arp_resolve stage. -AT_CHECK([as hv1 ovs-ofctl dump-flows br-int | grep -E "table=25, n_packets=1,.* priority=2,ip,metadata=0x${sw_key},nw_dst=10.0.1.1 actions=drop" -c], [0], [dnl +if test X"$1" = X"DGP"; then + prio=150 + inport=reg14 + outport=reg15 +else + prio=2 +fi +AT_CHECK([as hv1 ovs-ofctl dump-flows br-int | grep -E "table=25, n_packets=1,.* priority=$prio,ip,$inport.*$outport.*metadata=0x${sw_key},nw_dst=10.0.1.1 actions=drop" -c], [0], [dnl 1 ]) +# Send ip packets from p2 to lrp-r1-ext +src_mac="f00000000202" +dst_mac="000000000201" +src_ip=`ip_to_hex 10 0 2 2` +dst_ip=`ip_to_hex 10 0 1 1` +packet=${dst_mac}${src_mac}08004500001c0000000040110000${src_ip}${dst_ip}0035111100080000 +as hv1 ovs-appctl netdev-dummy/receive hv1-vif2 $packet + +# Still no packet-ins should reach ovn-controller. +AT_CHECK([as hv1 ovs-ofctl dump-flows br-int | grep "actions=controller" | grep -v n_packets=0 -c], [1], [dnl +0 +]) + +if test X"$1" = X"DGP"; then + # The packet dst should be resolved once for E/W centralized NAT purpose. + AT_CHECK([as hv1 ovs-ofctl dump-flows br-int | grep -E "table=25, n_packets=1,.* priority=100,reg0=0xa000101,reg15=.*metadata=0x${sw_key} actions=mod_dl_dst:00:00:00:00:01:01,resubmit" -c], [0], [dnl +1 +]) +fi + +# The packet should've been finally dropped in the lr_in_arp_resolve stage. +AT_CHECK([as hv1 ovs-ofctl dump-flows br-int | grep -E "table=25, n_packets=2,.* priority=$prio,ip,$inport.*$outport.*metadata=0x${sw_key},nw_dst=10.0.1.1 actions=drop" -c], [0], [dnl +1 +]) OVN_CLEANUP([hv1]) +]) + +OVN_FOR_EACH_NORTHD([ +AT_SETUP([gateway router drop traffic for own IPs]) +TEST_LR_DROP_TRAFFIC_FOR_OWN_IPS(GR) +AT_CLEANUP +]) + +OVN_FOR_EACH_NORTHD([ +AT_SETUP([distributed gateway port drop traffic for own IPs]) +TEST_LR_DROP_TRAFFIC_FOR_OWN_IPS(DGP) AT_CLEANUP ]) -- 2.30.2 _______________________________________________ dev mailing list [email protected] https://mail.openvswitch.org/mailman/listinfo/ovs-dev
