On Mon, Jan 9, 2023 at 7:59 AM Mark Michelson <[email protected]> wrote: > > Hi Han, thanks for this change and thanks for putting in the work to > debug the issue and to explain how the problem manifests. > > I only have a couple of small comments below. > > On 12/19/22 19:05, Han Zhou wrote: > > When a packet enters LR pipeline from a distributed gateway port with > > destination IP being a SNAT IP, it goes through the unSNAT stage and > > it is possible that the unSNAT fails to convert the dst IP when no > > conntrack entries are accociated with the packet. In this case, the > > packet is rerouted to the same DGP, and results in recirc loop in > > datapath. The packet would finally be dropped either due to ttl or > > the recirc limit, but it would have created unnecessary cost. > > > > To reproduce the problem, simply configure SNAT on a LR with the SNAT IP > > being the DGP's IP, and then send a packet from external (DGP's LS) to > > the SNAT IP. Kernel logs like below will be seen: > > > > openvswitch: ovs-system: deferred action limit reached, drop recirc action > > > > DP flow dump would also show plenty of flows related to this packet, > > each with a different ttl match, indicating the packet has been looped > > many times. > > > > Commit 802f9275ac (ovn-northd: Drop IP packets destined to router owned > > IPs (after NAT)) already added flows to drop packets failed unSNAT for > > Gateway Routers. It added flows with a low priority (2) to drop the > > packets that fail ARP resolve, to avoid triggering ARP request for the > > SNAT IPs. However, for the DGP case, to support E/W NAT, ARP resolve > > flows are added for thoses NAT IPs so that the packets can continue the > > pipeline and possibly redirect to redirect chassis. So, because of these > > ARP resolve flows, even the packets failed unSNAT would continue the > > pipeline and won't hit the low priority (2) flows, thus not get dropped. > > > > To fix the problem, for each of the ARP resolve flow added for the DGP > > NAT IPs, a higher priority (150) flow is added to check if the packet's > > inport is the DGP (same as the outport), then drop the packet directly. > > > > Test cases are updated to cover both Gateway Router and DGP scenarios, > > with packets from both directions (uplink and downlink). > > > > Reported-by: Krzysztof Klimonda <[email protected]> > > Reported-at: https://patchwork.ozlabs.org/project/ovn/patch/[email protected]/ > > Reported-by: Frode Nordahl <[email protected]> > > Reported-at: https://bugs.launchpad.net/ubuntu/+source/ovn/+bug/1967718 > > Reported-by: Roberto Bartzen Acosta <[email protected]> > > Reported-at: https://github.com/ovn-org/ovn/issues/134 > > Reported-by: Syed Ammad Ali <[email protected]> > > Reported-at: https://github.com/ovn-org/ovn/issues/153 > > Reported-at: https://mail.openvswitch.org/pipermail/ovs-discuss/2021-August/051340.html > > Signed-off-by: Han Zhou <[email protected]> > > --- > > northd/northd.c | 17 ++++++++ > > northd/ovn-northd.8.xml | 29 +++++++++---- > > tests/ovn.at | 92 ++++++++++++++++++++++++++++++++++------- > > 3 files changed, 116 insertions(+), 22 deletions(-) > > > > diff --git a/northd/northd.c b/northd/northd.c > > index 4751feab4f3c..ebd153b020fa 100644 > > --- a/northd/northd.c > > +++ b/northd/northd.c > > @@ -14341,6 +14341,23 @@ build_lrouter_nat_defrag_and_lb(struct ovn_datapath *od, struct hmap *lflows, > > sset_add(&nat_entries, nat->external_ip); > > } else { > > if (!sset_contains(&nat_entries, nat->external_ip)) { > > + /* Drop packets coming in from external but still has > > + * destination IP equals to the NAT external IP, to avoid loop. > > + * The packets must have went through DNAT/unSNAT stage but > > + * failed to convert the destination. */ > > + ds_clear(match); > > + ds_put_format( > > + match, "inport == %s && outport == %s && ip%s.dst == %s", > > + l3dgw_port->json_key, l3dgw_port->json_key, > > + is_v6 ? "6" : "4", nat->external_ip); > > + ovn_lflow_add_with_hint(lflows, od, > > + S_ROUTER_IN_ARP_RESOLVE, > > + 150, ds_cstr(match), > > + "drop;", > > With Adrian Moreno's series to sample all packet drops, we should not be > using a bare "drop;" action in ovn-northd any more. Instead, call the > debug_drop_action() function to fill in the proper drop action.
Hi Mark, thanks for pointing it out. I updated it in v2, but I do see some other places in northd.c still using the "drop" instead of debug_drop_action(). > > > + &nat->header_); > > + /* Now for packets coming from other (downlink) LRPs, allow ARP > > + * resolve for the NAT IP, so that such packets can be > > + * forwarded for E/W NAT. */ > > ds_clear(match); > > ds_put_format( > > match, "outport == %s && %s == %s", > > diff --git a/northd/ovn-northd.8.xml b/northd/ovn-northd.8.xml > > index 25a742c904c6..e06f49b26ea4 100644 > > --- a/northd/ovn-northd.8.xml > > +++ b/northd/ovn-northd.8.xml > > @@ -4262,13 +4262,17 @@ outport = <var>P</var> > > For each row in the <code>NAT</code> table with IPv4 address > > <var>A</var> in the <ref column="external_ip" > > table="NAT" db="OVN_Northbound"/> column of > > - <ref table="NAT" db="OVN_Northbound"/> table, a priority-100 > > - flow with the match <code>outport === <var>P</var> && > > - reg0 == <var>A</var></code> has actions <code>eth.dst = <var>E</var>; > > - next;</code>, where <code>P</code> is the distributed logical router > > - port, <var>E</var> is the Ethernet address if set in the > > - <ref column="external_mac" table="NAT" db="OVN_Northbound"/> column > > - of <ref table="NAT" db="OVN_Northbound"/> table for of type > > + <ref table="NAT" db="OVN_Northbound"/> table, below two flows are > > + programmed: > > + </p> > > + > > + <p> > > + A priority-100 flow with the match <code>outport == <var>P</var> > > + && reg0 == <var>A</var></code> has actions <code>eth.dst = > > + <var>E</var>; next;</code>, where <code>P</code> is the distributed > > + logical router port, <var>E</var> is the Ethernet address if set in > > + the <ref column="external_mac" table="NAT" db="OVN_Northbound"/> > > + column of <ref table="NAT" db="OVN_Northbound"/> table for of type > > <code>dnat_and_snat</code>, otherwise the Ethernet address of the > > distributed logical router port. Note that if the > > <ref column="external_ip" table="NAT" db="OVN_Northbound"/> is not > > @@ -4278,9 +4282,18 @@ outport = <var>P</var> > > will be added. > > </p> > > > > + <p> > > + Corresponding to the above flow, a priority-150 flow with the match > > + <code>inport == <var>P</var> && outport == <var>P</var> > > + && ip4.dst == <var>A</var></code> has actions > > + <code>drop;</code> to exclude packets that have went through > > Either "packets that went through" or "packets that have gone through" > would be suitable here. "Have went" is improper (I'd happily accept it > in code comments, but for documentation, we need to be more formal) > Sorry about this. Fixed in V2 (also fixed in comments). PTAL: https://patchwork.ozlabs.org/project/ovn/patch/[email protected]/ Thanks, Han > > + DNAT/unSNAT stage but failed to convert the destination, to avoid > > + loop. > > + </p> > > + > > <p> > > For IPv6 NAT entries, same flows are added, but using the register > > - <code>xxreg0</code> for the match. > > + <code>xxreg0</code> and field <code>ip6</code> for the match. > > </p> > > </li> > > > > diff --git a/tests/ovn.at b/tests/ovn.at > > index d2163d87d823..dc47efdfd934 100644 > > --- a/tests/ovn.at > > +++ b/tests/ovn.at > > @@ -28419,24 +28419,39 @@ wait_row_count Port_Binding 1 logical_port=lsp-cont1 chassis=$ch > > OVN_CLEANUP([hv1]) > > AT_CLEANUP > > > > +# TEST_LR_DROP_TRAFFIC_FOR_OWN_IPS [ DGP | GR ] > > # Test dropping traffic destined to router owned IPs. > > -OVN_FOR_EACH_NORTHD([ > > -AT_SETUP([gateway router drop traffic for own IPs]) > > +m4_define([TEST_LR_DROP_TRAFFIC_FOR_OWN_IPS], [ > > ovn_start > > > > -ovn-nbctl lr-add r1 -- set logical_router r1 options:chassis=hv1 > > -ovn-nbctl ls-add s1 > > +ovn-nbctl lr-add r1 # Gateway router or LR with DGP on the ext side > > +ovn-nbctl ls-add ext # simulate external LS > > +ovn-nbctl ls-add s2 # simulate internal LS > > > > -# Connnect r1 to s1. > > -ovn-nbctl lrp-add r1 lrp-r1-s1 00:00:00:00:01:01 10.0.1.1/24 > > -ovn-nbctl lsp-add s1 lsp-s1-r1 -- set Logical_Switch_Port lsp-s1-r1 type=router \ > > - options:router-port=lrp-r1-s1 addresses=router > > +# Connnect r1 to ext. > > +ovn-nbctl lrp-add r1 lrp-r1-ext 00:00:00:00:01:01 10.0.1.1/24 > > +if test X"$1" = X"DGP"; then > > + ovn-nbctl lrp-set-gateway-chassis lrp-r1-ext hv1 1 > > +else > > + ovn-nbctl set logical_router r1 options:chassis=hv1 > > +fi > > +ovn-nbctl lsp-add ext lsp-ext-r1 -- set Logical_Switch_Port lsp-ext-r1 type=router \ > > + options:router-port=lrp-r1-ext addresses=router > > > > -# Create logical port p1 in s1 > > -ovn-nbctl lsp-add s1 p1 \ > > +# Connnect r1 to s2. > > +ovn-nbctl lrp-add r1 lrp-r1-s2 00:00:00:00:02:01 10.0.2.1/24 > > +ovn-nbctl lsp-add s2 lsp-s2-r1 -- set Logical_Switch_Port lsp-s2-r1 type=router \ > > + options:router-port=lrp-r1-s2 addresses=router > > + > > +# Create logical port p1 in ext > > +ovn-nbctl lsp-add ext p1 \ > > -- lsp-set-addresses p1 "f0:00:00:00:01:02 10.0.1.2" \ > > -- lsp-set-port-security p1 "f0:00:00:00:01:02 10.0.1.2" > > > > +# Create logical port p2 in s2 > > +ovn-nbctl lsp-add s2 p2 \ > > +-- lsp-set-addresses p2 "f0:00:00:00:02:02 10.0.2.2" > > + > > # Create two hypervisor and create OVS ports corresponding to logical ports. > > net_add n1 > > > > @@ -28450,6 +28465,12 @@ ovs-vsctl -- add-port br-int hv1-vif1 -- \ > > options:rxq_pcap=hv1/vif1-rx.pcap \ > > ofport-request=1 > > > > +ovs-vsctl -- add-port br-int hv1-vif2 -- \ > > + set interface hv1-vif2 external-ids:iface-id=p2 \ > > + options:tx_pcap=hv1/vif2-tx.pcap \ > > + options:rxq_pcap=hv1/vif2-rx.pcap \ > > + ofport-request=2 > > + > > # Pre-populate the hypervisors' ARP tables so that we don't lose any > > # packets for ARP resolution (native tunneling doesn't queue packets > > # for ARP resolution). > > @@ -28460,9 +28481,10 @@ ovn-nbctl --wait=hv sync > > > > sw_key=$(ovn-sbctl --bare --columns tunnel_key list datapath_binding r1) > > > > +echo sw_key: $sw_key > > AT_CHECK([ovn-sbctl lflow-list | grep lr_in_arp_resolve | grep 10.0.1.1], [1], []) > > > > -# Send ip packets from p1 to lrp-r1-s1 > > +# Send ip packets from p1 to lrp-r1-ext > > src_mac="f00000000102" > > dst_mac="000000000101" > > src_ip=`ip_to_hex 10 0 1 2` > > @@ -28481,10 +28503,10 @@ AT_CHECK([as hv1 ovs-ofctl dump-flows br-int | grep -E "table=11, n_packets=1,.* > > ]) > > > > # Use the router IP as SNAT IP. > > -ovn-nbctl set logical_router r1 options:lb_force_snat_ip=10.0.1.1 > > +ovn-nbctl lr-nat-add r1 snat 10.0.1.1 10.8.8.0/24 > > ovn-nbctl --wait=hv sync > > > > -# Send ip packets from p1 to lrp-r1-s1 > > +# Send ip packets from p1 to lrp-r1-ext > > src_mac="f00000000102" > > dst_mac="000000000101" > > src_ip=`ip_to_hex 10 0 1 2` > > @@ -28499,11 +28521,53 @@ AT_CHECK([as hv1 ovs-ofctl dump-flows br-int | grep "actions=controller" | grep > > ]) > > > > # The packet should've been dropped in the lr_in_arp_resolve stage. > > -AT_CHECK([as hv1 ovs-ofctl dump-flows br-int | grep -E "table=25, n_packets=1,.* priority=2,ip,metadata=0x${sw_key},nw_dst=10.0.1.1 actions=drop" -c], [0], [dnl > > +if test X"$1" = X"DGP"; then > > + prio=150 > > + inport=reg14 > > + outport=reg15 > > +else > > + prio=2 > > +fi > > +AT_CHECK([as hv1 ovs-ofctl dump-flows br-int | grep -E "table=25, n_packets=1,.* priority=$prio,ip,$inport.*$outport.*metadata=0x${sw_key},nw_dst=10.0.1.1 actions=drop" -c], [0], [dnl > > 1 > > ]) > > > > +# Send ip packets from p2 to lrp-r1-ext > > +src_mac="f00000000202" > > +dst_mac="000000000201" > > +src_ip=`ip_to_hex 10 0 2 2` > > +dst_ip=`ip_to_hex 10 0 1 1` > > +packet=${dst_mac}${src_mac}08004500001c0000000040110000${src_ip}${dst_ip}0035111100080000 > > +as hv1 ovs-appctl netdev-dummy/receive hv1-vif2 $packet > > + > > +# Still no packet-ins should reach ovn-controller. > > +AT_CHECK([as hv1 ovs-ofctl dump-flows br-int | grep "actions=controller" | grep -v n_packets=0 -c], [1], [dnl > > +0 > > +]) > > + > > +if test X"$1" = X"DGP"; then > > + # The packet dst should be resolved once for E/W centralized NAT purpose. > > + AT_CHECK([as hv1 ovs-ofctl dump-flows br-int | grep -E "table=25, n_packets=1,.* priority=100,reg0=0xa000101,reg15=.*metadata=0x${sw_key} actions=mod_dl_dst:00:00:00:00:01:01,resubmit" -c], [0], [dnl > > +1 > > +]) > > +fi > > + > > +# The packet should've been finally dropped in the lr_in_arp_resolve stage. > > +AT_CHECK([as hv1 ovs-ofctl dump-flows br-int | grep -E "table=25, n_packets=2,.* priority=$prio,ip,$inport.*$outport.*metadata=0x${sw_key},nw_dst=10.0.1.1 actions=drop" -c], [0], [dnl > > +1 > > +]) > > OVN_CLEANUP([hv1]) > > +]) > > + > > +OVN_FOR_EACH_NORTHD([ > > +AT_SETUP([gateway router drop traffic for own IPs]) > > +TEST_LR_DROP_TRAFFIC_FOR_OWN_IPS(GR) > > +AT_CLEANUP > > +]) > > + > > +OVN_FOR_EACH_NORTHD([ > > +AT_SETUP([distributed gateway port drop traffic for own IPs]) > > +TEST_LR_DROP_TRAFFIC_FOR_OWN_IPS(DGP) > > AT_CLEANUP > > ]) > > > _______________________________________________ dev mailing list [email protected] https://mail.openvswitch.org/mailman/listinfo/ovs-dev
