On 1/10/23 22:43, Han Zhou wrote:


On Mon, Jan 9, 2023 at 7:59 AM Mark Michelson <[email protected] <mailto:[email protected]>> wrote:
 >
 > Hi Han, thanks for this change and thanks for putting in the work to
 > debug the issue and to explain how the problem manifests.
 >
 > I only have a couple of small comments below.
 >
 > On 12/19/22 19:05, Han Zhou wrote:
 > > When a packet enters LR pipeline from a distributed gateway port with
 > > destination IP being a SNAT IP, it goes through the unSNAT stage and
 > > it is possible that the unSNAT fails to convert the dst IP when no
 > > conntrack entries are accociated with the packet. In this case, the
 > > packet is rerouted to the same DGP, and results in recirc loop in
 > > datapath. The packet would finally be dropped either due to ttl or
 > > the recirc limit, but it would have created unnecessary cost.
 > >
> > To reproduce the problem, simply configure SNAT on a LR with the SNAT IP
 > > being the DGP's IP, and then send a packet from external (DGP's LS) to
 > > the SNAT IP. Kernel logs like below will be seen:
 > >
> > openvswitch: ovs-system: deferred action limit reached, drop recirc action
 > >
 > > DP flow dump would also show plenty of flows related to this packet,
 > > each with a different ttl match, indicating the packet has been looped
 > > many times.
 > >
 > > Commit 802f9275ac (ovn-northd: Drop IP packets destined to router owned
 > > IPs (after NAT)) already added flows to drop packets failed unSNAT for
 > > Gateway Routers. It added flows with a low priority (2) to drop the
 > > packets that fail ARP resolve, to avoid triggering ARP request for the
 > > SNAT IPs. However, for the DGP case, to support E/W NAT, ARP resolve
 > > flows are added for thoses NAT IPs so that the packets can continue the
> > pipeline and possibly redirect to redirect chassis. So, because of these
 > > ARP resolve flows, even the packets failed unSNAT would continue the
> > pipeline and won't hit the low priority (2) flows, thus not get dropped.
 > >
 > > To fix the problem, for each of the ARP resolve flow added for the DGP
 > > NAT IPs, a higher priority (150) flow is added to check if the packet's
 > > inport is the DGP (same as the outport), then drop the packet directly.
 > >
 > > Test cases are updated to cover both Gateway Router and DGP scenarios,
 > > with packets from both directions (uplink and downlink).
 > >
> > Reported-by: Krzysztof Klimonda <[email protected] <mailto:[email protected]>> > > Reported-at: https://patchwork.ozlabs.org/project/ovn/patch/[email protected]/ <https://patchwork.ozlabs.org/project/ovn/patch/[email protected]/> > > Reported-by: Frode Nordahl <[email protected] <mailto:[email protected]>> > > Reported-at: https://bugs.launchpad.net/ubuntu/+source/ovn/+bug/1967718 <https://bugs.launchpad.net/ubuntu/+source/ovn/+bug/1967718> > > Reported-by: Roberto Bartzen Acosta <[email protected] <mailto:[email protected]>> > > Reported-at: https://github.com/ovn-org/ovn/issues/134 <https://github.com/ovn-org/ovn/issues/134> > > Reported-by: Syed Ammad Ali <[email protected] <mailto:[email protected]>> > > Reported-at: https://github.com/ovn-org/ovn/issues/153 <https://github.com/ovn-org/ovn/issues/153> > > Reported-at: https://mail.openvswitch.org/pipermail/ovs-discuss/2021-August/051340.html <https://mail.openvswitch.org/pipermail/ovs-discuss/2021-August/051340.html>
 > > Signed-off-by: Han Zhou <[email protected] <mailto:[email protected]>>
 > > ---
 > >   northd/northd.c         | 17 ++++++++
 > >   northd/ovn-northd.8.xml | 29 +++++++++----
> >   tests/ovn.at <http://ovn.at>            | 92 ++++++++++++++++++++++++++++++++++-------
 > >   3 files changed, 116 insertions(+), 22 deletions(-)
 > >
 > > diff --git a/northd/northd.c b/northd/northd.c
 > > index 4751feab4f3c..ebd153b020fa 100644
 > > --- a/northd/northd.c
 > > +++ b/northd/northd.c
> > @@ -14341,6 +14341,23 @@ build_lrouter_nat_defrag_and_lb(struct ovn_datapath *od, struct hmap *lflows,
 > >               sset_add(&nat_entries, nat->external_ip);
 > >           } else {
 > >               if (!sset_contains(&nat_entries, nat->external_ip)) {
 > > +                /* Drop packets coming in from external but still has
> > +                 * destination IP equals to the NAT external IP, to avoid loop. > > +                 * The packets must have went through DNAT/unSNAT stage but
 > > +                 * failed to convert the destination. */
 > > +                ds_clear(match);
 > > +                ds_put_format(
> > +                    match, "inport == %s && outport == %s && ip%s.dst == %s",
 > > +                    l3dgw_port->json_key, l3dgw_port->json_key,
 > > +                    is_v6 ? "6" : "4", nat->external_ip);
 > > +                ovn_lflow_add_with_hint(lflows, od,
 > > +                                        S_ROUTER_IN_ARP_RESOLVE,
 > > +                                        150, ds_cstr(match),
 > > +                                        "drop;",
 >
 > With Adrian Moreno's series to sample all packet drops, we should not be
 > using a bare "drop;" action in ovn-northd any more. Instead, call the
 > debug_drop_action() function to fill in the proper drop action.

Hi Mark, thanks for pointing it out. I updated it in v2, but I do see some other places in northd.c still using the "drop" instead of debug_drop_action().

Hopefully those are all addressed by Adrian's follow-up series: https://patchwork.ozlabs.org/project/ovn/list/?series=333362 (specifically patch 2)


 >
 > > +                                        &nat->header_);
> > +                /* Now for packets coming from other (downlink) LRPs, allow ARP
 > > +                 * resolve for the NAT IP, so that such packets can be
 > > +                 * forwarded for E/W NAT. */
 > >                   ds_clear(match);
 > >                   ds_put_format(
 > >                       match, "outport == %s && %s == %s",
 > > diff --git a/northd/ovn-northd.8.xml b/northd/ovn-northd.8.xml
 > > index 25a742c904c6..e06f49b26ea4 100644
 > > --- a/northd/ovn-northd.8.xml
 > > +++ b/northd/ovn-northd.8.xml
 > > @@ -4262,13 +4262,17 @@ outport = <var>P</var>
> >             For each row in the <code>NAT</code> table with IPv4 address
 > >             <var>A</var> in the <ref column="external_ip"
 > >             table="NAT" db="OVN_Northbound"/> column of
 > > -          <ref table="NAT" db="OVN_Northbound"/> table, a priority-100
> > -          flow with the match <code>outport === <var>P</var> &amp;&amp; > > -          reg0 == <var>A</var></code> has actions <code>eth.dst = <var>E</var>; > > -          next;</code>, where <code>P</code> is the distributed logical router
 > > -          port, <var>E</var> is the Ethernet address if set in the
> > -          <ref column="external_mac" table="NAT" db="OVN_Northbound"/> column
 > > -          of <ref table="NAT" db="OVN_Northbound"/> table for of type
> > +          <ref table="NAT" db="OVN_Northbound"/> table, below two flows are
 > > +          programmed:
 > > +        </p>
 > > +
 > > +        <p>
> > +          A priority-100 flow with the match <code>outport == <var>P</var> > > +          &amp;&amp; reg0 == <var>A</var></code> has actions <code>eth.dst = > > +          <var>E</var>; next;</code>, where <code>P</code> is the distributed > > +          logical router port, <var>E</var> is the Ethernet address if set in > > +          the <ref column="external_mac" table="NAT" db="OVN_Northbound"/> > > +          column of <ref table="NAT" db="OVN_Northbound"/> table for of type > >             <code>dnat_and_snat</code>, otherwise the Ethernet address of the
 > >             distributed logical router port. Note that if the
> >             <ref column="external_ip" table="NAT" db="OVN_Northbound"/> is not
 > > @@ -4278,9 +4282,18 @@ outport = <var>P</var>
 > >             will be added.
 > >           </p>
 > >
 > > +        <p>
> > +          Corresponding to the above flow, a priority-150 flow with the match > > +          <code>inport == <var>P</var> &amp;&amp; outport == <var>P</var>
 > > +          &amp;&amp; ip4.dst == <var>A</var></code> has actions
 > > +          <code>drop;</code> to exclude packets that have went through
 >
 > Either "packets that went through" or "packets that have gone through"
 > would be suitable here. "Have went" is improper (I'd happily accept it
 > in code comments, but for documentation, we need to be more formal)
 >
Sorry about this. Fixed in V2 (also fixed in comments).

PTAL: https://patchwork.ozlabs.org/project/ovn/patch/[email protected]/ <https://patchwork.ozlabs.org/project/ovn/patch/[email protected]/>

Thanks,
Han

> > +          DNAT/unSNAT stage but failed to convert the destination, to avoid
 > > +          loop.
 > > +        </p>
 > > +
 > >           <p>
> >             For IPv6 NAT entries, same flows are added, but using the register
 > > -          <code>xxreg0</code> for the match.
> > +          <code>xxreg0</code> and field <code>ip6</code> for the match.
 > >           </p>
 > >         </li>
 > >
> > diff --git a/tests/ovn.at <http://ovn.at> b/tests/ovn.at <http://ovn.at>
 > > index d2163d87d823..dc47efdfd934 100644
 > > --- a/tests/ovn.at <http://ovn.at>
 > > +++ b/tests/ovn.at <http://ovn.at>
> > @@ -28419,24 +28419,39 @@ wait_row_count Port_Binding 1 logical_port=lsp-cont1 chassis=$ch
 > >   OVN_CLEANUP([hv1])
 > >   AT_CLEANUP
 > >
 > > +# TEST_LR_DROP_TRAFFIC_FOR_OWN_IPS [ DGP | GR ]
 > >   # Test dropping traffic destined to router owned IPs.
 > > -OVN_FOR_EACH_NORTHD([
 > > -AT_SETUP([gateway router drop traffic for own IPs])
 > > +m4_define([TEST_LR_DROP_TRAFFIC_FOR_OWN_IPS], [
 > >   ovn_start
 > >
 > > -ovn-nbctl lr-add r1 -- set logical_router r1 options:chassis=hv1
 > > -ovn-nbctl ls-add s1
 > > +ovn-nbctl lr-add r1 # Gateway router or LR with DGP on the ext side
 > > +ovn-nbctl ls-add ext # simulate external LS
 > > +ovn-nbctl ls-add s2 # simulate internal LS
 > >
 > > -# Connnect r1 to s1.
> > -ovn-nbctl lrp-add r1 lrp-r1-s1 00:00:00:00:01:01 10.0.1.1/24 <http://10.0.1.1/24> > > -ovn-nbctl lsp-add s1 lsp-s1-r1 -- set Logical_Switch_Port lsp-s1-r1 type=router \
 > > -    options:router-port=lrp-r1-s1 addresses=router
 > > +# Connnect r1 to ext.
> > +ovn-nbctl lrp-add r1 lrp-r1-ext 00:00:00:00:01:01 10.0.1.1/24 <http://10.0.1.1/24>
 > > +if test X"$1" = X"DGP"; then
 > > +    ovn-nbctl lrp-set-gateway-chassis lrp-r1-ext hv1 1
 > > +else
 > > +    ovn-nbctl set logical_router r1 options:chassis=hv1
 > > +fi
> > +ovn-nbctl lsp-add ext lsp-ext-r1 -- set Logical_Switch_Port lsp-ext-r1 type=router \
 > > +    options:router-port=lrp-r1-ext addresses=router
 > >
 > > -# Create logical port p1 in s1
 > > -ovn-nbctl lsp-add s1 p1 \
 > > +# Connnect r1 to s2.
> > +ovn-nbctl lrp-add r1 lrp-r1-s2 00:00:00:00:02:01 10.0.2.1/24 <http://10.0.2.1/24> > > +ovn-nbctl lsp-add s2 lsp-s2-r1 -- set Logical_Switch_Port lsp-s2-r1 type=router \
 > > +    options:router-port=lrp-r1-s2 addresses=router
 > > +
 > > +# Create logical port p1 in ext
 > > +ovn-nbctl lsp-add ext p1 \
 > >   -- lsp-set-addresses p1 "f0:00:00:00:01:02 10.0.1.2" \
 > >   -- lsp-set-port-security p1 "f0:00:00:00:01:02 10.0.1.2"
 > >
 > > +# Create logical port p2 in s2
 > > +ovn-nbctl lsp-add s2 p2 \
 > > +-- lsp-set-addresses p2 "f0:00:00:00:02:02 10.0.2.2"
 > > +
> >   # Create two hypervisor and create OVS ports corresponding to logical ports.
 > >   net_add n1
 > >
 > > @@ -28450,6 +28465,12 @@ ovs-vsctl -- add-port br-int hv1-vif1 -- \
 > >       options:rxq_pcap=hv1/vif1-rx.pcap \
 > >       ofport-request=1
 > >
 > > +ovs-vsctl -- add-port br-int hv1-vif2 -- \
 > > +    set interface hv1-vif2 external-ids:iface-id=p2 \
 > > +    options:tx_pcap=hv1/vif2-tx.pcap \
 > > +    options:rxq_pcap=hv1/vif2-rx.pcap \
 > > +    ofport-request=2
 > > +
 > >   # Pre-populate the hypervisors' ARP tables so that we don't lose any
 > >   # packets for ARP resolution (native tunneling doesn't queue packets
 > >   # for ARP resolution).
 > > @@ -28460,9 +28481,10 @@ ovn-nbctl --wait=hv sync
 > >
> >   sw_key=$(ovn-sbctl --bare --columns tunnel_key list datapath_binding r1)
 > >
 > > +echo sw_key: $sw_key
> >   AT_CHECK([ovn-sbctl lflow-list | grep lr_in_arp_resolve | grep 10.0.1.1], [1], [])
 > >
 > > -# Send ip packets from p1 to lrp-r1-s1
 > > +# Send ip packets from p1 to lrp-r1-ext
 > >   src_mac="f00000000102"
 > >   dst_mac="000000000101"
 > >   src_ip=`ip_to_hex 10 0 1 2`
> > @@ -28481,10 +28503,10 @@ AT_CHECK([as hv1 ovs-ofctl dump-flows br-int | grep -E "table=11, n_packets=1,.*
 > >   ])
 > >
 > >   # Use the router IP as SNAT IP.
 > > -ovn-nbctl set logical_router r1 options:lb_force_snat_ip=10.0.1.1
 > > +ovn-nbctl lr-nat-add r1 snat 10.0.1.1 10.8.8.0/24 <http://10.8.8.0/24>
 > >   ovn-nbctl --wait=hv sync
 > >
 > > -# Send ip packets from p1 to lrp-r1-s1
 > > +# Send ip packets from p1 to lrp-r1-ext
 > >   src_mac="f00000000102"
 > >   dst_mac="000000000101"
 > >   src_ip=`ip_to_hex 10 0 1 2`
> > @@ -28499,11 +28521,53 @@ AT_CHECK([as hv1 ovs-ofctl dump-flows br-int | grep "actions=controller" | grep
 > >   ])
 > >
 > >   # The packet should've been dropped in the lr_in_arp_resolve stage.
> > -AT_CHECK([as hv1 ovs-ofctl dump-flows br-int | grep -E "table=25, n_packets=1,.* priority=2,ip,metadata=0x${sw_key},nw_dst=10.0.1.1 actions=drop" -c], [0], [dnl
 > > +if test X"$1" = X"DGP"; then
 > > +    prio=150
 > > +    inport=reg14
 > > +    outport=reg15
 > > +else
 > > +    prio=2
 > > +fi
> > +AT_CHECK([as hv1 ovs-ofctl dump-flows br-int | grep -E "table=25, n_packets=1,.* priority=$prio,ip,$inport.*$outport.*metadata=0x${sw_key},nw_dst=10.0.1.1 actions=drop" -c], [0], [dnl
 > >   1
 > >   ])
 > >
 > > +# Send ip packets from p2 to lrp-r1-ext
 > > +src_mac="f00000000202"
 > > +dst_mac="000000000201"
 > > +src_ip=`ip_to_hex 10 0 2 2`
 > > +dst_ip=`ip_to_hex 10 0 1 1`
> > +packet=${dst_mac}${src_mac}08004500001c0000000040110000${src_ip}${dst_ip}0035111100080000
 > > +as hv1 ovs-appctl netdev-dummy/receive hv1-vif2 $packet
 > > +
 > > +# Still no packet-ins should reach ovn-controller.
> > +AT_CHECK([as hv1 ovs-ofctl dump-flows br-int | grep "actions=controller" | grep -v n_packets=0 -c], [1], [dnl
 > > +0
 > > +])
 > > +
 > > +if test X"$1" = X"DGP"; then
> > +    # The packet dst should be resolved once for E/W centralized NAT purpose. > > +    AT_CHECK([as hv1 ovs-ofctl dump-flows br-int | grep -E "table=25, n_packets=1,.* priority=100,reg0=0xa000101,reg15=.*metadata=0x${sw_key} actions=mod_dl_dst:00:00:00:00:01:01,resubmit" -c], [0], [dnl
 > > +1
 > > +])
 > > +fi
 > > +
> > +# The packet should've been finally dropped in the lr_in_arp_resolve stage. > > +AT_CHECK([as hv1 ovs-ofctl dump-flows br-int | grep -E "table=25, n_packets=2,.* priority=$prio,ip,$inport.*$outport.*metadata=0x${sw_key},nw_dst=10.0.1.1 actions=drop" -c], [0], [dnl
 > > +1
 > > +])
 > >   OVN_CLEANUP([hv1])
 > > +])
 > > +
 > > +OVN_FOR_EACH_NORTHD([
 > > +AT_SETUP([gateway router drop traffic for own IPs])
 > > +TEST_LR_DROP_TRAFFIC_FOR_OWN_IPS(GR)
 > > +AT_CLEANUP
 > > +])
 > > +
 > > +OVN_FOR_EACH_NORTHD([
 > > +AT_SETUP([distributed gateway port drop traffic for own IPs])
 > > +TEST_LR_DROP_TRAFFIC_FOR_OWN_IPS(DGP)
 > >   AT_CLEANUP
 > >   ])
 > >
 >

_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to