On 1/10/23 22:43, Han Zhou wrote:
On Mon, Jan 9, 2023 at 7:59 AM Mark Michelson <[email protected]
<mailto:[email protected]>> wrote:
>
> Hi Han, thanks for this change and thanks for putting in the work to
> debug the issue and to explain how the problem manifests.
>
> I only have a couple of small comments below.
>
> On 12/19/22 19:05, Han Zhou wrote:
> > When a packet enters LR pipeline from a distributed gateway port with
> > destination IP being a SNAT IP, it goes through the unSNAT stage and
> > it is possible that the unSNAT fails to convert the dst IP when no
> > conntrack entries are accociated with the packet. In this case, the
> > packet is rerouted to the same DGP, and results in recirc loop in
> > datapath. The packet would finally be dropped either due to ttl or
> > the recirc limit, but it would have created unnecessary cost.
> >
> > To reproduce the problem, simply configure SNAT on a LR with the
SNAT IP
> > being the DGP's IP, and then send a packet from external (DGP's LS) to
> > the SNAT IP. Kernel logs like below will be seen:
> >
> > openvswitch: ovs-system: deferred action limit reached, drop recirc
action
> >
> > DP flow dump would also show plenty of flows related to this packet,
> > each with a different ttl match, indicating the packet has been looped
> > many times.
> >
> > Commit 802f9275ac (ovn-northd: Drop IP packets destined to router owned
> > IPs (after NAT)) already added flows to drop packets failed unSNAT for
> > Gateway Routers. It added flows with a low priority (2) to drop the
> > packets that fail ARP resolve, to avoid triggering ARP request for the
> > SNAT IPs. However, for the DGP case, to support E/W NAT, ARP resolve
> > flows are added for thoses NAT IPs so that the packets can continue the
> > pipeline and possibly redirect to redirect chassis. So, because of
these
> > ARP resolve flows, even the packets failed unSNAT would continue the
> > pipeline and won't hit the low priority (2) flows, thus not get
dropped.
> >
> > To fix the problem, for each of the ARP resolve flow added for the DGP
> > NAT IPs, a higher priority (150) flow is added to check if the packet's
> > inport is the DGP (same as the outport), then drop the packet directly.
> >
> > Test cases are updated to cover both Gateway Router and DGP scenarios,
> > with packets from both directions (uplink and downlink).
> >
> > Reported-by: Krzysztof Klimonda <[email protected]
<mailto:[email protected]>>
> > Reported-at:
https://patchwork.ozlabs.org/project/ovn/patch/[email protected]/ <https://patchwork.ozlabs.org/project/ovn/patch/[email protected]/>
> > Reported-by: Frode Nordahl <[email protected]
<mailto:[email protected]>>
> > Reported-at:
https://bugs.launchpad.net/ubuntu/+source/ovn/+bug/1967718
<https://bugs.launchpad.net/ubuntu/+source/ovn/+bug/1967718>
> > Reported-by: Roberto Bartzen Acosta <[email protected]
<mailto:[email protected]>>
> > Reported-at: https://github.com/ovn-org/ovn/issues/134
<https://github.com/ovn-org/ovn/issues/134>
> > Reported-by: Syed Ammad Ali <[email protected]
<mailto:[email protected]>>
> > Reported-at: https://github.com/ovn-org/ovn/issues/153
<https://github.com/ovn-org/ovn/issues/153>
> > Reported-at:
https://mail.openvswitch.org/pipermail/ovs-discuss/2021-August/051340.html <https://mail.openvswitch.org/pipermail/ovs-discuss/2021-August/051340.html>
> > Signed-off-by: Han Zhou <[email protected] <mailto:[email protected]>>
> > ---
> > northd/northd.c | 17 ++++++++
> > northd/ovn-northd.8.xml | 29 +++++++++----
> > tests/ovn.at <http://ovn.at> | 92
++++++++++++++++++++++++++++++++++-------
> > 3 files changed, 116 insertions(+), 22 deletions(-)
> >
> > diff --git a/northd/northd.c b/northd/northd.c
> > index 4751feab4f3c..ebd153b020fa 100644
> > --- a/northd/northd.c
> > +++ b/northd/northd.c
> > @@ -14341,6 +14341,23 @@ build_lrouter_nat_defrag_and_lb(struct
ovn_datapath *od, struct hmap *lflows,
> > sset_add(&nat_entries, nat->external_ip);
> > } else {
> > if (!sset_contains(&nat_entries, nat->external_ip)) {
> > + /* Drop packets coming in from external but still has
> > + * destination IP equals to the NAT external IP,
to avoid loop.
> > + * The packets must have went through DNAT/unSNAT
stage but
> > + * failed to convert the destination. */
> > + ds_clear(match);
> > + ds_put_format(
> > + match, "inport == %s && outport == %s &&
ip%s.dst == %s",
> > + l3dgw_port->json_key, l3dgw_port->json_key,
> > + is_v6 ? "6" : "4", nat->external_ip);
> > + ovn_lflow_add_with_hint(lflows, od,
> > + S_ROUTER_IN_ARP_RESOLVE,
> > + 150, ds_cstr(match),
> > + "drop;",
>
> With Adrian Moreno's series to sample all packet drops, we should not be
> using a bare "drop;" action in ovn-northd any more. Instead, call the
> debug_drop_action() function to fill in the proper drop action.
Hi Mark, thanks for pointing it out. I updated it in v2, but I do see
some other places in northd.c still using the "drop" instead of
debug_drop_action().
Hopefully those are all addressed by Adrian's follow-up series:
https://patchwork.ozlabs.org/project/ovn/list/?series=333362
(specifically patch 2)
>
> > + &nat->header_);
> > + /* Now for packets coming from other (downlink)
LRPs, allow ARP
> > + * resolve for the NAT IP, so that such packets can be
> > + * forwarded for E/W NAT. */
> > ds_clear(match);
> > ds_put_format(
> > match, "outport == %s && %s == %s",
> > diff --git a/northd/ovn-northd.8.xml b/northd/ovn-northd.8.xml
> > index 25a742c904c6..e06f49b26ea4 100644
> > --- a/northd/ovn-northd.8.xml
> > +++ b/northd/ovn-northd.8.xml
> > @@ -4262,13 +4262,17 @@ outport = <var>P</var>
> > For each row in the <code>NAT</code> table with IPv4
address
> > <var>A</var> in the <ref column="external_ip"
> > table="NAT" db="OVN_Northbound"/> column of
> > - <ref table="NAT" db="OVN_Northbound"/> table, a priority-100
> > - flow with the match <code>outport === <var>P</var>
&&
> > - reg0 == <var>A</var></code> has actions <code>eth.dst =
<var>E</var>;
> > - next;</code>, where <code>P</code> is the distributed
logical router
> > - port, <var>E</var> is the Ethernet address if set in the
> > - <ref column="external_mac" table="NAT"
db="OVN_Northbound"/> column
> > - of <ref table="NAT" db="OVN_Northbound"/> table for of type
> > + <ref table="NAT" db="OVN_Northbound"/> table, below two
flows are
> > + programmed:
> > + </p>
> > +
> > + <p>
> > + A priority-100 flow with the match <code>outport ==
<var>P</var>
> > + && reg0 == <var>A</var></code> has actions
<code>eth.dst =
> > + <var>E</var>; next;</code>, where <code>P</code> is the
distributed
> > + logical router port, <var>E</var> is the Ethernet
address if set in
> > + the <ref column="external_mac" table="NAT"
db="OVN_Northbound"/>
> > + column of <ref table="NAT" db="OVN_Northbound"/> table
for of type
> > <code>dnat_and_snat</code>, otherwise the Ethernet
address of the
> > distributed logical router port. Note that if the
> > <ref column="external_ip" table="NAT"
db="OVN_Northbound"/> is not
> > @@ -4278,9 +4282,18 @@ outport = <var>P</var>
> > will be added.
> > </p>
> >
> > + <p>
> > + Corresponding to the above flow, a priority-150 flow
with the match
> > + <code>inport == <var>P</var> && outport ==
<var>P</var>
> > + && ip4.dst == <var>A</var></code> has actions
> > + <code>drop;</code> to exclude packets that have went through
>
> Either "packets that went through" or "packets that have gone through"
> would be suitable here. "Have went" is improper (I'd happily accept it
> in code comments, but for documentation, we need to be more formal)
>
Sorry about this. Fixed in V2 (also fixed in comments).
PTAL:
https://patchwork.ozlabs.org/project/ovn/patch/[email protected]/ <https://patchwork.ozlabs.org/project/ovn/patch/[email protected]/>
Thanks,
Han
> > + DNAT/unSNAT stage but failed to convert the destination,
to avoid
> > + loop.
> > + </p>
> > +
> > <p>
> > For IPv6 NAT entries, same flows are added, but using
the register
> > - <code>xxreg0</code> for the match.
> > + <code>xxreg0</code> and field <code>ip6</code> for the
match.
> > </p>
> > </li>
> >
> > diff --git a/tests/ovn.at <http://ovn.at> b/tests/ovn.at
<http://ovn.at>
> > index d2163d87d823..dc47efdfd934 100644
> > --- a/tests/ovn.at <http://ovn.at>
> > +++ b/tests/ovn.at <http://ovn.at>
> > @@ -28419,24 +28419,39 @@ wait_row_count Port_Binding 1
logical_port=lsp-cont1 chassis=$ch
> > OVN_CLEANUP([hv1])
> > AT_CLEANUP
> >
> > +# TEST_LR_DROP_TRAFFIC_FOR_OWN_IPS [ DGP | GR ]
> > # Test dropping traffic destined to router owned IPs.
> > -OVN_FOR_EACH_NORTHD([
> > -AT_SETUP([gateway router drop traffic for own IPs])
> > +m4_define([TEST_LR_DROP_TRAFFIC_FOR_OWN_IPS], [
> > ovn_start
> >
> > -ovn-nbctl lr-add r1 -- set logical_router r1 options:chassis=hv1
> > -ovn-nbctl ls-add s1
> > +ovn-nbctl lr-add r1 # Gateway router or LR with DGP on the ext side
> > +ovn-nbctl ls-add ext # simulate external LS
> > +ovn-nbctl ls-add s2 # simulate internal LS
> >
> > -# Connnect r1 to s1.
> > -ovn-nbctl lrp-add r1 lrp-r1-s1 00:00:00:00:01:01 10.0.1.1/24
<http://10.0.1.1/24>
> > -ovn-nbctl lsp-add s1 lsp-s1-r1 -- set Logical_Switch_Port
lsp-s1-r1 type=router \
> > - options:router-port=lrp-r1-s1 addresses=router
> > +# Connnect r1 to ext.
> > +ovn-nbctl lrp-add r1 lrp-r1-ext 00:00:00:00:01:01 10.0.1.1/24
<http://10.0.1.1/24>
> > +if test X"$1" = X"DGP"; then
> > + ovn-nbctl lrp-set-gateway-chassis lrp-r1-ext hv1 1
> > +else
> > + ovn-nbctl set logical_router r1 options:chassis=hv1
> > +fi
> > +ovn-nbctl lsp-add ext lsp-ext-r1 -- set Logical_Switch_Port
lsp-ext-r1 type=router \
> > + options:router-port=lrp-r1-ext addresses=router
> >
> > -# Create logical port p1 in s1
> > -ovn-nbctl lsp-add s1 p1 \
> > +# Connnect r1 to s2.
> > +ovn-nbctl lrp-add r1 lrp-r1-s2 00:00:00:00:02:01 10.0.2.1/24
<http://10.0.2.1/24>
> > +ovn-nbctl lsp-add s2 lsp-s2-r1 -- set Logical_Switch_Port
lsp-s2-r1 type=router \
> > + options:router-port=lrp-r1-s2 addresses=router
> > +
> > +# Create logical port p1 in ext
> > +ovn-nbctl lsp-add ext p1 \
> > -- lsp-set-addresses p1 "f0:00:00:00:01:02 10.0.1.2" \
> > -- lsp-set-port-security p1 "f0:00:00:00:01:02 10.0.1.2"
> >
> > +# Create logical port p2 in s2
> > +ovn-nbctl lsp-add s2 p2 \
> > +-- lsp-set-addresses p2 "f0:00:00:00:02:02 10.0.2.2"
> > +
> > # Create two hypervisor and create OVS ports corresponding to
logical ports.
> > net_add n1
> >
> > @@ -28450,6 +28465,12 @@ ovs-vsctl -- add-port br-int hv1-vif1 -- \
> > options:rxq_pcap=hv1/vif1-rx.pcap \
> > ofport-request=1
> >
> > +ovs-vsctl -- add-port br-int hv1-vif2 -- \
> > + set interface hv1-vif2 external-ids:iface-id=p2 \
> > + options:tx_pcap=hv1/vif2-tx.pcap \
> > + options:rxq_pcap=hv1/vif2-rx.pcap \
> > + ofport-request=2
> > +
> > # Pre-populate the hypervisors' ARP tables so that we don't lose any
> > # packets for ARP resolution (native tunneling doesn't queue packets
> > # for ARP resolution).
> > @@ -28460,9 +28481,10 @@ ovn-nbctl --wait=hv sync
> >
> > sw_key=$(ovn-sbctl --bare --columns tunnel_key list
datapath_binding r1)
> >
> > +echo sw_key: $sw_key
> > AT_CHECK([ovn-sbctl lflow-list | grep lr_in_arp_resolve | grep
10.0.1.1], [1], [])
> >
> > -# Send ip packets from p1 to lrp-r1-s1
> > +# Send ip packets from p1 to lrp-r1-ext
> > src_mac="f00000000102"
> > dst_mac="000000000101"
> > src_ip=`ip_to_hex 10 0 1 2`
> > @@ -28481,10 +28503,10 @@ AT_CHECK([as hv1 ovs-ofctl dump-flows
br-int | grep -E "table=11, n_packets=1,.*
> > ])
> >
> > # Use the router IP as SNAT IP.
> > -ovn-nbctl set logical_router r1 options:lb_force_snat_ip=10.0.1.1
> > +ovn-nbctl lr-nat-add r1 snat 10.0.1.1 10.8.8.0/24 <http://10.8.8.0/24>
> > ovn-nbctl --wait=hv sync
> >
> > -# Send ip packets from p1 to lrp-r1-s1
> > +# Send ip packets from p1 to lrp-r1-ext
> > src_mac="f00000000102"
> > dst_mac="000000000101"
> > src_ip=`ip_to_hex 10 0 1 2`
> > @@ -28499,11 +28521,53 @@ AT_CHECK([as hv1 ovs-ofctl dump-flows
br-int | grep "actions=controller" | grep
> > ])
> >
> > # The packet should've been dropped in the lr_in_arp_resolve stage.
> > -AT_CHECK([as hv1 ovs-ofctl dump-flows br-int | grep -E "table=25,
n_packets=1,.* priority=2,ip,metadata=0x${sw_key},nw_dst=10.0.1.1
actions=drop" -c], [0], [dnl
> > +if test X"$1" = X"DGP"; then
> > + prio=150
> > + inport=reg14
> > + outport=reg15
> > +else
> > + prio=2
> > +fi
> > +AT_CHECK([as hv1 ovs-ofctl dump-flows br-int | grep -E "table=25,
n_packets=1,.*
priority=$prio,ip,$inport.*$outport.*metadata=0x${sw_key},nw_dst=10.0.1.1 actions=drop" -c], [0], [dnl
> > 1
> > ])
> >
> > +# Send ip packets from p2 to lrp-r1-ext
> > +src_mac="f00000000202"
> > +dst_mac="000000000201"
> > +src_ip=`ip_to_hex 10 0 2 2`
> > +dst_ip=`ip_to_hex 10 0 1 1`
> >
+packet=${dst_mac}${src_mac}08004500001c0000000040110000${src_ip}${dst_ip}0035111100080000
> > +as hv1 ovs-appctl netdev-dummy/receive hv1-vif2 $packet
> > +
> > +# Still no packet-ins should reach ovn-controller.
> > +AT_CHECK([as hv1 ovs-ofctl dump-flows br-int | grep
"actions=controller" | grep -v n_packets=0 -c], [1], [dnl
> > +0
> > +])
> > +
> > +if test X"$1" = X"DGP"; then
> > + # The packet dst should be resolved once for E/W centralized
NAT purpose.
> > + AT_CHECK([as hv1 ovs-ofctl dump-flows br-int | grep -E
"table=25, n_packets=1,.*
priority=100,reg0=0xa000101,reg15=.*metadata=0x${sw_key}
actions=mod_dl_dst:00:00:00:00:01:01,resubmit" -c], [0], [dnl
> > +1
> > +])
> > +fi
> > +
> > +# The packet should've been finally dropped in the
lr_in_arp_resolve stage.
> > +AT_CHECK([as hv1 ovs-ofctl dump-flows br-int | grep -E "table=25,
n_packets=2,.*
priority=$prio,ip,$inport.*$outport.*metadata=0x${sw_key},nw_dst=10.0.1.1 actions=drop" -c], [0], [dnl
> > +1
> > +])
> > OVN_CLEANUP([hv1])
> > +])
> > +
> > +OVN_FOR_EACH_NORTHD([
> > +AT_SETUP([gateway router drop traffic for own IPs])
> > +TEST_LR_DROP_TRAFFIC_FOR_OWN_IPS(GR)
> > +AT_CLEANUP
> > +])
> > +
> > +OVN_FOR_EACH_NORTHD([
> > +AT_SETUP([distributed gateway port drop traffic for own IPs])
> > +TEST_LR_DROP_TRAFFIC_FOR_OWN_IPS(DGP)
> > AT_CLEANUP
> > ])
> >
>
_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev