On Mon, Mar 22, 2021 at 2:16 AM Krzysztof Klimonda <
kklimo...@syntaxhighlighted.com> wrote:
>
> If there are snat entries on the router, and some logical_ip are set to
> network instead of an IP address then given SNAT is masquerade. In such
> case ct_snat action is used in lr_in_unsnat table to ensure that the
> packet is matched against conntrack and destination IP is replaced with
> one from matching conntrack entry.
>
> This however breaks down for new connections sent to router's external IP
> address. In such case, when packet is checked against conntrack table,
> there is no match, and its destination IP remains unchanged. This causes a
> loop in lr_in_ip_routing.
>
> This commit installs a new logical flow in lr_in_ip_routing table for
> routers that have SNAT entry with logical_ip set to network (that being
> masquerade). This flow drops packets that, after going through conntrack
> via ct_snat action in lr_in_unsnat table, are not in established or
> related state (!ct.est && !ct.rel) and which destination IP still matches
> router's external IP. This prevents vswitchd from looping such packets
> until their TTL reaches zero, as well as installing bogus flows in
> datapath that lead to ovs module dropping such packages with "deferred
> action limit reached, drop recirc action" message.
>
> Signed-off-by: Krzysztof Klimonda <kklimo...@syntaxhighlighted.com>

Thanks for the patch. As mentioned in the discussion ML could you post more
details on how the loop happens?
Please see some more comments below.

> ---
>  northd/ovn-northd.c |  57 +++++++++++++++++++++++
>  tests/ovn.at        |  45 ++++++++++++++++++
>  tests/system-ovn.at | 108 ++++++++++++++++++++++++++++++++++++++++++++
>  3 files changed, 210 insertions(+)
>
> diff --git a/northd/ovn-northd.c b/northd/ovn-northd.c
> index 4783e43d7..ea7db3d47 100644
> --- a/northd/ovn-northd.c
> +++ b/northd/ovn-northd.c
> @@ -11304,6 +11304,7 @@ build_lrouter_nat_defrag_and_lb(struct
ovn_datapath *od,
>          ovs_be32 ip, mask;
>          struct in6_addr ipv6, mask_v6, v6_exact = IN6ADDR_EXACT_INIT;
>          bool is_v6 = false;
> +        bool is_masquerade = false;
>          bool stateless = lrouter_nat_is_stateless(nat);
>          struct nbrec_address_set *allowed_ext_ips =
>                                    nat->allowed_ext_ips;
> @@ -11343,9 +11344,15 @@ build_lrouter_nat_defrag_and_lb(struct
ovn_datapath *od,
>          if (is_v6) {
>              error = ipv6_parse_masked(nat->logical_ip, &ipv6, &mask_v6);
>              cidr_bits = ipv6_count_cidr_bits(&mask_v6);
> +            if (cidr_bits < 128) {
> +                is_masquerade = true;
> +            }
>          } else {
>              error = ip_parse_masked(nat->logical_ip, &ip, &mask);
>              cidr_bits = ip_count_cidr_bits(mask);
> +            if (cidr_bits < 32) {
> +                is_masquerade = 32;

Typo? s/32/true

> +            }
>          }
>          if (!strcmp(nat->type, "snat")) {
>              if (error) {
> @@ -11396,6 +11403,56 @@ build_lrouter_nat_defrag_and_lb(struct
ovn_datapath *od,
>          build_lrouter_in_dnat_flow(lflows, od, nat, match, actions,
distributed,
>                                     mask, is_v6);
>
> +        /* When router have SNAT enabled, and logical_ip is a network
(router
> +         * is doing masquerade), then we need to make sure that packets
> +         * unrelated to any established connection that still have
router's
> +         * external IP as a next hop after going through lr_in_unsnat
table
> +         * are dropped properly. Otherwise such packets will loop around
> +         * between tables until their ttl reaches zero - this
additionally
> +         * causes kernel module to drop such packages due to
recirculation
> +         * limit being exceeded.
> +         *
> +         * Install a logical flow in lr_in_ip_routing table that will
> +         * match packet with router's external IP that have no related
> +         * conntrack entries and drop them. Flow priority must be higher
> +         * than any other flow in lr_in_ip_routing that matches router's
> +         * external IP.
> +         *
> +         * The priority for destination routes is calculated as
> +         * (prefix length * 2) + 1, and there is an additional flow
> +         * for when BFD is in play with priority + 1. Set priority that
> +         * is higher than any other potential routing flow for that
> +         * network, that is (prefix length * 2) + offset, where offset
> +         * is 1 (dst) + 1 (bfd) + 1. */
> +        if (is_masquerade) {
> +            uint16_t priority, prefix_length, offset;
> +
> +            if (is_v6) {
> +                prefix_length = 128;
> +            } else {
> +                prefix_length = 32;
> +            }
> +            offset = 3;
> +            priority = (prefix_length * 2) + offset;
> +
> +            ds_clear(match);
> +
> +            if (is_v6) {
> +                ds_put_format(match,
> +                              "ct.new && ip6.dst == %s",
> +                              nat->external_ip);
> +            } else {
> +                ds_put_format(match,
> +                              "ct.new && ip4.dst == %s",
> +                              nat->external_ip);
> +            }
> +
In the current upstream code UNSNAT and CT is combined into the same stage.
So if CT entry is matched it will perform UNSNAT directly. In that case I
think we don't need to match ct.new or any CT state here, because if there
is CT entry matched it means the ip.dst is UNSNATted already and it won't
match the nat->external_ip.

Thanks,
Han

> +            ovn_lflow_add_unique_with_hint(lflows, od,
> +                                           S_ROUTER_IN_IP_ROUTING,
priority,
> +                                           ds_cstr(match), "drop;",
> +                                           &nat->header_);
> +        }
> +
>          /* ARP resolve for NAT IPs. */
>          if (od->l3dgw_port) {
>              if (!strcmp(nat->type, "snat")) {
> diff --git a/tests/ovn.at b/tests/ovn.at
> index b751d6db2..b40eaeaf6 100644
> --- a/tests/ovn.at
> +++ b/tests/ovn.at
> @@ -23220,6 +23220,51 @@ OVN_CLEANUP([hv1])
>  AT_CLEANUP
>  ])
>
> +OVN_FOR_EACH_NORTHD([
> +AT_SETUP([ovn -- SNAT handling of unrelated packets with masquerade"])
> +ovn_start
> +
> +ovn-nbctl lr-add R1
> +
> +ovn-nbctl ls-add sw0
> +ovn-nbctl ls-add public
> +
> +ovn-nbctl lsp-add sw0 sw0-port1 \
> +          -- lsp-set-addresses sw0-port1 "00:00:00:01:02:03 192.168.1.3"
> +
> +ovn-nbctl lrp-add R1 rp-sw0 00:00:01:01:02:03 192.168.1.1/24
> +ovn-nbctl lrp-add R1 rp-public 00:00:02:01:02:03 172.16.1.254/24 \
> +    -- lrp-set-gateway-chassis rp-public hv1
> +
> +ovn-nbctl lsp-add sw0 sw0-rp -- set Logical_Switch_Port sw0-rp \
> +    type=router options:router-port=rp-sw0 \
> +    -- lsp-set-addresses sw0-rp router
> +
> +ovn-nbctl lsp-add public public-rp -- set Logical_Switch_Port public-rp \
> +    type=router options:router-port=rp-public \
> +    -- lsp-set-addresses public-rp router
> +
> +AT_CHECK([ovn-nbctl lr-nat-add R1 dnat_and_snat 172.16.1.2 192.168.1.3
sw0-port1 00:00:00:01:02:03])
> +AT_CHECK([ovn-nbctl lr-nat-add R1 dnat 172.16.1.3 192.168.1.4])
> +AT_CHECK([check ovn-nbctl --wait=sb sync])
> +
> +# Before we add second NAT (masquerade) there should be no flows
> +# in lr_in_ip_routing that match against conntrack state.
> +AT_CHECK([ovn-sbctl lflow-list | grep -E "lr_in_ip_routing.*ct.new"],
[1], [])
> +
> +ovn-nbctl lr-nat-add R1 snat 172.16.1.254 192.168.1.0/24
> +AT_CHECK([check ovn-nbctl --wait=sb sync])
> +
> +AT_CHECK([ovn-sbctl lflow-list | grep -E
"lr_in_ip_routing.*ct.new.*drop"],
> +[0], [dnl
> +  table=10(lr_in_ip_routing   ), priority=67   dnl
> +, match=(ct.new && ip4.dst == 172.16.1.254), dnl
> +action=(drop;)
> +])
> +
> +AT_CLEANUP
> +])
> +
>  OVN_FOR_EACH_NORTHD([
>  AT_SETUP([ovn -- ARP replies for SNAT external ips])
>  ovn_start
> diff --git a/tests/system-ovn.at b/tests/system-ovn.at
> index 4885303d1..3edd993cd 100644
> --- a/tests/system-ovn.at
> +++ b/tests/system-ovn.at
> @@ -5600,6 +5600,114 @@ OVS_TRAFFIC_VSWITCHD_STOP(["/failed to query port
patch-.*/d
>  AT_CLEANUP
>  ])
>
> +OVN_FOR_EACH_NORTHD([
> +AT_SETUP([ovn -- SNAT handling of unrelated packets with masquerade"])
> +AT_SKIP_IF([test $HAVE_NC = no])
> +
> +ovn_start
> +OVS_TRAFFIC_VSWITCHD_START()
> +
> +ADD_BR([br-int])
> +ADD_BR([br-ext])
> +
> +ovs-ofctl add-flow br-ext action=normal
> +# Set external-ids in br-int needed for ovn-controller
> +ovs-vsctl \
> +        -- set Open_vSwitch . external-ids:system-id=hv1 \
> +        -- set Open_vSwitch .
external-ids:ovn-remote=unix:$ovs_base/ovn-sb/ovn-sb.sock \
> +        -- set Open_vSwitch . external-ids:ovn-encap-type=geneve \
> +        -- set Open_vSwitch . external-ids:ovn-encap-ip=169.0.0.1 \
> +        -- set bridge br-int fail-mode=secure
other-config:disable-in-band=true
> +
> +# Start ovn-controller
> +start_daemon ovn-controller
> +
> +ovn-nbctl lr-add R1
> +
> +ovn-nbctl ls-add sw0
> +ovn-nbctl ls-add public
> +
> +ovn-nbctl lrp-add R1 rp-sw0 00:00:01:01:02:03 192.168.1.1/24
> +ovn-nbctl lrp-add R1 rp-public 00:00:02:01:02:03 172.16.1.254/24 \
> +    -- lrp-set-gateway-chassis rp-public hv1
> +
> +ovn-nbctl lsp-add sw0 sw0-rp -- set Logical_Switch_Port sw0-rp \
> +    type=router options:router-port=rp-sw0 \
> +    -- lsp-set-addresses sw0-rp router
> +
> +ovn-nbctl lsp-add public public-rp -- set Logical_Switch_Port public-rp \
> +    type=router options:router-port=rp-public \
> +    -- lsp-set-addresses public-rp router
> +
> +ovn-nbctl lr-nat-add R1 snat 172.16.1.254 192.168.1.0/24
> +
> +ADD_NAMESPACES(sw01-x)
> +ADD_VETH(sw01-x, sw01-x, br-int, "192.168.1.2/24", "f0:00:00:01:02:03", \
> +         "192.168.1.1")
> +ovn-nbctl lsp-add sw0 sw01-x \
> +    -- lsp-set-addresses sw01-x "f0:00:00:01:02:03 192.168.1.2"
> +
> +OVS_WAIT_UNTIL([test x$(ovn-nbctl lsp-get-up sw01-x) = xup])
> +
> +ADD_NAMESPACES(ext-foo)
> +ADD_VETH(ext-foo, ext-foo, br-ext, "172.16.1.100/24",
"00:10:10:01:02:13", \
> +         "172.16.1.254")
> +
> +OVS_WAIT_UNTIL([test "$(ip netns exec ext-foo ip a | grep 172.16 | grep
tentative)" = ""])
> +
> +AT_CHECK([ovs-vsctl set Open_vSwitch .
external-ids:ovn-bridge-mappings=phynet:br-ext])
> +ovn-nbctl lsp-add public public1 \
> +        -- lsp-set-addresses public1 unknown \
> +        -- lsp-set-type public1 localnet \
> +        -- lsp-set-options public1 network_name=phynet
> +
> +ovn-nbctl --wait=hv sync
> +
> +# Send ping from ext-foo to router's external IP to verify that incoming
ICMP is working
> +NS_CHECK_EXEC([ext-foo], [ping -q -c 3 -i 0.3 -w 2 172.16.1.254 |
FORMAT_PING], \
> +[0], [dnl
> +3 packets transmitted, 3 received, 0% packet loss, time 0ms
> +])
> +
> +# Verify that we can establish an outgoing connection from host behind
> +# SNAT to one on the external network.
> +NS_CHECK_EXEC([ext-foo], [echo response | nc -l 80 &], [], [])
> +
> +NS_CHECK_EXEC([sw01-x], [nc -w 1 172.16.1.100 80], [0], [dnl
> +response
> +])
> +
> +AT_CHECK([ovs-appctl dpctl/flush-conntrack])
> +
> +# Check that incoming packets that are not part of any established
> +# connection are dropped.
> +NS_CHECK_EXEC([ext-foo], [nc -w 1 172.16.1.254 80], [1], [])
> +
> +# There shouldn't be any entry in conntrack table for that flow.
> +AT_CHECK([ovs-appctl dpctl/dump-conntrack], [0], [])
> +
> +# vswitchd should not complaing about recirculation depth being exceeded,
> +# which would indicate that packet is not properly dropped, but instead
> +# loops between vswitchd flow tables until its ttl is down to 0.
> +AT_CHECK([grep -qE 'Packet dropped. Max recirculation depth exceeded.'
ovs-vswitchd.log], [1])
> +
> +OVS_APP_EXIT_AND_WAIT([ovn-controller])
> +
> +as ovn-sb
> +OVS_APP_EXIT_AND_WAIT([ovsdb-server])
> +
> +as ovn-nb
> +OVS_APP_EXIT_AND_WAIT([ovsdb-server])
> +
> +as northd
> +OVS_APP_EXIT_AND_WAIT([ovn-northd])
> +
> +as
> +OVS_TRAFFIC_VSWITCHD_STOP(["/.*error receiving.*/d
> +/.*terminating with signal 15.*/d"])
> +AT_CLEANUP
> +])
> +
>  OVN_FOR_EACH_NORTHD([
>  AT_SETUP([ovn -- ARP resolution for SNAT IP])
>  ovn_start
> --
> 2.28.0
>
> _______________________________________________
> dev mailing list
> d...@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
_______________________________________________
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to