On 5/19/20 6:42 PM, Lorenzo Bianconi wrote:
> In order to fix the issues introduced by commit
> c0bf32d72f8b ("Manage ARP process locally in a DVR scenario "), restore
> previous configuration of table 9 in ingress router pipeline and
> introduce a new stage called 'ip_src_policy' used to set the src address
> info in order to not distribute FIP traffic if DVR is enabled
> 
> Fixes: c0bf32d72f8b ("Manage ARP process locally in a DVR scenario ")
> Tested-by: Jakub Libosvar <[email protected]>
> Signed-off-by: Lorenzo Bianconi <[email protected]>

Hi Lorenzo,

As I mentioned on the RFC patch, if there is a way to fix this issue
without a new pipeline stage, I think that would be preferable. I don't
see an easy way to do that, but maybe Han and others have suggestions.

> ---
> Changes since RFC:
> - added unit-tests
> ---
>  northd/ovn-northd.8.xml | 65 ++++++++++++++++++++---------------------
>  northd/ovn-northd.c     | 38 ++++++++++--------------
>  tests/ovn.at            | 28 +++++-------------
>  tests/system-ovn.at     | 28 ++++++++++++++++++
>  4 files changed, 82 insertions(+), 77 deletions(-)
> 
> diff --git a/northd/ovn-northd.8.xml b/northd/ovn-northd.8.xml
> index 8f224b07f..09dbb52b4 100644
> --- a/northd/ovn-northd.8.xml
> +++ b/northd/ovn-northd.8.xml
> @@ -2484,37 +2484,6 @@ output;
>          </p>
>        </li>
>  
> -      <li>
> -        <p>
> -          For distributed logical routers where one of the logical router ports
> -          specifies a <code>redirect-chassis</code>, a priority-400 logical
> -          flow for each <code>dnat_and_snat</code> NAT rules configured.
> -          These flows will allow to properly forward traffic to the external
> -          connections if available and avoid sending it through the tunnel.
> -          Assuming the following NAT rule has been configured:
> -        </p>
> -
> -        <pre>
> -external_ip = <var>A</var>;
> -external_mac = <var>B</var>;
> -logical_ip = <var>C</var>;
> -        </pre>
> -
> -        <p>
> -          the following action will be applied:
> -        </p>
> -
> -        <pre>
> -ip.ttl--;
> -reg0 = <var>ip.dst</var>;
> -reg1 = <var>A</var>;
> -eth.src = <var>B</var>;
> -outport = <var>router-port</var>;
> -next;
> -        </pre>
> -
> -      </li>
> -
>        <li>
>          <p>
>            IPv4 routing table.  For each route to IPv4 network <var>N</var> with
> @@ -2660,7 +2629,35 @@ outport = <var>P</var>;
>        </li>
>      </ul>
>  
> -    <h3>Ingress Table 12: ARP/ND Resolution</h3>
> +    <h3>Ingress Table 12: IP Source Policy</h3>
> +
> +    <p>
> +      This table contains for distributed logical routers where one of
> +      the logical router ports specifies a <code>redirect-chassis</code>,
> +      a priority-100 logical flow for each <code>dnat_and_snat</code>
> +      NAT rules configured.
> +      These flows will allow to properly forward traffic to the external
> +      connections if available and avoid sending it through the tunnel.
> +      Assuming the following NAT rule has been configured:
> +    </p>
> +
> +    <pre>
> +external_ip = <var>A</var>;
> +external_mac = <var>B</var>;
> +logical_ip = <var>C</var>;
> +    </pre>
> +
> +    <p>
> +      the following action will be applied:
> +    </p>
> +
> +    <pre>
> +reg1 = <var>A</var>;
> +eth.src = <var>B</var>;
> +next;
> +    </pre>
> +

This section of the man page shows the actions we apply to the traffic,
i.e., move external_ip into reg1 (for SNAT) and external_mac into
eth.src, but it doesn't mention which traffic matches these flows, that
is: "ip.src == logical_ip && is_chassis_resident(logical_port)".
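As a rough standalone sketch of what the man page could describe (not the
actual ovn-northd code: struct nat_rule and build_ip_src_policy_flow() are
made up for illustration, and the match format is inferred from the
"ip4.src == ... && is_chassis_resident(...)" checks the patch removes from
ovn.at), the per-NAT flow pairs that match with the patch's actions:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical, simplified stand-in for a dnat_and_snat NAT rule. */
struct nat_rule {
    bool is_ipv4;
    const char *external_ip;   /* A */
    const char *external_mac;  /* B */
    const char *logical_ip;    /* C */
    const char *logical_port;  /* port the FIP is bound to */
};

/* Format the match and actions of the per-NAT lr_in_ip_src_policy flow.
 * The match (assumed) selects traffic sourced from logical_ip whose
 * logical port is resident on this chassis; the actions mirror the
 * format string in the patch.  Returns true if both strings fit. */
bool
build_ip_src_policy_flow(const struct nat_rule *nat,
                         char *match, size_t match_len,
                         char *actions, size_t actions_len)
{
    assert(nat && match && actions);

    int m = snprintf(match, match_len,
                     "ip%s.src == %s && is_chassis_resident(\"%s\")",
                     nat->is_ipv4 ? "4" : "6", nat->logical_ip,
                     nat->logical_port);
    /* IPv6 addresses need the 128-bit xxreg1 instead of reg1. */
    int a = snprintf(actions, actions_len,
                     "eth.src = %s; %sreg1 = %s; next;",
                     nat->external_mac, nat->is_ipv4 ? "" : "xx",
                     nat->external_ip);
    return m >= 0 && (size_t) m < match_len
           && a >= 0 && (size_t) a < actions_len;
}
```

With the foo1 rule from the system test (172.16.1.3, 192.168.1.2,
00:00:02:02:03:04) this gives the match
ip4.src == 192.168.1.2 && is_chassis_resident("foo1") and the actions
eth.src = 00:00:02:02:03:04; reg1 = 172.16.1.3; next;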

> +    <h3>Ingress Table 13: ARP/ND Resolution</h3>
>  
>      <p>
>        Any packet that reaches this table is an IP packet whose next-hop
> @@ -2819,7 +2816,7 @@ outport = <var>P</var>;
>  
>      </ul>
>  
> -    <h3>Ingress Table 13: Check packet length</h3>
> +    <h3>Ingress Table 14: Check packet length</h3>
>  
>      <p>
>        For distributed logical routers with distributed gateway port configured
> @@ -2849,7 +2846,7 @@ REGBIT_PKT_LARGER = check_pkt_larger(<var>L</var>); next;
>        and advances to the next table.
>      </p>
>  
> -    <h3>Ingress Table 14: Handle larger packets</h3>
> +    <h3>Ingress Table 15: Handle larger packets</h3>
>  

The indices for the "Gateway Redirect" and "ARP Request" tables should
be updated too (to 16 and 17, respectively).
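To double-check the numbering, here is a tiny standalone sketch. The enum,
LR_IN_END and the title table are a hypothetical mirror of the patched
stage order in the ovn_stage hunk (not how ovn-northd.c declares it), with
the man-page titles taken from the headings in this thread:

```c
#include <stdio.h>

/* Hypothetical mirror of the tail of the patched ROUTER/IN stage list.
 * Each enumerator is the previous value plus one, so inserting
 * IP_SRC_POLICY after POLICY shifts every later table number by one. */
enum lr_in_stage {
    LR_IN_POLICY = 11,
    LR_IN_IP_SRC_POLICY,   /* new stage: table 12 */
    LR_IN_ARP_RESOLVE,     /* 13 */
    LR_IN_CHK_PKT_LEN,     /* 14 */
    LR_IN_LARGER_PKTS,     /* 15 */
    LR_IN_GW_REDIRECT,     /* 16 */
    LR_IN_ARP_REQUEST,     /* 17 */
    LR_IN_END              /* one past the last ingress table */
};

/* Man-page titles for the stages from IP_SRC_POLICY onwards, in order. */
static const char *const lr_in_titles[] = {
    "IP Source Policy",
    "ARP/ND Resolution",
    "Check packet length",
    "Handle larger packets",
    "Gateway Redirect",
    "ARP Request",
};

/* Fail the build if a stage is added without a matching title. */
_Static_assert(sizeof lr_in_titles / sizeof lr_in_titles[0]
               == LR_IN_END - LR_IN_IP_SRC_POLICY,
               "one man-page title per stage");

/* Print the "Ingress Table N: ..." headings the man page should use. */
void
print_lr_in_headings(void)
{
    for (int t = LR_IN_IP_SRC_POLICY; t < LR_IN_END; t++) {
        printf("Ingress Table %d: %s\n", t,
               lr_in_titles[t - LR_IN_IP_SRC_POLICY]);
    }
}
```

Calling print_lr_in_headings() ends with "Ingress Table 16: Gateway
Redirect" and "Ingress Table 17: ARP Request".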

>      <p>
>        For distributed logical routers with distributed gateway port configured
> diff --git a/northd/ovn-northd.c b/northd/ovn-northd.c
> index 3c0070ea7..d5f3997a9 100644
> --- a/northd/ovn-northd.c
> +++ b/northd/ovn-northd.c
> @@ -175,11 +175,12 @@ enum ovn_stage {
>      PIPELINE_STAGE(ROUTER, IN,  IP_ROUTING,      9, "lr_in_ip_routing")   \
>      PIPELINE_STAGE(ROUTER, IN,  IP_ROUTING_ECMP, 10, "lr_in_ip_routing_ecmp") \
>      PIPELINE_STAGE(ROUTER, IN,  POLICY,          11, "lr_in_policy")       \
> -    PIPELINE_STAGE(ROUTER, IN,  ARP_RESOLVE,     12, "lr_in_arp_resolve")  \
> -    PIPELINE_STAGE(ROUTER, IN,  CHK_PKT_LEN   ,  13, "lr_in_chk_pkt_len")   \
> -    PIPELINE_STAGE(ROUTER, IN,  LARGER_PKTS,     14,"lr_in_larger_pkts")   \
> -    PIPELINE_STAGE(ROUTER, IN,  GW_REDIRECT,     15, "lr_in_gw_redirect")  \
> -    PIPELINE_STAGE(ROUTER, IN,  ARP_REQUEST,     16, "lr_in_arp_request")  \
> +    PIPELINE_STAGE(ROUTER, IN,  IP_SRC_POLICY,   12, "lr_in_ip_src_policy") \
> +    PIPELINE_STAGE(ROUTER, IN,  ARP_RESOLVE,     13, "lr_in_arp_resolve")  \
> +    PIPELINE_STAGE(ROUTER, IN,  CHK_PKT_LEN   ,  14, "lr_in_chk_pkt_len")   \
> +    PIPELINE_STAGE(ROUTER, IN,  LARGER_PKTS,     15,"lr_in_larger_pkts")   \
> +    PIPELINE_STAGE(ROUTER, IN,  GW_REDIRECT,     16, "lr_in_gw_redirect")  \
> +    PIPELINE_STAGE(ROUTER, IN,  ARP_REQUEST,     17, "lr_in_arp_request")  \
>                                                                        \
>      /* Logical router egress stages. */                               \
>      PIPELINE_STAGE(ROUTER, OUT, UNDNAT,    0, "lr_out_undnat")        \
> @@ -7103,8 +7104,6 @@ build_routing_policy_flow(struct hmap *lflows, struct ovn_datapath *od,
>      ds_destroy(&actions);
>  }
>  
> -/* default logical flow prioriry for distributed routes */
> -#define DROUTE_PRIO 400
>  struct parsed_route {
>      struct ovs_list list_node;
>      struct v46_ip prefix;
> @@ -7493,7 +7492,7 @@ build_ecmp_route_flow(struct hmap *lflows, struct ovn_datapath *od,
>  }
>  
>  static void
> -add_distributed_routes(struct hmap *lflows, struct ovn_datapath *od)
> +add_ip_src_policy_flows(struct hmap *lflows, struct ovn_datapath *od)
>  {
>      struct ds actions = DS_EMPTY_INITIALIZER;
>      struct ds match = DS_EMPTY_INITIALIZER;
> @@ -7511,12 +7510,9 @@ add_distributed_routes(struct hmap *lflows, struct ovn_datapath *od)
>                        is_ipv4 ? "4" : "6", nat->logical_ip,
>                        nat->logical_port);
>          char *prefix = is_ipv4 ? "" : "xx";
> -        ds_put_format(&actions, "outport = %s; eth.src = %s; "
> -                      "%sreg0 = ip%s.dst; %sreg1 = %s; next;",
> -                      od->l3dgw_port->json_key, nat->external_mac,
> -                      prefix, is_ipv4 ? "4" : "6",
> -                      prefix, nat->external_ip);
> -        ovn_lflow_add(lflows, od, S_ROUTER_IN_IP_ROUTING, DROUTE_PRIO,
> +        ds_put_format(&actions, "eth.src = %s; %sreg1 = %s; next;",
> +                      nat->external_mac, prefix, nat->external_ip);
> +        ovn_lflow_add(lflows, od, S_ROUTER_IN_IP_SRC_POLICY, 100,
>                        ds_cstr(&match), ds_cstr(&actions));
>          ds_clear(&match);
>          ds_clear(&actions);
> @@ -7547,12 +7543,6 @@ add_route(struct hmap *lflows, const struct ovn_port *op,
>      }
>      build_route_match(op_inport, network_s, plen, is_src_route, is_ipv4,
>                        &match, &priority);
> -    /* traffic for internal IPs of logical switch ports must be sent to
> -     * the gw controller through the overlay tunnels
> -     */
> -    if (op->nbrp && !op->nbrp->n_gateway_chassis) {
> -        priority += DROUTE_PRIO;
> -    }
>  
>      struct ds actions = DS_EMPTY_INITIALIZER;
>      ds_put_format(&actions, "ip.ttl--; "REG_ECMP_GROUP_ID" = 0; %sreg0 = ",
> @@ -9519,9 +9509,13 @@ build_lrouter_flows(struct hmap *datapaths, struct hmap *ports,
>       * logical router
>       */
>      HMAP_FOR_EACH (od, key_node, datapaths) {
> -        if (od->nbr && od->l3dgw_port) {
> -            add_distributed_routes(lflows, od);
> +        if (!od->nbr) {
> +            continue;
> +        }
> +        if (od->l3dgw_port) {
> +            add_ip_src_policy_flows(lflows, od);
>          }
> +        ovn_lflow_add(lflows, od, S_ROUTER_IN_IP_SRC_POLICY, 0, "1", "next;");
>      }
>  
>      /* Logical router ingress table IP_ROUTING & IP_ROUTING_ECMP: IP Routing.
> diff --git a/tests/ovn.at b/tests/ovn.at
> index f39fda2e4..fcc34fd5d 100644
> --- a/tests/ovn.at
> +++ b/tests/ovn.at
> @@ -9637,20 +9637,6 @@ AT_CHECK([as hv3 ovs-vsctl set Open_vSwitch . external-ids:ovn-bridge-mappings=p
>  OVS_WAIT_UNTIL([test 1 = `as hv3 ovs-vsctl show | \
>  grep "Port patch-br-int-to-ln_port" | wc -l`])
>  
> -AT_CHECK([test 1 = `ovn-sbctl dump-flows lr0 | grep lr_in_ip_routing | \
> -grep "ip4.src == 10.0.0.3 && is_chassis_resident(\"foo1\")" -c`])
> -AT_CHECK([test 1 = `ovn-sbctl dump-flows lr0 | grep lr_in_ip_routing | \
> -grep "ip4.src == 10.0.0.4 && is_chassis_resident(\"foo2\")" -c`])
> -
> -key=`ovn-sbctl --bare --columns tunnel_key list datapath_Binding lr0`
> -# Check that the OVS flows appear for the dnat_and_snat entries in
> -# lr_in_ip_routing table.
> -OVS_WAIT_UNTIL([test 1 = `as hv3 ovs-ofctl dump-flows br-int table=17 | \
> -grep "priority=400,ip,metadata=0x$key,nw_src=10.0.0.3" -c`])
> -
> -OVS_WAIT_UNTIL([test 1 = `as hv3 ovs-ofctl dump-flows br-int table=17 | \
> -grep "priority=400,ip,metadata=0x$key,nw_src=10.0.0.4" -c`])
> -
>  # Re-add nat-addresses option
>  ovn-nbctl lsp-set-options lrp0-rp router-port=lrp0 nat-addresses="router"
>  
> @@ -15141,7 +15127,7 @@ ovn-sbctl dump-flows lr0 | grep lr_in_arp_resolve | grep "reg0 == 10.0.0.10" \
>  # Since the sw0-vir is not claimed by any chassis, eth.dst should be set to
>  # zero if the ip4.dst is the virtual ip in the router pipeline.
>  AT_CHECK([cat lflows.txt], [0], [dnl
> -  table=12(lr_in_arp_resolve  ), priority=100  , match=(outport == "lr0-sw0" && reg0 == 10.0.0.10), action=(eth.dst = 00:00:00:00:00:00; next;)
> +  table=13(lr_in_arp_resolve  ), priority=100  , match=(outport == "lr0-sw0" && reg0 == 10.0.0.10), action=(eth.dst = 00:00:00:00:00:00; next;)
>  ])
>  
>  ip_to_hex() {
> @@ -15192,7 +15178,7 @@ ovn-sbctl dump-flows lr0 | grep lr_in_arp_resolve | grep "reg0 == 10.0.0.10" \
>  # There should be an arp resolve flow to resolve the virtual_ip with the
>  # sw0-p1's MAC.
>  AT_CHECK([cat lflows.txt], [0], [dnl
> -  table=12(lr_in_arp_resolve  ), priority=100  , match=(outport == "lr0-sw0" && reg0 == 10.0.0.10), action=(eth.dst = 50:54:00:00:00:03; next;)
> +  table=13(lr_in_arp_resolve  ), priority=100  , match=(outport == "lr0-sw0" && reg0 == 10.0.0.10), action=(eth.dst = 50:54:00:00:00:03; next;)
>  ])
>  
>  # Forcibly clear virtual_parent. ovn-controller should release the binding
> @@ -15233,7 +15219,7 @@ ovn-sbctl dump-flows lr0 | grep lr_in_arp_resolve | grep "reg0 == 10.0.0.10" \
>  # There should be an arp resolve flow to resolve the virtual_ip with the
>  # sw0-p2's MAC.
>  AT_CHECK([cat lflows.txt], [0], [dnl
> -  table=12(lr_in_arp_resolve  ), priority=100  , match=(outport == "lr0-sw0" && reg0 == 10.0.0.10), action=(eth.dst = 50:54:00:00:00:05; next;)
> +  table=13(lr_in_arp_resolve  ), priority=100  , match=(outport == "lr0-sw0" && reg0 == 10.0.0.10), action=(eth.dst = 50:54:00:00:00:05; next;)
>  ])
>  
>  # send the garp from sw0-p2 (in hv2). hv2 should claim sw0-vir
> @@ -15256,7 +15242,7 @@ ovn-sbctl dump-flows lr0 | grep lr_in_arp_resolve | grep "reg0 == 10.0.0.10" \
>  # There should be an arp resolve flow to resolve the virtual_ip with the
>  # sw0-p3's MAC.
>  AT_CHECK([cat lflows.txt], [0], [dnl
> -  table=12(lr_in_arp_resolve  ), priority=100  , match=(outport == "lr0-sw0" && reg0 == 10.0.0.10), action=(eth.dst = 50:54:00:00:00:04; next;)
> +  table=13(lr_in_arp_resolve  ), priority=100  , match=(outport == "lr0-sw0" && reg0 == 10.0.0.10), action=(eth.dst = 50:54:00:00:00:04; next;)
>  ])
>  
>  # Now send arp reply from sw0-p1. hv1 should claim sw0-vir
> @@ -15277,7 +15263,7 @@ ovn-sbctl dump-flows lr0 | grep lr_in_arp_resolve | grep "reg0 == 10.0.0.10" \
>  > lflows.txt
>  
>  AT_CHECK([cat lflows.txt], [0], [dnl
> -  table=12(lr_in_arp_resolve  ), priority=100  , match=(outport == "lr0-sw0" && reg0 == 10.0.0.10), action=(eth.dst = 50:54:00:00:00:03; next;)
> +  table=13(lr_in_arp_resolve  ), priority=100  , match=(outport == "lr0-sw0" && reg0 == 10.0.0.10), action=(eth.dst = 50:54:00:00:00:03; next;)
>  ])
>  
>  # Delete hv1-vif1 port. hv1 should release sw0-vir
> @@ -15295,7 +15281,7 @@ ovn-sbctl dump-flows lr0 | grep lr_in_arp_resolve | grep "reg0 == 10.0.0.10" \
>  > lflows.txt
>  
>  AT_CHECK([cat lflows.txt], [0], [dnl
> -  table=12(lr_in_arp_resolve  ), priority=100  , match=(outport == "lr0-sw0" && reg0 == 10.0.0.10), action=(eth.dst = 00:00:00:00:00:00; next;)
> +  table=13(lr_in_arp_resolve  ), priority=100  , match=(outport == "lr0-sw0" && reg0 == 10.0.0.10), action=(eth.dst = 00:00:00:00:00:00; next;)
>  ])
>  
>  # Now send arp reply from sw0-p2. hv2 should claim sw0-vir
> @@ -15316,7 +15302,7 @@ ovn-sbctl dump-flows lr0 | grep lr_in_arp_resolve | grep "reg0 == 10.0.0.10" \
>  > lflows.txt
>  
>  AT_CHECK([cat lflows.txt], [0], [dnl
> -  table=12(lr_in_arp_resolve  ), priority=100  , match=(outport == "lr0-sw0" && reg0 == 10.0.0.10), action=(eth.dst = 50:54:00:00:00:04; next;)
> +  table=13(lr_in_arp_resolve  ), priority=100  , match=(outport == "lr0-sw0" && reg0 == 10.0.0.10), action=(eth.dst = 50:54:00:00:00:04; next;)
>  ])
>  
>  # Delete sw0-p2 logical port
> diff --git a/tests/system-ovn.at b/tests/system-ovn.at

I think it's preferable to have the unit tests in ovn.at whenever
possible so that they get run regularly.

> index 9ae6c6b1f..1e4f147b4 100644
> --- a/tests/system-ovn.at
> +++ b/tests/system-ovn.at
> @@ -2747,6 +2747,17 @@ ADD_VETH(alice1, alice1, br-int, "172.16.1.2/24", "f0:00:00:01:02:05", \
>  ovn-nbctl lsp-add alice alice1 \
>  -- lsp-set-addresses alice1 "f0:00:00:01:02:05 172.16.1.2"
>  
> +# Add external network
> +ADD_NAMESPACES(ext-net)
> +ip link add alice-ext netns alice1 type veth peer name ext-veth netns ext-net
> +ip -n ext-net link set dev ext-veth up
> +ip -n ext-net addr add 10.0.0.1/24 dev ext-veth
> +ip -n ext-net route add default via 10.0.0.2
> +
> +ip -n alice1 link set dev alice-ext up
> +ip -n alice1 addr add 10.0.0.2/24 dev alice-ext
> +ip netns exec alice1 sysctl -w net.ipv4.conf.all.forwarding=1
> +
>  # Add DNAT rules
>  AT_CHECK([ovn-nbctl lr-nat-add R1 dnat_and_snat 172.16.1.3 192.168.1.2 foo1 00:00:02:02:03:04])
>  AT_CHECK([ovn-nbctl lr-nat-add R1 dnat_and_snat 172.16.1.4 192.168.1.3 foo2 00:00:02:02:03:05])
> @@ -2754,6 +2765,9 @@ AT_CHECK([ovn-nbctl lr-nat-add R1 dnat_and_snat 172.16.1.4 192.168.1.3 foo2 00:0
>  # Add a SNAT rule
>  AT_CHECK([ovn-nbctl lr-nat-add R1 snat 172.16.1.1 192.168.0.0/16])
>  
> +# Add default route to ext-net
> +AT_CHECK([ovn-nbctl lr-route-add R1 10.0.0.0/24 172.16.1.2])
> +
>  ovn-nbctl --wait=hv sync
>  OVS_WAIT_UNTIL([ovs-ofctl dump-flows br-int | grep 'nat(src=172.16.1.1)'])
>  
> @@ -2776,6 +2790,20 @@ NS_CHECK_EXEC([foo2], [ping -q -c 3 -i 0.3 -w 2 172.16.1.2 | FORMAT_PING], \
>  3 packets transmitted, 3 received, 0% packet loss, time 0ms
>  ])
>  
> +# Try to ping external network
> +NS_CHECK_EXEC([ext-net], [tcpdump -n -c 3 -i ext-veth dst 172.16.1.3 and icmp > ext-net.pcap &])
> +sleep 1
> +AT_CHECK([ovn-nbctl lr-nat-del R1 snat])
> +NS_CHECK_EXEC([foo1], [ping -q -c 3 -i 0.3 -w 2 10.0.0.1 | FORMAT_PING], \
> +[0], [dnl
> +3 packets transmitted, 3 received, 0% packet loss, time 0ms
> +])
> +
> +OVS_WAIT_UNTIL([
> +    total_pkts=$(cat ext-net.pcap | wc -l)
> +    test "${total_pkts}" = "3"
> +])
> +
>  # We verify that SNAT indeed happened via 'dump-conntrack' command.
>  AT_CHECK([ovs-appctl dpctl/dump-conntrack | FORMAT_CT(172.16.1.1) | \
>  sed -e 's/zone=[[0-9]]*/zone=<cleared>/'], [0], [dnl
> 

The system test fails on my test machine (Fedora 32 with OVS master and
OVN master + your patch applied):

make check-system-userspace TESTSUITEFLAGS="17"
[...]
 17: ovn -- DNAT and SNAT on distributed router - N/S FAILED (system-ovn.at:2823)

[...]
system-ovn.at:2802: wait succeeded after 1 seconds
./system-ovn.at:2808: ovs-appctl dpctl/dump-conntrack | grep "dst=172.16.1.1" | sed -e 's/port=[0-9]*/port=<cleared>/g' -e 's/id=[0-9]*/id=<cleared>/g' -e 's/state=[0-9_A-Z]*/state=<cleared>/g' | sort | uniq | \
sed -e 's/zone=[0-9]*/zone=<cleared>/'
./system-ovn.at:2813: ovs-appctl dpctl/flush-conntrack
./system-ovn.at:2817: ip netns exec bar1 sh << NS_EXEC_HEREDOC ping -q -c 3 -i 0.3 -w 2 172.16.1.2 | grep "transmitted" | sed 's/time.*ms$/time 0ms/'
NS_EXEC_HEREDOC
./system-ovn.at:2823: ovs-appctl dpctl/dump-conntrack | grep "dst=172.16.1.1" | sed -e 's/port=[0-9]*/port=<cleared>/g' -e 's/id=[0-9]*/id=<cleared>/g' -e 's/state=[0-9_A-Z]*/state=<cleared>/g' | sort | uniq | \
sed -e 's/zone=[0-9]*/zone=<cleared>/'
--- -   2020-05-20 10:28:50.053113201 -0400
+++ /root/ovn/tests/system-userspace-testsuite.dir/at-groups/17/stdout 2020-05-20 10:28:50.050668409 -0400
@@ -1,2 +1 @@
-icmp,orig=(src=192.168.2.2,dst=172.16.1.2,id=<cleared>,type=8,code=0),reply=(src=172.16.1.2,dst=172.16.1.1,id=<cleared>,type=0,code=0),zone=<cleared>
[...]

Same happens when running the system test with the kernel datapath:
make check-kernel TESTSUITEFLAGS="17"
[...]
17: ovn -- DNAT and SNAT on distributed router - N/S FAILED (system-ovn.at:2823)

Regards,
Dumitru

_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev
