On Wed, May 20, 2020 at 7:38 AM Dumitru Ceara <dce...@redhat.com> wrote:
>
> On 5/19/20 6:42 PM, Lorenzo Bianconi wrote:
> > In order to fix the issues introduced by commit c0bf32d72f8b ("Manage
> > ARP process locally in a DVR scenario "), restore the previous
> > configuration of table 9 in the ingress router pipeline and introduce
> > a new stage called 'ip_src_policy', used to set the source address
> > info so as not to distribute FIP traffic when DVR is enabled.
> >
> > Fixes: c0bf32d72f8b ("Manage ARP process locally in a DVR scenario ")
> > Tested-by: Jakub Libosvar <jlibo...@redhat.com>
> > Signed-off-by: Lorenzo Bianconi <lorenzo.bianc...@redhat.com>
>
> Hi Lorenzo,
>
> As I mentioned on the RFC patch, if there is a way we can fix this
> issue without the need for a new pipeline stage, I think that would be
> preferable. I don't see an easy way to do that, but maybe Han and
> others have suggestions.
>
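To make the flow under discussion concrete: the per-NAT-entry logical flow that the proposed 'lr_in_ip_src_policy' stage installs can be sketched with a tiny standalone helper. The helper name and the sample values are illustrative only (not from the patch); the format strings mirror the ds_put_format() calls quoted in the diff below.

```shell
#!/bin/sh
# Illustrative sketch only: prints the logical-flow match/action pair the
# proposed lr_in_ip_src_policy stage would install for one IPv4
# dnat_and_snat entry. Helper name and sample values are assumptions;
# the strings mirror the ds_put_format() calls in the diff below.
build_ip_src_policy_flow() {
    external_ip=$1; external_mac=$2; logical_ip=$3; logical_port=$4
    # IPv4 case: address-family suffix "4" and an empty register prefix
    # (the IPv6 case would use "6" and the "xx" prefix instead).
    printf 'match : ip4.src == %s && is_chassis_resident("%s")\n' \
        "$logical_ip" "$logical_port"
    printf 'action: eth.src = %s; reg1 = %s; next;\n' \
        "$external_mac" "$external_ip"
}

build_ip_src_policy_flow 172.16.1.3 00:00:02:02:03:04 192.168.1.2 foo1
```

In other words, traffic sourced from the NAT'd logical IP, when its logical port is resident on the local chassis, gets the floating IP and external MAC loaded before ARP resolution runs.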
Hi, I'm reviewing this, but could you walk me through the detailed
scenario that the original patch c0bf32d72f8b ("Manage ARP process
locally in a DVR scenario ") was trying to solve? From the description
added in this patch I still can't understand why we need to avoid
sending traffic through the tunnel, and why setting the src IP and src
MAC would avoid that.

> > ---
> > Changes since RFC:
> > - added unit-tests
> > ---
> >  northd/ovn-northd.8.xml | 65 ++++++++++++++++++++---------------------
> >  northd/ovn-northd.c     | 38 ++++++++++--------------
> >  tests/ovn.at            | 28 +++++-------------
> >  tests/system-ovn.at     | 28 ++++++++++++++++++
> >  4 files changed, 82 insertions(+), 77 deletions(-)
> >
> > diff --git a/northd/ovn-northd.8.xml b/northd/ovn-northd.8.xml
> > index 8f224b07f..09dbb52b4 100644
> > --- a/northd/ovn-northd.8.xml
> > +++ b/northd/ovn-northd.8.xml
> > @@ -2484,37 +2484,6 @@ output;
> >          </p>
> >        </li>
> >
> > -      <li>
> > -        <p>
> > -          For distributed logical routers where one of the logical router ports
> > -          specifies a <code>redirect-chassis</code>, a priority-400 logical
> > -          flow for each <code>dnat_and_snat</code> NAT rules configured.
> > -          These flows will allow to properly forward traffic to the external
> > -          connections if available and avoid sending it through the tunnel.
> > -          Assuming the following NAT rule has been configured:
> > -        </p>
> > -
> > -        <pre>
> > -external_ip = <var>A</var>;
> > -external_mac = <var>B</var>;
> > -logical_ip = <var>C</var>;
> > -        </pre>
> > -
> > -        <p>
> > -          the following action will be applied:
> > -        </p>
> > -
> > -        <pre>
> > -ip.ttl--;
> > -reg0 = <var>ip.dst</var>;
> > -reg1 = <var>A</var>;
> > -eth.src = <var>B</var>;
> > -outport = <var>router-port</var>;
> > -next;
> > -        </pre>
> > -
> > -      </li>
> > -
> >        <li>
> >          <p>
> >            IPv4 routing table. For each route to IPv4 network <var>N</var> with
> > @@ -2660,7 +2629,35 @@ outport = <var>P</var>;
> >        </li>
> >      </ul>
> >
> > -    <h3>Ingress Table 12: ARP/ND Resolution</h3>
> > +    <h3>Ingress Table 12: IP Source Policy</h3>
> > +
> > +    <p>
> > +      This table contains for distributed logical routers where one of
> > +      the logical router ports specifies a <code>redirect-chassis</code>,
> > +      a priority-100 logical flow for each <code>dnat_and_snat</code>
> > +      NAT rules configured.
> > +      These flows will allow to properly forward traffic to the external
> > +      connections if available and avoid sending it through the tunnel.
> > +      Assuming the following NAT rule has been configured:
> > +    </p>
> > +
> > +    <pre>
> > +external_ip = <var>A</var>;
> > +external_mac = <var>B</var>;
> > +logical_ip = <var>C</var>;
> > +    </pre>
> > +
> > +    <p>
> > +      the following action will be applied:
> > +    </p>
> > +
> > +    <pre>
> > +reg1 = <var>A</var>;
> > +eth.src = <var>B</var>;
> > +next;
> > +    </pre>
> > +
>
> This section of the man page shows what action we perform on the
> traffic, i.e., move external_ip into reg1 (for SNAT) and move
> external_mac into eth.src, but there's no mention of what kind of
> traffic will match these flows, that is:
> "ip.src == logical_ip && is_chassis_resident(logical_port)".
>
> > +    <h3>Ingress Table 13: ARP/ND Resolution</h3>
> >
> >      <p>
> >        Any packet that reaches this table is an IP packet whose next-hop
> > @@ -2819,7 +2816,7 @@ outport = <var>P</var>;
> >
> >      </ul>
> >
> > -    <h3>Ingress Table 13: Check packet length</h3>
> > +    <h3>Ingress Table 14: Check packet length</h3>
> >
> >      <p>
> >        For distributed logical routers with distributed gateway port configured
> > @@ -2849,7 +2846,7 @@ REGBIT_PKT_LARGER = check_pkt_larger(<var>L</var>); next;
> >        and advances to the next table.
> >      </p>
> >
> > -    <h3>Ingress Table 14: Handle larger packets</h3>
> > +    <h3>Ingress Table 15: Handle larger packets</h3>
>
> Indices for tables "Gateway Redirect" and "ARP Request" should be
> updated too.
>
> >      <p>
> >        For distributed logical routers with distributed gateway port configured
> >
> > diff --git a/northd/ovn-northd.c b/northd/ovn-northd.c
> > index 3c0070ea7..d5f3997a9 100644
> > --- a/northd/ovn-northd.c
> > +++ b/northd/ovn-northd.c
> > @@ -175,11 +175,12 @@ enum ovn_stage {
> >      PIPELINE_STAGE(ROUTER, IN, IP_ROUTING, 9, "lr_in_ip_routing") \
> >      PIPELINE_STAGE(ROUTER, IN, IP_ROUTING_ECMP, 10, "lr_in_ip_routing_ecmp") \
> >      PIPELINE_STAGE(ROUTER, IN, POLICY, 11, "lr_in_policy") \
> > -    PIPELINE_STAGE(ROUTER, IN, ARP_RESOLVE, 12, "lr_in_arp_resolve") \
> > -    PIPELINE_STAGE(ROUTER, IN, CHK_PKT_LEN , 13, "lr_in_chk_pkt_len") \
> > -    PIPELINE_STAGE(ROUTER, IN, LARGER_PKTS, 14,"lr_in_larger_pkts") \
> > -    PIPELINE_STAGE(ROUTER, IN, GW_REDIRECT, 15, "lr_in_gw_redirect") \
> > -    PIPELINE_STAGE(ROUTER, IN, ARP_REQUEST, 16, "lr_in_arp_request") \
> > +    PIPELINE_STAGE(ROUTER, IN, IP_SRC_POLICY, 12, "lr_in_ip_src_policy") \
> > +    PIPELINE_STAGE(ROUTER, IN, ARP_RESOLVE, 13, "lr_in_arp_resolve") \
> > +    PIPELINE_STAGE(ROUTER, IN, CHK_PKT_LEN , 14, "lr_in_chk_pkt_len") \
> > +    PIPELINE_STAGE(ROUTER, IN, LARGER_PKTS, 15,"lr_in_larger_pkts") \
> > +    PIPELINE_STAGE(ROUTER, IN, GW_REDIRECT, 16, "lr_in_gw_redirect") \
> > +    PIPELINE_STAGE(ROUTER, IN, ARP_REQUEST, 17, "lr_in_arp_request") \
> >      \
> >      /* Logical router egress stages. */ \
> >      PIPELINE_STAGE(ROUTER, OUT, UNDNAT, 0, "lr_out_undnat") \
> > @@ -7103,8 +7104,6 @@ build_routing_policy_flow(struct hmap *lflows, struct ovn_datapath *od,
> >      ds_destroy(&actions);
> >  }
> >
> > -/* default logical flow prioriry for distributed routes */
> > -#define DROUTE_PRIO 400
> >  struct parsed_route {
> >      struct ovs_list list_node;
> >      struct v46_ip prefix;
> > @@ -7493,7 +7492,7 @@ build_ecmp_route_flow(struct hmap *lflows, struct ovn_datapath *od,
> >  }
> >
> >  static void
> > -add_distributed_routes(struct hmap *lflows, struct ovn_datapath *od)
> > +add_ip_src_policy_flows(struct hmap *lflows, struct ovn_datapath *od)
> >  {
> >      struct ds actions = DS_EMPTY_INITIALIZER;
> >      struct ds match = DS_EMPTY_INITIALIZER;
> > @@ -7511,12 +7510,9 @@ add_distributed_routes(struct hmap *lflows, struct ovn_datapath *od)
> >                        is_ipv4 ? "4" : "6", nat->logical_ip,
> >                        nat->logical_port);
> >          char *prefix = is_ipv4 ? "" : "xx";
> > -        ds_put_format(&actions, "outport = %s; eth.src = %s; "
> > -                      "%sreg0 = ip%s.dst; %sreg1 = %s; next;",
> > -                      od->l3dgw_port->json_key, nat->external_mac,
> > -                      prefix, is_ipv4 ? "4" : "6",
> > -                      prefix, nat->external_ip);
> > -        ovn_lflow_add(lflows, od, S_ROUTER_IN_IP_ROUTING, DROUTE_PRIO,
> > +        ds_put_format(&actions, "eth.src = %s; %sreg1 = %s; next;",
> > +                      nat->external_mac, prefix, nat->external_ip);
> > +        ovn_lflow_add(lflows, od, S_ROUTER_IN_IP_SRC_POLICY, 100,
> >                        ds_cstr(&match), ds_cstr(&actions));
> >          ds_clear(&match);
> >          ds_clear(&actions);
> > @@ -7547,12 +7543,6 @@ add_route(struct hmap *lflows, const struct ovn_port *op,
> >      }
> >      build_route_match(op_inport, network_s, plen, is_src_route, is_ipv4,
> >                        &match, &priority);
> > -    /* traffic for internal IPs of logical switch ports must be sent to
> > -     * the gw controller through the overlay tunnels
> > -     */
> > -    if (op->nbrp && !op->nbrp->n_gateway_chassis) {
> > -        priority += DROUTE_PRIO;
> > -    }
> >
> >      struct ds actions = DS_EMPTY_INITIALIZER;
> >      ds_put_format(&actions, "ip.ttl--; "REG_ECMP_GROUP_ID" = 0; %sreg0 = ",
> > @@ -9519,9 +9509,13 @@ build_lrouter_flows(struct hmap *datapaths, struct hmap *ports,
> >       * logical router
> >       */
> >      HMAP_FOR_EACH (od, key_node, datapaths) {
> > -        if (od->nbr && od->l3dgw_port) {
> > -            add_distributed_routes(lflows, od);
> > +        if (!od->nbr) {
> > +            continue;
> > +        }
> > +        if (od->l3dgw_port) {
> > +            add_ip_src_policy_flows(lflows, od);
> >          }
> > +        ovn_lflow_add(lflows, od, S_ROUTER_IN_IP_SRC_POLICY, 0, "1", "next;");
> >      }
> >
> >      /* Logical router ingress table IP_ROUTING & IP_ROUTING_ECMP: IP Routing.
> > diff --git a/tests/ovn.at b/tests/ovn.at
> > index f39fda2e4..fcc34fd5d 100644
> > --- a/tests/ovn.at
> > +++ b/tests/ovn.at
> > @@ -9637,20 +9637,6 @@ AT_CHECK([as hv3 ovs-vsctl set Open_vSwitch . external-ids:ovn-bridge-mappings=p
> >  OVS_WAIT_UNTIL([test 1 = `as hv3 ovs-vsctl show | \
> >  grep "Port patch-br-int-to-ln_port" | wc -l`])
> >
> > -AT_CHECK([test 1 = `ovn-sbctl dump-flows lr0 | grep lr_in_ip_routing | \
> > -grep "ip4.src == 10.0.0.3 && is_chassis_resident(\"foo1\")" -c`])
> > -AT_CHECK([test 1 = `ovn-sbctl dump-flows lr0 | grep lr_in_ip_routing | \
> > -grep "ip4.src == 10.0.0.4 && is_chassis_resident(\"foo2\")" -c`])
> > -
> > -key=`ovn-sbctl --bare --columns tunnel_key list datapath_Binding lr0`
> > -# Check that the OVS flows appear for the dnat_and_snat entries in
> > -# lr_in_ip_routing table.
> > -OVS_WAIT_UNTIL([test 1 = `as hv3 ovs-ofctl dump-flows br-int table=17 | \
> > -grep "priority=400,ip,metadata=0x$key,nw_src=10.0.0.3" -c`])
> > -
> > -OVS_WAIT_UNTIL([test 1 = `as hv3 ovs-ofctl dump-flows br-int table=17 | \
> > -grep "priority=400,ip,metadata=0x$key,nw_src=10.0.0.4" -c`])
> > -
> >  # Re-add nat-addresses option
> >  ovn-nbctl lsp-set-options lrp0-rp router-port=lrp0 nat-addresses="router"
> >
> > @@ -15141,7 +15127,7 @@ ovn-sbctl dump-flows lr0 | grep lr_in_arp_resolve | grep "reg0 == 10.0.0.10" \
> >  # Since the sw0-vir is not claimed by any chassis, eth.dst should be set to
> >  # zero if the ip4.dst is the virtual ip in the router pipeline.
> >  AT_CHECK([cat lflows.txt], [0], [dnl
> > -  table=12(lr_in_arp_resolve  ), priority=100  , match=(outport == "lr0-sw0" && reg0 == 10.0.0.10), action=(eth.dst = 00:00:00:00:00:00; next;)
> > +  table=13(lr_in_arp_resolve  ), priority=100  , match=(outport == "lr0-sw0" && reg0 == 10.0.0.10), action=(eth.dst = 00:00:00:00:00:00; next;)
> >  ])
> >
> >  ip_to_hex() {
> > @@ -15192,7 +15178,7 @@ ovn-sbctl dump-flows lr0 | grep lr_in_arp_resolve | grep "reg0 == 10.0.0.10" \
> >  # There should be an arp resolve flow to resolve the virtual_ip with the
> >  # sw0-p1's MAC.
> >  AT_CHECK([cat lflows.txt], [0], [dnl
> > -  table=12(lr_in_arp_resolve  ), priority=100  , match=(outport == "lr0-sw0" && reg0 == 10.0.0.10), action=(eth.dst = 50:54:00:00:00:03; next;)
> > +  table=13(lr_in_arp_resolve  ), priority=100  , match=(outport == "lr0-sw0" && reg0 == 10.0.0.10), action=(eth.dst = 50:54:00:00:00:03; next;)
> >  ])
> >
> >  # Forcibly clear virtual_parent. ovn-controller should release the binding
> > @@ -15233,7 +15219,7 @@ ovn-sbctl dump-flows lr0 | grep lr_in_arp_resolve | grep "reg0 == 10.0.0.10" \
> >  # There should be an arp resolve flow to resolve the virtual_ip with the
> >  # sw0-p2's MAC.
> >  AT_CHECK([cat lflows.txt], [0], [dnl
> > -  table=12(lr_in_arp_resolve  ), priority=100  , match=(outport == "lr0-sw0" && reg0 == 10.0.0.10), action=(eth.dst = 50:54:00:00:00:05; next;)
> > +  table=13(lr_in_arp_resolve  ), priority=100  , match=(outport == "lr0-sw0" && reg0 == 10.0.0.10), action=(eth.dst = 50:54:00:00:00:05; next;)
> >  ])
> >
> >  # send the garp from sw0-p2 (in hv2). hv2 should claim sw0-vir
> > @@ -15256,7 +15242,7 @@ ovn-sbctl dump-flows lr0 | grep lr_in_arp_resolve | grep "reg0 == 10.0.0.10" \
> >  # There should be an arp resolve flow to resolve the virtual_ip with the
> >  # sw0-p3's MAC.
> >  AT_CHECK([cat lflows.txt], [0], [dnl
> > -  table=12(lr_in_arp_resolve  ), priority=100  , match=(outport == "lr0-sw0" && reg0 == 10.0.0.10), action=(eth.dst = 50:54:00:00:00:04; next;)
> > +  table=13(lr_in_arp_resolve  ), priority=100  , match=(outport == "lr0-sw0" && reg0 == 10.0.0.10), action=(eth.dst = 50:54:00:00:00:04; next;)
> >  ])
> >
> >  # Now send arp reply from sw0-p1. hv1 should claim sw0-vir
> > @@ -15277,7 +15263,7 @@ ovn-sbctl dump-flows lr0 | grep lr_in_arp_resolve | grep "reg0 == 10.0.0.10" \
> >  > lflows.txt
> >
> >  AT_CHECK([cat lflows.txt], [0], [dnl
> > -  table=12(lr_in_arp_resolve  ), priority=100  , match=(outport == "lr0-sw0" && reg0 == 10.0.0.10), action=(eth.dst = 50:54:00:00:00:03; next;)
> > +  table=13(lr_in_arp_resolve  ), priority=100  , match=(outport == "lr0-sw0" && reg0 == 10.0.0.10), action=(eth.dst = 50:54:00:00:00:03; next;)
> >  ])
> >
> >  # Delete hv1-vif1 port. hv1 should release sw0-vir
> > @@ -15295,7 +15281,7 @@ ovn-sbctl dump-flows lr0 | grep lr_in_arp_resolve | grep "reg0 == 10.0.0.10" \
> >  > lflows.txt
> >
> >  AT_CHECK([cat lflows.txt], [0], [dnl
> > -  table=12(lr_in_arp_resolve  ), priority=100  , match=(outport == "lr0-sw0" && reg0 == 10.0.0.10), action=(eth.dst = 00:00:00:00:00:00; next;)
> > +  table=13(lr_in_arp_resolve  ), priority=100  , match=(outport == "lr0-sw0" && reg0 == 10.0.0.10), action=(eth.dst = 00:00:00:00:00:00; next;)
> >  ])
> >
> >  # Now send arp reply from sw0-p2. hv2 should claim sw0-vir
> > @@ -15316,7 +15302,7 @@ ovn-sbctl dump-flows lr0 | grep lr_in_arp_resolve | grep "reg0 == 10.0.0.10" \
> >  > lflows.txt
> >
> >  AT_CHECK([cat lflows.txt], [0], [dnl
> > -  table=12(lr_in_arp_resolve  ), priority=100  , match=(outport == "lr0-sw0" && reg0 == 10.0.0.10), action=(eth.dst = 50:54:00:00:00:04; next;)
> > +  table=13(lr_in_arp_resolve  ), priority=100  , match=(outport == "lr0-sw0" && reg0 == 10.0.0.10), action=(eth.dst = 50:54:00:00:00:04; next;)
> >  ])
> >
> >  # Delete sw0-p2 logical port
> > diff --git a/tests/system-ovn.at b/tests/system-ovn.at
>
> I think it's preferable to have the unit tests in ovn.at whenever
> possible so that they get run regularly.
>
> > index 9ae6c6b1f..1e4f147b4 100644
> > --- a/tests/system-ovn.at
> > +++ b/tests/system-ovn.at
> > @@ -2747,6 +2747,17 @@ ADD_VETH(alice1, alice1, br-int, "172.16.1.2/24", "f0:00:00:01:02:05", \
> >  ovn-nbctl lsp-add alice alice1 \
> >  -- lsp-set-addresses alice1 "f0:00:00:01:02:05 172.16.1.2"
> >
> > +# Add external network
> > +ADD_NAMESPACES(ext-net)
> > +ip link add alice-ext netns alice1 type veth peer name ext-veth netns ext-net
> > +ip -n ext-net link set dev ext-veth up
> > +ip -n ext-net addr add 10.0.0.1/24 dev ext-veth
> > +ip -n ext-net route add default via 10.0.0.2
> > +
> > +ip -n alice1 link set dev alice-ext up
> > +ip -n alice1 addr add 10.0.0.2/24 dev alice-ext
> > +ip netns exec alice1 sysctl -w net.ipv4.conf.all.forwarding=1
> > +
> >  # Add DNAT rules
> >  AT_CHECK([ovn-nbctl lr-nat-add R1 dnat_and_snat 172.16.1.3 192.168.1.2 foo1 00:00:02:02:03:04])
> >  AT_CHECK([ovn-nbctl lr-nat-add R1 dnat_and_snat 172.16.1.4 192.168.1.3 foo2 00:00:02:02:03:05])
> > @@ -2754,6 +2765,9 @@ AT_CHECK([ovn-nbctl lr-nat-add R1 dnat_and_snat 172.16.1.4 192.168.1.3 foo2 00:0
> >  # Add a SNAT rule
> >  AT_CHECK([ovn-nbctl lr-nat-add R1 snat 172.16.1.1 192.168.0.0/16])
> >
> > +# Add default route to ext-net
> > +AT_CHECK([ovn-nbctl lr-route-add R1 10.0.0.0/24 172.16.1.2])
> > +
> >  ovn-nbctl --wait=hv sync
> >  OVS_WAIT_UNTIL([ovs-ofctl dump-flows br-int | grep 'nat(src=172.16.1.1)'])
> >
> > @@ -2776,6 +2790,20 @@ NS_CHECK_EXEC([foo2], [ping -q -c 3 -i 0.3 -w 2 172.16.1.2 | FORMAT_PING], \
> >  3 packets transmitted, 3 received, 0% packet loss, time 0ms
> >  ])
> >
> > +# Try to ping external network
> > +NS_CHECK_EXEC([ext-net], [tcpdump -n -c 3 -i ext-veth dst 172.16.1.3 and icmp > ext-net.pcap &])
> > +sleep 1
> > +AT_CHECK([ovn-nbctl lr-nat-del R1 snat])
> > +NS_CHECK_EXEC([foo1], [ping -q -c 3 -i 0.3 -w 2 10.0.0.1 | FORMAT_PING], \
> > +[0], [dnl
> > +3 packets transmitted, 3 received, 0% packet loss, time 0ms
> > +])
> > +
> > +OVS_WAIT_UNTIL([
> > +    total_pkts=$(cat ext-net.pcap | wc -l)
> > +    test "${total_pkts}" = "3"
> > +])
> > +
> >  # We verify that SNAT indeed happened via 'dump-conntrack' command.
> >  AT_CHECK([ovs-appctl dpctl/dump-conntrack | FORMAT_CT(172.16.1.1) | \
> >  sed -e 's/zone=[[0-9]]*/zone=<cleared>/'], [0], [dnl
>
> The system test fails on my test machine (Fedora 32 with OVS master and
> OVN master + your patch applied):
>
> make check-system-userspace TESTSUITEFLAGS="17"
> [...]
> 17: ovn -- DNAT and SNAT on distributed router - N/S FAILED
> (system-ovn.at:2823)
>
> [...]
> system-ovn.at:2802: wait succeeded after 1 seconds
> ./system-ovn.at:2808: ovs-appctl dpctl/dump-conntrack | grep
> "dst=172.16.1.1" | sed -e 's/port=[0-9]*/port=<cleared>/g' -e
> 's/id=[0-9]*/id=<cleared>/g' -e 's/state=[0-9_A-Z]*/state=<cleared>/g' |
> sort | uniq | \
> sed -e 's/zone=[0-9]*/zone=<cleared>/'
> ./system-ovn.at:2813: ovs-appctl dpctl/flush-conntrack
> ./system-ovn.at:2817: ip netns exec bar1 sh << NS_EXEC_HEREDOC
> ping -q -c 3 -i 0.3 -w 2 172.16.1.2 | grep "transmitted" | sed 's/time.*ms$/time 0ms/'
> NS_EXEC_HEREDOC
> ./system-ovn.at:2823: ovs-appctl dpctl/dump-conntrack | grep
> "dst=172.16.1.1" | sed -e 's/port=[0-9]*/port=<cleared>/g' -e
> 's/id=[0-9]*/id=<cleared>/g' -e 's/state=[0-9_A-Z]*/state=<cleared>/g' |
> sort | uniq | \
> sed -e 's/zone=[0-9]*/zone=<cleared>/'
> --- - 2020-05-20 10:28:50.053113201 -0400
> +++ /root/ovn/tests/system-userspace-testsuite.dir/at-groups/17/stdout 2020-05-20 10:28:50.050668409 -0400
> @@ -1,2 +1 @@
> -icmp,orig=(src=192.168.2.2,dst=172.16.1.2,id=<cleared>,type=8,code=0),reply=(src=172.16.1.2,dst=172.16.1.1,id=<cleared>,type=0,code=0),zone=<cleared>
> [...]
>
> Same happens when running the system test with the kernel datapath:
>
> make check-kernel TESTSUITEFLAGS="17"
> [...]
> 17: ovn -- DNAT and SNAT on distributed router - N/S FAILED
> (system-ovn.at:2823)
>
> Regards,
> Dumitru

_______________________________________________
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev
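As a footnote for anyone dissecting the failure report above: the FORMAT_CT(172.16.1.1) expansion shown in the log is just a grep/sed normalization pipeline over `ovs-appctl dpctl/dump-conntrack` output, and it can be exercised standalone. The sample conntrack line below is made up for illustration; on a real run it comes from the datapath.

```shell
#!/bin/sh
# Standalone sketch of the normalization the failing check applies to
# dump-conntrack output (sample line is made up for illustration).
sample='icmp,orig=(src=192.168.2.2,dst=172.16.1.2,id=1234,type=8,code=0),reply=(src=172.16.1.2,dst=172.16.1.1,id=1234,type=0,code=0),zone=7'

# Keep entries toward the SNAT IP, then clear the volatile fields
# (port, id, state, zone) so the output is stable across runs.
printf '%s\n' "$sample" \
    | grep "dst=172.16.1.1" \
    | sed -e 's/port=[0-9]*/port=<cleared>/g' \
          -e 's/id=[0-9]*/id=<cleared>/g' \
          -e 's/state=[0-9_A-Z]*/state=<cleared>/g' \
    | sort | uniq \
    | sed -e 's/zone=[0-9]*/zone=<cleared>/'
```

The test then compares the cleared line against the expected SNAT entry; in the failure above the expected cleared line is simply missing from the actual conntrack output.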