On Wed, May 04, 2022 at 12:53:34PM +0200, Ilya Maximets wrote: > On 5/4/22 05:23, lic121 wrote: > > On Tue, May 03, 2022 at 10:18:18PM +0200, Ilya Maximets wrote: > >> On 5/3/22 15:31, Peng He wrote: > >>> Hi, > >>> this issue has been found and resolved by two independent patches: > >>> > >>> http://patchwork.ozlabs.org/project/openvswitch/patch/20190726121839.92497-1-wang...@139.com/ > >>> > >>> <http://patchwork.ozlabs.org/project/openvswitch/patch/20190726121839.92497-1-wang...@139.com/> > >>> http://patchwork.ozlabs.org/project/openvswitch/patch/20201009121503.43893-1-hepeng.0...@bytedance.com/ > >>> > >>> <http://patchwork.ozlabs.org/project/openvswitch/patch/20201009121503.43893-1-hepeng.0...@bytedance.com/> > >>> > >>> but I recently found my patch is still not fixing all the cases. > >>> > >>> would like to have a discussion about this. > >> > >> Hrm. It seems to me that the problem is more genric then mirrors. > >> The reason is that tnl_port_map_lookup() logic doesn't work for > >> a bridge with more than one tunnel termination port. > >> Before commit dc0bd12f5b04 ("userspace: Enable non-bridge port > >> as tunnel endpoint."), only the bridge port was allowed to terminate > >> the tunnel. MAC+IP lookup performed by the classifier inside the > >> tnl_port_map_lookup() was enough to determine if termination is > >> needed or if the packet has to be forwarded without tunnel_pop. > >> Now we have multiple ports that may or may not terminate the tunnel. > >> > >> Assuming we have 2 different tunnel ports and two different normal > >> ports with different IP addresses and we expect the first tunnel > >> to be terminated on the first port and the second tunnel to be > >> terminated on the second port. At the same time, the packet from > >> the first tunnel should be forwarded to the second port without > >> tunnel_pop and the packet from the second tunnel should be > >> forwarded to the first port without tunnel_pop. > >> > >> Since the classifier in the tunnel-port.c will have rules for > >> both tunnels and it only matches on MAC+IP (+ some other fileds, > >> but we don't care much right now), we will have decapsulation > >> triggered in all cases. The same as what happened with the > >> mirrored traffic. > >> > >> Two patcehs mentioned above only focused on mirrors, so they > >> will not fix the problem in a common case. The patch below > >> is closer to a solution, but it will skip the matches on dl_dst, > >> and, possibly, other tunnel header fileds, which we absolutely > >> need in a non-mirror case with 2 tunnels. > > > > Hi Ilya, you mean we need to check dl_dst as well before tnl classifier > > lookup? For the case that two ports have the same ip address? > > If two ports on the same bridge has the same IP, that sounds > like a misconfiguration (I'd argue that having IP assigned > to a bridge member is a misconfiguration in general, but we > have what we have...), so I'm not sure if we need to check > for IP additionally. But we must unwildcard (add to a flow > match) fields that we're using to make a decision.
Thanks for the explanation, yes you are right. We do need to unwildcard. > tnl_port_map_lookup() does that in a generic way. It has > information about all the MACs, IPs, VLANs and other fileds > that may or may not be significant for the lookup. > Generic classifier lookup inside that function unwildcards > all the fileds that was actually used during lookup. > > So, I'm not convinced if just checking the IP is sufficient, An ovs port mainly has two propertyies: IP and MAC. MAC is not a property for most tunnels. If two ports have the same IP but different MAC. We can't tell which port is the right tunnel terminal port. So I checked the IP only, but missed the unwildcard as you mentioned. > but we definitely have to unwildcard it in a flow match if > we're checking it. > > > > >> > >> The correct solution would be to add an output port match to > >> the classifier, but clasifier is not designed to do that and > >> 'struct match' doesn't have an appropriate field. It shoudl > >> be a separate classifier per ipdev, or something like that. > >> > >> Thoughts? > >> > >> In any case, every part of the packet that was used to make a > >> decision must be unwildcarded. IP on the port is not part of > >> the packet, but dst_ip is. > >> > >> Best regards, Ilya Maximets. > >> > >>> > >>> > >>> > >>> lic121 <lic...@chinatelecom.cn <mailto:lic...@chinatelecom.cn>> > >>> 于2022年5月3日周二 20:55写道: > >>> > >>> In compose_output_action__(), terminate_native_tunnel() is called > >>> to test if a pkt matches a tnl port. If yes, a TUNNEL_POP action > >>> is appended. Instead of outputing the pkt to the origin port, pkt > >>> is redirected into the tnl port. > >>> > >>> pkt metadata(i.e. nw_proto, dst_ip, dst_port) is considered when > >>> testing a pkt regardless of the out port until Ilya's patch[1]. > >>> Patch[1] adds the condition that terminal ports must have IP > >>> address. This patch is to make addtional port ip check that port > >>> ip address must match the pkt dst ip. This check covers the mirror > >>> port case: > >>> To tcpdump underlay pkt, we may create mirror port of dpdk port. > >>> When compose_output_action__() is called on the mirror port, > >>> pkt is redirected into tnl port again. As a result, ping shows > >>> DUP. fast path flow has two tnl_pop actions. > >>> > >>> ``` > >>> [root@ovs1 vagrant]# ping 172.16.100.2 > >>> PING 172.16.100.2 (172.16.100.2) 56(84) bytes of data. > >>> 64 bytes from 172.16.100.2 <http://172.16.100.2>: icmp_seq=1 ttl=64 > >>> time=0.606 ms > >>> 64 bytes from 172.16.100.2 <http://172.16.100.2>: icmp_seq=1 ttl=64 > >>> time=0.617 ms (DUP!) > >>> ``` > >>> > >>> ``` > >>> [root@ovs ovs]# ovs-appctl dpctl/dump-flows |grep tnl_pop > >>> > >>> recirc_id(0),in_port(3),packet_type(ns=0,id=0),eth(src=52:54:00:a6:c6:de,dst=ea:b4:09:72:44:4c),eth_type(0x0800),ipv4(dst=192.168.124.33,proto=17,frag=no),udp(dst=8472), > >>> packets:1, bytes:148, used:0.637s, actions:tnl_pop(2),tnl_pop(2) > >>> ``` > >>> > >>> [1] (d584bb2b6a1 "ofproto-dpif-xlate: Terminate native tunnels only > >>> on ports with IP addresses.") > >>> > >>> Signed-off-by: lic121 <lic...@chinatelecom.cn > >>> <mailto:lic...@chinatelecom.cn>> > >>> --- > >>> ofproto/ofproto-dpif-xlate.c | 45 > >>> +++++++++++++++++++++++++++------------- > >>> tests/tunnel-push-pop.at <http://tunnel-push-pop.at> | 49 > >>> ++++++++++++++++++++++++++++++++++++++++++++ > >>> 2 files changed, 80 insertions(+), 14 deletions(-) > >>> > >>> diff --git a/ofproto/ofproto-dpif-xlate.c > >>> b/ofproto/ofproto-dpif-xlate.c > >>> index 17f7e28..30e636e 100644 > >>> --- a/ofproto/ofproto-dpif-xlate.c > >>> +++ b/ofproto/ofproto-dpif-xlate.c > >>> @@ -4107,18 +4107,26 @@ is_neighbor_reply_correct(const struct > >>> xlate_ctx *ctx, const struct flow *flow) > >>> } > >>> > >>> static bool > >>> -xport_has_ip(const struct xport *xport) > >>> +xport_has_ip(const struct xport *xport, const struct in6_addr ip6) > >>> { > >>> struct in6_addr *ip_addr, *mask; > >>> int n_in6 = 0; > >>> + int i; > >>> + bool hasip = false; > >>> > >>> if (netdev_get_addr_list(xport->netdev, &ip_addr, &mask, > >>> &n_in6)) { > >>> n_in6 = 0; > >>> } else { > >>> + for (i = 0; i < n_in6; i++) { > >>> + if (IN6_ARE_ADDR_EQUAL(ip_addr + i, &ip6)) { > >>> + hasip = true; > >>> + break; > >>> + } > >>> + } > >>> free(ip_addr); > >>> free(mask); > >>> } > >>> - return n_in6 ? true : false; > >>> + return hasip; > >>> } > >>> > >>> static bool > >>> @@ -4127,26 +4135,35 @@ terminate_native_tunnel(struct xlate_ctx > >>> *ctx, const struct xport *xport, > >>> odp_port_t *tnl_port) > >>> { > >>> *tnl_port = ODPP_NONE; > >>> + bool ip_pkt = true; > >>> + struct in6_addr ip_dst; > >>> + > >>> + if (flow->dl_type == htons(ETH_TYPE_IP)) { > >>> + ip_dst = in6_addr_mapped_ipv4(flow->nw_dst); > >>> + } else if (flow->dl_type == htons(ETH_TYPE_IPV6)) { > >>> + ip_dst = flow->ipv6_dst; > >>> + } else { > >>> + ip_pkt = false; > >>> + } > >>> > >>> /* XXX: Write better Filter for tunnel port. We can use in_port > >>> * in tunnel-port flow to avoid these checks completely. > >>> * > >>> - * Port without an IP address cannot be a tunnel termination > >>> point. > >>> - * Not performing a lookup in this case to avoid unwildcarding > >>> extra > >>> - * flow fields (dl_dst). */ > >>> - if (ovs_native_tunneling_is_on(ctx->xbridge->ofproto) > >>> - && xport_has_ip(xport)) { > >>> + * Not performing a lookup if dst ip not match, to avoid > >>> unwildcarding > >>> + * extra flow fields (dl_dst). Also avoid duplicate tnl_pop. */ > >>> + if ((flow->dl_type == htons(ETH_TYPE_ARP) || > >>> + flow->nw_proto == IPPROTO_ICMPV6) && > >>> + is_neighbor_reply_correct(ctx, flow)) { > >>> + tnl_neigh_snoop(flow, wc, ctx->xbridge->name, > >>> + ctx->xin->allow_side_effects); > >>> + > >>> + } else if (ovs_native_tunneling_is_on(ctx->xbridge->ofproto) > >>> + && ip_pkt && xport_has_ip(xport, ip_dst)) { > >>> *tnl_port = tnl_port_map_lookup(flow, wc); > >>> > >>> /* If no tunnel port was found and it's about an ARP or > >>> ICMPv6 packet, > >>> * do tunnel neighbor snooping. */ > >>> - if (*tnl_port == ODPP_NONE && > >>> - (flow->dl_type == htons(ETH_TYPE_ARP) || > >>> - flow->nw_proto == IPPROTO_ICMPV6) && > >>> - is_neighbor_reply_correct(ctx, flow)) { > >>> - tnl_neigh_snoop(flow, wc, ctx->xbridge->name, > >>> - ctx->xin->allow_side_effects); > >>> - } else if (*tnl_port != ODPP_NONE && > >>> + if (*tnl_port != ODPP_NONE && > >>> ctx->xin->allow_side_effects && > >>> dl_type_is_ip_any(flow->dl_type)) { > >>> struct eth_addr mac = flow->dl_src; > >>> diff --git a/tests/tunnel-push-pop.at <http://tunnel-push-pop.at> > >>> b/tests/tunnel-push-pop.at <http://tunnel-push-pop.at> > >>> index c633441..b3a3bf9 100644 > >>> --- a/tests/tunnel-push-pop.at <http://tunnel-push-pop.at> > >>> +++ b/tests/tunnel-push-pop.at <http://tunnel-push-pop.at> > >>> @@ -808,6 +808,55 @@ NXST_FLOW reply: > >>> OVS_VSWITCHD_STOP > >>> AT_CLEANUP > >>> > >>> +AT_SETUP([tunnel_push_pop - multiple output]) > >>> + > >>> +OVS_VSWITCHD_START([add-port br0 p0 -- set Interface p0 type=dummy > >>> ofport_request=1 other-config:hwaddr=aa:55:aa:55:00:00]) > >>> +AT_CHECK([ovs-vsctl add-port br0 m0 -- set Interface m0 type=dummy > >>> ofport_request=2]) > >>> +AT_CHECK([ovs-vsctl add-br int-br -- set bridge int-br > >>> datapath_type=dummy], [0]) > >>> +AT_CHECK([ovs-vsctl add-port int-br t1 -- set Interface t1 type=gre \ > >>> + options:remote_ip=1.1.2.92 options:key=456 > >>> options:seq=true ofport_request=4], [0]) > >>> + > >>> +AT_CHECK([ovs-appctl dpif/show], [0], [dnl > >>> +dummy@ovs-dummy: hit:0 missed:0 > >>> + br0: > >>> + br0 65534/100: (dummy-internal) > >>> + m0 2/2: (dummy) > >>> + p0 1/1: (dummy) > >>> + int-br: > >>> + int-br 65534/3: (dummy-internal) > >>> + t1 4/4: (gre: key=456, remote_ip=1.1.2.92, seq=true) > >>> +]) > >>> + > >>> +AT_CHECK([ovs-appctl netdev-dummy/ip4addr br0 1.1.2.88/24 > >>> <http://1.1.2.88/24>], [0], [OK > >>> +]) > >>> +AT_CHECK([ovs-appctl netdev-dummy/ip4addr m0 1.1.3.88/24 > >>> <http://1.1.3.88/24>], [0], [OK > >>> +]) > >>> +AT_CHECK([ovs-appctl ovs/route/add 1.1.2.92/24 <http://1.1.2.92/24> > >>> br0], [0], [OK > >>> +]) > >>> +AT_CHECK([ovs-appctl ovs/route/add 1.1.3.92/24 <http://1.1.3.92/24> > >>> m0], [0], [OK > >>> +]) > >>> +AT_CHECK([ovs-ofctl add-flow br0 'arp,priority=1,action=normal']) > >>> + > >>> +dnl Use arp reply to achieve tunnel next hop mac binding > >>> +AT_CHECK([ovs-appctl netdev-dummy/receive p0 > >>> 'recirc_id(0),in_port(1),eth(src=f8:bc:12:44:34:b6,dst=aa:55:aa:55:00:00),eth_type(0x0806),arp(sip=1.1.2.92,tip=1.1.2.88,op=2,sha=f8:bc:12:44:34:b6,tha=00:00:00:00:00:00)']) > >>> + > >>> +dnl Ouput gre pkt to both br0 port and m0 port > >>> +dnl m0 port to simulate mirror port, or flood case > >>> +AT_CHECK([ovs-ofctl add-flow br0 > >>> 'in_port(1),ip,ip_proto=47,priority=99,action=br0,m0']) > >>> + > >>> +AT_CHECK([ovs-appctl netdev-dummy/receive p0 > >>> 'recirc_id(0),in_port(1),eth(src=f8:bc:12:44:34:b6,dst=aa:55:aa:55:00:00),eth_type(0x0800),ipv4(src=1.1.2.92,dst=1.1.2.88,proto=47,tos=0,ttl=64,frag=no)']) > >>> + > >>> +dnl port br0: has the configured ip address of tnl port t1, so > >>> 'tnl_pop(4)' > >>> +dnl port m0: does not have a configured tnl ip address, so 'output:2' > >>> +AT_CHECK([ovs-appctl dpctl/dump-flows], [0], [dnl > >>> +flow-dump from the main thread: > >>> > >>> +recirc_id(0),in_port(1),packet_type(ns=0,id=0),eth(src=f8:bc:12:44:34:b6,dst=aa:55:aa:55:00:00),eth_type(0x0806),arp(sip=1.1.2.92,tip=1.1.2.88,op=2,sha=f8:bc:12:44:34:b6), > >>> packets:0, bytes:0, used:never, actions:100,2 > >>> > >>> +recirc_id(0),in_port(1),packet_type(ns=0,id=0),eth(dst=aa:55:aa:55:00:00),eth_type(0x0800),ipv4(dst=1.1.2.88,proto=47,frag=no), > >>> packets:0, bytes:0, used:never, actions:tnl_pop(4),2 > >>> +]) > >>> + > >>> +OVS_VSWITCHD_STOP > >>> +AT_CLEANUP > >>> + > >>> AT_SETUP([tunnel_push_pop - flow translation on unrelated bridges]) > >>> > >>> OVS_VSWITCHD_START( > >>> -- > >>> 1.8.3.1 > >>> > >>> _______________________________________________ > >>> dev mailing list > >>> d...@openvswitch.org <mailto:d...@openvswitch.org> > >>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev > >>> <https://mail.openvswitch.org/mailman/listinfo/ovs-dev> > >>> > >>> > >>> > >>> -- > >>> hepeng > >> > > _______________________________________________ > > dev mailing list > > d...@openvswitch.org > > https://mail.openvswitch.org/mailman/listinfo/ovs-dev > _______________________________________________ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev