On Wed, May 04, 2022 at 12:53:34PM +0200, Ilya Maximets wrote:
> On 5/4/22 05:23, lic121 wrote:
> > On Tue, May 03, 2022 at 10:18:18PM +0200, Ilya Maximets wrote:
> >> On 5/3/22 15:31, Peng He wrote:
> >>> Hi,
> >>> this issue has been found and resolved by two independent patches:
> >>>
> >>> http://patchwork.ozlabs.org/project/openvswitch/patch/20190726121839.92497-1-wang...@139.com/
> >>>  
> >>> <http://patchwork.ozlabs.org/project/openvswitch/patch/20190726121839.92497-1-wang...@139.com/>
> >>> http://patchwork.ozlabs.org/project/openvswitch/patch/20201009121503.43893-1-hepeng.0...@bytedance.com/
> >>>  
> >>> <http://patchwork.ozlabs.org/project/openvswitch/patch/20201009121503.43893-1-hepeng.0...@bytedance.com/>
> >>>
> >>> but I recently found my patch is still not fixing all the cases.
> >>>
> >>> would like to have a discussion about this.
> >>
> >> Hrm.  It seems to me that the problem is more genric then mirrors.
> >> The reason is that tnl_port_map_lookup() logic doesn't work for
> >> a bridge with more than one tunnel termination port.
> >> Before commit dc0bd12f5b04 ("userspace: Enable non-bridge port
> >> as tunnel endpoint."), only the bridge port was allowed to terminate
> >> the tunnel.  MAC+IP lookup performed by the classifier inside the
> >> tnl_port_map_lookup() was enough to determine if termination is
> >> needed or if the packet has to be forwarded without tunnel_pop.
> >> Now we have multiple ports that may or may not terminate the tunnel.
> >>
> >> Assuming we have 2 different tunnel ports and two different normal
> >> ports with different IP addresses and we expect the first tunnel
> >> to be terminated on the first port and the second tunnel to be
> >> terminated on the second port.  At the same time, the packet from
> >> the first tunnel should be forwarded to the second port without
> >> tunnel_pop and the packet from the second tunnel should be
> >> forwarded to the first port without tunnel_pop.
> >>
> >> Since the classifier in the tunnel-port.c will have rules for
> >> both tunnels and it only matches on MAC+IP (+ some other fileds,
> >> but we don't care much right now), we will have decapsulation
> >> triggered in all cases.  The same as what happened with the
> >> mirrored traffic.
> >>
> >> Two patcehs mentioned above only focused on mirrors, so they
> >> will not fix the problem in a common case.  The patch below
> >> is closer to a solution, but it will skip the matches on dl_dst,
> >> and, possibly, other tunnel header fileds, which we absolutely
> >> need in a non-mirror case with 2 tunnels.
> > 
> > Hi Ilya, you mean we need to check dl_dst as well before tnl classifier
> > lookup? For the case that two ports have the same ip address?
> 
> If two ports on the same bridge has the same IP, that sounds
> like a misconfiguration (I'd argue that having IP assigned
> to a bridge member is a misconfiguration in general, but we
> have what we have...), so I'm not sure if we need to check
> for IP additionally.  But we must unwildcard (add to a flow
> match) fields that we're using to make a decision.

Thanks for the explanation, yes you are right. We do need to unwildcard.

> tnl_port_map_lookup() does that in a generic way.  It has
> information about all the MACs, IPs, VLANs and other fileds
> that may or may not be significant for the lookup.
> Generic classifier lookup inside that function unwildcards
> all the fileds that was actually used during lookup.
> 
> So, I'm not convinced if just checking the IP is sufficient,

An ovs port mainly has two propertyies: IP and MAC. MAC is not a property
for most tunnels. If two ports have the same IP but different MAC. We
can't tell which port is the right tunnel terminal port. So I checked
the IP only, but missed the unwildcard as you mentioned.

> but we definitely have to unwildcard it in a flow match if
> we're checking it.
> 
> > 
> >>
> >> The correct solution would be to add an output port match to
> >> the classifier, but clasifier is not designed to do that and
> >> 'struct match' doesn't have an appropriate field.  It shoudl
> >> be a separate classifier per ipdev, or something like that.
> >>
> >> Thoughts?
> >>
> >> In any case, every part of the packet that was used to make a
> >> decision must be unwildcarded.  IP on the port is not part of
> >> the packet, but dst_ip is.
> >>
> >> Best regards, Ilya Maximets.
> >>
> >>>
> >>>
> >>>
> >>> lic121 <lic...@chinatelecom.cn <mailto:lic...@chinatelecom.cn>> 
> >>> 于2022年5月3日周二 20:55写道:
> >>>
> >>>     In compose_output_action__(), terminate_native_tunnel() is called
> >>>     to test if a pkt matches a tnl port. If yes, a TUNNEL_POP action
> >>>     is appended. Instead of outputing the pkt to the origin port, pkt
> >>>     is redirected into the tnl port.
> >>>
> >>>     pkt metadata(i.e. nw_proto, dst_ip, dst_port) is considered when
> >>>     testing a pkt regardless of the out port until Ilya's patch[1].
> >>>     Patch[1] adds the condition that terminal ports must have IP
> >>>     address. This patch is to make addtional port ip check that port
> >>>     ip address must match the pkt dst ip. This check covers the mirror
> >>>     port case:
> >>>     To tcpdump underlay pkt, we may create mirror port of dpdk port.
> >>>     When compose_output_action__() is called on the mirror port,
> >>>     pkt is redirected into tnl port again. As a result, ping shows
> >>>     DUP. fast path flow has two tnl_pop actions.
> >>>
> >>>     ```
> >>>     [root@ovs1 vagrant]# ping 172.16.100.2
> >>>     PING 172.16.100.2 (172.16.100.2) 56(84) bytes of data.
> >>>     64 bytes from 172.16.100.2 <http://172.16.100.2>: icmp_seq=1 ttl=64 
> >>> time=0.606 ms
> >>>     64 bytes from 172.16.100.2 <http://172.16.100.2>: icmp_seq=1 ttl=64 
> >>> time=0.617 ms (DUP!)
> >>>     ```
> >>>
> >>>     ```
> >>>     [root@ovs ovs]# ovs-appctl dpctl/dump-flows |grep tnl_pop
> >>>     
> >>> recirc_id(0),in_port(3),packet_type(ns=0,id=0),eth(src=52:54:00:a6:c6:de,dst=ea:b4:09:72:44:4c),eth_type(0x0800),ipv4(dst=192.168.124.33,proto=17,frag=no),udp(dst=8472),
> >>>  packets:1, bytes:148, used:0.637s, actions:tnl_pop(2),tnl_pop(2)
> >>>     ```
> >>>
> >>>     [1] (d584bb2b6a1 "ofproto-dpif-xlate: Terminate native tunnels only 
> >>> on ports with IP addresses.")
> >>>
> >>>     Signed-off-by: lic121 <lic...@chinatelecom.cn 
> >>> <mailto:lic...@chinatelecom.cn>>
> >>>     ---
> >>>      ofproto/ofproto-dpif-xlate.c | 45 
> >>> +++++++++++++++++++++++++++-------------
> >>>      tests/tunnel-push-pop.at <http://tunnel-push-pop.at>     | 49 
> >>> ++++++++++++++++++++++++++++++++++++++++++++
> >>>      2 files changed, 80 insertions(+), 14 deletions(-)
> >>>
> >>>     diff --git a/ofproto/ofproto-dpif-xlate.c 
> >>> b/ofproto/ofproto-dpif-xlate.c
> >>>     index 17f7e28..30e636e 100644
> >>>     --- a/ofproto/ofproto-dpif-xlate.c
> >>>     +++ b/ofproto/ofproto-dpif-xlate.c
> >>>     @@ -4107,18 +4107,26 @@ is_neighbor_reply_correct(const struct 
> >>> xlate_ctx *ctx, const struct flow *flow)
> >>>      }
> >>>
> >>>      static bool
> >>>     -xport_has_ip(const struct xport *xport)
> >>>     +xport_has_ip(const struct xport *xport, const struct in6_addr ip6)
> >>>      {
> >>>          struct in6_addr *ip_addr, *mask;
> >>>          int n_in6 = 0;
> >>>     +    int i;
> >>>     +    bool hasip = false;
> >>>
> >>>          if (netdev_get_addr_list(xport->netdev, &ip_addr, &mask, 
> >>> &n_in6)) {
> >>>              n_in6 = 0;
> >>>          } else {
> >>>     +        for (i = 0; i < n_in6; i++) {
> >>>     +            if (IN6_ARE_ADDR_EQUAL(ip_addr + i, &ip6)) {
> >>>     +                hasip = true;
> >>>     +                break;
> >>>     +            }
> >>>     +        }
> >>>              free(ip_addr);
> >>>              free(mask);
> >>>          }
> >>>     -    return n_in6 ? true : false;
> >>>     +    return hasip;
> >>>      }
> >>>
> >>>      static bool
> >>>     @@ -4127,26 +4135,35 @@ terminate_native_tunnel(struct xlate_ctx 
> >>> *ctx, const struct xport *xport,
> >>>                              odp_port_t *tnl_port)
> >>>      {
> >>>          *tnl_port = ODPP_NONE;
> >>>     +    bool ip_pkt = true;
> >>>     +    struct in6_addr ip_dst;
> >>>     +
> >>>     +    if (flow->dl_type == htons(ETH_TYPE_IP)) {
> >>>     +        ip_dst = in6_addr_mapped_ipv4(flow->nw_dst);
> >>>     +    } else if (flow->dl_type == htons(ETH_TYPE_IPV6)) {
> >>>     +        ip_dst = flow->ipv6_dst;
> >>>     +    } else {
> >>>     +        ip_pkt = false;
> >>>     +    }
> >>>
> >>>          /* XXX: Write better Filter for tunnel port. We can use in_port
> >>>           * in tunnel-port flow to avoid these checks completely.
> >>>           *
> >>>     -     * Port without an IP address cannot be a tunnel termination 
> >>> point.
> >>>     -     * Not performing a lookup in this case to avoid unwildcarding 
> >>> extra
> >>>     -     * flow fields (dl_dst). */
> >>>     -    if (ovs_native_tunneling_is_on(ctx->xbridge->ofproto)
> >>>     -        && xport_has_ip(xport)) {
> >>>     +     * Not performing a lookup if dst ip not match, to avoid 
> >>> unwildcarding
> >>>     +     * extra flow fields (dl_dst). Also avoid duplicate tnl_pop. */
> >>>     +    if ((flow->dl_type == htons(ETH_TYPE_ARP) ||
> >>>     +                flow->nw_proto == IPPROTO_ICMPV6) &&
> >>>     +                is_neighbor_reply_correct(ctx, flow)) {
> >>>     +            tnl_neigh_snoop(flow, wc, ctx->xbridge->name,
> >>>     +                            ctx->xin->allow_side_effects);
> >>>     +
> >>>     +    } else if (ovs_native_tunneling_is_on(ctx->xbridge->ofproto)
> >>>     +        && ip_pkt && xport_has_ip(xport, ip_dst)) {
> >>>              *tnl_port = tnl_port_map_lookup(flow, wc);
> >>>
> >>>              /* If no tunnel port was found and it's about an ARP or 
> >>> ICMPv6 packet,
> >>>               * do tunnel neighbor snooping. */
> >>>     -        if (*tnl_port == ODPP_NONE &&
> >>>     -            (flow->dl_type == htons(ETH_TYPE_ARP) ||
> >>>     -             flow->nw_proto == IPPROTO_ICMPV6) &&
> >>>     -             is_neighbor_reply_correct(ctx, flow)) {
> >>>     -            tnl_neigh_snoop(flow, wc, ctx->xbridge->name,
> >>>     -                            ctx->xin->allow_side_effects);
> >>>     -        } else if (*tnl_port != ODPP_NONE &&
> >>>     +        if (*tnl_port != ODPP_NONE &&
> >>>                         ctx->xin->allow_side_effects &&
> >>>                         dl_type_is_ip_any(flow->dl_type)) {
> >>>                  struct eth_addr mac = flow->dl_src;
> >>>     diff --git a/tests/tunnel-push-pop.at <http://tunnel-push-pop.at> 
> >>> b/tests/tunnel-push-pop.at <http://tunnel-push-pop.at>
> >>>     index c633441..b3a3bf9 100644
> >>>     --- a/tests/tunnel-push-pop.at <http://tunnel-push-pop.at>
> >>>     +++ b/tests/tunnel-push-pop.at <http://tunnel-push-pop.at>
> >>>     @@ -808,6 +808,55 @@ NXST_FLOW reply:
> >>>      OVS_VSWITCHD_STOP
> >>>      AT_CLEANUP
> >>>
> >>>     +AT_SETUP([tunnel_push_pop - multiple output])
> >>>     +
> >>>     +OVS_VSWITCHD_START([add-port br0 p0 -- set Interface p0 type=dummy 
> >>> ofport_request=1 other-config:hwaddr=aa:55:aa:55:00:00])
> >>>     +AT_CHECK([ovs-vsctl add-port br0 m0 -- set Interface m0 type=dummy 
> >>> ofport_request=2])
> >>>     +AT_CHECK([ovs-vsctl add-br int-br -- set bridge int-br 
> >>> datapath_type=dummy], [0])
> >>>     +AT_CHECK([ovs-vsctl add-port int-br t1 -- set Interface t1 type=gre \
> >>>     +                       options:remote_ip=1.1.2.92 options:key=456 
> >>> options:seq=true ofport_request=4], [0])
> >>>     +
> >>>     +AT_CHECK([ovs-appctl dpif/show], [0], [dnl
> >>>     +dummy@ovs-dummy: hit:0 missed:0
> >>>     +  br0:
> >>>     +    br0 65534/100: (dummy-internal)
> >>>     +    m0 2/2: (dummy)
> >>>     +    p0 1/1: (dummy)
> >>>     +  int-br:
> >>>     +    int-br 65534/3: (dummy-internal)
> >>>     +    t1 4/4: (gre: key=456, remote_ip=1.1.2.92, seq=true)
> >>>     +])
> >>>     +
> >>>     +AT_CHECK([ovs-appctl netdev-dummy/ip4addr br0 1.1.2.88/24 
> >>> <http://1.1.2.88/24>], [0], [OK
> >>>     +])
> >>>     +AT_CHECK([ovs-appctl netdev-dummy/ip4addr m0 1.1.3.88/24 
> >>> <http://1.1.3.88/24>], [0], [OK
> >>>     +])
> >>>     +AT_CHECK([ovs-appctl ovs/route/add 1.1.2.92/24 <http://1.1.2.92/24> 
> >>> br0], [0], [OK
> >>>     +])
> >>>     +AT_CHECK([ovs-appctl ovs/route/add 1.1.3.92/24 <http://1.1.3.92/24> 
> >>> m0], [0], [OK
> >>>     +])
> >>>     +AT_CHECK([ovs-ofctl add-flow br0 'arp,priority=1,action=normal'])
> >>>     +
> >>>     +dnl Use arp reply to achieve tunnel next hop mac binding
> >>>     +AT_CHECK([ovs-appctl netdev-dummy/receive p0 
> >>> 'recirc_id(0),in_port(1),eth(src=f8:bc:12:44:34:b6,dst=aa:55:aa:55:00:00),eth_type(0x0806),arp(sip=1.1.2.92,tip=1.1.2.88,op=2,sha=f8:bc:12:44:34:b6,tha=00:00:00:00:00:00)'])
> >>>     +
> >>>     +dnl Ouput gre pkt to both br0 port and m0 port
> >>>     +dnl m0 port to simulate mirror port, or flood case
> >>>     +AT_CHECK([ovs-ofctl add-flow br0 
> >>> 'in_port(1),ip,ip_proto=47,priority=99,action=br0,m0'])
> >>>     +
> >>>     +AT_CHECK([ovs-appctl netdev-dummy/receive p0 
> >>> 'recirc_id(0),in_port(1),eth(src=f8:bc:12:44:34:b6,dst=aa:55:aa:55:00:00),eth_type(0x0800),ipv4(src=1.1.2.92,dst=1.1.2.88,proto=47,tos=0,ttl=64,frag=no)'])
> >>>     +
> >>>     +dnl port br0: has the configured ip address of tnl port t1, so 
> >>> 'tnl_pop(4)'
> >>>     +dnl port m0: does not have a configured tnl ip address, so 'output:2'
> >>>     +AT_CHECK([ovs-appctl dpctl/dump-flows], [0], [dnl
> >>>     +flow-dump from the main thread:
> >>>     
> >>> +recirc_id(0),in_port(1),packet_type(ns=0,id=0),eth(src=f8:bc:12:44:34:b6,dst=aa:55:aa:55:00:00),eth_type(0x0806),arp(sip=1.1.2.92,tip=1.1.2.88,op=2,sha=f8:bc:12:44:34:b6),
> >>>  packets:0, bytes:0, used:never, actions:100,2
> >>>     
> >>> +recirc_id(0),in_port(1),packet_type(ns=0,id=0),eth(dst=aa:55:aa:55:00:00),eth_type(0x0800),ipv4(dst=1.1.2.88,proto=47,frag=no),
> >>>  packets:0, bytes:0, used:never, actions:tnl_pop(4),2
> >>>     +])
> >>>     +
> >>>     +OVS_VSWITCHD_STOP
> >>>     +AT_CLEANUP
> >>>     +
> >>>      AT_SETUP([tunnel_push_pop - flow translation on unrelated bridges])
> >>>
> >>>      OVS_VSWITCHD_START(
> >>>     -- 
> >>>     1.8.3.1
> >>>
> >>>     _______________________________________________
> >>>     dev mailing list
> >>>     d...@openvswitch.org <mailto:d...@openvswitch.org>
> >>>     https://mail.openvswitch.org/mailman/listinfo/ovs-dev 
> >>> <https://mail.openvswitch.org/mailman/listinfo/ovs-dev>
> >>>
> >>>
> >>>
> >>> -- 
> >>> hepeng
> >>
> > _______________________________________________
> > dev mailing list
> > d...@openvswitch.org
> > https://mail.openvswitch.org/mailman/listinfo/ovs-dev
> 
_______________________________________________
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to