On 5/3/22 15:31, Peng He wrote:
> Hi,
> this issue has been found and resolved by two independent patches:
> 
> http://patchwork.ozlabs.org/project/openvswitch/patch/20190726121839.92497-1-wang...@139.com/
>  
> <http://patchwork.ozlabs.org/project/openvswitch/patch/20190726121839.92497-1-wang...@139.com/>
> http://patchwork.ozlabs.org/project/openvswitch/patch/20201009121503.43893-1-hepeng.0...@bytedance.com/
>  
> <http://patchwork.ozlabs.org/project/openvswitch/patch/20201009121503.43893-1-hepeng.0...@bytedance.com/>
> 
> but I recently found my patch is still not fixing all the cases.
> 
> would like to have a discussion about this.

Hrm.  It seems to me that the problem is more genric then mirrors.
The reason is that tnl_port_map_lookup() logic doesn't work for
a bridge with more than one tunnel termination port.
Before commit dc0bd12f5b04 ("userspace: Enable non-bridge port
as tunnel endpoint."), only the bridge port was allowed to terminate
the tunnel.  MAC+IP lookup performed by the classifier inside the
tnl_port_map_lookup() was enough to determine if termination is
needed or if the packet has to be forwarded without tunnel_pop.
Now we have multiple ports that may or may not terminate the tunnel.

Assuming we have 2 different tunnel ports and two different normal
ports with different IP addresses and we expect the first tunnel
to be terminated on the first port and the second tunnel to be
terminated on the second port.  At the same time, the packet from
the first tunnel should be forwarded to the second port without
tunnel_pop and the packet from the second tunnel should be
forwarded to the first port without tunnel_pop.

Since the classifier in the tunnel-port.c will have rules for
both tunnels and it only matches on MAC+IP (+ some other fileds,
but we don't care much right now), we will have decapsulation
triggered in all cases.  The same as what happened with the
mirrored traffic.

Two patcehs mentioned above only focused on mirrors, so they
will not fix the problem in a common case.  The patch below
is closer to a solution, but it will skip the matches on dl_dst,
and, possibly, other tunnel header fileds, which we absolutely
need in a non-mirror case with 2 tunnels.

The correct solution would be to add an output port match to
the classifier, but clasifier is not designed to do that and
'struct match' doesn't have an appropriate field.  It shoudl
be a separate classifier per ipdev, or something like that.

Thoughts?

In any case, every part of the packet that was used to make a
decision must be unwildcarded.  IP on the port is not part of
the packet, but dst_ip is.

Best regards, Ilya Maximets.

> 
> 
> 
> lic121 <lic...@chinatelecom.cn <mailto:lic...@chinatelecom.cn>> 于2022年5月3日周二 
> 20:55写道:
> 
>     In compose_output_action__(), terminate_native_tunnel() is called
>     to test if a pkt matches a tnl port. If yes, a TUNNEL_POP action
>     is appended. Instead of outputing the pkt to the origin port, pkt
>     is redirected into the tnl port.
> 
>     pkt metadata(i.e. nw_proto, dst_ip, dst_port) is considered when
>     testing a pkt regardless of the out port until Ilya's patch[1].
>     Patch[1] adds the condition that terminal ports must have IP
>     address. This patch is to make addtional port ip check that port
>     ip address must match the pkt dst ip. This check covers the mirror
>     port case:
>     To tcpdump underlay pkt, we may create mirror port of dpdk port.
>     When compose_output_action__() is called on the mirror port,
>     pkt is redirected into tnl port again. As a result, ping shows
>     DUP. fast path flow has two tnl_pop actions.
> 
>     ```
>     [root@ovs1 vagrant]# ping 172.16.100.2
>     PING 172.16.100.2 (172.16.100.2) 56(84) bytes of data.
>     64 bytes from 172.16.100.2 <http://172.16.100.2>: icmp_seq=1 ttl=64 
> time=0.606 ms
>     64 bytes from 172.16.100.2 <http://172.16.100.2>: icmp_seq=1 ttl=64 
> time=0.617 ms (DUP!)
>     ```
> 
>     ```
>     [root@ovs ovs]# ovs-appctl dpctl/dump-flows |grep tnl_pop
>     
> recirc_id(0),in_port(3),packet_type(ns=0,id=0),eth(src=52:54:00:a6:c6:de,dst=ea:b4:09:72:44:4c),eth_type(0x0800),ipv4(dst=192.168.124.33,proto=17,frag=no),udp(dst=8472),
>  packets:1, bytes:148, used:0.637s, actions:tnl_pop(2),tnl_pop(2)
>     ```
> 
>     [1] (d584bb2b6a1 "ofproto-dpif-xlate: Terminate native tunnels only on 
> ports with IP addresses.")
> 
>     Signed-off-by: lic121 <lic...@chinatelecom.cn 
> <mailto:lic...@chinatelecom.cn>>
>     ---
>      ofproto/ofproto-dpif-xlate.c | 45 
> +++++++++++++++++++++++++++-------------
>      tests/tunnel-push-pop.at <http://tunnel-push-pop.at>     | 49 
> ++++++++++++++++++++++++++++++++++++++++++++
>      2 files changed, 80 insertions(+), 14 deletions(-)
> 
>     diff --git a/ofproto/ofproto-dpif-xlate.c b/ofproto/ofproto-dpif-xlate.c
>     index 17f7e28..30e636e 100644
>     --- a/ofproto/ofproto-dpif-xlate.c
>     +++ b/ofproto/ofproto-dpif-xlate.c
>     @@ -4107,18 +4107,26 @@ is_neighbor_reply_correct(const struct xlate_ctx 
> *ctx, const struct flow *flow)
>      }
> 
>      static bool
>     -xport_has_ip(const struct xport *xport)
>     +xport_has_ip(const struct xport *xport, const struct in6_addr ip6)
>      {
>          struct in6_addr *ip_addr, *mask;
>          int n_in6 = 0;
>     +    int i;
>     +    bool hasip = false;
> 
>          if (netdev_get_addr_list(xport->netdev, &ip_addr, &mask, &n_in6)) {
>              n_in6 = 0;
>          } else {
>     +        for (i = 0; i < n_in6; i++) {
>     +            if (IN6_ARE_ADDR_EQUAL(ip_addr + i, &ip6)) {
>     +                hasip = true;
>     +                break;
>     +            }
>     +        }
>              free(ip_addr);
>              free(mask);
>          }
>     -    return n_in6 ? true : false;
>     +    return hasip;
>      }
> 
>      static bool
>     @@ -4127,26 +4135,35 @@ terminate_native_tunnel(struct xlate_ctx *ctx, 
> const struct xport *xport,
>                              odp_port_t *tnl_port)
>      {
>          *tnl_port = ODPP_NONE;
>     +    bool ip_pkt = true;
>     +    struct in6_addr ip_dst;
>     +
>     +    if (flow->dl_type == htons(ETH_TYPE_IP)) {
>     +        ip_dst = in6_addr_mapped_ipv4(flow->nw_dst);
>     +    } else if (flow->dl_type == htons(ETH_TYPE_IPV6)) {
>     +        ip_dst = flow->ipv6_dst;
>     +    } else {
>     +        ip_pkt = false;
>     +    }
> 
>          /* XXX: Write better Filter for tunnel port. We can use in_port
>           * in tunnel-port flow to avoid these checks completely.
>           *
>     -     * Port without an IP address cannot be a tunnel termination point.
>     -     * Not performing a lookup in this case to avoid unwildcarding extra
>     -     * flow fields (dl_dst). */
>     -    if (ovs_native_tunneling_is_on(ctx->xbridge->ofproto)
>     -        && xport_has_ip(xport)) {
>     +     * Not performing a lookup if dst ip not match, to avoid 
> unwildcarding
>     +     * extra flow fields (dl_dst). Also avoid duplicate tnl_pop. */
>     +    if ((flow->dl_type == htons(ETH_TYPE_ARP) ||
>     +                flow->nw_proto == IPPROTO_ICMPV6) &&
>     +                is_neighbor_reply_correct(ctx, flow)) {
>     +            tnl_neigh_snoop(flow, wc, ctx->xbridge->name,
>     +                            ctx->xin->allow_side_effects);
>     +
>     +    } else if (ovs_native_tunneling_is_on(ctx->xbridge->ofproto)
>     +        && ip_pkt && xport_has_ip(xport, ip_dst)) {
>              *tnl_port = tnl_port_map_lookup(flow, wc);
> 
>              /* If no tunnel port was found and it's about an ARP or ICMPv6 
> packet,
>               * do tunnel neighbor snooping. */
>     -        if (*tnl_port == ODPP_NONE &&
>     -            (flow->dl_type == htons(ETH_TYPE_ARP) ||
>     -             flow->nw_proto == IPPROTO_ICMPV6) &&
>     -             is_neighbor_reply_correct(ctx, flow)) {
>     -            tnl_neigh_snoop(flow, wc, ctx->xbridge->name,
>     -                            ctx->xin->allow_side_effects);
>     -        } else if (*tnl_port != ODPP_NONE &&
>     +        if (*tnl_port != ODPP_NONE &&
>                         ctx->xin->allow_side_effects &&
>                         dl_type_is_ip_any(flow->dl_type)) {
>                  struct eth_addr mac = flow->dl_src;
>     diff --git a/tests/tunnel-push-pop.at <http://tunnel-push-pop.at> 
> b/tests/tunnel-push-pop.at <http://tunnel-push-pop.at>
>     index c633441..b3a3bf9 100644
>     --- a/tests/tunnel-push-pop.at <http://tunnel-push-pop.at>
>     +++ b/tests/tunnel-push-pop.at <http://tunnel-push-pop.at>
>     @@ -808,6 +808,55 @@ NXST_FLOW reply:
>      OVS_VSWITCHD_STOP
>      AT_CLEANUP
> 
>     +AT_SETUP([tunnel_push_pop - multiple output])
>     +
>     +OVS_VSWITCHD_START([add-port br0 p0 -- set Interface p0 type=dummy 
> ofport_request=1 other-config:hwaddr=aa:55:aa:55:00:00])
>     +AT_CHECK([ovs-vsctl add-port br0 m0 -- set Interface m0 type=dummy 
> ofport_request=2])
>     +AT_CHECK([ovs-vsctl add-br int-br -- set bridge int-br 
> datapath_type=dummy], [0])
>     +AT_CHECK([ovs-vsctl add-port int-br t1 -- set Interface t1 type=gre \
>     +                       options:remote_ip=1.1.2.92 options:key=456 
> options:seq=true ofport_request=4], [0])
>     +
>     +AT_CHECK([ovs-appctl dpif/show], [0], [dnl
>     +dummy@ovs-dummy: hit:0 missed:0
>     +  br0:
>     +    br0 65534/100: (dummy-internal)
>     +    m0 2/2: (dummy)
>     +    p0 1/1: (dummy)
>     +  int-br:
>     +    int-br 65534/3: (dummy-internal)
>     +    t1 4/4: (gre: key=456, remote_ip=1.1.2.92, seq=true)
>     +])
>     +
>     +AT_CHECK([ovs-appctl netdev-dummy/ip4addr br0 1.1.2.88/24 
> <http://1.1.2.88/24>], [0], [OK
>     +])
>     +AT_CHECK([ovs-appctl netdev-dummy/ip4addr m0 1.1.3.88/24 
> <http://1.1.3.88/24>], [0], [OK
>     +])
>     +AT_CHECK([ovs-appctl ovs/route/add 1.1.2.92/24 <http://1.1.2.92/24> 
> br0], [0], [OK
>     +])
>     +AT_CHECK([ovs-appctl ovs/route/add 1.1.3.92/24 <http://1.1.3.92/24> m0], 
> [0], [OK
>     +])
>     +AT_CHECK([ovs-ofctl add-flow br0 'arp,priority=1,action=normal'])
>     +
>     +dnl Use arp reply to achieve tunnel next hop mac binding
>     +AT_CHECK([ovs-appctl netdev-dummy/receive p0 
> 'recirc_id(0),in_port(1),eth(src=f8:bc:12:44:34:b6,dst=aa:55:aa:55:00:00),eth_type(0x0806),arp(sip=1.1.2.92,tip=1.1.2.88,op=2,sha=f8:bc:12:44:34:b6,tha=00:00:00:00:00:00)'])
>     +
>     +dnl Ouput gre pkt to both br0 port and m0 port
>     +dnl m0 port to simulate mirror port, or flood case
>     +AT_CHECK([ovs-ofctl add-flow br0 
> 'in_port(1),ip,ip_proto=47,priority=99,action=br0,m0'])
>     +
>     +AT_CHECK([ovs-appctl netdev-dummy/receive p0 
> 'recirc_id(0),in_port(1),eth(src=f8:bc:12:44:34:b6,dst=aa:55:aa:55:00:00),eth_type(0x0800),ipv4(src=1.1.2.92,dst=1.1.2.88,proto=47,tos=0,ttl=64,frag=no)'])
>     +
>     +dnl port br0: has the configured ip address of tnl port t1, so 
> 'tnl_pop(4)'
>     +dnl port m0: does not have a configured tnl ip address, so 'output:2'
>     +AT_CHECK([ovs-appctl dpctl/dump-flows], [0], [dnl
>     +flow-dump from the main thread:
>     
> +recirc_id(0),in_port(1),packet_type(ns=0,id=0),eth(src=f8:bc:12:44:34:b6,dst=aa:55:aa:55:00:00),eth_type(0x0806),arp(sip=1.1.2.92,tip=1.1.2.88,op=2,sha=f8:bc:12:44:34:b6),
>  packets:0, bytes:0, used:never, actions:100,2
>     
> +recirc_id(0),in_port(1),packet_type(ns=0,id=0),eth(dst=aa:55:aa:55:00:00),eth_type(0x0800),ipv4(dst=1.1.2.88,proto=47,frag=no),
>  packets:0, bytes:0, used:never, actions:tnl_pop(4),2
>     +])
>     +
>     +OVS_VSWITCHD_STOP
>     +AT_CLEANUP
>     +
>      AT_SETUP([tunnel_push_pop - flow translation on unrelated bridges])
> 
>      OVS_VSWITCHD_START(
>     -- 
>     1.8.3.1
> 
>     _______________________________________________
>     dev mailing list
>     d...@openvswitch.org <mailto:d...@openvswitch.org>
>     https://mail.openvswitch.org/mailman/listinfo/ovs-dev 
> <https://mail.openvswitch.org/mailman/listinfo/ovs-dev>
> 
> 
> 
> -- 
> hepeng

_______________________________________________
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to