On 5/4/22 05:23, lic121 wrote:
> On Tue, May 03, 2022 at 10:18:18PM +0200, Ilya Maximets wrote:
>> On 5/3/22 15:31, Peng He wrote:
>>> Hi,
>>> this issue has been found and resolved by two independent patches:
>>>
>>> http://patchwork.ozlabs.org/project/openvswitch/patch/20190726121839.92497-1-wang...@139.com/
>>>  
>>> <http://patchwork.ozlabs.org/project/openvswitch/patch/20190726121839.92497-1-wang...@139.com/>
>>> http://patchwork.ozlabs.org/project/openvswitch/patch/20201009121503.43893-1-hepeng.0...@bytedance.com/
>>>  
>>> <http://patchwork.ozlabs.org/project/openvswitch/patch/20201009121503.43893-1-hepeng.0...@bytedance.com/>
>>>
>>> but I recently found my patch is still not fixing all the cases.
>>>
>>> would like to have a discussion about this.
>>
>> Hrm.  It seems to me that the problem is more genric then mirrors.
>> The reason is that tnl_port_map_lookup() logic doesn't work for
>> a bridge with more than one tunnel termination port.
>> Before commit dc0bd12f5b04 ("userspace: Enable non-bridge port
>> as tunnel endpoint."), only the bridge port was allowed to terminate
>> the tunnel.  MAC+IP lookup performed by the classifier inside the
>> tnl_port_map_lookup() was enough to determine if termination is
>> needed or if the packet has to be forwarded without tunnel_pop.
>> Now we have multiple ports that may or may not terminate the tunnel.
>>
>> Assuming we have 2 different tunnel ports and two different normal
>> ports with different IP addresses and we expect the first tunnel
>> to be terminated on the first port and the second tunnel to be
>> terminated on the second port.  At the same time, the packet from
>> the first tunnel should be forwarded to the second port without
>> tunnel_pop and the packet from the second tunnel should be
>> forwarded to the first port without tunnel_pop.
>>
>> Since the classifier in the tunnel-port.c will have rules for
>> both tunnels and it only matches on MAC+IP (+ some other fileds,
>> but we don't care much right now), we will have decapsulation
>> triggered in all cases.  The same as what happened with the
>> mirrored traffic.
>>
>> Two patcehs mentioned above only focused on mirrors, so they
>> will not fix the problem in a common case.  The patch below
>> is closer to a solution, but it will skip the matches on dl_dst,
>> and, possibly, other tunnel header fileds, which we absolutely
>> need in a non-mirror case with 2 tunnels.
> 
> Hi Ilya, you mean we need to check dl_dst as well before tnl classifier
> lookup? For the case that two ports have the same ip address?

If two ports on the same bridge has the same IP, that sounds
like a misconfiguration (I'd argue that having IP assigned
to a bridge member is a misconfiguration in general, but we
have what we have...), so I'm not sure if we need to check
for IP additionally.  But we must unwildcard (add to a flow
match) fields that we're using to make a decision.
tnl_port_map_lookup() does that in a generic way.  It has
information about all the MACs, IPs, VLANs and other fileds
that may or may not be significant for the lookup.
Generic classifier lookup inside that function unwildcards
all the fileds that was actually used during lookup.

So, I'm not convinced if just checking the IP is sufficient,
but we definitely have to unwildcard it in a flow match if
we're checking it.

> 
>>
>> The correct solution would be to add an output port match to
>> the classifier, but clasifier is not designed to do that and
>> 'struct match' doesn't have an appropriate field.  It shoudl
>> be a separate classifier per ipdev, or something like that.
>>
>> Thoughts?
>>
>> In any case, every part of the packet that was used to make a
>> decision must be unwildcarded.  IP on the port is not part of
>> the packet, but dst_ip is.
>>
>> Best regards, Ilya Maximets.
>>
>>>
>>>
>>>
>>> lic121 <lic...@chinatelecom.cn <mailto:lic...@chinatelecom.cn>> 
>>> 于2022年5月3日周二 20:55写道:
>>>
>>>     In compose_output_action__(), terminate_native_tunnel() is called
>>>     to test if a pkt matches a tnl port. If yes, a TUNNEL_POP action
>>>     is appended. Instead of outputing the pkt to the origin port, pkt
>>>     is redirected into the tnl port.
>>>
>>>     pkt metadata(i.e. nw_proto, dst_ip, dst_port) is considered when
>>>     testing a pkt regardless of the out port until Ilya's patch[1].
>>>     Patch[1] adds the condition that terminal ports must have IP
>>>     address. This patch is to make addtional port ip check that port
>>>     ip address must match the pkt dst ip. This check covers the mirror
>>>     port case:
>>>     To tcpdump underlay pkt, we may create mirror port of dpdk port.
>>>     When compose_output_action__() is called on the mirror port,
>>>     pkt is redirected into tnl port again. As a result, ping shows
>>>     DUP. fast path flow has two tnl_pop actions.
>>>
>>>     ```
>>>     [root@ovs1 vagrant]# ping 172.16.100.2
>>>     PING 172.16.100.2 (172.16.100.2) 56(84) bytes of data.
>>>     64 bytes from 172.16.100.2 <http://172.16.100.2>: icmp_seq=1 ttl=64 
>>> time=0.606 ms
>>>     64 bytes from 172.16.100.2 <http://172.16.100.2>: icmp_seq=1 ttl=64 
>>> time=0.617 ms (DUP!)
>>>     ```
>>>
>>>     ```
>>>     [root@ovs ovs]# ovs-appctl dpctl/dump-flows |grep tnl_pop
>>>     
>>> recirc_id(0),in_port(3),packet_type(ns=0,id=0),eth(src=52:54:00:a6:c6:de,dst=ea:b4:09:72:44:4c),eth_type(0x0800),ipv4(dst=192.168.124.33,proto=17,frag=no),udp(dst=8472),
>>>  packets:1, bytes:148, used:0.637s, actions:tnl_pop(2),tnl_pop(2)
>>>     ```
>>>
>>>     [1] (d584bb2b6a1 "ofproto-dpif-xlate: Terminate native tunnels only on 
>>> ports with IP addresses.")
>>>
>>>     Signed-off-by: lic121 <lic...@chinatelecom.cn 
>>> <mailto:lic...@chinatelecom.cn>>
>>>     ---
>>>      ofproto/ofproto-dpif-xlate.c | 45 
>>> +++++++++++++++++++++++++++-------------
>>>      tests/tunnel-push-pop.at <http://tunnel-push-pop.at>     | 49 
>>> ++++++++++++++++++++++++++++++++++++++++++++
>>>      2 files changed, 80 insertions(+), 14 deletions(-)
>>>
>>>     diff --git a/ofproto/ofproto-dpif-xlate.c b/ofproto/ofproto-dpif-xlate.c
>>>     index 17f7e28..30e636e 100644
>>>     --- a/ofproto/ofproto-dpif-xlate.c
>>>     +++ b/ofproto/ofproto-dpif-xlate.c
>>>     @@ -4107,18 +4107,26 @@ is_neighbor_reply_correct(const struct 
>>> xlate_ctx *ctx, const struct flow *flow)
>>>      }
>>>
>>>      static bool
>>>     -xport_has_ip(const struct xport *xport)
>>>     +xport_has_ip(const struct xport *xport, const struct in6_addr ip6)
>>>      {
>>>          struct in6_addr *ip_addr, *mask;
>>>          int n_in6 = 0;
>>>     +    int i;
>>>     +    bool hasip = false;
>>>
>>>          if (netdev_get_addr_list(xport->netdev, &ip_addr, &mask, &n_in6)) {
>>>              n_in6 = 0;
>>>          } else {
>>>     +        for (i = 0; i < n_in6; i++) {
>>>     +            if (IN6_ARE_ADDR_EQUAL(ip_addr + i, &ip6)) {
>>>     +                hasip = true;
>>>     +                break;
>>>     +            }
>>>     +        }
>>>              free(ip_addr);
>>>              free(mask);
>>>          }
>>>     -    return n_in6 ? true : false;
>>>     +    return hasip;
>>>      }
>>>
>>>      static bool
>>>     @@ -4127,26 +4135,35 @@ terminate_native_tunnel(struct xlate_ctx *ctx, 
>>> const struct xport *xport,
>>>                              odp_port_t *tnl_port)
>>>      {
>>>          *tnl_port = ODPP_NONE;
>>>     +    bool ip_pkt = true;
>>>     +    struct in6_addr ip_dst;
>>>     +
>>>     +    if (flow->dl_type == htons(ETH_TYPE_IP)) {
>>>     +        ip_dst = in6_addr_mapped_ipv4(flow->nw_dst);
>>>     +    } else if (flow->dl_type == htons(ETH_TYPE_IPV6)) {
>>>     +        ip_dst = flow->ipv6_dst;
>>>     +    } else {
>>>     +        ip_pkt = false;
>>>     +    }
>>>
>>>          /* XXX: Write better Filter for tunnel port. We can use in_port
>>>           * in tunnel-port flow to avoid these checks completely.
>>>           *
>>>     -     * Port without an IP address cannot be a tunnel termination point.
>>>     -     * Not performing a lookup in this case to avoid unwildcarding 
>>> extra
>>>     -     * flow fields (dl_dst). */
>>>     -    if (ovs_native_tunneling_is_on(ctx->xbridge->ofproto)
>>>     -        && xport_has_ip(xport)) {
>>>     +     * Not performing a lookup if dst ip not match, to avoid 
>>> unwildcarding
>>>     +     * extra flow fields (dl_dst). Also avoid duplicate tnl_pop. */
>>>     +    if ((flow->dl_type == htons(ETH_TYPE_ARP) ||
>>>     +                flow->nw_proto == IPPROTO_ICMPV6) &&
>>>     +                is_neighbor_reply_correct(ctx, flow)) {
>>>     +            tnl_neigh_snoop(flow, wc, ctx->xbridge->name,
>>>     +                            ctx->xin->allow_side_effects);
>>>     +
>>>     +    } else if (ovs_native_tunneling_is_on(ctx->xbridge->ofproto)
>>>     +        && ip_pkt && xport_has_ip(xport, ip_dst)) {
>>>              *tnl_port = tnl_port_map_lookup(flow, wc);
>>>
>>>              /* If no tunnel port was found and it's about an ARP or ICMPv6 
>>> packet,
>>>               * do tunnel neighbor snooping. */
>>>     -        if (*tnl_port == ODPP_NONE &&
>>>     -            (flow->dl_type == htons(ETH_TYPE_ARP) ||
>>>     -             flow->nw_proto == IPPROTO_ICMPV6) &&
>>>     -             is_neighbor_reply_correct(ctx, flow)) {
>>>     -            tnl_neigh_snoop(flow, wc, ctx->xbridge->name,
>>>     -                            ctx->xin->allow_side_effects);
>>>     -        } else if (*tnl_port != ODPP_NONE &&
>>>     +        if (*tnl_port != ODPP_NONE &&
>>>                         ctx->xin->allow_side_effects &&
>>>                         dl_type_is_ip_any(flow->dl_type)) {
>>>                  struct eth_addr mac = flow->dl_src;
>>>     diff --git a/tests/tunnel-push-pop.at <http://tunnel-push-pop.at> 
>>> b/tests/tunnel-push-pop.at <http://tunnel-push-pop.at>
>>>     index c633441..b3a3bf9 100644
>>>     --- a/tests/tunnel-push-pop.at <http://tunnel-push-pop.at>
>>>     +++ b/tests/tunnel-push-pop.at <http://tunnel-push-pop.at>
>>>     @@ -808,6 +808,55 @@ NXST_FLOW reply:
>>>      OVS_VSWITCHD_STOP
>>>      AT_CLEANUP
>>>
>>>     +AT_SETUP([tunnel_push_pop - multiple output])
>>>     +
>>>     +OVS_VSWITCHD_START([add-port br0 p0 -- set Interface p0 type=dummy 
>>> ofport_request=1 other-config:hwaddr=aa:55:aa:55:00:00])
>>>     +AT_CHECK([ovs-vsctl add-port br0 m0 -- set Interface m0 type=dummy 
>>> ofport_request=2])
>>>     +AT_CHECK([ovs-vsctl add-br int-br -- set bridge int-br 
>>> datapath_type=dummy], [0])
>>>     +AT_CHECK([ovs-vsctl add-port int-br t1 -- set Interface t1 type=gre \
>>>     +                       options:remote_ip=1.1.2.92 options:key=456 
>>> options:seq=true ofport_request=4], [0])
>>>     +
>>>     +AT_CHECK([ovs-appctl dpif/show], [0], [dnl
>>>     +dummy@ovs-dummy: hit:0 missed:0
>>>     +  br0:
>>>     +    br0 65534/100: (dummy-internal)
>>>     +    m0 2/2: (dummy)
>>>     +    p0 1/1: (dummy)
>>>     +  int-br:
>>>     +    int-br 65534/3: (dummy-internal)
>>>     +    t1 4/4: (gre: key=456, remote_ip=1.1.2.92, seq=true)
>>>     +])
>>>     +
>>>     +AT_CHECK([ovs-appctl netdev-dummy/ip4addr br0 1.1.2.88/24 
>>> <http://1.1.2.88/24>], [0], [OK
>>>     +])
>>>     +AT_CHECK([ovs-appctl netdev-dummy/ip4addr m0 1.1.3.88/24 
>>> <http://1.1.3.88/24>], [0], [OK
>>>     +])
>>>     +AT_CHECK([ovs-appctl ovs/route/add 1.1.2.92/24 <http://1.1.2.92/24> 
>>> br0], [0], [OK
>>>     +])
>>>     +AT_CHECK([ovs-appctl ovs/route/add 1.1.3.92/24 <http://1.1.3.92/24> 
>>> m0], [0], [OK
>>>     +])
>>>     +AT_CHECK([ovs-ofctl add-flow br0 'arp,priority=1,action=normal'])
>>>     +
>>>     +dnl Use arp reply to achieve tunnel next hop mac binding
>>>     +AT_CHECK([ovs-appctl netdev-dummy/receive p0 
>>> 'recirc_id(0),in_port(1),eth(src=f8:bc:12:44:34:b6,dst=aa:55:aa:55:00:00),eth_type(0x0806),arp(sip=1.1.2.92,tip=1.1.2.88,op=2,sha=f8:bc:12:44:34:b6,tha=00:00:00:00:00:00)'])
>>>     +
>>>     +dnl Ouput gre pkt to both br0 port and m0 port
>>>     +dnl m0 port to simulate mirror port, or flood case
>>>     +AT_CHECK([ovs-ofctl add-flow br0 
>>> 'in_port(1),ip,ip_proto=47,priority=99,action=br0,m0'])
>>>     +
>>>     +AT_CHECK([ovs-appctl netdev-dummy/receive p0 
>>> 'recirc_id(0),in_port(1),eth(src=f8:bc:12:44:34:b6,dst=aa:55:aa:55:00:00),eth_type(0x0800),ipv4(src=1.1.2.92,dst=1.1.2.88,proto=47,tos=0,ttl=64,frag=no)'])
>>>     +
>>>     +dnl port br0: has the configured ip address of tnl port t1, so 
>>> 'tnl_pop(4)'
>>>     +dnl port m0: does not have a configured tnl ip address, so 'output:2'
>>>     +AT_CHECK([ovs-appctl dpctl/dump-flows], [0], [dnl
>>>     +flow-dump from the main thread:
>>>     
>>> +recirc_id(0),in_port(1),packet_type(ns=0,id=0),eth(src=f8:bc:12:44:34:b6,dst=aa:55:aa:55:00:00),eth_type(0x0806),arp(sip=1.1.2.92,tip=1.1.2.88,op=2,sha=f8:bc:12:44:34:b6),
>>>  packets:0, bytes:0, used:never, actions:100,2
>>>     
>>> +recirc_id(0),in_port(1),packet_type(ns=0,id=0),eth(dst=aa:55:aa:55:00:00),eth_type(0x0800),ipv4(dst=1.1.2.88,proto=47,frag=no),
>>>  packets:0, bytes:0, used:never, actions:tnl_pop(4),2
>>>     +])
>>>     +
>>>     +OVS_VSWITCHD_STOP
>>>     +AT_CLEANUP
>>>     +
>>>      AT_SETUP([tunnel_push_pop - flow translation on unrelated bridges])
>>>
>>>      OVS_VSWITCHD_START(
>>>     -- 
>>>     1.8.3.1
>>>
>>>     _______________________________________________
>>>     dev mailing list
>>>     d...@openvswitch.org <mailto:d...@openvswitch.org>
>>>     https://mail.openvswitch.org/mailman/listinfo/ovs-dev 
>>> <https://mail.openvswitch.org/mailman/listinfo/ovs-dev>
>>>
>>>
>>>
>>> -- 
>>> hepeng
>>
> _______________________________________________
> dev mailing list
> d...@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev

_______________________________________________
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to