On Wed, Jan 11, 2023 at 4:27 PM Ilya Maximets <[email protected]> wrote:

> On 1/11/23 16:09, Ales Musil wrote:
> >
> >
> > On Wed, Jan 11, 2023 at 3:52 PM Ilya Maximets <[email protected]
> <mailto:[email protected]>> wrote:
> >
> >     On 1/11/23 14:09, Ales Musil wrote:
> >     >
> >     >
> >     > On Wed, Jan 11, 2023 at 1:34 PM Ilya Maximets <[email protected]
> <mailto:[email protected]> <mailto:[email protected] <mailto:
> [email protected]>>> wrote:
> >     >
> >     >     On 1/11/23 07:53, Ales Musil wrote:
> >     >     >
> >     >     >
> >     >     > On Tue, Jan 10, 2023 at 11:08 PM Ilya Maximets <
> [email protected] <mailto:[email protected]> <mailto:[email protected]
> <mailto:[email protected]>> <mailto:[email protected] <mailto:
> [email protected]> <mailto:[email protected] <mailto:[email protected]>>>>
> wrote:
> >     >     >
> >     >     >     On 12/16/22 13:30, Ales Musil wrote:
> >     >     >     > Currently, the CT can be flushed by dpctl only be
> specifying
> >     >     >
> >     >     >     *by
> >     >     >
> >     >     >     > the whole 5-tuple. This is not very convenient when
> there are
> >     >     >     > only some fields known to the user of CT flush. Add
> new struct
> >     >     >     > ofputil_ct_match which represents the generic
> filtering that can
> >     >     >     > be done for CT flush. The match is done only on fields
> that are
> >     >     >     > non-zero with exception to the icmp fields.
> >     >     >     >
> >     >     >     > This allows the filtering just within dpctl, however
> >     >     >     > it is a preparation for OpenFlow extension.
> >     >     >     >
> >     >     >     > Reported-at: https://bugzilla.redhat.com/2120546 <
> https://bugzilla.redhat.com/2120546> <https://bugzilla.redhat.com/2120546
> <https://bugzilla.redhat.com/2120546>> <
> https://bugzilla.redhat.com/2120546 <https://bugzilla.redhat.com/2120546>
> <https://bugzilla.redhat.com/2120546 <https://bugzilla.redhat.com/2120546
> >>>
> >     >     >     > Signed-off-by: Ales Musil <[email protected] <mailto:
> [email protected]> <mailto:[email protected] <mailto:[email protected]>>
> <mailto:[email protected] <mailto:[email protected]> <mailto:
> [email protected] <mailto:[email protected]>>>>
> >     >     >     > Acked-by: Paolo Valerio <[email protected] <mailto:
> [email protected]> <mailto:[email protected] <mailto:
> [email protected]>> <mailto:[email protected] <mailto:
> [email protected]> <mailto:[email protected] <mailto:
> [email protected]>>>>
> >     >     >     > ---
> >     >     >     > v5: Add ack from Paolo.
> >     >     >     > v4: Fix a flush all scenario.
> >     >     >     > v3: Rebase on top of master.
> >     >     >     >     Address the C99 comment and missing dpif_close
> call.
> >     >     >     > v2: Rebase on top of master.
> >     >     >     >     Address comments from Paolo.
> >     >     >     > ---
> >     >     >
> >     >     >     Hi, Ales.  Thanks for working on this change.
> >     >     >     Sorry for review taking so long.  I initially though
> there
> >     >     >     will be mostly a few cosmetic/style changes needed.  But
> >     >     >     longer I look at this code, more questions I have.  See
> below.
> >     >     >
> >     >     >     Dumitru, maybe you can also help answer some of them?
> >     >     >
> >     >     >     Best regards, Ilya Maximets.
> >     >     >
> >     >     >
> >     >     > Hi Ilya,
> >     >     >
> >     >     > thank you for the review. Before proceeding I tried to
> answer your concerns/questions.
> >     >     > Let me know what you think.
> >     >     >
> >     >     > Thanks,
> >     >     > Ales
> >     >     >
> >     >     >
> >     >     >
> >     >     >     >  NEWS                           |   2 +
> >     >     >     >  include/openvswitch/ofp-util.h |  28 +++
> >     >     >     >  lib/automake.mk <http://automake.mk> <
> http://automake.mk <http://automake.mk>> <http://automake.mk <
> http://automake.mk> <http://automake.mk <http://automake.mk>>>
>     |   2 +
> >     >     >     >  lib/ct-dpif.c                  | 208
> +++++++++------------
> >     >     >     >  lib/ct-dpif.h                  |   4 +-
> >     >     >     >  lib/dpctl.c                    |  42 ++---
> >     >     >     >  lib/dpctl.man                  |   3 +-
> >     >     >     >  lib/ofp-ct-util.c              | 326
> +++++++++++++++++++++++++++++++++
> >     >     >     >  lib/ofp-ct-util.h              |  36 ++++
> >     >     >     >  tests/system-traffic.at <http://system-traffic.at> <
> http://system-traffic.at <http://system-traffic.at>> <
> http://system-traffic.at <http://system-traffic.at> <
> http://system-traffic.at <http://system-traffic.at>>>        |  82
> ++++++++-
> >     >     >     >  10 files changed, 586 insertions(+), 147 deletions(-)
> >     >     >     >  create mode 100644 lib/ofp-ct-util.c
> >     >     >     >  create mode 100644 lib/ofp-ct-util.h
> >     >     >     >
> >     >     >     > diff --git a/NEWS b/NEWS
> >     >     >     > index 265375e1c..ff8904b02 100644
> >     >     >     > --- a/NEWS
> >     >     >     > +++ b/NEWS
> >     >     >     > @@ -14,6 +14,8 @@ Post-v3.0.0
> >     >     >     >       10 Gbps link speed by default in case the actual
> link speed cannot be
> >     >     >     >       determined.  Previously it was 10 Mbps.  Values
> can still be overridden
> >     >     >     >       by specifying 'max-rate' or '[r]stp-path-cost'
> accordingly.
> >     >     >     > +   - ovs-dpctl and related ovs-appctl commands:
> >     >     >
> >     >     >     "ovs-dpctl and 'ovs-appctl dpctl/'" might be a better
> header.
> >     >     >
> >     >     >     > +     * "flush-conntrack" is capable of handling
> partial 5-tuple.
> >     >     >     >
> >     >     >     >
> >     >     >     >  v3.0.0 - 15 Aug 2022
> >     >     >     > diff --git a/include/openvswitch/ofp-util.h
> b/include/openvswitch/ofp-util.h
> >     >     >     > index 091a09cad..84937ae26 100644
> >     >     >     > --- a/include/openvswitch/ofp-util.h
> >     >     >     > +++ b/include/openvswitch/ofp-util.h
> >     >     >     > @@ -19,6 +19,9 @@
> >     >     >     >
> >     >     >     >  #include <stdbool.h>
> >     >     >     >  #include <stdint.h>
> >     >     >     > +#include <sys/types.h>
> >     >     >     > +#include <netinet/in.h>
> >     >     >     > +
> >     >     >     >  #include "openvswitch/ofp-protocol.h"
> >     >     >     >
> >     >     >     >  struct ofp_header;
> >     >     >     > @@ -27,6 +30,31 @@ struct ofp_header;
> >     >     >     >  extern "C" {
> >     >     >     >  #endif
> >     >     >     >
> >     >     >     > +struct ofputil_ct_tuple {
> >     >     >     > +    struct in6_addr src;
> >     >     >     > +    struct in6_addr dst;
> >     >     >     > +
> >     >     >     > +    union {
> >     >     >     > +        ovs_be16 src_port;
> >     >     >     > +        ovs_be16 icmp_id;
> >     >     >     > +    };
> >     >     >     > +    union {
> >     >     >     > +        ovs_be16 dst_port;
> >     >     >     > +        struct {
> >     >     >     > +            uint8_t icmp_code;
> >     >     >     > +            uint8_t icmp_type;
> >     >     >     > +        };
> >     >     >     > +    };
> >     >     >     > +};
> >     >     >     > +
> >     >     >     > +struct ofputil_ct_match {
> >     >     >     > +    uint8_t ip_proto;
> >     >     >     > +    uint16_t l3_type;
> >     >     >     > +
> >     >     >     > +    struct ofputil_ct_tuple tuple_orig;
> >     >     >     > +    struct ofputil_ct_tuple tuple_reply;
> >     >     >     > +};
> >     >     >
> >     >     >     These structures are very similar to existing
> ct_dpif_tuple.
> >     >     >     We can keep them for now, but I think, we'll need to
> replace
> >     >     >     ct_dpif_tuple with the new ofputil_ct_tuple in the
> future.
> >     >     >     We may need some changes in ofputil_ct_tuple for that
> though.
> >     >     >
> >     >     >
> >     >     > I was thinking about replacing them, however it would
> require a much
> >     >     > bigger change, which was out of scope of this RFE. Also I
> was not entirely
> >     >     > sure if there wasn't any specific reason for the
> ct_dpif_tuple specifically
> >     >     > the address part. It seems like optimization to me.
> >     >     >
> >     >     > Anyway it should be possible to unify those if needed.
> >     >     >
> >     >     >
> >     >     >
> >     >     >
> >     >     >     OTOH, I'm not sure I understand why we need both tuples
> here.
> >     >     >     Can we have a tuple and a flag that indicates a direction
> >     >     >     instead?
> >     >     >
> >     >     >
> >     >     > That is also a possibility. However that wouldn't fit too
> that nicely in the
> >     >     > whole match struct, we could probably duplicate the proto
> field and get rid of the
> >     >     > l3_type. That way we could supply just two tuples instead,
> getting rid of the match completely.
> >     >
> >     >     The point was to have only one tuple.  See below.
> >     >
> >     >     >
> >     >     >
> >     >     >
> >     >     >     There are several issues with having both original and
> reply
> >     >     >     tuples, IIUC:
> >     >     >
> >     >     >     1. None of the command line interfaces support specifying
> >     >     >        reply tuple.
> >     >     >
> >     >     >
> >     >     > That was the intention of the original command AFAICT, we
> could modify the commands
> >     >     > to support that. However I'm not sure if there is any
> benefit in that.
> >     >
> >     >     ovs-ofctl supposed to mimic the OpenFlow request, so it should
> >     >     support everything that OpenFlow request can have.
> >     >
> >     >
> >     > I see, sure that can definitely be done, however we should
> probably think how this can
> >     > be done in a more universal way that could be used also for dpctl.
> >     >
> >     >
> >     >     >
> >     >     >
> >     >     >
> >     >     >     2. If the reply tuple will be specified via OpenFlow, it
> >     >     >        will be ignored if the original direction tuple is
> >     >     >        fully specified.
> >     >     >
> >     >     >
> >     >     > That is ok IMO, because a fully specified original should be
> enough to determine the one
> >     >     > entry that needs to be erased. So the reply direction in
> that would be an extra.
> >     >
> >     >     That supports the point that we don't need two tuples in the
> request.
> >     >
> >     >
> >     > Right but only if we are able to supply the full original
> direction. See below.
> >     >
> >     >
> >     >
> >     >     >
> >     >     >
> >     >     >
> >     >     >     3. Netfilter kernel interface allows specifying only one
> >     >     >        tuple, not both, AFAICT.
> >     >     >
> >     >     >
> >     >     > Right, that's the reason why, when we match, we use the
> original tuple to erase that
> >     >     > entry.
> >     >     >
> >     >     >
> >     >     >
> >     >     >     I'm not a conntrack expert, so I have a question: Is it
> possible
> >     >     >     that addresses and ports in original and reply
> directions are
> >     >     >     not mirrored copies?  Or do we care about these cases?
> >     >     >
> >     >     >
> >     >     > Not sure if it's possible, but even if it would be, the
> current implementation
> >     >     > would account for that, with the exception of specifying
> fully the original
> >     >     > direction.
> >     >     >
> >     >     >
> >     >     >
> >     >     >     If we want to flush all entries related to the same
> IP:PORT,
> >     >     >     we will not be able to do that in a single OpenFlow
> request,
> >     >     >     because we'll need to supply the same IP and PORT as src
> >     >     >     and dst, or as src in both original and reply directions.
> >     >     >     None of these can fit into a single request.
> >     >     >
> >     >     >
> >     >     > Is that really a problem? You could do that in two, one for
> the original
> >     >     > one for reply. This is also very similar to the interface
> supported by conntrack-tools,
> >     >     > with those you couldn't remove all entries in a single call
> as well. We could modify
> >     >     > the interface that would allow matching on tuple without
> direction specification,
> >     >     > that would translate to both directions, however there is
> still the question if that's needed.
> >     >
> >     >
> >     >     It's not a problem, but another supporting point for the
> interface
> >     >     being redundant.  We don't seem to need two tuples in the
> request,
> >     >     because users will have to send two different requests anyway.
> >     >
> >     >     So, the request should have:
> >     >
> >     >     1. zone
> >     >     2. nw_proto
> >     >     3. tuple
> >     >     4. tuple type ('original' or 'reply'), in case we want to allow
> >     >        users to flush based on reply tuple.  But this seems a bit
> >     >        redundant as well.
> >     >
> >     >     Something like:
> >     >
> >     >     struct ofputil_ct_tuple {
> >     >     }
> >     >
> >     >     struct ofputil_ct_match {
> >     >         uint8_t ip_proto;
> >     >         uint16_t l3_type;
> >     >         // enum DIRECTION direction;
> >     >         struct ofputil_ct_tuple tuple;
> >     >     };
> >     >
> >     >
> >     >     And the OpenFlow part:
> >     >
> >     >     /* CT flush available TLVs. */
> >     >     enum nx_ct_flush_tlv_type {
> >     >         NXT_CT_ZONE_ID,           /* u8: CT zone id. */
> >     >         NXT_CT_TUPLE,             /* Nested NXT_CT_TUPLE_*
> attributes. */
> >     >     };
> >     >
> >     >     enum nx_ct_flush_tuple_type {
> >     >         NXT_CT_TUPLE_SRC,         /* be128: source IPv6 or mapped
> IPv4 address. */
> >     >         NXT_CT_TUPLE_DST,         /* be128: destination IPv6 or
> mapped IPv4 address. */
> >     >         NXT_CT_TUPLE_SRC_PORT,    /* be16: source port. */
> >     >         NXT_CT_TUPLE_DST_PORT,    /* be16: destination port. */
> >     >         NXT_CT_TUPLE_ICMP_ID,     /* be16: ICMP id. */
> >     >         NXT_CT_TUPLE_ICMP_TYPE,   /* u8: ICMP type. */
> >     >         NXT_CT_TUPLE_ICMP_CODE,   /* u8: ICMP code. */
> >     >         // NXT_CT_TUPLE_DIRECTION,   /* enum
> nx_ct_flush_tuple_direction: CT tuple direction. */
> >     >     };
> >     >
> >     >     //enum nx_ct_flush_tuple_direction {
> >     >     //    NXT_CT_TUPLE_DIRECTION_ORIG = 1,
> >     >     //    NXT_CT_TUPLE_DIRECTION_REPLY = 2,
> >     >     //}
> >     >
> >     >     struct nx_ct_flush {
> >     >         uint8_t nw_proto;          /* IP protocol. */
> >     >         uint8_t pad[7];            /* Align to 64-bit (must be
> zero). */
> >     >         /* Followed by ... */
> >     >     };
> >     >
> >     >
> >     >     Such interface will also allow us in to have both direction
> specified
> >     >     if we would want that in the future.  For now we should check
> that
> >     >     each TLV type is specified once, i.e. only one tuple is in the
> request.
> >     >
> >     >     We may or may not add NXT_CT_TUPLE_DIRECTION support now.  Can
> add it
> >     >     in later releases if necessary.  Tuples without that TLV can
> be treated
> >     >     as original direction.
> >     >
> >     >
> >     > The original idea was something similar but as stated below that
> wouldn't really help
> >     > us in the OVN case. And there is probably no significant benefit
> in having partial match
> >     > with only single direction at time. I'm not sure if I'm not
> missing something obvious.
> >     >
> >     >
> >     >     >
> >     >     >
> >     >     >
> >     >     >     At this point is there actually any value in flushing
> basing
> >     >     >     on reply direction?  Shouldn't we be able to flush the
> same
> >     >     >     entries by flushing by dst in original direction, if we
> will
> >     >     >     need to issue multiple requests anyway?
> >     >     >
> >     >     >
> >     >     > That assumption would be fine when you can specify a full
> tuple for the original direction.
> >     >     > The problem is that for example OVN we know only the
> original DST_IP:DST_PORT (VIP)
> >     >     > and the reply SRC_IP:SRC_PORT (backend).
> >     >
> >     >     Isn't original DST_IP:DST_PORT equal to reply SRC_IP:SRC_PORT ?
> >     >
> >     >
> >     > Unfortunately no, it's DNATed so for example
> >     > 'ovn-nbctl lb-add lb1 30.0.0.1 "172.16.1.2,172.16.1.3,172.16.1.4"'
> >     > will have CT entries as follows:
> >     >
> >     >
> tcp,orig=(src=192.168.1.2,dst=30.0.0.1,sport=1,dport=80),reply=(src=172.16.1.2,dst=192.168.1.2,sport=80,dport=1),zone=0,mark=2
> >     >
> tcp,orig=(src=192.168.1.2,dst=30.0.0.1,sport=2,dport=80),reply=(src=172.16.1.3,dst=192.168.1.2,sport=80,dport=2),zone=0,mark=2
> >     >
> tcp,orig=(src=192.168.1.2,dst=30.0.0.1,sport=3,dport=80),reply=(src=172.16.1.4,dst=192.168.1.2,sport=80,dport=3),zone=0,mark=2
> >     >
> >     > We cannot simply filter on the original direction, as we cannot
> infer the SRC part (without filtering all entries etc.).
> >
> >     OK.  This is what I was asking above and you said
> >     'Not sure if it's possible'.  So, it is possible,
> >     I see.
> >
> >
> > Yeah my bad I did not connect the dots right away.
> >
> >
> >
> >     >
> >     >
> >     >
> >     >
> >     >     > That means we cannot fully specify the original direction
> >     >     > without doing the filtering ourselves. It would be possible
> with extra steps from the user. If this interface
> >     >     > is too generic/complicated for that purpose we could say
> that the flush extension supports only full orig tuple
> >     >     > and leave the filtering up to the user. That would
> definitely work, it wouldn't be that elegant for the user however
> >     >     > from the code change perspective much simpler.
> >     >
> >     >     I'm not sure I understand that part.  I'm not arguing aginst
> partial
> >     >     matches.  Or do you mean something else?
> >     >
> >     >     >
> >     >     >  I can see a certain benefit of keeping it in OvS:
> >     >     >
> >     >     > 1) The filtering complexity is hidden from the user.
> >     >     > 2) The data transfer between OvS and the user is much
> smaller (even if you need to send two matches).
> >     >     > 3) Once implemented it's just ready to be used without the
> need of additional thoughts going into it.
> >     >     >
> >     >     >
> >     >     >
> >     >     >
> >     >     >
> >     >     >     Did you start working on the OVN side of things?  Which
> >     >     >     commands OVN will actually use?
> >     >     >
> >     >     >
> >     >     > Yes, I did. The plan is that OVN will use the extension.
> >     >
> >     >     I understand that.  The question was how the actual request
> will
> >     >     look like.
> >     >
> >     >
> >     > The testing version used something like this:
> >     >
> >     >     memset(match, 0, sizeof *match);
> >     >
> >     >     struct ofputil_ct_tuple *orig = &match->tuple_orig;
> >     >     struct ofputil_ct_tuple *reply = &match->tuple_reply;
> >     >
> >     >     if (IN6_IS_ADDR_V4MAPPED(&tuple->vip_ip)) {
> >     >         match->l3_type = AF_INET;
> >     >     } else {
> >     >         match->l3_type = AF_INET6;
> >     >     }
> >     >
> >     >     match->ip_proto = tuple->proto;
> >     >
> >     >     orig->dst = tuple->vip_ip;
> >     >     orig->dst_port = htons(tuple->vip_port);
> >     >
> >     >     reply->src = tuple->backend_ip;
> >     >     reply->src_port = htons(tuple->backend_port);
> >     >
> >     > tuple is the struct containing lb info, so VIP, VIP_PORT, BACKEND,
> BACKEND_PORT, PROTO.
> >     > match is the "struct ofputil_ct_match".
> >     >
> >     > This is with older version of this RFE but it should illustrate
> the point.
> >
> >     Assuming the current v5 implementation of the extension, will the end
> >     result of the request be different if we remove the assignment of
> >     'orig=>dst' and 'orig->dst_port' from the code above?
> >
> >
> > It might delete more than we actually want to. Without specifying the
> LBs 5-tuple
> > we might always remove more entries than we should. As we can have
> multiple backends
> > for the same VIP (that's the whole point of LB), however nothing
> prevents us from two VIPs
> > pointing to the same backend. Which would cause an issue if we will go
> only with the reply part.
>
>
> OK.  That makes sense.  Let's keep both tuples then.
>
> But we need to allow ovs-ofctl to accept both tuples.
>
> And we should use a "shortcut" only if reply tuple is not specified,
> because if I'm asking to delete:
>
>
>  
> orig=(src=192.168.1.2,dst=30.0.0.1,sport=1,dport=80),reply=(src=172.16.1.3,dst=192.168.1.2,sport=80,dport=1)
>
> I'm not expecting the following entry to be deleted:
>
>
>  
> orig=(src=192.168.1.2,dst=30.0.0.1,sport=1,dport=80),reply=(src=172.16.1.2,dst=192.168.1.2,sport=80,dport=1)
>
> Even though these entries probably cannot exist at the same time,
> we should not delete the entry that we wasn't asked to delete.
>
> Best regards, Ilya Maximets.
>
>
That makes sense. I'll work on that + the smaller comments.

Thanks,
Ales



-- 

Ales Musil

Senior Software Engineer - OVN Core

Red Hat EMEA <https://www.redhat.com>

[email protected]    IM: amusil
<https://red.ht/sig>
_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to