On 1/11/23 14:09, Ales Musil wrote:
> 
> 
> On Wed, Jan 11, 2023 at 1:34 PM Ilya Maximets <[email protected] 
> <mailto:[email protected]>> wrote:
> 
>     On 1/11/23 07:53, Ales Musil wrote:
>     >
>     >
>     > On Tue, Jan 10, 2023 at 11:08 PM Ilya Maximets <[email protected] 
> <mailto:[email protected]> <mailto:[email protected] 
> <mailto:[email protected]>>> wrote:
>     >
>     >     On 12/16/22 13:30, Ales Musil wrote:
>     >     > Currently, the CT can be flushed by dpctl only be specifying
>     >
>     >     *by
>     >
>     >     > the whole 5-tuple. This is not very convenient when there are
>     >     > only some fields known to the user of CT flush. Add new struct
>     >     > ofputil_ct_match which represents the generic filtering that can
>     >     > be done for CT flush. The match is done only on fields that are
>     >     > non-zero with exception to the icmp fields.
>     >     >
>     >     > This allows the filtering just within dpctl, however
>     >     > it is a preparation for OpenFlow extension.
>     >     >
>     >     > Reported-at: https://bugzilla.redhat.com/2120546 
> <https://bugzilla.redhat.com/2120546> <https://bugzilla.redhat.com/2120546 
> <https://bugzilla.redhat.com/2120546>>
>     >     > Signed-off-by: Ales Musil <[email protected] 
> <mailto:[email protected]> <mailto:[email protected] 
> <mailto:[email protected]>>>
>     >     > Acked-by: Paolo Valerio <[email protected] 
> <mailto:[email protected]> <mailto:[email protected] 
> <mailto:[email protected]>>>
>     >     > ---
>     >     > v5: Add ack from Paolo.
>     >     > v4: Fix a flush all scenario.
>     >     > v3: Rebase on top of master.
>     >     >     Address the C99 comment and missing dpif_close call.
>     >     > v2: Rebase on top of master.
>     >     >     Address comments from Paolo.
>     >     > ---
>     >
>     >     Hi, Ales.  Thanks for working on this change.
>     >     Sorry for review taking so long.  I initially though there
>     >     will be mostly a few cosmetic/style changes needed.  But
>     >     longer I look at this code, more questions I have.  See below.
>     >
>     >     Dumitru, maybe you can also help answer some of them?
>     >
>     >     Best regards, Ilya Maximets.
>     >
>     >
>     > Hi Ilya,
>     >
>     > thank you for the review. Before proceeding I tried to answer your 
> concerns/questions.
>     > Let me know what you think.
>     >
>     > Thanks,
>     > Ales
>     >  
>     >
>     >
>     >     >  NEWS                           |   2 +
>     >     >  include/openvswitch/ofp-util.h |  28 +++
>     >     >  lib/automake.mk <http://automake.mk> <http://automake.mk 
> <http://automake.mk>>                |   2 +
>     >     >  lib/ct-dpif.c                  | 208 +++++++++------------
>     >     >  lib/ct-dpif.h                  |   4 +-
>     >     >  lib/dpctl.c                    |  42 ++---
>     >     >  lib/dpctl.man                  |   3 +-
>     >     >  lib/ofp-ct-util.c              | 326 
> +++++++++++++++++++++++++++++++++
>     >     >  lib/ofp-ct-util.h              |  36 ++++
>     >     >  tests/system-traffic.at <http://system-traffic.at> 
> <http://system-traffic.at <http://system-traffic.at>>        |  82 ++++++++-
>     >     >  10 files changed, 586 insertions(+), 147 deletions(-)
>     >     >  create mode 100644 lib/ofp-ct-util.c
>     >     >  create mode 100644 lib/ofp-ct-util.h
>     >     >
>     >     > diff --git a/NEWS b/NEWS
>     >     > index 265375e1c..ff8904b02 100644
>     >     > --- a/NEWS
>     >     > +++ b/NEWS
>     >     > @@ -14,6 +14,8 @@ Post-v3.0.0
>     >     >       10 Gbps link speed by default in case the actual link speed 
> cannot be
>     >     >       determined.  Previously it was 10 Mbps.  Values can still 
> be overridden
>     >     >       by specifying 'max-rate' or '[r]stp-path-cost' accordingly.
>     >     > +   - ovs-dpctl and related ovs-appctl commands:
>     >
>     >     "ovs-dpctl and 'ovs-appctl dpctl/'" might be a better header.
>     >
>     >     > +     * "flush-conntrack" is capable of handling partial 5-tuple.
>     >     > 
>     >     > 
>     >     >  v3.0.0 - 15 Aug 2022
>     >     > diff --git a/include/openvswitch/ofp-util.h 
> b/include/openvswitch/ofp-util.h
>     >     > index 091a09cad..84937ae26 100644
>     >     > --- a/include/openvswitch/ofp-util.h
>     >     > +++ b/include/openvswitch/ofp-util.h
>     >     > @@ -19,6 +19,9 @@
>     >     > 
>     >     >  #include <stdbool.h>
>     >     >  #include <stdint.h>
>     >     > +#include <sys/types.h>
>     >     > +#include <netinet/in.h>
>     >     > +
>     >     >  #include "openvswitch/ofp-protocol.h"
>     >     > 
>     >     >  struct ofp_header;
>     >     > @@ -27,6 +30,31 @@ struct ofp_header;
>     >     >  extern "C" {
>     >     >  #endif
>     >     > 
>     >     > +struct ofputil_ct_tuple {
>     >     > +    struct in6_addr src;
>     >     > +    struct in6_addr dst;
>     >     > +
>     >     > +    union {
>     >     > +        ovs_be16 src_port;
>     >     > +        ovs_be16 icmp_id;
>     >     > +    };
>     >     > +    union {
>     >     > +        ovs_be16 dst_port;
>     >     > +        struct {
>     >     > +            uint8_t icmp_code;
>     >     > +            uint8_t icmp_type;
>     >     > +        };
>     >     > +    };
>     >     > +};
>     >     > +
>     >     > +struct ofputil_ct_match {
>     >     > +    uint8_t ip_proto;
>     >     > +    uint16_t l3_type;
>     >     > +
>     >     > +    struct ofputil_ct_tuple tuple_orig;
>     >     > +    struct ofputil_ct_tuple tuple_reply;
>     >     > +};
>     >
>     >     These structures are very similar to existing ct_dpif_tuple.
>     >     We can keep them for now, but I think, we'll need to replace
>     >     ct_dpif_tuple with the new ofputil_ct_tuple in the future.
>     >     We may need some changes in ofputil_ct_tuple for that though.
>     >
>     >
>     > I was thinking about replacing them, however it would require a much
>     > bigger change, which was out of scope of this RFE. Also I was not 
> entirely
>     > sure if there wasn't any specific reason for the ct_dpif_tuple 
> specifically
>     > the address part. It seems like optimization to me.
>     >
>     > Anyway it should be possible to unify those if needed.
>     >  
>     >
>     >
>     >
>     >     OTOH, I'm not sure I understand why we need both tuples here.
>     >     Can we have a tuple and a flag that indicates a direction
>     >     instead?
>     >
>     >
>     > That is also a possibility. However that wouldn't fit too that nicely 
> in the
>     > whole match struct, we could probably duplicate the proto field and get 
> rid of the
>     > l3_type. That way we could supply just two tuples instead, getting rid 
> of the match completely.
> 
>     The point was to have only one tuple.  See below.
> 
>     >  
>     >
>     >
>     >     There are several issues with having both original and reply
>     >     tuples, IIUC:
>     >
>     >     1. None of the command line interfaces support specifying
>     >        reply tuple.
>     >
>     >
>     > That was the intention of the original command AFAICT, we could modify 
> the commands
>     > to support that. However I'm not sure if there is any benefit in that.
> 
>     ovs-ofctl supposed to mimic the OpenFlow request, so it should
>     support everything that OpenFlow request can have.
> 
> 
> I see, sure that can definitely be done, however we should probably think how 
> this can
> be done in a more universal way that could be used also for dpctl.
> 
> 
>     >  
>     >
>     >
>     >     2. If the reply tuple will be specified via OpenFlow, it
>     >        will be ignored if the original direction tuple is
>     >        fully specified.
>     >
>     >
>     > That is ok IMO, because a fully specified original should be enough to 
> determine the one
>     > entry that needs to be erased. So the reply direction in that would be 
> an extra.
> 
>     That supports the point that we don't need two tuples in the request.
> 
> 
> Right but only if we are able to supply the full original direction. See 
> below.
>  
> 
> 
>     >  
>     >
>     >
>     >     3. Netfilter kernel interface allows specifying only one
>     >        tuple, not both, AFAICT.
>     >
>     >
>     > Right, that's the reason why, when we match, we use the original tuple 
> to erase that
>     > entry.
>     >  
>     >
>     >
>     >     I'm not a conntrack expert, so I have a question: Is it possible
>     >     that addresses and ports in original and reply directions are
>     >     not mirrored copies?  Or do we care about these cases?
>     >
>     >
>     > Not sure if it's possible, but even if it would be, the current 
> implementation
>     > would account for that, with the exception of specifying fully the 
> original
>     > direction.
>     >  
>     >
>     >
>     >     If we want to flush all entries related to the same IP:PORT,
>     >     we will not be able to do that in a single OpenFlow request,
>     >     because we'll need to supply the same IP and PORT as src
>     >     and dst, or as src in both original and reply directions.
>     >     None of these can fit into a single request.
>     >
>     >
>     > Is that really a problem? You could do that in two, one for the original
>     > one for reply. This is also very similar to the interface supported by 
> conntrack-tools,
>     > with those you couldn't remove all entries in a single call as well. We 
> could modify
>     > the interface that would allow matching on tuple without direction 
> specification,
>     > that would translate to both directions, however there is still the 
> question if that's needed.
> 
> 
>     It's not a problem, but another supporting point for the interface
>     being redundant.  We don't seem to need two tuples in the request,
>     because users will have to send two different requests anyway.
> 
>     So, the request should have:
> 
>     1. zone
>     2. nw_proto
>     3. tuple
>     4. tuple type ('original' or 'reply'), in case we want to allow
>        users to flush based on reply tuple.  But this seems a bit
>        redundant as well.
> 
>     Something like:
> 
>     struct ofputil_ct_tuple {
>     }
> 
>     struct ofputil_ct_match {
>         uint8_t ip_proto;
>         uint16_t l3_type;
>         // enum DIRECTION direction;
>         struct ofputil_ct_tuple tuple;
>     };
> 
> 
>     And the OpenFlow part:
> 
>     /* CT flush available TLVs. */
>     enum nx_ct_flush_tlv_type {
>         NXT_CT_ZONE_ID,           /* u8: CT zone id. */
>         NXT_CT_TUPLE,             /* Nested NXT_CT_TUPLE_* attributes. */
>     };
> 
>     enum nx_ct_flush_tuple_type {
>         NXT_CT_TUPLE_SRC,         /* be128: source IPv6 or mapped IPv4 
> address. */
>         NXT_CT_TUPLE_DST,         /* be128: destination IPv6 or mapped IPv4 
> address. */
>         NXT_CT_TUPLE_SRC_PORT,    /* be16: source port. */
>         NXT_CT_TUPLE_DST_PORT,    /* be16: destination port. */
>         NXT_CT_TUPLE_ICMP_ID,     /* be16: ICMP id. */
>         NXT_CT_TUPLE_ICMP_TYPE,   /* u8: ICMP type. */
>         NXT_CT_TUPLE_ICMP_CODE,   /* u8: ICMP code. */
>         // NXT_CT_TUPLE_DIRECTION,   /* enum nx_ct_flush_tuple_direction: CT 
> tuple direction. */
>     };
> 
>     //enum nx_ct_flush_tuple_direction {
>     //    NXT_CT_TUPLE_DIRECTION_ORIG = 1,
>     //    NXT_CT_TUPLE_DIRECTION_REPLY = 2,
>     //}
> 
>     struct nx_ct_flush {
>         uint8_t nw_proto;          /* IP protocol. */
>         uint8_t pad[7];            /* Align to 64-bit (must be zero). */
>         /* Followed by ... */
>     };
> 
> 
>     Such interface will also allow us in to have both direction specified
>     if we would want that in the future.  For now we should check that
>     each TLV type is specified once, i.e. only one tuple is in the request.
> 
>     We may or may not add NXT_CT_TUPLE_DIRECTION support now.  Can add it
>     in later releases if necessary.  Tuples without that TLV can be treated
>     as original direction.
> 
> 
> The original idea was something similar but as stated below that wouldn't 
> really help
> us in the OVN case. And there is probably no significant benefit in having 
> partial match
> with only single direction at time. I'm not sure if I'm not missing something 
> obvious.
> 
> 
>     >  
>     >
>     >
>     >     At this point is there actually any value in flushing basing
>     >     on reply direction?  Shouldn't we be able to flush the same
>     >     entries by flushing by dst in original direction, if we will
>     >     need to issue multiple requests anyway?
>     >
>     >
>     > That assumption would be fine when you can specify a full tuple for the 
> original direction.
>     > The problem is that for example OVN we know only the original 
> DST_IP:DST_PORT (VIP)
>     > and the reply SRC_IP:SRC_PORT (backend).
> 
>     Isn't original DST_IP:DST_PORT equal to reply SRC_IP:SRC_PORT ?
> 
> 
> Unfortunately no, it's DNATed so for example
> 'ovn-nbctl lb-add lb1 30.0.0.1 "172.16.1.2,172.16.1.3,172.16.1.4"'
> will have CT entries as follows:
> 
> tcp,orig=(src=192.168.1.2,dst=30.0.0.1,sport=1,dport=80),reply=(src=172.16.1.2,dst=192.168.1.2,sport=80,dport=1),zone=0,mark=2
> tcp,orig=(src=192.168.1.2,dst=30.0.0.1,sport=2,dport=80),reply=(src=172.16.1.3,dst=192.168.1.2,sport=80,dport=2),zone=0,mark=2
> tcp,orig=(src=192.168.1.2,dst=30.0.0.1,sport=3,dport=80),reply=(src=172.16.1.4,dst=192.168.1.2,sport=80,dport=3),zone=0,mark=2
> 
> We cannot simply filter on the original direction, as we cannot infer the SRC 
> part (without filtering all entries etc.).

OK.  This is what I was asking above and you said
'Not sure if it's possible'.  So, it is possible,
I see.

> 
>  
> 
> 
>     > That means we cannot fully specify the original direction
>     > without doing the filtering ourselves. It would be possible with extra 
> steps from the user. If this interface
>     > is too generic/complicated for that purpose we could say that the flush 
> extension supports only full orig tuple
>     > and leave the filtering up to the user. That would definitely work, it 
> wouldn't be that elegant for the user however
>     > from the code change perspective much simpler.
> 
>     I'm not sure I understand that part.  I'm not arguing aginst partial
>     matches.  Or do you mean something else?
> 
>     >
>     >  I can see a certain benefit of keeping it in OvS:
>     >
>     > 1) The filtering complexity is hidden from the user.
>     > 2) The data transfer between OvS and the user is much smaller (even if 
> you need to send two matches).
>     > 3) Once implemented it's just ready to be used without the need of 
> additional thoughts going into it.
>     >
>     >
>     >  
>     >
>     >
>     >     Did you start working on the OVN side of things?  Which
>     >     commands OVN will actually use?
>     >
>     >
>     > Yes, I did. The plan is that OVN will use the extension.
> 
>     I understand that.  The question was how the actual request will
>     look like.
> 
> 
> The testing version used something like this:
> 
>     memset(match, 0, sizeof *match);
> 
>     struct ofputil_ct_tuple *orig = &match->tuple_orig;
>     struct ofputil_ct_tuple *reply = &match->tuple_reply;
> 
>     if (IN6_IS_ADDR_V4MAPPED(&tuple->vip_ip)) {
>         match->l3_type = AF_INET;
>     } else {
>         match->l3_type = AF_INET6;
>     }
> 
>     match->ip_proto = tuple->proto;
> 
>     orig->dst = tuple->vip_ip;
>     orig->dst_port = htons(tuple->vip_port);
> 
>     reply->src = tuple->backend_ip;
>     reply->src_port = htons(tuple->backend_port);
> 
> tuple is the struct containing lb info, so VIP, VIP_PORT, BACKEND, 
> BACKEND_PORT, PROTO.
> match is the "struct ofputil_ct_match".
> 
> This is with older version of this RFE but it should illustrate the point.

Assuming the current v5 implementation of the extension, will the end
result of the request be different if we remove the assignment of
'orig=>dst' and 'orig->dst_port' from the code above?
_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to