On 1/11/23 16:09, Ales Musil wrote:
> 
> 
> On Wed, Jan 11, 2023 at 3:52 PM Ilya Maximets <[email protected] 
> <mailto:[email protected]>> wrote:
> 
>     On 1/11/23 14:09, Ales Musil wrote:
>     >
>     >
>     > On Wed, Jan 11, 2023 at 1:34 PM Ilya Maximets <[email protected] 
> <mailto:[email protected]> <mailto:[email protected] 
> <mailto:[email protected]>>> wrote:
>     >
>     >     On 1/11/23 07:53, Ales Musil wrote:
>     >     >
>     >     >
>     >     > On Tue, Jan 10, 2023 at 11:08 PM Ilya Maximets 
> <[email protected] <mailto:[email protected]> <mailto:[email protected] 
> <mailto:[email protected]>> <mailto:[email protected] 
> <mailto:[email protected]> <mailto:[email protected] 
> <mailto:[email protected]>>>> wrote:
>     >     >
>     >     >     On 12/16/22 13:30, Ales Musil wrote:
>     >     >     > Currently, the CT can be flushed by dpctl only be specifying
>     >     >
>     >     >     *by
>     >     >
>     >     >     > the whole 5-tuple. This is not very convenient when there 
> are
>     >     >     > only some fields known to the user of CT flush. Add new 
> struct
>     >     >     > ofputil_ct_match which represents the generic filtering 
> that can
>     >     >     > be done for CT flush. The match is done only on fields that 
> are
>     >     >     > non-zero with exception to the icmp fields.
>     >     >     >
>     >     >     > This allows the filtering just within dpctl, however
>     >     >     > it is a preparation for OpenFlow extension.
>     >     >     >
>     >     >     > Reported-at: https://bugzilla.redhat.com/2120546 
> <https://bugzilla.redhat.com/2120546> <https://bugzilla.redhat.com/2120546 
> <https://bugzilla.redhat.com/2120546>> <https://bugzilla.redhat.com/2120546 
> <https://bugzilla.redhat.com/2120546> <https://bugzilla.redhat.com/2120546 
> <https://bugzilla.redhat.com/2120546>>>
>     >     >     > Signed-off-by: Ales Musil <[email protected] 
> <mailto:[email protected]> <mailto:[email protected] 
> <mailto:[email protected]>> <mailto:[email protected] 
> <mailto:[email protected]> <mailto:[email protected] 
> <mailto:[email protected]>>>>
>     >     >     > Acked-by: Paolo Valerio <[email protected] 
> <mailto:[email protected]> <mailto:[email protected] 
> <mailto:[email protected]>> <mailto:[email protected] 
> <mailto:[email protected]> <mailto:[email protected] 
> <mailto:[email protected]>>>>
>     >     >     > ---
>     >     >     > v5: Add ack from Paolo.
>     >     >     > v4: Fix a flush all scenario.
>     >     >     > v3: Rebase on top of master.
>     >     >     >     Address the C99 comment and missing dpif_close call.
>     >     >     > v2: Rebase on top of master.
>     >     >     >     Address comments from Paolo.
>     >     >     > ---
>     >     >
>     >     >     Hi, Ales.  Thanks for working on this change.
>     >     >     Sorry for review taking so long.  I initially though there
>     >     >     will be mostly a few cosmetic/style changes needed.  But
>     >     >     longer I look at this code, more questions I have.  See below.
>     >     >
>     >     >     Dumitru, maybe you can also help answer some of them?
>     >     >
>     >     >     Best regards, Ilya Maximets.
>     >     >
>     >     >
>     >     > Hi Ilya,
>     >     >
>     >     > thank you for the review. Before proceeding I tried to answer 
> your concerns/questions.
>     >     > Let me know what you think.
>     >     >
>     >     > Thanks,
>     >     > Ales
>     >     >  
>     >     >
>     >     >
>     >     >     >  NEWS                           |   2 +
>     >     >     >  include/openvswitch/ofp-util.h |  28 +++
>     >     >     >  lib/automake.mk <http://automake.mk> <http://automake.mk 
> <http://automake.mk>> <http://automake.mk <http://automake.mk> 
> <http://automake.mk <http://automake.mk>>>                |   2 +
>     >     >     >  lib/ct-dpif.c                  | 208 +++++++++------------
>     >     >     >  lib/ct-dpif.h                  |   4 +-
>     >     >     >  lib/dpctl.c                    |  42 ++---
>     >     >     >  lib/dpctl.man                  |   3 +-
>     >     >     >  lib/ofp-ct-util.c              | 326 
> +++++++++++++++++++++++++++++++++
>     >     >     >  lib/ofp-ct-util.h              |  36 ++++
>     >     >     >  tests/system-traffic.at <http://system-traffic.at> 
> <http://system-traffic.at <http://system-traffic.at>> 
> <http://system-traffic.at <http://system-traffic.at> 
> <http://system-traffic.at <http://system-traffic.at>>>        |  82 ++++++++-
>     >     >     >  10 files changed, 586 insertions(+), 147 deletions(-)
>     >     >     >  create mode 100644 lib/ofp-ct-util.c
>     >     >     >  create mode 100644 lib/ofp-ct-util.h
>     >     >     >
>     >     >     > diff --git a/NEWS b/NEWS
>     >     >     > index 265375e1c..ff8904b02 100644
>     >     >     > --- a/NEWS
>     >     >     > +++ b/NEWS
>     >     >     > @@ -14,6 +14,8 @@ Post-v3.0.0
>     >     >     >       10 Gbps link speed by default in case the actual link 
> speed cannot be
>     >     >     >       determined.  Previously it was 10 Mbps.  Values can 
> still be overridden
>     >     >     >       by specifying 'max-rate' or '[r]stp-path-cost' 
> accordingly.
>     >     >     > +   - ovs-dpctl and related ovs-appctl commands:
>     >     >
>     >     >     "ovs-dpctl and 'ovs-appctl dpctl/'" might be a better header.
>     >     >
>     >     >     > +     * "flush-conntrack" is capable of handling partial 
> 5-tuple.
>     >     >     > 
>     >     >     > 
>     >     >     >  v3.0.0 - 15 Aug 2022
>     >     >     > diff --git a/include/openvswitch/ofp-util.h 
> b/include/openvswitch/ofp-util.h
>     >     >     > index 091a09cad..84937ae26 100644
>     >     >     > --- a/include/openvswitch/ofp-util.h
>     >     >     > +++ b/include/openvswitch/ofp-util.h
>     >     >     > @@ -19,6 +19,9 @@
>     >     >     > 
>     >     >     >  #include <stdbool.h>
>     >     >     >  #include <stdint.h>
>     >     >     > +#include <sys/types.h>
>     >     >     > +#include <netinet/in.h>
>     >     >     > +
>     >     >     >  #include "openvswitch/ofp-protocol.h"
>     >     >     > 
>     >     >     >  struct ofp_header;
>     >     >     > @@ -27,6 +30,31 @@ struct ofp_header;
>     >     >     >  extern "C" {
>     >     >     >  #endif
>     >     >     > 
>     >     >     > +struct ofputil_ct_tuple {
>     >     >     > +    struct in6_addr src;
>     >     >     > +    struct in6_addr dst;
>     >     >     > +
>     >     >     > +    union {
>     >     >     > +        ovs_be16 src_port;
>     >     >     > +        ovs_be16 icmp_id;
>     >     >     > +    };
>     >     >     > +    union {
>     >     >     > +        ovs_be16 dst_port;
>     >     >     > +        struct {
>     >     >     > +            uint8_t icmp_code;
>     >     >     > +            uint8_t icmp_type;
>     >     >     > +        };
>     >     >     > +    };
>     >     >     > +};
>     >     >     > +
>     >     >     > +struct ofputil_ct_match {
>     >     >     > +    uint8_t ip_proto;
>     >     >     > +    uint16_t l3_type;
>     >     >     > +
>     >     >     > +    struct ofputil_ct_tuple tuple_orig;
>     >     >     > +    struct ofputil_ct_tuple tuple_reply;
>     >     >     > +};
>     >     >
>     >     >     These structures are very similar to existing ct_dpif_tuple.
>     >     >     We can keep them for now, but I think, we'll need to replace
>     >     >     ct_dpif_tuple with the new ofputil_ct_tuple in the future.
>     >     >     We may need some changes in ofputil_ct_tuple for that though.
>     >     >
>     >     >
>     >     > I was thinking about replacing them, however it would require a 
> much
>     >     > bigger change, which was out of scope of this RFE. Also I was not 
> entirely
>     >     > sure if there wasn't any specific reason for the ct_dpif_tuple 
> specifically
>     >     > the address part. It seems like optimization to me.
>     >     >
>     >     > Anyway it should be possible to unify those if needed.
>     >     >  
>     >     >
>     >     >
>     >     >
>     >     >     OTOH, I'm not sure I understand why we need both tuples here.
>     >     >     Can we have a tuple and a flag that indicates a direction
>     >     >     instead?
>     >     >
>     >     >
>     >     > That is also a possibility. However that wouldn't fit too that 
> nicely in the
>     >     > whole match struct, we could probably duplicate the proto field 
> and get rid of the
>     >     > l3_type. That way we could supply just two tuples instead, 
> getting rid of the match completely.
>     >
>     >     The point was to have only one tuple.  See below.
>     >
>     >     >  
>     >     >
>     >     >
>     >     >     There are several issues with having both original and reply
>     >     >     tuples, IIUC:
>     >     >
>     >     >     1. None of the command line interfaces support specifying
>     >     >        reply tuple.
>     >     >
>     >     >
>     >     > That was the intention of the original command AFAICT, we could 
> modify the commands
>     >     > to support that. However I'm not sure if there is any benefit in 
> that.
>     >
>     >     ovs-ofctl supposed to mimic the OpenFlow request, so it should
>     >     support everything that OpenFlow request can have.
>     >
>     >
>     > I see, sure that can definitely be done, however we should probably 
> think how this can
>     > be done in a more universal way that could be used also for dpctl.
>     >
>     >
>     >     >  
>     >     >
>     >     >
>     >     >     2. If the reply tuple will be specified via OpenFlow, it
>     >     >        will be ignored if the original direction tuple is
>     >     >        fully specified.
>     >     >
>     >     >
>     >     > That is ok IMO, because a fully specified original should be 
> enough to determine the one
>     >     > entry that needs to be erased. So the reply direction in that 
> would be an extra.
>     >
>     >     That supports the point that we don't need two tuples in the 
> request.
>     >
>     >
>     > Right but only if we are able to supply the full original direction. 
> See below.
>     >  
>     >
>     >
>     >     >  
>     >     >
>     >     >
>     >     >     3. Netfilter kernel interface allows specifying only one
>     >     >        tuple, not both, AFAICT.
>     >     >
>     >     >
>     >     > Right, that's the reason why, when we match, we use the original 
> tuple to erase that
>     >     > entry.
>     >     >  
>     >     >
>     >     >
>     >     >     I'm not a conntrack expert, so I have a question: Is it 
> possible
>     >     >     that addresses and ports in original and reply directions are
>     >     >     not mirrored copies?  Or do we care about these cases?
>     >     >
>     >     >
>     >     > Not sure if it's possible, but even if it would be, the current 
> implementation
>     >     > would account for that, with the exception of specifying fully 
> the original
>     >     > direction.
>     >     >  
>     >     >
>     >     >
>     >     >     If we want to flush all entries related to the same IP:PORT,
>     >     >     we will not be able to do that in a single OpenFlow request,
>     >     >     because we'll need to supply the same IP and PORT as src
>     >     >     and dst, or as src in both original and reply directions.
>     >     >     None of these can fit into a single request.
>     >     >
>     >     >
>     >     > Is that really a problem? You could do that in two, one for the 
> original
>     >     > one for reply. This is also very similar to the interface 
> supported by conntrack-tools,
>     >     > with those you couldn't remove all entries in a single call as 
> well. We could modify
>     >     > the interface that would allow matching on tuple without 
> direction specification,
>     >     > that would translate to both directions, however there is still 
> the question if that's needed.
>     >
>     >
>     >     It's not a problem, but another supporting point for the interface
>     >     being redundant.  We don't seem to need two tuples in the request,
>     >     because users will have to send two different requests anyway.
>     >
>     >     So, the request should have:
>     >
>     >     1. zone
>     >     2. nw_proto
>     >     3. tuple
>     >     4. tuple type ('original' or 'reply'), in case we want to allow
>     >        users to flush based on reply tuple.  But this seems a bit
>     >        redundant as well.
>     >
>     >     Something like:
>     >
>     >     struct ofputil_ct_tuple {
>     >     }
>     >
>     >     struct ofputil_ct_match {
>     >         uint8_t ip_proto;
>     >         uint16_t l3_type;
>     >         // enum DIRECTION direction;
>     >         struct ofputil_ct_tuple tuple;
>     >     };
>     >
>     >
>     >     And the OpenFlow part:
>     >
>     >     /* CT flush available TLVs. */
>     >     enum nx_ct_flush_tlv_type {
>     >         NXT_CT_ZONE_ID,           /* u8: CT zone id. */
>     >         NXT_CT_TUPLE,             /* Nested NXT_CT_TUPLE_* attributes. 
> */
>     >     };
>     >
>     >     enum nx_ct_flush_tuple_type {
>     >         NXT_CT_TUPLE_SRC,         /* be128: source IPv6 or mapped IPv4 
> address. */
>     >         NXT_CT_TUPLE_DST,         /* be128: destination IPv6 or mapped 
> IPv4 address. */
>     >         NXT_CT_TUPLE_SRC_PORT,    /* be16: source port. */
>     >         NXT_CT_TUPLE_DST_PORT,    /* be16: destination port. */
>     >         NXT_CT_TUPLE_ICMP_ID,     /* be16: ICMP id. */
>     >         NXT_CT_TUPLE_ICMP_TYPE,   /* u8: ICMP type. */
>     >         NXT_CT_TUPLE_ICMP_CODE,   /* u8: ICMP code. */
>     >         // NXT_CT_TUPLE_DIRECTION,   /* enum 
> nx_ct_flush_tuple_direction: CT tuple direction. */
>     >     };
>     >
>     >     //enum nx_ct_flush_tuple_direction {
>     >     //    NXT_CT_TUPLE_DIRECTION_ORIG = 1,
>     >     //    NXT_CT_TUPLE_DIRECTION_REPLY = 2,
>     >     //}
>     >
>     >     struct nx_ct_flush {
>     >         uint8_t nw_proto;          /* IP protocol. */
>     >         uint8_t pad[7];            /* Align to 64-bit (must be zero). */
>     >         /* Followed by ... */
>     >     };
>     >
>     >
>     >     Such interface will also allow us in to have both direction 
> specified
>     >     if we would want that in the future.  For now we should check that
>     >     each TLV type is specified once, i.e. only one tuple is in the 
> request.
>     >
>     >     We may or may not add NXT_CT_TUPLE_DIRECTION support now.  Can add 
> it
>     >     in later releases if necessary.  Tuples without that TLV can be 
> treated
>     >     as original direction.
>     >
>     >
>     > The original idea was something similar but as stated below that 
> wouldn't really help
>     > us in the OVN case. And there is probably no significant benefit in 
> having partial match
>     > with only single direction at time. I'm not sure if I'm not missing 
> something obvious.
>     >
>     >
>     >     >  
>     >     >
>     >     >
>     >     >     At this point is there actually any value in flushing basing
>     >     >     on reply direction?  Shouldn't we be able to flush the same
>     >     >     entries by flushing by dst in original direction, if we will
>     >     >     need to issue multiple requests anyway?
>     >     >
>     >     >
>     >     > That assumption would be fine when you can specify a full tuple 
> for the original direction.
>     >     > The problem is that for example OVN we know only the original 
> DST_IP:DST_PORT (VIP)
>     >     > and the reply SRC_IP:SRC_PORT (backend).
>     >
>     >     Isn't original DST_IP:DST_PORT equal to reply SRC_IP:SRC_PORT ?
>     >
>     >
>     > Unfortunately no, it's DNATed so for example
>     > 'ovn-nbctl lb-add lb1 30.0.0.1 "172.16.1.2,172.16.1.3,172.16.1.4"'
>     > will have CT entries as follows:
>     >
>     > 
> tcp,orig=(src=192.168.1.2,dst=30.0.0.1,sport=1,dport=80),reply=(src=172.16.1.2,dst=192.168.1.2,sport=80,dport=1),zone=0,mark=2
>     > 
> tcp,orig=(src=192.168.1.2,dst=30.0.0.1,sport=2,dport=80),reply=(src=172.16.1.3,dst=192.168.1.2,sport=80,dport=2),zone=0,mark=2
>     > 
> tcp,orig=(src=192.168.1.2,dst=30.0.0.1,sport=3,dport=80),reply=(src=172.16.1.4,dst=192.168.1.2,sport=80,dport=3),zone=0,mark=2
>     >
>     > We cannot simply filter on the original direction, as we cannot infer 
> the SRC part (without filtering all entries etc.).
> 
>     OK.  This is what I was asking above and you said
>     'Not sure if it's possible'.  So, it is possible,
>     I see.
> 
> 
> Yeah my bad I did not connect the dots right away.
>  
> 
> 
>     >
>     >  
>     >
>     >
>     >     > That means we cannot fully specify the original direction
>     >     > without doing the filtering ourselves. It would be possible with 
> extra steps from the user. If this interface
>     >     > is too generic/complicated for that purpose we could say that the 
> flush extension supports only full orig tuple
>     >     > and leave the filtering up to the user. That would definitely 
> work, it wouldn't be that elegant for the user however
>     >     > from the code change perspective much simpler.
>     >
>     >     I'm not sure I understand that part.  I'm not arguing aginst partial
>     >     matches.  Or do you mean something else?
>     >
>     >     >
>     >     >  I can see a certain benefit of keeping it in OvS:
>     >     >
>     >     > 1) The filtering complexity is hidden from the user.
>     >     > 2) The data transfer between OvS and the user is much smaller 
> (even if you need to send two matches).
>     >     > 3) Once implemented it's just ready to be used without the need 
> of additional thoughts going into it.
>     >     >
>     >     >
>     >     >  
>     >     >
>     >     >
>     >     >     Did you start working on the OVN side of things?  Which
>     >     >     commands OVN will actually use?
>     >     >
>     >     >
>     >     > Yes, I did. The plan is that OVN will use the extension.
>     >
>     >     I understand that.  The question was how the actual request will
>     >     look like.
>     >
>     >
>     > The testing version used something like this:
>     >
>     >     memset(match, 0, sizeof *match);
>     >
>     >     struct ofputil_ct_tuple *orig = &match->tuple_orig;
>     >     struct ofputil_ct_tuple *reply = &match->tuple_reply;
>     >
>     >     if (IN6_IS_ADDR_V4MAPPED(&tuple->vip_ip)) {
>     >         match->l3_type = AF_INET;
>     >     } else {
>     >         match->l3_type = AF_INET6;
>     >     }
>     >
>     >     match->ip_proto = tuple->proto;
>     >
>     >     orig->dst = tuple->vip_ip;
>     >     orig->dst_port = htons(tuple->vip_port);
>     >
>     >     reply->src = tuple->backend_ip;
>     >     reply->src_port = htons(tuple->backend_port);
>     >
>     > tuple is the struct containing lb info, so VIP, VIP_PORT, BACKEND, 
> BACKEND_PORT, PROTO.
>     > match is the "struct ofputil_ct_match".
>     >
>     > This is with older version of this RFE but it should illustrate the 
> point.
> 
>     Assuming the current v5 implementation of the extension, will the end
>     result of the request be different if we remove the assignment of
>     'orig=>dst' and 'orig->dst_port' from the code above?
> 
> 
> It might delete more than we actually want to. Without specifying the LBs 
> 5-tuple
> we might always remove more entries than we should. As we can have multiple 
> backends
> for the same VIP (that's the whole point of LB), however nothing prevents us 
> from two VIPs
> pointing to the same backend. Which would cause an issue if we will go only 
> with the reply part.


OK.  That makes sense.  Let's keep both tuples then.

But we need to allow ovs-ofctl to accept both tuples.

And we should use a "shortcut" only if reply tuple is not specified,
because if I'm asking to delete:

 
orig=(src=192.168.1.2,dst=30.0.0.1,sport=1,dport=80),reply=(src=172.16.1.3,dst=192.168.1.2,sport=80,dport=1)

I'm not expecting the following entry to be deleted:

 
orig=(src=192.168.1.2,dst=30.0.0.1,sport=1,dport=80),reply=(src=172.16.1.2,dst=192.168.1.2,sport=80,dport=1)

Even though these entries probably cannot exist at the same time,
we should not delete the entry that we wasn't asked to delete.

Best regards, Ilya Maximets.
_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to