On Wed, Jan 11, 2023 at 4:27 PM Ilya Maximets <[email protected]> wrote:
> On 1/11/23 16:09, Ales Musil wrote: > > > > > > On Wed, Jan 11, 2023 at 3:52 PM Ilya Maximets <[email protected] > <mailto:[email protected]>> wrote: > > > > On 1/11/23 14:09, Ales Musil wrote: > > > > > > > > > On Wed, Jan 11, 2023 at 1:34 PM Ilya Maximets <[email protected] > <mailto:[email protected]> <mailto:[email protected] <mailto: > [email protected]>>> wrote: > > > > > > On 1/11/23 07:53, Ales Musil wrote: > > > > > > > > > > > > On Tue, Jan 10, 2023 at 11:08 PM Ilya Maximets < > [email protected] <mailto:[email protected]> <mailto:[email protected] > <mailto:[email protected]>> <mailto:[email protected] <mailto: > [email protected]> <mailto:[email protected] <mailto:[email protected]>>>> > wrote: > > > > > > > > On 12/16/22 13:30, Ales Musil wrote: > > > > > Currently, the CT can be flushed by dpctl only be > specifying > > > > > > > > *by > > > > > > > > > the whole 5-tuple. This is not very convenient when > there are > > > > > only some fields known to the user of CT flush. Add > new struct > > > > > ofputil_ct_match which represents the generic > filtering that can > > > > > be done for CT flush. The match is done only on fields > that are > > > > > non-zero with exception to the icmp fields. > > > > > > > > > > This allows the filtering just within dpctl, however > > > > > it is a preparation for OpenFlow extension. > > > > > > > > > > Reported-at: https://bugzilla.redhat.com/2120546 < > https://bugzilla.redhat.com/2120546> <https://bugzilla.redhat.com/2120546 > <https://bugzilla.redhat.com/2120546>> < > https://bugzilla.redhat.com/2120546 <https://bugzilla.redhat.com/2120546> > <https://bugzilla.redhat.com/2120546 <https://bugzilla.redhat.com/2120546 > >>> > > > > > Signed-off-by: Ales Musil <[email protected] <mailto: > [email protected]> <mailto:[email protected] <mailto:[email protected]>> > <mailto:[email protected] <mailto:[email protected]> <mailto: > [email protected] <mailto:[email protected]>>>> > > > > > Acked-by: Paolo Valerio <[email protected] <mailto: > [email protected]> <mailto:[email protected] <mailto: > [email protected]>> <mailto:[email protected] <mailto: > [email protected]> <mailto:[email protected] <mailto: > [email protected]>>>> > > > > > --- > > > > > v5: Add ack from Paolo. > > > > > v4: Fix a flush all scenario. > > > > > v3: Rebase on top of master. > > > > > Address the C99 comment and missing dpif_close > call. > > > > > v2: Rebase on top of master. > > > > > Address comments from Paolo. > > > > > --- > > > > > > > > Hi, Ales. Thanks for working on this change. > > > > Sorry for review taking so long. I initially though > there > > > > will be mostly a few cosmetic/style changes needed. But > > > > longer I look at this code, more questions I have. See > below. > > > > > > > > Dumitru, maybe you can also help answer some of them? > > > > > > > > Best regards, Ilya Maximets. > > > > > > > > > > > > Hi Ilya, > > > > > > > > thank you for the review. Before proceeding I tried to > answer your concerns/questions. > > > > Let me know what you think. > > > > > > > > Thanks, > > > > Ales > > > > > > > > > > > > > > > > > NEWS | 2 + > > > > > include/openvswitch/ofp-util.h | 28 +++ > > > > > lib/automake.mk <http://automake.mk> < > http://automake.mk <http://automake.mk>> <http://automake.mk < > http://automake.mk> <http://automake.mk <http://automake.mk>>> > | 2 + > > > > > lib/ct-dpif.c | 208 > +++++++++------------ > > > > > lib/ct-dpif.h | 4 +- > > > > > lib/dpctl.c | 42 ++--- > > > > > lib/dpctl.man | 3 +- > > > > > lib/ofp-ct-util.c | 326 > +++++++++++++++++++++++++++++++++ > > > > > lib/ofp-ct-util.h | 36 ++++ > > > > > tests/system-traffic.at <http://system-traffic.at> < > http://system-traffic.at <http://system-traffic.at>> < > http://system-traffic.at <http://system-traffic.at> < > http://system-traffic.at <http://system-traffic.at>>> | 82 > ++++++++- > > > > > 10 files changed, 586 insertions(+), 147 deletions(-) > > > > > create mode 100644 lib/ofp-ct-util.c > > > > > create mode 100644 lib/ofp-ct-util.h > > > > > > > > > > diff --git a/NEWS b/NEWS > > > > > index 265375e1c..ff8904b02 100644 > > > > > --- a/NEWS > > > > > +++ b/NEWS > > > > > @@ -14,6 +14,8 @@ Post-v3.0.0 > > > > > 10 Gbps link speed by default in case the actual > link speed cannot be > > > > > determined. Previously it was 10 Mbps. Values > can still be overridden > > > > > by specifying 'max-rate' or '[r]stp-path-cost' > accordingly. > > > > > + - ovs-dpctl and related ovs-appctl commands: > > > > > > > > "ovs-dpctl and 'ovs-appctl dpctl/'" might be a better > header. > > > > > > > > > + * "flush-conntrack" is capable of handling > partial 5-tuple. > > > > > > > > > > > > > > > v3.0.0 - 15 Aug 2022 > > > > > diff --git a/include/openvswitch/ofp-util.h > b/include/openvswitch/ofp-util.h > > > > > index 091a09cad..84937ae26 100644 > > > > > --- a/include/openvswitch/ofp-util.h > > > > > +++ b/include/openvswitch/ofp-util.h > > > > > @@ -19,6 +19,9 @@ > > > > > > > > > > #include <stdbool.h> > > > > > #include <stdint.h> > > > > > +#include <sys/types.h> > > > > > +#include <netinet/in.h> > > > > > + > > > > > #include "openvswitch/ofp-protocol.h" > > > > > > > > > > struct ofp_header; > > > > > @@ -27,6 +30,31 @@ struct ofp_header; > > > > > extern "C" { > > > > > #endif > > > > > > > > > > +struct ofputil_ct_tuple { > > > > > + struct in6_addr src; > > > > > + struct in6_addr dst; > > > > > + > > > > > + union { > > > > > + ovs_be16 src_port; > > > > > + ovs_be16 icmp_id; > > > > > + }; > > > > > + union { > > > > > + ovs_be16 dst_port; > > > > > + struct { > > > > > + uint8_t icmp_code; > > > > > + uint8_t icmp_type; > > > > > + }; > > > > > + }; > > > > > +}; > > > > > + > > > > > +struct ofputil_ct_match { > > > > > + uint8_t ip_proto; > > > > > + uint16_t l3_type; > > > > > + > > > > > + struct ofputil_ct_tuple tuple_orig; > > > > > + struct ofputil_ct_tuple tuple_reply; > > > > > +}; > > > > > > > > These structures are very similar to existing > ct_dpif_tuple. > > > > We can keep them for now, but I think, we'll need to > replace > > > > ct_dpif_tuple with the new ofputil_ct_tuple in the > future. > > > > We may need some changes in ofputil_ct_tuple for that > though. > > > > > > > > > > > > I was thinking about replacing them, however it would > require a much > > > > bigger change, which was out of scope of this RFE. Also I > was not entirely > > > > sure if there wasn't any specific reason for the > ct_dpif_tuple specifically > > > > the address part. It seems like optimization to me. > > > > > > > > Anyway it should be possible to unify those if needed. > > > > > > > > > > > > > > > > > > > > OTOH, I'm not sure I understand why we need both tuples > here. > > > > Can we have a tuple and a flag that indicates a direction > > > > instead? > > > > > > > > > > > > That is also a possibility. However that wouldn't fit too > that nicely in the > > > > whole match struct, we could probably duplicate the proto > field and get rid of the > > > > l3_type. That way we could supply just two tuples instead, > getting rid of the match completely. > > > > > > The point was to have only one tuple. See below. > > > > > > > > > > > > > > > > > > > There are several issues with having both original and > reply > > > > tuples, IIUC: > > > > > > > > 1. None of the command line interfaces support specifying > > > > reply tuple. > > > > > > > > > > > > That was the intention of the original command AFAICT, we > could modify the commands > > > > to support that. However I'm not sure if there is any > benefit in that. > > > > > > ovs-ofctl supposed to mimic the OpenFlow request, so it should > > > support everything that OpenFlow request can have. > > > > > > > > > I see, sure that can definitely be done, however we should > probably think how this can > > > be done in a more universal way that could be used also for dpctl. > > > > > > > > > > > > > > > > > > > > > > 2. If the reply tuple will be specified via OpenFlow, it > > > > will be ignored if the original direction tuple is > > > > fully specified. > > > > > > > > > > > > That is ok IMO, because a fully specified original should be > enough to determine the one > > > > entry that needs to be erased. So the reply direction in > that would be an extra. > > > > > > That supports the point that we don't need two tuples in the > request. > > > > > > > > > Right but only if we are able to supply the full original > direction. See below. > > > > > > > > > > > > > > > > > > > > > > > > > 3. Netfilter kernel interface allows specifying only one > > > > tuple, not both, AFAICT. > > > > > > > > > > > > Right, that's the reason why, when we match, we use the > original tuple to erase that > > > > entry. > > > > > > > > > > > > > > > > I'm not a conntrack expert, so I have a question: Is it > possible > > > > that addresses and ports in original and reply > directions are > > > > not mirrored copies? Or do we care about these cases? > > > > > > > > > > > > Not sure if it's possible, but even if it would be, the > current implementation > > > > would account for that, with the exception of specifying > fully the original > > > > direction. > > > > > > > > > > > > > > > > If we want to flush all entries related to the same > IP:PORT, > > > > we will not be able to do that in a single OpenFlow > request, > > > > because we'll need to supply the same IP and PORT as src > > > > and dst, or as src in both original and reply directions. > > > > None of these can fit into a single request. > > > > > > > > > > > > Is that really a problem? You could do that in two, one for > the original > > > > one for reply. This is also very similar to the interface > supported by conntrack-tools, > > > > with those you couldn't remove all entries in a single call > as well. We could modify > > > > the interface that would allow matching on tuple without > direction specification, > > > > that would translate to both directions, however there is > still the question if that's needed. > > > > > > > > > It's not a problem, but another supporting point for the > interface > > > being redundant. We don't seem to need two tuples in the > request, > > > because users will have to send two different requests anyway. > > > > > > So, the request should have: > > > > > > 1. zone > > > 2. nw_proto > > > 3. tuple > > > 4. tuple type ('original' or 'reply'), in case we want to allow > > > users to flush based on reply tuple. But this seems a bit > > > redundant as well. > > > > > > Something like: > > > > > > struct ofputil_ct_tuple { > > > } > > > > > > struct ofputil_ct_match { > > > uint8_t ip_proto; > > > uint16_t l3_type; > > > // enum DIRECTION direction; > > > struct ofputil_ct_tuple tuple; > > > }; > > > > > > > > > And the OpenFlow part: > > > > > > /* CT flush available TLVs. */ > > > enum nx_ct_flush_tlv_type { > > > NXT_CT_ZONE_ID, /* u8: CT zone id. */ > > > NXT_CT_TUPLE, /* Nested NXT_CT_TUPLE_* > attributes. */ > > > }; > > > > > > enum nx_ct_flush_tuple_type { > > > NXT_CT_TUPLE_SRC, /* be128: source IPv6 or mapped > IPv4 address. */ > > > NXT_CT_TUPLE_DST, /* be128: destination IPv6 or > mapped IPv4 address. */ > > > NXT_CT_TUPLE_SRC_PORT, /* be16: source port. */ > > > NXT_CT_TUPLE_DST_PORT, /* be16: destination port. */ > > > NXT_CT_TUPLE_ICMP_ID, /* be16: ICMP id. */ > > > NXT_CT_TUPLE_ICMP_TYPE, /* u8: ICMP type. */ > > > NXT_CT_TUPLE_ICMP_CODE, /* u8: ICMP code. */ > > > // NXT_CT_TUPLE_DIRECTION, /* enum > nx_ct_flush_tuple_direction: CT tuple direction. */ > > > }; > > > > > > //enum nx_ct_flush_tuple_direction { > > > // NXT_CT_TUPLE_DIRECTION_ORIG = 1, > > > // NXT_CT_TUPLE_DIRECTION_REPLY = 2, > > > //} > > > > > > struct nx_ct_flush { > > > uint8_t nw_proto; /* IP protocol. */ > > > uint8_t pad[7]; /* Align to 64-bit (must be > zero). */ > > > /* Followed by ... */ > > > }; > > > > > > > > > Such interface will also allow us in to have both direction > specified > > > if we would want that in the future. For now we should check > that > > > each TLV type is specified once, i.e. only one tuple is in the > request. > > > > > > We may or may not add NXT_CT_TUPLE_DIRECTION support now. Can > add it > > > in later releases if necessary. Tuples without that TLV can > be treated > > > as original direction. > > > > > > > > > The original idea was something similar but as stated below that > wouldn't really help > > > us in the OVN case. And there is probably no significant benefit > in having partial match > > > with only single direction at time. I'm not sure if I'm not > missing something obvious. > > > > > > > > > > > > > > > > > > > > > > At this point is there actually any value in flushing > basing > > > > on reply direction? Shouldn't we be able to flush the > same > > > > entries by flushing by dst in original direction, if we > will > > > > need to issue multiple requests anyway? > > > > > > > > > > > > That assumption would be fine when you can specify a full > tuple for the original direction. > > > > The problem is that for example OVN we know only the > original DST_IP:DST_PORT (VIP) > > > > and the reply SRC_IP:SRC_PORT (backend). > > > > > > Isn't original DST_IP:DST_PORT equal to reply SRC_IP:SRC_PORT ? > > > > > > > > > Unfortunately no, it's DNATed so for example > > > 'ovn-nbctl lb-add lb1 30.0.0.1 "172.16.1.2,172.16.1.3,172.16.1.4"' > > > will have CT entries as follows: > > > > > > > tcp,orig=(src=192.168.1.2,dst=30.0.0.1,sport=1,dport=80),reply=(src=172.16.1.2,dst=192.168.1.2,sport=80,dport=1),zone=0,mark=2 > > > > tcp,orig=(src=192.168.1.2,dst=30.0.0.1,sport=2,dport=80),reply=(src=172.16.1.3,dst=192.168.1.2,sport=80,dport=2),zone=0,mark=2 > > > > tcp,orig=(src=192.168.1.2,dst=30.0.0.1,sport=3,dport=80),reply=(src=172.16.1.4,dst=192.168.1.2,sport=80,dport=3),zone=0,mark=2 > > > > > > We cannot simply filter on the original direction, as we cannot > infer the SRC part (without filtering all entries etc.). > > > > OK. This is what I was asking above and you said > > 'Not sure if it's possible'. So, it is possible, > > I see. > > > > > > Yeah my bad I did not connect the dots right away. > > > > > > > > > > > > > > > > > > > > > > That means we cannot fully specify the original direction > > > > without doing the filtering ourselves. It would be possible > with extra steps from the user. If this interface > > > > is too generic/complicated for that purpose we could say > that the flush extension supports only full orig tuple > > > > and leave the filtering up to the user. That would > definitely work, it wouldn't be that elegant for the user however > > > > from the code change perspective much simpler. > > > > > > I'm not sure I understand that part. I'm not arguing aginst > partial > > > matches. Or do you mean something else? > > > > > > > > > > > I can see a certain benefit of keeping it in OvS: > > > > > > > > 1) The filtering complexity is hidden from the user. > > > > 2) The data transfer between OvS and the user is much > smaller (even if you need to send two matches). > > > > 3) Once implemented it's just ready to be used without the > need of additional thoughts going into it. > > > > > > > > > > > > > > > > > > > > > > > > Did you start working on the OVN side of things? Which > > > > commands OVN will actually use? > > > > > > > > > > > > Yes, I did. The plan is that OVN will use the extension. > > > > > > I understand that. The question was how the actual request > will > > > look like. > > > > > > > > > The testing version used something like this: > > > > > > memset(match, 0, sizeof *match); > > > > > > struct ofputil_ct_tuple *orig = &match->tuple_orig; > > > struct ofputil_ct_tuple *reply = &match->tuple_reply; > > > > > > if (IN6_IS_ADDR_V4MAPPED(&tuple->vip_ip)) { > > > match->l3_type = AF_INET; > > > } else { > > > match->l3_type = AF_INET6; > > > } > > > > > > match->ip_proto = tuple->proto; > > > > > > orig->dst = tuple->vip_ip; > > > orig->dst_port = htons(tuple->vip_port); > > > > > > reply->src = tuple->backend_ip; > > > reply->src_port = htons(tuple->backend_port); > > > > > > tuple is the struct containing lb info, so VIP, VIP_PORT, BACKEND, > BACKEND_PORT, PROTO. > > > match is the "struct ofputil_ct_match". > > > > > > This is with older version of this RFE but it should illustrate > the point. > > > > Assuming the current v5 implementation of the extension, will the end > > result of the request be different if we remove the assignment of > > 'orig=>dst' and 'orig->dst_port' from the code above? > > > > > > It might delete more than we actually want to. Without specifying the > LBs 5-tuple > > we might always remove more entries than we should. As we can have > multiple backends > > for the same VIP (that's the whole point of LB), however nothing > prevents us from two VIPs > > pointing to the same backend. Which would cause an issue if we will go > only with the reply part. > > > OK. That makes sense. Let's keep both tuples then. > > But we need to allow ovs-ofctl to accept both tuples. > > And we should use a "shortcut" only if reply tuple is not specified, > because if I'm asking to delete: > > > > orig=(src=192.168.1.2,dst=30.0.0.1,sport=1,dport=80),reply=(src=172.16.1.3,dst=192.168.1.2,sport=80,dport=1) > > I'm not expecting the following entry to be deleted: > > > > orig=(src=192.168.1.2,dst=30.0.0.1,sport=1,dport=80),reply=(src=172.16.1.2,dst=192.168.1.2,sport=80,dport=1) > > Even though these entries probably cannot exist at the same time, > we should not delete the entry that we wasn't asked to delete. > > Best regards, Ilya Maximets. > > That makes sense. I'll work on that + the smaller comments. Thanks, Ales -- Ales Musil Senior Software Engineer - OVN Core Red Hat EMEA <https://www.redhat.com> [email protected] IM: amusil <https://red.ht/sig> _______________________________________________ dev mailing list [email protected] https://mail.openvswitch.org/mailman/listinfo/ovs-dev
