Re: [ovs-dev] [PATCH 4/4] conntrack: compact the size of conn structure.
Gaëtan Rivet 于2021年3月3日周三 上午4:01写道: > > On Sun, Nov 29, 2020, at 03:32, hepeng.0320 wrote: > > From: hepeng > > > > This patch tries to use ovs_spin_lock as an alternative to reduce the > > size of each conn. The ovs_mutex_lock consumes 48 bytes while the > > ovs_spin_lock only uses 16 bytes. > > Using a spin-lock means that the thread won't yield. It might be fine for > userland datapath threads, but those are not the only one locking the > connections. The ct_clean thread will do, and will execute on randomly > assigned CPUs, potentially blocking other threads from being scheduled there. > As many conn critical sections are not short, this could impair other modules > beyond conntrack. I guess one way to check if you should use spinlock is to check if your datapath is polling base or not, the spinlock might be harmful for AF_XDP userspace datapath, I guess. since PMD are normally tied into some cores, so the execute time of the critical section is limited and predictable (as normally some special optimization like putting isolcpu in the GRUB), so we can expect that the CT clean thread will not be blocked that long? I see in DPDK code the spinlock is used in many places, I guess in certain cases, that we can switch to use spinlock. Maybe uses pthread_mutex as a default setting, but provide compile macros as in this patch. > > Gaining some bytes is good, but it needs more thought and opinions. There > might be other venues available, less risky, to reduce the size. > > >Also, we remove > > the alg info into an > > extra space as alg is not common in the real world userspace ct use case. > > (mostly used as security group and nat in the cloud). > > Can you submit this as a separate commit? > > > > > The size of conn_tcp: 312 -> 240. > > > > Signed-off-by: Peng He > > Regards, > -- > Gaetan Rivet > ___ > dev mailing list > d...@openvswitch.org > https://mail.openvswitch.org/mailman/listinfo/ovs-dev -- hepeng ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH 3/4] conntrack: refactoring alg handle code path
Gaëtan Rivet 于2021年3月3日周三 上午3:49写道: > > On Sun, Nov 29, 2020, at 03:32, hepeng.0320 wrote: > > From: hepeng > > > > Just like the previous patch, split the handle_ftp_ctl into > > handle_ftp_interest and handle_ftp_other, since we can infer > > which one to call from the called positions. > > > > We also introduce a simple framework which unifies the normal > > conn handle path and alg conn handle path by using two hooks: > > *before_conn_update_hook* and *after_conn_update_hook*. > > > > Reading this 'also', I thought initially the commit should be split and the > two changes, dividing the ALG CTL in two and introducing the framework were > unrelated. Reading further, it's clearer now that this is one logical change. > I think the link between the two should be expanded in the commit log. > > The motivation I get for this commit is to "unify the normal and ALG conn > handle path". However, only the FTP ALG benefits from the hooks introduced. > The code is streamlined and a single-purpose function is removed, but it > seems flimsy as a motivation to introduce a new callback system. The userspace CT only supports very few ALGs, and even for FTP ALG, the code can just work for the normal traffic, any retrans TCP will confuse FTP ALG since the seq adjust account both normal and retrans packets. the mini framework is just to unify the code handle path, as for alg process, you have to do something before conn_udpate and after, it's better to have a hook framework better than hard coded. I am not sure if future ALG handlers can benefit from this framework, but it seems it's better than the current implementation. The other part I agree with you. > > Can you expand on the goal of the framework, and which ALG it could help to > implement? > It is hard to judge whether the framework is useful and properly defined > without having a full picture. > > > Signed-off-by: Peng He > > --- > > lib/conntrack-private.h | 16 +- > > lib/conntrack.c | 477 > > 2 files changed, 307 insertions(+), 186 deletions(-) > > > > diff --git a/lib/conntrack-private.h b/lib/conntrack-private.h > > index 6bec43d3f..2aa828674 100644 > > --- a/lib/conntrack-private.h > > +++ b/lib/conntrack-private.h > > @@ -83,7 +83,22 @@ struct alg_exp_node { > > bool nat_rpl_dst; > > }; > > > > +enum ct_alg_ctl_type { > > +CT_ALG_CTL_NONE, > > +CT_ALG_CTL_FTP, > > +CT_ALG_CTL_TFTP, > > +/* SIP is not enabled through Openflow and presently only used as > > + * an example of an alg that allows a wildcard src ip. */ > > +CT_ALG_CTL_SIP, > > +CT_ALG_CTL_MAX, > > I don't see this CT_ALG_CTL_MAX used afterward in this commit? > It is introduced by this change. > > > +}; > > + > > #define CONN_FLAG_NAT_MASK 0xf > > +#define CONN_FLAG_CTL_FTP (CT_ALG_CTL_FTP << 4) > > +#define CONN_FLAG_CTL_TFTP (CT_ALG_CTL_TFTP << 4) > > +#define CONN_FLAG_CTL_SIP (CT_ALG_CTL_SIP << 4) > > +/* currently only 3 algs supported */ > > +#define CONN_FLAG_ALG_MASK 0x70 > > The enum values are not flags. > I guess they should instead be defined as > > #define CONN_FLAG_CTL_FTP (UINT64_C(1) << (4 + CT_ALG_CTL_FTP)) > > In the code this should have no effect as SIP is not implemented, so only ALG > were 1 and 2, which are valid flags. > > > > > struct conn_dir { > > struct cmap_node cm_node; > > @@ -99,7 +114,6 @@ struct conn { > > struct conn_key parent_key; /* Only used for orig_tuple support. */ > > struct ovs_list exp_node; > > uint64_t conn_flags; > > -char *alg; > > > > /* Mutable data. */ > > struct ovs_mutex lock; /* Guards all mutable fields. */ > > diff --git a/lib/conntrack.c b/lib/conntrack.c > > index a22252a63..fffc617fb 100644 > > --- a/lib/conntrack.c > > +++ b/lib/conntrack.c > > @@ -69,15 +69,6 @@ enum ct_alg_mode { > > CT_TFTP_MODE, > > }; > > > > -enum ct_alg_ctl_type { > > -CT_ALG_CTL_NONE, > > -CT_ALG_CTL_FTP, > > -CT_ALG_CTL_TFTP, > > -/* SIP is not enabled through Openflow and presently only used as > > - * an example of an alg that allows a wildcard src ip. */ > > -CT_ALG_CTL_SIP, > > -}; > > - > > struct zone_limit { > > struct hmap_node node; > > struct conntrack_zone_limit czl; > > @@ -153,30 +144,145 @@ static struct ct_l4_proto *l4_protos[] = { > > }; > > > > static void > > -handle_ftp_ctl(struct conntrack *ct, const struct conn_lookup_ctx *ctx, > > - struct dp_packet *pkt, struct conn *ec, long long now, > > - enum ftp_ctl_pkt ftp_ctl, bool nat); > > - > > +handle_ftp_ctl_other(struct conntrack *ct, const struct > > conn_lookup_ctx *ctx, > > + struct dp_packet *pkt, struct conn *ec, long long > > now, > > + bool nat); > > +static void > > +handle_ftp_ctl_interest(struct conntrack *ct, const struct > > conn_lookup_ctx *ctx, > > +struct dp_packet *pkt, struct conn *ec, long > > long
Re: [ovs-dev] [PATCH 1/4] conntrack: remove nat_conn
Gaëtan Rivet 于2021年3月3日周三 上午1:46写道: > > On Sat, Feb 27, 2021, at 09:49, 贺鹏 wrote: > > Hi, > > > > Thanks for the detailed reviews. > > These patches are like RFC to see if the work is interesting enough. > > > > Aaron Conole 于2021年2月25日周四 上午1:37写道: > > > > > > "hepeng.0320" writes: > > > > > > > From: hepeng > > > > > > > > Currently, when doing NAT, the userspace conntrack will use an extra > > > > conn for the two directions in a flow. However, each conn has actually > > > > the two keys for both orig and rev flow directions. This patch > > > > introduces a conn_dir member in the conn and it consists of both rev and > > > > orig cmap_node for hash lookup. This saves extra allocation for > > > > nat_conn, and makes userspace code much cleaner. > > > > > > If I'm understanding this correctly, you still re-insert the conn into > > > the conn list? > > > > > > Yes. This is to insert the rev key into the cmap also. So for the nat > > traffic, > > both orig and rev key are in the cmap. > > > > > > > > > We also introduces a conn_flags member to reduce the memory footprint > > > > of a > > > > conn. > > > > > > We should split this out to a separate patch. > > > > > > > Signed-off-by: Peng He > > > > --- > > > > lib/conntrack-private.h | 20 +-- > > > > lib/conntrack.c | 264 > > > > 2 files changed, 121 insertions(+), 163 deletions(-) > > > > > > > > diff --git a/lib/conntrack-private.h b/lib/conntrack-private.h > > > > index 789af82ff..6bec43d3f 100644 > > > > --- a/lib/conntrack-private.h > > > > +++ b/lib/conntrack-private.h > > > > @@ -83,21 +83,23 @@ struct alg_exp_node { > > > > bool nat_rpl_dst; > > > > }; > > > > > > > > -enum OVS_PACKED_ENUM ct_conn_type { > > > > -CT_CONN_TYPE_DEFAULT, > > > > -CT_CONN_TYPE_UN_NAT, > > > > +#define CONN_FLAG_NAT_MASK 0xf > > > > + > > > > +struct conn_dir { > > > > +struct cmap_node cm_node; > > > > +bool orig; > > > > > > This naming is confusing. We can have 'conn->orig->orig'. Consider > > > renaming one of these fields to distinguish what is going on. I would > > > prefer seeing 'fwd' used (since it's the forward direction). > > > > Yes, agree. > > > > > > > > > }; > > > > > > > > struct conn { > > > > /* Immutable data. */ > > > > struct conn_key key; > > > > +struct conn_dir orig; > > > > struct conn_key rev_key; > > > > +struct conn_dir rev; > > > > > > I think this might be better if setup like: > > > +enum ct_direction { > > > +CT_DIRECTION_FWD, > > > +CT_DIRECTION_REV, > > > +CT_DIRECTIONS > > > +} > > > + > > > +struct conn_data { > > > +struct conn_key key; > > > +struct cmap_node cm_node; > > > }; > > > > > > struct conn { > > > -struct conn_key key; > > > -struct conn_key rev_key; > > > +struct conn_data cn_data[CT_DIRECTIONS]; > > > ... > > > > > > Then in the code, we can always get orig and rev information: > > > > > > conn->cn_data[CT_DIRECTION_FWD] > > > conn->cn_data[CT_DIRECTION_REV] > > > > > > Did I miss something? > > > > Since both origin and rev are in the cmap, when you lookup the hash table, > > you get a pointer to the 'middle data structure', in the patch, it is > > the conn_dir. > > > > However, you are not sure what you get is the origin dir or the rev dir. > > The key > > is to use a variable stored in the conn_dir, like 'fwd' or 'orig'. > > Then you know the dir, > > by knowing the dir, you know the offset between your pointer and the > > pointer to > > the conn, just like the belowing code: > > > > static inline struct conn * conn_from_conndir(struct conn_dir *conndir) { > > return conndir->orig ? CONTAINER_OF(conndir, struct conn, orig) : \ > >CONTAINER_OF(conndir, struct conn, rev); > > } > > > > In your case: > > > > static inline struct conn * conn_from_conndir(struct conn_dir *conndir) { > > return conndir->orig ? CONTAINER_OF(conndir, struct conn, cn_data[FWD]) > > : \ > >CONTAINER_OF(conndir, struct conn, cn_data[REV]); > > } > > > > In the following patches I've changed the code of conn_dir as a part > > of conn_key > > I haven't seen it in the following patches, I see in the last commit still > struct conn_dir on its own, and not as part of conn_key. Maybe I > misunderstood what you meant? > > Also, logically I'd consider the conn_key as part of the conn_dir, not the > inverse. I would expect conn_key to contain the whole key and only the key of > the conn to compute its hash, no other elements such as a dir mark or a cmap > node. > > would it make sense to have something like > > struct conn_dir { > enum ct_direction dir; > struct conn_key key; > struct cmap_node cm_node; > }; > > struct conn { > [...] > struct conn_dir dir[CT_DIRECTIONS]; > }; > > So the same as what Aaron proposed, but with 'conn_data' renamed as > 'conn_dir' and the enum ct_directions used to mark each conn_dir.
Re: [ovs-dev] conntrack: code refactoring userspace ct.
Hi, I've read all your suggestions and appreciate that. Just have been through a very busy week, and probably will have a very busy month, so the work and reply might be slow. Will do a rebase in the next version. I've seen some of your patches related to CT. do you have some plans for CT optimization? is it a part of plans related to CT offload? Gaëtan Rivet 于2021年3月2日周二 下午10:47写道: > > On Sun, Nov 29, 2020, at 03:32, hepeng.0320 wrote: > > This series of patch refactor the code of userspace CT, which makes > > the code more simple and more structured. We mainly refactor the code > > path for NAT and ALG, using conn_flags instead of bool value and use a > > light weight lock to reduce the size of the conn. The size of conn has > > been reduced from 312 to 240 bytes. > > > > > > Hello, > > This series is interesting, but it needs some work. > > Commits should be divided into smaller elements (as Aaron pointed out), and > some changes are > barely described, or bunched with other changes without enough justification. > > Could you rebase and rework the series to have a more organized patchset? > That would make the review easier. > > Regards, > -- > Gaetan > ___ > dev mailing list > d...@openvswitch.org > https://mail.openvswitch.org/mailman/listinfo/ovs-dev -- hepeng ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH v3] ofproto-dpif-xlate: fix zone set from non-frozen-metadata fields
Mark Gray 于2021年3月5日周五 下午7:54写道: > > On 27/02/2021 09:34, Peng He wrote: > > CT zone could be set from a field that is not included in frozen > > metadata. Consider the example rules which are typically seen in > > OpenStack security group rules: > > > > priority=100,in_port=1,tcp,ct_state=-trk,action=ct(zone=5,table=0) > > priority=100,in_port=1,tcp,ct_state=+trk,action=ct(commit,zone=NXM_NX_CT_ZONE[]),2 > > > > The zone is set from the first rule's ct action. These two rules will > > generate two megaflows: the first one uses zone=5 to query the CT module, > > the second one sets the zone-id from the first megaflow and commit to CT. > > > > The current implementation will generate a megaflow that does not use > > ct_zone=5 as a match, but directly commit into the ct using zone=5, as zone > > is > > set by an Imm not a field. > > > > Consider a situation that one changes the zone id (for example to 15) > > in the first rule, however, still keep the second rule unchanged. During > > this change, there is traffic hitting the two generated megaflows, the > > revaldiator would revalidate all megaflows, however, the revalidator will > > not change the second megaflow, because zone=5 is recorded in the > > megaflow, so the xlate will still translate the commit action into zone=5, > > and the new traffic will still commit to CT as zone=5, not zone=15, > > resulting in taffic drops and other issues. > > > > Just like OVS set-field convention, if a field X is set by Y > > (Y is a variable not an Imm), we should also mask Y as a match > > in the generated megaflow. An exception is that if the zone-id is > > set by the field that is included in the frozen state (i.e. regs) and this > > upcall is a resume of a thawed xlate, the un-wildcarding can be skipped, > > as the recirc_id is a hash of the values in these fields, and it will change > > following the changes of these fields. When the recirc_id changes, > > all megaflows with the old recirc id will be invalid later. > > This looks good to me and all the unit-tests pass. > > There is some trailing whitespace. You can run > "./utilities/checkpatch.py" when submitting to catch them before the > 0-day robot does. Its a bit of a nit and I don't know if this won't get > committed because of it. That's a decision for a maintainer. thanks for your kind suggestion . better send a new version, but before that maybe there are some more suggestions from maintainers, and it's better to wait and then resend. > > > > Fixes: 07659514c3 ("Add support for connection tracking.") > > Reported-by: Sai Su > > Signed-off-by: Peng He > > --- > > include/openvswitch/meta-flow.h | 1 + > > lib/meta-flow.c | 13 ++ > > ofproto/ofproto-dpif-xlate.c| 12 + > > tests/system-traffic.at | 45 + > > 4 files changed, 71 insertions(+) > > > > diff --git a/include/openvswitch/meta-flow.h > > b/include/openvswitch/meta-flow.h > > index 95e52e358..045dce8f5 100644 > > --- a/include/openvswitch/meta-flow.h > > +++ b/include/openvswitch/meta-flow.h > > @@ -2305,6 +2305,7 @@ void mf_set_flow_value_masked(const struct mf_field *, > >const union mf_value *mask, > >struct flow *); > > bool mf_is_tun_metadata(const struct mf_field *); > > +bool mf_is_frozen_metadata(const struct mf_field *); > > bool mf_is_pipeline_field(const struct mf_field *); > > bool mf_is_set(const struct mf_field *, const struct flow *); > > void mf_mask_field(const struct mf_field *, struct flow_wildcards *); > > diff --git a/lib/meta-flow.c b/lib/meta-flow.c > > index c808d205d..e03cd8d0c 100644 > > --- a/lib/meta-flow.c > > +++ b/lib/meta-flow.c > > @@ -1788,6 +1788,19 @@ mf_is_tun_metadata(const struct mf_field *mf) > > mf->id < MFF_TUN_METADATA0 + TUN_METADATA_NUM_OPTS; > > } > > > > +bool > > +mf_is_frozen_metadata(const struct mf_field *mf) > > +{ > > +if (mf->id >= MFF_TUN_ID && mf->id <= MFF_IN_PORT_OXM) { > > +return true; > > +} > > + > > +if (mf->id >= MFF_REG0 && mf->id < MFF_ETH_SRC) { > > +return true; > > +} > > +return false; > > +} > > + > > bool > > mf_is_pipeline_field(const struct mf_field *mf) > > { > > diff --git a/ofproto/ofproto-dpif-xlate.c b/ofproto/ofproto-dpif-xlate.c > > index 7108c8a30..14d00db1e 100644 > > --- a/ofproto/ofproto-dpif-xlate.c > > +++ b/ofproto/ofproto-dpif-xlate.c > > @@ -6195,6 +6195,18 @@ compose_conntrack_action(struct xlate_ctx *ctx, > > struct ofpact_conntrack *ofc, > > > > if (ofc->zone_src.field) { > > zone = mf_get_subfield(>zone_src, >xin->flow); > > +if (ctx->xin->frozen_state) { > > +/* If the upcall is a resume of a recirculation, we only need > > to > > + * unwildcard the fields that are not in the frozen_metadata, > > as > > + * when the rules update, OVS will generate a new recirc_id, > > + * which
[ovs-dev] [PATCH ovn] ovn-northd-ddlog: Fix startup race updating nb_cfg_timestamp.
The DDlog code used by ovn-northd-ddlog relies on always having a value (exactly one of them) in the NbCfgTimestamp relation. The C code inserts and updates this value. It's supposed to insert the initial value the first time that it processes any update from the northbound database. However, it actually only did this if the first update it processed was from the northbound database--if it was from the southbound database, it forgot to do so even when the first northbound update showed up. The DDlog code would then go kind of berzerk creating and deleting the NB_Global record until something actually happened to increment nb_cfg (such as "ovn-nbctl --wait=sb sync"). This fixes the problem by only recording that a value was inserted to NbCfgTimestamp when the first update from the northbound database is received, same as originally intended. Signed-off-by: Ben Pfaff --- northd/ovn-northd-ddlog.c | 6 -- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/northd/ovn-northd-ddlog.c b/northd/ovn-northd-ddlog.c index aa0ea73e401d..f373bb129a89 100644 --- a/northd/ovn-northd-ddlog.c +++ b/northd/ovn-northd-ddlog.c @@ -445,8 +445,10 @@ northd_parse_updates(struct northd_ctx *ctx, struct ovs_list *updates) if (ddlog_commit(ctx->ddlog)) { goto error; } -old_nb_cfg = new_nb_cfg; -old_nb_cfg_timestamp = new_nb_cfg_timestamp; +if (ctx->has_timestamp_columns) { +old_nb_cfg = new_nb_cfg; +old_nb_cfg_timestamp = new_nb_cfg_timestamp; +} /* This update may have implications for the other side, so * immediately wake to check for more changes to be applied. */ -- 2.29.2 ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
[ovs-dev] [PATCH ovn] ovn-sbctl: Sort "dump-flows" output by actions if all else fails.
The Logical_Flow table can have rows that differ only in actions. Such rows would have identical matches and priorities and be in the same datapath. At first glance, any these rows would be a bug, since they'd match the same packets and there'd be no way to choose between them. In practice, though, actions can have prerequisites. If two different flows have mutually exclusive prerequisites, the otherwise identical flows are actually valid. This comes up in practice in DNS response flows. There are two of them, in the same table and pipeline and with the same match expression "udp.dst == 53 && reg0[4]", but one of them includes the action "ip4.src <-> ip4.dst;" and the other "ip6.src <-> ip6.dst;". Because the first has an IPv4 prerequisite and the latter an IPv6 prerequisite, their matches don't really overlap. Anyway, regardless of whether they're valid, it's still good to sort consistently, so that's all a digression. Signed-off-by: Ben Pfaff --- utilities/ovn-sbctl.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/utilities/ovn-sbctl.c b/utilities/ovn-sbctl.c index c38e8ec3bdbd..94e33d2bdfe6 100644 --- a/utilities/ovn-sbctl.c +++ b/utilities/ovn-sbctl.c @@ -754,13 +754,14 @@ sbctl_lflow_cmp(const void *a_, const void *b_) int a_pipeline = pipeline_encode(a->pipeline); int b_pipeline = pipeline_encode(b->pipeline); -return (a_pipeline > b_pipeline ? 1 +cmp = (a_pipeline > b_pipeline ? 1 : a_pipeline < b_pipeline ? -1 : a->table_id > b->table_id ? 1 : a->table_id < b->table_id ? -1 : a->priority > b->priority ? -1 : a->priority < b->priority ? 1 : strcmp(a->match, b->match)); +return cmp ? cmp : strcmp(a->actions, b->actions); } static char * -- 2.29.2 ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
[ovs-dev] [PATCH ovn 1/2] tests: Fix bad syntax in "test" for waiting.
'test "$a"' is always true if $abc is nonempty. Spaces are necessary to test for equality, e.g. 'test $a = $b'. Signed-off-by: Ben Pfaff --- tests/ovn-controller.at | 4 ++-- tests/ovn.at| 2 +- 2 files changed, 3 insertions(+), 3 deletions(-) diff --git a/tests/ovn-controller.at b/tests/ovn-controller.at index 2cd3e261f2eb..300872aa1852 100644 --- a/tests/ovn-controller.at +++ b/tests/ovn-controller.at @@ -165,11 +165,11 @@ OVS_WAIT_UNTIL([check_datapath_type foo]) # Change the br-int's datapath type to bar. # It should be reset to foo since ovn-bridge-datapath-type is configured. ovs-vsctl set Bridge br-int datapath-type=bar -OVS_WAIT_UNTIL([test foo=`ovs-vsctl get Bridge br-int datapath-type`]) +OVS_WAIT_UNTIL([test foo = `ovs-vsctl get Bridge br-int datapath-type`]) OVS_WAIT_UNTIL([check_datapath_type foo]) ovs-vsctl set Open_vSwitch . external_ids:ovn-bridge-datapath-type=foobar -OVS_WAIT_UNTIL([test foobar=`ovs-vsctl get Bridge br-int datapath-type`]) +OVS_WAIT_UNTIL([test foobar = `ovs-vsctl get Bridge br-int datapath-type`]) OVS_WAIT_UNTIL([check_datapath_type foobar]) expected_iface_types=$(ovs-vsctl get Open_vSwitch . iface_types | tr -d '[[]] ""') diff --git a/tests/ovn.at b/tests/ovn.at index 39edb6b85526..87579e59d7a5 100644 --- a/tests/ovn.at +++ b/tests/ovn.at @@ -11084,7 +11084,7 @@ sent_garp="0101020381020806000108000604000101010203c0a80 OVN_CHECK_PACKETS([hv1/vif1-tx.pcap], [foo1.expout]) # Wait until we receive atleast 1 packet -OVS_WAIT_UNTIL([test 1=`$PYTHON "$ovs_srcdir/utilities/ovs-pcap.in" hv2/br-ex_n2-tx.pcap | wc -l`]) +OVS_WAIT_UNTIL([test 1 = `$PYTHON "$ovs_srcdir/utilities/ovs-pcap.in" hv2/br-ex_n2-tx.pcap | wc -l`]) $PYTHON "$ovs_srcdir/utilities/ovs-pcap.in" hv2/br-ex_n2-tx.pcap | head -1 > packets echo $sent_garp > expout AT_CHECK([cat packets], [0], [expout]) -- 2.29.2 ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
[ovs-dev] [PATCH ovn 2/2] tests: Add missing sync in "controller I-P handling when lrp added last".
Signed-off-by: Ben Pfaff --- tests/ovn.at | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/tests/ovn.at b/tests/ovn.at index 87579e59d7a5..b751d6db2e4f 100644 --- a/tests/ovn.at +++ b/tests/ovn.at @@ -22235,7 +22235,8 @@ ovn-nbctl lsp-set-addresses sw0-lr0 00:00:00:00:00:01 ovn-nbctl lr-add lr0 ovn-nbctl lrp-add lr0 lr0-sw0 00:00:00:00:00:01 192.168.1.1/24 aef0:0:0:0:0:0:0:1/64 -OVS_WAIT_UNTIL([test x$(ovn-nbctl lsp-get-up sw0-p1) = xup]) +wait_for_ports_up sw0-p1 +ovn-nbctl --wait=sb sync sw0_dpkey=$(ovn-sbctl --bare --columns tunnel_key list datapath_binding sw0) lr0_dpkey=$(ovn-sbctl --bare --columns tunnel_key list datapath_binding lr0) -- 2.29.2 ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
[ovs-dev] [PATCH ovn] binding: Fix potential NULL dereference of lbinding.
There is a valid code path that can lead to 'lbinding' being NULL in handle_deleted_vif_lport(). Make sure we check for it. Found by code inspection. Fixes: 68cf9fdceba8 ("binding: Fix container port removal from local bindings.") Signed-off-by: Dumitru Ceara --- controller/binding.c | 8 +--- 1 file changed, 5 insertions(+), 3 deletions(-) diff --git a/controller/binding.c b/controller/binding.c index 2b19fd0..4e6c756 100644 --- a/controller/binding.c +++ b/controller/binding.c @@ -2153,10 +2153,12 @@ handle_deleted_vif_lport(const struct sbrec_port_binding *pb, /* If the container port is removed we should also remove it from * its parent's children set. */ -if (lbinding->parent) { -local_binding_delete_child(lbinding->parent, lbinding); +if (lbinding) { +if (lbinding->parent) { +local_binding_delete_child(lbinding->parent, lbinding); +} +local_binding_destroy(lbinding); } -local_binding_destroy(lbinding); } handle_deleted_lport(pb, b_ctx_in, b_ctx_out); -- 1.8.3.1 ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH] RFC: netdev-afxdp: Support for XDP metadata HW hints.
Thanks Björn and Jesper for feedbacks. On Fri, Mar 5, 2021 at 7:34 AM Björn Töpel wrote: > > On 2021-03-05 16:07, Jesper Dangaard Brouer wrote: > > On Fri, 5 Mar 2021 14:56:02 +0100 > > Björn Töpel wrote: > > > >> On 2021-03-05 11:13, Jesper Dangaard Brouer wrote: > > > [...] > >> > >> You'd like to use the length of metadata to express if it's available or > >> not. I'm stating that we don't need that for AF_XDP. In XDP if we're > >> accessing the field, we need to validate access due to the verifier. But > >> it's not to know if it's actually there. When we attach a certain XDP to > >> a device, we can query what offload it supports, and the the application > >> knows that meta data is structured in a certain way. > >> > >> Some application has simple metadata: > >> struct tstamp { u64 rx; }; > >> > >> That application will unconditionally look at (struct tstamp > >> *)(pkt-sizeof(struct tstamp)->rx;. > > > > I imagine that NIC hardware will/can provide packets with different > > metadata. Your example with timestamps are a good example, as some > > drivers/HW only support timestamping PTP packets. Thus, I don't want > > to provide metadata info on tstamp if HW didn't provide it (and I also > > don't want to spend CPU-cycles on clearing the u64 field with zero). > > Another example is vlan IDs is sometimes provided by hardware. > > > > Thus, on a per packet basis the metadata can change. > > Is this future work? Looking at Saeed's patch, I thought we query the netdev through BTF netlink API, and we get a BTF ID and C-struct per-netdev. And this will be done when OVS control plane, when a user attaches a device to its bridge, we query its metadata structure. > > Yes, I agree. For a certain set of packets, there can be different > metadata per packet that, as you pointed out, can be dispatched do > different handlers. > > >> Some might be more flexible and have: > >> struct metadata { char storage[64]; int btf_id; }; > > > > I really like the idea of placing the btf_id as the last entry, that > > way we know its location. I can get the btf_id via packet minus-offset > > sizeof(int). > > [...] > > > > BTF sort-of make the struct names identifiers and API. And gives *BPF* > > programs the ability to adjust offsets, when loading the BPF-prog into > > the kernel. > > > > Driver-1 choose: > > > > struct xdp_md_desc { > > uint32_t flow_mark; > > uint32_t hash32; > > } __attribute__((preserve_access_index)); > > > > Driver-2 choose: > > > > struct xdp_md_desc { > > uint32_t hash32; > > uint32_t flow_mark; > > } __attribute__((preserve_access_index)); > > > > The BPF code can access the members, and kernel will remap-struct > > access, and internally in the kernel create two different BPF-programs > > with adjusted offsets in the byte-code. > > > > How do we handle this in the OVS userspace C-code? > > BTW, do we assume all vendors use the same field name for same purpose? For example: When OVS query Intel or mlx, if rxhash is available, they should use the same name as "hash32". Otherwise, I don't know how to associate it into OVS code logic. If that's the case, I can do (in userspace C code, not BPF program) 1) When attaching a device, query its MD struct and MD's size using BTF info. 2) From the BTF info returned, check if string "hash32" exists. if yes, then calculate its offset from the beginning of the MD. 3) OVS cat set it by calling "dp_packet_set_rss(packet, md[rxhash_offset]);" > > Longer-term, I think extending the linkers to support struct member > relocations would be a good thing! I.e. simply support > __attribute__((preserve_access_index)) in userland. That would enable > very powerful userland/kernel interface, like what we're discussing now. > > But I wonder if it's too big of an initial undertaking to do that as a > first step? Maybe non-relocatable struct could be a start. > Having relocatable struct would be great. So I can just use 'md->hash32', instead of offset like above. > I definitely agree that the goal should be relocatable structs! > > > Maybe/hopefully you have a better idea than mine, which is simply to > > compile two C-programs (or program for-each know BTF-id) and dispatch > > based on the BTF-id. > > Thanks William ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH ovn v2] northd: Fix the missing force_snat_for_lb flows when router_ip is configured.
On Fri, Mar 05, 2021 at 02:53:47PM +0530, num...@ovn.org wrote: > From: Numan Siddique > > The commit c6e21a23bd8 which supported the option 'lb_force_snat_ip=router_ip' > on a gateway router, missed out on > - updating the flows in 'lr_in_dnat' to set 'flags.force_snat_for_lb = 1'. > - removing the flow to drop if ip.dst == router port IP in 'lr_in_ip_input' > stage. > > This patch fixes these issue and adds a system test to cover the > hairpin load balancer traffic for the gateway routers. > > Fixes: c6e21a23bd8("northd: Provide the Gateway router option > 'lb_force_snat_ip' to take router port ips.") > > CC: Ben Pfaff > Signed-off-by: Numan Siddique Thanks! Acked-by: Ben Pfaff ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH] rhel: increase timeout of ovsdb-server.service to 300s
On Fri, Mar 05, 2021 at 12:47:05PM +0100, Timothy Redaelli wrote: > In some scenarios starting ovsdb-server takes more than 90 seconds and > so ovsdb-server.service is marked as failed. > > Signed-off-by: Timothy Redaelli I'm surprised it takes so long. What leads to that? Is it a bug in ovsdb-server? ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH ovn] northd: Remove the usage of 'ct.inv' in logical flows.
On Fri, Mar 5, 2021 at 11:53 AM Han Zhou wrote: > > On Thu, Mar 4, 2021 at 5:25 PM Numan Siddique wrote: > > > > On Fri, Mar 5, 2021 at 4:22 AM Han Zhou wrote: > > > > > > On Mon, Mar 1, 2021 at 5:40 AM Numan Siddique wrote: > > > > > > > > On Fri, Feb 26, 2021 at 1:15 AM Han Zhou wrote: > > > > > > > > > > On Thu, Feb 25, 2021 at 1:25 AM Numan Siddique > wrote: > > > > > > > > > > > > On Thu, Feb 25, 2021 at 1:12 PM Han Zhou wrote: > > > > > > > > > > > > > > On Wed, Feb 24, 2021 at 5:27 AM wrote: > > > > > > > > > > > > > > > > From: Numan Siddique > > > > > > > > > > > > > > > > Presently we add 65535 priority lflows in the stages - > > > > > > > > 'ls_in_acl' and 'ls_out_acl' to drop packets which > > > > > > > > match on 'ct.inv'. > > > > > > > > > > > > > > > > As per the 'ovs-fields' man page, this > > > > > > > > ct state field can be used to identify problems such as: > > > > > > > > • L3/L4 protocol handler is not loaded/unavailable. > > > > > > > > > > > > > > > > • L3/L4 protocol handler determines that the packet is > > > > > > > > malformed. > > > > > > > > > > > > > > > > • Packets are unexpected length for protocol. > > > > > > > > > > > > > > > > This patch removes the usage of this field for the following > > > > > > > > reasons: > > > > > > > > > > > > > > > > • Some of the smart NICs which support offloading datapath > > > > > > > >flows don't support this field. > > > > > > > > > > > > > > What do you mean by "don't support this field"? Do you mean the > NIC > > > > > > > offloading supports connection tracking, but cannot detect if a > > > packet > > > > > is > > > > > > > invalid and always populate the ct.inv as 0? > > > > > > > > > > > > I think so. From what I understand, the kernel conntrack feature > is > > > used > > > > > > for the actual connection tracking. So NIC can't tell if the > packet is > > > > > > invalid or not > > > > > > (say due to out-of-window tcp errors). > > > > > > > > > > > I know some NICs support CT offloading and some doesn't. > > > > > So here what you are referring are the NICs that doesn't support CT > > > > > offloading, which falls back to kernel datapath when CT is used, is > it? > > > If > > > > > this is the case, then even without ct.inv it still couldn't support > > > > > ct.est, etc. right? > > > > > Or, do you mean this is specifically for NICs that support CT > offloading > > > > > but not ct.inv, i.e. it can do regular conntrack in NIC but just > can't > > > > > identify out-of-window packets, and that's why it supports ct.est > but > > > not > > > > > ct.inv? > > > > > I am still quite confused. Could clarify a little more, which types > of > > > NICs > > > > > would benefit from this, and how? > > > > > > > > I'm not sure if I can explain the issue well. Can you please look > > > > into this bugzilla - > > > > https://bugzilla.redhat.com/show_bug.cgi?id=1921946 > > > > We can discuss further if you have further questions or comments. > > > > > > > Unfortunately this one seems to require access permission. > > > > Ok. Let me try to share in some other way. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > • A recent commit in kernel ovs datapath sets the committed > > > > > > > >connection tracking entry to be liberal for out-of-window > > > > > > > >tcp packets (nf_ct_set_tcp_be_liberal()). Such TCP > > > > > > > >packets will not be marked as invalid. > > > > > > > > > > > > > > > > > > > > > > Could you share a link to this commit? > > > > > > > > > > > > Sure. > > > > > > > > > > > > > > > https://kernel.googlesource.com/pub/scm/linux/kernel/git/netdev/net-next/+/e2ef5203c817a60bfb591343ffd851b6537370ff > > > > > > > > > > > Thanks for sharing. So OVS is not capable of detecting a > out-of-window > > > > > packet now. Could you explain more about the motivation? I couldn't > get > > > the > > > > > full picture from commit message of that patch. Do you have a link > that > > > > > discusses more details? > > > > > > > > Let me share with you the patch which I first submitted to handle this > > > issue. > > > > During the review, @Florian Westphal suggested being liberal for > > > out-of-window > > > > packets to solve this issue. > > > > > > > > I think the patch discussions have enough information. Please let me > know > > > > if you have further questions or comments and we can discuss further. > > > > > > > > Initial approach taken by me - > > > > > > > > https://patchwork.ozlabs.org/project/netdev/patch/20201006083355.121018-1-nusid...@redhat.com/ > > > > Final approach taken - > > > > > > > > https://patchwork.ozlabs.org/project/netdev/patch/20201109072930.14048-1-nusid...@redhat.com/ > > > > > > > Thanks for the pointers. Now I have a better context. It seems all these > > > work was to deal with (optimize) the LB (stateful) + stateless ACL use > > > cases. > > > 1. we don't want to track packets coming from VIF (because there is no > > > stateful ACL) > > > 2. but
[ovs-dev] PMD auto load balance experimental tag
Hi All, PMD auto load balance (ALB) is an experimental feature since OVS 2.11. In short, RxQs are reassigned to PMDs based on load during a reconfig. However, there might not be a reconfig for a long time so ALB runs periodically and checks PMD thread load and the load variance between PMD threads. If load is high and it's estimated that it will be improved by a certain amount after reassigning RxQs to PMDs, a reassignment is triggered. ALB calls the reassignment code similar to if there was a reconfig or if the rebalance command is used. There are some ideas for how to improve RxQ to PMD assignments in general, but that's a slightly different topic and can be a later optimization if needed. ALB is documented more here: https://docs.openvswitch.org/en/latest/topics/dpdk/pmd/?highlight=rebalance#automatic-assignment-of-port-rx-queue-to-pmd-threads-experimental I would like to progress it so the experimental tag on ALB can be removed. There are a few things I think are needed: - Need to restructure the code so that ALB reassignment dry run uses the same code path as the reassignment code it will call - ALB operation should be understandable through logs and stats e.g. Should be able to confirm setting state/parameters in logs and see if ALB is taking some action. Should be able to get more debug if needed etc. - Unit tests Some patches here: http://patchwork.ozlabs.org/project/openvswitch/patch/20210218104802.293740-2-ktray...@redhat.com/ - Iron out if the current ALB enable/disable user interface logging based partly min resource requirements is right or we want to it to run regardless i.e. https://mail.openvswitch.org/pipermail/ovs-dev/2021-February/380386.html - Review/Improve operation documentation - Further testing Thoughts? thanks, Kevin. ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH branch-2.11 0/2] Release patches for v2.11.7.
On 3/5/21 4:50 PM, Stokes, Ian wrote: >> These patches are for the *final* stable release of OVS 2.11. >> Includes a couple of a latest bug fixes and validated with the >> final stable DPDK 18.11 release. >> >> Since 2.5 reached it's EOL, we're no longer supporting branches >> below 2.12. This release for 2.11 is mostly to close all the >> remaining documentation gaps. >> >> 2.13 is our current LTS release and 2.15 is our latest stable. >> And according to our release documentation, while 2.12 and 2.14 >> are not formally maintained we're providing releases for these >> branches once in a while, however these branches may not include >> all the fixes that LTS or latest stable does. >> >> There is still 'Prepare for 2.11.8' patch, but it's here just to >> follow the release process and not break my scripts. :) >> >> Ilya Maximets (2): >> Set release date for 2.11.7. >> Prepare for 2.11.8. >> >> NEWS | 6 +- >> configure.ac | 2 +- >> debian/changelog | 8 +++- >> 3 files changed, 13 insertions(+), 3 deletions(-) >> >> -- >> 2.26.2 > > Acked for the series. > > BR > Ian Stokes > Thanks! Applied to branch-2.11 and tagged. Best regards, Ilya Maximets. ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH branch-2.11 0/2] Release patches for v2.11.7.
> These patches are for the *final* stable release of OVS 2.11. > Includes a couple of a latest bug fixes and validated with the > final stable DPDK 18.11 release. > > Since 2.5 reached it's EOL, we're no longer supporting branches > below 2.12. This release for 2.11 is mostly to close all the > remaining documentation gaps. > > 2.13 is our current LTS release and 2.15 is our latest stable. > And according to our release documentation, while 2.12 and 2.14 > are not formally maintained we're providing releases for these > branches once in a while, however these branches may not include > all the fixes that LTS or latest stable does. > > There is still 'Prepare for 2.11.8' patch, but it's here just to > follow the release process and not break my scripts. :) > > Ilya Maximets (2): > Set release date for 2.11.7. > Prepare for 2.11.8. > > NEWS | 6 +- > configure.ac | 2 +- > debian/changelog | 8 +++- > 3 files changed, 13 insertions(+), 3 deletions(-) > > -- > 2.26.2 Acked for the series. BR Ian Stokes ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
[ovs-dev] [PATCH branch-2.11 2/2] Prepare for 2.11.8.
Signed-off-by: Ilya Maximets --- NEWS | 3 +++ configure.ac | 2 +- debian/changelog | 6 ++ 3 files changed, 10 insertions(+), 1 deletion(-) diff --git a/NEWS b/NEWS index afda9198b..b8510e01e 100644 --- a/NEWS +++ b/NEWS @@ -1,3 +1,6 @@ +v2.11.8 - xx xxx +- + v2.11.7 - 05 Mar 2021 - - Bug fixes diff --git a/configure.ac b/configure.ac index c627448a7..107429d41 100644 --- a/configure.ac +++ b/configure.ac @@ -13,7 +13,7 @@ # limitations under the License. AC_PREREQ(2.63) -AC_INIT(openvswitch, 2.11.7, b...@openvswitch.org) +AC_INIT(openvswitch, 2.11.8, b...@openvswitch.org) AC_CONFIG_SRCDIR([datapath/datapath.c]) AC_CONFIG_MACRO_DIR([m4]) AC_CONFIG_AUX_DIR([build-aux]) diff --git a/debian/changelog b/debian/changelog index b3322f0a6..5b1d53da3 100644 --- a/debian/changelog +++ b/debian/changelog @@ -1,3 +1,9 @@ +openvswitch (2.11.8-1) unstable; urgency=low + [ Open vSwitch team ] + * New upstream version + + -- Open vSwitch team Fri, 05 Mar 2021 16:27:47 +0100 + openvswitch (2.11.7-1) unstable; urgency=low [ Open vSwitch team ] * New upstream version -- 2.26.2 ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
[ovs-dev] [PATCH branch-2.11 0/2] Release patches for v2.11.7.
These patches are for the *final* stable release of OVS 2.11. Includes a couple of a latest bug fixes and validated with the final stable DPDK 18.11 release. Since 2.5 reached it's EOL, we're no longer supporting branches below 2.12. This release for 2.11 is mostly to close all the remaining documentation gaps. 2.13 is our current LTS release and 2.15 is our latest stable. And according to our release documentation, while 2.12 and 2.14 are not formally maintained we're providing releases for these branches once in a while, however these branches may not include all the fixes that LTS or latest stable does. There is still 'Prepare for 2.11.8' patch, but it's here just to follow the release process and not break my scripts. :) Ilya Maximets (2): Set release date for 2.11.7. Prepare for 2.11.8. NEWS | 6 +- configure.ac | 2 +- debian/changelog | 8 +++- 3 files changed, 13 insertions(+), 3 deletions(-) -- 2.26.2 ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
[ovs-dev] [PATCH branch-2.11 1/2] Set release date for 2.11.7.
Signed-off-by: Ilya Maximets --- NEWS | 3 ++- debian/changelog | 2 +- 2 files changed, 3 insertions(+), 2 deletions(-) diff --git a/NEWS b/NEWS index 11a7b40fc..afda9198b 100644 --- a/NEWS +++ b/NEWS @@ -1,5 +1,6 @@ -v2.11.7 - xx xxx +v2.11.7 - 05 Mar 2021 - + - Bug fixes - DPDK * OVS validated with DPDK 18.11.11. Due to this being the final release in the DPDK 18.11 series it is recommended to be used. diff --git a/debian/changelog b/debian/changelog index d1e4505c4..b3322f0a6 100644 --- a/debian/changelog +++ b/debian/changelog @@ -2,7 +2,7 @@ openvswitch (2.11.7-1) unstable; urgency=low [ Open vSwitch team ] * New upstream version - -- Open vSwitch team Wed, 10 Feb 2021 16:07:06 +0100 + -- Open vSwitch team Fri, 05 Mar 2021 16:27:47 +0100 openvswitch (2.11.6-1) unstable; urgency=low [ Open vSwitch team ] -- 2.26.2 ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH] RFC: netdev-afxdp: Support for XDP metadata HW hints.
On 2021-03-05 16:07, Jesper Dangaard Brouer wrote: On Fri, 5 Mar 2021 14:56:02 +0100 Björn Töpel wrote: On 2021-03-05 11:13, Jesper Dangaard Brouer wrote: [...] We also want this sizeof-offset to be dynamic. And I hope that AF_XDP could provide this info, on the size of data_meta area. I need some help from Magnus and Bjørn here? Look at the kernel-side code of AF_XDP/XSK I don't see this being transferred? Help is this true? Can we add the info? XDP sockets make sure that the meta data is propagated to userland. However, the meta data *size* is not passed via the AF_XDP descriptors. This was per-design; The metadata is a contract between the XDP program and the XDP socket, and in the future maybe the driver as well. So, we're simply stating that "your AF_XDP application should know if there's meta data or not, and how it's structured". I guess we could explore starting to use the options bits of struct xdp_desc for this, but only if we're *really* seeing a need for it. If the XDP program would like to pass it's length, it can do that via the metadata area as well. struct meta { int len; I'm not interested in having the len part of metadata. The btf_id will basically/indirectly give us the length. Ok! ... ... }; index = addr >> FRAME_SHIFT; xpacket = >xpool.array[index]; packet = >packet; @@ -868,6 +881,12 @@ netdev_afxdp_rxq_recv(struct netdev_rxq *rxq_, struct dp_packet_batch *batch, OVS_XDP_HEADROOM); dp_packet_set_size(packet, len); +/* FIXME: This should be done by detecting whether + * XDP MD is enabled or not. Ex: + * $ bpftool net xdp set dev enp2s0f0np0 md_btf on + */ +dp_packet_set_rss_hash(packet, md->hash32); The general idea to make this code more dynamic is to let AF_XDP provide the meta-data-len and then use the (proposed) btf_id to know what "callback" C-code to call. Hmm, again, here I might have a bit of a different view. Personally I think it's better that all the "NIC/XDP/socket" does the negotiation in the control/setup phase, and keep the fast-path lean. Instead of: recv(void *data) { len = get_meta_data_len(data); if (len) { // check this, check that, if-else-if-else } ... } You'd like to use the length of metadata to express if it's available or not. I'm stating that we don't need that for AF_XDP. In XDP if we're accessing the field, we need to validate access due to the verifier. But it's not to know if it's actually there. When we attach a certain XDP to a device, we can query what offload it supports, and the the application knows that meta data is structured in a certain way. Some application has simple metadata: struct tstamp { u64 rx; }; That application will unconditionally look at (struct tstamp *)(pkt-sizeof(struct tstamp)->rx;. I imagine that NIC hardware will/can provide packets with different metadata. Your example with timestamps are a good example, as some drivers/HW only support timestamping PTP packets. Thus, I don't want to provide metadata info on tstamp if HW didn't provide it (and I also don't want to spend CPU-cycles on clearing the u64 field with zero). Another example is vlan IDs is sometimes provided by hardware. Thus, on a per packet basis the metadata can change. Yes, I agree. For a certain set of packets, there can be different metadata per packet that, as you pointed out, can be dispatched do different handlers. Some might be more flexible and have: struct metadata { char storage[64]; int btf_id; }; I really like the idea of placing the btf_id as the last entry, that way we know its location. I can get the btf_id via packet minus-offset sizeof(int). If you need an "availability check", then you can solve that. int xdp_prog(struct xdp_md *ctx) { struct metadata *m = (void *)(long)ctx->data_meta; void *data = (void *)(unsigned long)ctx->data; if ((m + 1) > data) { ret = bpf_xdp_adjust_meta(ctx, -(int)sizeof(*)); if (ret < 0) return XDP_ABORTED; data = (void *)(unsigned long)ctx->data; m = (void *)(long)ctx->data_meta; if ((m + 1) > data) return XDP_ABORTED; m->btf_id = -1; // not valid } // ... bpf_redirect_map() to socket } Would you agree? Or maybe expand a bit why you need the length. With the btf_id as the last entry, I basically don't need the length. Ok! Good! :-) The dispatcher function, will based on the btf_id know the size of the struct. The C-code will basically type-cast the struct at the pointer offset and get access to the metadata at the known offsets. Make sense, and inline with my view! Using a btf_id make this independent of the NIC driver. E.g. we don't need to dereference the NIC driver struct to find what
Re: [ovs-dev] [PATCH] RFC: netdev-afxdp: Support for XDP metadata HW hints.
On Fri, 5 Mar 2021 14:56:02 +0100 Björn Töpel wrote: > On 2021-03-05 11:13, Jesper Dangaard Brouer wrote: > > > > Bjørn and Magnus please take a look at my questions inlined below. > > > > On Thu, 4 Mar 2021 10:27:05 -0800 > > William Tu wrote: > > > >> One big problem of netdev-afxdp is that there is no metadata support > >> from the hardware at all. For example, OVS netdev-afxdp has to do rxhash, > >> or TCP checksum in software, resulting in high performance overhead. > >> > >> A generic meta data type for XDP frame using BTF is proposed[1] and > >> there is sample implementation[2][3]. This patch experiments enabling > >> the XDP metadata, or called HW hints, and shows the potential performance > >> improvement. The patch uses only the rxhash value provided from HW, > >> so avoiding at the calculation of hash at lib/dpif-netdev.c: > >> if (!dp_packet_rss_valid(execute->packet)) { > >> dp_packet_set_rss_hash(execute->packet, > >> flow_hash_5tuple(execute->flow, 0)); > >> } > >> > >> Using '$ ovs-appctl dpif-netdev/pmd-stats-show', the 'avg processing > >> cycles per packet' drops from 402 to 272. More details below > >> > >> Reference: > >> -- > >> [1] https://www.kernel.org/doc/html/latest/bpf/btf.html > >> [2] > >> https://netdevconf.info/0x14/pub/slides/54/[1]%20XDP%20meta%20data%20acceleration.pdf > >> [3] > >> https://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux.git/log/?h=topic/xdp_metadata4 > >> > >> Testbed: > >> > >> Two Xeon E5-2620 v3 2.4GHz connected back-to-back using Mellanox > >> ConnectX-6Dx 25GbE. Before starting OVS, enable the MD by: > >> $ bpftool net xdp show > >> xdp: > >> enp2s0f0np0(4) md_btf_id(1) md_btf_enabled(0) > >> enp2s0f1np1(5) md_btf_id(2) md_btf_enabled(0) > >> $ bpftool net xdp set dev enp2s0f0np0 md_btf on > >> $ bpftool net xdp > >> xdp: > >> enp2s0f0np0(4) md_btf_id(1) md_btf_enabled(1) > >> > >> Limitations/TODO: > >> - > >> 1. Support only AF_XDP native mode, not zero-copy mode. > >> 2. Currently only three fields: vlan, hash, and flow_mark, and only receive > >> side supports XDP metadata. > >> 3. Control plane, how to enable and probe the structure, not upstream yet. > >> > >> OVS rxdrop without HW hints: > >> --- > >> Drop rate: 4.8Mpps > >> > >> pmd thread numa_id 0 core_id 3: > >>packets received: 196592006 > >>packet recirculations: 0 > >>avg. datapath passes per packet: 1.00 > >>emc hits: 196592006 > >>smc hits: 0 > >>megaflow hits: 0 > >>avg. subtable lookups per megaflow hit: 0.00 > >>miss with success upcall: 0 > >>miss with failed upcall: 0 > >>avg. packets per output batch: 0.00 > >>idle cycles: 56009063835 (41.43%) > >>processing cycles: 79164971931 (58.57%) > >>avg cycles per packet: 687.59 (135174035766/196592006) > >>avg processing cycles per packet: 402.69 (79164971931/196592006) > >> > >> pmd thread numa_id 0 core_id 3: > >>Iterations: 339607649 (0.23 us/it) > >>- Used TSC cycles: 188620512777 ( 99.9 % of total cycles) > >>- idle iterations:330697002 ( 40.3 % of used cycles) > >>- busy iterations: 8910647 ( 59.7 % of used cycles) > >>Rx packets: 285140031 (3624 Kpps, 395 cycles/pkt) > >>Datapath passes: 285140031 (1.00 passes/pkt) > >>- EMC hits: 28513 (100.0 %) > >>- SMC hits: 0 ( 0.0 %) > >>- Megaflow hits: 0 ( 0.0 %, 0.00 subtbl lookups/hit) > >>- Upcalls:0 ( 0.0 %, 0.0 us/upcall) > >>- Lost upcalls: 0 ( 0.0 %) > >>Tx packets: 0 > >> > >> Perf report: > >>17.56% pmd-c03/id:11 ovs-vswitchd[.] netdev_afxdp_rxq_recv > >>14.39% pmd-c03/id:11 ovs-vswitchd[.] > >> dp_netdev_process_rxq_port > >>14.17% pmd-c03/id:11 ovs-vswitchd[.] pmd_thread_main > >>10.86% pmd-c03/id:11 [vdso] [.] __vdso_clock_gettime > >>10.19% pmd-c03/id:11 ovs-vswitchd[.] pmd_perf_end_iteration > >> 7.71% pmd-c03/id:11 ovs-vswitchd[.] time_timespec__ > >> 5.64% pmd-c03/id:11 ovs-vswitchd[.] time_usec > >> 3.88% pmd-c03/id:11 ovs-vswitchd[.] netdev_get_class > >> 2.95% pmd-c03/id:11 ovs-vswitchd[.] netdev_rxq_recv > >> 2.78% pmd-c03/id:11 libbpf.so.0.2.0 [.] xsk_socket__fd > >> 2.74% pmd-c03/id:11 ovs-vswitchd[.] pmd_perf_start_iteration > >> 2.11% pmd-c03/id:11 libc-2.27.so[.] __clock_gettime > >> 1.32% pmd-c03/id:11 ovs-vswitchd[.] xsk_socket__fd@plt > >> > >> OVS rxdrop with HW hints: > >> - > >> rxdrop rate: 4.73Mpps > >> > >> pmd thread numa_id 0 core_id 7: > >>packets received: 13686880 > >>packet recirculations: 0 > >>avg. datapath passes per packet: 1.00 > >>emc hits:
Re: [ovs-dev] [PATCH] RFC: netdev-afxdp: Support for XDP metadata HW hints.
On 2021-03-05 11:13, Jesper Dangaard Brouer wrote: Bjørn and Magnus please take a look at my questions inlined below. On Thu, 4 Mar 2021 10:27:05 -0800 William Tu wrote: One big problem of netdev-afxdp is that there is no metadata support from the hardware at all. For example, OVS netdev-afxdp has to do rxhash, or TCP checksum in software, resulting in high performance overhead. A generic meta data type for XDP frame using BTF is proposed[1] and there is sample implementation[2][3]. This patch experiments enabling the XDP metadata, or called HW hints, and shows the potential performance improvement. The patch uses only the rxhash value provided from HW, so avoiding at the calculation of hash at lib/dpif-netdev.c: if (!dp_packet_rss_valid(execute->packet)) { dp_packet_set_rss_hash(execute->packet, flow_hash_5tuple(execute->flow, 0)); } Using '$ ovs-appctl dpif-netdev/pmd-stats-show', the 'avg processing cycles per packet' drops from 402 to 272. More details below Reference: -- [1] https://www.kernel.org/doc/html/latest/bpf/btf.html [2] https://netdevconf.info/0x14/pub/slides/54/[1]%20XDP%20meta%20data%20acceleration.pdf [3] https://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux.git/log/?h=topic/xdp_metadata4 Testbed: Two Xeon E5-2620 v3 2.4GHz connected back-to-back using Mellanox ConnectX-6Dx 25GbE. Before starting OVS, enable the MD by: $ bpftool net xdp show xdp: enp2s0f0np0(4) md_btf_id(1) md_btf_enabled(0) enp2s0f1np1(5) md_btf_id(2) md_btf_enabled(0) $ bpftool net xdp set dev enp2s0f0np0 md_btf on $ bpftool net xdp xdp: enp2s0f0np0(4) md_btf_id(1) md_btf_enabled(1) Limitations/TODO: - 1. Support only AF_XDP native mode, not zero-copy mode. 2. Currently only three fields: vlan, hash, and flow_mark, and only receive side supports XDP metadata. 3. Control plane, how to enable and probe the structure, not upstream yet. OVS rxdrop without HW hints: --- Drop rate: 4.8Mpps pmd thread numa_id 0 core_id 3: packets received: 196592006 packet recirculations: 0 avg. datapath passes per packet: 1.00 emc hits: 196592006 smc hits: 0 megaflow hits: 0 avg. subtable lookups per megaflow hit: 0.00 miss with success upcall: 0 miss with failed upcall: 0 avg. packets per output batch: 0.00 idle cycles: 56009063835 (41.43%) processing cycles: 79164971931 (58.57%) avg cycles per packet: 687.59 (135174035766/196592006) avg processing cycles per packet: 402.69 (79164971931/196592006) pmd thread numa_id 0 core_id 3: Iterations: 339607649 (0.23 us/it) - Used TSC cycles: 188620512777 ( 99.9 % of total cycles) - idle iterations:330697002 ( 40.3 % of used cycles) - busy iterations: 8910647 ( 59.7 % of used cycles) Rx packets: 285140031 (3624 Kpps, 395 cycles/pkt) Datapath passes: 285140031 (1.00 passes/pkt) - EMC hits: 28513 (100.0 %) - SMC hits: 0 ( 0.0 %) - Megaflow hits: 0 ( 0.0 %, 0.00 subtbl lookups/hit) - Upcalls:0 ( 0.0 %, 0.0 us/upcall) - Lost upcalls: 0 ( 0.0 %) Tx packets: 0 Perf report: 17.56% pmd-c03/id:11 ovs-vswitchd[.] netdev_afxdp_rxq_recv 14.39% pmd-c03/id:11 ovs-vswitchd[.] dp_netdev_process_rxq_port 14.17% pmd-c03/id:11 ovs-vswitchd[.] pmd_thread_main 10.86% pmd-c03/id:11 [vdso] [.] __vdso_clock_gettime 10.19% pmd-c03/id:11 ovs-vswitchd[.] pmd_perf_end_iteration 7.71% pmd-c03/id:11 ovs-vswitchd[.] time_timespec__ 5.64% pmd-c03/id:11 ovs-vswitchd[.] time_usec 3.88% pmd-c03/id:11 ovs-vswitchd[.] netdev_get_class 2.95% pmd-c03/id:11 ovs-vswitchd[.] netdev_rxq_recv 2.78% pmd-c03/id:11 libbpf.so.0.2.0 [.] xsk_socket__fd 2.74% pmd-c03/id:11 ovs-vswitchd[.] pmd_perf_start_iteration 2.11% pmd-c03/id:11 libc-2.27.so[.] __clock_gettime 1.32% pmd-c03/id:11 ovs-vswitchd[.] xsk_socket__fd@plt OVS rxdrop with HW hints: - rxdrop rate: 4.73Mpps pmd thread numa_id 0 core_id 7: packets received: 13686880 packet recirculations: 0 avg. datapath passes per packet: 1.00 emc hits: 13686880 smc hits: 0 megaflow hits: 0 avg. subtable lookups per megaflow hit: 0.00 miss with success upcall: 0 miss with failed upcall: 0 avg. packets per output batch: 0.00 idle cycles: 3182105544 (46.02%) processing cycles: 3732023844 (53.98%) avg cycles per packet: 505.16 (6914129388/13686880) avg processing cycles per packet: 272.67 (3732023844/13686880) pmd thread numa_id 0 core_id 7: Iterations: 392909539 (0.18 us/it) - Used TSC cycles: 167697342678 ( 99.9 % of total cycles) - idle iterations:382539861 ( 46.0 % of used cycles)
[ovs-dev] [PATCH ovn v2 9/9] tests: Test with SSL and RBAC for controller by default
To help ourself to not forget updating RBAC rules when we land changes to existing functionality and new features we must enable SSL+RBAC on the `ovn-controller` <-> SB DB connection for builds with OpenSSL enabled. Signed-off-by: Frode Nordahl --- tests/automake.mk | 9 +++-- tests/ofproto-macros.at | 12 tests/ovn-macros.at | 38 -- tests/ovn-northd.at | 6 +++--- 4 files changed, 58 insertions(+), 7 deletions(-) diff --git a/tests/automake.mk b/tests/automake.mk index 771dddea2..ba8567da4 100644 --- a/tests/automake.mk +++ b/tests/automake.mk @@ -237,7 +237,10 @@ FLAKE8_PYFILES += $(CHECK_PYFILES) if HAVE_OPENSSL OVS_PKI_DIR = $(CURDIR)/tests/pki -TESTPKI_CNS = test test2 +# NOTE: Certificate generation has to be done serially, and each one adds a few +# seconds to the test run. Please try to re-use one of the many CNs already +# used in the existing tests. +TESTPKI_CNS = test test2 main hv hv-foo hv1 hv2 hv3 hv4 hv5 hv6 hv7 hv8 hv9 hv10 hv-1 hv-2 hv-10-1 hv-10-2 hv-20-1 hv-20-2 vtep hv_gw pbr-hv gw1 gw2 gw3 gw4 gw5 ext1 TESTPKI_FILES = $(shell \ for cn in $(TESTPKI_CNS); do \ echo tests/testpki-$$cn-cert.pem ; \ @@ -262,9 +265,11 @@ tests/pki/stamp: $(AM_V_at)rm -f tests/pki/stamp $(AM_V_at)rm -rf tests/pki $(AM_V_GEN)$(OVS_PKI) init && \ + cd tests/pki && \ for cn in $(TESTPKI_CNS); do \ - $(OVS_PKI) req+sign tests/pki/$$cn; \ + $(OVS_PKI) -u req+sign $$cn; \ done && \ + cd ../../ && \ : > tests/pki/stamp CLEANFILES += tests/ovs-pki.log diff --git a/tests/ofproto-macros.at b/tests/ofproto-macros.at index 3d7ac08b3..23d793a95 100644 --- a/tests/ofproto-macros.at +++ b/tests/ofproto-macros.at @@ -104,6 +104,7 @@ start_daemon () { # # sim_add hv0 # as hv0 ovs-vsctl add-br br0 +PKIDIR="$(cd $abs_top_builddir/tests && pwd)" sims= sim_add () { echo "adding simulator '$1'" @@ -126,6 +127,17 @@ sim_add () { # Start ovs-vswitchd as $1 start_daemon ovs-vswitchd --enable-dummy=system -vvconn -vofproto_dpif -vunixctl as $1 ovs-appctl vlog/disable-rate-limit vconn + if test X$HAVE_OPENSSL = Xyes; then + if test -f $PKIDIR/testpki-$1-privkey.pem; then + as $1 ovs-vsctl set-ssl \ +$PKIDIR/testpki-$1-privkey.pem \ +$PKIDIR/testpki-$1-cert.pem \ +$PKIDIR/testpki-cacert.pem \ +|| return 1 + else + echo "WARNING: No certificate created for sim '$1', check TESTPKI_CNS variable in tests/automake.mk" + fi + fi } # "as $1" sets the OVS_*DIR environment variables to point to $ovs_base/$1. diff --git a/tests/ovn-macros.at b/tests/ovn-macros.at index be8114de2..25f3dbe34 100644 --- a/tests/ovn-macros.at +++ b/tests/ovn-macros.at @@ -124,7 +124,18 @@ ovn_init_db () { mkdir "$d" || return 1 : > "$d"/.$1.db.~lock~ as $as_d ovsdb-tool create "$d"/$1.db "$abs_top_srcdir"/$1.ovsschema -as $as_d start_daemon ovsdb-server -vjsonrpc --remote=punix:"$d"/$1.sock "$d"/$1.db + +local remote_in_db= +if test X$HAVE_OPENSSL = Xyes -a X"$1" = X"ovn-sb"; then +remote_in_db="--remote=db:OVN_Southbound,SB_Global,connections --private-key=$PKIDIR/testpki-test-privkey.pem --certificate=$PKIDIR/testpki-test-cert.pem --ca-cert=$PKIDIR/testpki-cacert.pem" +fi + +as $as_d start_daemon ovsdb-server \ +-vjsonrpc \ +--remote=punix:"$d"/$1.sock \ +$remote_in_db \ +"$d"/$1.db + local var=`echo $1_db | tr a-z- A-Z_` AS_VAR_SET([$var], [unix:"$d"/$1.sock]); export $var } @@ -193,6 +204,24 @@ ovn_start () { ovn_start_northd backup $AZ fi +if test X$HAVE_OPENSSL = Xyes; then +# Create the SB DB pssl+RBAC connection. Ideally we could pre-create +# SB_Global and Connection with ovsdb-tool transact at DB creation +# time, but unfortunately that does not work, northd-ddlog will replace +# the SB_Global record on startup. +ovn-sbctl \ +-- --id=@c create connection \ +target=\"pssl:0:127.0.0.1\" role=ovn-controller \ +-- add SB_Global . connections @c +local d=$ovs_base +if test -n "$AZ"; then +d=$d/$AZ +fi +PARSE_LISTENING_PORT([$d/ovn-sb/ovsdb-server.log], [TCP_PORT]) +var="SSL_OVN_SB_DB" +AS_VAR_SET([$var], [ssl:127.0.0.1:$TCP_PORT]); export $var +fi + if test -n "$AZ"; then ovn-nbctl --wait=sb sync || exit $? @@ -257,11 +286,16 @@ ovn_az_attach() { local ovn_remote if test X"$az" = XNONE; then -ovn_remote=unix:$ovs_base/ovn-sb/ovn-sb.sock +if test X$HAVE_OPENSSL = Xyes; then +ovn_remote=$SSL_OVN_SB_DB +else +ovn_remote=unix:$ovs_base/ovn-sb/ovn-sb.sock +fi else ovn_remote=unix:$ovs_base/$az/ovn-sb/ovn-sb.sock
[ovs-dev] [PATCH ovn v2 8/9] tests: Make certificate generation extendable
In preparation for enabling testing with SSL and RBAC enabled by default, rework the certificate generation so that we can easily add generation of more certificates/CN on demand. A side erffect of the change is a more generic naming scheme for the certificate files so the patch also contains an update to existing tests so that they use the new filenames. Signed-off-by: Frode Nordahl --- tests/automake.mk | 48 ++- tests/ovn.at | 48 +++ 2 files changed, 46 insertions(+), 50 deletions(-) diff --git a/tests/automake.mk b/tests/automake.mk index df6d0a2a9..771dddea2 100644 --- a/tests/automake.mk +++ b/tests/automake.mk @@ -236,39 +236,35 @@ PYCOV_CLEAN_FILES += $(CHECK_PYFILES:.py=.py,cover) .coverage FLAKE8_PYFILES += $(CHECK_PYFILES) if HAVE_OPENSSL -TESTPKI_FILES = \ - tests/testpki-cacert.pem \ - tests/testpki-cert.pem \ - tests/testpki-privkey.pem \ - tests/testpki-req.pem \ - tests/testpki-cert2.pem \ - tests/testpki-privkey2.pem \ - tests/testpki-req2.pem +OVS_PKI_DIR = $(CURDIR)/tests/pki +TESTPKI_CNS = test test2 +TESTPKI_FILES = $(shell \ + for cn in $(TESTPKI_CNS); do \ + echo tests/testpki-$$cn-cert.pem ; \ + echo tests/testpki-$$cn-privkey.pem ; \ + echo tests/testpki-$$cn-req.pem ; \ + done) + +tests/testpki-cacert.pem: tests/pki/stamp + $(AM_V_GEN)cp $(OVS_PKI_DIR)/switchca/cacert.pem $@ + +$(TESTPKI_FILES): tests/pki/stamp + $(AM_V_GEN)cp $(OVS_PKI_DIR)/$(notdir $(subst testpki-,,$@)) $@ + +check_DATA += tests/testpki-cacert.pem check_DATA += $(TESTPKI_FILES) +CLEANFILES += tests/testpki-cacert.pem CLEANFILES += $(TESTPKI_FILES) -tests/testpki-cacert.pem: tests/pki/stamp - $(AM_V_GEN)cp tests/pki/switchca/cacert.pem $@ -tests/testpki-cert.pem: tests/pki/stamp - $(AM_V_GEN)cp tests/pki/test-cert.pem $@ -tests/testpki-req.pem: tests/pki/stamp - $(AM_V_GEN)cp tests/pki/test-req.pem $@ -tests/testpki-privkey.pem: tests/pki/stamp - $(AM_V_GEN)cp tests/pki/test-privkey.pem $@ -tests/testpki-cert2.pem: tests/pki/stamp - $(AM_V_GEN)cp tests/pki/test2-cert.pem $@ -tests/testpki-req2.pem: tests/pki/stamp - $(AM_V_GEN)cp tests/pki/test2-req.pem $@ -tests/testpki-privkey2.pem: tests/pki/stamp - $(AM_V_GEN)cp tests/pki/test2-privkey.pem $@ - -OVS_PKI = $(SHELL) $(ovs_srcdir)/utilities/ovs-pki.in --dir=tests/pki --log=tests/ovs-pki.log + +OVS_PKI = $(SHELL) $(ovs_srcdir)/utilities/ovs-pki.in --dir=$(OVS_PKI_DIR) --log=tests/ovs-pki.log tests/pki/stamp: $(AM_V_at)rm -f tests/pki/stamp $(AM_V_at)rm -rf tests/pki $(AM_V_GEN)$(OVS_PKI) init && \ - $(OVS_PKI) req+sign tests/pki/test && \ - $(OVS_PKI) req+sign tests/pki/test2 && \ + for cn in $(TESTPKI_CNS); do \ + $(OVS_PKI) req+sign tests/pki/$$cn; \ + done && \ : > tests/pki/stamp CLEANFILES += tests/ovs-pki.log diff --git a/tests/ovn.at b/tests/ovn.at index ca9623fee..5cd8b34d7 100644 --- a/tests/ovn.at +++ b/tests/ovn.at @@ -8810,8 +8810,8 @@ AT_CHECK( start_daemon ovsdb-server --remote=punix:ovn-sb.sock \ --remote=db:OVN_Southbound,SB_Global,connections \ - --private-key="$PKIDIR/testpki-privkey2.pem" \ - --certificate="$PKIDIR/testpki-cert2.pem" \ + --private-key="$PKIDIR/testpki-test2-privkey.pem" \ + --certificate="$PKIDIR/testpki-test2-cert.pem" \ --ca-cert="$PKIDIR/testpki-cacert.pem" \ ovn-sb.db @@ -8819,20 +8819,20 @@ PARSE_LISTENING_PORT([ovsdb-server.log], [TCP_PORT]) # read-only accesses should succeed AT_CHECK([ovn-sbctl --db=ssl:127.0.0.1:$TCP_PORT \ ---private-key=$PKIDIR/testpki-privkey.pem \ ---certificate=$PKIDIR/testpki-cert.pem \ +--private-key=$PKIDIR/testpki-test-privkey.pem \ +--certificate=$PKIDIR/testpki-test-cert.pem \ --ca-cert=$PKIDIR/testpki-cacert.pem \ list SB_Global], [0], [stdout], [ignore]) AT_CHECK([ovn-sbctl --db=ssl:127.0.0.1:$TCP_PORT \ ---private-key=$PKIDIR/testpki-privkey.pem \ ---certificate=$PKIDIR/testpki-cert.pem \ +--private-key=$PKIDIR/testpki-test-privkey.pem \ +--certificate=$PKIDIR/testpki-test-cert.pem \ --ca-cert=$PKIDIR/testpki-cacert.pem \ list Connection], [0], [stdout], [ignore]) # write access should fail AT_CHECK([ovn-sbctl --db=ssl:127.0.0.1:$TCP_PORT \ ---private-key=$PKIDIR/testpki-privkey.pem \ ---certificate=$PKIDIR/testpki-cert.pem \ +
[ovs-dev] [PATCH ovn v2 6/9] tests: Amend release stale port binding test for RBAC
The current version of the test attempts to simulate chassis registration prior to starting `ovn-controller`, however it does not set the `hostname` field. The RBAC role for `ovn-controller` does not allow for a chassis to change its own name or hostname, which makes sense as this is used for authentication. Update the test to set the `hostname` field when simulating chassis registration so that `ovn-controller` does not attempt to update it and subsequently make the test fail. Fixes b6b3823d4 ("ovn-controller: Fix I-P for SB Port_Binding and OVS Interface") Signed-off-by: Frode Nordahl --- tests/ovn.at | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tests/ovn.at b/tests/ovn.at index bec593dcc..ca9623fee 100644 --- a/tests/ovn.at +++ b/tests/ovn.at @@ -21572,7 +21572,7 @@ ovn-nbctl --wait=sb lsp-add ls1 lsp1 # Simulate the fact that lsp1 had been previously bound on hv1. ovn-sbctl --id=@e create encap chassis_name=hv1 ip="192.168.0.1" type="geneve" \ --- --id=@c create chassis name=hv1 encaps=@e \ +-- --id=@c create chassis hostname=hv1 name=hv1 encaps=@e \ -- set Port_Binding lsp1 chassis=@c as hv1 -- 2.30.0 ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
[ovs-dev] [PATCH ovn v2 2/9] northd: Add missing RBAC rules for FDB table
The recently added FDB table did not get its RBAC rules which would prohibit a `ovn-controller` from updating it with RBAC enabled. Fixes: 6ec3b1259 ("MAC learning: Add a new FDB table in southbound db") Signed-off-by: Frode Nordahl --- northd/ovn-northd.c | 13 + 1 file changed, 13 insertions(+) diff --git a/northd/ovn-northd.c b/northd/ovn-northd.c index bafcb51e9..bb8f3032c 100644 --- a/northd/ovn-northd.c +++ b/northd/ovn-northd.c @@ -13248,6 +13248,11 @@ static const char *rbac_encap_auth[] = static const char *rbac_encap_update[] = {"type", "options", "ip"}; +static const char *rbac_fdb_auth[] = +{""}; +static const char *rbac_fdb_update[] = +{"dp_key", "mac", "port_key"}; + static const char *rbac_port_binding_auth[] = {""}; static const char *rbac_port_binding_update[] = @@ -13300,6 +13305,14 @@ static struct rbac_perm_cfg { .update = rbac_encap_update, .n_update = ARRAY_SIZE(rbac_encap_update), .row = NULL +},{ +.table = "FDB", +.auth = rbac_fdb_auth, +.n_auth = ARRAY_SIZE(rbac_fdb_auth), +.insdel = true, +.update = rbac_fdb_update, +.n_update = ARRAY_SIZE(rbac_fdb_update), +.row = NULL },{ .table = "Port_Binding", .auth = rbac_port_binding_auth, -- 2.30.0 ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
[ovs-dev] [PATCH ovn v2 7/9] tests: Use ovn_start in tests/ovn-controller.at
The current version of the tests only initializes the SB DB and instruments it directly. This does not work with SSL+RBAC as northd must run to program the RBAC rules into the SB DB. Run tests both for C and ddlog version of northd. Add workaround for ovn-controller not re-reading certificates to 'ovn-controller - Chassis other_config' test. Signed-off-by: Frode Nordahl --- tests/ovn-controller.at | 50 - 1 file changed, 44 insertions(+), 6 deletions(-) diff --git a/tests/ovn-controller.at b/tests/ovn-controller.at index 2cd3e261f..1dd1553cd 100644 --- a/tests/ovn-controller.at +++ b/tests/ovn-controller.at @@ -1,8 +1,9 @@ AT_BANNER([ovn-controller]) +OVN_FOR_EACH_NORTHD([ AT_SETUP([ovn-controller - ovn-bridge-mappings]) AT_KEYWORDS([ovn]) -ovn_init_db ovn-sb +ovn_start net_add n1 sim_add hv as hv @@ -54,6 +55,14 @@ check_bridge_mappings () { OVS_WAIT_UNTIL([test x"${local_mappings}" = x$(ovn-sbctl get Chassis ${sysid} other_config:ovn-bridge-mappings | sed -e 's/\"//g')]) } +# NOTE: This test originally ran with only the SB-DB and no northd. For the +# test to be successfull with SSL+RBAC we need to initially run northd to get +# the RBAC rules programmed into the SB-DB. The test instruments the SB-DB +# directly and we need to stop northd to avoid overwriting the instrumentation. +kill `cat northd/ovn-northd.pid` +kill `cat northd-backup/ovn-northd.pid` +kill `cat ovn-nb/ovsdb-server.pid` + # Initially there should be no patch ports. check_patches @@ -116,12 +125,14 @@ as ovn-sb OVS_APP_EXIT_AND_WAIT([ovsdb-server]) AT_CLEANUP +]) # Checks that ovn-controller populates datapath-type and iface-types # correctly in the Chassis other_config column. +OVN_FOR_EACH_NORTHD([ AT_SETUP([ovn-controller - Chassis other_config]) AT_KEYWORDS([ovn]) -ovn_init_db ovn-sb +ovn_start net_add n1 sim_add hv @@ -192,7 +203,21 @@ OVS_WAIT_UNTIL([ # chassis_private records. Until that happens ovn-controller fails to # create the records due to constraint violation on the Encap table. sysid=${sysid}-foo -ovs-vsctl set Open_vSwitch . external-ids:system-id="${sysid}" +current_remote=`ovs-vsctl get Open_vSwitch . external-ids:ovn-remote` +if test X$HAVE_OPENSSL = Xyes; then +# To change chassis name we need to change certificate with matching CN +ovs-vsctl set-ssl \ +$PKIDIR/testpki-${sysid}-privkey.pem \ +$PKIDIR/testpki-${sysid}-cert.pem \ +$PKIDIR/testpki-cacert.pem +# force reconnect which makes OVN controller read the new certificates +# TODO implement check for change of certificates in ovn-controller +# and remove this workaround. +ovs-vsctl set Open_vSwitch . external-ids:ovn-remote=unix:/dev/null +fi +ovs-vsctl -- set Open_vSwitch . external-ids:hostname="${sysid}" \ + -- set Open_vSwitch . external-ids:system-id="${sysid}" \ + -- set Open_vSwitch . external-ids:ovn-remote="${current_remote}" OVS_WAIT_UNTIL([ grep -q 'Transaction causes multiple rows in \\"Encap\\" table to have identical values' hv/ovn-controller.log @@ -216,12 +241,14 @@ as ovn-sb OVS_APP_EXIT_AND_WAIT([ovsdb-server]) AT_CLEANUP +]) # Checks that ovn-controller correctly maintains the mapping from the Encap # table in the Southbound database to OVS in the face of changes on both sides +OVN_FOR_EACH_NORTHD([ AT_SETUP([ovn-controller - change Encap properties]) AT_KEYWORDS([ovn]) -ovn_init_db ovn-sb +ovn_start net_add n1 sim_add hv @@ -271,11 +298,13 @@ as ovn-sb OVS_APP_EXIT_AND_WAIT([ovsdb-server]) AT_CLEANUP +]) # Check ovn-controller connection status to Southbound database +OVN_FOR_EACH_NORTHD([ AT_SETUP([ovn-controller - check sbdb connection]) AT_KEYWORDS([ovn]) -ovn_init_db ovn-sb +ovn_start net_add n1 sim_add hv @@ -305,11 +334,13 @@ as ovn-sb OVS_APP_EXIT_AND_WAIT([ovsdb-server]) AT_CLEANUP +]) # Checks that ovn-controller recreates its chassis record when deleted externally. +OVN_FOR_EACH_NORTHD([ AT_SETUP([ovn-controller - Chassis self record]) AT_KEYWORDS([ovn]) -ovn_init_db ovn-sb +ovn_start net_add n1 sim_add hv @@ -360,8 +391,10 @@ OVS_WAIT_UNTIL([test x0 = x`ovn-sbctl --columns nb_cfg --bare find chassis`]) OVN_CLEANUP([hv]) AT_CLEANUP +]) # Test unix command: debug/delay-nb-cfg-report +OVN_FOR_EACH_NORTHD([ AT_SETUP([ovn-controller - debug/delay-nb-cfg-report]) AT_KEYWORDS([ovn]) ovn_start @@ -393,7 +426,9 @@ AT_CHECK([ovn-nbctl --timeout=1 --wait=hv sync]) OVN_CLEANUP([hv]) AT_CLEANUP +]) +OVN_FOR_EACH_NORTHD([ AT_SETUP([ovn -- nb_cfg sync to OVS]) ovn_start @@ -414,7 +449,9 @@ OVS_WAIT_UNTIL([ovs-vsctl get Bridge br-int external_ids:ovn-nb-cfg], [0], [1]) OVN_CLEANUP([hv1]) AT_CLEANUP +]) +OVN_FOR_EACH_NORTHD([ AT_SETUP([ovn -- features]) AT_KEYWORDS([features]) ovn_start @@ -431,3 +468,4 @@ OVS_WAIT_UNTIL([ OVN_CLEANUP([hv1]) AT_CLEANUP +]) -- 2.30.0 ___
[ovs-dev] [PATCH ovn v2 1/9] northd: Amend RBAC rules for Port_Binding table
When `ovn-controller` claims a virtual lport it will update the Port_Binding table with which chassis currently has claimed the port as well as recording information about the virtual parent lport [0]. When `ovn-controller` claims a lport it will also update the encap field of the Port_Binding table if set and an update is needed. The current RBAC rules does not allow for these updates. 0: https://github.com/ovn-org/ovn/blob/b7b0fbdab03ce8b39d5bdc114876e6b0d0683892/controller/pinctrl.c#L6150 Fixes: 054f4c85c ("Add a new logical switch port type - 'virtual'") Fixes: 6c8b9a132 (" ovn-controller: Store the local port bindings in the runtime data I-P state") Reported-At: https://bugs.launchpad.net/ubuntu/+source/ovn/+bug/1917475 Signed-off-by: Frode Nordahl --- northd/ovn-northd.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/northd/ovn-northd.c b/northd/ovn-northd.c index ac872aade..bafcb51e9 100644 --- a/northd/ovn-northd.c +++ b/northd/ovn-northd.c @@ -13251,7 +13251,7 @@ static const char *rbac_encap_update[] = static const char *rbac_port_binding_auth[] = {""}; static const char *rbac_port_binding_update[] = -{"chassis", "up"}; +{"chassis", "encap", "up", "virtual_parent"}; static const char *rbac_mac_binding_auth[] = {""}; -- 2.30.0 ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
[ovs-dev] [PATCH ovn v2 4/9] northd: Add Controller_Event RBAC rules
The use of the Controller_Event table does currently not work when RBAC is enabled. Fixes: be1eeb09d ("OVN: introduce Controller_Event table") Signed-off-by: Frode Nordahl --- northd/ovn-northd.c | 14 ++ 1 file changed, 14 insertions(+) diff --git a/northd/ovn-northd.c b/northd/ovn-northd.c index f85a3dcff..c4a3f2383 100644 --- a/northd/ovn-northd.c +++ b/northd/ovn-northd.c @@ -13248,6 +13248,12 @@ static const char *rbac_encap_auth[] = static const char *rbac_encap_update[] = {"type", "options", "ip"}; +static const char *rbac_controller_event_auth[] = +{""}; +static const char *rbac_controller_event_update[] = +{"chassis", "event_info", "event_type", "seq_num"}; + + static const char *rbac_fdb_auth[] = {""}; static const char *rbac_fdb_update[] = @@ -13297,6 +13303,14 @@ static struct rbac_perm_cfg { .update = rbac_chassis_private_update, .n_update = ARRAY_SIZE(rbac_chassis_private_update), .row = NULL +},{ +.table = "Controller_Event", +.auth = rbac_controller_event_auth, +.n_auth = ARRAY_SIZE(rbac_controller_event_auth), +.insdel = true, +.update = rbac_controller_event_update, +.n_update = ARRAY_SIZE(rbac_controller_event_update), +.row = NULL },{ .table = "Encap", .auth = rbac_encap_auth, -- 2.30.0 ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
[ovs-dev] [PATCH ovn v2 0/9] Fix missing RBAC rules and enable testing
To help ourself to not forget updating RBAC rules when we land changes to existing functionality and new features we must enable SSL+RBAC on the `ovn-controller` <-> SB DB connection for builds with OpenSSL enabled. The series is structured with one commit per table where RBAC rules are fixed for the C version and one summary commit to update the northd-ddlog implementation. Then there are a few fixes to existing tests before finally enabling SSL+RBAC for all tests. This should allow for easier backports back to series where the respective tables / features first appeared. A successful testrun can be viewed at [0], in addittion I have done local testing with ovn-northd-ddlog. 0: https://github.com/fnordahl/ovn/actions/runs/624324890 Frode Nordahl (9): northd: Amend RBAC rules for Port_Binding table northd: Add missing RBAC rules for FDB table northd: Amend Chassis RBAC rules northd: Add Controller_Event RBAC rules northd-ddlog: Update RBAC rules tests: Amend release stale port binding test for RBAC tests: Use ovn_start in tests/ovn-controller.at tests: Make certificate generation extendable tests: Test with SSL and RBAC for controller by default northd/ovn-northd.c | 31 ++-- northd/ovn_northd.dl| 24 +-- tests/automake.mk | 53 + tests/ofproto-macros.at | 12 ++ tests/ovn-controller.at | 50 +- tests/ovn-macros.at | 38 +++-- tests/ovn-northd.at | 6 ++--- tests/ovn.at| 50 +++--- 8 files changed, 198 insertions(+), 66 deletions(-) -- 2.30.0 ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
[ovs-dev] [PATCH ovn v2 3/9] northd: Amend Chassis RBAC rules
The Transport Zones support does currently not work when RBAC is enabled. Fixes: 07d0d258d ("OVN: Add support for Transport Zones") Signed-off-by: Frode Nordahl --- northd/ovn-northd.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/northd/ovn-northd.c b/northd/ovn-northd.c index bb8f3032c..f85a3dcff 100644 --- a/northd/ovn-northd.c +++ b/northd/ovn-northd.c @@ -13236,7 +13236,7 @@ static const char *rbac_chassis_auth[] = {"name"}; static const char *rbac_chassis_update[] = {"nb_cfg", "external_ids", "encaps", "vtep_logical_switches", - "other_config"}; + "other_config", "transport_zones"}; static const char *rbac_chassis_private_auth[] = {"name"}; -- 2.30.0 ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
[ovs-dev] [PATCH ovn v2 5/9] northd-ddlog: Update RBAC rules
This patch summarizes a series of fixes to the C northd for missing or out of date RBAC rules and updates the DDlog version of Northd accordingly. Signed-off-by: Frode Nordahl --- northd/ovn_northd.dl | 24 ++-- 1 file changed, 22 insertions(+), 2 deletions(-) diff --git a/northd/ovn_northd.dl b/northd/ovn_northd.dl index 4482cffc0..8bc6dd9f6 100644 --- a/northd/ovn_northd.dl +++ b/northd/ovn_northd.dl @@ -1257,7 +1257,8 @@ sb::Out_RBAC_Permission ( .authorization = set_singleton("name"), .insert_delete = true, .update = ["nb_cfg", "external_ids", "encaps", - "vtep_logical_switches", "other_config"].to_set() + "vtep_logical_switches", "other_config", + "transport_zones"].to_set() ). sb::Out_RBAC_Permission ( @@ -1281,7 +1282,7 @@ sb::Out_RBAC_Permission ( .table = "Port_Binding", .authorization = set_singleton(""), .insert_delete = false, -.update = ["chassis", "up"].to_set() +.update = ["chassis", "encap", "up", "virtual_parent"].to_set() ). sb::Out_RBAC_Permission ( @@ -1308,6 +1309,23 @@ sb::Out_RBAC_Permission ( .update = ["address", "chassis", "datapath", "ports"].to_set() ). +sb::Out_RBAC_Permission ( +._uuid = 128'h2e5cbf3d_26f6_4f8a_9926_d6f77f61654f, +.table = "Controller_Event", +.authorization = set_singleton(""), +.insert_delete = true, +.update = ["chassis", "event_info", "event_type", + "seq_num"].to_set() +). + +sb::Out_RBAC_Permission ( +._uuid = 128'hb70964fc_322f_4ae5_aee4_ff6afadcc126, +.table = "FDB", +.authorization = set_singleton(""), +.insert_delete = true, +.update = ["dp_key", "mac", "port_key"].to_set() +). + /* * RBAC_Role: fixed */ @@ -1317,7 +1335,9 @@ sb::Out_RBAC_Role ( .permissions = [ "Chassis" -> 128'h7df3749a_1754_4a78_afa4_3abf526fe510, "Chassis_Private" -> 128'h07e623f7_137c_4a11_9084_3b3f89cb4a54, +"Controller_Event" -> 128'h2e5cbf3d_26f6_4f8a_9926_d6f77f61654f, "Encap" -> 128'h94bec860_431e_4d95_82e7_3b75d8997241, +"FDB" -> 128'hb70964fc_322f_4ae5_aee4_ff6afadcc126, "Port_Binding" -> 128'hd8ceff1a_2b11_48bd_802f_4a991aa4e908, "MAC_Binding" -> 128'h6ffdc696_8bfb_4d82_b620_a00d39270b2f, "Service_Monitor"-> 128'h39231c7e_4bf1_41d0_ada4_1d8a319c0da3] -- 2.30.0 ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH v3] ofproto-dpif-xlate: fix zone set from non-frozen-metadata fields
On 27/02/2021 09:34, Peng He wrote: > CT zone could be set from a field that is not included in frozen > metadata. Consider the example rules which are typically seen in > OpenStack security group rules: > > priority=100,in_port=1,tcp,ct_state=-trk,action=ct(zone=5,table=0) > priority=100,in_port=1,tcp,ct_state=+trk,action=ct(commit,zone=NXM_NX_CT_ZONE[]),2 > > The zone is set from the first rule's ct action. These two rules will > generate two megaflows: the first one uses zone=5 to query the CT module, > the second one sets the zone-id from the first megaflow and commit to CT. > > The current implementation will generate a megaflow that does not use > ct_zone=5 as a match, but directly commit into the ct using zone=5, as zone is > set by an Imm not a field. > > Consider a situation that one changes the zone id (for example to 15) > in the first rule, however, still keep the second rule unchanged. During > this change, there is traffic hitting the two generated megaflows, the > revaldiator would revalidate all megaflows, however, the revalidator will > not change the second megaflow, because zone=5 is recorded in the > megaflow, so the xlate will still translate the commit action into zone=5, > and the new traffic will still commit to CT as zone=5, not zone=15, > resulting in taffic drops and other issues. > > Just like OVS set-field convention, if a field X is set by Y > (Y is a variable not an Imm), we should also mask Y as a match > in the generated megaflow. An exception is that if the zone-id is > set by the field that is included in the frozen state (i.e. regs) and this > upcall is a resume of a thawed xlate, the un-wildcarding can be skipped, > as the recirc_id is a hash of the values in these fields, and it will change > following the changes of these fields. When the recirc_id changes, > all megaflows with the old recirc id will be invalid later. This looks good to me and all the unit-tests pass. There is some trailing whitespace. You can run "./utilities/checkpatch.py" when submitting to catch them before the 0-day robot does. Its a bit of a nit and I don't know if this won't get committed because of it. That's a decision for a maintainer. > > Fixes: 07659514c3 ("Add support for connection tracking.") > Reported-by: Sai Su > Signed-off-by: Peng He > --- > include/openvswitch/meta-flow.h | 1 + > lib/meta-flow.c | 13 ++ > ofproto/ofproto-dpif-xlate.c| 12 + > tests/system-traffic.at | 45 + > 4 files changed, 71 insertions(+) > > diff --git a/include/openvswitch/meta-flow.h b/include/openvswitch/meta-flow.h > index 95e52e358..045dce8f5 100644 > --- a/include/openvswitch/meta-flow.h > +++ b/include/openvswitch/meta-flow.h > @@ -2305,6 +2305,7 @@ void mf_set_flow_value_masked(const struct mf_field *, >const union mf_value *mask, >struct flow *); > bool mf_is_tun_metadata(const struct mf_field *); > +bool mf_is_frozen_metadata(const struct mf_field *); > bool mf_is_pipeline_field(const struct mf_field *); > bool mf_is_set(const struct mf_field *, const struct flow *); > void mf_mask_field(const struct mf_field *, struct flow_wildcards *); > diff --git a/lib/meta-flow.c b/lib/meta-flow.c > index c808d205d..e03cd8d0c 100644 > --- a/lib/meta-flow.c > +++ b/lib/meta-flow.c > @@ -1788,6 +1788,19 @@ mf_is_tun_metadata(const struct mf_field *mf) > mf->id < MFF_TUN_METADATA0 + TUN_METADATA_NUM_OPTS; > } > > +bool > +mf_is_frozen_metadata(const struct mf_field *mf) > +{ > +if (mf->id >= MFF_TUN_ID && mf->id <= MFF_IN_PORT_OXM) { > +return true; > +} > + > +if (mf->id >= MFF_REG0 && mf->id < MFF_ETH_SRC) { > +return true; > +} > +return false; > +} > + > bool > mf_is_pipeline_field(const struct mf_field *mf) > { > diff --git a/ofproto/ofproto-dpif-xlate.c b/ofproto/ofproto-dpif-xlate.c > index 7108c8a30..14d00db1e 100644 > --- a/ofproto/ofproto-dpif-xlate.c > +++ b/ofproto/ofproto-dpif-xlate.c > @@ -6195,6 +6195,18 @@ compose_conntrack_action(struct xlate_ctx *ctx, struct > ofpact_conntrack *ofc, > > if (ofc->zone_src.field) { > zone = mf_get_subfield(>zone_src, >xin->flow); > +if (ctx->xin->frozen_state) { > +/* If the upcall is a resume of a recirculation, we only need to > + * unwildcard the fields that are not in the frozen_metadata, as > + * when the rules update, OVS will generate a new recirc_id, > + * which will invalidate the megaflow with old the recirc_id. > + */ > +if (!mf_is_frozen_metadata(ofc->zone_src.field)) { > +mf_mask_field(ofc->zone_src.field, ctx->wc); > +} > +} else { > +mf_mask_field(ofc->zone_src.field, ctx->wc); > +} > } else { > zone = ofc->zone_imm; > } > diff --git a/tests/system-traffic.at
[ovs-dev] [PATCH] rhel: increase timeout of ovsdb-server.service to 300s
In some scenarios starting ovsdb-server takes more than 90 seconds and so ovsdb-server.service is marked as failed. Signed-off-by: Timothy Redaelli --- NOTE: Currently systemd will not start openvswitch.service and ovs-vswitchd.service if ovsdb-server.service is automatically restarted after a failure, but this is a feature that needs to be implemented in systemd itself (systemd#18856). --- rhel/usr_lib_systemd_system_ovsdb-server.service | 1 + 1 file changed, 1 insertion(+) diff --git a/rhel/usr_lib_systemd_system_ovsdb-server.service b/rhel/usr_lib_systemd_system_ovsdb-server.service index 98338b9df..ed6419f31 100644 --- a/rhel/usr_lib_systemd_system_ovsdb-server.service +++ b/rhel/usr_lib_systemd_system_ovsdb-server.service @@ -29,3 +29,4 @@ ExecStop=/usr/share/openvswitch/scripts/ovs-ctl --no-ovs-vswitchd stop ExecReload=/usr/share/openvswitch/scripts/ovs-ctl --no-ovs-vswitchd \ ${OVS_USER_OPT} \ --no-monitor restart $OPTIONS +TimeoutSec=300 -- 2.29.2 ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH ovn v1] Static Routes: Add ability to specify "discard" nexthop
On 22/02/2021 19:40, svc.eng.git-m...@nutanix.com wrote: > From: karthik-kc Hi Karthik. Thanks for this. I have some comments below. Also, just to let you know, its needs a rebase and the UTs passed for me. > > Physical switches have the ability to specify "discard" or sometimes > "NULL interface" as a nexthop for routes. This can be used to prevent > routing loops in the network. Add a similar configuration for ovn > where nexthop accepts the string "discard". When the nexthop is discard > the action in the routing table will be to drop the packets. > > Signed-off-by: Karthik Chandrashekar > --- > northd/ovn-northd.c | 126 +++--- > ovn-nb.xml| 4 +- > tests/ovn-nbctl.at| 13 + > tests/ovn.at | 93 +++ > utilities/ovn-nbctl.c | 50 - > 5 files changed, 215 insertions(+), 71 deletions(-) > > diff --git a/northd/ovn-northd.c b/northd/ovn-northd.c > index 39d798782..18d0e0b43 100644 > --- a/northd/ovn-northd.c > +++ b/northd/ovn-northd.c > @@ -7749,6 +7749,7 @@ struct parsed_route { > uint32_t hash; > const struct nbrec_logical_router_static_route *route; > bool ecmp_symmetric_reply; > +bool is_discard_route; > }; > > static uint32_t > @@ -7768,20 +7769,23 @@ parsed_routes_add(struct ovs_list *routes, > /* Verify that the next hop is an IP address with an all-ones mask. */ > struct in6_addr nexthop; > unsigned int plen; > -if (!ip46_parse_cidr(route->nexthop, , )) { > -static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5, 1); > -VLOG_WARN_RL(, "bad 'nexthop' %s in static route" > - UUID_FMT, route->nexthop, > - UUID_ARGS(>header_.uuid)); > -return NULL; > -} > -if ((IN6_IS_ADDR_V4MAPPED() && plen != 32) || > -(!IN6_IS_ADDR_V4MAPPED() && plen != 128)) { > -static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5, 1); > -VLOG_WARN_RL(, "bad next hop mask %s in static route" > - UUID_FMT, route->nexthop, > - UUID_ARGS(>header_.uuid)); > -return NULL; > +bool is_discard_route = !strcmp(route->nexthop, "discard"); > +if (!is_discard_route) { > +if (!ip46_parse_cidr(route->nexthop, , )) { > +static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5, 1); > +VLOG_WARN_RL(, "bad 'nexthop' %s in static route" > + UUID_FMT, route->nexthop, > + UUID_ARGS(>header_.uuid)); > +return NULL; > +} > +if ((IN6_IS_ADDR_V4MAPPED() && plen != 32) || > +(!IN6_IS_ADDR_V4MAPPED() && plen != 128)) { > +static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5, 1); > +VLOG_WARN_RL(, "bad next hop mask %s in static route" > + UUID_FMT, route->nexthop, > + UUID_ARGS(>header_.uuid)); > +return NULL; > +} > } > > /* Parse ip_prefix */ > @@ -7795,13 +7799,15 @@ parsed_routes_add(struct ovs_list *routes, > } > > /* Verify that ip_prefix and nexthop have same address familiy. */ > -if (IN6_IS_ADDR_V4MAPPED() != IN6_IS_ADDR_V4MAPPED()) { > -static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5, 1); > -VLOG_WARN_RL(, "Address family doesn't match between 'ip_prefix' > %s" > - " and 'nexthop' %s in static route"UUID_FMT, > - route->ip_prefix, route->nexthop, > - UUID_ARGS(>header_.uuid)); > -return NULL; > +if (!is_discard_route) { > +if (IN6_IS_ADDR_V4MAPPED() != IN6_IS_ADDR_V4MAPPED()) > { > +static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5, 1); > +VLOG_WARN_RL(, "Address family doesn't match between > 'ip_prefix' %s" > + " and 'nexthop' %s in static route"UUID_FMT, > + route->ip_prefix, route->nexthop, > + UUID_ARGS(>header_.uuid)); > +return NULL; > +} > } > > const struct nbrec_bfd *nb_bt = route->bfd; > @@ -7832,6 +7838,7 @@ parsed_routes_add(struct ovs_list *routes, > pr->route = route; > pr->ecmp_symmetric_reply = smap_get_bool(>options, > "ecmp_symmetric_reply", false); > +pr->is_discard_route = is_discard_route; > ovs_list_insert(routes, >list_node); > return pr; > } > @@ -8244,10 +8251,11 @@ build_ecmp_route_flow(struct hmap *lflows, struct > ovn_datapath *od, > } > > static void > -add_route(struct hmap *lflows, const struct ovn_port *op, > - const char *lrp_addr_s, const char *network_s, int plen, > - const char *gateway, bool is_src_route, > - const struct ovsdb_idl_row *stage_hint) > +add_route(struct hmap *lflows, struct ovn_datapath *od,
Re: [ovs-dev] [PATCH] RFC: netdev-afxdp: Support for XDP metadata HW hints.
Bjørn and Magnus please take a look at my questions inlined below. On Thu, 4 Mar 2021 10:27:05 -0800 William Tu wrote: > One big problem of netdev-afxdp is that there is no metadata support > from the hardware at all. For example, OVS netdev-afxdp has to do rxhash, > or TCP checksum in software, resulting in high performance overhead. > > A generic meta data type for XDP frame using BTF is proposed[1] and > there is sample implementation[2][3]. This patch experiments enabling > the XDP metadata, or called HW hints, and shows the potential performance > improvement. The patch uses only the rxhash value provided from HW, > so avoiding at the calculation of hash at lib/dpif-netdev.c: > if (!dp_packet_rss_valid(execute->packet)) { > dp_packet_set_rss_hash(execute->packet, >flow_hash_5tuple(execute->flow, 0)); > } > > Using '$ ovs-appctl dpif-netdev/pmd-stats-show', the 'avg processing > cycles per packet' drops from 402 to 272. More details below > > Reference: > -- > [1] https://www.kernel.org/doc/html/latest/bpf/btf.html > [2] > https://netdevconf.info/0x14/pub/slides/54/[1]%20XDP%20meta%20data%20acceleration.pdf > [3] > https://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux.git/log/?h=topic/xdp_metadata4 > > Testbed: > > Two Xeon E5-2620 v3 2.4GHz connected back-to-back using Mellanox > ConnectX-6Dx 25GbE. Before starting OVS, enable the MD by: > $ bpftool net xdp show > xdp: > enp2s0f0np0(4) md_btf_id(1) md_btf_enabled(0) > enp2s0f1np1(5) md_btf_id(2) md_btf_enabled(0) > $ bpftool net xdp set dev enp2s0f0np0 md_btf on > $ bpftool net xdp > xdp: > enp2s0f0np0(4) md_btf_id(1) md_btf_enabled(1) > > Limitations/TODO: > - > 1. Support only AF_XDP native mode, not zero-copy mode. > 2. Currently only three fields: vlan, hash, and flow_mark, and only receive >side supports XDP metadata. > 3. Control plane, how to enable and probe the structure, not upstream yet. > > OVS rxdrop without HW hints: > --- > Drop rate: 4.8Mpps > > pmd thread numa_id 0 core_id 3: > packets received: 196592006 > packet recirculations: 0 > avg. datapath passes per packet: 1.00 > emc hits: 196592006 > smc hits: 0 > megaflow hits: 0 > avg. subtable lookups per megaflow hit: 0.00 > miss with success upcall: 0 > miss with failed upcall: 0 > avg. packets per output batch: 0.00 > idle cycles: 56009063835 (41.43%) > processing cycles: 79164971931 (58.57%) > avg cycles per packet: 687.59 (135174035766/196592006) > avg processing cycles per packet: 402.69 (79164971931/196592006) > > pmd thread numa_id 0 core_id 3: > Iterations: 339607649 (0.23 us/it) > - Used TSC cycles: 188620512777 ( 99.9 % of total cycles) > - idle iterations:330697002 ( 40.3 % of used cycles) > - busy iterations: 8910647 ( 59.7 % of used cycles) > Rx packets: 285140031 (3624 Kpps, 395 cycles/pkt) > Datapath passes: 285140031 (1.00 passes/pkt) > - EMC hits: 28513 (100.0 %) > - SMC hits: 0 ( 0.0 %) > - Megaflow hits: 0 ( 0.0 %, 0.00 subtbl lookups/hit) > - Upcalls:0 ( 0.0 %, 0.0 us/upcall) > - Lost upcalls: 0 ( 0.0 %) > Tx packets: 0 > > Perf report: > 17.56% pmd-c03/id:11 ovs-vswitchd[.] netdev_afxdp_rxq_recv > 14.39% pmd-c03/id:11 ovs-vswitchd[.] dp_netdev_process_rxq_port > 14.17% pmd-c03/id:11 ovs-vswitchd[.] pmd_thread_main > 10.86% pmd-c03/id:11 [vdso] [.] __vdso_clock_gettime > 10.19% pmd-c03/id:11 ovs-vswitchd[.] pmd_perf_end_iteration >7.71% pmd-c03/id:11 ovs-vswitchd[.] time_timespec__ >5.64% pmd-c03/id:11 ovs-vswitchd[.] time_usec >3.88% pmd-c03/id:11 ovs-vswitchd[.] netdev_get_class >2.95% pmd-c03/id:11 ovs-vswitchd[.] netdev_rxq_recv >2.78% pmd-c03/id:11 libbpf.so.0.2.0 [.] xsk_socket__fd >2.74% pmd-c03/id:11 ovs-vswitchd[.] pmd_perf_start_iteration >2.11% pmd-c03/id:11 libc-2.27.so[.] __clock_gettime >1.32% pmd-c03/id:11 ovs-vswitchd[.] xsk_socket__fd@plt > > OVS rxdrop with HW hints: > - > rxdrop rate: 4.73Mpps > > pmd thread numa_id 0 core_id 7: > packets received: 13686880 > packet recirculations: 0 > avg. datapath passes per packet: 1.00 > emc hits: 13686880 > smc hits: 0 > megaflow hits: 0 > avg. subtable lookups per megaflow hit: 0.00 > miss with success upcall: 0 > miss with failed upcall: 0 > avg. packets per output batch: 0.00 > idle cycles: 3182105544 (46.02%) > processing cycles: 3732023844 (53.98%) > avg cycles per packet: 505.16 (6914129388/13686880) > avg processing cycles per packet: 272.67 (3732023844/13686880) > > pmd thread numa_id 0 core_id 7: > > Iterations: 392909539 (0.18 us/it)
Re: [ovs-dev] [PATCH ovn 1/2] northd-ddlog: Fix lb_force_snat_ip router option.
On Fri, Mar 5, 2021 at 2:15 AM Ben Pfaff wrote: > > On Wed, Mar 03, 2021 at 11:32:22PM +0530, num...@ovn.org wrote: > > From: Numan Siddique > > > > There were few typos because of which lflows related to > > router option lb_force_snat_ip were not generated correctly. > > > > This patch fixes it. > > > > Fixes: 0e77b3bcbfe("ovn-northd-ddlog: New implementation of ovn-northd > > based on ddlog.") > > CC: Ben Pfaff > > Signed-off-by: Numan Siddique > > Thank you for the fix! > > Acked-by: Ben Pfaff Thanks. I applied this patch to master. Numan > ___ > dev mailing list > d...@openvswitch.org > https://mail.openvswitch.org/mailman/listinfo/ovs-dev > ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
[ovs-dev] [PATCH ovn v2] northd: Fix the missing force_snat_for_lb flows when router_ip is configured.
From: Numan Siddique The commit c6e21a23bd8 which supported the option 'lb_force_snat_ip=router_ip' on a gateway router, missed out on - updating the flows in 'lr_in_dnat' to set 'flags.force_snat_for_lb = 1'. - removing the flow to drop if ip.dst == router port IP in 'lr_in_ip_input' stage. This patch fixes these issue and adds a system test to cover the hairpin load balancer traffic for the gateway routers. Fixes: c6e21a23bd8("northd: Provide the Gateway router option 'lb_force_snat_ip' to take router port ips.") CC: Ben Pfaff Signed-off-by: Numan Siddique --- v1 -> v2 --- * Patch 1 of v1 is merged now. * Addressed the review comments from Ben in p2 of v1. - Added the missing lflow in northd-ddlog - Added the missing lflow for IPv6 scenarion which I missed in v1. - Changed to '.force_lb_snat = false'. northd/lrouter.dl | 13 +++- northd/ovn-northd.8.xml | 10 +++ northd/ovn-northd.c | 36 +++--- northd/ovn_northd.dl| 24 +-- tests/ovn-northd.at | 54 +++ tests/system-ovn.at | 141 6 files changed, 263 insertions(+), 15 deletions(-) diff --git a/northd/lrouter.dl b/northd/lrouter.dl index 8b8005b0c..1a7cb2d23 100644 --- a/northd/lrouter.dl +++ b/northd/lrouter.dl @@ -353,6 +353,10 @@ function lb_force_snat_router_ip(lr_options: Map): bool { lr_options.contains_key("chassis") } +function force_snat_for_lb(lr: nb::Logical_Router): bool { +not get_force_snat_ip(lr, "lb").is_empty() or lb_force_snat_router_ip(lr.options) +} + /* For each router, collect the set of IPv4 and IPv6 addresses used for SNAT, * which includes: * @@ -434,7 +438,8 @@ relation ( snat_ips: Map>, lbs:Vec>, mcast_cfg: Ref, -learn_from_arp_request: bool +learn_from_arp_request: bool, +force_lb_snat: bool, ) (.lr = lr, @@ -449,7 +454,8 @@ relation ( .snat_ips = snat_ips, .lbs= lbs, .mcast_cfg = mcast_cfg, -.learn_from_arp_request = learn_from_arp_request) :- +.learn_from_arp_request = learn_from_arp_request, +.force_lb_snat = force_lb_snat) :- lr in nb::Logical_Router(), lr.is_enabled(), LogicalRouterRedirectPort(lr._uuid, l3dgw_port), @@ -457,7 +463,8 @@ relation ( LogicalRouterLBs(lr._uuid, lbs), LogicalRouterSnatIPs(lr._uuid, snat_ips), mcast_cfg in (.datapath = lr._uuid), -var learn_from_arp_request = map_get_bool_def(lr.options, "always_learn_from_arp_request", true). +var learn_from_arp_request = map_get_bool_def(lr.options, "always_learn_from_arp_request", true), +var force_lb_snat = lb_force_snat_router_ip(lr.options). /* RouterLB: many-to-many relation between logical routers and nb::LB */ relation RouterLB(router: Ref, lb: Ref) diff --git a/northd/ovn-northd.8.xml b/northd/ovn-northd.8.xml index a16937a21..c272cc922 100644 --- a/northd/ovn-northd.8.xml +++ b/northd/ovn-northd.8.xml @@ -2611,6 +2611,16 @@ icmp6 { with an action ct_snat; . + + If the Gateway router is configured with + lb_force_snat_ip=router_ip then for every logical router + port P attached to the Gateway router with the router ip + B, a priority-110 flow is added with the match + inport == P ip4.dst == B or + inport == P ip6.dst == B + with an action ct_snat; . + + If the Gateway router has been configured to force SNAT any previously load-balanced packets to B, a priority-100 flow diff --git a/northd/ovn-northd.c b/northd/ovn-northd.c index ac872aade..33b8d70d3 100644 --- a/northd/ovn-northd.c +++ b/northd/ovn-northd.c @@ -8576,7 +8576,7 @@ get_force_snat_ip(struct ovn_datapath *od, const char *key_type, static void add_router_lb_flow(struct hmap *lflows, struct ovn_datapath *od, struct ds *match, struct ds *actions, int priority, - bool lb_force_snat_ip, struct ovn_lb_vip *lb_vip, + bool force_snat_for_lb, struct ovn_lb_vip *lb_vip, const char *proto, struct nbrec_load_balancer *lb, struct shash *meter_groups, struct sset *nat_entries) { @@ -8585,7 +8585,7 @@ add_router_lb_flow(struct hmap *lflows, struct ovn_datapath *od, /* A match and actions for new connections. */ char *new_match = xasprintf("ct.new && %s", ds_cstr(match)); -if (lb_force_snat_ip) { +if (force_snat_for_lb) { char *new_actions = xasprintf("flags.force_snat_for_lb = 1; %s", ds_cstr(actions)); ovn_lflow_add_with_hint(lflows, od, S_ROUTER_IN_DNAT, priority, @@ -8598,7 +8598,7 @@ add_router_lb_flow(struct hmap *lflows, struct ovn_datapath *od, /* A match and actions for established connections. */ char *est_match = xasprintf("ct.est && %s",
Re: [ovs-dev] [PATCH v2 0/2] ovsdb-idl: Preserve references for tracked deleted rows.
On Wed, Mar 3, 2021 at 8:09 PM Dumitru Ceara wrote: > > The first patch of the series makes the ovsdb-idl tests more future > proof by trying to ensure more predictable output from test-ovsdb. > > The second patch of the series fixes a problem with the IDL change > tracking code for deleted records which was causing OVN to crash due to > strong reference fields in IDL records not being preserved at row > deletion. > > Changes in v2: > - Patch 1/2: > - reworked the patch to improve the output of test-ovsdb.c and > test-ovsdb.py themselves. > - Patch 2/2: > - added a test for strong references. > > Dumitru Ceara (2): > ovsdb-idl.at: Make test outputs more predictable. > ovsdb-idl: Preserve references for deleted rows. I'm not completely familiar with the tracking IDL code. The patch series looks good to me. For the whole series Acked-by: Numan Siddique Tested this patch locally. This patch series is applied on our internal openstack + ovn deployment since a couple of days (where the issue is seen). And we have not seen any crashes since then. Tested-by: Numan Siddique Thanks Numan > > > lib/ovsdb-idl.c |9 - > lib/ovsdb-idl.h |2 > tests/ovsdb-idl.at | 581 > ++- > tests/test-ovsdb.c | 231 ++-- > tests/test-ovsdb.py | 87 > 5 files changed, 556 insertions(+), 354 deletions(-) > > ___ > dev mailing list > d...@openvswitch.org > https://mail.openvswitch.org/mailman/listinfo/ovs-dev > ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev