Re: [ovs-dev] [ovs-dev v7 1/3] ofproto-dpif-upcall: fix push_dp_ops

2022-12-15 Thread Peng He
Eelco Chaudron  于2022年12月13日周二 20:36写道:

>
>
> On 10 Dec 2022, at 1:37, Peng He wrote:
>
> > Patch v5 has statistics issues.
> >
> > In order to solve this issue, we had a discussion.
> >
> > below is the quote of the email.
> >
> > ”
> > After a second thought, I think maybe keeping INCONSISTENT just for the
> > modify error is a better option.
> >
> > With current patch:
> > 1.
> > the modify error case:
> > OPERATIONAL -> INCONSISTENT ->  EVICTING -> EVICTED
> > 2.
> > the delete error case:
> > EVICTING -> EVICTED
> >
> > Change both to INCONSISTENT:
> >
> > the modify error case:
> > did not change.
> >
> > the delete error case:
> > EVICTING -> INCONSISTENT -> EVICTED?
> >
> > “
> >
> > And we agree to take the second solution.
>
> I know, but going over the state meanings again, UKEY_EVICTING means the
> following:
>
>  /* Ukey is in umap, datapath flow delete is queued. */
>
> Which now no longer is the case, so should a new state not make more sense?
>

Why it's no longer valid?

In the patch, only modify failed ukey will be set to EVICTING, is it just
right fit the meaning of
EVICTING? (ukey in the umap, but delete operation is queued?)



>
> Any one else has some input on this??
>
> > Eelco Chaudron  于2022年12月8日周四 18:54写道:
> >
> >>
> >>
> >> On 27 Nov 2022, at 8:28, Peng He wrote:
> >>
> >>> push_dp_ops only handles delete ops errors but ignores the modify
> >>> ops results. It's better to handle all the dp operation errors in
> >>> a consistent way.
> >>>
> >>> We observe in the production environment that sometimes a megaflow
> >>> with wrong actions keep staying in datapath. The coverage command shows
> >>> revalidators have dumped several times, however the correct
> >>> actions are not set. This implies that the ukey's action does not
> >>> equal to the meagaflow's, i.e. revalidators think the underlying
> >>> megaflow's actions are correct however they are not.
> >>>
> >>> We also check the megaflow using the ofproto/trace command, and the
> >>> actions are not matched with the ones in the actual magaflow. By
> >>> performing a revalidator/purge command, the right actions are set.
> >>>
> >>> This patch prevents the inconsistency by considering modify failure
> >>> in revalidators.
> >>>
> >>> To note, we cannot perform two state transitions and change ukey_state
> >>> into UKEY_EVICTED directly here, because, if we do so, the
> >>> sweep will remove the ukey alone and leave dp flow alive. Later, the
> >>> dump will retrieve the dp flow and might even recover it. This will
> >>> contribute the stats of this dp flow twice.
> >>>
> >>> Signed-off-by: Peng He 
> >>> ---
> >>>  ofproto/ofproto-dpif-upcall.c | 34 +++---
> >>>  1 file changed, 23 insertions(+), 11 deletions(-)
> >>>
> >>> diff --git a/ofproto/ofproto-dpif-upcall.c
> >> b/ofproto/ofproto-dpif-upcall.c
> >>> index 7ad728adf..c2cefbeb8 100644
> >>> --- a/ofproto/ofproto-dpif-upcall.c
> >>> +++ b/ofproto/ofproto-dpif-upcall.c
> >>> @@ -2416,26 +2416,30 @@ push_dp_ops(struct udpif *udpif, struct ukey_op
> >> *ops, size_t n_ops)
> >>>
> >>>  for (i = 0; i < n_ops; i++) {
> >>>  struct ukey_op *op = [i];
> >>> -struct dpif_flow_stats *push, *stats, push_buf;
> >>> -
> >>> -stats = op->dop.flow_del.stats;
> >>> -push = _buf;
> >>> -
> >>> -if (op->dop.type != DPIF_OP_FLOW_DEL) {
> >>> -/* Only deleted flows need their stats pushed. */
> >>> -continue;
> >>> -}
> >>>
> >>>  if (op->dop.error) {
> >>> -/* flow_del error, 'stats' is unusable. */
> >>>  if (op->ukey) {
> >>>  ovs_mutex_lock(>ukey->mutex);
> >>> -transition_ukey(op->ukey, UKEY_EVICTED);
> >>> +if (op->dop.type == DPIF_OP_FLOW_DEL) {
> >>> +transition_ukey(op->ukey, UKEY_EVICTED);
> >>> +} else {
>
> I think we could use a comment here to make sure why we set it to
> evicting. Maybe just a reference to the comment in revalidator_sweep__()
> might be enough.
>
> >>> +transition_ukey(op->ukey, UKEY_EVICTING);
> >>> +}
> >>>  ovs_mutex_unlock(>ukey->mutex);
> >>>  }
> >>>  continue;
> >>>  }
> >>>
> >>> +if (op->dop.type != DPIF_OP_FLOW_DEL) {
> >>> +/* Only deleted flows need their stats pushed. */
> >>> +continue;
> >>> +}
> >>> +
> >>> +struct dpif_flow_stats *push, *stats, push_buf;
> >>> +
> >>> +stats = op->dop.flow_del.stats;
> >>> +push = _buf;
> >>> +
> >>>  if (op->ukey) {
> >>>  ovs_mutex_lock(>ukey->mutex);
> >>>  transition_ukey(op->ukey, UKEY_EVICTED);
> >>> @@ -2848,6 +2852,14 @@ revalidator_sweep__(struct revalidator
> >> *revalidator, bool purge)
> >>>  continue;
> >>>  }
> >>>  ukey_state = ukey->state;
> >>> +
> >>> +  

Re: [ovs-dev] [PATCH ovn 2/2] northd: Store skip_snat and force_snat in ct_label/mark

2022-12-15 Thread Ales Musil
On Fri, Dec 16, 2022 at 12:32 AM Dumitru Ceara  wrote:

> On 12/7/22 16:34, Ales Musil wrote:
> > In order to have the SNAT action available for
> > related traffic store the flag in CT register.
> >
> > Extend the ct_lb and ct_lb_mark action to
> > allow additional parameter that sets the
> > corresponding flag if needed in ct_label/ct_mark
> > register. This allows us to later on match on it
> > for related traffic and set the corresponding flags
> > fro additional flows.
> >
> > Currently only two flags are supported "skip_snat"
> > and "force_snat" which are mutually exclusive.
> >
> > Reported-at: https://bugzilla.redhat.com/2126083
> > Signed-off-by: Ales Musil 
> > ---
>
> Hi Ales,
>
> First of all thanks a lot for fixing this, I know it wasn't easy to
> debug!
>
> [...]
>
> > diff --git a/northd/northd.c b/northd/northd.c
> > index a16e8a8a7..cb5b14181 100644
> > --- a/northd/northd.c
> > +++ b/northd/northd.c
>
> [...]
>
> > @@ -10411,6 +10422,11 @@ build_lrouter_nat_flows_for_lb(struct
> ovn_lb_vip *lb_vip,
> >  bool reject = build_lb_vip_actions(lb_vip, vips_nb, action,
> > lb->selection_fields, false,
> > ct_lb_mark);
> > +bool drop = !!strncmp(ds_cstr(action), "ct_lb", strlen("ct_lb"));
> > +if (!drop) {
> > +/* Remove the trailing ");". */
> > +ds_truncate(action, action->length - 2);
> > +}
> >
> >  /* Higher priority rules are added for load-balancing in DNAT
> >   * table.  For every match (on a VIP[:port]), we add two flows.
> > @@ -10427,8 +10443,9 @@ build_lrouter_nat_flows_for_lb(struct ovn_lb_vip
> *lb_vip,
> >  }
> >
> >  if (lb->skip_snat) {
> > -skip_snat_new_action = xasprintf("flags.skip_snat_for_lb = 1;
> %s",
> > - ds_cstr(action));
> > +skip_snat_new_action = xasprintf("flags.skip_snat_for_lb = 1;
> %s%s",
> > + ds_cstr(action),
> > + drop ? "" : "; skip_snat);");
> >  skip_snat_est_action = xasprintf("flags.skip_snat_for_lb = 1; "
> >   "next;");
> >  }
> > @@ -10561,8 +10578,9 @@ build_lrouter_nat_flows_for_lb(struct ovn_lb_vip
> *lb_vip,
> >  skip_snat_new_action, est_match,
> >  skip_snat_est_action, lflows, prio, meter_groups);
> >
> > -char *new_actions = xasprintf("flags.force_snat_for_lb = 1; %s",
> > -  ds_cstr(action));
> > +char *new_actions = xasprintf("flags.force_snat_for_lb = 1; %s%s",
> > +  ds_cstr(action),
> > +  drop ? "" : "; force_snat);");
> >  build_gw_lrouter_nat_flows_for_lb(lb, gw_router_force_snat,
> >  n_gw_router_force_snat, reject, new_match,
> >  new_actions, est_match,
> > @@ -10577,6 +10595,10 @@ build_lrouter_nat_flows_for_lb(struct
> ovn_lb_vip *lb_vip,
> > lb_aff_force_snat_router,
> > n_lb_aff_force_snat_router);
> >
> > +if (!drop) {
> > +ds_put_cstr(action, ");");
> > +}
> > +
>
> I think I should've spoken earlier but I was a bit behind with reviewing
> this patch until now: this whole part seems a bit fragile because we
> make assumptions about how the string built by build_lb_vip_actions()
> looks like.  What if in the future the action string built by
> build_lb_vip_actions() doesn't start with "ct_lb.."?  What if it
> doesn't end with ");"?
>

> I also don't really like the 'drop' variable: it's almost always equal
> to 'reject' with one exception, when healthcheck is enabled and no
> backends are active but the LB VIP is not configured to reject on
> empty backend sets.
>
> But I won't block the release for this part.  Instead, I think we
> should find a way to refactor all this code in the near future so that
> all LB actions are built in a single place.  build_lb_vip_actions() was
> supposed to do that but now part of it is done outside the function.
> What do you think, does it sound feasible to you?
>

I agree it's not the best approach, the refactor I've prepared some time
ago might address some of those concerns.
The problem is that build_lb_vip_actions doesn't know the context so it's
hard to make decision if there should
be a flag or not. I don't know what would be the best solution for that,
but I'll try to think about the solution.
Hopefully I'll be able to include that in the refactor.


>
> I added Mark's ack and pushed the patch to main and to branch-22.12.
>
> Thanks,
> Dumitru
>
>
Thanks,
Ales

-- 

Ales Musil

Senior Software Engineer - OVN Core

Red Hat EMEA 

amu...@redhat.comIM: amusil

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH ovn branch-22.12 1/2] Set release date for 22.12.0.

2022-12-15 Thread Dumitru Ceara
On 12/15/22 22:43, Mark Michelson wrote:
> Signed-off-by: Mark Michelson 
> ---
> I more-or-less randomized the time of day since this will be at some
> point early tomorrow for me. I plan to regenerate the patch with the
> correct time when I actually push the commit.
> ---
>  NEWS | 2 +-
>  debian/changelog | 2 +-
>  2 files changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/NEWS b/NEWS
> index a19e0dcb5..226c67505 100644
> --- a/NEWS
> +++ b/NEWS
> @@ -1,4 +1,4 @@
> -OVN v22.12.0 - xx xxx 
> +OVN v22.12.0 - 16 Dec 2022
>  --
>- Add load balancer "affinity_timeout" option to configure load balancing
>  of traffic from a particular client to the same backend for a given
> diff --git a/debian/changelog b/debian/changelog
> index 792467bf7..cda378c12 100644
> --- a/debian/changelog
> +++ b/debian/changelog
> @@ -2,7 +2,7 @@ ovn (22.12.0-1) unstable; urgency=low
>  
> * New upstream version
>  
> - -- OVN team   Mon, 28 Nov 2022 10:42:29 -0500
> + -- OVN team   Thu, 16 Dec 2022 09:07:31 -0500

You mean "Fri, 16 Dec", right? :)

Acked-by: Dumitru Ceara 

Thanks!

>  
>  ovn (22.09.0-1) unstable; urgency=low
>  

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH ovn branch-22.12 2/2] Prepare for 22.12.1.

2022-12-15 Thread Dumitru Ceara
On 12/15/22 22:43, Mark Michelson wrote:
> Signed-off-by: Mark Michelson 
> ---
> I more-or-less randomized the time of day since this will be at some
> point early tomorrow for me. I plan to regenerate the patch with the
> correct time when I actually push the commit.
> ---
>  NEWS | 3 +++
>  configure.ac | 2 +-
>  debian/changelog | 6 ++
>  3 files changed, 10 insertions(+), 1 deletion(-)
> 
> diff --git a/NEWS b/NEWS
> index 226c67505..b8a31a496 100644
> --- a/NEWS
> +++ b/NEWS
> @@ -1,3 +1,6 @@
> +OVN v22.12.1 - xx xxx 
> +--
> +
>  OVN v22.12.0 - 16 Dec 2022
>  --
>- Add load balancer "affinity_timeout" option to configure load balancing
> diff --git a/configure.ac b/configure.ac
> index 101467253..357758e0c 100644
> --- a/configure.ac
> +++ b/configure.ac
> @@ -13,7 +13,7 @@
>  # limitations under the License.
>  
>  AC_PREREQ(2.63)
> -AC_INIT(ovn, 22.12.0, b...@openvswitch.org)
> +AC_INIT(ovn, 22.12.1, b...@openvswitch.org)
>  AC_CONFIG_MACRO_DIR([m4])
>  AC_CONFIG_AUX_DIR([build-aux])
>  AC_CONFIG_HEADERS([config.h])
> diff --git a/debian/changelog b/debian/changelog
> index cda378c12..54305c26d 100644
> --- a/debian/changelog
> +++ b/debian/changelog
> @@ -1,3 +1,9 @@
> +OVN (22.12.1-1) unstable; urgency=low
> +   [ OVN team ]
> +   * New upstream version
> +
> + -- OVN team   Thu, 16 Dec 2022 09:07:31 -0500

You mean "Fri, 16 Dec", right? :)

Acked-by: Dumitru Ceara 

Thanks!

> +
>  ovn (22.12.0-1) unstable; urgency=low
>  
> * New upstream version

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH ovn 2/2] northd: Store skip_snat and force_snat in ct_label/mark

2022-12-15 Thread Dumitru Ceara
On 12/7/22 16:34, Ales Musil wrote:
> In order to have the SNAT action available for
> related traffic store the flag in CT register.
> 
> Extend the ct_lb and ct_lb_mark action to
> allow additional parameter that sets the
> corresponding flag if needed in ct_label/ct_mark
> register. This allows us to later on match on it
> for related traffic and set the corresponding flags
> fro additional flows.
> 
> Currently only two flags are supported "skip_snat"
> and "force_snat" which are mutually exclusive.
> 
> Reported-at: https://bugzilla.redhat.com/2126083
> Signed-off-by: Ales Musil 
> ---

Hi Ales,

First of all thanks a lot for fixing this, I know it wasn't easy to
debug!

[...]

> diff --git a/northd/northd.c b/northd/northd.c
> index a16e8a8a7..cb5b14181 100644
> --- a/northd/northd.c
> +++ b/northd/northd.c

[...]

> @@ -10411,6 +10422,11 @@ build_lrouter_nat_flows_for_lb(struct ovn_lb_vip 
> *lb_vip,
>  bool reject = build_lb_vip_actions(lb_vip, vips_nb, action,
> lb->selection_fields, false,
> ct_lb_mark);
> +bool drop = !!strncmp(ds_cstr(action), "ct_lb", strlen("ct_lb"));
> +if (!drop) {
> +/* Remove the trailing ");". */
> +ds_truncate(action, action->length - 2);
> +}
>  
>  /* Higher priority rules are added for load-balancing in DNAT
>   * table.  For every match (on a VIP[:port]), we add two flows.
> @@ -10427,8 +10443,9 @@ build_lrouter_nat_flows_for_lb(struct ovn_lb_vip 
> *lb_vip,
>  }
>  
>  if (lb->skip_snat) {
> -skip_snat_new_action = xasprintf("flags.skip_snat_for_lb = 1; %s",
> - ds_cstr(action));
> +skip_snat_new_action = xasprintf("flags.skip_snat_for_lb = 1; %s%s",
> + ds_cstr(action),
> + drop ? "" : "; skip_snat);");
>  skip_snat_est_action = xasprintf("flags.skip_snat_for_lb = 1; "
>   "next;");
>  }
> @@ -10561,8 +10578,9 @@ build_lrouter_nat_flows_for_lb(struct ovn_lb_vip 
> *lb_vip,
>  skip_snat_new_action, est_match,
>  skip_snat_est_action, lflows, prio, meter_groups);
>  
> -char *new_actions = xasprintf("flags.force_snat_for_lb = 1; %s",
> -  ds_cstr(action));
> +char *new_actions = xasprintf("flags.force_snat_for_lb = 1; %s%s",
> +  ds_cstr(action),
> +  drop ? "" : "; force_snat);");
>  build_gw_lrouter_nat_flows_for_lb(lb, gw_router_force_snat,
>  n_gw_router_force_snat, reject, new_match,
>  new_actions, est_match,
> @@ -10577,6 +10595,10 @@ build_lrouter_nat_flows_for_lb(struct ovn_lb_vip 
> *lb_vip,
> lb_aff_force_snat_router,
> n_lb_aff_force_snat_router);
>  
> +if (!drop) {
> +ds_put_cstr(action, ");");
> +}
> +

I think I should've spoken earlier but I was a bit behind with reviewing
this patch until now: this whole part seems a bit fragile because we
make assumptions about how the string built by build_lb_vip_actions()
looks like.  What if in the future the action string built by
build_lb_vip_actions() doesn't start with "ct_lb.."?  What if it
doesn't end with ");"?

I also don't really like the 'drop' variable: it's almost always equal
to 'reject' with one exception, when healthcheck is enabled and no
backends are active but the LB VIP is not configured to reject on
empty backend sets.

But I won't block the release for this part.  Instead, I think we
should find a way to refactor all this code in the near future so that
all LB actions are built in a single place.  build_lb_vip_actions() was
supposed to do that but now part of it is done outside the function.
What do you think, does it sound feasible to you?

I added Mark's ack and pushed the patch to main and to branch-22.12.

Thanks,
Dumitru

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH ovn 1/2] northd: Add logical flow to defrag ICMP traffic

2022-12-15 Thread Dumitru Ceara
On 12/7/22 22:02, Mark Michelson wrote:
> Thanks, Ales!
> 
> Acked-by: Mark Michelson 
> 

Thanks Mark and Ales!

I pushed this to the main branch and to 22.12.

Regards,
Dumitru

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH ovn v3 0/5] OVN IC multiple same routes fixes

2022-12-15 Thread Dumitru Ceara
On 12/15/22 22:30, Vladislav Odintsov wrote:
> Thanks Numan for the review.
> 
> It seems that for some reason appeared a memory leak, which constantly 
> reproduces.

Do you mean this one?

Direct leak of 64 byte(s) in 1 object(s) allocated from:
#0 0x49a052 in calloc 
(/home/runner/work/ovn/ovn/ovn-22.12.90/_build/sub/ic/ovn-ic+0x49a052)
#1 0x751142 in xcalloc__ /home/runner/work/ovn/ovn/ovs/lib/util.c:121:31
#2 0x751170 in xzalloc__ /home/runner/work/ovn/ovn/ovs/lib/util.c:131:12
#3 0x751235 in xzalloc /home/runner/work/ovn/ovn/ovs/lib/util.c:165:12
#4 0x740353 in ovsdb_idl_txn_add_map_op 
/home/runner/work/ovn/ovn/ovs/lib/ovsdb-idl.c:4160:29
#5 0x7402bf in ovsdb_idl_txn_write_partial_map 
/home/runner/work/ovn/ovn/ovs/lib/ovsdb-idl.c:4317:5
#6 0x71c57f in icsbrec_port_binding_update_external_ids_setkey 
/home/runner/work/ovn/ovn/ovn-22.12.90/_build/sub/lib/ovn-ic-sb-idl.c:7220:5
#7 0x4d3696 in update_isb_pb_external_ids 
/home/runner/work/ovn/ovn/ovn-22.12.90/_build/sub/../../ic/ovn-ic.c:573:5
#8 0x4d2017 in create_isb_pb 
/home/runner/work/ovn/ovn/ovn-22.12.90/_build/sub/../../ic/ovn-ic.c:713:5
#9 0x4ce6af in port_binding_run 
/home/runner/work/ovn/ovn/ovn-22.12.90/_build/sub/../../ic/ovn-ic.c:800:21
#10 0x4cbb87 in ovn_db_run 
/home/runner/work/ovn/ovn/ovn-22.12.90/_build/sub/../../ic/ovn-ic.c:1737:5
#11 0x4cac7a in main 
/home/runner/work/ovn/ovn/ovn-22.12.90/_build/sub/../../ic/ovn-ic.c:2026:17
#12 0x7fcd3d9e7082 in __libc_start_main 
(/lib/x86_64-linux-gnu/libc.so.6+0x24082)

From: https://github.com/ovsrobot/ovn/actions/runs/3706396547

If so, I've seen it a few times before in CI and didn't have
time to investigate closely.  I'm quite confident it's not
related to your changes so I think it's fine to push your
series.  Like that it would also make it in time for the
22.12.0 release.

> I’ll give it a time tomorrow to find the source of a problem.

I'll wait at least until Friday morning (CET).

> 
> Regards,
> Vladislav Odintsov

Regards,
Dumitru

> 
>> On 16 Dec 2022, at 00:21, Numan Siddique  wrote:
>>
>> On Thu, Dec 15, 2022 at 12:02 PM Vladislav Odintsov > > wrote:
>>>
>>> v2 -> v3:
>>>  - Split patch #3 by two: first fixes a bug with duplicated route
>>>advertisement and will be considered for back-porting; the second one
>>>changes ovn-ic SB:Route schema and documents ovn-ic upgrade details.
>>>  - Address Dumitru's comment regarding loggin rate-limit.
>>>
>>> v1 -> v2:
>>>  - Split series by two: OVN IC-related changes and northd OF bucket limits
>>>  - Squash two patches in one
>>>  - Fix memory leak in add_to_routes_ad()
>>>  - Addressed review comments by Dumitru
>>>
>>> v1 description here:
>>>  
>>> https://patchwork.ozlabs.org/project/ovn/cover/20221202173147.3032702-1-odiv...@gmail.com/
>>>
>>> Vladislav Odintsov (5):
>>>  ic: remove orphan ovn interconnection routes
>>>  ic: lookup southbound port_binding only if needed
>>>  ic: prevent advertising/learning multiple same routes
>>>  ic: minor code improvements
>>>  ic-sb schema: add index for routes table & document upgrade path
>>
>> For the entire series:
>>
>> Acked-by: Numan Siddique mailto:num...@ovn.org>>
>>
>> Numan
>>
>>>
>>> Documentation/intro/install/ovn-upgrades.rst |  20 +++
>>> NEWS |   4 +
>>> ic/ovn-ic.c  | 142 +--
>>> ovn-ic-sb.ovsschema  |   6 +-
>>> tests/ovn-ic.at   | 133 
>>> +
>>> 5 files changed, 257 insertions(+), 48 deletions(-)
>>>
>>> --
>>> 2.36.1
>>>

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH ovn branch-22.12 2/2] Prepare for 22.12.1.

2022-12-15 Thread Mark Michelson
Signed-off-by: Mark Michelson 
---
I more-or-less randomized the time of day since this will be at some
point early tomorrow for me. I plan to regenerate the patch with the
correct time when I actually push the commit.
---
 NEWS | 3 +++
 configure.ac | 2 +-
 debian/changelog | 6 ++
 3 files changed, 10 insertions(+), 1 deletion(-)

diff --git a/NEWS b/NEWS
index 226c67505..b8a31a496 100644
--- a/NEWS
+++ b/NEWS
@@ -1,3 +1,6 @@
+OVN v22.12.1 - xx xxx 
+--
+
 OVN v22.12.0 - 16 Dec 2022
 --
   - Add load balancer "affinity_timeout" option to configure load balancing
diff --git a/configure.ac b/configure.ac
index 101467253..357758e0c 100644
--- a/configure.ac
+++ b/configure.ac
@@ -13,7 +13,7 @@
 # limitations under the License.
 
 AC_PREREQ(2.63)
-AC_INIT(ovn, 22.12.0, b...@openvswitch.org)
+AC_INIT(ovn, 22.12.1, b...@openvswitch.org)
 AC_CONFIG_MACRO_DIR([m4])
 AC_CONFIG_AUX_DIR([build-aux])
 AC_CONFIG_HEADERS([config.h])
diff --git a/debian/changelog b/debian/changelog
index cda378c12..54305c26d 100644
--- a/debian/changelog
+++ b/debian/changelog
@@ -1,3 +1,9 @@
+OVN (22.12.1-1) unstable; urgency=low
+   [ OVN team ]
+   * New upstream version
+
+ -- OVN team   Thu, 16 Dec 2022 09:07:31 -0500
+
 ovn (22.12.0-1) unstable; urgency=low
 
* New upstream version
-- 
2.38.1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH ovn branch-22.12 0/2] Release patches for v22.12.0.

2022-12-15 Thread Mark Michelson


Mark Michelson (2):
  Set release date for 22.12.0.
  Prepare for 22.12.1.

 NEWS | 5 -
 configure.ac | 2 +-
 debian/changelog | 8 +++-
 3 files changed, 12 insertions(+), 3 deletions(-)

-- 
2.38.1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH ovn branch-22.12 1/2] Set release date for 22.12.0.

2022-12-15 Thread Mark Michelson
Signed-off-by: Mark Michelson 
---
I more-or-less randomized the time of day since this will be at some
point early tomorrow for me. I plan to regenerate the patch with the
correct time when I actually push the commit.
---
 NEWS | 2 +-
 debian/changelog | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/NEWS b/NEWS
index a19e0dcb5..226c67505 100644
--- a/NEWS
+++ b/NEWS
@@ -1,4 +1,4 @@
-OVN v22.12.0 - xx xxx 
+OVN v22.12.0 - 16 Dec 2022
 --
   - Add load balancer "affinity_timeout" option to configure load balancing
 of traffic from a particular client to the same backend for a given
diff --git a/debian/changelog b/debian/changelog
index 792467bf7..cda378c12 100644
--- a/debian/changelog
+++ b/debian/changelog
@@ -2,7 +2,7 @@ ovn (22.12.0-1) unstable; urgency=low
 
* New upstream version
 
- -- OVN team   Mon, 28 Nov 2022 10:42:29 -0500
+ -- OVN team   Thu, 16 Dec 2022 09:07:31 -0500
 
 ovn (22.09.0-1) unstable; urgency=low
 
-- 
2.38.1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH ovn v3 0/5] OVN IC multiple same routes fixes

2022-12-15 Thread Numan Siddique
On Thu, Dec 15, 2022 at 12:02 PM Vladislav Odintsov  wrote:
>
> v2 -> v3:
>   - Split patch #3 by two: first fixes a bug with duplicated route
> advertisement and will be considered for back-porting; the second one
> changes ovn-ic SB:Route schema and documents ovn-ic upgrade details.
>   - Address Dumitru's comment regarding loggin rate-limit.
>
> v1 -> v2:
>   - Split series by two: OVN IC-related changes and northd OF bucket limits
>   - Squash two patches in one
>   - Fix memory leak in add_to_routes_ad()
>   - Addressed review comments by Dumitru
>
> v1 description here:
>   
> https://patchwork.ozlabs.org/project/ovn/cover/20221202173147.3032702-1-odiv...@gmail.com/
>
> Vladislav Odintsov (5):
>   ic: remove orphan ovn interconnection routes
>   ic: lookup southbound port_binding only if needed
>   ic: prevent advertising/learning multiple same routes
>   ic: minor code improvements
>   ic-sb schema: add index for routes table & document upgrade path

For the entire series:

Acked-by: Numan Siddique 

Numan

>
>  Documentation/intro/install/ovn-upgrades.rst |  20 +++
>  NEWS |   4 +
>  ic/ovn-ic.c  | 142 +--
>  ovn-ic-sb.ovsschema  |   6 +-
>  tests/ovn-ic.at  | 133 +
>  5 files changed, 257 insertions(+), 48 deletions(-)
>
> --
> 2.36.1
>
> ___
> dev mailing list
> d...@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH ovn v2] northd: bypass connection tracking for stateless flows when there are LB flows present

2022-12-15 Thread Han Zhou
On Mon, Dec 12, 2022 at 1:28 AM venu iyer  wrote:
>
> Currently, even stateless flows are subject to connection tracking when
there are
> LB rules (for DNAT). However, if a flow needs to be subjected to LB, then
it shouldn't
> be configured as stateless.
>
> Stateless flow means we should not track it, and this change exempts
stateless
> flows from being tracked regardless of whether LB rules are present or
not.
>
> Signed-off-by: venu iyer 
> Acked-by: Han Zhou 
> ---
>  northd/northd.c |  25 +++-
>  northd/ovn-northd.8.xml |  57 
>  ovn-nb.xml  |   3 +
>  tests/ovn-northd.at |  76 +--
>  tests/ovn.at|   4 +-
>  tests/system-ovn.at | 296 
>  6 files changed, 383 insertions(+), 78 deletions(-)
>
> diff --git a/northd/northd.c b/northd/northd.c
> index 7c48bb3b4..5d8ef612f 100644
> --- a/northd/northd.c
> +++ b/northd/northd.c
> @@ -140,8 +140,8 @@ enum ovn_stage {
>  PIPELINE_STAGE(SWITCH, IN,  L2_UNKNOWN,26, "ls_in_l2_unknown")
 \
>
 \
>  /* Logical switch egress stages. */
  \
> -PIPELINE_STAGE(SWITCH, OUT, PRE_LB,   0, "ls_out_pre_lb")
  \
> -PIPELINE_STAGE(SWITCH, OUT, PRE_ACL,  1, "ls_out_pre_acl")
 \
> +PIPELINE_STAGE(SWITCH, OUT, PRE_ACL,  0, "ls_out_pre_acl")
 \
> +PIPELINE_STAGE(SWITCH, OUT, PRE_LB,   1, "ls_out_pre_lb")
  \
>  PIPELINE_STAGE(SWITCH, OUT, PRE_STATEFUL, 2, "ls_out_pre_stateful")
  \
>  PIPELINE_STAGE(SWITCH, OUT, ACL_HINT, 3, "ls_out_acl_hint")
  \
>  PIPELINE_STAGE(SWITCH, OUT, ACL,  4, "ls_out_acl")
 \
> @@ -215,6 +215,7 @@ enum ovn_stage {
>  #define REGBIT_ACL_LABEL  "reg0[13]"
>  #define REGBIT_FROM_RAMP  "reg0[14]"
>  #define REGBIT_PORT_SEC_DROP  "reg0[15]"
> +#define REGBIT_ACL_STATELESS  "reg0[16]"
>
>  #define REG_ORIG_DIP_IPV4 "reg1"
>  #define REG_ORIG_DIP_IPV6 "xxreg1"
> @@ -290,7 +291,7 @@ enum ovn_stage {
>   * | R0 | REGBIT_{CONNTRACK/DHCP/DNS}  |   |
  |
>   * || REGBIT_{HAIRPIN/HAIRPIN_REPLY}   |   |
  |
>   * || REGBIT_ACL_HINT_{ALLOW_NEW/ALLOW/DROP/BLOCK} |   |
  |
> - * || REGBIT_ACL_LABEL | X |
  |
> + * || REGBIT_ACL_{LABEL/STATELESS} | X |
  |
>   * ++--+ X |
  |
>   * | R5 |   UNUSED | X |
LB_L2_AFF_BACKEND_IP6   |
>   * | R1 | ORIG_DIP_IPV4 (>= IN_PRE_STATEFUL)   | R |
  |
> @@ -5693,17 +5694,18 @@ build_stateless_filter(struct ovn_datapath *od,
> const struct nbrec_acl *acl,
> struct hmap *lflows)
>  {
> +const char *action = REGBIT_ACL_STATELESS" = 1; next;";
>  if (!strcmp(acl->direction, "from-lport")) {
>  ovn_lflow_add_with_hint(lflows, od, S_SWITCH_IN_PRE_ACL,
>  acl->priority + OVN_ACL_PRI_OFFSET,
>  acl->match,
> -"next;",
> +action,
>  >header_);
>  } else {
>  ovn_lflow_add_with_hint(lflows, od, S_SWITCH_OUT_PRE_ACL,
>  acl->priority + OVN_ACL_PRI_OFFSET,
>  acl->match,
> -"next;",
> +action,
>  >header_);
>  }
>  }
> @@ -5795,6 +5797,10 @@ build_pre_acls(struct ovn_datapath *od, const
struct hmap *port_groups,
>REGBIT_CONNTRACK_DEFRAG" = 1; next;");
>  ovn_lflow_add(lflows, od, S_SWITCH_OUT_PRE_ACL, 100, "ip",
>REGBIT_CONNTRACK_DEFRAG" = 1; next;");
> +} else if (od->has_lb_vip) {
> +/* We'll build stateless filters if there are LB rules so that
> + * the stateless flows are not tracked in pre-lb. */
> + build_stateless_filters(od, port_groups, lflows);
>  }
>  }
>
> @@ -5930,6 +5936,12 @@ build_pre_lb(struct ovn_datapath *od, const struct
shash *meter_groups,
>   110, lflows);
>  }
>
> +/* Do not sent statless flows via conntrack */
> +ovn_lflow_add(lflows, od, S_SWITCH_IN_PRE_LB, 110,
> +  REGBIT_ACL_STATELESS" == 1", "next;");
> +ovn_lflow_add(lflows, od, S_SWITCH_OUT_PRE_LB, 110,
> +  REGBIT_ACL_STATELESS" == 1", "next;");
> +
>  /* 'REGBIT_CONNTRACK_NAT' is set to let the pre-stateful table send
>   * packet to conntrack for defragmentation and possibly for
unNATting.
>   *
> @@ -6935,7 +6947,8 @@ build_lb_rules_pre_stateful(struct hmap *lflows,
struct ovn_northd_lb *lb,
>  }
>  ds_put_format(action, "%s;", ct_lb_mark ? "ct_lb_mark" :

Re: [ovs-dev] [PATCH ovn v3 0/5] OVN IC multiple same routes fixes

2022-12-15 Thread Vladislav Odintsov
Thanks Numan for the review.

It seems that for some reason appeared a memory leak, which constantly 
reproduces.
I’ll give it a time tomorrow to find the source of a problem.

Regards,
Vladislav Odintsov

> On 16 Dec 2022, at 00:21, Numan Siddique  wrote:
> 
> On Thu, Dec 15, 2022 at 12:02 PM Vladislav Odintsov  > wrote:
>> 
>> v2 -> v3:
>>  - Split patch #3 by two: first fixes a bug with duplicated route
>>advertisement and will be considered for back-porting; the second one
>>changes ovn-ic SB:Route schema and documents ovn-ic upgrade details.
>>  - Address Dumitru's comment regarding loggin rate-limit.
>> 
>> v1 -> v2:
>>  - Split series by two: OVN IC-related changes and northd OF bucket limits
>>  - Squash two patches in one
>>  - Fix memory leak in add_to_routes_ad()
>>  - Addressed review comments by Dumitru
>> 
>> v1 description here:
>>  
>> https://patchwork.ozlabs.org/project/ovn/cover/20221202173147.3032702-1-odiv...@gmail.com/
>> 
>> Vladislav Odintsov (5):
>>  ic: remove orphan ovn interconnection routes
>>  ic: lookup southbound port_binding only if needed
>>  ic: prevent advertising/learning multiple same routes
>>  ic: minor code improvements
>>  ic-sb schema: add index for routes table & document upgrade path
> 
> For the entire series:
> 
> Acked-by: Numan Siddique mailto:num...@ovn.org>>
> 
> Numan
> 
>> 
>> Documentation/intro/install/ovn-upgrades.rst |  20 +++
>> NEWS |   4 +
>> ic/ovn-ic.c  | 142 +--
>> ovn-ic-sb.ovsschema  |   6 +-
>> tests/ovn-ic.at   | 133 
>> +
>> 5 files changed, 257 insertions(+), 48 deletions(-)
>> 
>> --
>> 2.36.1
>> 
>> ___
>> dev mailing list
>> d...@openvswitch.org 
>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev 
>> 
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH ovn 2/2] northd: Add automatic memory trimming when idle.

2022-12-15 Thread Dumitru Ceara
To do this we factor the memory trimming code out into its own module,
memory-trim.  We use this now for both the lflow cache (in
ovn-controller) and for ovn-northd.

Signed-off-by: Dumitru Ceara 
---
 controller/lflow-cache.c |   68 +-
 lib/automake.mk  |2 +
 lib/memory-trim.c|  120 ++
 lib/memory-trim.h|   33 +
 northd/inc-proc-northd.c |4 +-
 northd/inc-proc-northd.h |2 -
 northd/ovn-northd.c  |   33 -
 ovn-nb.xml   |9 +++
 8 files changed, 212 insertions(+), 59 deletions(-)
 create mode 100644 lib/memory-trim.c
 create mode 100644 lib/memory-trim.h

diff --git a/controller/lflow-cache.c b/controller/lflow-cache.c
index 9fca2d7441..9d0e9cd10b 100644
--- a/controller/lflow-cache.c
+++ b/controller/lflow-cache.c
@@ -24,10 +24,9 @@
 #include "coverage.h"
 #include "lflow-cache.h"
 #include "lib/uuid.h"
-#include "openvswitch/poll-loop.h"
+#include "memory-trim.h"
 #include "openvswitch/vlog.h"
 #include "ovn/expr.h"
-#include "timeval.h"
 
 VLOG_DEFINE_THIS_MODULE(lflow_cache);
 
@@ -54,6 +53,7 @@ static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5, 1);
 
 struct lflow_cache {
 struct hmap entries[LCACHE_T_MAX];
+struct memory_trimmer *mt;
 uint32_t n_entries;
 uint32_t high_watermark;
 uint32_t capacity;
@@ -61,10 +61,7 @@ struct lflow_cache {
 uint64_t max_mem_usage;
 uint32_t trim_limit;
 uint32_t trim_wmark_perc;
-uint32_t trim_timeout_ms;
 uint64_t trim_count;
-long long int last_active_ms;
-bool recently_active;
 bool enabled;
 };
 
@@ -84,7 +81,6 @@ static struct lflow_cache_value *lflow_cache_add__(
 static void lflow_cache_delete__(struct lflow_cache *lc,
  struct lflow_cache_entry *lce);
 static void lflow_cache_trim__(struct lflow_cache *lc, bool force);
-static void lflow_cache_record_activity__(struct lflow_cache *lc);
 
 struct lflow_cache *
 lflow_cache_create(void)
@@ -94,6 +90,7 @@ lflow_cache_create(void)
 for (size_t i = 0; i < LCACHE_T_MAX; i++) {
 hmap_init(>entries[i]);
 }
+lc->mt = memory_trimmer_create();
 
 return lc;
 }
@@ -127,6 +124,7 @@ lflow_cache_destroy(struct lflow_cache *lc)
 for (size_t i = 0; i < LCACHE_T_MAX; i++) {
 hmap_destroy(>entries[i]);
 }
+memory_trimmer_destroy(lc->mt);
 free(lc);
 }
 
@@ -146,13 +144,6 @@ lflow_cache_enable(struct lflow_cache *lc, bool enabled, 
uint32_t capacity,
 trim_wmark_perc = 100;
 }
 
-if (trim_timeout_ms < 1000) {
-VLOG_WARN_RL(, "Invalid requested trim timeout: "
- "requested %"PRIu32" ms, using 1000 ms instead",
- trim_timeout_ms);
-trim_timeout_ms = 1000;
-}
-
 uint64_t max_mem_usage = max_mem_usage_kb * 1024;
 bool need_flush = false;
 bool need_trim = false;
@@ -172,13 +163,13 @@ lflow_cache_enable(struct lflow_cache *lc, bool enabled, 
uint32_t capacity,
 lc->max_mem_usage = max_mem_usage;
 lc->trim_limit = lflow_trim_limit;
 lc->trim_wmark_perc = trim_wmark_perc;
-lc->trim_timeout_ms = trim_timeout_ms;
+memory_trimmer_set(lc->mt, trim_timeout_ms);
 
 if (need_flush) {
-lflow_cache_record_activity__(lc);
+memory_trimmer_record_activity(lc->mt);
 lflow_cache_flush(lc);
 } else if (need_trim) {
-lflow_cache_record_activity__(lc);
+memory_trimmer_record_activity(lc->mt);
 lflow_cache_trim__(lc, false);
 }
 }
@@ -286,7 +277,7 @@ lflow_cache_delete(struct lflow_cache *lc, const struct 
uuid *lflow_uuid)
 lflow_cache_delete__(lc, CONTAINER_OF(lcv, struct lflow_cache_entry,
   value));
 lflow_cache_trim__(lc, false);
-lflow_cache_record_activity__(lc);
+memory_trimmer_record_activity(lc->mt);
 }
 }
 
@@ -327,41 +318,15 @@ lflow_cache_get_memory_usage(const struct lflow_cache 
*lc, struct simap *usage)
 void
 lflow_cache_run(struct lflow_cache *lc)
 {
-if (!lc->recently_active) {
-return;
-}
-
-long long int now = time_msec();
-if (now < lc->last_active_ms || now < lc->trim_timeout_ms) {
-VLOG_WARN_RL(, "Detected cache last active timestamp overflow");
-lc->recently_active = false;
-lflow_cache_trim__(lc, true);
-return;
-}
-
-if (now < lc->trim_timeout_ms) {
-VLOG_WARN_RL(, "Detected very large trim timeout: "
- "now %lld ms timeout %"PRIu32" ms",
- now, lc->trim_timeout_ms);
-return;
-}
-
-if (now - lc->trim_timeout_ms >= lc->last_active_ms) {
-VLOG_INFO_RL(, "Detected cache inactivity "
-"(last active %lld ms ago): trimming cache",
-now - lc->last_active_ms);
+if (memory_trimmer_run(lc->mt)) {
 lflow_cache_trim__(lc, true);

[ovs-dev] [PATCH ovn 0/2] Add automatic memory trimming to northd.

2022-12-15 Thread Dumitru Ceara
Dumitru Ceara (2):
  ovn-northd: Add IDL memory usage information.
  northd: Add automatic memory trimming when idle.


 controller/lflow-cache.c |  68 --
 lib/automake.mk  |   2 +
 lib/memory-trim.c| 120 +++
 lib/memory-trim.h|  33 +++
 northd/inc-proc-northd.c |   4 +-
 northd/inc-proc-northd.h |   2 +-
 northd/ovn-northd.c  |  33 ++-
 ovn-nb.xml   |   9 +++
 8 files changed, 212 insertions(+), 59 deletions(-)
 create mode 100644 lib/memory-trim.c
 create mode 100644 lib/memory-trim.h

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH ovn 1/2] ovn-northd: Add IDL memory usage information.

2022-12-15 Thread Dumitru Ceara
Signed-off-by: Dumitru Ceara 
---
 northd/ovn-northd.c |3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/northd/ovn-northd.c b/northd/ovn-northd.c
index 82d2874d66..3a575b02a5 100644
--- a/northd/ovn-northd.c
+++ b/northd/ovn-northd.c
@@ -849,7 +849,8 @@ main(int argc, char *argv[])
 if (memory_should_report()) {
 struct simap usage = SIMAP_INITIALIZER();
 
-/* Nothing special to report yet. */
+ovsdb_idl_get_memory_usage(ovnnb_idl_loop.idl, );
+ovsdb_idl_get_memory_usage(ovnsb_idl_loop.idl, );
 memory_report();
 simap_destroy();
 }

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH] ovs-lib: accept optional timeout value for upgrade_cluster

2022-12-15 Thread Frode Nordahl
tor. 15. des. 2022, 20:45 skrev Dan Williams :

> On Thu, 2022-12-15 at 19:46 +0100, Frode Nordahl wrote:
> > On Thu, Dec 15, 2022 at 7:16 PM Dan Williams  wrote:
> > > On Thu, 2022-12-15 at 19:04 +0100, Frode Nordahl wrote:
> >
> > [ snip ]
> >
> > > > Thank you for proposing this patch!
> > > >
> > > > We've been seeing reports of schema upgrades failing in the too
> > > > and
> > > > have waited for some way to reproduce and see if this would be a
> > > > fix.
> > > >
> > > > Are you seeing this with clustered databases, and could your
> > > > problem
> > > > be related to the election timer? If it is, raising the client
> > > > side
> > > > timer alone could be problematic.
> > >
> > > Yes clustered.
> > >
> > > A 450MB LB-heavy database generated by ovn-kubernetes with OVN
> > > 22.06
> > > (which lacks DP groups for SB load balancers) was being upgraded
> > > from
> > > OVN 22.06 -> 22.09 (but using same OVS 2.17 version, so no change
> > > to
> > > ovsdb-server) and when the ovsdb-servers got restarted as part of
> > > the
> > > ovn-kube container upgrade, they took longer to read+parse the
> > > database
> > > than the 30s upgrade_cluster() timer, thus the container failed and
> > > was
> > > put in CrashLoopBackoff by Kubernetes.
> > >
> > > ovn-ctl was never able to update the schema, and thus 22.09 ovn-
> > > northd
> > > was never able to reduce the DB size by rewriting it to use DP
> > > groups
> > > for SB LBs and recover.
> > >
> > > This patch is really a workaround and needs a corresponding ovn-ctl
> > > patch to accept a timeout for the NB/SB DB start functions that our
> > > OpenShift container scripts would pass.
> >
> > Right, so you plan to couple the value of the ovsdb-client timeout to
> > the value of the ovsdb-server election timer, that makes sense.
>
> Actually no, I hadn't. AFAIK they aren't related. Election timer is
> more relevant while ovsdb-server is running (eg, number of connected
> client and what monitors they requested), while the timeout in this
> patch is only about how long the schema upgrade process waits for
> ovsdb-server to respond at startup (eg, size of the on-disk database it
> has to parse in and allocate memory for).
>

This is the crux of the issue, as laid out in the discussion [0] and bug
report [1].

At this point in time, the time spent on a online schema conversion in a
clustered database cannot exceed the server side election timer, increasing
the client side timer in isolation will only prolong the amount of time one
of the cluster nodes will spend retrying the conversion, not having
anywhere to write the result.


> FWIW our OpenShift + ovn-kubernetes election timers are 16 seconds and
> that works in clusters the same scale as the problem DB we have. But
> reading that DB in takes ~28 or 30 seconds. So I'd probably bump the DB
> startup timeout we use to > 45s since there are 2 other databases still
> running to take the load while the 3rd one is restarting.
>

That is reasonable, but the conversion is happening on the server side for
a clustered database, so you would also need to increase the election timer.


> >
> > > The real fix is, like Ilya suggests, "reduce the size of the DB" as
> > > we've found that the most effective scale strategy for OpenShift
> > > and
> > > ovn-kubernetes. And I think that strategy has paid off tremendously
> > > over the last 2 years we've been working on OVN & ovsdb-server
> > > scale.
> > >
> > > Huge credit to Ilya and the OVN team for making that happen...
> >
> > Hear hear, wonders has been worked on in many of the OVS and OVN
> > components over the past years, my hat off to everyone involved.
> >
> > The best place to continue the discussion would probably be in the
> > thread below, but in short, I agree that reducing the size of the DB
> > is of course the best medicine. However, this specific piece of the
> > machinery is involved in upgrades, and upgrades are unfortunately
> > required to move our shared user base forward to the new promised
> > land
> > of software that produces smaller databases :)
>
> Yep. Maybe some of this can be backported to help ease that upgrade
> process.
>

Luckily, the schema conversion process can often be done as the first thing
after an upgrade, so any changes here would definitively be easier to get
out there.

--
Frode Nordahl

Dan
>
> >
> > > > I recently raised a discussion about this on the list to figure
> > > > out
> > > > possible paths forward [0][1].
> > > >
> > > > 0:
> > > >
> https://mail.openvswitch.org/pipermail/ovs-discuss/2022-December/052140.html
> > > > 1: https://bugs.launchpad.net/bugs/1999605
> >
>
>
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH] ovs-lib: accept optional timeout value for upgrade_cluster

2022-12-15 Thread Dan Williams
On Thu, 2022-12-15 at 19:46 +0100, Frode Nordahl wrote:
> On Thu, Dec 15, 2022 at 7:16 PM Dan Williams  wrote:
> > On Thu, 2022-12-15 at 19:04 +0100, Frode Nordahl wrote:
> 
> [ snip ]
> 
> > > Thank you for proposing this patch!
> > > 
> > > We've been seeing reports of schema upgrades failing in the too
> > > and
> > > have waited for some way to reproduce and see if this would be a
> > > fix.
> > > 
> > > Are you seeing this with clustered databases, and could your
> > > problem
> > > be related to the election timer? If it is, raising the client
> > > side
> > > timer alone could be problematic.
> > 
> > Yes clustered.
> > 
> > A 450MB LB-heavy database generated by ovn-kubernetes with OVN
> > 22.06
> > (which lacks DP groups for SB load balancers) was being upgraded
> > from
> > OVN 22.06 -> 22.09 (but using same OVS 2.17 version, so no change
> > to
> > ovsdb-server) and when the ovsdb-servers got restarted as part of
> > the
> > ovn-kube container upgrade, they took longer to read+parse the
> > database
> > than the 30s upgrade_cluster() timer, thus the container failed and
> > was
> > put in CrashLoopBackoff by Kubernetes.
> > 
> > ovn-ctl was never able to update the schema, and thus 22.09 ovn-
> > northd
> > was never able to reduce the DB size by rewriting it to use DP
> > groups
> > for SB LBs and recover.
> > 
> > This patch is really a workaround and needs a corresponding ovn-ctl
> > patch to accept a timeout for the NB/SB DB start functions that our
> > OpenShift container scripts would pass.
> 
> Right, so you plan to couple the value of the ovsdb-client timeout to
> the value of the ovsdb-server election timer, that makes sense.

Actually no, I hadn't. AFAIK they aren't related. Election timer is
more relevant while ovsdb-server is running (eg, number of connected
client and what monitors they requested), while the timeout in this
patch is only about how long the schema upgrade process waits for
ovsdb-server to respond at startup (eg, size of the on-disk database it
has to parse in and allocate memory for).

FWIW our OpenShift + ovn-kubernetes election timers are 16 seconds and
that works in clusters the same scale as the problem DB we have. But
reading that DB in takes ~28 or 30 seconds. So I'd probably bump the DB
startup timeout we use to > 45s since there are 2 other databases still
running to take the load while the 3rd one is restarting.

> 
> > The real fix is, like Ilya suggests, "reduce the size of the DB" as
> > we've found that the most effective scale strategy for OpenShift
> > and
> > ovn-kubernetes. And I think that strategy has paid off tremendously
> > over the last 2 years we've been working on OVN & ovsdb-server
> > scale.
> > 
> > Huge credit to Ilya and the OVN team for making that happen...
> 
> Hear hear, wonders has been worked on in many of the OVS and OVN
> components over the past years, my hat off to everyone involved.
> 
> The best place to continue the discussion would probably be in the
> thread below, but in short, I agree that reducing the size of the DB
> is of course the best medicine. However, this specific piece of the
> machinery is involved in upgrades, and upgrades are unfortunately
> required to move our shared user base forward to the new promised
> land
> of software that produces smaller databases :)

Yep. Maybe some of this can be backported to help ease that upgrade
process.

Dan

> 
> > > I recently raised a discussion about this on the list to figure
> > > out
> > > possible paths forward [0][1].
> > > 
> > > 0:
> > > https://mail.openvswitch.org/pipermail/ovs-discuss/2022-December/052140.html
> > > 1: https://bugs.launchpad.net/bugs/1999605
> 

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH v4 2/2] openflow: Add extension to flush CT by generic match

2022-12-15 Thread Ales Musil
Add extension that allows to flush connections from CT
by specifying fields that the connections should be
matched against. This allows to match only some fields
of the connection e.g. source address for orig direrction.

Reported-at: https://bugzilla.redhat.com/2120546
Signed-off-by: Ales Musil 
---
v4: Allow ovs-ofctl flush/conntrack without any zone/tuple.
v3: Rebase on top of master.
v2: Rebase on top of master.
Use suggestion from Ilya.
---
 NEWS   |   3 +
 include/openflow/nicira-ext.h  |  30 +++
 include/openvswitch/ofp-msgs.h |   4 +
 include/openvswitch/ofp-util.h |   4 +
 lib/ofp-bundle.c   |   1 +
 lib/ofp-ct-util.c  | 146 +
 lib/ofp-ct-util.h  |   9 ++
 lib/ofp-print.c|  20 +
 lib/ofp-util.c |  25 ++
 lib/rconn.c|   1 +
 ofproto/ofproto-dpif.c |   8 +-
 ofproto/ofproto-provider.h |   7 +-
 ofproto/ofproto.c  |  30 ++-
 tests/ofp-print.at |  93 +
 tests/ovs-ofctl.at |  26 ++
 tests/system-traffic.at| 116 ++
 utilities/ovs-ofctl.c  |  38 +
 17 files changed, 503 insertions(+), 58 deletions(-)

diff --git a/NEWS b/NEWS
index ff8904b02..46b8faa41 100644
--- a/NEWS
+++ b/NEWS
@@ -16,6 +16,9 @@ Post-v3.0.0
  by specifying 'max-rate' or '[r]stp-path-cost' accordingly.
- ovs-dpctl and related ovs-appctl commands:
  * "flush-conntrack" is capable of handling partial 5-tuple.
+   - OpenFlow:
+  * New OpenFlow extension NXT_CT_FLUSH to flush connections matching
+the specified fields.
 
 
 v3.0.0 - 15 Aug 2022
diff --git a/include/openflow/nicira-ext.h b/include/openflow/nicira-ext.h
index b68804991..32ce56d31 100644
--- a/include/openflow/nicira-ext.h
+++ b/include/openflow/nicira-ext.h
@@ -1064,4 +1064,34 @@ struct nx_zone_id {
 };
 OFP_ASSERT(sizeof(struct nx_zone_id) == 8);
 
+/* CT flush available TLVs. */
+enum nx_ct_flush_tlv_type {
+/* Outer types. */
+NXT_CT_ORIG_DIRECTION,/* CT orig direction outer type. */
+NXT_CT_REPLY_DIRECTION,   /* CT reply direction outer type. */
+
+/* Nested types. */
+NXT_CT_SRC,   /* CT source IPv6 or mapped IPv4 address. */
+NXT_CT_DST,   /* CT destination IPv6 or mapped IPv4 address. */
+NXT_CT_SRC_PORT,  /* CT source port. */
+NXT_CT_DST_PORT,  /* CT destination port. */
+NXT_CT_ICMP_ID,   /* CT ICMP id. */
+NXT_CT_ICMP_TYPE, /* CT ICMP type. */
+NXT_CT_ICMP_CODE, /* CT ICMP code. */
+
+/* Primitive types. */
+NXT_CT_ZONE_ID,   /* CT zone id. */
+};
+
+/* NXT_CT_FLUSH.
+ *
+ * Flushes the connection tracking specified by 5-tuple.
+ * The struct should be followed by TLVs specifying the matching parameters. */
+struct nx_ct_flush {
+uint8_t ip_proto;  /* IP protocol. */
+uint8_t family;/* L3 address family. */
+uint8_t zero[6];   /* Must be zero. */
+};
+OFP_ASSERT(sizeof(struct nx_ct_flush) == 8);
+
 #endif /* openflow/nicira-ext.h */
diff --git a/include/openvswitch/ofp-msgs.h b/include/openvswitch/ofp-msgs.h
index 921a937e5..659b0a3e7 100644
--- a/include/openvswitch/ofp-msgs.h
+++ b/include/openvswitch/ofp-msgs.h
@@ -526,6 +526,9 @@ enum ofpraw {
 
 /* NXST 1.0+ (4): struct nx_ipfix_stats_reply[]. */
 OFPRAW_NXST_IPFIX_FLOW_REPLY,
+
+/* NXT 1.0+ (32): struct nx_ct_flush, uint8_t[8][]. */
+OFPRAW_NXT_CT_FLUSH,
 };
 
 /* Decoding messages into OFPRAW_* values. */
@@ -772,6 +775,7 @@ enum ofptype {
 OFPTYPE_IPFIX_FLOW_STATS_REQUEST, /* OFPRAW_NXST_IPFIX_FLOW_REQUEST */
 OFPTYPE_IPFIX_FLOW_STATS_REPLY,   /* OFPRAW_NXST_IPFIX_FLOW_REPLY */
 OFPTYPE_CT_FLUSH_ZONE,/* OFPRAW_NXT_CT_FLUSH_ZONE. */
+OFPTYPE_CT_FLUSH,   /* OFPRAW_NXT_CT_FLUSH. */
 
 /* Flow monitor extension. */
 OFPTYPE_FLOW_MONITOR_CANCEL,  /* OFPRAW_NXT_FLOW_MONITOR_CANCEL.
diff --git a/include/openvswitch/ofp-util.h b/include/openvswitch/ofp-util.h
index 84937ae26..e10d90b9f 100644
--- a/include/openvswitch/ofp-util.h
+++ b/include/openvswitch/ofp-util.h
@@ -65,6 +65,10 @@ struct ofpbuf *ofputil_encode_echo_reply(const struct 
ofp_header *);
 
 struct ofpbuf *ofputil_encode_barrier_request(enum ofp_version);
 
+struct ofpbuf *ofputil_ct_match_encode(const struct ofputil_ct_match *match,
+   uint16_t *zone_id,
+   enum ofp_version version);
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/lib/ofp-bundle.c b/lib/ofp-bundle.c
index 0161c2bc6..941a8370e 100644
--- a/lib/ofp-bundle.c
+++ b/lib/ofp-bundle.c
@@ -292,6 +292,7 @@ ofputil_is_bundlable(enum ofptype type)
 case OFPTYPE_IPFIX_FLOW_STATS_REQUEST:
 case OFPTYPE_IPFIX_FLOW_STATS_REPLY:
 case OFPTYPE_CT_FLUSH_ZONE:
+case OFPTYPE_CT_FLUSH:
 

[ovs-dev] [PATCH v4 1/2] ofp, dpif: Allow CT flush based on partial match

2022-12-15 Thread Ales Musil
Currently, the CT can be flushed by dpctl only be specifying
the whole 5-tuple. This is not very convenient when there are
only some fields known to the user of CT flush. Add new struct
ofputil_ct_match which represents the generic filtering that can
be done for CT flush. The match is done only on fields that are
non-zero with exception to the icmp fields.

This allows the filtering just within dpctl, however
it is a preparation for OpenFlow extension.

Reported-at: https://bugzilla.redhat.com/2120546
Signed-off-by: Ales Musil 
---
v4: Fix a flush all scenario.
v3: Rebase on top of master.
Address the C99 comment and missing dpif_close call.
v2: Rebase on top of master.
Address comments from Paolo.
---
 NEWS   |   2 +
 include/openvswitch/ofp-util.h |  28 +++
 lib/automake.mk|   2 +
 lib/ct-dpif.c  | 208 +
 lib/ct-dpif.h  |   4 +-
 lib/dpctl.c|  42 ++---
 lib/dpctl.man  |   3 +-
 lib/ofp-ct-util.c  | 326 +
 lib/ofp-ct-util.h  |  36 
 tests/system-traffic.at|  82 -
 10 files changed, 586 insertions(+), 147 deletions(-)
 create mode 100644 lib/ofp-ct-util.c
 create mode 100644 lib/ofp-ct-util.h

diff --git a/NEWS b/NEWS
index 265375e1c..ff8904b02 100644
--- a/NEWS
+++ b/NEWS
@@ -14,6 +14,8 @@ Post-v3.0.0
  10 Gbps link speed by default in case the actual link speed cannot be
  determined.  Previously it was 10 Mbps.  Values can still be overridden
  by specifying 'max-rate' or '[r]stp-path-cost' accordingly.
+   - ovs-dpctl and related ovs-appctl commands:
+ * "flush-conntrack" is capable of handling partial 5-tuple.
 
 
 v3.0.0 - 15 Aug 2022
diff --git a/include/openvswitch/ofp-util.h b/include/openvswitch/ofp-util.h
index 091a09cad..84937ae26 100644
--- a/include/openvswitch/ofp-util.h
+++ b/include/openvswitch/ofp-util.h
@@ -19,6 +19,9 @@
 
 #include 
 #include 
+#include 
+#include 
+
 #include "openvswitch/ofp-protocol.h"
 
 struct ofp_header;
@@ -27,6 +30,31 @@ struct ofp_header;
 extern "C" {
 #endif
 
+struct ofputil_ct_tuple {
+struct in6_addr src;
+struct in6_addr dst;
+
+union {
+ovs_be16 src_port;
+ovs_be16 icmp_id;
+};
+union {
+ovs_be16 dst_port;
+struct {
+uint8_t icmp_code;
+uint8_t icmp_type;
+};
+};
+};
+
+struct ofputil_ct_match {
+uint8_t ip_proto;
+uint16_t l3_type;
+
+struct ofputil_ct_tuple tuple_orig;
+struct ofputil_ct_tuple tuple_reply;
+};
+
 bool ofputil_decode_hello(const struct ofp_header *,
   uint32_t *allowed_versions);
 struct ofpbuf *ofputil_encode_hello(uint32_t version_bitmap);
diff --git a/lib/automake.mk b/lib/automake.mk
index a0fabe38f..37135f118 100644
--- a/lib/automake.mk
+++ b/lib/automake.mk
@@ -227,6 +227,8 @@ lib_libopenvswitch_la_SOURCES = \
lib/ofp-actions.c \
lib/ofp-bundle.c \
lib/ofp-connection.c \
+   lib/ofp-ct-util.c \
+   lib/ofp-ct-util.h \
lib/ofp-ed-props.c \
lib/ofp-errors.c \
lib/ofp-flow.c \
diff --git a/lib/ct-dpif.c b/lib/ct-dpif.c
index 6f17a26b5..9b324ff79 100644
--- a/lib/ct-dpif.c
+++ b/lib/ct-dpif.c
@@ -20,6 +20,7 @@
 #include 
 
 #include "ct-dpif.h"
+#include "ofp-ct-util.h"
 #include "openvswitch/ofp-parse.h"
 #include "openvswitch/vlog.h"
 
@@ -71,6 +72,31 @@ ct_dpif_dump_start(struct dpif *dpif, struct 
ct_dpif_dump_state **dump,
 return err;
 }
 
+static void
+ct_dpif_tuple_from_ofputil_ct_tuple(const struct ofputil_ct_tuple *ofp_tuple,
+struct ct_dpif_tuple *tuple,
+uint16_t l3_type, uint8_t ip_proto)
+{
+if (l3_type == AF_INET) {
+tuple->src.ip = in6_addr_get_mapped_ipv4(_tuple->src);
+tuple->dst.ip = in6_addr_get_mapped_ipv4(_tuple->dst);
+} else {
+tuple->src.in6 = ofp_tuple->src;
+tuple->dst.in6 = ofp_tuple->dst;
+}
+
+tuple->l3_type = l3_type;
+tuple->ip_proto = ip_proto;
+tuple->src_port = ofp_tuple->src_port;
+
+if (ip_proto == IPPROTO_ICMP || ip_proto == IPPROTO_ICMPV6) {
+tuple->icmp_code = ofp_tuple->icmp_code;
+tuple->icmp_type = ofp_tuple->icmp_type;
+} else {
+tuple->dst_port = ofp_tuple->dst_port;
+}
+}
+
 /* Dump one connection from a tracker, and put it in 'entry'.
  *
  * 'dump' should have been initialized by ct_dpif_dump_start().
@@ -100,25 +126,77 @@ ct_dpif_dump_done(struct ct_dpif_dump_state *dump)
 ? dpif->dpif_class->ct_dump_done(dpif, dump)
 : EOPNOTSUPP);
 }
-
+
+static int
+ct_dpif_flush_tuple(struct dpif *dpif, const uint16_t *zone,
+const struct ofputil_ct_match *match) {
+struct ct_dpif_dump_state *dump;
+struct ct_dpif_entry cte;
+int error;
+int tot_bkts;
+
+if 

[ovs-dev] [PATCH v4 0/2] Add extension for partial CT flush

2022-12-15 Thread Ales Musil
Add extension that will allow to flush
connection from CT specified by full
orig 5-tuple or partial match that combines
orig or reply direction.

Ales Musil (2):
  ofp, dpif: Allow CT flush based on partial match
  openflow: Add extension to flush CT by generic match

 NEWS   |   5 +
 include/openflow/nicira-ext.h  |  30 +++
 include/openvswitch/ofp-msgs.h |   4 +
 include/openvswitch/ofp-util.h |  32 +++
 lib/automake.mk|   2 +
 lib/ct-dpif.c  | 208 +++
 lib/ct-dpif.h  |   4 +-
 lib/dpctl.c|  42 ++-
 lib/dpctl.man  |   3 +-
 lib/ofp-bundle.c   |   1 +
 lib/ofp-ct-util.c  | 472 +
 lib/ofp-ct-util.h  |  45 
 lib/ofp-print.c|  20 ++
 lib/ofp-util.c |  25 ++
 lib/rconn.c|   1 +
 ofproto/ofproto-dpif.c |   8 +-
 ofproto/ofproto-provider.h |   7 +-
 ofproto/ofproto.c  |  30 ++-
 tests/ofp-print.at |  93 +++
 tests/ovs-ofctl.at |  26 ++
 tests/system-traffic.at| 132 +++--
 utilities/ovs-ofctl.c  |  38 +++
 22 files changed, 1056 insertions(+), 172 deletions(-)
 create mode 100644 lib/ofp-ct-util.c
 create mode 100644 lib/ofp-ct-util.h

-- 
2.38.1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH] ovs-lib: accept optional timeout value for upgrade_cluster

2022-12-15 Thread Frode Nordahl
On Thu, Dec 15, 2022 at 7:16 PM Dan Williams  wrote:
> On Thu, 2022-12-15 at 19:04 +0100, Frode Nordahl wrote:

[ snip ]

> > Thank you for proposing this patch!
> >
> > We've been seeing reports of schema upgrades failing in the too and
> > have waited for some way to reproduce and see if this would be a fix.
> >
> > Are you seeing this with clustered databases, and could your problem
> > be related to the election timer? If it is, raising the client side
> > timer alone could be problematic.
>
> Yes clustered.
>
> A 450MB LB-heavy database generated by ovn-kubernetes with OVN 22.06
> (which lacks DP groups for SB load balancers) was being upgraded from
> OVN 22.06 -> 22.09 (but using same OVS 2.17 version, so no change to
> ovsdb-server) and when the ovsdb-servers got restarted as part of the
> ovn-kube container upgrade, they took longer to read+parse the database
> than the 30s upgrade_cluster() timer, thus the container failed and was
> put in CrashLoopBackoff by Kubernetes.
>
> ovn-ctl was never able to update the schema, and thus 22.09 ovn-northd
> was never able to reduce the DB size by rewriting it to use DP groups
> for SB LBs and recover.
>
> This patch is really a workaround and needs a corresponding ovn-ctl
> patch to accept a timeout for the NB/SB DB start functions that our
> OpenShift container scripts would pass.

Right, so you plan to couple the value of the ovsdb-client timeout to
the value of the ovsdb-server election timer, that makes sense.

> The real fix is, like Ilya suggests, "reduce the size of the DB" as
> we've found that the most effective scale strategy for OpenShift and
> ovn-kubernetes. And I think that strategy has paid off tremendously
> over the last 2 years we've been working on OVN & ovsdb-server scale.
>
> Huge credit to Ilya and the OVN team for making that happen...

Hear hear, wonders has been worked on in many of the OVS and OVN
components over the past years, my hat off to everyone involved.

The best place to continue the discussion would probably be in the
thread below, but in short, I agree that reducing the size of the DB
is of course the best medicine. However, this specific piece of the
machinery is involved in upgrades, and upgrades are unfortunately
required to move our shared user base forward to the new promised land
of software that produces smaller databases :)

> > I recently raised a discussion about this on the list to figure out
> > possible paths forward [0][1].
> >
> > 0:
> > https://mail.openvswitch.org/pipermail/ovs-discuss/2022-December/052140.html
> > 1: https://bugs.launchpad.net/bugs/1999605

-- 
Frode Nordahl
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v2] rhel: move conf.db to /var/lib/openvswitch, using symlinks

2022-12-15 Thread Ilya Maximets
On 12/15/22 15:38, Timothy Redaelli wrote:
> On Tue, 13 Dec 2022 12:57:08 +0100
> Ilya Maximets  wrote:
> 
>> On 12/5/22 15:36, Ilya Maximets wrote:
>>> On 12/4/22 09:23, Roi Dayan wrote:


 On 30/11/2022 17:55, Ilya Maximets wrote:
> On 11/14/22 20:41, Timothy Redaelli wrote:
>> conf.db is by default at /etc/openvswitch, but it should be at
>> /var/lib/openvswitch like on Debian or like ovnnb_db.db and ovnsb_db.db.
>>
>> If conf.db already exists in /etc/openvswitch then it's moved to
>> /var/lib/openvswitch.
>> Symlinks are created for conf.db and .conf.db.~lock~ into 
>> /etc/openvswitch
>> for backward compatibility.
>>
>> Reported-at: 
>> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fbugzilla.redhat.com%2F1830857data=05%7C01%7Croid%40nvidia.com%7Cd69116141ff645fc2c7308dad2eb4612%7C43083d15727340c1b7db39efd9ccc17a%7C0%7C0%7C638054205222362304%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7Csdata=%2BIcIVZBKrfhIpq%2B6r6I3QvjdZ9KvjLsrRSlvi9kFHzc%3Dreserved=0
>> Reported-by: Yedidyah Bar David 
>> Signed-off-by: Timothy Redaelli 
>> ---
>> v1 -> v2:
>> - Use hugetlbfs group instead of openvswitch when the package is built
>>   with dpdk (as reported by Flavio)
>> ---
>>  rhel/openvswitch-fedora.spec.in | 27 +++
>>  1 file changed, 23 insertions(+), 4 deletions(-)
>
> If that works for Fedora, then LGTM.  Applied.
>
> Thanks!
> Best regards, Ilya Maximets.
> ___
> dev mailing list
> d...@openvswitch.org
> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fmail.openvswitch.org%2Fmailman%2Flistinfo%2Fovs-devdata=05%7C01%7Croid%40nvidia.com%7Cd69116141ff645fc2c7308dad2eb4612%7C43083d15727340c1b7db39efd9ccc17a%7C0%7C0%7C638054205222362304%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7Csdata=fZZh4iYeUu%2BL2%2F%2FWTIgPNzpvfhpe%2F9MANkVPLmv57aY%3Dreserved=0


 hi,

 This commit expose some kind of issue and cause openvswitch not
 to start on clean systems.

 If old conf.db file didn't exists it creates an empty conf.db with
 the touch command.
 Empty conf.db cause ovsdb-server not to start.

 #  /usr/share/openvswitch/scripts/ovs-ctl start
 ovsdb-tool: ovsdb error: /etc/openvswitch/conf.db: cannot identify file 
 type
 Starting ovsdb-server ovsdb-server: ovsdb error: /etc/openvswitch/conf.db: 
 cannot identify file type
[FAILED]

 If I remove the conf.db file (can leave the symbolic link in /etc)
 then ovs starts fine.
 # rm /var/lib/openvswitch/conf.db
 #  /usr/share/openvswitch/scripts/ovs-ctl start
 /etc/openvswitch/conf.db does not exist ... (warning).
 Creating empty database /etc/openvswitch/conf.db   [  OK  ]
 Starting ovsdb-server  [  OK  ]
 system ID not configured, please use --system-id ... failed!
 Configuring Open vSwitch system IDs[  OK  ]
 Starting ovs-vswitchd  [  OK  ]
 Enabling remote OVSDB managers [  OK  ]


 I'm not sure where it's better to fix this. either the spec here
 not to create an empty file or in ovsdb/log.c to an accept empty conf.db,
 or maybe even upgrade_db() in ovs-lib bash file to call create_db
 even if conf.db exists but it's empty.
>>>
>>> Thanks, Roi, for the report!
>>> I think, fixing the spec should be the right approach here.
>>
>> Hi, Timothy.  Do you plan to work on the fix for this issue?
> 
> Yes sure, I'll do that today or tomorrow
> 
>> Otherwise we may just revert the change for now until the proper
>> fix is available.  Thoughts?
> 
> I prefer just to fix it, without the revert, if it's possible

If you know how to fix that, than sure.  I tried to quickly come up
with a fix before sending a revert patch, but I wasn't successful.

Here is what I tried:

diff --git a/rhel/openvswitch-fedora.spec.in b/rhel/openvswitch-fedora.spec.in
index 4a3e6294b..ebb6a46ed 100644
--- a/rhel/openvswitch-fedora.spec.in
+++ b/rhel/openvswitch-fedora.spec.in
@@ -339,11 +339,10 @@ for base in conf.db .conf.db.~lock~; do
 if test ! -e $old && test ! -h $old; then
 ln -s $new $old
 fi
-touch $new
 %if %{with dpdk}
-chown openvswitch:hugetlbfs $new
+chown -h openvswitch:hugetlbfs $old
 %else
-chown openvswitch:openvswitch $new
+chown -h openvswitch:openvswitch $old
 %endif
 done
 
---

With that the ovsdb-server service still fails to start on my system.

Best regards, Ilya Maximets.
___
dev mailing list
d...@openvswitch.org

Re: [ovs-dev] [PATCH] ovs-lib: accept optional timeout value for upgrade_cluster

2022-12-15 Thread Dan Williams
On Thu, 2022-12-15 at 19:04 +0100, Frode Nordahl wrote:
> Hello, Dan,
> 
> On Thu, Dec 15, 2022 at 6:57 PM Dan Williams  wrote:
> > 
> > Signed-off-by: Dan Williams 
> > ---
> >  utilities/ovs-lib.in | 13 +
> >  1 file changed, 9 insertions(+), 4 deletions(-)
> > 
> > diff --git a/utilities/ovs-lib.in b/utilities/ovs-lib.in
> > index 13477a6a9e991..36e63312b6cba 100644
> > --- a/utilities/ovs-lib.in
> > +++ b/utilities/ovs-lib.in
> > @@ -475,11 +475,16 @@ upgrade_db () {
> >  }
> > 
> >  upgrade_cluster () {
> > -    local DB_SCHEMA=$1 DB_SERVER=$2
> > +    local DB_SCHEMA=$1 DB_SERVER=$2 TIMEOUT_SECONDS=$3
> >  local schema_name=$(ovsdb-tool schema-name $1) || return 1
> > 
> > -    action "Waiting for $schema_name to come up" ovsdb-client -t
> > 30 wait "$DB_SERVER" "$schema_name" connected || return $?
> > -    local db_version=$(ovsdb-client -t 10 get-schema-version
> > "$DB_SERVER" "$schema_name") || return $?
> > +    timeout_arg=30
> > +    if [ -n "$TIMEOUT_SECONDS" ]; then
> > +  timeout_arg="$TIMEOUT_SECONDS"
> > +    fi
> > +
> > +    action "Waiting for $schema_name to come up" ovsdb-client -t
> > $timeout_arg wait "$DB_SERVER" "$schema_name" connected || return
> > $?
> > +    local db_version=$(ovsdb-client -t $timeout_arg get-schema-
> > version "$DB_SERVER" "$schema_name") || return $?
> >  local target_version=$(ovsdb-tool schema-version "$DB_SCHEMA")
> > || return $?
> > 
> >  if ovsdb-tool compare-versions "$db_version" ==
> > "$target_version"; then
> > @@ -487,7 +492,7 @@ upgrade_cluster () {
> >  elif ovsdb-tool compare-versions "$db_version" ">"
> > "$target_version"; then
> >  log_warning_msg "Database $schema_name has newer schema
> > version ($db_version) than our local schema ($target_version),
> > possibly an upgrade is partially complete?"
> >  else
> > -    action "Upgrading database $schema_name from schema
> > version $db_version to $target_version" ovsdb-client -t 30 convert
> > "$DB_SERVER" "$DB_SCHEMA"
> > +    action "Upgrading database $schema_name from schema
> > version $db_version to $target_version" ovsdb-client -t
> > $timeout_arg convert "$DB_SERVER" "$DB_SCHEMA"
> >  fi
> >  }
> > 
> > --
> > 2.38.1
> > 
> > ___
> > dev mailing list
> > d...@openvswitch.org
> > https://mail.openvswitch.org/mailman/listinfo/ovs-dev
> 
> Thank you for proposing this patch!
> 
> We've been seeing reports of schema upgrades failing in the too and
> have waited for some way to reproduce and see if this would be a fix.
> 
> Are you seeing this with clustered databases, and could your problem
> be related to the election timer? If it is, raising the client side
> timer alone could be problematic.

Yes clustered.

A 450MB LB-heavy database generated by ovn-kubernetes with OVN 22.06
(which lacks DP groups for SB load balancers) was being upgraded from
OVN 22.06 -> 22.09 (but using same OVS 2.17 version, so no change to
ovsdb-server) and when the ovsdb-servers got restarted as part of the
ovn-kube container upgrade, they took longer to read+parse the database
than the 30s upgrade_cluster() timer, thus the container failed and was
put in CrashLoopBackoff by Kubernetes.

ovn-ctl was never able to update the schema, and thus 22.09 ovn-northd
was never able to reduce the DB size by rewriting it to use DP groups
for SB LBs and recover.

This patch is really a workaround and needs a corresponding ovn-ctl
patch to accept a timeout for the NB/SB DB start functions that our
OpenShift container scripts would pass.

The real fix is, like Ilya suggests, "reduce the size of the DB" as
we've found that the most effective scale strategy for OpenShift and
ovn-kubernetes. And I think that strategy has paid off tremendously
over the last 2 years we've been working on OVN & ovsdb-server scale.

Huge credit to Ilya and the OVN team for making that happen...

Dan


> 
> I recently raised a discussion about this on the list to figure out
> possible paths forward [0][1].
> 
> 0:
> https://mail.openvswitch.org/pipermail/ovs-discuss/2022-December/052140.html
> 1: https://bugs.launchpad.net/bugs/1999605
> 

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH] ovs-lib: accept optional timeout value for upgrade_cluster

2022-12-15 Thread Frode Nordahl
Hello, Dan,

On Thu, Dec 15, 2022 at 6:57 PM Dan Williams  wrote:
>
> Signed-off-by: Dan Williams 
> ---
>  utilities/ovs-lib.in | 13 +
>  1 file changed, 9 insertions(+), 4 deletions(-)
>
> diff --git a/utilities/ovs-lib.in b/utilities/ovs-lib.in
> index 13477a6a9e991..36e63312b6cba 100644
> --- a/utilities/ovs-lib.in
> +++ b/utilities/ovs-lib.in
> @@ -475,11 +475,16 @@ upgrade_db () {
>  }
>
>  upgrade_cluster () {
> -local DB_SCHEMA=$1 DB_SERVER=$2
> +local DB_SCHEMA=$1 DB_SERVER=$2 TIMEOUT_SECONDS=$3
>  local schema_name=$(ovsdb-tool schema-name $1) || return 1
>
> -action "Waiting for $schema_name to come up" ovsdb-client -t 30 wait 
> "$DB_SERVER" "$schema_name" connected || return $?
> -local db_version=$(ovsdb-client -t 10 get-schema-version "$DB_SERVER" 
> "$schema_name") || return $?
> +timeout_arg=30
> +if [ -n "$TIMEOUT_SECONDS" ]; then
> +  timeout_arg="$TIMEOUT_SECONDS"
> +fi
> +
> +action "Waiting for $schema_name to come up" ovsdb-client -t 
> $timeout_arg wait "$DB_SERVER" "$schema_name" connected || return $?
> +local db_version=$(ovsdb-client -t $timeout_arg get-schema-version 
> "$DB_SERVER" "$schema_name") || return $?
>  local target_version=$(ovsdb-tool schema-version "$DB_SCHEMA") || return 
> $?
>
>  if ovsdb-tool compare-versions "$db_version" == "$target_version"; then
> @@ -487,7 +492,7 @@ upgrade_cluster () {
>  elif ovsdb-tool compare-versions "$db_version" ">" "$target_version"; 
> then
>  log_warning_msg "Database $schema_name has newer schema version 
> ($db_version) than our local schema ($target_version), possibly an upgrade is 
> partially complete?"
>  else
> -action "Upgrading database $schema_name from schema version 
> $db_version to $target_version" ovsdb-client -t 30 convert "$DB_SERVER" 
> "$DB_SCHEMA"
> +action "Upgrading database $schema_name from schema version 
> $db_version to $target_version" ovsdb-client -t $timeout_arg convert 
> "$DB_SERVER" "$DB_SCHEMA"
>  fi
>  }
>
> --
> 2.38.1
>
> ___
> dev mailing list
> d...@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Thank you for proposing this patch!

We've been seeing reports of schema upgrades failing in the too and
have waited for some way to reproduce and see if this would be a fix.

Are you seeing this with clustered databases, and could your problem
be related to the election timer? If it is, raising the client side
timer alone could be problematic.

I recently raised a discussion about this on the list to figure out
possible paths forward [0][1].

0: https://mail.openvswitch.org/pipermail/ovs-discuss/2022-December/052140.html
1: https://bugs.launchpad.net/bugs/1999605

-- 
Frode Nordahl
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH] ovs-lib: accept optional timeout value for upgrade_cluster

2022-12-15 Thread Dan Williams
Signed-off-by: Dan Williams 
---
 utilities/ovs-lib.in | 13 +
 1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/utilities/ovs-lib.in b/utilities/ovs-lib.in
index 13477a6a9e991..36e63312b6cba 100644
--- a/utilities/ovs-lib.in
+++ b/utilities/ovs-lib.in
@@ -475,11 +475,16 @@ upgrade_db () {
 }
 
 upgrade_cluster () {
-local DB_SCHEMA=$1 DB_SERVER=$2
+local DB_SCHEMA=$1 DB_SERVER=$2 TIMEOUT_SECONDS=$3
 local schema_name=$(ovsdb-tool schema-name $1) || return 1
 
-action "Waiting for $schema_name to come up" ovsdb-client -t 30 wait 
"$DB_SERVER" "$schema_name" connected || return $?
-local db_version=$(ovsdb-client -t 10 get-schema-version "$DB_SERVER" 
"$schema_name") || return $?
+timeout_arg=30
+if [ -n "$TIMEOUT_SECONDS" ]; then
+  timeout_arg="$TIMEOUT_SECONDS"
+fi
+
+action "Waiting for $schema_name to come up" ovsdb-client -t $timeout_arg 
wait "$DB_SERVER" "$schema_name" connected || return $?
+local db_version=$(ovsdb-client -t $timeout_arg get-schema-version 
"$DB_SERVER" "$schema_name") || return $?
 local target_version=$(ovsdb-tool schema-version "$DB_SCHEMA") || return $?
 
 if ovsdb-tool compare-versions "$db_version" == "$target_version"; then
@@ -487,7 +492,7 @@ upgrade_cluster () {
 elif ovsdb-tool compare-versions "$db_version" ">" "$target_version"; then
 log_warning_msg "Database $schema_name has newer schema version 
($db_version) than our local schema ($target_version), possibly an upgrade is 
partially complete?"
 else
-action "Upgrading database $schema_name from schema version 
$db_version to $target_version" ovsdb-client -t 30 convert "$DB_SERVER" 
"$DB_SCHEMA"
+action "Upgrading database $schema_name from schema version 
$db_version to $target_version" ovsdb-client -t $timeout_arg convert 
"$DB_SERVER" "$DB_SCHEMA"
 fi
 }
 
-- 
2.38.1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH ovn] Add IPv6 support for lb health-check

2022-12-15 Thread Numan Siddique
On Tue, Nov 22, 2022 at 12:54 PM Lorenzo Bianconi
 wrote:
>
> Add Similar to IPv4 counterpart, introduce IPv6 load-balancer health
> check support.
>
> Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=2136094
> Signed-off-by: Lorenzo Bianconi 

Hi Lorenzo,

Thanks for adding this missing feature.  Please see a few comments below.


> ---
>  controller/pinctrl.c| 213 -
>  northd/northd.c |  77 ++
>  northd/ovn-northd.8.xml |  17 +++
>  ovn-nb.xml  |   8 +-
>  tests/ovn.at| 201 ++-
>  tests/system-ovn.at | 230 +++-
>  6 files changed, 647 insertions(+), 99 deletions(-)
>
> diff --git a/controller/pinctrl.c b/controller/pinctrl.c
> index f44775c7e..89c2f207b 100644
> --- a/controller/pinctrl.c
> +++ b/controller/pinctrl.c
> @@ -6676,7 +6676,7 @@ sync_svc_monitors(struct ovsdb_idl_txn *ovnsb_idl_txn,
>  ovs_be32 ip4;
>  if (ip_parse(sb_svc_mon->ip, )) {
>  ip_addr = in6_addr_mapped_ipv4(ip4);
> -} else {
> +} else if (!ipv6_parse(sb_svc_mon->ip, _addr)) {
>  continue;
>  }
>
> @@ -6689,16 +6689,27 @@ sync_svc_monitors(struct ovsdb_idl_txn *ovnsb_idl_txn,
>  continue;
>  }
>
> -for (size_t j = 0; j < laddrs.n_ipv4_addrs; j++) {
> -if (ip4 == laddrs.ipv4_addrs[j].addr) {
> -ea = laddrs.ea;
> -mac_found = true;
> -break;
> +if (IN6_IS_ADDR_V4MAPPED(_addr)) {
> +for (size_t j = 0; j < laddrs.n_ipv4_addrs; j++) {
> +if (ip4 == laddrs.ipv4_addrs[j].addr) {
> +ea = laddrs.ea;
> +mac_found = true;
> +break;
> +}
> +}
> +} else {
> +for (size_t j = 0; j < laddrs.n_ipv6_addrs; j++) {
> +if (!memcmp(_addr, _addrs[j].addr,
> +sizeof(ovs_be32[4]))) {

It's a bit odd to use size of ovs_be32 here since we are memcmping
'struct in6_addr'.

I'd suggest using the macro - IN6_ARE_ADDR_EQUAL here.


> +ea = laddrs.ea;
> +mac_found = true;
> +break;
> +}
>  }
>  }
>
> -if (!mac_found && !laddrs.n_ipv4_addrs) {
> -/* IPv4 address(es) are not configured. Use the first mac. */
> +if (!mac_found && !laddrs.n_ipv4_addrs && !laddrs.n_ipv6_addrs) {
> +/* IP address(es) are not configured. Use the first mac. */
>  ea = laddrs.ea;
>  mac_found = true;
>  }
> @@ -6732,7 +6743,7 @@ sync_svc_monitors(struct ovsdb_idl_txn *ovnsb_idl_txn,
>  svc_mon->port_key = port_key;
>  svc_mon->proto_port = sb_svc_mon->port;
>  svc_mon->ip = ip_addr;
> -svc_mon->is_ip6 = false;
> +svc_mon->is_ip6 = !IN6_IS_ADDR_V4MAPPED(_addr);
>  svc_mon->state = SVC_MON_S_INIT;
>  svc_mon->status = SVC_MON_ST_UNKNOWN;
>  svc_mon->protocol = protocol;
> @@ -7500,26 +7511,30 @@ svc_monitor_send_tcp_health_check__(struct rconn 
> *swconn,
>  ovs_be32 tcp_ack,
>  ovs_be16 tcp_src)
>  {
> -if (svc_mon->is_ip6) {
> -return;
> -}
> -
>  /* Compose a TCP-SYN packet. */
>  uint64_t packet_stub[128 / 8];
>  struct dp_packet packet;
> +dp_packet_use_stub(, packet_stub, sizeof packet_stub);
>
>  struct eth_addr eth_src;
>  eth_addr_from_string(svc_mon->sb_svc_mon->src_mac, _src);
> -ovs_be32 ip4_src;
> -ip_parse(svc_mon->sb_svc_mon->src_ip, _src);
> -
> -dp_packet_use_stub(, packet_stub, sizeof packet_stub);
> -pinctrl_compose_ipv4(, eth_src, svc_mon->ea,
> - ip4_src, in6_addr_get_mapped_ipv4(_mon->ip),
> - IPPROTO_TCP, 63, TCP_HEADER_LEN);
> +if (svc_mon->is_ip6) {
> +struct in6_addr ip6_src;
> +ipv6_parse(svc_mon->sb_svc_mon->src_ip, _src);
> +pinctrl_compose_ipv6(, eth_src, svc_mon->ea,
> + _src, _mon->ip, IPPROTO_TCP,
> + 63, TCP_HEADER_LEN);
> +} else {
> +ovs_be32 ip4_src;
> +ip_parse(svc_mon->sb_svc_mon->src_ip, _src);
> +pinctrl_compose_ipv4(, eth_src, svc_mon->ea,
> + ip4_src, in6_addr_get_mapped_ipv4(_mon->ip),
> + IPPROTO_TCP, 63, TCP_HEADER_LEN);
> +}
>
>  struct tcp_header *th = dp_packet_l4();
>  dp_packet_set_l4(, th);
> +th->tcp_csum = 0;
>  th->tcp_dst = htons(svc_mon->proto_port);
>  th->tcp_src = tcp_src;
>
> @@ -7530,7 +7545,11 @@ 

Re: [ovs-dev] [PATCH] dpif-netdev: Load based PMD sleeping.

2022-12-15 Thread Kevin Traynor

On 15/12/2022 01:12, Thilak Raj Surendra Babu wrote:

Hi Kevin,
Apologize for the late reply.
Please find my replies inline.



Hi Thilak,

Thanks for your comments. Replies below.

Just a small thing on email format. If possible, could you send replies 
inline (i.e. not top-post and original mail line have ">") or if not, 
mark your comments with your name. This is just to make it easier for 
anyone else reading the thread to know who is saying what!


thanks,
Kevin.


Thanks
Thilak Raj S

-Original Message-
From: Kevin Traynor 
Sent: 08 December 2022 06:39
To: Thilak Raj Surendra Babu ; d...@openvswitch.org
Cc: david.march...@redhat.com; cfont...@redhat.com; cian.ferri...@intel.com
Subject: Re: [PATCH] dpif-netdev: Load based PMD sleeping.

On 07/12/2022 09:28, Thilak Raj Surendra Babu wrote:

Hi Kevin,
Thank you for the patch. It is pretty clean for power saving.


Hi Thilak, Thank you for testing.


I tried the patch below is what I see

Ping test:
Ping from physical machine to br0 ip-address.

Poll:
[root@dargil-4 pktgen-dpdk]# ping 10.15.251.33 -i 0.001 -c 10 PING
10.15.251.33 (10.15.251.33) 56(84) bytes of data.
64 bytes from 10.15.251.33: icmp_seq=1 ttl=64 time=0.081 ms
64 bytes from 10.15.251.33: icmp_seq=2 ttl=64 time=0.061 ms
64 bytes from 10.15.251.33: icmp_seq=3 ttl=64 time=0.050 ms
64 bytes from 10.15.251.33: icmp_seq=4 ttl=64 time=0.049 ms
64 bytes from 10.15.251.33: icmp_seq=5 ttl=64 time=0.048 ms
64 bytes from 10.15.251.33: icmp_seq=6 ttl=64 time=0.048 ms
64 bytes from 10.15.251.33: icmp_seq=7 ttl=64 time=0.047 ms
64 bytes from 10.15.251.33: icmp_seq=8 ttl=64 time=0.047 ms
64 bytes from 10.15.251.33: icmp_seq=9 ttl=64 time=0.047 ms
64 bytes from 10.15.251.33: icmp_seq=10 ttl=64 time=0.049 ms

--- 10.15.251.33 ping statistics ---
10 packets transmitted, 10 received, 0% packet loss, time 9ms rtt
min/avg/max/mdev = 0.047/0.052/0.081/0.013 ms

With your changes:
[root@dargil-4 pktgen-dpdk]# ping 10.15.251.33 -i 0.001 -c 10 PING
10.15.251.33 (10.15.251.33) 56(84) bytes of data.
64 bytes from 10.15.251.33: icmp_seq=1 ttl=64 time=0.189 ms
64 bytes from 10.15.251.33: icmp_seq=2 ttl=64 time=0.367 ms
64 bytes from 10.15.251.33: icmp_seq=3 ttl=64 time=0.351 ms
64 bytes from 10.15.251.33: icmp_seq=4 ttl=64 time=0.315 ms
64 bytes from 10.15.251.33: icmp_seq=5 ttl=64 time=0.240 ms
64 bytes from 10.15.251.33: icmp_seq=6 ttl=64 time=0.204 ms
64 bytes from 10.15.251.33: icmp_seq=7 ttl=64 time=0.204 ms
64 bytes from 10.15.251.33: icmp_seq=8 ttl=64 time=0.138 ms
64 bytes from 10.15.251.33: icmp_seq=9 ttl=64 time=0.137 ms
64 bytes from 10.15.251.33: icmp_seq=10 ttl=64 time=0.417 ms

--- 10.15.251.33 ping statistics ---
10 packets transmitted, 10 received, 0% packet loss, time 9ms rtt
min/avg/max/mdev = 0.137/0.256/0.417/0.094 ms

At a high level, I was thinking of the below.
Please let me know what you think?

1. Can we have the option of making a particular pmd thread sleep instead of it 
being at the global level.
There are some interfaces for which the flows being serviced by them
are very sensitive towards latency(heartbeats)


To just have a per pmd setting means that the user would need to know which 
PMDs were being used for that port (could be multiple) and react to any changes 
in assignments or changes to PMD cores.

It also means that anyone who wants it on all PMD cores now needs to be aware 
of which PMDs are being used and track changes and then re-set to the new PMDs.

So I think per-pmd config as a replacement for a global one is too difficult 
for a user.

*If* a per-pmd functionality was needed a better option would be to keep the 
global pmd-powersave setting AND add a per-pmd no-sleep override (default 
false).

The expectation then would be if a user is concerned about the latency on a 
particular port they would be prepared to tune and pin that port Rx
queue(s) to a specific PMD and set that PMD to never sleep, regardless of the 
global pmd-powersave mode.

It preserves the easy to use general setting for users, but it gives a user who 
has special requirements about some interfaces and is willing to go deeper into 
tuning the extra functionality.

It still does introduce some extra complexity for user and a bit for code too 
so we'd need to know that it's needed.

Note, by default it would not have an impact on a global pmd-powersave config 
option. In this way a per-pmd override is an add-on functionality and could be 
done at a later time.

That is certainly something that could be added if users felt it is needed and 
we can plan to do it but I feel it would be better to bed down that the core 
operation of sleeping is useful/correct first, before adding on extra ways to 
specify on/off.

So you are suggesting an override at PMD level instead of having this option 
itself at PMD level.
ovs-vsctl set open_vswitch . other_config:pmd-powersave="true" and
ovs-vsctl set open_vswitch . other_config:pmd-powersave-no-sleep:



Yes, that's what i was thinking.


I was thinking, if 

[ovs-dev] [PATCH ovn v3 5/5] ic-sb schema: add index for routes table & document upgrade path

2022-12-15 Thread Vladislav Odintsov
Add new uniq index to OVN IC Southbound Route table to prevent same
routes installation.
Also document ovn-ic upgrade details & note new schema change to known
possible duplicate records, which can fail the schema convert.

Signed-off-by: Vladislav Odintsov 
---
 Documentation/intro/install/ovn-upgrades.rst | 20 
 NEWS |  4 
 ovn-ic-sb.ovsschema  |  6 --
 3 files changed, 28 insertions(+), 2 deletions(-)

diff --git a/Documentation/intro/install/ovn-upgrades.rst 
b/Documentation/intro/install/ovn-upgrades.rst
index 4c131987e..d0865ede8 100644
--- a/Documentation/intro/install/ovn-upgrades.rst
+++ b/Documentation/intro/install/ovn-upgrades.rst
@@ -75,6 +75,20 @@ or if you're using a Linux distribution with systemd::
 
 $ sudo systemctl restart ovn-northd
 
+In case your deployment utilizes OVN Interconnection (OVN IC) functionality,
+it is also needed to restart ovn-ic daemons and separately restart ovn-ic
+databases.
+
+You may perform this restart using the ovn-ctl script::
+
+$ sudo /usr/share/openvswitch/scripts/ovn-ctl restart_ic
+$ sudo /usr/share/openvswitch/scripts/ovn-ctl restart_ic_ovsdb
+
+or if you're using a Linux distribution with systemd::
+
+$ sudo systemctl restart ovn-ic
+$ sudo systemctl restart ovn-ic-db
+
 Schema Change
 ^
 
@@ -102,6 +116,12 @@ of known impactible schema changes and how to fix when 
error encountered.
 
 $ ovn-sbctl chassis-del 
 
+#. Release 22.12: index [transit_switch, availability_zone, route_table,
+   ip_prefix, nexthop] added for OVN Interconnection Southbound DB table Route.
+   If there are duplicated records in this table, users are adviced to upgrade
+   ovn-ic daemons in all availability zones first and after that convert OVS
+   schema (restart ovn-ic database daemon).
+
 
 Upgrading OVN Integration
 -
diff --git a/NEWS b/NEWS
index d46f04943..9e5aae6ce 100644
--- a/NEWS
+++ b/NEWS
@@ -11,6 +11,10 @@ OVN v22.12.0 - xx xxx 
   - Add support for component templates within logical flows and load
 balancers.
   - Add support for remote port mirroring (Experimental).
+  - Add new OVN IC Route table index. This index ensures no duplicate routes
+can be advertized.  When upgrading to this version user must ensure that
+all ovn-ic daemons in all availability zones are upgraded before ovn-ic SB
+database schema is converted.
 
 OVN v22.09.0 - 16 Sep 2022
 --
diff --git a/ovn-ic-sb.ovsschema b/ovn-ic-sb.ovsschema
index 72c9d3f3e..1d60b36d1 100644
--- a/ovn-ic-sb.ovsschema
+++ b/ovn-ic-sb.ovsschema
@@ -1,7 +1,7 @@
 {
 "name": "OVN_IC_Southbound",
-"version": "1.1.0",
-"cksum": "2309827842 6784",
+"version": "1.1.1",
+"cksum": "3684563024 6914",
 "tables": {
 "IC_SB_Global": {
 "columns": {
@@ -101,6 +101,8 @@
 "external_ids": {
 "type": {"key": "string", "value": "string",
  "min": 0, "max": "unlimited"}}},
+"indexes": [["transit_switch", "availability_zone", "route_table",
+ "ip_prefix", "nexthop"]],
 "isRoot": true},
 "Connection": {
 "columns": {
-- 
2.36.1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH ovn v3 4/5] ic: minor code improvements

2022-12-15 Thread Vladislav Odintsov
1. Remove excess nbrec_logical_router variable.
2. Remove excess call to add_static_to_routes_ad().
3. Remove double nexthop check in ic_route_find().

Signed-off-by: Vladislav Odintsov 
Acked-by: Dumitru Ceara 
---
 ic/ovn-ic.c | 21 +
 1 file changed, 5 insertions(+), 16 deletions(-)

diff --git a/ic/ovn-ic.c b/ic/ovn-ic.c
index 518796149..73ce77e5c 100644
--- a/ic/ovn-ic.c
+++ b/ic/ovn-ic.c
@@ -895,8 +895,7 @@ ic_route_find(struct hmap *routes, const struct in6_addr 
*prefix,
 r->plen == plen &&
 ipv6_addr_equals(>nexthop, nexthop) &&
 !strcmp(r->origin, origin) &&
-!strcmp(r->route_table ? r->route_table : "", route_table) &&
-ipv6_addr_equals(>nexthop, nexthop)) {
+!strcmp(r->route_table ? r->route_table : "", route_table)) {
 return r;
 }
 }
@@ -1131,17 +1130,8 @@ add_static_to_routes_ad(
 struct hmap *routes_ad,
 const struct nbrec_logical_router_static_route *nb_route,
 const struct lport_addresses *nexthop_addresses,
-const struct smap *nb_options, const char *route_table)
+const struct smap *nb_options)
 {
-if (strcmp(route_table, nb_route->route_table)) {
-if (VLOG_IS_DBG_ENABLED()) {
-VLOG_DBG("Skip advertising route %s -> %s as its route table %s !="
- " %s of TS port", nb_route->ip_prefix, nb_route->nexthop,
- nb_route->route_table, route_table);
-}
-return;
-}
-
 struct in6_addr prefix, nexthop;
 unsigned int plen;
 if (!parse_route(nb_route->ip_prefix, nb_route->nexthop,
@@ -1566,10 +1556,10 @@ build_ts_routes_to_adv(struct ic_context *ctx,
 nbrec_logical_router_update_static_routes_delvalue(lr,
 nb_route);
 }
-} else {
+} else if (!strcmp(ts_route_table, nb_route->route_table)) {
 /* It may be a route to be advertised */
 add_static_to_routes_ad(routes_ad, nb_route, ts_port_addrs,
-_global->options, ts_route_table);
+_global->options);
 }
 }
 
@@ -1602,7 +1592,6 @@ advertise_lr_routes(struct ic_context *ctx,
 const struct icsbrec_port_binding *isb_pb;
 const char *lrp_name, *ts_name, *route_table;
 struct lport_addresses ts_port_addrs;
-const struct nbrec_logical_router *lr = ic_lr->lr;
 const struct icnbrec_transit_switch *key;
 
 struct hmap routes_ad = HMAP_INITIALIZER(_ad);
@@ -1620,7 +1609,7 @@ advertise_lr_routes(struct ic_context *ctx,
 VLOG_INFO_RL(, "Route sync ignores port %s on ts %s for router"
  " %s because the addresses are invalid.",
  isb_pb->logical_port, isb_pb->transit_switch,
- lr->name);
+ ic_lr->lr->name);
 continue;
 }
 lrp_name = get_lrp_name_by_ts_port_name(ctx, isb_pb->logical_port);
-- 
2.36.1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH ovn v3 3/5] ic: prevent advertising/learning multiple same routes

2022-12-15 Thread Vladislav Odintsov
Advertize one route (ip_prefix, nexthop, route_table, transit_switch,
availability_zone) only once.

Signed-off-by: Vladislav Odintsov 
---
 ic/ovn-ic.c | 84 ++---
 tests/ovn-ic.at | 60 +++
 2 files changed, 111 insertions(+), 33 deletions(-)

diff --git a/ic/ovn-ic.c b/ic/ovn-ic.c
index 0d5e5f5d9..518796149 100644
--- a/ic/ovn-ic.c
+++ b/ic/ovn-ic.c
@@ -884,10 +884,12 @@ ic_route_hash(const struct in6_addr *prefix, unsigned int 
plen,
 static struct ic_route_info *
 ic_route_find(struct hmap *routes, const struct in6_addr *prefix,
   unsigned int plen, const struct in6_addr *nexthop,
-  const char *origin, char *route_table)
+  const char *origin, const char *route_table, uint32_t hash)
 {
 struct ic_route_info *r;
-uint32_t hash = ic_route_hash(prefix, plen, nexthop, origin, route_table);
+if (!hash) {
+hash = ic_route_hash(prefix, plen, nexthop, origin, route_table);
+}
 HMAP_FOR_EACH_WITH_HASH (r, node, hash, routes) {
 if (ipv6_addr_equals(>prefix, prefix) &&
 r->plen == plen &&
@@ -945,8 +947,8 @@ add_to_routes_learned(struct hmap *routes_learned,
 }
 const char *origin = smap_get_def(_route->options, "origin", "");
 if (ic_route_find(routes_learned, , plen, , origin,
-  nb_route->route_table)) {
-/* Route is already added to learned in previous iteration. */
+  nb_route->route_table, 0)) {
+/* Route was added to learned on previous iteration. */
 return true;
 }
 
@@ -1093,10 +1095,43 @@ route_need_advertise(const char *policy,
 }
 
 static void
-add_to_routes_ad(struct hmap *routes_ad,
- const struct nbrec_logical_router_static_route *nb_route,
- const struct lport_addresses *nexthop_addresses,
- const struct smap *nb_options, const char *route_table)
+add_to_routes_ad(struct hmap *routes_ad, const struct in6_addr prefix,
+ unsigned int plen, const struct in6_addr nexthop,
+ const char *origin, const char *route_table,
+ const struct nbrec_logical_router_port *nb_lrp,
+ const struct nbrec_logical_router_static_route *nb_route)
+{
+if (route_table == NULL) {
+route_table = "";
+}
+
+uint hash = ic_route_hash(, plen, , origin, route_table);
+
+if (!ic_route_find(routes_ad, , plen, , origin, route_table,
+   hash)) {
+struct ic_route_info *ic_route = xzalloc(sizeof *ic_route);
+ic_route->prefix = prefix;
+ic_route->plen = plen;
+ic_route->nexthop = nexthop;
+ic_route->nb_route = nb_route;
+ic_route->origin = origin;
+ic_route->route_table = route_table;
+ic_route->nb_lrp = nb_lrp;
+hmap_insert(routes_ad, _route->node, hash);
+} else {
+static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5, 1);
+VLOG_WARN_RL(, "Duplicate route advertisement was suppressed! NB "
+ "route uuid: "UUID_FMT,
+ UUID_ARGS(_route->header_.uuid));
+}
+}
+
+static void
+add_static_to_routes_ad(
+struct hmap *routes_ad,
+const struct nbrec_logical_router_static_route *nb_route,
+const struct lport_addresses *nexthop_addresses,
+const struct smap *nb_options, const char *route_table)
 {
 if (strcmp(route_table, nb_route->route_table)) {
 if (VLOG_IS_DBG_ENABLED()) {
@@ -1145,16 +1180,8 @@ add_to_routes_ad(struct hmap *routes_ad,
 ds_destroy();
 }
 
-struct ic_route_info *ic_route = xzalloc(sizeof *ic_route);
-ic_route->prefix = prefix;
-ic_route->plen = plen;
-ic_route->nexthop = nexthop;
-ic_route->nb_route = nb_route;
-ic_route->origin = ROUTE_ORIGIN_STATIC;
-ic_route->route_table = nb_route->route_table;
-hmap_insert(routes_ad, _route->node,
-ic_route_hash(, plen, , ROUTE_ORIGIN_STATIC,
-  nb_route->route_table));
+add_to_routes_ad(routes_ad, prefix, plen, nexthop, ROUTE_ORIGIN_STATIC,
+ nb_route->route_table, NULL, nb_route);
 }
 
 static void
@@ -1198,18 +1225,9 @@ add_network_to_routes_ad(struct hmap *routes_ad, const 
char *network,
 ds_destroy();
 }
 
-struct ic_route_info *ic_route = xzalloc(sizeof *ic_route);
-ic_route->prefix = prefix;
-ic_route->plen = plen;
-ic_route->nexthop = nexthop;
-ic_route->nb_lrp = nb_lrp;
-ic_route->origin = ROUTE_ORIGIN_CONNECTED;
-
 /* directly-connected routes go to  route table */
-ic_route->route_table = NULL;
-hmap_insert(routes_ad, _route->node,
-ic_route_hash(, plen, ,
-  ROUTE_ORIGIN_CONNECTED, ""));
+add_to_routes_ad(routes_ad, prefix, plen, nexthop, ROUTE_ORIGIN_CONNECTED,
+ NULL, nb_lrp, NULL);
 }
 

[ovs-dev] [PATCH ovn v3 2/5] ic: lookup southbound port_binding only if needed

2022-12-15 Thread Vladislav Odintsov
Signed-off-by: Vladislav Odintsov 
Acked-by: Dumitru Ceara 
---
 ic/ovn-ic.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/ic/ovn-ic.c b/ic/ovn-ic.c
index 2bc96d36c..0d5e5f5d9 100644
--- a/ic/ovn-ic.c
+++ b/ic/ovn-ic.c
@@ -757,6 +757,7 @@ port_binding_run(struct ic_context *ctx,
 }
 icsbrec_port_binding_index_destroy_row(isb_pb_key);
 
+const struct sbrec_port_binding *sb_pb;
 const struct icnbrec_transit_switch *ts;
 ICNBREC_TRANSIT_SWITCH_FOR_EACH (ts, ctx->ovninb_idl) {
 const struct nbrec_logical_switch *ls = find_ts_in_nb(ctx, ts->name);
@@ -788,9 +789,9 @@ port_binding_run(struct ic_context *ctx,
 for (int i = 0; i < ls->n_ports; i++) {
 lsp = ls->ports[i];
 
-const struct sbrec_port_binding *sb_pb = find_lsp_in_sb(ctx, lsp);
 if (!strcmp(lsp->type, "router")) {
 /* The port is local. */
+sb_pb = find_lsp_in_sb(ctx, lsp);
 if (!sb_pb) {
 continue;
 }
@@ -807,6 +808,7 @@ port_binding_run(struct ic_context *ctx,
 if (!isb_pb) {
 nbrec_logical_switch_update_ports_delvalue(ls, lsp);
 } else {
+sb_pb = find_lsp_in_sb(ctx, lsp);
 if (!sb_pb) {
 continue;
 }
-- 
2.36.1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH ovn v3 1/5] ic: remove orphan ovn interconnection routes

2022-12-15 Thread Vladislav Odintsov
Before this patch if one deletes transit switch through which there were
routes in ICSB:Route table, such routes were left forever in the DB.

Now we validate that each ICSB:Route has an appropriate transit switch.

Signed-off-by: Vladislav Odintsov 
Acked-by: Dumitru Ceara 
---
 ic/ovn-ic.c | 41 +++
 tests/ovn-ic.at | 73 +
 2 files changed, 114 insertions(+)

diff --git a/ic/ovn-ic.c b/ic/ovn-ic.c
index e5c193d9d..2bc96d36c 100644
--- a/ic/ovn-ic.c
+++ b/ic/ovn-ic.c
@@ -71,6 +71,7 @@ struct ic_context {
 struct ovsdb_idl_index *icsbrec_port_binding_by_az;
 struct ovsdb_idl_index *icsbrec_port_binding_by_ts;
 struct ovsdb_idl_index *icsbrec_port_binding_by_ts_az;
+struct ovsdb_idl_index *icsbrec_route_by_az;
 struct ovsdb_idl_index *icsbrec_route_by_ts;
 struct ovsdb_idl_index *icsbrec_route_by_ts_az;
 };
@@ -1612,6 +1613,39 @@ advertise_lr_routes(struct ic_context *ctx,
 hmap_destroy(_ad);
 }
 
+static void
+delete_orphan_ic_routes(struct ic_context *ctx,
+ const struct icsbrec_availability_zone *az)
+{
+const struct icsbrec_route *isb_route, *isb_route_key =
+icsbrec_route_index_init_row(ctx->icsbrec_route_by_az);
+icsbrec_route_index_set_availability_zone(isb_route_key, az);
+
+const struct icnbrec_transit_switch *t_sw, *t_sw_key;
+
+ICSBREC_ROUTE_FOR_EACH_EQUAL (isb_route, isb_route_key,
+  ctx->icsbrec_route_by_az)
+{
+t_sw_key = icnbrec_transit_switch_index_init_row(
+ctx->icnbrec_transit_switch_by_name);
+icnbrec_transit_switch_index_set_name(t_sw_key,
+isb_route->transit_switch);
+t_sw = icnbrec_transit_switch_index_find(
+ctx->icnbrec_transit_switch_by_name, t_sw_key);
+icnbrec_transit_switch_index_destroy_row(t_sw_key);
+
+if (!t_sw) {
+static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5, 1);
+VLOG_INFO_RL(, "Deleting orphan ICDB:Route: %s->%s (%s, rtb:%s,"
+ " transit switch: %s)", isb_route->ip_prefix,
+ isb_route->nexthop, isb_route->origin,
+ isb_route->route_table, isb_route->transit_switch);
+icsbrec_route_delete(isb_route);
+}
+}
+icsbrec_route_index_destroy_row(isb_route_key);
+}
+
 static void
 route_run(struct ic_context *ctx,
   const struct icsbrec_availability_zone *az)
@@ -1620,6 +1654,8 @@ route_run(struct ic_context *ctx,
 return;
 }
 
+delete_orphan_ic_routes(ctx, az);
+
 struct hmap ic_lrs = HMAP_INITIALIZER(_lrs);
 const struct icsbrec_port_binding *isb_pb;
 const struct icsbrec_port_binding *isb_pb_key =
@@ -1908,6 +1944,10 @@ main(int argc, char *argv[])
   _port_binding_col_transit_switch,
   _port_binding_col_availability_zone);
 
+struct ovsdb_idl_index *icsbrec_route_by_az
+= ovsdb_idl_index_create1(ovnisb_idl_loop.idl,
+  _route_col_availability_zone);
+
 struct ovsdb_idl_index *icsbrec_route_by_ts
 = ovsdb_idl_index_create1(ovnisb_idl_loop.idl,
   _route_col_transit_switch);
@@ -1962,6 +2002,7 @@ main(int argc, char *argv[])
 .icsbrec_port_binding_by_az = icsbrec_port_binding_by_az,
 .icsbrec_port_binding_by_ts = icsbrec_port_binding_by_ts,
 .icsbrec_port_binding_by_ts_az = icsbrec_port_binding_by_ts_az,
+.icsbrec_route_by_az = icsbrec_route_by_az,
 .icsbrec_route_by_ts = icsbrec_route_by_ts,
 .icsbrec_route_by_ts_az = icsbrec_route_by_ts_az,
 };
diff --git a/tests/ovn-ic.at b/tests/ovn-ic.at
index 0bdfc55e6..e234b7fb9 100644
--- a/tests/ovn-ic.at
+++ b/tests/ovn-ic.at
@@ -121,6 +121,79 @@ OVN_CLEANUP_IC
 AT_CLEANUP
 ])
 
+OVN_FOR_EACH_NORTHD([
+AT_SETUP([ovn-ic -- route deletion upon TS deletion])
+
+ovn_init_ic_db
+net_add n1
+
+# 1 GW per AZ
+for i in 1 2; do
+az=az$i
+ovn_start $az
+sim_add gw-$az
+as gw-$az
+check ovs-vsctl add-br br-phys
+ovn_az_attach $az n1 br-phys 192.168.1.$i
+check ovs-vsctl set open . external-ids:ovn-is-interconn=true
+check ovn-nbctl set nb-global . \
+options:ic-route-adv=true \
+options:ic-route-adv-default=true \
+options:ic-route-learn=true \
+options:ic-route-learn-default=true
+done
+
+create_ic_infra() {
+az_id=$1
+ts_id=$2
+az=az$i
+
+lsp=lsp${az_id}-${ts_id}
+lrp=lrp${az_id}-${ts_id}
+ts=ts${az_id}-${ts_id}
+lr=lr${az_id}-${ts_id}
+
+ovn_as $az
+
+check ovn-ic-nbctl ts-add $ts
+check ovn-nbctl lr-add $lr
+check ovn-nbctl lrp-add $lr $lrp 00:00:00:00:00:0$az_id 10.0.$az_id.1/24
+check ovn-nbctl lrp-set-gateway-chassis $lrp gw-$az
+
+check 

[ovs-dev] [PATCH ovn v3 0/5] OVN IC multiple same routes fixes

2022-12-15 Thread Vladislav Odintsov
v2 -> v3:
  - Split patch #3 by two: first fixes a bug with duplicated route
advertisement and will be considered for back-porting; the second one
changes ovn-ic SB:Route schema and documents ovn-ic upgrade details.
  - Address Dumitru's comment regarding loggin rate-limit.

v1 -> v2:
  - Split series by two: OVN IC-related changes and northd OF bucket limits
  - Squash two patches in one
  - Fix memory leak in add_to_routes_ad()
  - Addressed review comments by Dumitru

v1 description here:
  
https://patchwork.ozlabs.org/project/ovn/cover/20221202173147.3032702-1-odiv...@gmail.com/

Vladislav Odintsov (5):
  ic: remove orphan ovn interconnection routes
  ic: lookup southbound port_binding only if needed
  ic: prevent advertising/learning multiple same routes
  ic: minor code improvements
  ic-sb schema: add index for routes table & document upgrade path

 Documentation/intro/install/ovn-upgrades.rst |  20 +++
 NEWS |   4 +
 ic/ovn-ic.c  | 142 +--
 ovn-ic-sb.ovsschema  |   6 +-
 tests/ovn-ic.at  | 133 +
 5 files changed, 257 insertions(+), 48 deletions(-)

-- 
2.36.1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v3 2/2] openflow: Add extension to flush CT by generic match

2022-12-15 Thread Paolo Valerio
Ales Musil  writes:

> Add extension that allows to flush connections from CT
> by specifying fields that the connections should be
> matched against. This allows to match only some fields
> of the connection e.g. source address for orig direrction.
>
> Reported-at: https://bugzilla.redhat.com/2120546
> Signed-off-by: Ales Musil 
> ---
> v3: Rebase on top of master.
> v2: Rebase on top of master.
> Use suggestion from Ilya.
> ---

Although a second opinion would be nice to have here,
the patch LGTM and the tests succeeded.

Acked-by: Paolo Valerio 

>  NEWS   |   3 +
>  include/openflow/nicira-ext.h  |  30 +++
>  include/openvswitch/ofp-msgs.h |   4 +
>  include/openvswitch/ofp-util.h |   4 +
>  lib/ofp-bundle.c   |   1 +
>  lib/ofp-ct-util.c  | 146 +
>  lib/ofp-ct-util.h  |   9 ++
>  lib/ofp-print.c|  20 +
>  lib/ofp-util.c |  25 ++
>  lib/rconn.c|   1 +
>  ofproto/ofproto-dpif.c |   8 +-
>  ofproto/ofproto-provider.h |   7 +-
>  ofproto/ofproto.c  |  30 ++-
>  tests/ofp-print.at |  93 +
>  tests/ovs-ofctl.at |  12 +++
>  tests/system-traffic.at| 116 ++
>  utilities/ovs-ofctl.c  |  38 +
>  17 files changed, 489 insertions(+), 58 deletions(-)
>
> diff --git a/NEWS b/NEWS
> index ff8904b02..46b8faa41 100644
> --- a/NEWS
> +++ b/NEWS
> @@ -16,6 +16,9 @@ Post-v3.0.0
>   by specifying 'max-rate' or '[r]stp-path-cost' accordingly.
> - ovs-dpctl and related ovs-appctl commands:
>   * "flush-conntrack" is capable of handling partial 5-tuple.
> +   - OpenFlow:
> +  * New OpenFlow extension NXT_CT_FLUSH to flush connections matching
> +the specified fields.
>  
>  
>  v3.0.0 - 15 Aug 2022
> diff --git a/include/openflow/nicira-ext.h b/include/openflow/nicira-ext.h
> index b68804991..32ce56d31 100644
> --- a/include/openflow/nicira-ext.h
> +++ b/include/openflow/nicira-ext.h
> @@ -1064,4 +1064,34 @@ struct nx_zone_id {
>  };
>  OFP_ASSERT(sizeof(struct nx_zone_id) == 8);
>  
> +/* CT flush available TLVs. */
> +enum nx_ct_flush_tlv_type {
> +/* Outer types. */
> +NXT_CT_ORIG_DIRECTION,/* CT orig direction outer type. */
> +NXT_CT_REPLY_DIRECTION,   /* CT reply direction outer type. */
> +
> +/* Nested types. */
> +NXT_CT_SRC,   /* CT source IPv6 or mapped IPv4 address. */
> +NXT_CT_DST,   /* CT destination IPv6 or mapped IPv4 address. 
> */
> +NXT_CT_SRC_PORT,  /* CT source port. */
> +NXT_CT_DST_PORT,  /* CT destination port. */
> +NXT_CT_ICMP_ID,   /* CT ICMP id. */
> +NXT_CT_ICMP_TYPE, /* CT ICMP type. */
> +NXT_CT_ICMP_CODE, /* CT ICMP code. */
> +
> +/* Primitive types. */
> +NXT_CT_ZONE_ID,   /* CT zone id. */
> +};
> +
> +/* NXT_CT_FLUSH.
> + *
> + * Flushes the connection tracking specified by 5-tuple.
> + * The struct should be followed by TLVs specifying the matching parameters. 
> */
> +struct nx_ct_flush {
> +uint8_t ip_proto;  /* IP protocol. */
> +uint8_t family;/* L3 address family. */
> +uint8_t zero[6];   /* Must be zero. */
> +};
> +OFP_ASSERT(sizeof(struct nx_ct_flush) == 8);
> +
>  #endif /* openflow/nicira-ext.h */
> diff --git a/include/openvswitch/ofp-msgs.h b/include/openvswitch/ofp-msgs.h
> index 921a937e5..659b0a3e7 100644
> --- a/include/openvswitch/ofp-msgs.h
> +++ b/include/openvswitch/ofp-msgs.h
> @@ -526,6 +526,9 @@ enum ofpraw {
>  
>  /* NXST 1.0+ (4): struct nx_ipfix_stats_reply[]. */
>  OFPRAW_NXST_IPFIX_FLOW_REPLY,
> +
> +/* NXT 1.0+ (32): struct nx_ct_flush, uint8_t[8][]. */
> +OFPRAW_NXT_CT_FLUSH,
>  };
>  
>  /* Decoding messages into OFPRAW_* values. */
> @@ -772,6 +775,7 @@ enum ofptype {
>  OFPTYPE_IPFIX_FLOW_STATS_REQUEST, /* OFPRAW_NXST_IPFIX_FLOW_REQUEST */
>  OFPTYPE_IPFIX_FLOW_STATS_REPLY,   /* OFPRAW_NXST_IPFIX_FLOW_REPLY */
>  OFPTYPE_CT_FLUSH_ZONE,/* OFPRAW_NXT_CT_FLUSH_ZONE. */
> +OFPTYPE_CT_FLUSH,   /* OFPRAW_NXT_CT_FLUSH. */
>  
>  /* Flow monitor extension. */
>  OFPTYPE_FLOW_MONITOR_CANCEL,  /* OFPRAW_NXT_FLOW_MONITOR_CANCEL.
> diff --git a/include/openvswitch/ofp-util.h b/include/openvswitch/ofp-util.h
> index 84937ae26..e10d90b9f 100644
> --- a/include/openvswitch/ofp-util.h
> +++ b/include/openvswitch/ofp-util.h
> @@ -65,6 +65,10 @@ struct ofpbuf *ofputil_encode_echo_reply(const struct 
> ofp_header *);
>  
>  struct ofpbuf *ofputil_encode_barrier_request(enum ofp_version);
>  
> +struct ofpbuf *ofputil_ct_match_encode(const struct ofputil_ct_match *match,
> +   uint16_t *zone_id,
> +   enum ofp_version version);
> +
>  #ifdef __cplusplus
>  }
>  #endif
> diff --git 

[ovs-dev] [PATCH] ovsdb-tool: clarify clustered-to-standalone help text

2022-12-15 Thread Dan Williams
Make it clear which of the DB args is the clusterd one.

Signed-off-by: Dan Williams 
---
 ovsdb/ovsdb-tool.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/ovsdb/ovsdb-tool.c b/ovsdb/ovsdb-tool.c
index 60f353197bfd7..33719d38a4d09 100644
--- a/ovsdb/ovsdb-tool.c
+++ b/ovsdb/ovsdb-tool.c
@@ -187,7 +187,7 @@ usage(void)
"  compare-versions A OP B  compare OVSDB schema version numbers\n"
"  query [DB] TRNS execute read-only transaction on DB\n"
"  transact [DB] TRNS  execute read/write transaction on DB\n"
-   "  cluster-to-standalone DB DBConvert clustered DB to\n"
+   "  cluster-to-standalone DB clustered-DB  Convert clustered DB to\n"
"  standalone DB when cluster is down and cannot be\n"
"revived\n"
"  [-m]... show-log [DB]   print DB's log entries\n"
-- 
2.38.1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v4] netdev-dpdk: add control plane protection support

2022-12-15 Thread David Marchand
Coucou Robin,

On Mon, Dec 12, 2022 at 4:16 PM Robin Jarry  wrote:
>
> Some control protocols are used to maintain link status between
> forwarding engines (e.g. LACP). When the system is not sized properly,
> the PMD threads may not be able to process all incoming traffic from the
> configured Rx queues. When a signaling packet of such protocols is
> dropped, it can cause link flapping, worsening the situation.
>
> Use the RTE flow API to redirect these protocols into a dedicated Rx
> queue. The assumption is made that the ratio between control protocol
> traffic and user data traffic is very low and thus this dedicated Rx
> queue will never get full. The RSS redirection table is re-programmed to
> only use the other Rx queues. The RSS table size is stored in the
> netdev_dpdk structure at port initialization to avoid requesting the
> information again when changing the port configuration.
>
> The additional Rx queue will be assigned a PMD core like any other Rx
> queue. Polling that extra queue may introduce increased latency and
> a slight performance penalty at the benefit of preventing link flapping.
>
> This feature must be enabled per port on specific protocols via the
> cp-protection option. This option takes a coma-separated list of
> protocol names. It is only supported on ethernet ports.
>
> If the user has already configured multiple Rx queues on the port, an
> additional one will be allocated for control plane packets. If the
> hardware cannot satisfy the requested number of requested Rx queues, the
> last Rx queue will be assigned for control plane. If only one Rx queue
> is available, the cp-protection feature will be disabled. If the
> hardware does not support the RTE flow matchers/actions, the feature
> will be disabled.
>
> It cannot be enabled when other_config:hw-offload=true as it may
> conflict with the offloaded RTE flows. Similarly, if hw-offload is
> enabled while some ports already have cp-protection enabled, the RTE
> flow offloading will be disabled on these ports.
>
> Example use:
>
>  ovs-vsctl add-bond br-phy bond0 phy0 phy1 -- \
>set interface phy0 type=dpdk options:dpdk-devargs=:ca:00.0 -- \
>set interface phy0 options:cp-protection=lacp -- \
>set interface phy1 type=dpdk options:dpdk-devargs=:ca:00.1 -- \
>set interface phy1 options:cp-protection=lacp
>
> As a starting point, only one protocol is supported: LACP. Other
> protocols can be added in the future. NIC compatibility should be
> checked.
>
> To validate that this works as intended, I used a traffic generator to
> generate random traffic slightly above the machine capacity at line rate
> on a two ports bond interface. OVS is configured to receive traffic on
> two VLANs and pop/push them in a br-int bridge based on tags set on
> patch ports.
>
>+--+
>| DUT  |
>|++|
>||   br-int   || default flow, action=NORMAL
>||||
>|| patch10patch11 ||
>|+---|---|+|
>||   | |
>|+---|---|+|
>|| patch00patch01 ||
>||  tag:10tag:20  ||
>||||
>||   br-phy   || default flow, action=NORMAL
>||||
>||   bond0|| balance-slb, lacp=passive, lacp-time=fast
>||phy0   phy1 ||
>|+--|-|---+|
>+---|-|+
>| |
>+---|-|+
>| port0  port1 | balance L3/L4, lacp=active, lacp-time=fast
>| lag  | mode trunk VLANs 10, 20
>|  |
>|switch|
>|  |
>|  vlan 10vlan 20  |  mode access
>|   port2  port3   |
>+-|--|-+
>  |  |
>+-|--|-+
>|   port0  port1   |  Random traffic that is properly balanced
>|  |  across the bond ports in both directions.
>|  traffic generator   |
>+--+
>
> Without cp-protection, the bond0 links are randomly switching to
> "defaulted" when one of the LACP packets sent by the switch is dropped
> because the RX queues are full and the PMD threads did not process them
> fast enough. When that happens, all traffic must go through a single
> link which causes above line rate traffic to be dropped.
>
> When cp-protection is enabled, no LACP packet is dropped and the bond
> links remain enabled at all times, maximizing the throughput.
>
> This feature may be considered as "QoS". However, it does not work by
> limiting the rate of traffic explicitly. It only guarantees that some
> protocols have a lower chance of being dropped because the PMD cores
> cannot keep up with regular traffic.
>
> The choice of protocols is limited on purpose. This is not meant to be
> configurable by users. Some limited configurability could be considered
> in the future but it would 

Re: [ovs-dev] [PATCH ovn] controller: Restore MAC and vlan for DVR scenario

2022-12-15 Thread Jakub Libosvar
Thanks Dumitru! Is there any BZ/Jira ticket I could follow?

Thanks,
Kuba

On 12/12/22 7:41 AM, Dumitru Ceara wrote:
> On 12/5/22 17:07, Dumitru Ceara wrote:
>> On 9/30/22 12:43, Dumitru Ceara wrote:
>>> On 9/20/22 22:18, Mark Michelson wrote:
 Thanks Ales,

 Acked-by: Mark Michelson 

>>>
>>> I applied this to the main branch and backported it to all stable
>>> branches down to branch-22.03.
>>>
>>
>> Hi all,
>>
>> There was an internal request from within Red Hat (from Jakub in CC) to
>> backport this fix to branch-21.12 too.  If nobody is it I can take care
> 
> Oops, s/is it/is against it/
> 
> But I hope the intention of the question was clear.
> 
>> of porting the patch to 21.12.
>>
>> I'll wait a day or two to give people time to reply.
>>
> 
> I went ahead and applied this to branch 21.12 too.
> 
> Regards,
> Dumitru
> 

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v3 1/2] ofp, dpif: Allow CT flush based on partial match

2022-12-15 Thread Paolo Valerio
Ales Musil  writes:

> On Thu, Dec 15, 2022 at 4:28 PM Paolo Valerio  wrote:
>
> Ales Musil  writes:
>
> > Currently, the CT can be flushed by dpctl only be specifying
> > the whole 5-tuple. This is not very convenient when there are
> > only some fields known to the user of CT flush. Add new struct
> > ofputil_ct_match which represents the generic filtering that can
> > be done for CT flush. The match is done only on fields that are
> > non-zero with exception to the icmp fields.
> >
> > This allows the filtering just within dpctl, however
> > it is a preparation for OpenFlow extension.
> >
> > Reported-at: https://bugzilla.redhat.com/2120546
> > Signed-off-by: Ales Musil 
> > ---
> > v3: Rebase on top of master.
> >     Address the C99 comment and missing dpif_close call.
> > v2: Rebase on top of master.
> >     Address comments from Paolo.
> > ---
> >  NEWS                           |   2 +
> >  include/openvswitch/ofp-util.h |  28 +++
> >  lib/automake.mk                |   2 +
> >  lib/ct-dpif.c                  | 201 +
> >  lib/ct-dpif.h                  |   4 +-
> >  lib/dpctl.c                    |  45 +++--
> >  lib/dpctl.man                  |   3 +-
> >  lib/ofp-ct-util.c              | 311 +
> >  lib/ofp-ct-util.h              |  34 
> >  tests/system-traffic.at        |  82 -
> >  10 files changed, 568 insertions(+), 144 deletions(-)
> >  create mode 100644 lib/ofp-ct-util.c
> >  create mode 100644 lib/ofp-ct-util.h
> >
> > diff --git a/NEWS b/NEWS
> > index 265375e1c..ff8904b02 100644
> > --- a/NEWS
> > +++ b/NEWS
> > @@ -14,6 +14,8 @@ Post-v3.0.0
> >       10 Gbps link speed by default in case the actual link speed cannot
> be
> >       determined.  Previously it was 10 Mbps.  Values can still be
> overridden
> >       by specifying 'max-rate' or '[r]stp-path-cost' accordingly.
> > +   - ovs-dpctl and related ovs-appctl commands:
> > +     * "flush-conntrack" is capable of handling partial 5-tuple.
> > 
> > 
> >  v3.0.0 - 15 Aug 2022
> > diff --git a/include/openvswitch/ofp-util.h b/include/openvswitch/
> ofp-util.h
> > index 091a09cad..84937ae26 100644
> > --- a/include/openvswitch/ofp-util.h
> > +++ b/include/openvswitch/ofp-util.h
> > @@ -19,6 +19,9 @@
> > 
> >  #include 
> >  #include 
> > +#include 
> > +#include 
> > +
> >  #include "openvswitch/ofp-protocol.h"
> > 
> >  struct ofp_header;
> > @@ -27,6 +30,31 @@ struct ofp_header;
> >  extern "C" {
> >  #endif
> > 
> > +struct ofputil_ct_tuple {
> > +    struct in6_addr src;
> > +    struct in6_addr dst;
> > +
> > +    union {
> > +        ovs_be16 src_port;
> > +        ovs_be16 icmp_id;
> > +    };
> > +    union {
> > +        ovs_be16 dst_port;
> > +        struct {
> > +            uint8_t icmp_code;
> > +            uint8_t icmp_type;
> > +        };
> > +    };
> > +};
> > +
> > +struct ofputil_ct_match {
> > +    uint8_t ip_proto;
> > +    uint16_t l3_type;
> > +
> > +    struct ofputil_ct_tuple tuple_orig;
> > +    struct ofputil_ct_tuple tuple_reply;
> > +};
> > +
> >  bool ofputil_decode_hello(const struct ofp_header *,
> >                            uint32_t *allowed_versions);
> >  struct ofpbuf *ofputil_encode_hello(uint32_t version_bitmap);
> > diff --git a/lib/automake.mk b/lib/automake.mk
> > index a0fabe38f..37135f118 100644
> > --- a/lib/automake.mk
> > +++ b/lib/automake.mk
> > @@ -227,6 +227,8 @@ lib_libopenvswitch_la_SOURCES = \
> >       lib/ofp-actions.c \
> >       lib/ofp-bundle.c \
> >       lib/ofp-connection.c \
> > +     lib/ofp-ct-util.c \
> > +     lib/ofp-ct-util.h \
> >       lib/ofp-ed-props.c \
> >       lib/ofp-errors.c \
> >       lib/ofp-flow.c \
> > diff --git a/lib/ct-dpif.c b/lib/ct-dpif.c
> > index 6f17a26b5..906e827c1 100644
> > --- a/lib/ct-dpif.c
> > +++ b/lib/ct-dpif.c
> > @@ -20,6 +20,7 @@
> >  #include 
> > 
> >  #include "ct-dpif.h"
> > +#include "ofp-ct-util.h"
> >  #include "openvswitch/ofp-parse.h"
> >  #include "openvswitch/vlog.h"
> > 
> > @@ -71,6 +72,31 @@ ct_dpif_dump_start(struct dpif *dpif, struct
> ct_dpif_dump_state **dump,
> >      return err;
> >  }
> > 
> > +static void
> > +ct_dpif_tuple_from_ofputil_ct_tuple(const struct ofputil_ct_tuple
> *ofp_tuple,
> > +                                    struct ct_dpif_tuple *tuple,
> > +                                    uint16_t l3_type, uint8_t ip_proto)
> > +{
> > +    if (l3_type == AF_INET) {
> > +        tuple->src.ip = 

Re: [ovs-dev] [PATCH 0/6] AF_XDP build fixes and enhancements.

2022-12-15 Thread Frode Nordahl
On Tue, Dec 13, 2022 at 12:02 AM Frode Nordahl
 wrote:
>
> On Mon, Dec 12, 2022 at 5:25 PM Ilya Maximets  wrote:
> >
> > On 12/12/22 17:17, Frode Nordahl wrote:
> > > On Mon, Dec 12, 2022 at 2:55 PM Frode Nordahl
> > >  wrote:
> > >>
> > >> On Mon, Dec 12, 2022 at 12:20 PM Ilya Maximets  
> > >> wrote:
> > >>>
> > >>> On 12/12/22 09:40, Eelco Chaudron wrote:
> > 
> > 
> >  On 10 Dec 2022, at 3:16, Ilya Maximets wrote:
> > 
> > > This patch set allows OVS to build with libxdp and newer libbpf.
> > > It also enables AF_XDP support by default as long as all the
> > > dependencies are available.
> > 
> >  Hi Ilya,
> > 
> >  I did not yet review the patch, but I do not like enabling AF_XPD 
> >  support by default for the following reasons:
> > >>>
> > >>> Hi, Eelco.  Thanks for taking a look!
> > >>>
> > 
> >  - I still believe AF_XDP support in OVS has not had many field trials,
> >    so I would still classify this as an “experimental” feature.
> > >>>
> > >>> This patch set doesn't make the feature non-experimental.
> > >>> There is still a warning about that in the documentation.
> > >>> See the patch 3/6.
> > >>>
> > >>> In general, we do build a lot of experimental features by
> > >>> default and users can enable these features.  For example,
> > >>> some PMD management features, userspace TSO or AVX512 support.
> > >>>
> > >>> Enabling the build may increase the number of field trials
> > >>> performed by enthusiastic users, so should have some
> > >>> positive impact on development in general.  I've seen an
> > >>> increased interest in trying out this feature recently.
> > >>> Also, the feature doesn't impact existing users, they still
> > >>> need to explicitly create ports of afxdp type in order to
> > >>> use them.  The code is pretty much independent from other
> > >>> parts of OVS.
> > >>>
> >  - Reversing build option logic might cause problems for distributions> 
> >    (or at least require people to think, as now they have to explicitly 
> >  disable it).
> > >>>
> > >>> This is true for every other feature.  And I added an explicit
> > >>> disabling for Debian and Fedora packaging in the patch 3/6.
> > >>> In general, distributions should not blindly take every
> > >>> new release without checking the NEWS anyway.
> > >>
> > >> fwiw; we have interest in AF_XDP too and I'm currently evaluating if
> > >> we have the bits aligned to enable this by default in Debian too.
> > >>
> > >> One challenge is that in Debian libbpf 0.8 is a compatibility package
> > >> and the development package is only available as a 1.x, and at the
> > >> same time xdp-tools is linked with libbpf 0.8. So when building OVS it
> > >> will try to link with both libbpf 0.8 via libxdp and libbpf 1.x, which
> > >> I think may be preventing this from working right now.
> > >>
> > >> That is our problem to sort though, and we have no issue with having
> > >> this important feature be enabled by default in the upstream build
> > >> system.
> > >
> > > ftr; This was already sorted in Debian, I was using an out of date system.
> > >
> > > With that fixed, it does appear that OVS or xdp-tools may need an
> > > update to work with libbpf 1.x:
> > > 2022-12-12T16:01:32.800Z|00067|netdev_afxdp|WARN|libbpf: prog
> > > 'xdp_dispatcher': missing BPF prog type, check ELF section name
> > > 'xdp/dispatcher'
> > > 2022-12-12T16:01:32.800Z|00068|netdev_afxdp|WARN|libbpf: prog
> > > 'xdp_dispatcher': failed to load: -22
> > >
> >
> > Hmm.  OVS doesn't load xdp programs on its own, it relies on libbpf
> > to load a default program.
>
> Yes, I believe libxdp is at fault here and not this series nor OVS.
>
> > BTW, are you testing with the second patch of this patch set applied?
> > Otherwise, OVS will use some deprecated APIs that might be broken.
>
> WIth the whole set, It works fine on a pristine Fedora 37 system with
> libbpf 0.8, but on for example Debian with libbpf 1.0.1. I've
> confirmed it works also on Debian with libbpf 0.8.
>
> I suspect xdp-tools needs to be updated along the lines of [0] after
> libbpf was updated in [1].
>
> 0: 
> https://lore.kernel.org/lkml/tencent_7dd02046a8398be3324f85e0f56ed41eb...@qq.com/T/
> 1: 
> https://github.com/torvalds/linux/commit/450b167fb9be91a8164d3f3d734674f5fe95b22d

An update on this, with the new v1.2.9 release of xdp-tools [2] this
is now resolved and we just put this series into a Open vSwitch 3.1.0
git snapshot in Debian experimental [3].

Cheers!

2: https://github.com/xdp-project/xdp-tools/issues/254
3: https://buildd.debian.org/status/package.php?p=openvswitch=experimental

-- 
Frode Nordahl
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v3 1/2] ofp, dpif: Allow CT flush based on partial match

2022-12-15 Thread Ales Musil
On Thu, Dec 15, 2022 at 4:28 PM Paolo Valerio  wrote:

> Ales Musil  writes:
>
> > Currently, the CT can be flushed by dpctl only be specifying
> > the whole 5-tuple. This is not very convenient when there are
> > only some fields known to the user of CT flush. Add new struct
> > ofputil_ct_match which represents the generic filtering that can
> > be done for CT flush. The match is done only on fields that are
> > non-zero with exception to the icmp fields.
> >
> > This allows the filtering just within dpctl, however
> > it is a preparation for OpenFlow extension.
> >
> > Reported-at: https://bugzilla.redhat.com/2120546
> > Signed-off-by: Ales Musil 
> > ---
> > v3: Rebase on top of master.
> > Address the C99 comment and missing dpif_close call.
> > v2: Rebase on top of master.
> > Address comments from Paolo.
> > ---
> >  NEWS   |   2 +
> >  include/openvswitch/ofp-util.h |  28 +++
> >  lib/automake.mk|   2 +
> >  lib/ct-dpif.c  | 201 +
> >  lib/ct-dpif.h  |   4 +-
> >  lib/dpctl.c|  45 +++--
> >  lib/dpctl.man  |   3 +-
> >  lib/ofp-ct-util.c  | 311 +
> >  lib/ofp-ct-util.h  |  34 
> >  tests/system-traffic.at|  82 -
> >  10 files changed, 568 insertions(+), 144 deletions(-)
> >  create mode 100644 lib/ofp-ct-util.c
> >  create mode 100644 lib/ofp-ct-util.h
> >
> > diff --git a/NEWS b/NEWS
> > index 265375e1c..ff8904b02 100644
> > --- a/NEWS
> > +++ b/NEWS
> > @@ -14,6 +14,8 @@ Post-v3.0.0
> >   10 Gbps link speed by default in case the actual link speed cannot
> be
> >   determined.  Previously it was 10 Mbps.  Values can still be
> overridden
> >   by specifying 'max-rate' or '[r]stp-path-cost' accordingly.
> > +   - ovs-dpctl and related ovs-appctl commands:
> > + * "flush-conntrack" is capable of handling partial 5-tuple.
> >
> >
> >  v3.0.0 - 15 Aug 2022
> > diff --git a/include/openvswitch/ofp-util.h
> b/include/openvswitch/ofp-util.h
> > index 091a09cad..84937ae26 100644
> > --- a/include/openvswitch/ofp-util.h
> > +++ b/include/openvswitch/ofp-util.h
> > @@ -19,6 +19,9 @@
> >
> >  #include 
> >  #include 
> > +#include 
> > +#include 
> > +
> >  #include "openvswitch/ofp-protocol.h"
> >
> >  struct ofp_header;
> > @@ -27,6 +30,31 @@ struct ofp_header;
> >  extern "C" {
> >  #endif
> >
> > +struct ofputil_ct_tuple {
> > +struct in6_addr src;
> > +struct in6_addr dst;
> > +
> > +union {
> > +ovs_be16 src_port;
> > +ovs_be16 icmp_id;
> > +};
> > +union {
> > +ovs_be16 dst_port;
> > +struct {
> > +uint8_t icmp_code;
> > +uint8_t icmp_type;
> > +};
> > +};
> > +};
> > +
> > +struct ofputil_ct_match {
> > +uint8_t ip_proto;
> > +uint16_t l3_type;
> > +
> > +struct ofputil_ct_tuple tuple_orig;
> > +struct ofputil_ct_tuple tuple_reply;
> > +};
> > +
> >  bool ofputil_decode_hello(const struct ofp_header *,
> >uint32_t *allowed_versions);
> >  struct ofpbuf *ofputil_encode_hello(uint32_t version_bitmap);
> > diff --git a/lib/automake.mk b/lib/automake.mk
> > index a0fabe38f..37135f118 100644
> > --- a/lib/automake.mk
> > +++ b/lib/automake.mk
> > @@ -227,6 +227,8 @@ lib_libopenvswitch_la_SOURCES = \
> >   lib/ofp-actions.c \
> >   lib/ofp-bundle.c \
> >   lib/ofp-connection.c \
> > + lib/ofp-ct-util.c \
> > + lib/ofp-ct-util.h \
> >   lib/ofp-ed-props.c \
> >   lib/ofp-errors.c \
> >   lib/ofp-flow.c \
> > diff --git a/lib/ct-dpif.c b/lib/ct-dpif.c
> > index 6f17a26b5..906e827c1 100644
> > --- a/lib/ct-dpif.c
> > +++ b/lib/ct-dpif.c
> > @@ -20,6 +20,7 @@
> >  #include 
> >
> >  #include "ct-dpif.h"
> > +#include "ofp-ct-util.h"
> >  #include "openvswitch/ofp-parse.h"
> >  #include "openvswitch/vlog.h"
> >
> > @@ -71,6 +72,31 @@ ct_dpif_dump_start(struct dpif *dpif, struct
> ct_dpif_dump_state **dump,
> >  return err;
> >  }
> >
> > +static void
> > +ct_dpif_tuple_from_ofputil_ct_tuple(const struct ofputil_ct_tuple
> *ofp_tuple,
> > +struct ct_dpif_tuple *tuple,
> > +uint16_t l3_type, uint8_t ip_proto)
> > +{
> > +if (l3_type == AF_INET) {
> > +tuple->src.ip = in6_addr_get_mapped_ipv4(_tuple->src);
> > +tuple->dst.ip = in6_addr_get_mapped_ipv4(_tuple->dst);
> > +} else {
> > +tuple->src.in6 = ofp_tuple->src;
> > +tuple->dst.in6 = ofp_tuple->dst;
> > +}
> > +
> > +tuple->l3_type = l3_type;
> > +tuple->ip_proto = ip_proto;
> > +tuple->src_port = ofp_tuple->src_port;
> > +
> > +if (ip_proto == IPPROTO_ICMP || ip_proto == IPPROTO_ICMPV6) {
> > +tuple->icmp_code = ofp_tuple->icmp_code;
> > +tuple->icmp_type = ofp_tuple->icmp_type;
> > +} else {
> > +

Re: [ovs-dev] [PATCH v3 1/2] ofp, dpif: Allow CT flush based on partial match

2022-12-15 Thread Paolo Valerio
Ales Musil  writes:

> Currently, the CT can be flushed by dpctl only be specifying
> the whole 5-tuple. This is not very convenient when there are
> only some fields known to the user of CT flush. Add new struct
> ofputil_ct_match which represents the generic filtering that can
> be done for CT flush. The match is done only on fields that are
> non-zero with exception to the icmp fields.
>
> This allows the filtering just within dpctl, however
> it is a preparation for OpenFlow extension.
>
> Reported-at: https://bugzilla.redhat.com/2120546
> Signed-off-by: Ales Musil 
> ---
> v3: Rebase on top of master.
> Address the C99 comment and missing dpif_close call.
> v2: Rebase on top of master.
> Address comments from Paolo.
> ---
>  NEWS   |   2 +
>  include/openvswitch/ofp-util.h |  28 +++
>  lib/automake.mk|   2 +
>  lib/ct-dpif.c  | 201 +
>  lib/ct-dpif.h  |   4 +-
>  lib/dpctl.c|  45 +++--
>  lib/dpctl.man  |   3 +-
>  lib/ofp-ct-util.c  | 311 +
>  lib/ofp-ct-util.h  |  34 
>  tests/system-traffic.at|  82 -
>  10 files changed, 568 insertions(+), 144 deletions(-)
>  create mode 100644 lib/ofp-ct-util.c
>  create mode 100644 lib/ofp-ct-util.h
>
> diff --git a/NEWS b/NEWS
> index 265375e1c..ff8904b02 100644
> --- a/NEWS
> +++ b/NEWS
> @@ -14,6 +14,8 @@ Post-v3.0.0
>   10 Gbps link speed by default in case the actual link speed cannot be
>   determined.  Previously it was 10 Mbps.  Values can still be overridden
>   by specifying 'max-rate' or '[r]stp-path-cost' accordingly.
> +   - ovs-dpctl and related ovs-appctl commands:
> + * "flush-conntrack" is capable of handling partial 5-tuple.
>  
>  
>  v3.0.0 - 15 Aug 2022
> diff --git a/include/openvswitch/ofp-util.h b/include/openvswitch/ofp-util.h
> index 091a09cad..84937ae26 100644
> --- a/include/openvswitch/ofp-util.h
> +++ b/include/openvswitch/ofp-util.h
> @@ -19,6 +19,9 @@
>  
>  #include 
>  #include 
> +#include 
> +#include 
> +
>  #include "openvswitch/ofp-protocol.h"
>  
>  struct ofp_header;
> @@ -27,6 +30,31 @@ struct ofp_header;
>  extern "C" {
>  #endif
>  
> +struct ofputil_ct_tuple {
> +struct in6_addr src;
> +struct in6_addr dst;
> +
> +union {
> +ovs_be16 src_port;
> +ovs_be16 icmp_id;
> +};
> +union {
> +ovs_be16 dst_port;
> +struct {
> +uint8_t icmp_code;
> +uint8_t icmp_type;
> +};
> +};
> +};
> +
> +struct ofputil_ct_match {
> +uint8_t ip_proto;
> +uint16_t l3_type;
> +
> +struct ofputil_ct_tuple tuple_orig;
> +struct ofputil_ct_tuple tuple_reply;
> +};
> +
>  bool ofputil_decode_hello(const struct ofp_header *,
>uint32_t *allowed_versions);
>  struct ofpbuf *ofputil_encode_hello(uint32_t version_bitmap);
> diff --git a/lib/automake.mk b/lib/automake.mk
> index a0fabe38f..37135f118 100644
> --- a/lib/automake.mk
> +++ b/lib/automake.mk
> @@ -227,6 +227,8 @@ lib_libopenvswitch_la_SOURCES = \
>   lib/ofp-actions.c \
>   lib/ofp-bundle.c \
>   lib/ofp-connection.c \
> + lib/ofp-ct-util.c \
> + lib/ofp-ct-util.h \
>   lib/ofp-ed-props.c \
>   lib/ofp-errors.c \
>   lib/ofp-flow.c \
> diff --git a/lib/ct-dpif.c b/lib/ct-dpif.c
> index 6f17a26b5..906e827c1 100644
> --- a/lib/ct-dpif.c
> +++ b/lib/ct-dpif.c
> @@ -20,6 +20,7 @@
>  #include 
>  
>  #include "ct-dpif.h"
> +#include "ofp-ct-util.h"
>  #include "openvswitch/ofp-parse.h"
>  #include "openvswitch/vlog.h"
>  
> @@ -71,6 +72,31 @@ ct_dpif_dump_start(struct dpif *dpif, struct 
> ct_dpif_dump_state **dump,
>  return err;
>  }
>  
> +static void
> +ct_dpif_tuple_from_ofputil_ct_tuple(const struct ofputil_ct_tuple *ofp_tuple,
> +struct ct_dpif_tuple *tuple,
> +uint16_t l3_type, uint8_t ip_proto)
> +{
> +if (l3_type == AF_INET) {
> +tuple->src.ip = in6_addr_get_mapped_ipv4(_tuple->src);
> +tuple->dst.ip = in6_addr_get_mapped_ipv4(_tuple->dst);
> +} else {
> +tuple->src.in6 = ofp_tuple->src;
> +tuple->dst.in6 = ofp_tuple->dst;
> +}
> +
> +tuple->l3_type = l3_type;
> +tuple->ip_proto = ip_proto;
> +tuple->src_port = ofp_tuple->src_port;
> +
> +if (ip_proto == IPPROTO_ICMP || ip_proto == IPPROTO_ICMPV6) {
> +tuple->icmp_code = ofp_tuple->icmp_code;
> +tuple->icmp_type = ofp_tuple->icmp_type;
> +} else {
> +tuple->dst_port = ofp_tuple->dst_port;
> +}
> +}
> +
>  /* Dump one connection from a tracker, and put it in 'entry'.
>   *
>   * 'dump' should have been initialized by ct_dpif_dump_start().
> @@ -100,7 +126,62 @@ ct_dpif_dump_done(struct ct_dpif_dump_state *dump)
>  ? dpif->dpif_class->ct_dump_done(dpif, dump)
>  : 

[ovs-dev] [PATCH net v3] openvswitch: Fix flow lookup to use unmasked key

2022-12-15 Thread Eelco Chaudron
The commit mentioned below causes the ovs_flow_tbl_lookup() function
to be called with the masked key. However, it's supposed to be called
with the unmasked key. This due to the fact that the datapath supports
installing wider flows, and OVS relies on this behavior. For example
if ipv4(src=1.1.1.1/192.0.0.0, dst=1.1.1.2/192.0.0.0) exists, a wider
flow (smaller mask) of ipv4(src=192.1.1.1/128.0.0.0,dst=192.1.1.2/
128.0.0.0) is allowed to be added.

However, if we try to add a wildcard rule, the installation fails:

$ ovs-appctl dpctl/add-flow system@myDP "in_port(1),eth_type(0x0800), \
  ipv4(src=1.1.1.1/192.0.0.0,dst=1.1.1.2/192.0.0.0,frag=no)" 2
$ ovs-appctl dpctl/add-flow system@myDP "in_port(1),eth_type(0x0800), \
  ipv4(src=192.1.1.1/0.0.0.0,dst=49.1.1.2/0.0.0.0,frag=no)" 2
ovs-vswitchd: updating flow table (File exists)

The reason is that the key used to determine if the flow is already
present in the system uses the original key ANDed with the mask.
This results in the IP address not being part of the (miniflow) key,
i.e., being substituted with an all-zero value. When doing the actual
lookup, this results in the key wrongfully matching the first flow,
and therefore the flow does not get installed.

This change reverses the commit below, but rather than having the key
on the stack, it's allocated.

Fixes: 190aa3e77880 ("openvswitch: Fix Frame-size larger than 1024 bytes 
warning.")

Signed-off-by: Eelco Chaudron 
---
Version history:
  v3: Updated commit message to explain the problem in more details.
  v2: Fixed ENOMEN error :( Forgot to do a stg refresh.

net/openvswitch/datapath.c |   25 -
 1 file changed, 16 insertions(+), 9 deletions(-)

diff --git a/net/openvswitch/datapath.c b/net/openvswitch/datapath.c
index 861dfb8daf4a..55b697c4d576 100644
--- a/net/openvswitch/datapath.c
+++ b/net/openvswitch/datapath.c
@@ -948,6 +948,7 @@ static int ovs_flow_cmd_new(struct sk_buff *skb, struct 
genl_info *info)
struct sw_flow_mask mask;
struct sk_buff *reply;
struct datapath *dp;
+   struct sw_flow_key *key;
struct sw_flow_actions *acts;
struct sw_flow_match match;
u32 ufid_flags = ovs_nla_get_ufid_flags(a[OVS_FLOW_ATTR_UFID_FLAGS]);
@@ -975,24 +976,26 @@ static int ovs_flow_cmd_new(struct sk_buff *skb, struct 
genl_info *info)
}
 
/* Extract key. */
-   ovs_match_init(, _flow->key, false, );
+   key = kzalloc(sizeof(*key), GFP_KERNEL);
+   if (!key) {
+   error = -ENOMEM;
+   goto err_kfree_key;
+   }
+
+   ovs_match_init(, key, false, );
error = ovs_nla_get_match(net, , a[OVS_FLOW_ATTR_KEY],
  a[OVS_FLOW_ATTR_MASK], log);
if (error)
goto err_kfree_flow;
 
+   ovs_flow_mask_key(_flow->key, key, true, );
+
/* Extract flow identifier. */
error = ovs_nla_get_identifier(_flow->id, a[OVS_FLOW_ATTR_UFID],
-  _flow->key, log);
+  key, log);
if (error)
goto err_kfree_flow;
 
-   /* unmasked key is needed to match when ufid is not used. */
-   if (ovs_identifier_is_key(_flow->id))
-   match.key = new_flow->id.unmasked_key;
-
-   ovs_flow_mask_key(_flow->key, _flow->key, true, );
-
/* Validate actions. */
error = ovs_nla_copy_actions(net, a[OVS_FLOW_ATTR_ACTIONS],
 _flow->key, , log);
@@ -1019,7 +1022,7 @@ static int ovs_flow_cmd_new(struct sk_buff *skb, struct 
genl_info *info)
if (ovs_identifier_is_ufid(_flow->id))
flow = ovs_flow_tbl_lookup_ufid(>table, _flow->id);
if (!flow)
-   flow = ovs_flow_tbl_lookup(>table, _flow->key);
+   flow = ovs_flow_tbl_lookup(>table, key);
if (likely(!flow)) {
rcu_assign_pointer(new_flow->sf_acts, acts);
 
@@ -1089,6 +1092,8 @@ static int ovs_flow_cmd_new(struct sk_buff *skb, struct 
genl_info *info)
 
if (reply)
ovs_notify(_flow_genl_family, reply, info);
+
+   kfree(key);
return 0;
 
 err_unlock_ovs:
@@ -1098,6 +1103,8 @@ static int ovs_flow_cmd_new(struct sk_buff *skb, struct 
genl_info *info)
ovs_nla_free_flow_actions(acts);
 err_kfree_flow:
ovs_flow_free(new_flow, false);
+err_kfree_key:
+   kfree(key);
 error:
return error;
 }

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v2] rhel: move conf.db to /var/lib/openvswitch, using symlinks

2022-12-15 Thread Timothy Redaelli
On Tue, 13 Dec 2022 12:57:08 +0100
Ilya Maximets  wrote:

> On 12/5/22 15:36, Ilya Maximets wrote:
> > On 12/4/22 09:23, Roi Dayan wrote:
> >>
> >>
> >> On 30/11/2022 17:55, Ilya Maximets wrote:
> >>> On 11/14/22 20:41, Timothy Redaelli wrote:
>  conf.db is by default at /etc/openvswitch, but it should be at
>  /var/lib/openvswitch like on Debian or like ovnnb_db.db and ovnsb_db.db.
> 
>  If conf.db already exists in /etc/openvswitch then it's moved to
>  /var/lib/openvswitch.
>  Symlinks are created for conf.db and .conf.db.~lock~ into 
>  /etc/openvswitch
>  for backward compatibility.
> 
>  Reported-at: 
>  https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fbugzilla.redhat.com%2F1830857data=05%7C01%7Croid%40nvidia.com%7Cd69116141ff645fc2c7308dad2eb4612%7C43083d15727340c1b7db39efd9ccc17a%7C0%7C0%7C638054205222362304%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7Csdata=%2BIcIVZBKrfhIpq%2B6r6I3QvjdZ9KvjLsrRSlvi9kFHzc%3Dreserved=0
>  Reported-by: Yedidyah Bar David 
>  Signed-off-by: Timothy Redaelli 
>  ---
>  v1 -> v2:
>  - Use hugetlbfs group instead of openvswitch when the package is built
>    with dpdk (as reported by Flavio)
>  ---
>   rhel/openvswitch-fedora.spec.in | 27 +++
>   1 file changed, 23 insertions(+), 4 deletions(-)
> >>>
> >>> If that works for Fedora, then LGTM.  Applied.
> >>>
> >>> Thanks!
> >>> Best regards, Ilya Maximets.
> >>> ___
> >>> dev mailing list
> >>> d...@openvswitch.org
> >>> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fmail.openvswitch.org%2Fmailman%2Flistinfo%2Fovs-devdata=05%7C01%7Croid%40nvidia.com%7Cd69116141ff645fc2c7308dad2eb4612%7C43083d15727340c1b7db39efd9ccc17a%7C0%7C0%7C638054205222362304%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7Csdata=fZZh4iYeUu%2BL2%2F%2FWTIgPNzpvfhpe%2F9MANkVPLmv57aY%3Dreserved=0
> >>
> >>
> >> hi,
> >>
> >> This commit expose some kind of issue and cause openvswitch not
> >> to start on clean systems.
> >>
> >> If old conf.db file didn't exists it creates an empty conf.db with
> >> the touch command.
> >> Empty conf.db cause ovsdb-server not to start.
> >>
> >> #  /usr/share/openvswitch/scripts/ovs-ctl start
> >> ovsdb-tool: ovsdb error: /etc/openvswitch/conf.db: cannot identify file 
> >> type
> >> Starting ovsdb-server ovsdb-server: ovsdb error: /etc/openvswitch/conf.db: 
> >> cannot identify file type
> >>[FAILED]
> >>
> >> If I remove the conf.db file (can leave the symbolic link in /etc)
> >> then ovs starts fine.
> >> # rm /var/lib/openvswitch/conf.db
> >> #  /usr/share/openvswitch/scripts/ovs-ctl start
> >> /etc/openvswitch/conf.db does not exist ... (warning).
> >> Creating empty database /etc/openvswitch/conf.db   [  OK  ]
> >> Starting ovsdb-server  [  OK  ]
> >> system ID not configured, please use --system-id ... failed!
> >> Configuring Open vSwitch system IDs[  OK  ]
> >> Starting ovs-vswitchd  [  OK  ]
> >> Enabling remote OVSDB managers [  OK  ]
> >>
> >>
> >> I'm not sure where it's better to fix this. either the spec here
> >> not to create an empty file or in ovsdb/log.c to an accept empty conf.db,
> >> or maybe even upgrade_db() in ovs-lib bash file to call create_db
> >> even if conf.db exists but it's empty.
> > 
> > Thanks, Roi, for the report!
> > I think, fixing the spec should be the right approach here.
> 
> Hi, Timothy.  Do you plan to work on the fix for this issue?

Yes sure, I'll do that today or tomorrow

> Otherwise we may just revert the change for now until the proper
> fix is available.  Thoughts?

I prefer just to fix it, without the revert, if it's possible

> Best regards, Ilya Maximets.
> 

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH net v2] openvswitch: Fix flow lookup to use unmasked key

2022-12-15 Thread Eelco Chaudron



On 15 Dec 2022, at 12:45, Ilya Maximets wrote:

> On 12/14/22 17:33, Eelco Chaudron wrote:
>> The commit mentioned below causes the ovs_flow_tbl_lookup() function
>> to be called with the masked key. However, it's supposed to be called
>> with the unmasked key.
>
> Hi, Eelco.  Thanks for the fix!
>
> Could you, please, add more information to the commit message on
> why this is a problem, with some examples?  This will be useful
> for someone in the future trying to understand why we actually
> have to use an unmasked key here.

ACK will add the same description as in the OVS dpif-netdev commit.

> Also, I suppose, 'Cc: sta...@vger.kernel.org' tag is needed in the
> commit message since it's a fix for a bug that is actually impacts
> users and needs to be backported.

Will do in the v3.

> Best regards, Ilya Maximets.
>
>>
>> This change reverses the commit below, but rather than having the key
>> on the stack, it's allocated.
>>
>> Fixes: 190aa3e77880 ("openvswitch: Fix Frame-size larger than 1024 bytes 
>> warning.")
>>
>> Signed-off-by: Eelco Chaudron 
>>
>> ---
>> Version history:
>>  - v2: Fixed ENOME(N/M) error. Forgot to do a stg refresh.
>>
>>  net/openvswitch/datapath.c |   25 -
>>  1 file changed, 16 insertions(+), 9 deletions(-)

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH ovn 00/12] Fixes to multiple Unit and System Tests

2022-12-15 Thread Numan Siddique
On Tue, Dec 13, 2022 at 10:07 AM Xavier Simonart  wrote:
>
> Hi Mark, Ales
>
> Thanks for your review and feedback.
> I'll send v2 for patch 4 and 7, and comments on patch 10.
>
> Thanks
> Xavier
>
> On Mon, Dec 12, 2022 at 10:33 PM Mark Michelson  wrote:
>
> > Hi Xavier. I did a review of this series.
> >
> > For patches 1, 2, 3, 5, 6, 8, 9, 11, and 12:
> >
> > Acked-by: Mark Michelson 

Thanks.  I applied the patches - 1, 2, 3, 5, 6, 8, 9, 11, and 12 to
the main branch.

Numan

> >
> > For patches 4, 7, and 10 I have some comments.
> >
> > On 12/2/22 08:51, Xavier Simonart wrote:
> > > Xavier Simonart (12):
> > >tests: Fixed flaky system tests "ACL reject" and "ACL after lb -
> > >  reject"
> > >tests: Fixed flaky system-test ACL log_related
> > >tests: Fixed system-test "load balancing affinity sessions"
> > >tests: Fixed load balancing system-tests
> > >tests: Fixed typo in macros
> > >tests: Fixed flaky system-test "ECMP symmetric reply"
> > >tests: Fixed flaky ACL fair Meters
> > >tests: Fixed typo in "incremetal processing" test
> > >tests: Removed macro reset_iface_pcap_file
> > >tests: Fixed some tests failing on (very) slow systems
> > >tests: Fixed "vtep: 3 HVs, 1 VIFs/HV, 1 GW, 1 LS"
> > >tests: Fixed "IPv6 periodic RA" and "snat-ct-zone with common NAT
> > >  zone"
> > >
> > >   tests/atlocal.in  |   3 +
> > >   tests/network-functions.at|  18 +-
> > >   tests/ovn-macros.at   |  21 --
> > >   tests/ovn-northd.at   |   7 +-
> > >   tests/ovn-performance.at  |   2 +-
> > >   tests/ovn.at  |  54 ++--
> > >   tests/system-common-macros.at |   2 +-
> > >   tests/system-kmod-macros.at   |   1 -
> > >   tests/system-ovn.at   | 456 +++---
> > >   9 files changed, 317 insertions(+), 247 deletions(-)
> > >
> >
> >
> ___
> dev mailing list
> d...@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [v8] odp-execute: Add ISA implementation of set_masked IPv6 action

2022-12-15 Thread Phelan, Michael


> -Original Message-
> From: dev  On Behalf Of Eelco Chaudron
> Sent: Wednesday 14 December 2022 11:17
> To: Finn, Emma 
> Cc: d...@openvswitch.org; i.maxim...@ovn.org
> Subject: Re: [ovs-dev] [v8] odp-execute: Add ISA implementation of set_masked
> IPv6 action
> 
> 
> 
> On 8 Dec 2022, at 17:01, Emma Finn wrote:
> 
> > This commit adds support for the AVX512 implementation of the
> > ipv6_set_addrs action as well as an AVX512 implementation of updating
> > the L4 checksums.
> >
> > Here are some relative performance numbers for this patch:
> > +-++
> > | Actions | AVX with patch |
> > +-++
> > | ipv6_src| 1.14x  |
> > +-++
> > | ipv6_src + ipv6_dst | 1.40x  |
> > +-++
> > | ipv6_label  | 1.14x  |
> > +-++
> > | mod_ipv6 4 x field  | 1.43x  |
> > +-++
> >
> > Signed-off-by: Emma Finn 
> >
> > ---
> > v8:
> >   - Added check for L4 header length.
> > v7:
> >   - Added clearing of connection tracking fields.
> > v6:
> >   - Added check for ipv6 extension headers.
> > v5:
> >   - Fixed load for ip6 src and dst mask for checksum check.
> > v4:
> >   - Reworked and moved check for checksum outside loop.
> >   - Code cleanup based on review from Eelco.
> > v3:
> >   - Added a runtime check for AVX512 vbmi.
> > v2:
> >   - Added check for availbility of s6_addr32 field of struct in6_addr.
> >   - Fixed network headers for freebsd builds.
> > ---
> 
> Thanks for following this through! The changes look good to me, and found an
> AVX machine to run some tests :)
> 
> Acked-by: Eelco Chaudron 
Hi Emma,
Saw some failing tests on the Intel CI, this was related to an issue on our 
side where the NICs were unbound from DPDK resulting in failures on some of the 
OVS DPDK unit tests testing physical ports.

I reran the failed tests and they were all fine so the failures weren't patch 
related.

Thanks,
Michael.
> 
> ___
> dev mailing list
> d...@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH v2] netdev-offload-tc: Preserve tc statistics when flow gets modified.

2022-12-15 Thread Eelco Chaudron
When a flow gets modified, i.e. the actions are changes, the tc layer will
remove, and re-add the flow. This is causing all the counters to be reset.

This patch will remember the previous tc counters and adjust any requests
for statistics. This is done in a similar way as the rte_flow implementation.

It also updates the check_pkt_len tc test to purge the flows, so we do
not use updated tc flows, with their counters.

Signed-off-by: Eelco Chaudron 
---
Please note that for now two copies of the test case exists, but I will clean
this up once this gets applied by submitting a new revision of the
'tests: Add system-traffic.at tests to check-offloads' series.

v2: Do not update the stats->used, as in terse dump they should be 0.

 lib/netdev-offload-tc.c  |   92 --
 lib/tc.h |1 
 tests/system-offloads-traffic.at |   65 +--
 tests/system-traffic.at  |   64 ++
 4 files changed, 201 insertions(+), 21 deletions(-)

diff --git a/lib/netdev-offload-tc.c b/lib/netdev-offload-tc.c
index ce7f8ad97..03c25fc38 100644
--- a/lib/netdev-offload-tc.c
+++ b/lib/netdev-offload-tc.c
@@ -97,6 +97,12 @@ static int netdev_tc_parse_nl_actions(struct netdev *netdev,
   bool *recirc_act, bool more_actions,
   struct tc_action **need_jump_update);
 
+static void parse_tc_flower_to_stats(struct tc_flower *flower,
+ struct dpif_flow_stats *stats);
+
+static int get_ufid_adjust_stats(const ovs_u128 *ufid,
+ struct dpif_flow_stats *stats);
+
 static bool
 is_internal_port(const char *type)
 {
@@ -193,6 +199,9 @@ static struct ovs_mutex ufid_lock = OVS_MUTEX_INITIALIZER;
  * @ufid: ufid assigned to the flow
  * @id: tc filter id (tcf_id)
  * @netdev: netdev associated with the tc rule
+ * @adjust_stats: When flow gets updated with new actions, we need to adjust
+ *the reported stats to include previous values as the hardware
+ *rule is removed and re-added. This stats copy is used for it.
  */
 struct ufid_tc_data {
 struct hmap_node ufid_to_tc_node;
@@ -200,6 +209,7 @@ struct ufid_tc_data {
 ovs_u128 ufid;
 struct tcf_id id;
 struct netdev *netdev;
+struct dpif_flow_stats adjust_stats;
 };
 
 static void
@@ -233,12 +243,36 @@ del_ufid_tc_mapping(const ovs_u128 *ufid)
 ovs_mutex_unlock(_lock);
 }
 
+static void
+netdev_tc_adjust_stats(struct dpif_flow_stats *stats,
+   struct dpif_flow_stats *adjust_stats)
+{
+/* Do not try to restore the stats->used, as in terse
+ * mode dumps we will always report them as 0. */
+stats->n_bytes += adjust_stats->n_bytes;
+stats->n_packets += adjust_stats->n_packets;
+}
+
 /* Wrapper function to delete filter and ufid tc mapping */
 static int
-del_filter_and_ufid_mapping(struct tcf_id *id, const ovs_u128 *ufid)
+del_filter_and_ufid_mapping(struct tcf_id *id, const ovs_u128 *ufid,
+struct dpif_flow_stats *stats)
 {
+struct tc_flower flower;
 int err;
 
+if (stats) {
+memset(stats, 0, sizeof *stats);
+if (!tc_get_flower(id, )) {
+struct dpif_flow_stats adjust_stats;
+
+parse_tc_flower_to_stats(, stats);
+if (!get_ufid_adjust_stats(ufid, _stats)) {
+netdev_tc_adjust_stats(stats, _stats);
+}
+}
+}
+
 err = tc_del_filter(id);
 if (!err) {
 del_ufid_tc_mapping(ufid);
@@ -249,7 +283,7 @@ del_filter_and_ufid_mapping(struct tcf_id *id, const 
ovs_u128 *ufid)
 /* Add ufid entry to ufid_to_tc hashmap. */
 static void
 add_ufid_tc_mapping(struct netdev *netdev, const ovs_u128 *ufid,
-struct tcf_id *id)
+struct tcf_id *id, struct dpif_flow_stats *stats)
 {
 struct ufid_tc_data *new_data = xzalloc(sizeof *new_data);
 size_t ufid_hash = hash_bytes(ufid, sizeof *ufid, 0);
@@ -261,6 +295,9 @@ add_ufid_tc_mapping(struct netdev *netdev, const ovs_u128 
*ufid,
 new_data->ufid = *ufid;
 new_data->id = *id;
 new_data->netdev = netdev_ref(netdev);
+if (stats) {
+new_data->adjust_stats = *stats;
+}
 
 ovs_mutex_lock(_lock);
 hmap_insert(_to_tc, _data->ufid_to_tc_node, ufid_hash);
@@ -292,6 +329,25 @@ get_ufid_tc_mapping(const ovs_u128 *ufid, struct tcf_id 
*id)
 return ENOENT;
 }
 
+static int
+get_ufid_adjust_stats(const ovs_u128 *ufid, struct dpif_flow_stats *stats)
+{
+size_t ufid_hash = hash_bytes(ufid, sizeof *ufid, 0);
+struct ufid_tc_data *data;
+
+ovs_mutex_lock(_lock);
+HMAP_FOR_EACH_WITH_HASH (data, ufid_to_tc_node, ufid_hash, _to_tc) {
+if (ovs_u128_equals(*ufid, data->ufid)) {
+*stats = data->adjust_stats;
+ovs_mutex_unlock(_lock);
+return 0;
+}
+}
+

Re: [ovs-dev] [PATCH ovn] ovn-controller: Only set monitor conditions on available tables.

2022-12-15 Thread Dumitru Ceara
On 12/14/22 18:07, Dumitru Ceara wrote:
> On 12/14/22 16:40, Mark Michelson wrote:
>> On 12/14/22 10:30, Dumitru Ceara wrote:
>>> On 12/14/22 16:28, Mark Michelson wrote:
 This looks good to me. Thanks Dumitru!

 Acked-by: Mark Michelson 

>>>
>>> Thanks!
>>>
 Just to double-check, this is a candidate for main and branch-22.12
 only, right?
>>>
>>> That's what I thought initially.  That's because on branches < 22.12 we
>>> didn't add new tables that need conditional monitoring.
>>>
>>> Just to be on the safe side we could do a custom backport for other
>>> branches.  What do you think?
>>
>> Sure, I suppose it can't hurt.
>>
> 
> I pushed this to the main branch and 22.12 for now.  I'm working on
> backporting it down to 22.03.
> 

Now it's backported to all stable branches down to and including 22.03.

Regards,
Dumitru

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH] Revert "rhel: Move conf.db to /var/lib/openvswitch, using symlinks."

2022-12-15 Thread Ilya Maximets
This reverts commit 59e8cb8a053d50f49629be8b6fd614562d066404.

Commit broke the package install on a clean system and also doesn't
seem to manage access rights for created symlinks correctly.

Revert it until a proper solution is proposed.

Reported-at: 
https://mail.openvswitch.org/pipermail/ovs-dev/2022-December/400045.html
Reported-by: Roi Dayan 
Signed-off-by: Ilya Maximets 
---
 rhel/openvswitch-fedora.spec.in | 27 ---
 1 file changed, 4 insertions(+), 23 deletions(-)

diff --git a/rhel/openvswitch-fedora.spec.in b/rhel/openvswitch-fedora.spec.in
index 4a3e6294b..de6cb30ea 100644
--- a/rhel/openvswitch-fedora.spec.in
+++ b/rhel/openvswitch-fedora.spec.in
@@ -238,6 +238,8 @@ rm -rf $RPM_BUILD_ROOT/%{_datadir}/openvswitch/python/
 
 install -d -m 0755 $RPM_BUILD_ROOT/%{_sharedstatedir}/openvswitch
 
+touch $RPM_BUILD_ROOT%{_sysconfdir}/openvswitch/conf.db
+touch $RPM_BUILD_ROOT%{_sysconfdir}/openvswitch/.conf.db.~lock~
 touch $RPM_BUILD_ROOT%{_sysconfdir}/openvswitch/system-id.conf
 
 install -p -m 644 -D selinux/openvswitch-custom.pp \
@@ -326,27 +328,6 @@ if [ $1 -eq 1 ]; then
 fi
 %endif
 
-# Ensure that /etc/openvswitch/conf.db links to /var/lib/openvswitch,
-# moving an existing file if there is one.
-#
-# Ditto for .conf.db.~lock~.
-for base in conf.db .conf.db.~lock~; do
-new=/var/lib/openvswitch/$base
-old=/etc/openvswitch/$base
-if test -f $old && test ! -e $new; then
-mv $old $new
-fi
-if test ! -e $old && test ! -h $old; then
-ln -s $new $old
-fi
-touch $new
-%if %{with dpdk}
-chown openvswitch:hugetlbfs $new
-%else
-chown openvswitch:openvswitch $new
-%endif
-done
-
 %if 0%{?systemd_post:1}
 # This may not enable openvswitch service or do daemon-reload.
 %systemd_post %{name}.service
@@ -432,8 +413,8 @@ fi
 %endif
 %dir %{_sysconfdir}/openvswitch
 %{_sysconfdir}/openvswitch/default.conf
-%config %ghost %{_sharedstatedir}/openvswitch/conf.db
-%ghost %{_sharedstatedir}/openvswitch/.conf.db.~lock~
+%config %ghost %{_sysconfdir}/openvswitch/conf.db
+%ghost %{_sysconfdir}/openvswitch/.conf.db.~lock~
 %config %ghost %{_sysconfdir}/openvswitch/system-id.conf
 %config(noreplace) %{_sysconfdir}/sysconfig/openvswitch
 %defattr(-,root,root)
-- 
2.38.1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH v3 1/2] ofp, dpif: Allow CT flush based on partial match

2022-12-15 Thread Ales Musil
Currently, the CT can be flushed by dpctl only be specifying
the whole 5-tuple. This is not very convenient when there are
only some fields known to the user of CT flush. Add new struct
ofputil_ct_match which represents the generic filtering that can
be done for CT flush. The match is done only on fields that are
non-zero with exception to the icmp fields.

This allows the filtering just within dpctl, however
it is a preparation for OpenFlow extension.

Reported-at: https://bugzilla.redhat.com/2120546
Signed-off-by: Ales Musil 
---
v3: Rebase on top of master.
Address the C99 comment and missing dpif_close call.
v2: Rebase on top of master.
Address comments from Paolo.
---
 NEWS   |   2 +
 include/openvswitch/ofp-util.h |  28 +++
 lib/automake.mk|   2 +
 lib/ct-dpif.c  | 201 +
 lib/ct-dpif.h  |   4 +-
 lib/dpctl.c|  45 +++--
 lib/dpctl.man  |   3 +-
 lib/ofp-ct-util.c  | 311 +
 lib/ofp-ct-util.h  |  34 
 tests/system-traffic.at|  82 -
 10 files changed, 568 insertions(+), 144 deletions(-)
 create mode 100644 lib/ofp-ct-util.c
 create mode 100644 lib/ofp-ct-util.h

diff --git a/NEWS b/NEWS
index 265375e1c..ff8904b02 100644
--- a/NEWS
+++ b/NEWS
@@ -14,6 +14,8 @@ Post-v3.0.0
  10 Gbps link speed by default in case the actual link speed cannot be
  determined.  Previously it was 10 Mbps.  Values can still be overridden
  by specifying 'max-rate' or '[r]stp-path-cost' accordingly.
+   - ovs-dpctl and related ovs-appctl commands:
+ * "flush-conntrack" is capable of handling partial 5-tuple.
 
 
 v3.0.0 - 15 Aug 2022
diff --git a/include/openvswitch/ofp-util.h b/include/openvswitch/ofp-util.h
index 091a09cad..84937ae26 100644
--- a/include/openvswitch/ofp-util.h
+++ b/include/openvswitch/ofp-util.h
@@ -19,6 +19,9 @@
 
 #include 
 #include 
+#include 
+#include 
+
 #include "openvswitch/ofp-protocol.h"
 
 struct ofp_header;
@@ -27,6 +30,31 @@ struct ofp_header;
 extern "C" {
 #endif
 
+struct ofputil_ct_tuple {
+struct in6_addr src;
+struct in6_addr dst;
+
+union {
+ovs_be16 src_port;
+ovs_be16 icmp_id;
+};
+union {
+ovs_be16 dst_port;
+struct {
+uint8_t icmp_code;
+uint8_t icmp_type;
+};
+};
+};
+
+struct ofputil_ct_match {
+uint8_t ip_proto;
+uint16_t l3_type;
+
+struct ofputil_ct_tuple tuple_orig;
+struct ofputil_ct_tuple tuple_reply;
+};
+
 bool ofputil_decode_hello(const struct ofp_header *,
   uint32_t *allowed_versions);
 struct ofpbuf *ofputil_encode_hello(uint32_t version_bitmap);
diff --git a/lib/automake.mk b/lib/automake.mk
index a0fabe38f..37135f118 100644
--- a/lib/automake.mk
+++ b/lib/automake.mk
@@ -227,6 +227,8 @@ lib_libopenvswitch_la_SOURCES = \
lib/ofp-actions.c \
lib/ofp-bundle.c \
lib/ofp-connection.c \
+   lib/ofp-ct-util.c \
+   lib/ofp-ct-util.h \
lib/ofp-ed-props.c \
lib/ofp-errors.c \
lib/ofp-flow.c \
diff --git a/lib/ct-dpif.c b/lib/ct-dpif.c
index 6f17a26b5..906e827c1 100644
--- a/lib/ct-dpif.c
+++ b/lib/ct-dpif.c
@@ -20,6 +20,7 @@
 #include 
 
 #include "ct-dpif.h"
+#include "ofp-ct-util.h"
 #include "openvswitch/ofp-parse.h"
 #include "openvswitch/vlog.h"
 
@@ -71,6 +72,31 @@ ct_dpif_dump_start(struct dpif *dpif, struct 
ct_dpif_dump_state **dump,
 return err;
 }
 
+static void
+ct_dpif_tuple_from_ofputil_ct_tuple(const struct ofputil_ct_tuple *ofp_tuple,
+struct ct_dpif_tuple *tuple,
+uint16_t l3_type, uint8_t ip_proto)
+{
+if (l3_type == AF_INET) {
+tuple->src.ip = in6_addr_get_mapped_ipv4(_tuple->src);
+tuple->dst.ip = in6_addr_get_mapped_ipv4(_tuple->dst);
+} else {
+tuple->src.in6 = ofp_tuple->src;
+tuple->dst.in6 = ofp_tuple->dst;
+}
+
+tuple->l3_type = l3_type;
+tuple->ip_proto = ip_proto;
+tuple->src_port = ofp_tuple->src_port;
+
+if (ip_proto == IPPROTO_ICMP || ip_proto == IPPROTO_ICMPV6) {
+tuple->icmp_code = ofp_tuple->icmp_code;
+tuple->icmp_type = ofp_tuple->icmp_type;
+} else {
+tuple->dst_port = ofp_tuple->dst_port;
+}
+}
+
 /* Dump one connection from a tracker, and put it in 'entry'.
  *
  * 'dump' should have been initialized by ct_dpif_dump_start().
@@ -100,7 +126,62 @@ ct_dpif_dump_done(struct ct_dpif_dump_state *dump)
 ? dpif->dpif_class->ct_dump_done(dpif, dump)
 : EOPNOTSUPP);
 }
-
+
+static int
+ct_dpif_flush_tuple(struct dpif *dpif, const uint16_t *zone,
+const struct ofputil_ct_match *match) {
+struct ct_dpif_dump_state *dump;
+struct ct_dpif_entry cte;
+int error;
+int tot_bkts;
+
+if (!dpif->dpif_class->ct_flush) {
+ 

[ovs-dev] [PATCH v3 2/2] openflow: Add extension to flush CT by generic match

2022-12-15 Thread Ales Musil
Add extension that allows to flush connections from CT
by specifying fields that the connections should be
matched against. This allows to match only some fields
of the connection e.g. source address for orig direrction.

Reported-at: https://bugzilla.redhat.com/2120546
Signed-off-by: Ales Musil 
---
v3: Rebase on top of master.
v2: Rebase on top of master.
Use suggestion from Ilya.
---
 NEWS   |   3 +
 include/openflow/nicira-ext.h  |  30 +++
 include/openvswitch/ofp-msgs.h |   4 +
 include/openvswitch/ofp-util.h |   4 +
 lib/ofp-bundle.c   |   1 +
 lib/ofp-ct-util.c  | 146 +
 lib/ofp-ct-util.h  |   9 ++
 lib/ofp-print.c|  20 +
 lib/ofp-util.c |  25 ++
 lib/rconn.c|   1 +
 ofproto/ofproto-dpif.c |   8 +-
 ofproto/ofproto-provider.h |   7 +-
 ofproto/ofproto.c  |  30 ++-
 tests/ofp-print.at |  93 +
 tests/ovs-ofctl.at |  12 +++
 tests/system-traffic.at| 116 ++
 utilities/ovs-ofctl.c  |  38 +
 17 files changed, 489 insertions(+), 58 deletions(-)

diff --git a/NEWS b/NEWS
index ff8904b02..46b8faa41 100644
--- a/NEWS
+++ b/NEWS
@@ -16,6 +16,9 @@ Post-v3.0.0
  by specifying 'max-rate' or '[r]stp-path-cost' accordingly.
- ovs-dpctl and related ovs-appctl commands:
  * "flush-conntrack" is capable of handling partial 5-tuple.
+   - OpenFlow:
+  * New OpenFlow extension NXT_CT_FLUSH to flush connections matching
+the specified fields.
 
 
 v3.0.0 - 15 Aug 2022
diff --git a/include/openflow/nicira-ext.h b/include/openflow/nicira-ext.h
index b68804991..32ce56d31 100644
--- a/include/openflow/nicira-ext.h
+++ b/include/openflow/nicira-ext.h
@@ -1064,4 +1064,34 @@ struct nx_zone_id {
 };
 OFP_ASSERT(sizeof(struct nx_zone_id) == 8);
 
+/* CT flush available TLVs. */
+enum nx_ct_flush_tlv_type {
+/* Outer types. */
+NXT_CT_ORIG_DIRECTION,/* CT orig direction outer type. */
+NXT_CT_REPLY_DIRECTION,   /* CT reply direction outer type. */
+
+/* Nested types. */
+NXT_CT_SRC,   /* CT source IPv6 or mapped IPv4 address. */
+NXT_CT_DST,   /* CT destination IPv6 or mapped IPv4 address. */
+NXT_CT_SRC_PORT,  /* CT source port. */
+NXT_CT_DST_PORT,  /* CT destination port. */
+NXT_CT_ICMP_ID,   /* CT ICMP id. */
+NXT_CT_ICMP_TYPE, /* CT ICMP type. */
+NXT_CT_ICMP_CODE, /* CT ICMP code. */
+
+/* Primitive types. */
+NXT_CT_ZONE_ID,   /* CT zone id. */
+};
+
+/* NXT_CT_FLUSH.
+ *
+ * Flushes the connection tracking specified by 5-tuple.
+ * The struct should be followed by TLVs specifying the matching parameters. */
+struct nx_ct_flush {
+uint8_t ip_proto;  /* IP protocol. */
+uint8_t family;/* L3 address family. */
+uint8_t zero[6];   /* Must be zero. */
+};
+OFP_ASSERT(sizeof(struct nx_ct_flush) == 8);
+
 #endif /* openflow/nicira-ext.h */
diff --git a/include/openvswitch/ofp-msgs.h b/include/openvswitch/ofp-msgs.h
index 921a937e5..659b0a3e7 100644
--- a/include/openvswitch/ofp-msgs.h
+++ b/include/openvswitch/ofp-msgs.h
@@ -526,6 +526,9 @@ enum ofpraw {
 
 /* NXST 1.0+ (4): struct nx_ipfix_stats_reply[]. */
 OFPRAW_NXST_IPFIX_FLOW_REPLY,
+
+/* NXT 1.0+ (32): struct nx_ct_flush, uint8_t[8][]. */
+OFPRAW_NXT_CT_FLUSH,
 };
 
 /* Decoding messages into OFPRAW_* values. */
@@ -772,6 +775,7 @@ enum ofptype {
 OFPTYPE_IPFIX_FLOW_STATS_REQUEST, /* OFPRAW_NXST_IPFIX_FLOW_REQUEST */
 OFPTYPE_IPFIX_FLOW_STATS_REPLY,   /* OFPRAW_NXST_IPFIX_FLOW_REPLY */
 OFPTYPE_CT_FLUSH_ZONE,/* OFPRAW_NXT_CT_FLUSH_ZONE. */
+OFPTYPE_CT_FLUSH,   /* OFPRAW_NXT_CT_FLUSH. */
 
 /* Flow monitor extension. */
 OFPTYPE_FLOW_MONITOR_CANCEL,  /* OFPRAW_NXT_FLOW_MONITOR_CANCEL.
diff --git a/include/openvswitch/ofp-util.h b/include/openvswitch/ofp-util.h
index 84937ae26..e10d90b9f 100644
--- a/include/openvswitch/ofp-util.h
+++ b/include/openvswitch/ofp-util.h
@@ -65,6 +65,10 @@ struct ofpbuf *ofputil_encode_echo_reply(const struct 
ofp_header *);
 
 struct ofpbuf *ofputil_encode_barrier_request(enum ofp_version);
 
+struct ofpbuf *ofputil_ct_match_encode(const struct ofputil_ct_match *match,
+   uint16_t *zone_id,
+   enum ofp_version version);
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/lib/ofp-bundle.c b/lib/ofp-bundle.c
index 0161c2bc6..941a8370e 100644
--- a/lib/ofp-bundle.c
+++ b/lib/ofp-bundle.c
@@ -292,6 +292,7 @@ ofputil_is_bundlable(enum ofptype type)
 case OFPTYPE_IPFIX_FLOW_STATS_REQUEST:
 case OFPTYPE_IPFIX_FLOW_STATS_REPLY:
 case OFPTYPE_CT_FLUSH_ZONE:
+case OFPTYPE_CT_FLUSH:
 break;
 }
 
diff --git a/lib/ofp-ct-util.c 

[ovs-dev] [PATCH v3 0/2] Add extension for partial CT flush

2022-12-15 Thread Ales Musil
Add extension that will allow to flush
connection from CT specified by full
orig 5-tuple or partial match that combines
orig or reply direction.

Ales Musil (2):
  ofp, dpif: Allow CT flush based on partial match
  openflow: Add extension to flush CT by generic match

 NEWS   |   5 +
 include/openflow/nicira-ext.h  |  30 +++
 include/openvswitch/ofp-msgs.h |   4 +
 include/openvswitch/ofp-util.h |  32 +++
 lib/automake.mk|   2 +
 lib/ct-dpif.c  | 201 +++
 lib/ct-dpif.h  |   4 +-
 lib/dpctl.c|  45 ++--
 lib/dpctl.man  |   3 +-
 lib/ofp-bundle.c   |   1 +
 lib/ofp-ct-util.c  | 457 +
 lib/ofp-ct-util.h  |  43 
 lib/ofp-print.c|  20 ++
 lib/ofp-util.c |  25 ++
 lib/rconn.c|   1 +
 ofproto/ofproto-dpif.c |   8 +-
 ofproto/ofproto-provider.h |   7 +-
 ofproto/ofproto.c  |  30 ++-
 tests/ofp-print.at |  93 +++
 tests/ovs-ofctl.at |  12 +
 tests/system-traffic.at| 132 --
 utilities/ovs-ofctl.c  |  38 +++
 22 files changed, 1024 insertions(+), 169 deletions(-)
 create mode 100644 lib/ofp-ct-util.c
 create mode 100644 lib/ofp-ct-util.h

-- 
2.38.1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH net v2] openvswitch: Fix flow lookup to use unmasked key

2022-12-15 Thread Ilya Maximets
On 12/14/22 17:33, Eelco Chaudron wrote:
> The commit mentioned below causes the ovs_flow_tbl_lookup() function
> to be called with the masked key. However, it's supposed to be called
> with the unmasked key.

Hi, Eelco.  Thanks for the fix!

Could you, please, add more information to the commit message on
why this is a problem, with some examples?  This will be useful
for someone in the future trying to understand why we actually
have to use an unmasked key here.

Also, I suppose, 'Cc: sta...@vger.kernel.org' tag is needed in the
commit message since it's a fix for a bug that is actually impacts
users and needs to be backported.

Best regards, Ilya Maximets.

> 
> This change reverses the commit below, but rather than having the key
> on the stack, it's allocated.
> 
> Fixes: 190aa3e77880 ("openvswitch: Fix Frame-size larger than 1024 bytes 
> warning.")
> 
> Signed-off-by: Eelco Chaudron 
> 
> ---
> Version history:
>  - v2: Fixed ENOME(N/M) error. Forgot to do a stg refresh.
> 
>  net/openvswitch/datapath.c |   25 -
>  1 file changed, 16 insertions(+), 9 deletions(-)

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH ovn] ovn-trace: Use the original ovnact for execute_load

2022-12-15 Thread Ales Musil
Compiling with GCC and -fsanitize=undefined throws
a warning [0]. To avoid that, use the original
ovnact pointer with the inner size rather than getting
the inner "struct ovnact" from the "struct ovnact_load".

At the same time simplify the computation of size for
ovnacts_format call.

[0]
In function ‘execute_load’,
inlined from ‘trace_actions.part.0.isra’ at utilities/ovn-trace.c:3075:13:
utilities/ovn-trace.c:1501:5: error: ‘ovnacts_encode’ reading 4 bytes from a 
region of size 1 [-Werror=stringop-overread]
 1501 | ovnacts_encode(>ovnact, sizeof *load, , );
  | ^~
utilities/ovn-trace.c:1501:5: note: referencing argument 1 of type ‘const 
struct ovnact[0]’
In file included from utilities/ovn-trace.c:35:
./include/ovn/actions.h: In function ‘trace_actions.part.0.isra’:
./include/ovn/actions.h:869:6: note: in a call to function ‘ovnacts_encode’
  869 | void ovnacts_encode(const struct ovnact[], size_t ovnacts_len,
  |  ^~
In function ‘execute_load’,
inlined from ‘trace_actions.part.0.isra’ at utilities/ovn-trace.c:3075:13:
utilities/ovn-trace.c:1509:13: error: ‘ovnacts_format’ reading 4 bytes from a 
region of size 1 [-Werror=stringop-overread]
 1509 | ovnacts_format(>ovnact, OVNACT_LOAD_SIZE, );
  | ^~~
utilities/ovn-trace.c:1509:13: note: referencing argument 1 of type ‘const 
struct ovnact[0]’
./include/ovn/actions.h: In function ‘trace_actions.part.0.isra’:
./include/ovn/actions.h:792:6: note: in a call to function ‘ovnacts_format’
  792 | void ovnacts_format(const struct ovnact[], size_t ovnacts_len, struct 
ds *);
  |  ^~

Signed-off-by: Ales Musil 
---
 utilities/ovn-trace.c | 20 +---
 1 file changed, 9 insertions(+), 11 deletions(-)

diff --git a/utilities/ovn-trace.c b/utilities/ovn-trace.c
index 07ebac5e5..e5766ed67 100644
--- a/utilities/ovn-trace.c
+++ b/utilities/ovn-trace.c
@@ -1486,9 +1486,8 @@ ovntrace_node_prune_hard(struct ovs_list *nodes)
 }
 
 static void
-execute_load(const struct ovnact_load *load,
- const struct ovntrace_datapath *dp, struct flow *uflow,
- struct ovs_list *super OVS_UNUSED)
+execute_load(const struct ovnact *ovnact, const struct ovntrace_datapath *dp,
+ struct flow *uflow, struct ovs_list *super OVS_UNUSED)
 {
 const struct ovnact_encode_params ep = {
 .lookup_port = ovntrace_lookup_port,
@@ -1498,7 +1497,7 @@ execute_load(const struct ovnact_load *load,
 uint64_t stub[512 / 8];
 struct ofpbuf ofpacts = OFPBUF_STUB_INITIALIZER(stub);
 
-ovnacts_encode(>ovnact, sizeof *load, , );
+ovnacts_encode(ovnact, OVNACT_ALIGN(ovnact->len), , );
 
 struct ofpact *a;
 OFPACT_FOR_EACH (a, ofpacts.data, ofpacts.size) {
@@ -1506,12 +1505,11 @@ execute_load(const struct ovnact_load *load,
 
 if (!mf_is_register(sf->field->id)) {
 struct ds s = DS_EMPTY_INITIALIZER;
-ovnacts_format(>ovnact, OVNACT_LOAD_SIZE, );
-ds_chomp(, ';');
 
-char *friendly = ovntrace_make_names_friendly(ds_cstr());
-ovntrace_node_append(super, OVNTRACE_NODE_MODIFY, "%s", friendly);
-free(friendly);
+ovnacts_format(ovnact, OVNACT_ALIGN(ovnact->len), );
+ds_chomp(, ';');
+ovntrace_node_append(super, OVNTRACE_NODE_MODIFY, "%s",
+ ds_cstr());
 
 ds_destroy();
 }
@@ -3057,7 +3055,7 @@ trace_actions(const struct ovnact *ovnacts, size_t 
ovnacts_len,
 const struct ovnact *a;
 OVNACT_FOR_EACH (a, ovnacts, ovnacts_len) {
 ds_clear();
-ovnacts_format(a, sizeof *a * (ovnact_next(a) - a), );
+ovnacts_format(a, OVNACT_ALIGN(a->len), );
 char *friendly = ovntrace_make_names_friendly(ds_cstr());
 ovntrace_node_append(super, OVNTRACE_NODE_ACTION, "%s", friendly);
 free(friendly);
@@ -3072,7 +3070,7 @@ trace_actions(const struct ovnact *ovnacts, size_t 
ovnacts_len,
 break;
 
 case OVNACT_LOAD:
-execute_load(ovnact_get_LOAD(a), dp, uflow, super);
+execute_load(a, dp, uflow, super);
 break;
 
 case OVNACT_MOVE:
-- 
2.38.1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH ovn v2 3/4] ic: prevent advertising/learning multiple same routes

2022-12-15 Thread Dumitru Ceara
On 12/15/22 09:31, Vladislav Odintsov wrote:
> Hi Dumitru,
> 
> Regards,
> Vladislav Odintsov
> 
>> On 12 Dec 2022, at 17:57, Dumitru Ceara  wrote:
>>
>> On 12/12/22 14:39, Vladislav Odintsov wrote:
>>> Hi Numan, Dumitru,
>>>
>>> I forgot one thing about a possible upgrade recommendation for the users.
>>> Fix can be applied much easier: just upgrading all ovn-ic in all AZs first 
>>> and then converting DB schema. This will remove all duplicates.
>>> The latter can be requested manually invoking systemctl restart ovn-ic-db 
>>> (at least for RH, there is a separate systemd unit for IC databases) or 
>>> calling ovsdb-client convert … command directly.
>>>
>>
>> This is a good point, especially because we don't define anywhere
>> (AFAIK) an order of upgrading components when it comes to ovn-ic.
>>
>> We should document this in ovn-upgrades.rst.
>>
>>> Would it be an acceptable compromise for this situation?
>>
>> This has a long term implication though: ovn-ic must always be backwards
>> compatible with ovn-ic-SB schema changes.  That is not the case with
>> ovn-northd and SB schema.  If people think that's ok, I'm fine with that
>> too but we need to make sure we properly document this.
>>
> 
> As nobody answers, should I:
> 1. Split patch #3 by two: fix of duplicates (in order it do be backportable 
> to older branches) and the second part is an index changes + upgrade docs, 
> which will be a part of a major upgrade and will not be considered for 
> backporting.
> 2. Describe in the second part, that if user upgrades from any version prior 
> to this major release it should upgrade all ovn-ic daemons on all 
> availability zones first and then upgrade ovn-ic-sb schema?
> 
> If that has no objections, I’ll send a new patchset for a final review.
> 

I think that makes sense.  Thanks again for working on this, looking
forward to the new revision.

Regards,
Dumitru

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH ovn v2 3/4] ic: prevent advertising/learning multiple same routes

2022-12-15 Thread Vladislav Odintsov
Hi Dumitru,

Regards,
Vladislav Odintsov

> On 12 Dec 2022, at 17:57, Dumitru Ceara  wrote:
> 
> On 12/12/22 14:39, Vladislav Odintsov wrote:
>> Hi Numan, Dumitru,
>> 
>> I forgot one thing about a possible upgrade recommendation for the users.
>> Fix can be applied much easier: just upgrading all ovn-ic in all AZs first 
>> and then converting DB schema. This will remove all duplicates.
>> The latter can be requested manually invoking systemctl restart ovn-ic-db 
>> (at least for RH, there is a separate systemd unit for IC databases) or 
>> calling ovsdb-client convert … command directly.
>> 
> 
> This is a good point, especially because we don't define anywhere
> (AFAIK) an order of upgrading components when it comes to ovn-ic.
> 
> We should document this in ovn-upgrades.rst.
> 
>> Would it be an acceptable compromise for this situation?
> 
> This has a long term implication though: ovn-ic must always be backwards
> compatible with ovn-ic-SB schema changes.  That is not the case with
> ovn-northd and SB schema.  If people think that's ok, I'm fine with that
> too but we need to make sure we properly document this.
> 

As nobody answers, should I:
1. Split patch #3 by two: fix of duplicates (in order it do be backportable to 
older branches) and the second part is an index changes + upgrade docs, which 
will be a part of a major upgrade and will not be considered for backporting.
2. Describe in the second part, that if user upgrades from any version prior to 
this major release it should upgrade all ovn-ic daemons on all availability 
zones first and then upgrade ovn-ic-sb schema?

If that has no objections, I’ll send a new patchset for a final review.

> Regards,
> Dumitru
> 
>> 
>> Regards,
>> Vladislav Odintsov
>> 
>>> On 12 Dec 2022, at 16:20, Numan Siddique  wrote:
>>> 
>>> On Mon, Dec 12, 2022 at 5:41 AM Dumitru Ceara  wrote:
 
 On 12/9/22 19:42, Vladislav Odintsov wrote:
> Hi Dumitru,
 
 Hi Vladislav,
 
> 
> please see answers inline.
> 
> Regards,
> Vladislav Odintsov
> 
>> On 9 Dec 2022, at 17:37, Dumitru Ceara  wrote:
>> 
>> On 12/6/22 11:20, Vladislav Odintsov wrote:
>>> Signed-off-by: Vladislav Odintsov >> >
>>> ---
>> 
>> Hi Vladislav,
>> 
>>> ic/ovn-ic.c | 83 +++--
>>> ovn-ic-sb.ovsschema |  6 ++--
>>> tests/ovn-ic.at | 60 
>>> 3 files changed, 114 insertions(+), 35 deletions(-)
>>> 
>>> diff --git a/ic/ovn-ic.c b/ic/ovn-ic.c
>>> index 9e2369fef..64307d8c4 100644
>>> --- a/ic/ovn-ic.c
>>> +++ b/ic/ovn-ic.c
>>> @@ -884,10 +884,12 @@ ic_route_hash(const struct in6_addr *prefix, 
>>> unsigned int plen,
>>> static struct ic_route_info *
>>> ic_route_find(struct hmap *routes, const struct in6_addr *prefix,
>>> unsigned int plen, const struct in6_addr *nexthop,
>>> -  const char *origin, char *route_table)
>>> +  const char *origin, const char *route_table, uint32_t 
>>> hash)
>>> {
>>>   struct ic_route_info *r;
>>> -uint32_t hash = ic_route_hash(prefix, plen, nexthop, origin, 
>>> route_table);
>>> +if (!hash) {
>>> +hash = ic_route_hash(prefix, plen, nexthop, origin, 
>>> route_table);
>>> +}
>>>   HMAP_FOR_EACH_WITH_HASH (r, node, hash, routes) {
>>>   if (ipv6_addr_equals(>prefix, prefix) &&
>>>   r->plen == plen &&
>>> @@ -945,8 +947,8 @@ add_to_routes_learned(struct hmap *routes_learned,
>>>   }
>>>   const char *origin = smap_get_def(_route->options, "origin", "");
>>>   if (ic_route_find(routes_learned, , plen, , origin,
>>> -  nb_route->route_table)) {
>>> -/* Route is already added to learned in previous iteration. */
>>> +  nb_route->route_table, 0)) {
>>> +/* Route was added to learned on previous iteration. */
>>>   return true;
>>>   }
>>> 
>>> @@ -1093,10 +1095,42 @@ route_need_advertise(const char *policy,
>>> }
>>> 
>>> static void
>>> -add_to_routes_ad(struct hmap *routes_ad,
>>> - const struct nbrec_logical_router_static_route 
>>> *nb_route,
>>> - const struct lport_addresses *nexthop_addresses,
>>> - const struct smap *nb_options, const char 
>>> *route_table)
>>> +add_to_routes_ad(struct hmap *routes_ad, const struct in6_addr prefix,
>>> + unsigned int plen, const struct in6_addr nexthop,
>>> + const char *origin, const char *route_table,
>>> + const struct nbrec_logical_router_port *nb_lrp,
>>> + const struct nbrec_logical_router_static_route 
>>> *nb_route)
>>> +{
>>> +if (route_table == NULL) {
>>> +