Re: [ovs-dev] [PATCH ovn v5 00/16] northd lflow incremental processing

2024-01-24 Thread Han Zhou
On Thu, Jan 11, 2024 at 7:28 AM  wrote:
>
> From: Numan Siddique 
>
> This patch series adds incremental processing in the lflow engine
> node to handle changes to northd and other engine nodes.
> Changes related to load balancers and NAT are mainly handled in
> this patch series.
>
> This patch series can also be found here -
https://github.com/numansiddique/ovn/tree/northd_lbnatacl_lflow/v5
>
> Prior to this patch series, most of the changes to northd engine
> resulted in full recomputation of logical flows.  This series
> aims to improve the performance of ovn-northd by adding the I-P
> support.  In order to add this support, some of the northd engine
> node data (from struct ovn_datapath) is split and moved over to
> new engine nodes - mainly related to load balancers, NAT and ACLs.
>
> Below are the scale testing results done with these patches applied
> using ovn-heater.  The test ran the scenario  -
> ocp-500-density-heavy.yml [1].
>
> With all the lflow I-P patches applied, the results are:
>
>
> -------------------------------------------------------------------------------------------------------------------
>                        Min (s)    Median (s)  90%ile (s)  99%ile (s)  Max (s)    Mean (s)   Total (s)    Count  Failed
> -------------------------------------------------------------------------------------------------------------------
> Iteration Total        0.136883   1.129016    1.192001    1.204167    1.212728   0.665017   83.127099    125    0
> Namespace.add_ports    0.005216   0.005736    0.007034    0.015486    0.018978   0.006211   0.776373     125    0
> WorkerNode.bind_port   0.035030   0.046082    0.052469    0.058293    0.060311   0.045973   11.493259    250    0
> WorkerNode.ping_port   0.005057   0.006727    1.047692    1.069253    1.071336   0.266896   66.724094    250    0
> -------------------------------------------------------------------------------------------------------------------
>
> The results with the present main are:
>
>
> -------------------------------------------------------------------------------------------------------------------
>                        Min (s)    Median (s)  90%ile (s)  99%ile (s)  Max (s)    Mean (s)   Total (s)     Count  Failed
> -------------------------------------------------------------------------------------------------------------------
> Iteration Total        0.135491   2.223805    3.311270    3.339078    3.345346   1.729172   216.146495    125    0
> Namespace.add_ports    0.005380   0.005744    0.006819    0.018773    0.020800   0.006292   0.786532      125    0
> WorkerNode.bind_port   0.034179   0.046055    0.053488    0.058801    0.071043   0.046117   11.529311     250    0
> WorkerNode.ping_port   0.004956   0.006952    3.086952    3.191743    3.192807   0.791544   197.886026    250    0
> -------------------------------------------------------------------------------------------------------------------
>
> Please see the link [2] which has a high level description of the
> changes done in this patch series.
>
>
> [1] -
https://github.com/ovn-org/ovn-heater/blob/main/test-scenarios/ocp-500-density-heavy.yml
> [2] -
https://mail.openvswitch.org/pipermail/ovs-dev/2023-December/410053.html
>
> v4 -> v5
> ---
>* Rebased to latest main and resolved the conflicts.
>
>* Addressed the review comments from Han in patch 15 (and in p8).
>  Removed the assert when the SB dp group is missing and handled it by
>  returning false so that the lflow engine recomputes.  Added test cases
>  to cover this scenario for both lflows (p8) and SB load balancers (p15).

Thanks Numan. I went through this version of the series. I tried my best to
review it in detail, but I can't say I examined every line of the changes.
The major comments are about the implicit dependency related to p4, p5, p6,
p7, and some pending discussions for p8 (for which I am also going to do
more performance testing). For the rest of the patches, please consider them as:
Acked-by: Han Zhou 

Thanks,
Han

>
> v3 -> v4
> ---
>* Addressed most of the review comments from Dumitru and Han.
>
>* Found a couple of bugs in v3 patch 9 -
>  "northd: Refactor lflow management into a separate module."
>  and addressed them in v4.
>  To briefly describe the issue: if a logical flow L(M, A) is referenced
>  by 2 lflow_ref's which belong to the same datapath, then the lflow
>  was deleted even if one lflow_ref was cleared due to any changes.
>  It is addressed now by maintaining a reference 

Re: [ovs-dev] [PATCH ovn v5 11/16] northd: Handle lb changes in lflow engine.

2024-01-24 Thread Han Zhou
On Thu, Jan 11, 2024 at 7:33 AM  wrote:
>
> From: Numan Siddique 
>
> Since the northd tracked data has the changed lb data, the northd
> engine handler for the lflow engine now handles the lb changes
> incrementally.  All the lflows generated for each lb are stored
> in the ovn_lb_datapaths->lflow_ref and this is used similarly
> to how we handle ovn_port changes.
>
> Signed-off-by: Numan Siddique 
> ---
>  northd/en-lflow.c   | 11 ++---
>  northd/lb.c |  3 ++
>  northd/lb.h | 26 
>  northd/lflow-mgr.c  | 47 +-
>  northd/northd.c | 98 +++--
>  northd/northd.h |  4 ++
>  tests/ovn-northd.at | 30 ++
>  7 files changed, 184 insertions(+), 35 deletions(-)
>
> diff --git a/northd/en-lflow.c b/northd/en-lflow.c
> index fef9a1352d..205605578f 100644
> --- a/northd/en-lflow.c
> +++ b/northd/en-lflow.c
> @@ -123,11 +123,6 @@ lflow_northd_handler(struct engine_node *node,
>  return false;
>  }
>
> -/* Fall back to recompute if load balancers have changed. */
> -if (northd_has_lbs_in_tracked_data(&northd_data->trk_data)) {
> -return false;
> -}
> -
>  const struct engine_context *eng_ctx = engine_get_context();
>  struct lflow_data *lflow_data = data;
>
> @@ -140,6 +135,12 @@ lflow_northd_handler(struct engine_node *node,
>  return false;
>  }
>
> +if (!lflow_handle_northd_lb_changes(eng_ctx->ovnsb_idl_txn,
> +&northd_data->trk_data.trk_lbs,
> +&lflow_input, lflow_data->lflow_table)) {
> +return false;
> +}
> +
>  engine_set_node_state(node, EN_UPDATED);
>  return true;
>  }
> diff --git a/northd/lb.c b/northd/lb.c
> index e6c8a51911..5fca41e049 100644
> --- a/northd/lb.c
> +++ b/northd/lb.c
> @@ -21,6 +21,7 @@
>
>  /* OVN includes */
>  #include "lb.h"
> +#include "lflow-mgr.h"
>  #include "lib/lb.h"
>  #include "northd.h"
>
> @@ -33,6 +34,7 @@ ovn_lb_datapaths_create(const struct ovn_northd_lb *lb,
size_t n_ls_datapaths,
>  lb_dps->lb = lb;
>  lb_dps->nb_ls_map = bitmap_allocate(n_ls_datapaths);
>  lb_dps->nb_lr_map = bitmap_allocate(n_lr_datapaths);
> +lb_dps->lflow_ref = lflow_ref_create();
>
>  return lb_dps;
>  }
> @@ -42,6 +44,7 @@ ovn_lb_datapaths_destroy(struct ovn_lb_datapaths
*lb_dps)
>  {
>  bitmap_free(lb_dps->nb_lr_map);
>  bitmap_free(lb_dps->nb_ls_map);
> +lflow_ref_destroy(lb_dps->lflow_ref);
>  free(lb_dps);
>  }
>
> diff --git a/northd/lb.h b/northd/lb.h
> index eeef2ea260..de677ca4ef 100644
> --- a/northd/lb.h
> +++ b/northd/lb.h
> @@ -20,6 +20,8 @@
>  #include "openvswitch/hmap.h"
>  #include "uuid.h"
>
> +struct lflow_ref;
> +
>  struct ovn_lb_datapaths {
>  struct hmap_node hmap_node;
>
> @@ -29,6 +31,30 @@ struct ovn_lb_datapaths {
>
>  size_t n_nb_lr;
>  unsigned long *nb_lr_map;
> +
> +/* Reference of lflows generated for this load balancer.
> + *
> + * This data is initialized and destroyed by the en_northd node, but
> + * populated and used only by the en_lflow node. Ideally this data should
> + * be maintained as part of en_lflow's data (struct lflow_data): a hash
> + * index from ovn_port key to lflows.  However, it would be less efficient
> + * and more complex:
> + *
> + * 1. It would require an extra search (using the index) to find the
> + * lflows.
> + *
> + * 2. Building the index needs to be thread-safe, using either a global
> + * lock which is obviously less efficient, or hash-based lock array which
> + * is more complex.
> + *
> + * Maintaining the lflow_ref here is more straightforward. The drawback is
> + * that we need to keep in mind that this data belongs to en_lflow node,
> + * so never access it from any other nodes.
> + *
> + * 'lflow_ref' is used to reference logical flows generated for this
> + *  load balancer.
> + */
> +struct lflow_ref *lflow_ref;
>  };
>
>  struct ovn_lb_datapaths *ovn_lb_datapaths_create(const struct
ovn_northd_lb *,
> diff --git a/northd/lflow-mgr.c b/northd/lflow-mgr.c
> index 3cf9696f6e..6cb2a367fe 100644
> --- a/northd/lflow-mgr.c
> +++ b/northd/lflow-mgr.c
> @@ -375,7 +375,15 @@ struct lflow_ref_node {
>  /* The lflow. */
>  struct ovn_lflow *lflow;
>
> -/* Index id of the datapath this lflow_ref_node belongs to. */
> +/* Indicates of the lflow was added with dp_group or not using
> + * ovn_lflow_add_with_dp_group() macro. */

nit: the sentence is a little confusing. Probably more clear to say:
indicates whether the lflow was added with a dp_group using the
ovn_lflow_add_with_dp_group() macro.

> +bool dpgrp_lflow;
> +/* dpgrp bitmap and bitmap length.  Valid only of dpgrp_lflow is
true. */
> +unsigned long *dpgrp_bitmap;
> +size_t dpgrp_bitmap_len;
> +
> +/* Index id of the datapath this lflow_ref_node belongs to.
> + * Valid only if dpgrp_lflow is 

Re: [ovs-dev] [PATCH ovn v5 08/16] northd: Refactor lflow management into a separate module.

2024-01-24 Thread Han Zhou
On Wed, Jan 24, 2024 at 10:07 PM Han Zhou  wrote:
>
>
>
> On Thu, Jan 11, 2024 at 7:32 AM  wrote:
> >
> > From: Numan Siddique 
> >
> > ovn_lflow_add() and other related functions/macros are now moved
> > into a separate module - lflow-mgr.c.  This module maintains a
> > table 'struct lflow_table' for the logical flows.  lflow table
> > maintains a hmap to store the logical flows.
> >
> > It also maintains the logical switch and router dp groups.
> >
> > Previous commits which added lflow incremental processing for
> > the VIF logical ports, stored the references to
> > the logical ports' lflows using 'struct lflow_ref_list'.  This
> > struct is renamed to 'struct lflow_ref' and is part of lflow-mgr.c.
> > It is modified a bit to store the resource-to-lflow references.
> >
> > Example usage of 'struct lflow_ref'.
> >
> > 'struct ovn_port' maintains 2 instances of lflow_ref, i.e.
> >
> > struct ovn_port {
> >...
> >...
> >struct lflow_ref *lflow_ref;
> >struct lflow_ref *stateful_lflow_ref;
>
> Hi Numan,
>
> In addition to the lock discussion with you and Dumitru, I still want to
> discuss another thing about this patch regarding the second lflow_ref:
> stateful_lflow_ref.
> I understand that you added this to achieve finer-grained I-P, especially
> for router ports. I am wondering how much performance gain comes from this.
> From my understanding it shouldn't matter much, since each ovn_port should
> be associated with a very limited number of lflows. Could you provide more
> insight/data on this? I think it would be better to keep things simple
> (i.e. one object, one lflow_ref list) unless the benefit is obvious.
>
> I am also trying to run another performance regression test for
> recompute, since I am a little concerned about the DP refcnt hmap
> associated with each lflow. I understand it's necessary to handle the
> duplicated lflow cases, but it looks heavy and removes the opportunities
> for more efficient bitmap operations. Let me know if you have already
> evaluated its performance.
>
> Thanks,
> Han
>

Sorry that I forgot to mention another minor comment for the struct
lflow_ref_node. It would be helpful to add more comments about the typical
life cycle of the lflow_ref_node, so that it is easier to understand why an
hmap is used in lflow_ref, and why 'linked' is needed in lflow_ref_node. From
my understanding the life cycle is something like the following (sketched in
code below):
1. created and linked at lflow_table_add_lflow()
2. unlinked when handling a change of the object that references it.
3. it may be re-linked when handling the same object change.
4. it is used to sync the lflow change (e.g. adding/removing DPs) to SB.
5. it is destroyed after syncing an unlinked lflow_ref to SB.

I think it would help people understand the code more easily. What do you
think?
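
As a hedged illustration of that life cycle, using only function names that
appear elsewhere in this series (exact signatures differ):

    /* 1. Created and linked while generating flows for an object: */
    lflow_table_add_lflow(lflow_table, od, ..., op->lflow_ref);

    /* 2./3. When that object changes, its references are unlinked and then
     *       possibly re-linked by regenerating its flows: */
    lflow_ref_clear_lflows(op->lflow_ref);
    build_lswitch_and_lrouter_iterate_by_lsp(op, ...);

    /* 4./5. The delta is synced to the SB DB; nodes that stayed unlinked
     *       are destroyed after the sync: */
    lflow_ref_sync_lflows_to_sb(op->lflow_ref, ...);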

Thanks,
Han

> > };
> >
> > All the logical flows generated by
> > build_lswitch_and_lrouter_iterate_by_lsp() uses the ovn_port->lflow_ref.
> >
> > All the logical flows generated by build_lsp_lflows_for_lbnats()
> > uses the ovn_port->stateful_lflow_ref.
> >
> > When handling the ovn_port changes incrementally, the lflows referenced
> > in 'struct ovn_port' are cleared and regenerated and synced to the
> > SB logical flows.
> >
> > eg.
> >
> > lflow_ref_clear_lflows(op->lflow_ref);
> > build_lswitch_and_lrouter_iterate_by_lsp(op, ...);
> > lflow_ref_sync_lflows_to_sb(op->lflow_ref, ...);
> >
> > This patch does a few more changes:
> >   -  Logical flows are now hashed without the logical
> >  datapaths.  If a logical flow is referenced by just one
> >  datapath, we don't rehash it.
> >
> >   -  The synthetic 'hash' column of sbrec_logical_flow now
> >  doesn't use the logical datapath.  This means that
> >  when ovn-northd is updated/upgraded and has this commit,
> >  all the logical flows with 'logical_datapath' column
> >  set will get deleted and re-added causing some disruptions.
> >
> >   -  With the commit [1] which added I-P support for logical
> >  port changes, multiple logical flows with same match 'M'
> >  and actions 'A' are generated and stored without the
> >  dp groups, which was not the case prior to
> >  that patch.
> >  One example to generate these lflows is:
> >  ovn-nbctl lsp-set-addresses sw0p1 "MAC1 IP1"
> >  ovn-nbctl lsp-set-addresses sw1p1 "MAC1 IP1"
> >  ovn-nbctl lsp-set-addresses sw2p1 "MAC1 IP1"
> >
> >  Now with this patch we go back to the earlier way.  i.e
> >  one logical flow with logical_dp_groups set.
> >
> >   -  With this patch any updates to a logical port which
> >  doesn't result in new logical flows will not result in
> >  deletion and addition of same logical flows.
> >  Eg.
> >  ovn-nbctl set logical_switch_port sw0p1 external_ids:foo=bar
> >  will be a no-op to the SB logical flow table.
> >
> > [1] - 8bbd678("northd: Incremental processing of VIF additions in
'lflow' node.")
> >
> > Signed-off-by: Numan Siddique 
>

Re: [ovs-dev] [PATCH ovn v5 08/16] northd: Refactor lflow management into a separate module.

2024-01-24 Thread Han Zhou
On Thu, Jan 11, 2024 at 7:32 AM  wrote:
>
> From: Numan Siddique 
>
> ovn_lflow_add() and other related functions/macros are now moved
> into a separate module - lflow-mgr.c.  This module maintains a
> table 'struct lflow_table' for the logical flows.  lflow table
> maintains a hmap to store the logical flows.
>
> It also maintains the logical switch and router dp groups.
>
> Previous commits which added lflow incremental processing for
> the VIF logical ports, stored the references to
> the logical ports' lflows using 'struct lflow_ref_list'.  This
> struct is renamed to 'struct lflow_ref' and is part of lflow-mgr.c.
> It is modified a bit to store the resource-to-lflow references.
>
> Example usage of 'struct lflow_ref'.
>
> 'struct ovn_port' maintains 2 instances of lflow_ref, i.e.
>
> struct ovn_port {
>...
>...
>struct lflow_ref *lflow_ref;
>struct lflow_ref *stateful_lflow_ref;

Hi Numan,

In addition to the lock discussion with you and Dumitru, I still want to
discuss another thing about this patch regarding the second lflow_ref:
stateful_lflow_ref.
I understand that you added this to achieve finer-grained I-P, especially
for router ports. I am wondering how much performance gain comes from this.
From my understanding it shouldn't matter much, since each ovn_port should
be associated with a very limited number of lflows. Could you provide more
insight/data on this? I think it would be better to keep things simple
(i.e. one object, one lflow_ref list) unless the benefit is obvious.
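
For context, a hedged illustration of what the split is meant to enable, based
purely on the commit message above (not on measured data): when only the
stateful (LB/NAT-related) inputs of a port change, only the second reference
has to be rebuilt, leaving the flows tracked by op->lflow_ref untouched:

    lflow_ref_clear_lflows(op->stateful_lflow_ref);
    build_lsp_lflows_for_lbnats(op, ...);
    lflow_ref_sync_lflows_to_sb(op->stateful_lflow_ref, ...);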

I am also trying to run another performance regression test for recompute,
since I am a little concerned about the DP refcnt hmap associated with each
lflow. I understand it's necessary to handle the duplicated lflow cases,
but it looks heavy and removes the opportunities for more efficient bitmap
operations. Let me know if you have already evaluated its performance.

Thanks,
Han

> };
>
> All the logical flows generated by
> build_lswitch_and_lrouter_iterate_by_lsp() uses the ovn_port->lflow_ref.
>
> All the logical flows generated by build_lsp_lflows_for_lbnats()
> uses the ovn_port->stateful_lflow_ref.
>
> When handling the ovn_port changes incrementally, the lflows referenced
> in 'struct ovn_port' are cleared and regenerated and synced to the
> SB logical flows.
>
> eg.
>
> lflow_ref_clear_lflows(op->lflow_ref);
> build_lswitch_and_lrouter_iterate_by_lsp(op, ...);
> lflow_ref_sync_lflows_to_sb(op->lflow_ref, ...);
>
> This patch does a few more changes:
>   -  Logical flows are now hashed without the logical
>  datapaths.  If a logical flow is referenced by just one
>  datapath, we don't rehash it.
>
>   -  The synthetic 'hash' column of sbrec_logical_flow now
>  doesn't use the logical datapath.  This means that
>  when ovn-northd is updated/upgraded and has this commit,
>  all the logical flows with 'logical_datapath' column
>  set will get deleted and re-added causing some disruptions.
>
>   -  With the commit [1] which added I-P support for logical
>  port changes, multiple logical flows with same match 'M'
>  and actions 'A' are generated and stored without the
>  dp groups, which was not the case prior to
>  that patch.
>  One example to generate these lflows is:
>  ovn-nbctl lsp-set-addresses sw0p1 "MAC1 IP1"
>  ovn-nbctl lsp-set-addresses sw1p1 "MAC1 IP1"
>  ovn-nbctl lsp-set-addresses sw2p1 "MAC1 IP1"
>
>  Now with this patch we go back to the earlier way.  i.e
>  one logical flow with logical_dp_groups set.
>
>   -  With this patch any updates to a logical port which
>  doesn't result in new logical flows will not result in
>  deletion and addition of same logical flows.
>  Eg.
>  ovn-nbctl set logical_switch_port sw0p1 external_ids:foo=bar
>  will be a no-op to the SB logical flow table.
>
> [1] - 8bbd678("northd: Incremental processing of VIF additions in 'lflow'
node.")
>
> Signed-off-by: Numan Siddique 


Re: [ovs-dev] [PATCH ovn v5 08/16] northd: Refactor lflow management into a separate module.

2024-01-24 Thread Han Zhou
On Wed, Jan 24, 2024 at 8:39 PM Numan Siddique  wrote:
>
> On Wed, Jan 24, 2024 at 10:53 PM Han Zhou  wrote:
> >
> > On Wed, Jan 24, 2024 at 4:23 AM Dumitru Ceara  wrote:
> > >
> > > On 1/24/24 06:01, Han Zhou wrote:
> > > > On Fri, Jan 19, 2024 at 2:50 AM Dumitru Ceara 
wrote:
> > > >>
> > > >> On 1/11/24 16:31, num...@ovn.org wrote:
> > > >>> +
> > > >>> +void
> > > >>> +lflow_table_add_lflow(struct lflow_table *lflow_table,
> > > >>> +  const struct ovn_datapath *od,
> > > >>> +  const unsigned long *dp_bitmap, size_t
> > > > dp_bitmap_len,
> > > >>> +  enum ovn_stage stage, uint16_t priority,
> > > >>> +  const char *match, const char *actions,
> > > >>> +  const char *io_port, const char
*ctrl_meter,
> > > >>> +  const struct ovsdb_idl_row *stage_hint,
> > > >>> +  const char *where,
> > > >>> +  struct lflow_ref *lflow_ref)
> > > >>> +OVS_EXCLUDED(fake_hash_mutex)
> > > >>> +{
> > > >>> +struct ovs_mutex *hash_lock;
> > > >>> +uint32_t hash;
> > > >>> +
> > > >>> +ovs_assert(!od ||
> > > >>> +   ovn_stage_to_datapath_type(stage) ==
> > > > ovn_datapath_get_type(od));
> > > >>> +
> > > >>> +hash = ovn_logical_flow_hash(ovn_stage_get_table(stage),
> > > >>> + ovn_stage_get_pipeline(stage),
> > > >>> + priority, match,
> > > >>> + actions);
> > > >>> +
> > > >>> +hash_lock = lflow_hash_lock(&lflow_table->entries, hash);
> > > >>> +struct ovn_lflow *lflow =
> > > >>> +do_ovn_lflow_add(lflow_table, od, dp_bitmap,
> > > >>> + dp_bitmap_len, hash, stage,
> > > >>> + priority, match, actions,
> > > >>> + io_port, ctrl_meter, stage_hint, where);
> > > >>> +
> > > >>> +if (lflow_ref) {
> > > >>> +/* lflow referencing is only supported if 'od' is not
NULL.
> > */
> > > >>> +ovs_assert(od);
> > > >>> +
> > > >>> +struct lflow_ref_node *lrn =
> > > >>> +lflow_ref_node_find(&lflow_ref->lflow_ref_nodes, lflow, hash);
> > > >>> +if (!lrn) {
> > > >>> +lrn = xzalloc(sizeof *lrn);
> > > >>> +lrn->lflow = lflow;
> > > >>> +lrn->dp_index = od->index;
> > > >>> +ovs_list_insert(&lflow_ref->lflows_ref_list,
> > > >>> +&lrn->lflow_list_node);
> > > >>> +inc_dp_refcnt(&lflow->dp_refcnts_map, lrn->dp_index);
> > > >>> +ovs_list_insert(&lflow->referenced_by, &lrn->ref_list_node);
> > > >>> +
> > > >>> +hmap_insert(&lflow_ref->lflow_ref_nodes, &lrn->ref_node, hash);
> > > >>> +}
> > > >>> +
> > > >>> +lrn->linked = true;
> > > >>> +}
> > > >>> +
> > > >>> +lflow_hash_unlock(hash_lock);
> > > >>> +
> > > >>> +}
> > > >>> +
> > > >>
> > > >> This part is not thread safe.
> > > >>
> > > >> If two threads try to add logical flows that have different hashes
and
> > > >> lflow_ref is not NULL we're going to have a race condition when
> > > >> inserting to the lflow_ref->lflow_ref_nodes hash map because the two
> > > >> threads will take different locks.
> > > >>
> > > >
> > > > I think it is safe because a lflow_ref is always associated with an
> > object,
> > > > e.g. port, datapath, lb, etc., and lflow generation for a single
such
> > > > object is never executed in parallel, which is how the parallel
lflow
> > build
> > > > is designed.
> > > > Does it make sense?
> > >
> > > It happens that it's safe in this current patch set because indeed we
> > > always process individual ports, datapaths, lbs, etc, in the same
> > > thread.  However, this code (lflow_table_add_lflow()) is generic and
> > > there's nothing (not even a comment) that would warn developers in the
> > > future about the potential race if the lflow_ref is shared.
> > >
> > > I spoke to Numan offline a bit about this and I think the current plan
> > > is to leave it as is and add proper locking as a follow up (or in v7).
> > > But I think we still need a clear comment here warning users about
this.
> > >  Maybe we should add a comment where the lflow_ref structure is
defined
> > too.
> > >
> > > What do you think?
> >
> > I totally agree with you about adding comments to explain the thread
safety
> > considerations, and make it clear that the lflow_ref should always be
> > associated with the object that is being processed by the thread.
> > With regard to any follow up change for proper locking, I am not sure
what
> > scenario is your concern. I think if we always make sure the lflow_ref
> > passed in is the one owned by the object then the current locking is
> > sufficient. And this is the natural case.
> >
> > However, I did think about a situation where there might be a potential
> > problem in the future when we need to maintain references for 

Re: [ovs-dev] [PATCH ovn v2] fix segfault due to ssl-ciphers

2024-01-24 Thread Numan Siddique
On Thu, Jan 18, 2024 at 1:42 AM Ales Musil  wrote:
>
> On Wed, Jan 17, 2024 at 9:13 PM  wrote:
>
> > From: Aliasgar Ginwala 
> >
> > To avoid invalidating existing certs setup by old version of ovs pki.
> > openssl supports setting ciphers but it fails with ovn as below
> > ovn-controller --ssl-ciphers='HIGH:!aNULL:!MD5:@SECLEVEL=1'
> > Aborted (core dumped)
> >
> > Avoid invalidating existing certs when bumping to new ovn version
> > SSL_connect: error:1416F086:SSL
> > routines:tls_process_server_certificate:certificate verify failed while
> > connecting to control plane.
> >
> > Also amend ovn ic northd and ovn ctl utilities
> >
> > Add tests for ssl ciphers
> >
> > Signed-off-by: Aliasgar Ginwala 
> > ---
> >
>
> Hi Aliasgar,
>
> thank you for the v2.
>
>
> >  controller/ovn-controller.c |   7 ++
> >  ic/ovn-ic.c |   8 ++
> >  northd/ovn-northd.c |   8 ++
> >  tests/ovn-controller.at |  23 +
> >  tests/ovn.at| 182 
> >  utilities/ovn-dbctl.c   |   8 ++
> >  6 files changed, 236 insertions(+)
> >
> > diff --git a/controller/ovn-controller.c b/controller/ovn-controller.c
> > index 856e5e270..4b16818a6 100644
> > --- a/controller/ovn-controller.c
> > +++ b/controller/ovn-controller.c
> > @@ -6166,6 +6166,13 @@ parse_options(int argc, char *argv[])
> >  ssl_ca_cert_file = optarg;
> >  break;
> >
> > +case OPT_SSL_PROTOCOLS:
> > +stream_ssl_set_protocols(optarg);
> > +break;
> > +
> > +case OPT_SSL_CIPHERS:
> > +stream_ssl_set_ciphers(optarg);
> > +break;
> >
> >  case OPT_PEER_CA_CERT:
> >  stream_ssl_set_peer_ca_cert_file(optarg);
> > diff --git a/ic/ovn-ic.c b/ic/ovn-ic.c
> > index 8ceb34d7c..6f8f5734d 100644
> > --- a/ic/ovn-ic.c
> > +++ b/ic/ovn-ic.c
> > @@ -1846,6 +1846,14 @@ parse_options(int argc OVS_UNUSED, char *argv[]
> > OVS_UNUSED)
> >  ssl_ca_cert_file = optarg;
> >  break;
> >
> > +case OPT_SSL_PROTOCOLS:
> > +stream_ssl_set_protocols(optarg);
> > +break;
> > +
> > +case OPT_SSL_CIPHERS:
> > +stream_ssl_set_ciphers(optarg);
> > +break;
> > +
> >  case 'd':
> >  ovnsb_db = optarg;
> >  break;
> > diff --git a/northd/ovn-northd.c b/northd/ovn-northd.c
> > index f3868068d..dadc1af38 100644
> > --- a/northd/ovn-northd.c
> > +++ b/northd/ovn-northd.c
> > @@ -611,6 +611,14 @@ parse_options(int argc OVS_UNUSED, char *argv[]
> > OVS_UNUSED,
> >  ssl_ca_cert_file = optarg;
> >  break;
> >
> > +case OPT_SSL_PROTOCOLS:
> > +stream_ssl_set_protocols(optarg);
> > +break;
> > +
> > +case OPT_SSL_CIPHERS:
> > +stream_ssl_set_ciphers(optarg);
> > +break;
> > +
> >  case 'd':
> >  ovnsb_db = optarg;
> >  break;
> > diff --git a/tests/ovn-controller.at b/tests/ovn-controller.at
> > index 9d2a37c72..9cc8730e9 100644
> > --- a/tests/ovn-controller.at
> > +++ b/tests/ovn-controller.at
> > @@ -2712,3 +2712,26 @@ AT_CHECK([as hv1 ovs-ofctl dump-flows br-int
> > table=40 | grep -q controller], [1]
> >  OVN_CLEANUP([hv1])
> >  AT_CLEANUP
> >  ])
> > +
> > +
> >
>
> nit: Extra empty line, this can be addressed during merge.
>
>
> > +AT_SETUP([ovn-controller - ssl ciphers using command line options])
> > +AT_KEYWORDS([ovn])
> > +AT_SKIP_IF([test "$HAVE_OPENSSL" = no])
> > +ovn_start
> > +
> > +net_add n1
> > +sim_add hv1
> > +ovs-vsctl add-br br-phys
> > +ovn_attach n1 br-phys 192.168.0.20
> > +
> > +# Set the cipher and it should connect
> > +OVS_APP_EXIT_AND_WAIT([ovn-controller])
> > +start_daemon ovn-controller --ssl-ciphers='HIGH:!aNULL:!MD5:@SECLEVEL=1'
> > --ssl-protocols='TLSv1,TLSv1.1,TLSv1.2'
> > +
> > +OVS_WAIT_FOR_OUTPUT([ovn-appctl -t ovn-controller connection-status],
> > [0], [connected
> > +])
> > +
> > +cat hv1/ovn-controller.log
> > +
> > +OVN_CLEANUP([hv1])
> > +AT_CLEANUP
> > diff --git a/tests/ovn.at b/tests/ovn.at
> > index c3644ac78..34f277ef9 100644
> > --- a/tests/ovn.at
> > +++ b/tests/ovn.at
> > @@ -37588,3 +37588,185 @@ OVN_CLEANUP([hv1])
> >
> >  AT_CLEANUP
> >  ])
> > +
> > +AT_SETUP([read-only sb db:pssl access with ssl-ciphers and ssl-protocols])
> > +AT_SKIP_IF([test "$HAVE_OPENSSL" = no])
> > +PKIDIR="$(cd $abs_top_builddir/tests && pwd)"
> > +AT_SKIP_IF([expr "$PKIDIR" : ".*[[ '\"
> > +\\]]"])
> > +
> > +: > .$1.db.~lock~
> > +ovsdb-tool create ovn-sb.db "$abs_top_srcdir"/ovn-sb.ovsschema
> > +
> > +# Add read-only remote to sb ovsdb-server
> > +AT_CHECK(
> > +  [ovsdb-tool transact ovn-sb.db \
> > + ['["OVN_Southbound",
> > +   {"op": "insert",
> > +"table": "SB_Global",
> > +"row": {
> > +  "connections": ["set", [["named-uuid", "xyz"]]]}},
> > +   {"op": "insert",
> > +"table": "Connection",
> > +

Re: [ovs-dev] [PATCH ovn v5 08/16] northd: Refactor lflow management into a separate module.

2024-01-24 Thread Numan Siddique
On Wed, Jan 24, 2024 at 10:53 PM Han Zhou  wrote:
>
> On Wed, Jan 24, 2024 at 4:23 AM Dumitru Ceara  wrote:
> >
> > On 1/24/24 06:01, Han Zhou wrote:
> > > On Fri, Jan 19, 2024 at 2:50 AM Dumitru Ceara  wrote:
> > >>
> > >> On 1/11/24 16:31, num...@ovn.org wrote:
> > >>> +
> > >>> +void
> > >>> +lflow_table_add_lflow(struct lflow_table *lflow_table,
> > >>> +  const struct ovn_datapath *od,
> > >>> +  const unsigned long *dp_bitmap, size_t
> > > dp_bitmap_len,
> > >>> +  enum ovn_stage stage, uint16_t priority,
> > >>> +  const char *match, const char *actions,
> > >>> +  const char *io_port, const char *ctrl_meter,
> > >>> +  const struct ovsdb_idl_row *stage_hint,
> > >>> +  const char *where,
> > >>> +  struct lflow_ref *lflow_ref)
> > >>> +OVS_EXCLUDED(fake_hash_mutex)
> > >>> +{
> > >>> +struct ovs_mutex *hash_lock;
> > >>> +uint32_t hash;
> > >>> +
> > >>> +ovs_assert(!od ||
> > >>> +   ovn_stage_to_datapath_type(stage) ==
> > > ovn_datapath_get_type(od));
> > >>> +
> > >>> +hash = ovn_logical_flow_hash(ovn_stage_get_table(stage),
> > >>> + ovn_stage_get_pipeline(stage),
> > >>> + priority, match,
> > >>> + actions);
> > >>> +
> > >>> +hash_lock = lflow_hash_lock(&lflow_table->entries, hash);
> > >>> +struct ovn_lflow *lflow =
> > >>> +do_ovn_lflow_add(lflow_table, od, dp_bitmap,
> > >>> + dp_bitmap_len, hash, stage,
> > >>> + priority, match, actions,
> > >>> + io_port, ctrl_meter, stage_hint, where);
> > >>> +
> > >>> +if (lflow_ref) {
> > >>> +/* lflow referencing is only supported if 'od' is not NULL.
> */
> > >>> +ovs_assert(od);
> > >>> +
> > >>> +struct lflow_ref_node *lrn =
> > >>> +lflow_ref_node_find(&lflow_ref->lflow_ref_nodes, lflow, hash);
> > >>> +if (!lrn) {
> > >>> +lrn = xzalloc(sizeof *lrn);
> > >>> +lrn->lflow = lflow;
> > >>> +lrn->dp_index = od->index;
> > >>> +ovs_list_insert(&lflow_ref->lflows_ref_list,
> > >>> +&lrn->lflow_list_node);
> > >>> +inc_dp_refcnt(&lflow->dp_refcnts_map, lrn->dp_index);
> > >>> +ovs_list_insert(&lflow->referenced_by, &lrn->ref_list_node);
> > >>> +
> > >>> +hmap_insert(&lflow_ref->lflow_ref_nodes, &lrn->ref_node, hash);
> > >>> +}
> > >>> +
> > >>> +lrn->linked = true;
> > >>> +}
> > >>> +
> > >>> +lflow_hash_unlock(hash_lock);
> > >>> +
> > >>> +}
> > >>> +
> > >>
> > >> This part is not thread safe.
> > >>
> > >> If two threads try to add logical flows that have different hashes and
> > >> lflow_ref is not NULL we're going to have a race condition when
> > >> inserting to the lflow_ref->lflow_ref_nodes hash map because the two
> > >> threads will take different locks.
> > >>
> > >
> > > I think it is safe because a lflow_ref is always associated with an
> object,
> > > e.g. port, datapath, lb, etc., and lflow generation for a single such
> > > object is never executed in parallel, which is how the parallel lflow
> build
> > > is designed.
> > > Does it make sense?
> >
> > It happens that it's safe in this current patch set because indeed we
> > always process individual ports, datapaths, lbs, etc, in the same
> > thread.  However, this code (lflow_table_add_lflow()) is generic and
> > there's nothing (not even a comment) that would warn developers in the
> > future about the potential race if the lflow_ref is shared.
> >
> > I spoke to Numan offline a bit about this and I think the current plan
> > is to leave it as is and add proper locking as a follow up (or in v7).
> > But I think we still need a clear comment here warning users about this.
> >  Maybe we should add a comment where the lflow_ref structure is defined
> too.
> >
> > What do you think?
>
> I totally agree with you about adding comments to explain the thread safety
> considerations, and make it clear that the lflow_ref should always be
> associated with the object that is being processed by the thread.
> With regard to any follow up change for proper locking, I am not sure what
> scenario is your concern. I think if we always make sure the lflow_ref
> passed in is the one owned by the object then the current locking is
> sufficient. And this is the natural case.
>
> However, I did think about a situation where there might be a potential
> problem in the future when we need to maintain references for more than one
> input for the same lflow. For the "main" input, which is the object that
> the thread is iterating and generating lflow for, will have its lflow_ref
> passed in this function, but we might also need to maintain the reference
> of the lflow for a secondary input 

Re: [ovs-dev] [PATCH ovn v5 08/16] northd: Refactor lflow management into a separate module.

2024-01-24 Thread Han Zhou
On Wed, Jan 24, 2024 at 4:23 AM Dumitru Ceara  wrote:
>
> On 1/24/24 06:01, Han Zhou wrote:
> > On Fri, Jan 19, 2024 at 2:50 AM Dumitru Ceara  wrote:
> >>
> >> On 1/11/24 16:31, num...@ovn.org wrote:
> >>> +
> >>> +void
> >>> +lflow_table_add_lflow(struct lflow_table *lflow_table,
> >>> +  const struct ovn_datapath *od,
> >>> +  const unsigned long *dp_bitmap, size_t
> > dp_bitmap_len,
> >>> +  enum ovn_stage stage, uint16_t priority,
> >>> +  const char *match, const char *actions,
> >>> +  const char *io_port, const char *ctrl_meter,
> >>> +  const struct ovsdb_idl_row *stage_hint,
> >>> +  const char *where,
> >>> +  struct lflow_ref *lflow_ref)
> >>> +OVS_EXCLUDED(fake_hash_mutex)
> >>> +{
> >>> +struct ovs_mutex *hash_lock;
> >>> +uint32_t hash;
> >>> +
> >>> +ovs_assert(!od ||
> >>> +   ovn_stage_to_datapath_type(stage) ==
> > ovn_datapath_get_type(od));
> >>> +
> >>> +hash = ovn_logical_flow_hash(ovn_stage_get_table(stage),
> >>> + ovn_stage_get_pipeline(stage),
> >>> + priority, match,
> >>> + actions);
> >>> +
> >>> +hash_lock = lflow_hash_lock(&lflow_table->entries, hash);
> >>> +struct ovn_lflow *lflow =
> >>> +do_ovn_lflow_add(lflow_table, od, dp_bitmap,
> >>> + dp_bitmap_len, hash, stage,
> >>> + priority, match, actions,
> >>> + io_port, ctrl_meter, stage_hint, where);
> >>> +
> >>> +if (lflow_ref) {
> >>> +/* lflow referencing is only supported if 'od' is not NULL.
*/
> >>> +ovs_assert(od);
> >>> +
> >>> +struct lflow_ref_node *lrn =
> >>> +lflow_ref_node_find(&lflow_ref->lflow_ref_nodes, lflow, hash);
> >>> +if (!lrn) {
> >>> +lrn = xzalloc(sizeof *lrn);
> >>> +lrn->lflow = lflow;
> >>> +lrn->dp_index = od->index;
> >>> +ovs_list_insert(&lflow_ref->lflows_ref_list,
> >>> +&lrn->lflow_list_node);
> >>> +inc_dp_refcnt(&lflow->dp_refcnts_map, lrn->dp_index);
> >>> +ovs_list_insert(&lflow->referenced_by, &lrn->ref_list_node);
> >>> +
> >>> +hmap_insert(&lflow_ref->lflow_ref_nodes, &lrn->ref_node, hash);
> >>> +}
> >>> +
> >>> +lrn->linked = true;
> >>> +}
> >>> +
> >>> +lflow_hash_unlock(hash_lock);
> >>> +
> >>> +}
> >>> +
> >>
> >> This part is not thread safe.
> >>
> >> If two threads try to add logical flows that have different hashes and
> >> lflow_ref is not NULL we're going to have a race condition when
> >> inserting to the lflow_ref->lflow_ref_nodes hash map because the two
> >> threads will take different locks.
> >>
> >
> > I think it is safe because a lflow_ref is always associated with an
object,
> > e.g. port, datapath, lb, etc., and lflow generation for a single such
> > object is never executed in parallel, which is how the parallel lflow
build
> > is designed.
> > Does it make sense?
>
> It happens that it's safe in this current patch set because indeed we
> always process individual ports, datapaths, lbs, etc, in the same
> thread.  However, this code (lflow_table_add_lflow()) is generic and
> there's nothing (not even a comment) that would warn developers in the
> future about the potential race if the lflow_ref is shared.
>
> I spoke to Numan offline a bit about this and I think the current plan
> is to leave it as is and add proper locking as a follow up (or in v7).
> But I think we still need a clear comment here warning users about this.
>  Maybe we should add a comment where the lflow_ref structure is defined
too.
>
> What do you think?

I totally agree with you about adding comments to explain the thread safety
considerations, and make it clear that the lflow_ref should always be
associated with the object that is being processed by the thread.
With regard to any follow up change for proper locking, I am not sure what
scenario is your concern. I think if we always make sure the lflow_ref
passed in is the one owned by the object then the current locking is
sufficient. And this is the natural case.

However, I did think about a situation where there might be a potential
problem in the future when we need to maintain references for more than one
input for the same lflow. For the "main" input, which is the object that
the thread is iterating and generating lflow for, will have its lflow_ref
passed in this function, but we might also need to maintain the reference
of the lflow for a secondary input (or even third). In that case it is not
just the locking issue, but firstly we need to have a proper way to pass in
the secondary lflow_ref, which is what I had mentioned in the review
comments for v3:
https://mail.openvswitch.org/pipermail/ovs-dev/2023-December/410269.html.
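
A minimal sketch of the ownership rule discussed above, assuming (as the
parallel build does today) that each object is processed by exactly one
thread; the loop and variable names are illustrative, not the actual northd
code:

    /* Each worker thread only passes the lflow_ref embedded in the objects it
     * owns, so two threads never touch the same lflow_ref even though they
     * share the lflow table and its per-hash locks. */
    for (size_t i = 0; i < n_my_ports; i++) {     /* this thread's slice */
        struct ovn_port *op = my_ports[i];
        /* All lflow_table_add_lflow() calls made here pass op->lflow_ref,
         * which no other thread can be using at the same time. */
        build_lswitch_and_lrouter_iterate_by_lsp(op, ...);
    }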

Re: [ovs-dev] [PATCH ovn] Controller: Handle unconditional IDL messages while paused.

2024-01-24 Thread Mark Michelson

Hi Mohammad,

I'm having a hard time with this one, and mainly it's because of what
the intended semantics of being "paused" are. I could see it argued that
pausing ovn-controller should keep the existing behavior. I could see
someone writing a test where they pause ovn-controller specifically to
trigger disconnection from the SB DB.


My comments below have the mindset that this change is the correct 
behavior. But in addition to my comments, I think we need to verify if 
this proposed change is what we actually want ovn-controller to do.


On 1/23/24 08:03, Mohammad Heib wrote:

If the user triggers a pause command to ovn-controller, the current
implementation will wait for commands from the unixctl server only and
ignore the other components.

This implementation works fine if we don't have inactivity_probe set in
the SB database, but once the user sets the inactivity_probe in the SB
database the connection will be dropped by the SB DB. Once the controller
resumes execution it will try to commit some changes to the SB DB, but
the transaction will fail since we lost the connection to the SB DB and
the controller must reconnect before committing the transaction again.

To avoid the above scenario the controller can keep handling
unconditional IDL messages to avoid reconnecting to the SB.

Signed-off-by: Mohammad Heib 
---
  controller/ovn-controller.c | 16 +---
  ovn-sb.xml  |  2 +-
  tests/ovn-controller.at | 51 +
  3 files changed, 65 insertions(+), 4 deletions(-)

diff --git a/controller/ovn-controller.c b/controller/ovn-controller.c
index 856e5e270..d2c8f66d9 100644
--- a/controller/ovn-controller.c
+++ b/controller/ovn-controller.c
@@ -5534,12 +5534,22 @@ main(int argc, char *argv[])
  simap_destroy();
  }
  
-/* If we're paused just run the unixctl server and skip most of the

- * processing loop.
+/* If we're paused just run the unixctl-server/unconditional IDL and
+ *  skip most of the processing loop.
   */
  if (paused) {
  unixctl_server_run(unixctl);
+int ovnsb_seq = ovsdb_idl_get_seqno(ovnsb_idl_loop.idl);
+ovsdb_idl_run(ovnsb_idl_loop.idl);
+int new_ovnsb_seq = ovsdb_idl_get_seqno(ovnsb_idl_loop.idl);
+/* If the IDL content has changed while the controller is
+ * in pause state, trigger a recompute.
+ */
+if (new_ovnsb_seq != ovnsb_seq) {
+engine_set_force_recompute(true);
+}


If we're going to safeguard ourselves from being disconnected from the 
SB DB, then we should do the same for the OVS DB.
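
A hedged sketch of that suggestion, mirroring the hunk above for the OVS IDL
as well (ovs_idl_loop is the name used later in this diff; this is only an
illustration of the idea, not a proposed final hunk):

    if (paused) {
        unixctl_server_run(unixctl);
        /* Keep both IDL sessions alive so inactivity probes are answered
         * while paused. */
        ovsdb_idl_run(ovnsb_idl_loop.idl);
        ovsdb_idl_run(ovs_idl_loop.idl);
        /* (Seqno checks and engine_set_force_recompute(), as in the hunk
         * above, would be done for both IDLs here.) */
        unixctl_server_wait(unixctl);
        ovsdb_idl_wait(ovnsb_idl_loop.idl);
        ovsdb_idl_wait(ovs_idl_loop.idl);
        goto loop_done;
    }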



  unixctl_server_wait(unixctl);
+ovsdb_idl_wait(ovnsb_idl_loop.idl);
  goto loop_done;
  }
  
@@ -6009,7 +6019,6 @@ main(int argc, char *argv[])

  OVS_NOT_REACHED();
  }
  
-ovsdb_idl_track_clear(ovnsb_idl_loop.idl);

  ovsdb_idl_track_clear(ovs_idl_loop.idl);
  
  lflow_cache_run(ctrl_engine_ctx.lflow_cache);

@@ -6017,6 +6026,7 @@ main(int argc, char *argv[])
  
  loop_done:

  memory_wait();
+ovsdb_idl_track_clear(ovnsb_idl_loop.idl);
  poll_block();
  if (should_service_stop()) {
  exit_args.exiting = true;
diff --git a/ovn-sb.xml b/ovn-sb.xml
index e393f92b3..43c13f23c 100644
--- a/ovn-sb.xml
+++ b/ovn-sb.xml
@@ -4308,7 +4308,7 @@ tcp.flags = RST;

  Maximum number of milliseconds of idle time on connection to the 
client
  before sending an inactivity probe message.  If Open vSwitch does not
-communicate with the client for the specified number of seconds, it
+communicate with the client for the specified number of milliseconds,it


Please put a space between the comma and "it".


  will send a probe.  If a response is not received for the same
  additional amount of time, Open vSwitch assumes the connection has 
been
  broken and attempts to reconnect.  Default is implementation-specific.
diff --git a/tests/ovn-controller.at b/tests/ovn-controller.at
index 9d2a37c72..04e4c52e7 100644
--- a/tests/ovn-controller.at
+++ b/tests/ovn-controller.at
@@ -352,6 +352,57 @@ OVS_APP_EXIT_AND_WAIT([ovsdb-server])
  AT_CLEANUP
  ])
  
+# Check that the connection to the Southbound database

+# is not dropped when probe-interval is set and the controller
+# is in pause state.
+OVN_FOR_EACH_NORTHD([
+AT_SETUP([ovn-controller - check sbdb connection while pause])
+AT_KEYWORDS([ovn])
+ovn_start
+
+net_add n1
+sim_add hv
+as hv
+ovs-vsctl add-br br-phys
+ovn_attach n1 br-phys 192.168.0.1
+
+ovs-vsctl -- add-port br-int hv1-vif1 -- \
+set interface hv1-vif1 external-ids:iface-id=sw0-p1 \
+options:tx_pcap=hv1/vif1-tx.pcap \
+options:rxq_pcap=hv1/vif1-rx.pcap \
+ofport-request=1
+
+ovs-vsctl set open . external_ids:ovn-remote-probe-interval=10
+ovn-sbctl set 

Re: [ovs-dev] [PATCH ovn v5 13/16] northd: Add ls_stateful handler for lflow engine node.

2024-01-24 Thread Numan Siddique
On Fri, Jan 19, 2024 at 6:57 AM Dumitru Ceara  wrote:
>
> On 1/11/24 16:33, num...@ovn.org wrote:
> > From: Numan Siddique 
> >
> > Signed-off-by: Numan Siddique 
> > ---
>
> Hi Numan,
>
> > diff --git a/northd/inc-proc-northd.c b/northd/inc-proc-northd.c
> > index dcce79510a..f7c3d2bcf5 100644
> > --- a/northd/inc-proc-northd.c
> > +++ b/northd/inc-proc-northd.c
> > @@ -228,11 +228,11 @@ void inc_proc_northd_init(struct ovsdb_idl_loop *nb,
> >  engine_add_input(_lflow, _sb_logical_flow, NULL);
> >  engine_add_input(_lflow, _sb_multicast_group, NULL);
> >  engine_add_input(_lflow, _sb_igmp_group, NULL);
> > -engine_add_input(_lflow, _ls_stateful, NULL);
> >  engine_add_input(_lflow, _sb_logical_dp_group, NULL);
> >  engine_add_input(_lflow, _northd, lflow_northd_handler);
> >  engine_add_input(_lflow, _port_group, lflow_port_group_handler);
> >  engine_add_input(_lflow, _lr_stateful, 
> > lflow_lr_stateful_handler);
> > +engine_add_input(_lflow, _ls_stateful, 
> > lflow_ls_stateful_handler);
> >
> >  engine_add_input(_sync_to_sb_addr_set, _nb_address_set,
> >   sync_to_sb_addr_set_nb_address_set_handler);
> > diff --git a/northd/lflow-mgr.c b/northd/lflow-mgr.c
> > index 6cb2a367fe..d81b13a25c 100644
> > --- a/northd/lflow-mgr.c
> > +++ b/northd/lflow-mgr.c
> > @@ -442,7 +442,7 @@ lflow_ref_unlink_lflows(struct lflow_ref *lflow_ref)
> >  BITMAP_FOR_EACH_1 (index, lrn->dpgrp_bitmap_len,
> > lrn->dpgrp_bitmap) {
> >  if (dec_dp_refcnt(&lrn->lflow->dp_refcnts_map, index)) {
> > -bitmap_set0(lrn->lflow->dpg_bitmap, lrn->dp_index);
> > +bitmap_set0(lrn->lflow->dpg_bitmap, index);
>
> Hmm, is this something that should go in "[PATCH ovn v5 01/16] northd:
> Refactor the northd change tracking."?

I guess you mean patch 11 "northd: Handle lb changes in lflow engine.".
If so, that's correct.  I'll fix it in v6.
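
For readers skimming the hunk above, a hedged restatement of why the
one-character change matters (assuming dec_dp_refcnt() returns true once the
per-datapath refcount drops to zero):

    /* When a dp-group lflow_ref_node is unlinked, every datapath index set in
     * the node's own bitmap must be dropped from the lflow's group bitmap once
     * its refcount reaches zero.  Using lrn->dp_index would always clear the
     * same single-datapath index instead of the one being iterated. */
    BITMAP_FOR_EACH_1 (index, lrn->dpgrp_bitmap_len, lrn->dpgrp_bitmap) {
        if (dec_dp_refcnt(&lrn->lflow->dp_refcnts_map, index)) {
            bitmap_set0(lrn->lflow->dpg_bitmap, index); /* 'index', not lrn->dp_index */
        }
    }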

Thanks
Numan

>
> Thanks,
> Dumitru
>


Re: [ovs-dev] [PATCH ovn] northd: Make sure that affinity flows match on VIP.

2024-01-24 Thread Mark Michelson

Hi Ales,

Acked-by: Mark Michelson 

I was ready to point out this doesn't have an accompanying change to 
ovn-northd.8.xml . However, looking at the existing documentation, the 
match on the LB VIP is already documented. Apparently this was planned 
from the beginning but didn't actually make it into the code.


On 1/23/24 10:00, Ales Musil wrote:

Consider the two following LBs:
1) 192.168.0.100:8080 -> 192.168.0.10:80
affinity_timeout=120, skip_snat=true
2) 172.10.0.100:8080  -> 192.168.0.10:80
affinity_timeout=120, skip_snat=false

Connected to the same LR with "lb_force_snat_ip" option set.
This combination would produce two flows with the same match
but different action:
1) priority=150, match=(reg9[6] == 1 && ct.new && ip4 && reg4 == 192.168.0.10 
&& reg8[0..15] == 80)
action=(reg0 = 192.168.0.100; flags.skip_snat_for_lb = 1; 
ct_lb_mark(backends=192.168.0.10:80; skip_snat);)

2) priority=150, match=(reg9[6] == 1 && ct.new && ip4 && reg4 == 192.168.0.10 
&& reg8[0..15] == 80)
action=(reg0 = 172.10.0.100; flags.force_snat_for_lb = 1; 
ct_lb_mark(backends=192.168.0.10:80; force_snat);)

This causes issues because it's not defined which flow will take
priority, since they have the same match. So it might happen that
the traffic undergoes SNAT when it shouldn't, or vice versa.

To make sure that correct action is taken, differentiate the match by
adding VIP match (ip.dst == VIP).

Fixes: d3926b433e44 ("northd: rely on new actions for lb affinity")
Reported-at: https://issues.redhat.com/browse/FDP-290
Signed-off-by: Ales Musil 
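
For illustration (derived from the example above, not copied from the patch
output), with the VIP included in the match the two previously colliding
flows become distinguishable:

    1) priority=150, match=(reg9[6] == 1 && ct.new && ip4.dst == 192.168.0.100
       && reg4 == 192.168.0.10 && reg8[0..15] == 80)
    2) priority=150, match=(reg9[6] == 1 && ct.new && ip4.dst == 172.10.0.100
       && reg4 == 192.168.0.10 && reg8[0..15] == 80)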
---
  northd/northd.c  | 14 +++-
  tests/ofproto-macros.at  |  4 ++--
  tests/ovn-northd.at  | 48 +++-
  tests/system-ovn-kmod.at |  8 +++
  4 files changed, 52 insertions(+), 22 deletions(-)

diff --git a/northd/northd.c b/northd/northd.c
index de15ca101..5763a1b8b 100644
--- a/northd/northd.c
+++ b/northd/northd.c
@@ -8404,12 +8404,12 @@ build_lb_rules_pre_stateful(struct hmap *lflows,
   *
   * - load balancing:
   *   table=lr_in_dnat, priority=150
- *  match=(REGBIT_KNOWN_LB_SESSION == 1 && ct.new && ip4
+ *  match=(REGBIT_KNOWN_LB_SESSION == 1 && ct.new && ip4.dst == V
   * && REG_LB_AFF_BACKEND_IP4 == B1 && REG_LB_AFF_MATCH_PORT == 
BP1)
   *  action=(REG_NEXT_HOP_IPV4 = V; lb_action;
   *  ct_lb_mark(backends=B1:BP1; ct_flag);)
   *   table=lr_in_dnat, priority=150
- *  match=(REGBIT_KNOWN_LB_SESSION == 1 && ct.new && ip4
+ *  match=(REGBIT_KNOWN_LB_SESSION == 1 && ct.new && ip4.dst == V
   * && REG_LB_AFF_BACKEND_IP4 == B2 && REG_LB_AFF_MATCH_PORT == 
BP2)
   *  action=(REG_NEXT_HOP_IPV4 = V; lb_action;
   *  ct_lb_mark(backends=B2:BP2; ct_flag);)
@@ -8508,7 +8508,8 @@ build_lb_affinity_lr_flows(struct hmap *lflows, const 
struct ovn_northd_lb *lb,
  
  /* Prepare common part of affinity match. */

ds_put_format(&aff_match, REGBIT_KNOWN_LB_SESSION" == 1 && "
-  "ct.new && %s && %s == ", ip_match, reg_backend);
+  "ct.new && %s.dst == %s && %s == ", ip_match,
+  lb_vip->vip_str, reg_backend);
  
  /* Store the common part length. */

  size_t aff_action_len = aff_action.length;
@@ -8587,13 +8588,13 @@ build_lb_affinity_lr_flows(struct hmap *lflows, const 
struct ovn_northd_lb *lb,
   *
   * - load balancing:
   *   table=ls_in_lb, priority=150
- *  match=(REGBIT_KNOWN_LB_SESSION == 1 && ct.new && ip4
+ *  match=(REGBIT_KNOWN_LB_SESSION == 1 && ct.new && ip4.dst == V
   * && REG_LB_AFF_BACKEND_IP4 == B1 && REG_LB_AFF_MATCH_PORT == 
BP1)
   *  action=(REGBIT_CONNTRACK_COMMIT = 0;
   *  REG_ORIG_DIP_IPV4 = V; REG_ORIG_TP_DPORT = VP;
   *  ct_lb_mark(backends=B1:BP1);)
   *   table=ls_in_lb, priority=150
- *  match=(REGBIT_KNOWN_LB_SESSION == 1 && ct.new && ip4
+ *  match=(REGBIT_KNOWN_LB_SESSION == 1 && ct.new && ip4.dst == V
   * && REG_LB_AFF_BACKEND_IP4 == B2 && REG_LB_AFF_MATCH_PORT == 
BP2)
   *  action=(REGBIT_CONNTRACK_COMMIT = 0;
   *  REG_ORIG_DIP_IPV4 = V;
@@ -8696,7 +8697,8 @@ build_lb_affinity_ls_flows(struct hmap *lflows,
  
  /* Prepare common part of affinity match. */

ds_put_format(&aff_match, REGBIT_KNOWN_LB_SESSION" == 1 && "
-  "ct.new && %s && %s == ", ip_match, reg_backend);
+  "ct.new && %s.dst == %s && %s == ", ip_match,
+  lb_vip->vip_str, reg_backend);
  
  /* Store the common part length. */

  size_t aff_action_len = aff_action.length;
diff --git a/tests/ofproto-macros.at b/tests/ofproto-macros.at
index 07ef1d092..bccedbaf7 100644
--- a/tests/ofproto-macros.at
+++ b/tests/ofproto-macros.at
@@ -200,14 +200,14 @@ m4_define([_OVS_VSWITCHD_START],
 AT_CHECK([[sed < stderr '
  /vlog|INFO|opened log file/d
  /ovsdb_server|INFO|ovsdb-server (Open vSwitch)/d']])
-   

Re: [ovs-dev] [PATCH] Userspace: Software fallback for UDP encapsulated TCP segmentation.

2024-01-24 Thread Mike Pattrick
On Wed, Jan 24, 2024 at 1:14 PM Mike Pattrick  wrote:
>
> When sending packets that are flagged as requiring segmentation to an
> interface that doesn't support this feature, send the packet to the TSO
> software fallback instead of dropping it.
>
> Signed-off-by: Mike Pattrick 

For some reason, this patch fails github CI but not Intel CI and
didn't have any errors when I tested it. I'm looking into the issue.



Re: [ovs-dev] [PATCH] dp-packet: Reset offload flags when clearing a packet.

2024-01-24 Thread Ilya Maximets
On 1/24/24 17:39, Aaron Conole wrote:
> Dumitru Ceara  writes:
> 
>> On 1/23/24 00:11, Mike Pattrick wrote:
>>> The OVN test suite identified a bug in dp_packet_ol_send_prepare() where
>>> a BFD packet flagged as double encapsulated would trigger a seg fault.
>>> The problem surfaced because bfd_put_packet was reusing a packet
>>> allocated on the stack that wasn't having its flags reset between calls.
>>>
>>
>> Thanks for tracking this one down, Mike!
>>
>>> This change will reset OL flags in data_clear(), which should fix this
>>> type of packet reuse issue in general as long as data_clear() is called
>>> in between uses. This change also includes a tangentially related check
>>> in dp_packet_inner_l4_size(), where the correct offset was not being
>>> checked.
>>
>> Up to maintainers but should this tangential fix be a separate patch?
> 
> I think it makes sense.  These are two different issues, so probably
> should go as two separate patches.

+1

> 
>>>
>>> Fixes: 8b5fe2dc6080 ("userspace: Add Generic Segmentation Offloading.")
>>> Fixes: 85bcbbed839a ("userspace: Enable tunnel tests with TSO.")
>>> Reported-by: Dumitru Ceara 
>>> Reported-at: https://issues.redhat.com/browse/FDP-300
>>> Signed-off-by: Mike Pattrick 
>>> ---
>>
>> I'm no expert in this code but the change itself looks correct to me and
>> OVN tests pass with this applied:
>>
>> Reviewed-by: Dumitru Ceara 
>>
>>>  lib/dp-packet.h | 4 +++-
>>>  1 file changed, 3 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/lib/dp-packet.h b/lib/dp-packet.h
>>> index 939bec5c8..f328a6637 100644
>>> --- a/lib/dp-packet.h
>>> +++ b/lib/dp-packet.h
>>> @@ -207,6 +207,7 @@ void *dp_packet_resize_l2(struct dp_packet *, int 
>>> increment);
>>>  void *dp_packet_resize_l2_5(struct dp_packet *, int increment);
>>>  static inline void *dp_packet_eth(const struct dp_packet *);
>>>  static inline void dp_packet_reset_offsets(struct dp_packet *);
>>> +static inline void dp_packet_reset_offload(struct dp_packet *);
>>>  static inline uint16_t dp_packet_l2_pad_size(const struct dp_packet *);
>>>  static inline void dp_packet_set_l2_pad_size(struct dp_packet *, uint16_t);
>>>  static inline void *dp_packet_l2_5(const struct dp_packet *);
>>> @@ -380,6 +381,7 @@ dp_packet_clear(struct dp_packet *b)
>>>  {
>>>  dp_packet_set_data(b, dp_packet_base(b));
>>>  dp_packet_set_size(b, 0);
>>> +dp_packet_reset_offload(b);

Should we also reset offsets here?  They are also not valid after
clearing the packet data.

Looking through the different dp-packet functions it seems we're
missing adjustments for inner offsets in dp_packet_resize_l2_5().
May be a problem if we would also need to push a vlan after
encapsulation of packet with checksum offload.
May be a separate fix as well.


>>>  }
>>>  
>>>  /* Removes 'size' bytes from the head end of 'b', which must contain at 
>>> least
>>> @@ -537,7 +539,7 @@ dp_packet_inner_l4(const struct dp_packet *b)
>>>  static inline size_t
>>>  dp_packet_inner_l4_size(const struct dp_packet *b)
>>>  {
>>> -return OVS_LIKELY(b->l4_ofs != UINT16_MAX)
>>> +return OVS_LIKELY(b->inner_l4_ofs != UINT16_MAX)
>>> ? (const char *) dp_packet_tail(b)
>>> - (const char *) dp_packet_inner_l4(b)
>>> - dp_packet_l2_pad_size(b)
>>
>> Regards,
>> Dumitru



Re: [ovs-dev] [PATCH] userspace: Check for inner L3 while preparing encapsulated packets.

2024-01-24 Thread Ilya Maximets
On 1/23/24 14:31, Mike Pattrick wrote:
> On Tue, Jan 23, 2024 at 7:41 AM Ilya Maximets  wrote:
>>
>> On 1/22/24 23:24, Mike Pattrick wrote:
>>> On Mon, Jan 22, 2024 at 4:10 PM Ilya Maximets  wrote:

 On 1/22/24 21:33, Mike Pattrick wrote:
> On Mon, Jan 22, 2024 at 1:47 PM Ilya Maximets  wrote:
>>
>> On 1/22/24 18:51, Mike Pattrick wrote:
>>> The OVN test suite identified a bug in dp_packet_ol_send_prepare() where
>>> a double encapsulated BFD packet would trigger a seg fault. This
>>> happened because we had assumed that if IP checksumming was marked as
>>> offloaded, that checksum would occur in the innermost packet.
>>>
>>> This change will check that the inner packet has an L3 and L4 before
>>> checksumming those parts. And if missing, will resume checksumming the
>>> outer components.
>>
>> Hrm.  This looks like a workaround rather than a fix.  If the inner 
>> packet
>> doesn't have L3 header, dp-packet must not have a flag for L3 
>> checksumming
>> set in the first place.  And if it does have inner L3, the offset must be
>> initialized.  I guess, some of the offsets can be not initialized, 
>> because
>> the packet is never parsed by either miniflow_extract() or 
>> parse_tcp_flags().
>> And the bfd_put_packet() doesn't seem to set them.
>
> I think you're right, I stepped through the problem in GDB and it was
> clear that the flags weren't getting reset properly between BFD
> packets. I'll send an updated patch.

 Yeah, flags preservation is one thing, the other is that l3/l4_ofs are just
 not populated in these packets, which doesn't sound right.

 It's also not clear why the packet is marked for inner checksum offload if
 the inner checksum is actually fully calculated and correct.  I understand
 that the flags are carried over from a previous packet, but why did that
 previous packet have this flag set?
>>>
>>> Here's an example:
>>>
>>> monitor_run()
>>> \_ dp_packet_use_stub(&packet, stub, sizeof stub); <--- initializes packet once
>>> \_ while (!heap_is_empty(&monitor_heap) <-- loop where the same packet
>>> can be reused
>>> \_ monitor_mport_run()
>>> \_ dp_packet_clear() <- Data cleared, but flags are not cleared
>>> \_ cfm_compose_ccm() / bfd_put_packet() / lldp_put_packet() <-
>>> one or more can run
>>> \_ Note that cfm_compose_ccm and lldp_put_packet call
>>> eth_compose, but bfd_put_packet doesn't. bfd_put_packet doesn't reset
>>> the offsets like eth_compose does.
>>
>> Sounds like a bug to me.  bfd_put_packet() should set correct offsets
>> in the packet, otherwise we'll get random failures with different
>> actions that may depend on these offsets to be populated.
> 
> Agreed. Do you want me to homologate this as part of this series?

Yeah, a separate patch for this would be great.

> 
>>
>>> \_ ofproto_dpif_send_packet()
>>>    non-relevant stack trace ...
>>>  \_ netdev_push_header()
>>>   \_ First run, push geneve header and toggle geneve flag
>>>   \_ Second run, detect geneve header flag, call 
>>> dp_packet_ol_send_prepare()
>>>
>>> Given the above, I think it makes sense to clear the offload flags in
>>> dp_packet_clear().
>>
>> I agree with that.  But it still doesn't explain why the 
>> DP_PACKET_OL_TX_IP_CKSUM
>> is set after the first run.  The inner checksum is fully calculated and 
>> correct.
>> There should be no Tx offloading for it set.  Only the 
>> DP_PACKET_OL_TX_OUTER_IP_CKSUM
>> should be set in this packet.  Or am I missing something?
> 
> bfd_put_packet() calculates the checksum, so it will always be
> correct, miniflow_extract() sets DP_PACKET_OL_TX_IP_CKSUM if the
> previous packet had DP_PACKET_OL_RX_IP_CKSUM_GOOD, which it would have
> gotten from dp_packet_ol_send_prepare().

OK.  That makes sense.  I understand now.  Thanks!

> 
> 
>>
>>>
>>> -M
>>>

>
>> BTW, is there actually a double encapsulation in the original OVN test?
>> Sounds strange.
>
> The problem was exposed in netdev_push_header() in
> dp_packet_ol_send_prepare(packet, 0);

 That's true, but the packet dump in gdb showed a plain BFD packet.
 So, it was a first encapsulation, not double.

>
> -M
>
>>
>>>
>>> Fixes: 8b5fe2dc6080 ("userspace: Add Generic Segmentation Offloading.")
>>> Reported-by: Dumitru Ceara 
>>> Reported-at: https://issues.redhat.com/browse/FDP-300
>>> Signed-off-by: Mike Pattrick 
>>> ---
>>>  lib/dp-packet.h | 10 --
>>>  lib/packets.c   |  6 +++---
>>>  2 files changed, 11 insertions(+), 5 deletions(-)
>>
>> Best regards, Ilya Maximets.
>>
>

>>>
>>
> 



[ovs-dev] [PATCH] Userspace: Software fallback for UDP encapsulated TCP segmentation.

2024-01-24 Thread Mike Pattrick
When sending packets that are flagged as requiring segmentation to an
interface that doesn't support this feature, send the packet to the TSO
software fallback instead of dropping it.

Signed-off-by: Mike Pattrick 
---
 lib/dp-packet-gso.c | 73 +
 lib/dp-packet.h | 26 +++
 lib/netdev-native-tnl.c |  8 +
 lib/netdev.c| 19 ---
 tests/system-traffic.at | 48 +++
 5 files changed, 142 insertions(+), 32 deletions(-)

diff --git a/lib/dp-packet-gso.c b/lib/dp-packet-gso.c
index 847685ad9..f25abf436 100644
--- a/lib/dp-packet-gso.c
+++ b/lib/dp-packet-gso.c
@@ -47,6 +47,8 @@ dp_packet_gso_seg_new(const struct dp_packet *p, size_t 
hdr_len,
 seg->l2_5_ofs = p->l2_5_ofs;
 seg->l3_ofs = p->l3_ofs;
 seg->l4_ofs = p->l4_ofs;
+seg->inner_l3_ofs = p->inner_l3_ofs;
+seg->inner_l4_ofs = p->inner_l4_ofs;
 
 /* The protocol headers remain the same, so preserve hash and mark. */
 *dp_packet_rss_ptr(seg) = *dp_packet_rss_ptr(p);
@@ -71,7 +73,12 @@ dp_packet_gso_nr_segs(struct dp_packet *p)
 const char *data_tail;
 const char *data_pos;
 
-data_pos = dp_packet_get_tcp_payload(p);
+if (dp_packet_hwol_is_tunnel_vxlan(p) ||
+dp_packet_hwol_is_tunnel_geneve(p)) {
+data_pos = dp_packet_get_inner_tcp_payload(p);
+} else {
+data_pos = dp_packet_get_tcp_payload(p);
+}
 data_tail = (char *) dp_packet_tail(p) - dp_packet_l2_pad_size(p);
 
 return DIV_ROUND_UP(data_tail - data_pos, segsz);
@@ -91,12 +98,15 @@ dp_packet_gso(struct dp_packet *p, struct dp_packet_batch 
**batches)
 struct tcp_header *tcp_hdr;
 struct ip_header *ip_hdr;
 struct dp_packet *seg;
+const char *data_pos;
 uint16_t tcp_offset;
 uint16_t tso_segsz;
+uint16_t ip_id = 0;
 uint32_t tcp_seq;
-uint16_t ip_id;
+bool outer_ipv4;
 int hdr_len;
 int seg_len;
+bool tnl;
 
 tso_segsz = dp_packet_get_tso_segsz(p);
 if (!tso_segsz) {
@@ -105,20 +115,35 @@ dp_packet_gso(struct dp_packet *p, struct dp_packet_batch 
**batches)
 return false;
 }
 
-tcp_hdr = dp_packet_l4(p);
-tcp_offset = TCP_OFFSET(tcp_hdr->tcp_ctl);
-tcp_seq = ntohl(get_16aligned_be32(_hdr->tcp_seq));
-hdr_len = ((char *) dp_packet_l4(p) - (char *) dp_packet_eth(p))
-  + tcp_offset * 4;
-ip_id = 0;
-if (dp_packet_hwol_is_ipv4(p)) {
+if (dp_packet_hwol_is_tunnel_vxlan(p) ||
+dp_packet_hwol_is_tunnel_geneve(p)) {
+data_pos =  dp_packet_get_inner_tcp_payload(p);
+outer_ipv4 = dp_packet_hwol_is_outer_ipv4(p);
+tcp_hdr = dp_packet_inner_l4(p);
+ip_hdr = dp_packet_inner_l3(p);
+tnl = true;
+if (outer_ipv4) {
+ip_id = ntohs(((struct ip_header *) dp_packet_l3(p))->ip_id);
+} else if (dp_packet_hwol_is_ipv4(p)) {
+ip_id = ntohs(ip_hdr->ip_id);
+}
+} else {
+data_pos = dp_packet_get_tcp_payload(p);
+outer_ipv4 = dp_packet_hwol_is_ipv4(p);
+tcp_hdr = dp_packet_l4(p);
 ip_hdr = dp_packet_l3(p);
-ip_id = ntohs(ip_hdr->ip_id);
+tnl = false;
+if (outer_ipv4) {
+ip_id = ntohs(ip_hdr->ip_id);
+}
 }
 
+tcp_offset = TCP_OFFSET(tcp_hdr->tcp_ctl);
+tcp_seq = ntohl(get_16aligned_be32(_hdr->tcp_seq));
+hdr_len = ((char *) tcp_hdr - (char *) dp_packet_eth(p))
+  + tcp_offset * 4;
 const char *data_tail = (char *) dp_packet_tail(p)
 - dp_packet_l2_pad_size(p);
-const char *data_pos = dp_packet_get_tcp_payload(p);
 int n_segs = dp_packet_gso_nr_segs(p);
 
 for (int i = 0; i < n_segs; i++) {
@@ -130,8 +155,26 @@ dp_packet_gso(struct dp_packet *p, struct dp_packet_batch 
**batches)
 seg = dp_packet_gso_seg_new(p, hdr_len, data_pos, seg_len);
 data_pos += seg_len;
 
+if (tnl) {
+/* Update tunnel L3 header. */
+if (dp_packet_hwol_is_ipv4(seg)) {
+ip_hdr = dp_packet_inner_l3(seg);
+ip_hdr->ip_tot_len = htons(sizeof *ip_hdr +
+   dp_packet_inner_l4_size(seg));
+ip_hdr->ip_id = htons(ip_id);
+ip_hdr->ip_csum = 0;
+ip_id++;
+} else {
+struct ovs_16aligned_ip6_hdr *ip6_hdr;
+
+ip6_hdr = dp_packet_inner_l3(seg);
+ip6_hdr->ip6_ctlun.ip6_un1.ip6_un1_plen
+= htons(dp_packet_inner_l3_size(seg) - sizeof *ip6_hdr);
+}
+}
+
 /* Update L3 header. */
-if (dp_packet_hwol_is_ipv4(seg)) {
+if (outer_ipv4) {
 ip_hdr = dp_packet_l3(seg);
 ip_hdr->ip_tot_len = htons(sizeof *ip_hdr +
dp_packet_l4_size(seg));
@@ -146,7 +189,11 @@ dp_packet_gso(struct dp_packet *p, 

Re: [ovs-dev] [PATCH v4 ovn] ovn: Add geneve PMTUD support.

2024-01-24 Thread Numan Siddique
On Tue, Jan 23, 2024 at 11:14 AM Lorenzo Bianconi
 wrote:
>
> Introduce specific flows for E/W ICMPv{4,6} packets if tunnelled packets
> do not fit the path MTU. This patch enables PMTUD for East/West Geneve traffic.
>
> Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=2241711
> Tested-at: 
> https://github.com/LorenzoBianconi/ovn/actions/runs/7627345562/job/20777962659
> Signed-off-by: Lorenzo Bianconi 

Thanks Lorenzo for fixing this issue and for the patience.

I applied this patch to main and branch-23.09

Numan

> ---
> Changes since v3:
> - fix flaky ovn multinode test
> - rebase on top of ovn main branch
> Changes since v2:
> - fix openshift regression for regular icmp traffic received from a remote 
> node
> - add gw router selftest
> Changes since v1:
> - add fix for vxlan and stt tunnels
> ---
>  .../workflows/ovn-fake-multinode-tests.yml|   6 +-
>  NEWS  |   1 +
>  controller/physical.c |  30 +-
>  include/ovn/logical-fields.h  |   3 +
>  lib/logical-fields.c  |   2 +
>  northd/northd.c   |  83 ++
>  northd/ovn-northd.8.xml   |  41 +-
>  tests/multinode.at| 813 +-
>  tests/ovn-northd.at   |  12 +
>  9 files changed, 985 insertions(+), 6 deletions(-)
>
> diff --git a/.github/workflows/ovn-fake-multinode-tests.yml 
> b/.github/workflows/ovn-fake-multinode-tests.yml
> index 25610df53..b3ba82a30 100644
> --- a/.github/workflows/ovn-fake-multinode-tests.yml
> +++ b/.github/workflows/ovn-fake-multinode-tests.yml
> @@ -76,8 +76,8 @@ jobs:
>fail-fast: false
>matrix:
>  cfg:
> -- { branch: "main" }
> -- { branch: "branch-22.03" }
> +- { branch: "main", testsuiteflags: ""}
> +- { branch: "branch-22.03", testsuiteflags: "-k 'basic test'" }
>  name: multinode tests ${{ join(matrix.cfg.*, ' ') }}
>  env:
>RUNC_CMD: podman
> @@ -176,7 +176,7 @@ jobs:
>
>  - name: Run fake-multinode system tests
>run: |
> -if ! sudo make check-multinode; then
> +if ! sudo make check-multinode TESTSUITEFLAGS="${{ 
> matrix.cfg.testsuiteflags }}"; then
>sudo podman exec -it ovn-central ovn-nbctl show || :
>sudo podman exec -it ovn-central ovn-sbctl show || :
>sudo podman exec -it ovn-chassis-1 ovs-vsctl show || :
> diff --git a/NEWS b/NEWS
> index 5f267b4c6..72c40df82 100644
> --- a/NEWS
> +++ b/NEWS
> @@ -14,6 +14,7 @@ Post v23.09.0
>- ovn-northd-ddlog has been removed.
>- A new LSP option "enable_router_port_acl" has been added to enable
>  conntrack for the router port whose peer is l3dgw_port if set it true.
> +  - Enable PMTU discovery on geneve tunnels for E/W traffic.
>
>  OVN v23.09.0 - 15 Sep 2023
>  --
> diff --git a/controller/physical.c b/controller/physical.c
> index eda085441..c3f8769bb 100644
> --- a/controller/physical.c
> +++ b/controller/physical.c
> @@ -2452,9 +2452,37 @@ physical_run(struct physical_ctx *p_ctx,
>  }
>
>  put_resubmit(OFTABLE_LOCAL_OUTPUT, );
> -
>  ofctrl_add_flow(flow_table, OFTABLE_PHY_TO_LOG, 100, 0, ,
>  , hc_uuid);
> +
> +/* Set allow rx from tunnel bit. */
> +put_load(1, MFF_LOG_FLAGS, MLF_RX_FROM_TUNNEL_BIT, 1, );
> +
> +/* Add specif flows for E/W ICMPv{4,6} packets if tunnelled packets
> + * do not fit path MTU.
> + */
> +put_resubmit(OFTABLE_LOG_INGRESS_PIPELINE, );
> +
> +/* IPv4 */
> +match_init_catchall();
> +match_set_in_port(, tun->ofport);
> +match_set_dl_type(, htons(ETH_TYPE_IP));
> +match_set_nw_proto(, IPPROTO_ICMP);
> +match_set_icmp_type(, 3);
> +match_set_icmp_code(, 4);
> +
> +ofctrl_add_flow(flow_table, OFTABLE_PHY_TO_LOG, 120, 0, ,
> +, hc_uuid);
> +/* IPv6 */
> +match_init_catchall();
> +match_set_in_port(, tun->ofport);
> +match_set_dl_type(, htons(ETH_TYPE_IPV6));
> +match_set_nw_proto(, IPPROTO_ICMPV6);
> +match_set_icmp_type(, 2);
> +match_set_icmp_code(, 0);
> +
> +ofctrl_add_flow(flow_table, OFTABLE_PHY_TO_LOG, 120, 0, ,
> +, hc_uuid);
>  }
>
>  /* Add VXLAN specific rules to transform port keys
> diff --git a/include/ovn/logical-fields.h b/include/ovn/logical-fields.h
> index 272277ec4..e8e0e6d33 100644
> --- a/include/ovn/logical-fields.h
> +++ b/include/ovn/logical-fields.h
> @@ -78,6 +78,7 @@ enum mff_log_flags_bits {
>  MLF_LOOKUP_COMMIT_ECMP_NH_BIT = 13,
>  MLF_USE_LB_AFF_SESSION_BIT = 14,
>  MLF_LOCALNET_BIT = 15,
> +MLF_RX_FROM_TUNNEL_BIT = 16,
>  };
>
>  /* MFF_LOG_FLAGS_REG flag assignments */
> @@ -129,6 +130,8 @@ enum mff_log_flags {
>  /* Indicate that the port 

Re: [ovs-dev] [PATCH] dp-packet: Reset offload flags when clearing a packet.

2024-01-24 Thread Aaron Conole
Dumitru Ceara  writes:

> On 1/23/24 00:11, Mike Pattrick wrote:
>> The OVN test suite identified a bug in dp_packet_ol_send_prepare() where
>> a BFD packet flagged as double encapsulated would trigger a seg fault.
>> The problem surfaced because bfd_put_packet was reusing a packet
>> allocated on the stack that wasn't having its flags reset between calls.
>> 
>
> Thanks for tracking this one down, Mike!
>
>> This change will reset OL flags in data_clear(), which should fix this
>> type of packet reuse issue in general as long as data_clear() is called
>> in between uses. This change also includes a tangentially related check
>> in dp_packet_inner_l4_size(), where the correct offset was not being
>> checked.
>
> Up to maintainers but should this tangential fix be a separate patch?

I think it makes sense.  These are two different issues, so probably
should go as two separate patches.

>> 
>> Fixes: 8b5fe2dc6080 ("userspace: Add Generic Segmentation Offloading.")
>> Fixes: 85bcbbed839a ("userspace: Enable tunnel tests with TSO.")
>> Reported-by: Dumitru Ceara 
>> Reported-at: https://issues.redhat.com/browse/FDP-300
>> Signed-off-by: Mike Pattrick 
>> ---
>
> I'm no expert in this code but the change itself looks correct to me and
> OVN tests pass with this applied:
>
> Reviewed-by: Dumitru Ceara 
>
>>  lib/dp-packet.h | 4 +++-
>>  1 file changed, 3 insertions(+), 1 deletion(-)
>> 
>> diff --git a/lib/dp-packet.h b/lib/dp-packet.h
>> index 939bec5c8..f328a6637 100644
>> --- a/lib/dp-packet.h
>> +++ b/lib/dp-packet.h
>> @@ -207,6 +207,7 @@ void *dp_packet_resize_l2(struct dp_packet *, int 
>> increment);
>>  void *dp_packet_resize_l2_5(struct dp_packet *, int increment);
>>  static inline void *dp_packet_eth(const struct dp_packet *);
>>  static inline void dp_packet_reset_offsets(struct dp_packet *);
>> +static inline void dp_packet_reset_offload(struct dp_packet *);
>>  static inline uint16_t dp_packet_l2_pad_size(const struct dp_packet *);
>>  static inline void dp_packet_set_l2_pad_size(struct dp_packet *, uint16_t);
>>  static inline void *dp_packet_l2_5(const struct dp_packet *);
>> @@ -380,6 +381,7 @@ dp_packet_clear(struct dp_packet *b)
>>  {
>>  dp_packet_set_data(b, dp_packet_base(b));
>>  dp_packet_set_size(b, 0);
>> +dp_packet_reset_offload(b);
>>  }
>>  
>>  /* Removes 'size' bytes from the head end of 'b', which must contain at 
>> least
>> @@ -537,7 +539,7 @@ dp_packet_inner_l4(const struct dp_packet *b)
>>  static inline size_t
>>  dp_packet_inner_l4_size(const struct dp_packet *b)
>>  {
>> -return OVS_LIKELY(b->l4_ofs != UINT16_MAX)
>> +return OVS_LIKELY(b->inner_l4_ofs != UINT16_MAX)
>> ? (const char *) dp_packet_tail(b)
>> - (const char *) dp_packet_inner_l4(b)
>> - dp_packet_l2_pad_size(b)
>
> Regards,
> Dumitru
>
> ___
> dev mailing list
> d...@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev



Re: [ovs-dev] [PATCH ovn v5 4/4] ic/tests: Add unit test for ic sync command.

2024-01-24 Thread Ales Musil
On Wed, Jan 24, 2024 at 3:28 PM Mohammad Heib  wrote:

> add unit test that check validate that sync command
> sync ISB properly
>
> Signed-off-by: Mohammad Heib 
> ---
>  tests/ovn-ic.at | 43 +++
>  1 file changed, 43 insertions(+)
>
> diff --git a/tests/ovn-ic.at b/tests/ovn-ic.at
> index d4c436f84..535aba7da 100644
> --- a/tests/ovn-ic.at
> +++ b/tests/ovn-ic.at
> @@ -1274,3 +1274,46 @@ OVN_CLEANUP_IC([az1], [az2])
>
>  AT_CLEANUP
>  ])
> +
> +OVN_FOR_EACH_NORTHD([
> +AT_SETUP([ovn-ic -- sync ISB status to INB])
> +ovn_init_ic_db
> +net_add n1
> +
> +ovn_start az1
> +sim_add gw-az1
> +as gw-az1
> +
> +check ovs-vsctl add-br br-phys
> +ovn_az_attach az1 n1 br-phys 192.168.1.1
> +check ovs-vsctl set open . external-ids:ovn-is-interconn=true
> +as az1
> +
> +# pause ovn-ic instance
> +check ovn-appctl -t ic/ovn-ic pause
> +
> +# run sync command in the background this commands
> +# supposed to stuck since ovn-ic is paused.
> +ovn-ic-nbctl --wait=sb sync &
> +
> +OVS_WAIT_UNTIL([test $(ovn-ic-nbctl get ic_nb_global . nb_ic_cfg) -gt
> $(ovn-ic-nbctl get ic_nb_global . sb_ic_cfg)])
> +AT_CHECK([ovn-ic-nbctl get ic_nb_global . nb_ic_cfg], [0], [dnl
> +1
> +])
> +AT_CHECK([ovn-ic-nbctl get ic_nb_global . sb_ic_cfg], [0], [dnl
> +0
> +])
> +
> +# resume ovn-ic instance
> +check ovn-appctl -t ic/ovn-ic resume
> +OVS_WAIT_UNTIL([test $(ovn-ic-nbctl get ic_nb_global . nb_ic_cfg) -eq
> $(ovn-ic-nbctl get ic_nb_global . sb_ic_cfg)])
> +AT_CHECK([ovn-ic-nbctl get ic_nb_global . nb_ic_cfg], [0], [dnl
> +1
> +])
> +AT_CHECK([ovn-ic-nbctl get ic_nb_global . sb_ic_cfg], [0], [dnl
> +1
> +])
> +
> +OVN_CLEANUP_IC([az1])
> +AT_CLEANUP
> +])
> --
> 2.34.3
>
> ___
> dev mailing list
> d...@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>
>
Looks good to me, thanks.

Acked-by: Ales Musil 



Re: [ovs-dev] [PATCH ovn v5 2/4] ovn-ic: Implement basic INB change handling status.

2024-01-24 Thread Ales Musil
On Wed, Jan 24, 2024 at 3:28 PM Mohammad Heib  wrote:

> This patch implements a basic sequence number protocol
> that can be used by CMS to determine if the changes
> applied to INB are successfully propagated to ISB.
>
> The implementation of this patch relies on OVN-ICs
> instances to update the ISB by adding a per AZ a nb_ic_cfg
> counter that will be updated by the OVN-IC once it is done
> and commit all needed changes to the ISB, and according to this
> AZ:nb_ic_cfg the ISB and INB will be updating about the status
> of the changes.
>
> Signed-off-by: Mohammad Heib 
> ---
>  ic/ovn-ic.c | 95 ++---
>  1 file changed, 90 insertions(+), 5 deletions(-)
>
> diff --git a/ic/ovn-ic.c b/ic/ovn-ic.c
> index 8ceb34d7c..d0b58d4d1 100644
> --- a/ic/ovn-ic.c
> +++ b/ic/ovn-ic.c
> @@ -1782,16 +1782,95 @@ route_run(struct ic_context *ctx,
>  hmap_destroy(_lrs);
>  }
>
> +/*
> + * This function implements a sequence number protocol that can be used by
> + * the INB end user to verify that ISB is synced with all the changes that
> + * are done be the user/AZs-controllers:
> + *
> + * Since we have multiple IC instances running in different regions
> + * we can't rely on one of them to update the ISB and sync that update
> + * to INB since other ICs can make changes in parallel.
> + * So to have a sequence number protocol working properly we must
> + * make sure that all the IC instances are synced with the ISB first
> + * and then update the INB.
> + *
> + * To guarantee that all instances are synced with ISB first, each IC
> + * will do the following steps:
> + *
> + * 1. when local ovn-ic sees that INB:nb_ic_cfg has updated we will set
> + *the ic_sb_loop->next_cfg to match the INB:nb_ic_cfg and increment
> + *the value of AZ:nb_ic_cfg and wait until we get confirmation from
> + *the server.
> + *
> + * 2. once this IC instance changes for ISB are committed successfully
> + *(next loop), the value of cur_cfg will be updated to match
> + *the INB:nb_ic_cfg that indicate that our local instance is up to
> date
> + *and no more changes need to be done for ISB.
> + *
> + * 3. validate that the AZ:nb_ic_cfg to match the INB:nb_ic_cfg.
> + *
> + * 4. Go through all the AZs and check if all have the same value of
> + *AZ:nb_ic_cfg that means all the AZs are done with ISB changes and
> ISB are
> + *up to date with INB, so we can set the values of ISB:nb_ic_cfg to
> + *INB:nb_ic_cfg and INB:sb_ic_cfg to INB:nb_ic_cfg.
> + */
>  static void
> -ovn_db_run(struct ic_context *ctx)
> +update_sequence_numbers(const struct icsbrec_availability_zone *az,
> +struct ic_context *ctx,
> +struct ovsdb_idl_loop *ic_sb_loop)
>  {
> -const struct icsbrec_availability_zone *az = az_run(ctx);
> -VLOG_DBG("Availability zone: %s", az ? az->name : "not created yet.");
> +if (!ctx->ovnisb_txn || !ctx->ovninb_txn) {
> +return;
> +}
> +
> +const struct icnbrec_ic_nb_global *ic_nb = icnbrec_ic_nb_global_first(
> +   ctx->ovninb_idl);
> +if (!ic_nb) {
> +ic_nb = icnbrec_ic_nb_global_insert(ctx->ovninb_txn);
> +}
> +const struct icsbrec_ic_sb_global *ic_sb = icsbrec_ic_sb_global_first(
> +   ctx->ovnisb_idl);
> +if (!ic_sb) {
> +ic_sb = icsbrec_ic_sb_global_insert(ctx->ovnisb_txn);
> +}
>
> -if (!az) {
> +if ((ic_nb->nb_ic_cfg != ic_sb->nb_ic_cfg) &&
> +  (ic_nb->nb_ic_cfg != az->nb_ic_cfg)) {
> +/* Deal with potential overflows. */
> +if (az->nb_ic_cfg == LLONG_MAX) {
> +icsbrec_availability_zone_set_nb_ic_cfg(az, 0);
> +}
> +ic_sb_loop->next_cfg = ic_nb->nb_ic_cfg;
> +ovsdb_idl_txn_increment(ctx->ovnisb_txn, >header_,
> +   _availability_zone_col_nb_ic_cfg,
> true);
>  return;
>  }
>
> +/* handle cases where accidentally AZ:ic_nb_cfg exceeds
> + * the INB:ic_nb_cfg.
> + */
> +if (az->nb_ic_cfg != ic_sb_loop->cur_cfg) {
> +icsbrec_availability_zone_set_nb_ic_cfg(az, ic_sb_loop->cur_cfg);
> +return;
> +}
> +
> +const struct icsbrec_availability_zone *other_az;
> +ICSBREC_AVAILABILITY_ZONE_FOR_EACH (other_az, ctx->ovnisb_idl) {
> +if (other_az->nb_ic_cfg != az->nb_ic_cfg) {
> +return;
> +}
> +}
> +/* All the AZs are updated successfully, update SB/NB counter. */
> +if (ic_nb->nb_ic_cfg != ic_sb->nb_ic_cfg) {
> +icsbrec_ic_sb_global_set_nb_ic_cfg(ic_sb, az->nb_ic_cfg);
> +icnbrec_ic_nb_global_set_sb_ic_cfg(ic_nb, az->nb_ic_cfg);
> +}
> +}
> +
> +static void
> +ovn_db_run(struct ic_context *ctx,
> +   const struct icsbrec_availability_zone *az)
> +{
>  ts_run(ctx);
>  gateway_run(ctx, az);
>  port_binding_run(ctx, az);
> @@ 

Re: [ovs-dev] [PATCH ovn] pinctrl: dns: Ignore additional additional records.

2024-01-24 Thread Dumitru Ceara
On 1/23/24 22:47, Mark Michelson wrote:
> Hi Dumitru.
> 

Hi Mark,

> Two questions:
> 
> 1) Did you mean to title this as "additional additional records?" Or did
> you just mean "additional records?"
> 

Well, no, I didn't.  I just expanded AR without thinking. :)

> 2) Should this include a test? I'm thinking you could construct a DNS
> query that includes an EDNS AR and ensure that OVN responds to it.
> 

I should've added a test, you're right.

I'll work on v2, thanks for checking this out!

Regards,
Dumitru

> Thanks,
> Mark Michelson
> 
> On 1/23/24 09:15, Dumitru Ceara wrote:
>> EDNS is backwards compatible so it's safe to just ignore additional ARs.
>>
>> Reported-at: https://github.com/ovn-org/ovn/issues/228
>> Reported-at: https://issues.redhat.com/browse/FDP-222
>> Signed-off-by: Dumitru Ceara 
>> ---
>>   controller/pinctrl.c | 20 ++--
>>   1 file changed, 10 insertions(+), 10 deletions(-)
>>
>> diff --git a/controller/pinctrl.c b/controller/pinctrl.c
>> index 4992eab089..0be77701ec 100644
>> --- a/controller/pinctrl.c
>> +++ b/controller/pinctrl.c
>> @@ -2885,6 +2885,7 @@ dns_build_ptr_answer(
>>   free(encoded);
>>   }
>>   +#define DNS_QUERY_TYPE_CLASS_LEN (2 * sizeof(ovs_be16))
>>   #define DNS_RCODE_SERVER_REFUSE 0x5
>>     /* Called with in the pinctrl_handler thread context. */
>> @@ -2949,18 +2950,13 @@ pinctrl_handle_dns_lookup(
>>   goto exit;
>>   }
>>   -    /* Check if there is an additional record present, which is
>> unsupported */
>> -    if (in_dns_header->arcount) {
>> -    VLOG_DBG_RL(, "Received DNS query with additional records,
>> which"
>> -    " is unsupported");
>> -    goto exit;
>> -    }
>> -
>>   struct udp_header *in_udp = dp_packet_l4(pkt_in);
>>   size_t udp_len = ntohs(in_udp->udp_len);
>>   size_t l4_len = dp_packet_l4_size(pkt_in);
>> +    uint8_t *l4_start = (uint8_t *) in_udp;
>>   uint8_t *end = (uint8_t *)in_udp + MIN(udp_len, l4_len);
>>   uint8_t *in_dns_data = (uint8_t *)(in_dns_header + 1);
>> +    uint8_t *in_dns_data_start = in_dns_data;
>>   uint8_t *in_queryname = in_dns_data;
>>   uint16_t idx = 0;
>>   struct ds query_name;
>> @@ -2984,7 +2980,7 @@ pinctrl_handle_dns_lookup(
>>   in_dns_data += idx;
>>     /* Query should have TYPE and CLASS fields */
>> -    if (in_dns_data + (2 * sizeof(ovs_be16)) > end) {
>> +    if (in_dns_data + DNS_QUERY_TYPE_CLASS_LEN > end) {
>>   ds_destroy(_name);
>>   goto exit;
>>   }
>> @@ -2998,6 +2994,10 @@ pinctrl_handle_dns_lookup(
>>   goto exit;
>>   }
>>   +    uint8_t *rest = in_dns_data + DNS_QUERY_TYPE_CLASS_LEN;
>> +    uint32_t query_size = rest - in_dns_data_start;
>> +    uint32_t query_l4_size = rest - l4_start;
>> +
>>   uint64_t dp_key = ntohll(pin->flow_metadata.flow.metadata);
>>   const char *answer_data = NULL;
>>   bool ovn_owned = false;
>> @@ -3080,7 +3080,7 @@ pinctrl_handle_dns_lookup(
>>   goto exit;
>>   }
>>   -    uint16_t new_l4_size = ntohs(in_udp->udp_len) +  dns_answer.size;
>> +    uint16_t new_l4_size = query_l4_size + dns_answer.size;
>>   size_t new_packet_size = pkt_in->l4_ofs + new_l4_size;
>>   struct dp_packet pkt_out;
>>   dp_packet_init(_out, new_packet_size);
>> @@ -3117,7 +3117,7 @@ pinctrl_handle_dns_lookup(
>>   out_dns_header->arcount = 0;
>>     /* Copy the Query section. */
>> -    dp_packet_put(_out, dp_packet_data(pkt_in),
>> dp_packet_size(pkt_in));
>> +    dp_packet_put(_out, dp_packet_data(pkt_in), query_size);
>>     /* Copy the answer sections. */
>>   dp_packet_put(_out, dns_answer.data, dns_answer.size);
> 



Re: [ovs-dev] [PATCH ovn v5 3/4] OVN-IC: Make it possible for CMS to detect when the ISB is up-to-date.

2024-01-24 Thread 0-day Robot
Bleep bloop.  Greetings Mohammad Heib, I am a robot and I have tried out your 
patch.
Thanks for your contribution.

I encountered some error that I wasn't expecting.  See the details below.


checkpatch:
WARNING: Line lacks whitespace around operator
WARNING: Line lacks whitespace around operator
#149 FILE: utilities/ovn-ic-nbctl.c:355:
Synchronization command (use with --wait=sb):\n\

WARNING: Line lacks whitespace around operator
WARNING: Line lacks whitespace around operator
WARNING: Line lacks whitespace around operator
#155 FILE: utilities/ovn-ic-nbctl.c:361:
  --no-wait, --wait=none  do not wait for OVN reconfiguration (default)\n\

WARNING: Line lacks whitespace around operator
WARNING: Line lacks whitespace around operator
#156 FILE: utilities/ovn-ic-nbctl.c:362:
  --wait=sb   wait for southbound database update\n\

Lines checked: 268, Warnings: 7, Errors: 0


Please check this out.  If you feel there has been an error, please email 
acon...@redhat.com

Thanks,
0-day Robot


Re: [ovs-dev] [PATCH ovn v4 4/4] ic/tests: add unit test for ic sync command

2024-01-24 Thread Mohammad Heib
Hi Ales,

On Fri, Jan 12, 2024 at 11:12 AM Ales Musil  wrote:

>
>
> On Tue, Jan 9, 2024 at 2:29 PM Mohammad Heib  wrote:
>
>> add unit test that check validate that sync command
>> sync ISB properly
>>
>> Signed-off-by: Mohammad Heib 
>> ---
>>
>
> Hi Mohammad,
>
> I have a few comments down below.
>
>
>>  tests/ovn-ic.at | 47 +++
>>  1 file changed, 47 insertions(+)
>>
>> diff --git a/tests/ovn-ic.at b/tests/ovn-ic.at
>> index d4c436f84..f55ffa6cd 100644
>> --- a/tests/ovn-ic.at
>> +++ b/tests/ovn-ic.at
>> @@ -1274,3 +1274,50 @@ OVN_CLEANUP_IC([az1], [az2])
>>
>>  AT_CLEANUP
>>  ])
>> +
>> +AT_SETUP([ovn-ic -- sync ISB status to INB])
>> +ovn_init_ic_db
>> +net_add n1
>> +
>> +ovn_start az1
>> +sim_add gw-az1
>> +as gw-az1
>> +
>> +check ovs-vsctl add-br br-phys
>> +ovn_az_attach az1 n1 br-phys 192.168.1.1
>> +check ovs-vsctl set open . external-ids:ovn-is-interconn=true
>> +as az1
>> +
>> +# pause ovn-ic instance
>> +check ovn-appctl -t ic/ovn-ic pause
>> +
>> +# run sync command in the background this commands
>> +# supposed to stuck since ovn-ic is paused.
>> +$(ovn-ic-nbctl --wait=sb sync &)
>>
>
> The extra $() around shouldn't be needed.
>
>
>> +
>> +for i in {1..5}; do
>> +set -- $(ovn-ic-nbctl get ic_nb_global . nb_ic_cfg sb_ic_cfg)
>> +AS_VAR_SET([ic_nb_cfg], [$1])
>> +AS_VAR_SET([ic_sb_cfg], [$2])
>> +if test $ic_nb_cfg -gt $ic_sb_cfg; then
>> +sleep 1
>> +else
>> +break
>> +fi
>> +done
>>
>
> Why not to use OVS_WAIT_UNTIL?
>
>
>> +
>> +# if ic_nb_cfg equal to ic_sb_cfg that mean both zero
>> +# or both 1 which is a not correct and the test must fail.
>> +AT_FAIL_IF([test $ic_nb_cfg == $ic_sb_cfg])
>>
>
> AT_CHECK would be better suited for this.
>
>
>> +
>> +# resume ovn-ic instance
>> +check ovn-appctl -t ic/ovn-ic resume
>> +set -- $(ovn-ic-nbctl get ic_nb_global . nb_ic_cfg sb_ic_cfg)
>>
>
> We should wait after "resume" for the values to settle down.
>
>
>> +AS_VAR_SET([ic_nb_cfg], [$1])
>> +AS_VAR_SET([ic_sb_cfg], [$2])
>> +AT_FAIL_IF([test $ic_nb_cfg != 1])
>> +AT_FAIL_IF([test $ic_nb_cfg != 1])
>>
>
> Same as above, AT_CHECK would be better.
>
>
>> +
>> +OVN_CLEANUP_IC([az1])
>> +AT_CLEANUP
>> +])
>> --
>> 2.34.3
>>
>> ___
>> dev mailing list
>> d...@openvswitch.org
>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>>
>>
> The test seems to be overly complicated for what it tries to achieve. I
> would suggest the following diff:
>
Thank you so much for reviewing the patch,
I agree with you the test is complicated, I will go with your suggestions
in v5,
Thanks :)

>
> diff --git a/tests/ovn-ic.at b/tests/ovn-ic.at
> index f55ffa6cd..32df3be2c 100644
> --- a/tests/ovn-ic.at
> +++ b/tests/ovn-ic.at
> @@ -1293,30 +1293,25 @@ check ovn-appctl -t ic/ovn-ic pause
>
>  # run sync command in the background this commands
>  # supposed to stuck since ovn-ic is paused.
> -$(ovn-ic-nbctl --wait=sb sync &)
> -
> -for i in {1..5}; do
> -set -- $(ovn-ic-nbctl get ic_nb_global . nb_ic_cfg sb_ic_cfg)
> -AS_VAR_SET([ic_nb_cfg], [$1])
> -AS_VAR_SET([ic_sb_cfg], [$2])
> -if test $ic_nb_cfg -gt $ic_sb_cfg; then
> -sleep 1
> -else
> -break
> -fi
> -done
> +ovn-ic-nbctl --wait=sb sync &
>
> -# if ic_nb_cfg equal to ic_sb_cfg that mean both zero
> -# or both 1 which is a not correct and the test must fail.
> -AT_FAIL_IF([test $ic_nb_cfg == $ic_sb_cfg])
> +OVS_WAIT_UNTIL([test $(ovn-ic-nbctl get ic_nb_global . nb_ic_cfg) -gt
> $(ovn-ic-nbctl get ic_nb_global . sb_ic_cfg)])
> +AT_CHECK([ovn-ic-nbctl get ic_nb_global . nb_ic_cfg], [0], [dnl
> +1
> +])
> +AT_CHECK([ovn-ic-nbctl get ic_nb_global . sb_ic_cfg], [0], [dnl
> +0
> +])
>
>  # resume ovn-ic instance
>  check ovn-appctl -t ic/ovn-ic resume
> -set -- $(ovn-ic-nbctl get ic_nb_global . nb_ic_cfg sb_ic_cfg)
> -AS_VAR_SET([ic_nb_cfg], [$1])
> -AS_VAR_SET([ic_sb_cfg], [$2])
> -AT_FAIL_IF([test $ic_nb_cfg != 1])
> -AT_FAIL_IF([test $ic_nb_cfg != 1])
> +OVS_WAIT_UNTIL([test $(ovn-ic-nbctl get ic_nb_global . nb_ic_cfg) -eq
> $(ovn-ic-nbctl get ic_nb_global . sb_ic_cfg)])
> +AT_CHECK([ovn-ic-nbctl get ic_nb_global . nb_ic_cfg], [0], [dnl
> +1
> +])
> +AT_CHECK([ovn-ic-nbctl get ic_nb_global . sb_ic_cfg], [0], [dnl
> +1
> +])
>
>  OVN_CLEANUP_IC([az1])
>  AT_CLEANUP
>
>
> Thanks,
> Ales
>
> --
>
> Ales Musil
>
> Senior Software Engineer - OVN Core
>
> Red Hat EMEA 
>
> amu...@redhat.com
> 
>


[ovs-dev] [PATCH ovn v5 3/4] OVN-IC: Make it possible for CMS to detect when the ISB is up-to-date.

2024-01-24 Thread Mohammad Heib
Until now, there has been no reliable way for the CMS to detect when
changes made to the INB configuration have been passed through
to the ISB.  This commit adds this feature to the system
by adding sequence numbers to the INB and ISB and adding code
in ovn-ic-nbctl and ovn-ic to keep those sequence numbers up-to-date.

The biggest user-visible change from this commit is a new option,
'--wait', and a new command, 'sync', for ovn-ic-nbctl.
With --wait=sb, ovn-ic-nbctl now waits for ovn-ics to update the ISB
database.

Signed-off-by: Mohammad Heib 
Acked-by: Mark Michelson 
Acked-by: Ales Musil 
---
 utilities/ovn-ic-nbctl.8.xml | 49 
 utilities/ovn-ic-nbctl.c | 86 +++-
 2 files changed, 133 insertions(+), 2 deletions(-)

diff --git a/utilities/ovn-ic-nbctl.8.xml b/utilities/ovn-ic-nbctl.8.xml
index 4a70994b8..5a1324d2d 100644
--- a/utilities/ovn-ic-nbctl.8.xml
+++ b/utilities/ovn-ic-nbctl.8.xml
@@ -123,9 +123,58 @@
   
 
 
+Synchronization Commands
+
+
+  sync
+  
+Ordinarily, --wait=sb only waits for changes by the
+current ovn-ic-nbctl invocation to take effect.
+This means that, if none of the commands supplied to
+ovn-ic-nbctl change the database, then the command does
+not wait at all.  With the sync command, however,
+ovn-ic-nbctl waits even for earlier changes to the
+database to propagate down to the southbound database, according to the
+argument of --wait.
+  
+
+
 Options
 
 
+  --no-wait | --wait=none
+  --wait=sb
+
+  
+
+  These options control whether and how ovn-ic-nbctl waits
+  for the OVN system to become up-to-date with changes made in an
+  ovn-ic-nbctl invocation.
+
+
+
+  By default, or if --no-wait or --wait=none,
+  ovn-ic-nbctl exits immediately after confirming that
+  changes have been committed to the Interconnect northbound database,
+  without waiting.
+
+
+
+  With --wait=sb, before ovn-ic-nbctl exits,
+  it waits for ovn-ics to bring the Interconnect
+  southbound database up-to-date with the Interconnect northbound
+  database updates.
+
+
+
+  Ordinarily, --wait=sb only waits for changes by the
+  current ovn-ic-nbctl invocation to take effect.
+  This means that, if none of the commands supplied to
+  ovn-ic-nbctl change the database, then the command
+  does not wait at all.
+  Use the sync command to override this behavior.
+
+  
 --db database
 
   The OVSDB database remote to contact.  If the OVN_IC_NB_DB
diff --git a/utilities/ovn-ic-nbctl.c b/utilities/ovn-ic-nbctl.c
index 721dc4586..4317c385a 100644
--- a/utilities/ovn-ic-nbctl.c
+++ b/utilities/ovn-ic-nbctl.c
@@ -58,6 +58,13 @@ static bool oneline;
 /* --dry-run: Do not commit any changes. */
 static bool dry_run;
 
+/* --wait=TYPE: Wait for configuration change to take effect? */
+static enum nbctl_wait_type wait_type = NBCTL_WAIT_NONE;
+
+/* Should we wait (if specified by 'wait_type') even if the commands don't
+ * change the database at all */
+static bool force_wait = false;
+
 /* --timeout: Time to wait for a connection to 'db'. */
 static unsigned int timeout;
 
@@ -161,6 +168,8 @@ parse_options(int argc, char *argv[], struct shash 
*local_options)
 OPT_DB = UCHAR_MAX + 1,
 OPT_ONELINE,
 OPT_NO_SYSLOG,
+OPT_NO_WAIT,
+OPT_WAIT,
 OPT_DRY_RUN,
 OPT_LOCAL,
 OPT_COMMANDS,
@@ -173,6 +182,8 @@ parse_options(int argc, char *argv[], struct shash 
*local_options)
 static const struct option global_long_options[] = {
 {"db", required_argument, NULL, OPT_DB},
 {"no-syslog", no_argument, NULL, OPT_NO_SYSLOG},
+{"no-wait", no_argument, NULL, OPT_NO_WAIT},
+{"wait", required_argument, NULL, OPT_WAIT},
 {"dry-run", no_argument, NULL, OPT_DRY_RUN},
 {"oneline", no_argument, NULL, OPT_ONELINE},
 {"timeout", required_argument, NULL, 't'},
@@ -234,7 +245,19 @@ parse_options(int argc, char *argv[], struct shash 
*local_options)
 case OPT_DRY_RUN:
 dry_run = true;
 break;
-
+case OPT_NO_WAIT:
+wait_type = NBCTL_WAIT_NONE;
+break;
+case OPT_WAIT:
+if (!strcmp(optarg, "none")) {
+wait_type = NBCTL_WAIT_NONE;
+} else if (!strcmp(optarg, "sb")) {
+wait_type = NBCTL_WAIT_SB;
+} else {
+ctl_fatal("argument to --wait must be "
+  "\"none\", \"sb\" ");
+}
+break;
 case OPT_LOCAL:
 if (shash_find(local_options, options[idx].name)) {
 ctl_fatal("'%s' option specified multiple times",
@@ -329,9 +352,14 @@ set the 

[ovs-dev] [PATCH ovn v5 4/4] ic/tests: Add unit test for ic sync command.

2024-01-24 Thread Mohammad Heib
Add a unit test that validates that the sync command
syncs the ISB properly.

Signed-off-by: Mohammad Heib 
---
 tests/ovn-ic.at | 43 +++
 1 file changed, 43 insertions(+)

diff --git a/tests/ovn-ic.at b/tests/ovn-ic.at
index d4c436f84..535aba7da 100644
--- a/tests/ovn-ic.at
+++ b/tests/ovn-ic.at
@@ -1274,3 +1274,46 @@ OVN_CLEANUP_IC([az1], [az2])
 
 AT_CLEANUP
 ])
+
+OVN_FOR_EACH_NORTHD([
+AT_SETUP([ovn-ic -- sync ISB status to INB])
+ovn_init_ic_db
+net_add n1
+
+ovn_start az1
+sim_add gw-az1
+as gw-az1
+
+check ovs-vsctl add-br br-phys
+ovn_az_attach az1 n1 br-phys 192.168.1.1
+check ovs-vsctl set open . external-ids:ovn-is-interconn=true
+as az1
+
+# pause ovn-ic instance
+check ovn-appctl -t ic/ovn-ic pause
+
+# run sync command in the background this commands
+# supposed to stuck since ovn-ic is paused.
+ovn-ic-nbctl --wait=sb sync &
+
+OVS_WAIT_UNTIL([test $(ovn-ic-nbctl get ic_nb_global . nb_ic_cfg) -gt 
$(ovn-ic-nbctl get ic_nb_global . sb_ic_cfg)])
+AT_CHECK([ovn-ic-nbctl get ic_nb_global . nb_ic_cfg], [0], [dnl
+1
+])
+AT_CHECK([ovn-ic-nbctl get ic_nb_global . sb_ic_cfg], [0], [dnl
+0
+])
+
+# resume ovn-ic instance
+check ovn-appctl -t ic/ovn-ic resume
+OVS_WAIT_UNTIL([test $(ovn-ic-nbctl get ic_nb_global . nb_ic_cfg) -eq 
$(ovn-ic-nbctl get ic_nb_global . sb_ic_cfg)])
+AT_CHECK([ovn-ic-nbctl get ic_nb_global . nb_ic_cfg], [0], [dnl
+1
+])
+AT_CHECK([ovn-ic-nbctl get ic_nb_global . sb_ic_cfg], [0], [dnl
+1
+])
+
+OVN_CLEANUP_IC([az1])
+AT_CLEANUP
+])
-- 
2.34.3



[ovs-dev] [PATCH ovn v5 2/4] ovn-ic: Implement basic INB change handling status.

2024-01-24 Thread Mohammad Heib
This patch implements a basic sequence number protocol
that can be used by the CMS to determine whether the changes
applied to the INB have been successfully propagated to the ISB.

The implementation relies on the OVN-IC instances updating the ISB:
a per-AZ nb_ic_cfg counter is updated by each OVN-IC once it has
committed all needed changes to the ISB, and based on this
AZ:nb_ic_cfg the ISB and INB are updated with the status
of the changes.

Signed-off-by: Mohammad Heib 
---
 ic/ovn-ic.c | 95 ++---
 1 file changed, 90 insertions(+), 5 deletions(-)

diff --git a/ic/ovn-ic.c b/ic/ovn-ic.c
index 8ceb34d7c..d0b58d4d1 100644
--- a/ic/ovn-ic.c
+++ b/ic/ovn-ic.c
@@ -1782,16 +1782,95 @@ route_run(struct ic_context *ctx,
 hmap_destroy(_lrs);
 }
 
+/*
+ * This function implements a sequence number protocol that can be used by
+ * the INB end user to verify that ISB is synced with all the changes that
+ * are done be the user/AZs-controllers:
+ *
+ * Since we have multiple IC instances running in different regions
+ * we can't rely on one of them to update the ISB and sync that update
+ * to INB since other ICs can make changes in parallel.
+ * So to have a sequence number protocol working properly we must
+ * make sure that all the IC instances are synced with the ISB first
+ * and then update the INB.
+ *
+ * To guarantee that all instances are synced with ISB first, each IC
+ * will do the following steps:
+ *
+ * 1. when local ovn-ic sees that INB:nb_ic_cfg has updated we will set
+ *the ic_sb_loop->next_cfg to match the INB:nb_ic_cfg and increment
+ *the value of AZ:nb_ic_cfg and wait until we get confirmation from
+ *the server.
+ *
+ * 2. once this IC instance changes for ISB are committed successfully
+ *(next loop), the value of cur_cfg will be updated to match
+ *the INB:nb_ic_cfg that indicate that our local instance is up to date
+ *and no more changes need to be done for ISB.
+ *
+ * 3. validate that the AZ:nb_ic_cfg to match the INB:nb_ic_cfg.
+ *
+ * 4. Go through all the AZs and check if all have the same value of
+ *AZ:nb_ic_cfg that means all the AZs are done with ISB changes and ISB are
+ *up to date with INB, so we can set the values of ISB:nb_ic_cfg to
+ *INB:nb_ic_cfg and INB:sb_ic_cfg to INB:nb_ic_cfg.
+ */
 static void
-ovn_db_run(struct ic_context *ctx)
+update_sequence_numbers(const struct icsbrec_availability_zone *az,
+struct ic_context *ctx,
+struct ovsdb_idl_loop *ic_sb_loop)
 {
-const struct icsbrec_availability_zone *az = az_run(ctx);
-VLOG_DBG("Availability zone: %s", az ? az->name : "not created yet.");
+if (!ctx->ovnisb_txn || !ctx->ovninb_txn) {
+return;
+}
+
+const struct icnbrec_ic_nb_global *ic_nb = icnbrec_ic_nb_global_first(
+   ctx->ovninb_idl);
+if (!ic_nb) {
+ic_nb = icnbrec_ic_nb_global_insert(ctx->ovninb_txn);
+}
+const struct icsbrec_ic_sb_global *ic_sb = icsbrec_ic_sb_global_first(
+   ctx->ovnisb_idl);
+if (!ic_sb) {
+ic_sb = icsbrec_ic_sb_global_insert(ctx->ovnisb_txn);
+}
 
-if (!az) {
+if ((ic_nb->nb_ic_cfg != ic_sb->nb_ic_cfg) &&
+  (ic_nb->nb_ic_cfg != az->nb_ic_cfg)) {
+/* Deal with potential overflows. */
+if (az->nb_ic_cfg == LLONG_MAX) {
+icsbrec_availability_zone_set_nb_ic_cfg(az, 0);
+}
+ic_sb_loop->next_cfg = ic_nb->nb_ic_cfg;
+ovsdb_idl_txn_increment(ctx->ovnisb_txn, >header_,
+   _availability_zone_col_nb_ic_cfg, true);
 return;
 }
 
+/* handle cases where accidentally AZ:ic_nb_cfg exceeds
+ * the INB:ic_nb_cfg.
+ */
+if (az->nb_ic_cfg != ic_sb_loop->cur_cfg) {
+icsbrec_availability_zone_set_nb_ic_cfg(az, ic_sb_loop->cur_cfg);
+return;
+}
+
+const struct icsbrec_availability_zone *other_az;
+ICSBREC_AVAILABILITY_ZONE_FOR_EACH (other_az, ctx->ovnisb_idl) {
+if (other_az->nb_ic_cfg != az->nb_ic_cfg) {
+return;
+}
+}
+/* All the AZs are updated successfully, update SB/NB counter. */
+if (ic_nb->nb_ic_cfg != ic_sb->nb_ic_cfg) {
+icsbrec_ic_sb_global_set_nb_ic_cfg(ic_sb, az->nb_ic_cfg);
+icnbrec_ic_nb_global_set_sb_ic_cfg(ic_nb, az->nb_ic_cfg);
+}
+}
+
+static void
+ovn_db_run(struct ic_context *ctx,
+   const struct icsbrec_availability_zone *az)
+{
 ts_run(ctx);
 gateway_run(ctx, az);
 port_binding_run(ctx, az);
@@ -2218,7 +2297,13 @@ main(int argc, char *argv[])
 ovsdb_idl_has_ever_connected(ctx.ovnsb_idl) &&
 ovsdb_idl_has_ever_connected(ctx.ovninb_idl) &&
 ovsdb_idl_has_ever_connected(ctx.ovnisb_idl)) {
-ovn_db_run();
+const 

[ovs-dev] [PATCH ovn v5 1/4] OVN-IC: Interconnect DBs add basic Information Flow columns.

2024-01-24 Thread Mohammad Heib
Add basic flow columns to interconnect northbound DB and
interconnect Southbound DB.

Those columns will be used by future patches to add basic
support for Information Flow in OVN  interconnect.

Signed-off-by: Mohammad Heib 
Acked-by: Mark Michelson 
Acked-by: Ales Musil 
---
 NEWS|  8 
 ovn-ic-nb.ovsschema |  6 --
 ovn-ic-nb.xml   | 17 +
 ovn-ic-sb.ovsschema |  8 +---
 ovn-ic-sb.xml   | 21 +
 5 files changed, 55 insertions(+), 5 deletions(-)

diff --git a/NEWS b/NEWS
index 5f267b4c6..9762af0e6 100644
--- a/NEWS
+++ b/NEWS
@@ -14,6 +14,14 @@ Post v23.09.0
   - ovn-northd-ddlog has been removed.
   - A new LSP option "enable_router_port_acl" has been added to enable
 conntrack for the router port whose peer is l3dgw_port if set it true.
+  - OVN Interconnection:
+* INB provides basic feedback to the CMS about the ISB changes
+  handling status.
+* IC_NB_Global now have "nb_ic_cfg" and "sb_ic_cfg" columns for
+  for ISB informational status.
+* IC_SB_Global now have "nb_ic_cfg" column for ISB informational status.
+* Availability_Zone now have "nb_ic_cfg" column for local AZ
+  informational status.
 
 OVN v23.09.0 - 15 Sep 2023
 --
diff --git a/ovn-ic-nb.ovsschema b/ovn-ic-nb.ovsschema
index 894db8344..bee174357 100644
--- a/ovn-ic-nb.ovsschema
+++ b/ovn-ic-nb.ovsschema
@@ -1,10 +1,12 @@
 {
 "name": "OVN_IC_Northbound",
-"version": "1.0.0",
-"cksum": "45589876 3383",
+"version": "1.1.0",
+"cksum": "3964083684 3501",
 "tables": {
 "IC_NB_Global": {
 "columns": {
+"nb_ic_cfg": {"type": {"key": "integer"}},
+"sb_ic_cfg": {"type": {"key": "integer"}},
 "external_ids": {
 "type": {"key": "string", "value": "string",
  "min": 0, "max": "unlimited"}},
diff --git a/ovn-ic-nb.xml b/ovn-ic-nb.xml
index 8c53bec3b..2ae9bf6d5 100644
--- a/ovn-ic-nb.xml
+++ b/ovn-ic-nb.xml
@@ -36,6 +36,23 @@
   one row.
 
 
+
+  These columns allow a client to track the overall configuration state of
+  the system.
+
+  
+Sequence number for client to increment. When a client modifies the
+interconnect northbound database configuration and wishes to wait for
+OVN-ICs to handle this change and update the Interconnect
+southbound database, it may increment this sequence number.
+  
+  
+Sequence number that one OVN-IC sets to the value of
+ after waiting to all the OVN-ICs
+finish applying their changes to interconnect southbound database.
+  
+
+
 
   
 See External IDs at the beginning of this document.
diff --git a/ovn-ic-sb.ovsschema b/ovn-ic-sb.ovsschema
index 1d60b36d1..5baf141cf 100644
--- a/ovn-ic-sb.ovsschema
+++ b/ovn-ic-sb.ovsschema
@@ -1,10 +1,11 @@
 {
 "name": "OVN_IC_Southbound",
-"version": "1.1.1",
-"cksum": "3684563024 6914",
+"version": "1.2.0",
+"cksum": "1381014956 7032",
 "tables": {
 "IC_SB_Global": {
 "columns": {
+"nb_ic_cfg": {"type": {"key": "integer"}},
 "external_ids": {
 "type": {"key": "string", "value": "string",
  "min": 0, "max": "unlimited"}},
@@ -24,7 +25,8 @@
 "isRoot": true},
 "Availability_Zone": {
 "columns": {
-"name": {"type": "string"}},
+"name": {"type": "string"},
+"nb_ic_cfg": {"type": {"key": "integer"}}},
 "isRoot": true,
 "indexes": [["name"]]},
 "Gateway": {
diff --git a/ovn-ic-sb.xml b/ovn-ic-sb.xml
index f7e17e113..c3e7d2173 100644
--- a/ovn-ic-sb.xml
+++ b/ovn-ic-sb.xml
@@ -69,6 +69,21 @@
   one row.
 
 
+
+  This column allow a client to track the overall configuration state of
+  the system.
+
+  
+Sequence number for the configuration. When a CMS or
+ovn-ic-nbctl updates the Interconnect northbound database,
+it increments the nb_ic_cfg column in the
+NB_IC_Global table in the Interconnect northbound
+database. when OVN-ICs updates the southbound database to
+bring it up to date with these changes, one OVN-IC updates
+this column to the same value.
+  
+
+
 
   
 See External IDs at the beginning of this document.
@@ -102,6 +117,12 @@
 
   A name that uniquely identifies the availability zone.
 
+
+
+  This column is used by the OVN-IC to inform
+  that this IC instance is aligned with the changes in INB
+
+
   
 
   
-- 
2.34.3



[ovs-dev] [PATCH ovn v5 0/4] OVN-IC: Add basic sequence number status support.

2024-01-24 Thread Mohammad Heib
Currently, OVN-IC doesn't support a way to tell the end-user when their changes
to the IC-NB database have propagated successfully to the IC-SB Database.

This patch series adds basic support for the sequence number status protocol
that is implemented in native OVN.  With this patch series the end user
can now wait for their changes in the IC-NB DB to take effect by executing the
'sync' command after applying any changes to the IC-NB DB.  For example, if the
end-user has created a transit switch in the IC-NB global DB and wants to make
sure that the IC-SB creates a DP binding for this transit switch, the user can
now use the 'sync' command as follows:
$ ovn-ic-nbctl ts-add ts1
$ ovn-ic-nbctl --wait=sb sync

The second command will wait until all the ovn-ic instances see the new changes
and update their own local dbs and the global IC-SB db.
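
As a rough mental model of the convergence rule implemented in patch 2/4, the
sequence numbers behave like the standalone sketch below.  This is only an
illustration of the protocol described in the series, not the ovn-ic code, and
the struct is a simplification:

#include <stdbool.h>
#include <stdio.h>

#define N_AZ 3

/* Toy model: the CMS bumps INB:nb_ic_cfg; each availability zone's ovn-ic
 * copies that value into its AZ row once its ISB changes are committed; when
 * every AZ matches, one ovn-ic propagates the value to ISB:nb_ic_cfg and
 * INB:sb_ic_cfg, which is what --wait=sb waits for. */
struct model {
    long inb_nb_ic_cfg;       /* INB: bumped by ovn-ic-nbctl / the CMS */
    long inb_sb_ic_cfg;       /* INB: reported back once the ISB caught up */
    long isb_nb_ic_cfg;       /* ISB global */
    long az_nb_ic_cfg[N_AZ];  /* per-AZ progress */
};

static bool all_azs_caught_up(const struct model *m)
{
    for (int i = 0; i < N_AZ; i++) {
        if (m->az_nb_ic_cfg[i] != m->inb_nb_ic_cfg) {
            return false;
        }
    }
    return true;
}

int main(void)
{
    struct model m = { .inb_nb_ic_cfg = 1 };   /* the CMS made a change */

    for (int az = 0; az < N_AZ; az++) {
        m.az_nb_ic_cfg[az] = m.inb_nb_ic_cfg;  /* this AZ finished its ISB work */
        if (all_azs_caught_up(&m)) {
            m.isb_nb_ic_cfg = m.inb_nb_ic_cfg;
            m.inb_sb_ic_cfg = m.inb_nb_ic_cfg; /* 'sync --wait=sb' unblocks */
        }
        printf("after AZ %d: INB:sb_ic_cfg=%ld\n", az, m.inb_sb_ic_cfg);
    }
    return 0;
}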

v4 -> v5

* Addressed review comments from Ales to simplify the unit test and
  a small comment on the ovn-ic.
v3 -> v4

* Addressed review comments from Ales to add check for the 
  overflow cases.

v2 -> v3

* Rebase over main.
* Addressed review comments from Mark and Ales.



Mohammad Heib (4):
  OVN-IC: Interconnect DBs add basic Information Flow columns.
  ovn-ic: Implement basic INB change handling status.
  OVN-IC: Make it possible for CMS to detect when the ISB is up-to-date.
  ic/tests: Add unit test for ic sync command.

 NEWS |  8 +++
 ic/ovn-ic.c  | 95 ++--
 ovn-ic-nb.ovsschema  |  6 ++-
 ovn-ic-nb.xml| 17 +++
 ovn-ic-sb.ovsschema  |  8 +--
 ovn-ic-sb.xml| 21 
 tests/ovn-ic.at  | 43 
 utilities/ovn-ic-nbctl.8.xml | 49 +++
 utilities/ovn-ic-nbctl.c | 86 +++-
 9 files changed, 321 insertions(+), 12 deletions(-)

-- 
2.34.3



Re: [ovs-dev] [PATCH OVN] DHCP Relay Agent support for overlay subnets

2024-01-24 Thread Naveen Yerramneni


> On 24-Jan-2024, at 8:59 AM, Numan Siddique  wrote:
> 
> On Tue, Jan 23, 2024 at 8:02 PM Naveen Yerramneni
>  wrote:
>> 
>> 
>> 
>>> On 16-Jan-2024, at 2:30 AM, Numan Siddique  wrote:
>>> 
>>> On Tue, Dec 12, 2023 at 1:05 PM Naveen Yerramneni
>>>  wrote:
 
   This patch contains changes to enable DHCP Relay Agent support for 
 overlay subnets.
 
   USE CASE:
   --
 - Enable IP address assignment for overlay subnets from the 
 centralized DHCP server present in the underlay network.
 
   PREREQUISITES
   --
 - Logical Router Port IP should be assigned (statically) from the same 
 overlay subnet which is managed by DHCP server.
 - LRP IP is used for the GIADDR field when relaying the DHCP packets, and 
 the same IP needs to be configured as the default gateway for the overlay 
 subnet.
 - Overlay subnets managed by external DHCP server are expected to be 
 directly reachable from the underlay network.
 
   EXPECTED PACKET FLOW:
   --
   Following is the expected packet flow in order to support DHCP relay 
 functionality in OVN.
 1. DHCP client originates DHCP discovery (broadcast).
 2. DHCP relay (running on the OVN) receives the broadcast and forwards 
 the packet to the DHCP server by converting it to unicast. While 
 forwarding the packet, it updates the GIADDR in DHCP header to its
interface IP on which DHCP packet is received.
 3. DHCP server uses GIADDR field to decide the IP address pool from 
 which IP has to be assigned and DHCP offer is sent to the same IP (GIADDR).
 4. DHCP relay agent forwards the offer to the client, it resets the 
 GIADDR field when forwarding the offer to the client.
 5. DHCP client sends DHCP request (broadcast) packet.
 6. DHCP relay (running on the OVN) receives the broadcast and forwards 
 the packet to the DHCP server by converting it to unicast. While 
 forwarding the packet, it updates the GIADDR in DHCP header to its
interface IP on which DHCP packet is received.
 7. DHCP Server sends the ACK packet.
 8. DHCP relay agent forwards the ACK packet to the client, it resets 
 the GIADDR field when forwarding the ACK to the client.
 9. All the future renew/release packets are directly exchanged between 
 DHCP client and DHCP server.
 
   OVN DHCP RELAY PACKET FLOW:
   
   To add DHCP Relay support on OVN, we need to replicate all the behavior 
 described above using distributed logical switch and logical router.
   At a high level, the packet flow is distributed among the Logical Switch and 
 Logical Router on the source node (where the VM is deployed) and the redirect 
 chassis (RC) node.
 1. Request packet gets processed on the source node where VM is 
 deployed and relays the packet to DHCP server.
 2. Response packet is first processed on RC node (which first receives 
 the packet from underlay network). RC node forwards the packet to the 
 right node by filling in the dest MAC and IP.
 
   OVN Packet flow with DHCP relay is explained below.
 1. DHCP client (VM) sends the DHCP discover packet (broadcast).
 2. Logical switch converts the packet to L2 unicast by setting the 
 destination MAC to LRP's MAC
 3. Logical Router receives the packet and redirects it to the OVN 
 controller.
 4. OVN controller updates the required information(GIADDR) in the DHCP 
 payload after doing the required checks. If any check fails, packet is 
 dropped.
 5. Logical Router converts the packet to L3 unicast and forwards it to 
 the server. This packets gets routed like any other packet (via RC node).
 6. Server replies with DHCP offer.
 7. RC node processes the DHCP offer and forwards it to the OVN 
 controller.
 8. OVN controller does sanity checks and  updates the destination MAC 
 (available in DHCP header), destination IP (available in DHCP header), 
 resets GIADDR  and reinjects the packet to datapath.
If any check fails, packet is dropped.
 9. Logical router updates the source IP and port and forwards the 
 packet to logical switch.
 10. Logical switch delivers the packet to the DHCP client.
 11. Similar steps are performed for Request and Ack packets.
 12. All the future renew/release packets are directly exchanged 
 between DHCP client and DHCP server
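
To make the GIADDR handling in steps 4 and 8 above more concrete, the relay
behaviour can be modeled by the small standalone C sketch below.  This is only
an illustration of the cover-letter description; it is not the OVN controller
code, and the struct is a simplified stand-in rather than the real DHCP wire
format:

#include <stdint.h>
#include <stdio.h>
#include <arpa/inet.h>

/* Simplified stand-in for the DHCP (BOOTP) header fields we care about. */
struct mini_dhcp_hdr {
    uint8_t  op;          /* 1 = BOOTREQUEST, 2 = BOOTREPLY */
    uint32_t giaddr;      /* relay agent IP, network byte order */
};

/* Client -> server: the relay stamps its interface IP (the LRP IP in the
 * design above) into GIADDR before forwarding the packet as unicast. */
static void relay_request(struct mini_dhcp_hdr *dhcp, uint32_t relay_ip)
{
    if (dhcp->op == 1 && dhcp->giaddr == 0) {
        dhcp->giaddr = relay_ip;
    }
}

/* Server -> client: the relay resets GIADDR before handing the reply back. */
static void relay_reply(struct mini_dhcp_hdr *dhcp)
{
    if (dhcp->op == 2) {
        dhcp->giaddr = 0;
    }
}

int main(void)
{
    struct mini_dhcp_hdr dhcp = { .op = 1, .giaddr = 0 };
    uint32_t lrp_ip = inet_addr("10.0.0.1");   /* hypothetical LRP address */

    relay_request(&dhcp, lrp_ip);
    printf("request: giaddr set to LRP IP: %s\n",
           dhcp.giaddr == lrp_ip ? "yes" : "no");

    dhcp.op = 2;                               /* now treat it as the offer */
    relay_reply(&dhcp);
    printf("reply: giaddr cleared: %s\n", dhcp.giaddr == 0 ? "yes" : "no");
    return 0;
}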
 
   NEW OVN ACTIONS
   ---
 
 1. dhcp_relay_req(, )
 - This action executes on the source node on which the DHCP 
 request originated.
 - This action relays the DHCP request coming from client to the 
 server. Relay-ip is used to update GIADDR in the DHCP header.
 2. 

Re: [ovs-dev] [PATCH ovn v5 08/16] northd: Refactor lflow management into a separate module.

2024-01-24 Thread Dumitru Ceara
On 1/24/24 06:01, Han Zhou wrote:
> On Fri, Jan 19, 2024 at 2:50 AM Dumitru Ceara  wrote:
>>
>> On 1/11/24 16:31, num...@ovn.org wrote:
>>> +
>>> +void
>>> +lflow_table_add_lflow(struct lflow_table *lflow_table,
>>> +  const struct ovn_datapath *od,
>>> +  const unsigned long *dp_bitmap, size_t
> dp_bitmap_len,
>>> +  enum ovn_stage stage, uint16_t priority,
>>> +  const char *match, const char *actions,
>>> +  const char *io_port, const char *ctrl_meter,
>>> +  const struct ovsdb_idl_row *stage_hint,
>>> +  const char *where,
>>> +  struct lflow_ref *lflow_ref)
>>> +OVS_EXCLUDED(fake_hash_mutex)
>>> +{
>>> +struct ovs_mutex *hash_lock;
>>> +uint32_t hash;
>>> +
>>> +ovs_assert(!od ||
>>> +   ovn_stage_to_datapath_type(stage) ==
> ovn_datapath_get_type(od));
>>> +
>>> +hash = ovn_logical_flow_hash(ovn_stage_get_table(stage),
>>> + ovn_stage_get_pipeline(stage),
>>> + priority, match,
>>> + actions);
>>> +
>>> +hash_lock = lflow_hash_lock(_table->entries, hash);
>>> +struct ovn_lflow *lflow =
>>> +do_ovn_lflow_add(lflow_table, od, dp_bitmap,
>>> + dp_bitmap_len, hash, stage,
>>> + priority, match, actions,
>>> + io_port, ctrl_meter, stage_hint, where);
>>> +
>>> +if (lflow_ref) {
>>> +/* lflow referencing is only supported if 'od' is not NULL. */
>>> +ovs_assert(od);
>>> +
>>> +struct lflow_ref_node *lrn =
>>> +lflow_ref_node_find(_ref->lflow_ref_nodes, lflow,
> hash);
>>> +if (!lrn) {
>>> +lrn = xzalloc(sizeof *lrn);
>>> +lrn->lflow = lflow;
>>> +lrn->dp_index = od->index;
>>> +ovs_list_insert(_ref->lflows_ref_list,
>>> +>lflow_list_node);
>>> +inc_dp_refcnt(>dp_refcnts_map, lrn->dp_index);
>>> +ovs_list_insert(>referenced_by,
> >ref_list_node);
>>> +
>>> +hmap_insert(_ref->lflow_ref_nodes, >ref_node,
> hash);
>>> +}
>>> +
>>> +lrn->linked = true;
>>> +}
>>> +
>>> +lflow_hash_unlock(hash_lock);
>>> +
>>> +}
>>> +
>>
>> This part is not thread safe.
>>
>> If two threads try to add logical flows that have different hashes and
>> lflow_ref is not NULL we're going to have a race condition when
>> inserting to the _ref->lflow_ref_nodes hash map because the two
>> threads will take different locks.
>>
> 
> I think it is safe because a lflow_ref is always associated with an object,
> e.g. port, datapath, lb, etc., and lflow generation for a single such
> object is never executed in parallel, which is how the parallel lflow build
> is designed.
> Does it make sense?

It happens that it's safe in this current patch set because indeed we
always process individual ports, datapaths, lbs, etc, in the same
thread.  However, this code (lflow_table_add_lflow()) is generic and
there's nothing (not even a comment) that would warn developers in the
future about the potential race if the lflow_ref is shared.

I spoke to Numan offline a bit about this and I think the current plan
is to leave it as is and add proper locking as a follow up (or in v7).
But I think we still need a clear comment here warning users about this.
 Maybe we should add a comment where the lflow_ref structure is defined too.

What do you think?

Regards,
Dumitru
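
To spell out the hazard described above, here is a small standalone pthread
program (not OVS code; the names only echo the patch) showing why per-bucket
hash locks do not protect a secondary structure that two threads share:

#include <pthread.h>
#include <stdio.h>

#define N_BUCKET_LOCKS 4

static pthread_mutex_t bucket_locks[N_BUCKET_LOCKS] = {
    PTHREAD_MUTEX_INITIALIZER, PTHREAD_MUTEX_INITIALIZER,
    PTHREAD_MUTEX_INITIALIZER, PTHREAD_MUTEX_INITIALIZER,
};

/* Stands in for a shared lflow_ref->lflow_ref_nodes map: it is *not*
 * protected by the per-bucket locks above. */
static long shared_ref_count;

static void *worker(void *arg)
{
    unsigned int hash = (unsigned int) (long) arg;   /* differs per thread */

    for (int i = 0; i < 1000000; i++) {
        pthread_mutex_t *l = &bucket_locks[hash % N_BUCKET_LOCKS];
        pthread_mutex_lock(l);     /* like lflow_hash_lock(..., hash) */
        shared_ref_count++;        /* races with the other thread */
        pthread_mutex_unlock(l);
    }
    return NULL;
}

int main(void)
{
    pthread_t a, b;

    pthread_create(&a, NULL, worker, (void *) 0L);   /* takes bucket lock 0 */
    pthread_create(&b, NULL, worker, (void *) 1L);   /* takes bucket lock 1 */
    pthread_join(a, NULL);
    pthread_join(b, NULL);

    /* Expected 2000000; typically prints less because the two critical
     * sections use different locks and therefore do not exclude each other. */
    printf("%ld\n", shared_ref_count);
    return 0;
}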

> 
> Thanks,
> Han
> 
>> That might corrupt the map.
>>
>> I guess, if we don't want to cause more performance degradation we
>> should maintain as many lflow_ref instances as we do hash_locks, i.e.,
>> LFLOW_HASH_LOCK_MASK + 1.  Will that even be possible?
>>
>> Wdyt?
>>
>> Regards,
>> Dumitru
>>
>> ___
>> dev mailing list
>> d...@openvswitch.org
>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
> 



Re: [ovs-dev] [PATCH ovn v4 0/4] OVN-IC: add basic sequence number status support

2024-01-24 Thread Ales Musil
On Wed, Jan 24, 2024 at 9:44 AM Mohammad Heib  wrote:

> Hi Ales,
>
> Yes sure that's what I'm planning to do next, i first wanted to make those
> patches accepted upstream and then update the test case to use
> the sync command, i guess it will be easier to review if  send the test
> cases change as a separate patch, what do you think?
>

Separate series/patch is completely fine once this series is accepted.

Thanks,
Ales


>
> On Fri, Jan 12, 2024 at 11:14 AM Ales Musil  wrote:
>
>>
>>
>> On Tue, Jan 9, 2024 at 2:29 PM Mohammad Heib  wrote:
>>
>>> Currently, OVN-IC doesn't support a way to tell the end-user when their
>>> changes
>>> to the IC-NB database have propagated successfully to the IC-SB Database.
>>>
>>> This patch series adds basic support for the sequence number status
>>> protocol
>>> that is implemented on the native OVN, with this patch series the end
>>> user
>>> now can wait for their changes in the IC-NB DB to take effect by
>>> executing the
>>> 'sync' command after applying any changes to the IC-NB DB, for example,
>>> if the
>>> end-user has created a transit switch in the IC-NB global DB and want to
>>> make
>>> sure that the IC-SB create a DP binding for this ts-switch the user now
>>> can use
>>> the 'sync' command as following:
>>> $ ovn-ic-nbctl ts-add ts1
>>> $ ovn-ic-nbctl --wait=sb sync
>>>
>>> The second command will wait until all the ovn-ic instances see the new
>>> changes
>>> and update their own local dbs and the global IC-SB db.
>>>
>>> v3 -> v4
>>> 
>>> * Addressed review comments from Ales to add check for the
>>>   overflow cases.
>>>
>>> v2 -> v3
>>> 
>>> * Rebase over main.
>>> * Addressed review comments from Mark and Ales.
>>>
>>>
>>> Mohammad Heib (4):
>>>   OVN-IC: interconnect DBs add basic Information Flow columns
>>>   ovn-ic: implement basic INB change handling status
>>>   OVN-IC: Make it possible for CMS to detect when the ISB is up-to-date.
>>>   ic/tests: add unit test for ic sync command
>>>
>>>  NEWS |  8 +++
>>>  ic/ovn-ic.c  | 96 ++--
>>>  ovn-ic-nb.ovsschema  |  6 ++-
>>>  ovn-ic-nb.xml| 17 +++
>>>  ovn-ic-sb.ovsschema  |  8 +--
>>>  ovn-ic-sb.xml| 21 
>>>  tests/ovn-ic.at  | 47 ++
>>>  utilities/ovn-ic-nbctl.8.xml | 49 ++
>>>  utilities/ovn-ic-nbctl.c | 89 -
>>>  9 files changed, 329 insertions(+), 12 deletions(-)
>>>
>>> --
>>> 2.34.3
>>>
>>> ___
>>> dev mailing list
>>> d...@openvswitch.org
>>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>>>
>>>
>> Hi Mohammad,
>>
>> thank you for the series. This suggestion is out of scope of this series,
>> but it would be great to use the new sync across all ic tests that we
>> currently have to further stabilize them. WDYT?
>>
>> Thanks,
>> Ales
>>
>>
>> --
>>
>> Ales Musil
>>
>> Senior Software Engineer - OVN Core
>>
>> Red Hat EMEA 
>>
>> amu...@redhat.com
>> 
>>
>



Re: [ovs-dev] [PATCH ovn v4 0/4] OVN-IC: add basic sequence number status support

2024-01-24 Thread Mohammad Heib
Hi Ales,

Yes, sure, that's what I'm planning to do next.  I first wanted to get those
patches accepted upstream and then update the test cases to use
the sync command.  I guess it will be easier to review if I send the test
case changes as a separate patch, what do you think?

On Fri, Jan 12, 2024 at 11:14 AM Ales Musil  wrote:

>
>
> On Tue, Jan 9, 2024 at 2:29 PM Mohammad Heib  wrote:
>
>> Currently, OVN-IC doesn't support a way to tell the end-user when their
>> changes
>> to the IC-NB database have propagated successfully to the IC-SB Database.
>>
>> This patch series adds basic support for the sequence number status
>> protocol
>> that is implemented on the native OVN, with this patch series the end user
>> now can wait for their changes in the IC-NB DB to take effect by
>> executing the
>> 'sync' command after applying any changes to the IC-NB DB, for example,
>> if the
>> end-user has created a transit switch in the IC-NB global DB and want to
>> make
>> sure that the IC-SB create a DP binding for this ts-switch the user now
>> can use
>> the 'sync' command as following:
>> $ ovn-ic-nbctl ts-add ts1
>> $ ovn-ic-nbctl --wait=sb sync
>>
>> The second command will wait until all the ovn-ic instances see the new
>> changes
>> and update their own local dbs and the global IC-SB db.
>>
>> v3 -> v4
>> 
>> * Addressed review comments from Ales to add check for the
>>   overflow cases.
>>
>> v2 -> v3
>> 
>> * Rebase over main.
>> * Addressed review comments from Mark and Ales.
>>
>>
>> Mohammad Heib (4):
>>   OVN-IC: interconnect DBs add basic Information Flow columns
>>   ovn-ic: implement basic INB change handling status
>>   OVN-IC: Make it possible for CMS to detect when the ISB is up-to-date.
>>   ic/tests: add unit test for ic sync command
>>
>>  NEWS |  8 +++
>>  ic/ovn-ic.c  | 96 ++--
>>  ovn-ic-nb.ovsschema  |  6 ++-
>>  ovn-ic-nb.xml| 17 +++
>>  ovn-ic-sb.ovsschema  |  8 +--
>>  ovn-ic-sb.xml| 21 
>>  tests/ovn-ic.at  | 47 ++
>>  utilities/ovn-ic-nbctl.8.xml | 49 ++
>>  utilities/ovn-ic-nbctl.c | 89 -
>>  9 files changed, 329 insertions(+), 12 deletions(-)
>>
>> --
>> 2.34.3
>>
>> ___
>> dev mailing list
>> d...@openvswitch.org
>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>>
>>
> Hi Mohammad,
>
> thank you for the series. This suggestion is out of scope of this series,
> but it would be great to use the new sync across all ic tests that we
> currently have to further stabilize them. WDYT?
>
> Thanks,
> Ales
>
>
> --
>
> Ales Musil
>
> Senior Software Engineer - OVN Core
>
> Red Hat EMEA 
>
> amu...@redhat.com
> 
>