Hi,

Thanks for the review Mark, replied inline.

On Wed, Aug 9, 2023 at 6:04 PM Mark Michelson <mmich...@redhat.com> wrote:
>
> Hi Lucas,
>
> I finally took some time to have a look at the patch. It mostly looks
> good, but I have some concerns below.
>
> Also there should be a NEWS item added about this.
>
> On 8/3/23 09:13, lmart...@redhat.com wrote:
> > From: Lucas Alvares Gomes <lucasago...@gmail.com>
> >
> > In order for the CMS to know which Chassis a distributed gateway port
> > is bond to, this patch updates the ovn-northd daemon to populate the
> > Logical_Router_Port table with that information.
> >
> > To avoid changing the database schema, ovn-northd is setting a new key
> > called "hosting-chassis" in the options column from the LRP table. This
> > key value points to the name of the Chassis that is currently hosting
> > the distributed port.
> >
> > Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=2107515
> > Signed-off-by: Lucas Alvares Gomes <lucasago...@gmail.com>
> > ---
> > v4 -> v5
> > * Fix test to remove sleep and assert on the right table
> >
> > v3 -> v4
> > * Removed smap_clone for updating the options and replaced it with
> >    smap_replace/smap_remove
> > * Updated the test to also assert the removal of the hosting-chassis
> >    key by northd
> >
> > v2 -> v3
> > * Fixed memory leak
> > * Separate the logic to update the router port into their own function
> > * Use attribute l3dgw_port to find the GW port
> >
> > v1 -> v2
> > * Rebased on the main branch
> > * Updated the ovnsb_db_run() and handle_port_binding_changes() functions
> >    to include the LR ports as a parameter
> >
> >   northd/en-sync-from-sb.c |  2 +-
> >   northd/northd.c          | 30 +++++++++++++++++++++++++++--
> >   northd/northd.h          |  3 ++-
> >   ovn-nb.xml               | 15 +++++++++++++++
> >   tests/ovn-northd.at      | 41 ++++++++++++++++++++++++++++++++++++++++
> >   5 files changed, 87 insertions(+), 4 deletions(-)
> >
> > diff --git a/northd/en-sync-from-sb.c b/northd/en-sync-from-sb.c
> > index 55ece2d16..4109aebe4 100644
> > --- a/northd/en-sync-from-sb.c
> > +++ b/northd/en-sync-from-sb.c
> > @@ -60,7 +60,7 @@ en_sync_from_sb_run(struct engine_node *node, void *data 
> > OVS_UNUSED)
> >       stopwatch_start(OVNSB_DB_RUN_STOPWATCH_NAME, time_msec());
> >       ovnsb_db_run(eng_ctx->ovnnb_idl_txn, eng_ctx->ovnsb_idl_txn,
> >                    sb_pb_table, sb_ha_ch_grp_table, sb_ha_ch_grp_by_name,
> > -                 &nd->ls_ports);
> > +                 &nd->ls_ports, &nd->lr_ports);
> >       stopwatch_stop(OVNSB_DB_RUN_STOPWATCH_NAME, time_msec());
> >   }
> >
> > diff --git a/northd/northd.c b/northd/northd.c
> > index b9605862e..c6752884c 100644
> > --- a/northd/northd.c
> > +++ b/northd/northd.c
> > @@ -17780,6 +17780,22 @@ build_ha_chassis_group_ref_chassis(struct 
> > ovsdb_idl_index *ha_ch_grp_by_name,
> >       }
> >   }
> >
> > +/* Set the "hosting-chassis" option in the NBDB logical_router_port
> > + * table indicating which chassis the distributed port is bond to. */
> > +static void
> > +handle_cr_port_binding_changes(const struct sbrec_port_binding *sb,
> > +                struct ovn_port *orp)
> > +{
> > +    if (sb->chassis) {
> > +        nbrec_logical_router_port_update_options_setkey(
> > +            orp->l3dgw_port->nbrp, "hosting-chassis",
> > +            sb->chassis->name);
> > +    } else {
> > +        nbrec_logical_router_port_update_options_delkey(
> > +            orp->l3dgw_port->nbrp, "hosting-chassis");
> > +    }
> > +}
> > +
> >   /* Handle changes to the 'chassis' column of the 'Port_Binding' table.  
> > When
> >    * this column is not empty, it means we need to set the corresponding 
> > logical
> >    * port as 'up' in the northbound DB. */
> > @@ -17789,6 +17805,7 @@ handle_port_binding_changes(struct ovsdb_idl_txn 
> > *ovnsb_txn,
> >                   const struct sbrec_ha_chassis_group_table 
> > *sb_ha_ch_grp_table,
> >                   struct ovsdb_idl_index *sb_ha_ch_grp_by_name,
> >                   struct hmap *ls_ports,
> > +                struct hmap *lr_ports,
> >                   struct shash *ha_ref_chassis_map)
> >   {
> >       const struct sbrec_port_binding *sb;
> > @@ -17807,6 +17824,14 @@ handle_port_binding_changes(struct ovsdb_idl_txn 
> > *ovnsb_txn,
> >       }
> >
> >       SBREC_PORT_BINDING_TABLE_FOR_EACH (sb, sb_pb_table) {
> > +
> > +        struct ovn_port *orp = ovn_port_find(lr_ports, sb->logical_port);
> > +
> > +        if (orp && is_cr_port(orp)) {
> > +            handle_cr_port_binding_changes(sb, orp);
> > +            continue;
> > +        }
> > +
> >           struct ovn_port *op = ovn_port_find(ls_ports, sb->logical_port);
> >
> >           if (!op || !op->nbsp) {
> > @@ -17855,7 +17880,8 @@ ovnsb_db_run(struct ovsdb_idl_txn *ovnnb_txn,
> >                const struct sbrec_port_binding_table *sb_pb_table,
> >                const struct sbrec_ha_chassis_group_table 
> > *sb_ha_ch_grp_table,
> >                struct ovsdb_idl_index *sb_ha_ch_grp_by_name,
> > -             struct hmap *ls_ports)
> > +             struct hmap *ls_ports,
> > +             struct hmap *lr_ports)
> >   {
> >       if (!ovnnb_txn ||
> >           !ovsdb_idl_has_ever_connected(ovsdb_idl_txn_get_idl(ovnsb_txn))) {
> > @@ -17864,7 +17890,7 @@ ovnsb_db_run(struct ovsdb_idl_txn *ovnnb_txn,
> >
> >       struct shash ha_ref_chassis_map = 
> > SHASH_INITIALIZER(&ha_ref_chassis_map);
> >       handle_port_binding_changes(ovnsb_txn, sb_pb_table, 
> > sb_ha_ch_grp_table,
> > -                                sb_ha_ch_grp_by_name, ls_ports,
> > +                                sb_ha_ch_grp_by_name, ls_ports, lr_ports,
> >                                   &ha_ref_chassis_map);
> >       if (ovnsb_txn) {
> >           update_sb_ha_group_ref_chassis(sb_ha_ch_grp_table,
> > diff --git a/northd/northd.h b/northd/northd.h
> > index f3e63b1e1..44dc11009 100644
> > --- a/northd/northd.h
> > +++ b/northd/northd.h
> > @@ -326,7 +326,8 @@ void ovnsb_db_run(struct ovsdb_idl_txn *ovnnb_txn,
> >                     const struct sbrec_port_binding_table *,
> >                     const struct sbrec_ha_chassis_group_table *,
> >                     struct ovsdb_idl_index *sb_ha_ch_grp_by_name,
> > -                  struct hmap *ls_ports);
> > +                  struct hmap *ls_ports,
> > +                  struct hmap *lr_ports);
> >   bool northd_handle_ls_changes(struct ovsdb_idl_txn *,
> >                                 const struct northd_input *,
> >                                 struct northd_data *);
> > diff --git a/ovn-nb.xml b/ovn-nb.xml
> > index 4fbf4f7e5..b1e9ba724 100644
> > --- a/ovn-nb.xml
> > +++ b/ovn-nb.xml
> > @@ -3113,6 +3113,21 @@ or
> >               port on the logical router.  It is otherwise ignored.
> >             </p>
> >           </column>
> > +
> > +        <column name="options" key="hosting-chassis">
>
> I'm curious if the "options" column is the best place to store this
> information. I have three concerns.
>
> 1) My most minor concern has to do with how options are propagated to
> the southbound database. I'm fairly certain that options set on a
> logical router port are all copied to the southbound Port_Binding. This
> will lead to options:hosting-chassis being set on the southbound port
> binding. This isn't too bad, except that it doesn't really mean anything
> and can lead to confusion by someone inspecting the SB DB.
>
> 2) Next, my medium concern is that CMSes are used to writing to the
> options column to configure the logical router port. This means the CMS
> has the power to do two things that may not be great:
>     a) The CMS can clear the options column.
>     b) The CMS can explicitly set the hosting-chassis to something
> incorrect.
> In both cases, ovn-northd should eventually re-write the correct value
> to the DB, but there will be a brief period where there appears to be
> either an incorrect or missing hosting-chassis on the LRP.
>
> 3) Finally, my largest concern is that ovn-northd has to monitor the
> "options" column on logical router ports since that is where options are
> configured. By writing to this column, it will lead to ovn-northd
> immediately waking up again since the "options" column has changed. This
> can lead to unnecessary processing in ovn-northd since we do not have
> any incremental processing defined for LRPs.
>
>
> My suggestion for this would be to write this value to a different
> column. We can either have a dedicated column for this (e.g.
> "hosting-chassis") or if we expect to add more status elements for LRPs
> in the future, then we could create a key-value style column for this
> (e.g. "status:hosting-chassis"). This right away fixes point 1.
>

Yeah, I wasn't sure either whether re-using the options column was a
good option or not. I thought it would be better because it makes
things more backward compatible, but indeed after reading your
concerns I tend to agree that a new column may be better for this.

If we go with a new column I think it would be good to make it more
future proof and create it as a key-value column. I like "status" as a
name for the new column as you suggested, so I will go with it.

> Based on point 3, you might expect me to say that ovn-northd should not
> monitor this column since it would cause ovn-northd to wake up
> immediately after writing to the column. However, if ovn-northd does not
> monitor this column, then it cannot reliably correct the value if the
> CMS writes something to the column. So we still end up having the same
> problem as before. However, by isolating the change to a dedicated
> column that should only be written by ovn-northd, it allows for future
> incremental processing of LRPS to more easily process this change
> compared to using the options column.
>
> This addresses concerns 1 and 3, but it doesn't address concern 2
> directly. There's no way to outright block a CMS from writing to a
> column in the northbound database [1]. However, by moving to a dedicated
> column, there is less of a chance of the CMS doing something bad by
> accident, such as clearing the "options" column on the LRP since the CMS
> is not accustomed to
>

Indeed.

> > +          <p>
> > +            This option is populated by <code>ovn-northd</code>, rather
> > +            than by the CMS plugin.
> > +          </p>
> > +
> > +          <p>
> > +            When a distributed gateway port is bound to a location in
> > +            the OVN Southbound database
> > +            <ref db="OVN_Southbound" table="Port_Binding"/>
> > +            <code>ovn-northd</code> will populate this option with the
> > +            name of the Chassis that is currently hosting this port.
> > +          </p>
> > +        </column>
> >         </group>
> >       </group>
> >
> > diff --git a/tests/ovn-northd.at b/tests/ovn-northd.at
> > index 3e06f14c9..99634d51f 100644
> > --- a/tests/ovn-northd.at
> > +++ b/tests/ovn-northd.at
> > @@ -169,6 +169,47 @@ AT_CHECK([test x`ovn-nbctl lsp-get-up S1-R1` = xup])
> >   AT_CLEANUP
> >   ])
> >
> > +OVN_FOR_EACH_NORTHD_NO_HV([
> > +AT_SETUP([check Logical Router Port hosting-chassis option])
> > +ovn_start
> > +
> > +check ovn-sbctl chassis-add ch1 geneve 127.0.0.2
> > +
> > +check ovn-nbctl lr-add lr1
> > +check ovn-nbctl lrp-add lr1 lrp1 00:00:00:00:00:01 10.0.0.1/24
> > +check ovn-nbctl ls-add ls1
> > +check ovn-nbctl lsp-add ls1 lsp1 -- \
> > +    lsp-set-addresses lsp1 router -- \
> > +    lsp-set-type lsp1 router -- \
> > +    lsp-set-options lsp1 router-port=lrp1
> > +
> > +# Make lrp a cr-port
> > +check ovn-nbctl lrp-set-gateway-chassis lrp1 ch1
> > +
> > +check ovn-nbctl --wait=sb sync
> > +
> > +wait_row_count Port_Binding 1 logical_port=cr-lrp1 \
> > +    options:always-redirect="true" options:distributed-port="lrp1"
> > +
> > +# Simulate cr-port being bound to ch1
> > +ch1_uuid=`ovn-sbctl --bare --columns _uuid find Chassis name="ch1"`
> > +check ovn-sbctl set Port_Binding cr-lrp1 chassis=${ch1_uuid}
> > +
> > +check ovn-nbctl --wait=sb sync
> > +
> > +# Check for the hosting-chassis option being set by northd
> > +wait_row_count nb:Logical_Router_Port 1 name=lrp1 
> > options:hosting-chassis=ch1
>
> Before removing the chassis from the Port_Binding, may I suggest adding
> a test where you either
>
> a) clear options:hosting-chassis, or
> b) Set options:hosting-chassis to some arbitrary value.
>
> Then ensure that ovn-northd ends up setting options:hosting-chassis back
> to ch1.
>

Sure, will do!

Appreciate the review, sorry for the delay in replying to this, I was
caught up in other things. I will send a new patch version including
this new column and tests updates shortly.


> > +
> > +# Now remove the chassis from the port binding record and assert that the
> > +# hosting-chassis option was removed by northd
> > +check ovn-sbctl clear Port_Binding cr-lrp1 chassis
> > +check ovn-nbctl --wait=sb sync
> > +
> > +wait_row_count nb:Logical_Router_Port 0 name=lrp1 
> > options:hosting-chassis=ch1
> > +
> > +AT_CLEANUP
> > +])
> > +
> >   OVN_FOR_EACH_NORTHD_NO_HV([
> >   AT_SETUP([check LRP external id propagation to SBDB])
> >   ovn_start
> > --
> > 2.41.0
> >
>
> Thanks,
> Mark Michelson
>
>
> [1] If we *really* wanted to try to prevent writing to this column, then
> we could add restrictions to ovn-nbctl to prevent writes to the column.
> It still wouldn't prevent CMSes from writing to the column through other
> means, but it would prevent a common method of writing to the DB. Also,
> this would be *far* outside the scope of the task you're currently
> doing, so please don't actually make these ovn-nbctl changes :)
>
> > _______________________________________________
> > dev mailing list
> > d...@openvswitch.org
> > https://mail.openvswitch.org/mailman/listinfo/ovs-dev
> >
>

_______________________________________________
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to