Hi, Thanks for the review Mark, replied inline.
On Wed, Aug 9, 2023 at 6:04 PM Mark Michelson <mmich...@redhat.com> wrote: > > Hi Lucas, > > I finally took some time to have a look at the patch. It mostly looks > good, but I have some concerns below. > > Also there should be a NEWS item added about this. > > On 8/3/23 09:13, lmart...@redhat.com wrote: > > From: Lucas Alvares Gomes <lucasago...@gmail.com> > > > > In order for the CMS to know which Chassis a distributed gateway port > > is bond to, this patch updates the ovn-northd daemon to populate the > > Logical_Router_Port table with that information. > > > > To avoid changing the database schema, ovn-northd is setting a new key > > called "hosting-chassis" in the options column from the LRP table. This > > key value points to the name of the Chassis that is currently hosting > > the distributed port. > > > > Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=2107515 > > Signed-off-by: Lucas Alvares Gomes <lucasago...@gmail.com> > > --- > > v4 -> v5 > > * Fix test to remove sleep and assert on the right table > > > > v3 -> v4 > > * Removed smap_clone for updating the options and replaced it with > > smap_replace/smap_remove > > * Updated the test to also assert the removal of the hosting-chassis > > key by northd > > > > v2 -> v3 > > * Fixed memory leak > > * Separate the logic to update the router port into their own function > > * Use attribute l3dgw_port to find the GW port > > > > v1 -> v2 > > * Rebased on the main branch > > * Updated the ovnsb_db_run() and handle_port_binding_changes() functions > > to include the LR ports as a parameter > > > > northd/en-sync-from-sb.c | 2 +- > > northd/northd.c | 30 +++++++++++++++++++++++++++-- > > northd/northd.h | 3 ++- > > ovn-nb.xml | 15 +++++++++++++++ > > tests/ovn-northd.at | 41 ++++++++++++++++++++++++++++++++++++++++ > > 5 files changed, 87 insertions(+), 4 deletions(-) > > > > diff --git a/northd/en-sync-from-sb.c b/northd/en-sync-from-sb.c > > index 55ece2d16..4109aebe4 100644 > > --- a/northd/en-sync-from-sb.c > > +++ b/northd/en-sync-from-sb.c > > @@ -60,7 +60,7 @@ en_sync_from_sb_run(struct engine_node *node, void *data > > OVS_UNUSED) > > stopwatch_start(OVNSB_DB_RUN_STOPWATCH_NAME, time_msec()); > > ovnsb_db_run(eng_ctx->ovnnb_idl_txn, eng_ctx->ovnsb_idl_txn, > > sb_pb_table, sb_ha_ch_grp_table, sb_ha_ch_grp_by_name, > > - &nd->ls_ports); > > + &nd->ls_ports, &nd->lr_ports); > > stopwatch_stop(OVNSB_DB_RUN_STOPWATCH_NAME, time_msec()); > > } > > > > diff --git a/northd/northd.c b/northd/northd.c > > index b9605862e..c6752884c 100644 > > --- a/northd/northd.c > > +++ b/northd/northd.c > > @@ -17780,6 +17780,22 @@ build_ha_chassis_group_ref_chassis(struct > > ovsdb_idl_index *ha_ch_grp_by_name, > > } > > } > > > > +/* Set the "hosting-chassis" option in the NBDB logical_router_port > > + * table indicating which chassis the distributed port is bond to. */ > > +static void > > +handle_cr_port_binding_changes(const struct sbrec_port_binding *sb, > > + struct ovn_port *orp) > > +{ > > + if (sb->chassis) { > > + nbrec_logical_router_port_update_options_setkey( > > + orp->l3dgw_port->nbrp, "hosting-chassis", > > + sb->chassis->name); > > + } else { > > + nbrec_logical_router_port_update_options_delkey( > > + orp->l3dgw_port->nbrp, "hosting-chassis"); > > + } > > +} > > + > > /* Handle changes to the 'chassis' column of the 'Port_Binding' table. > > When > > * this column is not empty, it means we need to set the corresponding > > logical > > * port as 'up' in the northbound DB. */ > > @@ -17789,6 +17805,7 @@ handle_port_binding_changes(struct ovsdb_idl_txn > > *ovnsb_txn, > > const struct sbrec_ha_chassis_group_table > > *sb_ha_ch_grp_table, > > struct ovsdb_idl_index *sb_ha_ch_grp_by_name, > > struct hmap *ls_ports, > > + struct hmap *lr_ports, > > struct shash *ha_ref_chassis_map) > > { > > const struct sbrec_port_binding *sb; > > @@ -17807,6 +17824,14 @@ handle_port_binding_changes(struct ovsdb_idl_txn > > *ovnsb_txn, > > } > > > > SBREC_PORT_BINDING_TABLE_FOR_EACH (sb, sb_pb_table) { > > + > > + struct ovn_port *orp = ovn_port_find(lr_ports, sb->logical_port); > > + > > + if (orp && is_cr_port(orp)) { > > + handle_cr_port_binding_changes(sb, orp); > > + continue; > > + } > > + > > struct ovn_port *op = ovn_port_find(ls_ports, sb->logical_port); > > > > if (!op || !op->nbsp) { > > @@ -17855,7 +17880,8 @@ ovnsb_db_run(struct ovsdb_idl_txn *ovnnb_txn, > > const struct sbrec_port_binding_table *sb_pb_table, > > const struct sbrec_ha_chassis_group_table > > *sb_ha_ch_grp_table, > > struct ovsdb_idl_index *sb_ha_ch_grp_by_name, > > - struct hmap *ls_ports) > > + struct hmap *ls_ports, > > + struct hmap *lr_ports) > > { > > if (!ovnnb_txn || > > !ovsdb_idl_has_ever_connected(ovsdb_idl_txn_get_idl(ovnsb_txn))) { > > @@ -17864,7 +17890,7 @@ ovnsb_db_run(struct ovsdb_idl_txn *ovnnb_txn, > > > > struct shash ha_ref_chassis_map = > > SHASH_INITIALIZER(&ha_ref_chassis_map); > > handle_port_binding_changes(ovnsb_txn, sb_pb_table, > > sb_ha_ch_grp_table, > > - sb_ha_ch_grp_by_name, ls_ports, > > + sb_ha_ch_grp_by_name, ls_ports, lr_ports, > > &ha_ref_chassis_map); > > if (ovnsb_txn) { > > update_sb_ha_group_ref_chassis(sb_ha_ch_grp_table, > > diff --git a/northd/northd.h b/northd/northd.h > > index f3e63b1e1..44dc11009 100644 > > --- a/northd/northd.h > > +++ b/northd/northd.h > > @@ -326,7 +326,8 @@ void ovnsb_db_run(struct ovsdb_idl_txn *ovnnb_txn, > > const struct sbrec_port_binding_table *, > > const struct sbrec_ha_chassis_group_table *, > > struct ovsdb_idl_index *sb_ha_ch_grp_by_name, > > - struct hmap *ls_ports); > > + struct hmap *ls_ports, > > + struct hmap *lr_ports); > > bool northd_handle_ls_changes(struct ovsdb_idl_txn *, > > const struct northd_input *, > > struct northd_data *); > > diff --git a/ovn-nb.xml b/ovn-nb.xml > > index 4fbf4f7e5..b1e9ba724 100644 > > --- a/ovn-nb.xml > > +++ b/ovn-nb.xml > > @@ -3113,6 +3113,21 @@ or > > port on the logical router. It is otherwise ignored. > > </p> > > </column> > > + > > + <column name="options" key="hosting-chassis"> > > I'm curious if the "options" column is the best place to store this > information. I have three concerns. > > 1) My most minor concern has to do with how options are propagated to > the southbound database. I'm fairly certain that options set on a > logical router port are all copied to the southbound Port_Binding. This > will lead to options:hosting-chassis being set on the southbound port > binding. This isn't too bad, except that it doesn't really mean anything > and can lead to confusion by someone inspecting the SB DB. > > 2) Next, my medium concern is that CMSes are used to writing to the > options column to configure the logical router port. This means the CMS > has the power to do two things that may not be great: > a) The CMS can clear the options column. > b) The CMS can explicitly set the hosting-chassis to something > incorrect. > In both cases, ovn-northd should eventually re-write the correct value > to the DB, but there will be a brief period where there appears to be > either an incorrect or missing hosting-chassis on the LRP. > > 3) Finally, my largest concern is that ovn-northd has to monitor the > "options" column on logical router ports since that is where options are > configured. By writing to this column, it will lead to ovn-northd > immediately waking up again since the "options" column has changed. This > can lead to unnecessary processing in ovn-northd since we do not have > any incremental processing defined for LRPs. > > > My suggestion for this would be to write this value to a different > column. We can either have a dedicated column for this (e.g. > "hosting-chassis") or if we expect to add more status elements for LRPs > in the future, then we could create a key-value style column for this > (e.g. "status:hosting-chassis"). This right away fixes point 1. > Yeah, I wasn't sure either whether re-using the options column was a good option or not. I thought it would be better because it makes things more backward compatible, but indeed after reading your concerns I tend to agree that a new column may be better for this. If we go with a new column I think it would be good to make it more future proof and create it as a key-value column. I like "status" as a name for the new column as you suggested, so I will go with it. > Based on point 3, you might expect me to say that ovn-northd should not > monitor this column since it would cause ovn-northd to wake up > immediately after writing to the column. However, if ovn-northd does not > monitor this column, then it cannot reliably correct the value if the > CMS writes something to the column. So we still end up having the same > problem as before. However, by isolating the change to a dedicated > column that should only be written by ovn-northd, it allows for future > incremental processing of LRPS to more easily process this change > compared to using the options column. > > This addresses concerns 1 and 3, but it doesn't address concern 2 > directly. There's no way to outright block a CMS from writing to a > column in the northbound database [1]. However, by moving to a dedicated > column, there is less of a chance of the CMS doing something bad by > accident, such as clearing the "options" column on the LRP since the CMS > is not accustomed to > Indeed. > > + <p> > > + This option is populated by <code>ovn-northd</code>, rather > > + than by the CMS plugin. > > + </p> > > + > > + <p> > > + When a distributed gateway port is bound to a location in > > + the OVN Southbound database > > + <ref db="OVN_Southbound" table="Port_Binding"/> > > + <code>ovn-northd</code> will populate this option with the > > + name of the Chassis that is currently hosting this port. > > + </p> > > + </column> > > </group> > > </group> > > > > diff --git a/tests/ovn-northd.at b/tests/ovn-northd.at > > index 3e06f14c9..99634d51f 100644 > > --- a/tests/ovn-northd.at > > +++ b/tests/ovn-northd.at > > @@ -169,6 +169,47 @@ AT_CHECK([test x`ovn-nbctl lsp-get-up S1-R1` = xup]) > > AT_CLEANUP > > ]) > > > > +OVN_FOR_EACH_NORTHD_NO_HV([ > > +AT_SETUP([check Logical Router Port hosting-chassis option]) > > +ovn_start > > + > > +check ovn-sbctl chassis-add ch1 geneve 127.0.0.2 > > + > > +check ovn-nbctl lr-add lr1 > > +check ovn-nbctl lrp-add lr1 lrp1 00:00:00:00:00:01 10.0.0.1/24 > > +check ovn-nbctl ls-add ls1 > > +check ovn-nbctl lsp-add ls1 lsp1 -- \ > > + lsp-set-addresses lsp1 router -- \ > > + lsp-set-type lsp1 router -- \ > > + lsp-set-options lsp1 router-port=lrp1 > > + > > +# Make lrp a cr-port > > +check ovn-nbctl lrp-set-gateway-chassis lrp1 ch1 > > + > > +check ovn-nbctl --wait=sb sync > > + > > +wait_row_count Port_Binding 1 logical_port=cr-lrp1 \ > > + options:always-redirect="true" options:distributed-port="lrp1" > > + > > +# Simulate cr-port being bound to ch1 > > +ch1_uuid=`ovn-sbctl --bare --columns _uuid find Chassis name="ch1"` > > +check ovn-sbctl set Port_Binding cr-lrp1 chassis=${ch1_uuid} > > + > > +check ovn-nbctl --wait=sb sync > > + > > +# Check for the hosting-chassis option being set by northd > > +wait_row_count nb:Logical_Router_Port 1 name=lrp1 > > options:hosting-chassis=ch1 > > Before removing the chassis from the Port_Binding, may I suggest adding > a test where you either > > a) clear options:hosting-chassis, or > b) Set options:hosting-chassis to some arbitrary value. > > Then ensure that ovn-northd ends up setting options:hosting-chassis back > to ch1. > Sure, will do! Appreciate the review, sorry for the delay in replying to this, I was caught up in other things. I will send a new patch version including this new column and tests updates shortly. > > + > > +# Now remove the chassis from the port binding record and assert that the > > +# hosting-chassis option was removed by northd > > +check ovn-sbctl clear Port_Binding cr-lrp1 chassis > > +check ovn-nbctl --wait=sb sync > > + > > +wait_row_count nb:Logical_Router_Port 0 name=lrp1 > > options:hosting-chassis=ch1 > > + > > +AT_CLEANUP > > +]) > > + > > OVN_FOR_EACH_NORTHD_NO_HV([ > > AT_SETUP([check LRP external id propagation to SBDB]) > > ovn_start > > -- > > 2.41.0 > > > > Thanks, > Mark Michelson > > > [1] If we *really* wanted to try to prevent writing to this column, then > we could add restrictions to ovn-nbctl to prevent writes to the column. > It still wouldn't prevent CMSes from writing to the column through other > means, but it would prevent a common method of writing to the DB. Also, > this would be *far* outside the scope of the task you're currently > doing, so please don't actually make these ovn-nbctl changes :) > > > _______________________________________________ > > dev mailing list > > d...@openvswitch.org > > https://mail.openvswitch.org/mailman/listinfo/ovs-dev > > > _______________________________________________ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev