Hi Lucas,
I finally took some time to have a look at the patch. It mostly looks
good, but I have some concerns below.
Also there should be a NEWS item added about this.
On 8/3/23 09:13, lmart...@redhat.com wrote:
From: Lucas Alvares Gomes <lucasago...@gmail.com>
In order for the CMS to know which Chassis a distributed gateway port
is bond to, this patch updates the ovn-northd daemon to populate the
Logical_Router_Port table with that information.
To avoid changing the database schema, ovn-northd is setting a new key
called "hosting-chassis" in the options column from the LRP table. This
key value points to the name of the Chassis that is currently hosting
the distributed port.
Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=2107515
Signed-off-by: Lucas Alvares Gomes <lucasago...@gmail.com>
---
v4 -> v5
* Fix test to remove sleep and assert on the right table
v3 -> v4
* Removed smap_clone for updating the options and replaced it with
smap_replace/smap_remove
* Updated the test to also assert the removal of the hosting-chassis
key by northd
v2 -> v3
* Fixed memory leak
* Separate the logic to update the router port into their own function
* Use attribute l3dgw_port to find the GW port
v1 -> v2
* Rebased on the main branch
* Updated the ovnsb_db_run() and handle_port_binding_changes() functions
to include the LR ports as a parameter
northd/en-sync-from-sb.c | 2 +-
northd/northd.c | 30 +++++++++++++++++++++++++++--
northd/northd.h | 3 ++-
ovn-nb.xml | 15 +++++++++++++++
tests/ovn-northd.at | 41 ++++++++++++++++++++++++++++++++++++++++
5 files changed, 87 insertions(+), 4 deletions(-)
diff --git a/northd/en-sync-from-sb.c b/northd/en-sync-from-sb.c
index 55ece2d16..4109aebe4 100644
--- a/northd/en-sync-from-sb.c
+++ b/northd/en-sync-from-sb.c
@@ -60,7 +60,7 @@ en_sync_from_sb_run(struct engine_node *node, void *data
OVS_UNUSED)
stopwatch_start(OVNSB_DB_RUN_STOPWATCH_NAME, time_msec());
ovnsb_db_run(eng_ctx->ovnnb_idl_txn, eng_ctx->ovnsb_idl_txn,
sb_pb_table, sb_ha_ch_grp_table, sb_ha_ch_grp_by_name,
- &nd->ls_ports);
+ &nd->ls_ports, &nd->lr_ports);
stopwatch_stop(OVNSB_DB_RUN_STOPWATCH_NAME, time_msec());
}
diff --git a/northd/northd.c b/northd/northd.c
index b9605862e..c6752884c 100644
--- a/northd/northd.c
+++ b/northd/northd.c
@@ -17780,6 +17780,22 @@ build_ha_chassis_group_ref_chassis(struct
ovsdb_idl_index *ha_ch_grp_by_name,
}
}
+/* Set the "hosting-chassis" option in the NBDB logical_router_port
+ * table indicating which chassis the distributed port is bond to. */
+static void
+handle_cr_port_binding_changes(const struct sbrec_port_binding *sb,
+ struct ovn_port *orp)
+{
+ if (sb->chassis) {
+ nbrec_logical_router_port_update_options_setkey(
+ orp->l3dgw_port->nbrp, "hosting-chassis",
+ sb->chassis->name);
+ } else {
+ nbrec_logical_router_port_update_options_delkey(
+ orp->l3dgw_port->nbrp, "hosting-chassis");
+ }
+}
+
/* Handle changes to the 'chassis' column of the 'Port_Binding' table. When
* this column is not empty, it means we need to set the corresponding logical
* port as 'up' in the northbound DB. */
@@ -17789,6 +17805,7 @@ handle_port_binding_changes(struct ovsdb_idl_txn
*ovnsb_txn,
const struct sbrec_ha_chassis_group_table *sb_ha_ch_grp_table,
struct ovsdb_idl_index *sb_ha_ch_grp_by_name,
struct hmap *ls_ports,
+ struct hmap *lr_ports,
struct shash *ha_ref_chassis_map)
{
const struct sbrec_port_binding *sb;
@@ -17807,6 +17824,14 @@ handle_port_binding_changes(struct ovsdb_idl_txn
*ovnsb_txn,
}
SBREC_PORT_BINDING_TABLE_FOR_EACH (sb, sb_pb_table) {
+
+ struct ovn_port *orp = ovn_port_find(lr_ports, sb->logical_port);
+
+ if (orp && is_cr_port(orp)) {
+ handle_cr_port_binding_changes(sb, orp);
+ continue;
+ }
+
struct ovn_port *op = ovn_port_find(ls_ports, sb->logical_port);
if (!op || !op->nbsp) {
@@ -17855,7 +17880,8 @@ ovnsb_db_run(struct ovsdb_idl_txn *ovnnb_txn,
const struct sbrec_port_binding_table *sb_pb_table,
const struct sbrec_ha_chassis_group_table *sb_ha_ch_grp_table,
struct ovsdb_idl_index *sb_ha_ch_grp_by_name,
- struct hmap *ls_ports)
+ struct hmap *ls_ports,
+ struct hmap *lr_ports)
{
if (!ovnnb_txn ||
!ovsdb_idl_has_ever_connected(ovsdb_idl_txn_get_idl(ovnsb_txn))) {
@@ -17864,7 +17890,7 @@ ovnsb_db_run(struct ovsdb_idl_txn *ovnnb_txn,
struct shash ha_ref_chassis_map = SHASH_INITIALIZER(&ha_ref_chassis_map);
handle_port_binding_changes(ovnsb_txn, sb_pb_table, sb_ha_ch_grp_table,
- sb_ha_ch_grp_by_name, ls_ports,
+ sb_ha_ch_grp_by_name, ls_ports, lr_ports,
&ha_ref_chassis_map);
if (ovnsb_txn) {
update_sb_ha_group_ref_chassis(sb_ha_ch_grp_table,
diff --git a/northd/northd.h b/northd/northd.h
index f3e63b1e1..44dc11009 100644
--- a/northd/northd.h
+++ b/northd/northd.h
@@ -326,7 +326,8 @@ void ovnsb_db_run(struct ovsdb_idl_txn *ovnnb_txn,
const struct sbrec_port_binding_table *,
const struct sbrec_ha_chassis_group_table *,
struct ovsdb_idl_index *sb_ha_ch_grp_by_name,
- struct hmap *ls_ports);
+ struct hmap *ls_ports,
+ struct hmap *lr_ports);
bool northd_handle_ls_changes(struct ovsdb_idl_txn *,
const struct northd_input *,
struct northd_data *);
diff --git a/ovn-nb.xml b/ovn-nb.xml
index 4fbf4f7e5..b1e9ba724 100644
--- a/ovn-nb.xml
+++ b/ovn-nb.xml
@@ -3113,6 +3113,21 @@ or
port on the logical router. It is otherwise ignored.
</p>
</column>
+
+ <column name="options" key="hosting-chassis">
I'm curious if the "options" column is the best place to store this
information. I have three concerns.
1) My most minor concern has to do with how options are propagated to
the southbound database. I'm fairly certain that options set on a
logical router port are all copied to the southbound Port_Binding. This
will lead to options:hosting-chassis being set on the southbound port
binding. This isn't too bad, except that it doesn't really mean anything
and can lead to confusion by someone inspecting the SB DB.
2) Next, my medium concern is that CMSes are used to writing to the
options column to configure the logical router port. This means the CMS
has the power to do two things that may not be great:
a) The CMS can clear the options column.
b) The CMS can explicitly set the hosting-chassis to something
incorrect.
In both cases, ovn-northd should eventually re-write the correct value
to the DB, but there will be a brief period where there appears to be
either an incorrect or missing hosting-chassis on the LRP.
3) Finally, my largest concern is that ovn-northd has to monitor the
"options" column on logical router ports since that is where options are
configured. By writing to this column, it will lead to ovn-northd
immediately waking up again since the "options" column has changed. This
can lead to unnecessary processing in ovn-northd since we do not have
any incremental processing defined for LRPs.
My suggestion for this would be to write this value to a different
column. We can either have a dedicated column for this (e.g.
"hosting-chassis") or if we expect to add more status elements for LRPs
in the future, then we could create a key-value style column for this
(e.g. "status:hosting-chassis"). This right away fixes point 1.
Based on point 3, you might expect me to say that ovn-northd should not
monitor this column since it would cause ovn-northd to wake up
immediately after writing to the column. However, if ovn-northd does not
monitor this column, then it cannot reliably correct the value if the
CMS writes something to the column. So we still end up having the same
problem as before. However, by isolating the change to a dedicated
column that should only be written by ovn-northd, it allows for future
incremental processing of LRPS to more easily process this change
compared to using the options column.
This addresses concerns 1 and 3, but it doesn't address concern 2
directly. There's no way to outright block a CMS from writing to a
column in the northbound database [1]. However, by moving to a dedicated
column, there is less of a chance of the CMS doing something bad by
accident, such as clearing the "options" column on the LRP since the CMS
is not accustomed to
+ <p>
+ This option is populated by <code>ovn-northd</code>, rather
+ than by the CMS plugin.
+ </p>
+
+ <p>
+ When a distributed gateway port is bound to a location in
+ the OVN Southbound database
+ <ref db="OVN_Southbound" table="Port_Binding"/>
+ <code>ovn-northd</code> will populate this option with the
+ name of the Chassis that is currently hosting this port.
+ </p>
+ </column>
</group>
</group>
diff --git a/tests/ovn-northd.at b/tests/ovn-northd.at
index 3e06f14c9..99634d51f 100644
--- a/tests/ovn-northd.at
+++ b/tests/ovn-northd.at
@@ -169,6 +169,47 @@ AT_CHECK([test x`ovn-nbctl lsp-get-up S1-R1` = xup])
AT_CLEANUP
])
+OVN_FOR_EACH_NORTHD_NO_HV([
+AT_SETUP([check Logical Router Port hosting-chassis option])
+ovn_start
+
+check ovn-sbctl chassis-add ch1 geneve 127.0.0.2
+
+check ovn-nbctl lr-add lr1
+check ovn-nbctl lrp-add lr1 lrp1 00:00:00:00:00:01 10.0.0.1/24
+check ovn-nbctl ls-add ls1
+check ovn-nbctl lsp-add ls1 lsp1 -- \
+ lsp-set-addresses lsp1 router -- \
+ lsp-set-type lsp1 router -- \
+ lsp-set-options lsp1 router-port=lrp1
+
+# Make lrp a cr-port
+check ovn-nbctl lrp-set-gateway-chassis lrp1 ch1
+
+check ovn-nbctl --wait=sb sync
+
+wait_row_count Port_Binding 1 logical_port=cr-lrp1 \
+ options:always-redirect="true" options:distributed-port="lrp1"
+
+# Simulate cr-port being bound to ch1
+ch1_uuid=`ovn-sbctl --bare --columns _uuid find Chassis name="ch1"`
+check ovn-sbctl set Port_Binding cr-lrp1 chassis=${ch1_uuid}
+
+check ovn-nbctl --wait=sb sync
+
+# Check for the hosting-chassis option being set by northd
+wait_row_count nb:Logical_Router_Port 1 name=lrp1 options:hosting-chassis=ch1
Before removing the chassis from the Port_Binding, may I suggest adding
a test where you either
a) clear options:hosting-chassis, or
b) Set options:hosting-chassis to some arbitrary value.
Then ensure that ovn-northd ends up setting options:hosting-chassis back
to ch1.
+
+# Now remove the chassis from the port binding record and assert that the
+# hosting-chassis option was removed by northd
+check ovn-sbctl clear Port_Binding cr-lrp1 chassis
+check ovn-nbctl --wait=sb sync
+
+wait_row_count nb:Logical_Router_Port 0 name=lrp1 options:hosting-chassis=ch1
+
+AT_CLEANUP
+])
+
OVN_FOR_EACH_NORTHD_NO_HV([
AT_SETUP([check LRP external id propagation to SBDB])
ovn_start
--
2.41.0
Thanks,
Mark Michelson
[1] If we *really* wanted to try to prevent writing to this column, then
we could add restrictions to ovn-nbctl to prevent writes to the column.
It still wouldn't prevent CMSes from writing to the column through other
means, but it would prevent a common method of writing to the DB. Also,
this would be *far* outside the scope of the task you're currently
doing, so please don't actually make these ovn-nbctl changes :)
_______________________________________________
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev
_______________________________________________
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev