On 2/19/20 4:37 PM, [email protected] wrote: > From: Lucas Alvares Gomes <[email protected]> > > NOTE: SENDING THIS PATCH AS A RFC TO SEE WHAT OTHERS MIGHT THINK ABOUT > THE IDEA PRIOR TO WRITTING TESTS TO IT. > > CMSes integrating with OVN often uses the nb_cfg mechanism as a way to > check the health status of the ovn-controller process but, the current > implementation isn't ideal for that purpose because it floods the control > plane with update notifications every time the nb_cfg value is > incremented. > > This patch is merging two ideas: > > 1) Han Zhou proposed a patch [0] creating a table called Chassis_Private > where each hypervisor *only* writes and monitors its own record to > avoid this flooding problem. > > 2) Having ovn-controller to periodically log that it's alive instead of > relying on the nb_cfg mechanism. > > By using this mechanism, a CMS can more easily just read this new > Chassis_Private table and figure out the status of each Chassis > (ovn-controller) in the cluster. >
Hi Lucas, Thanks a lot for working on this! > Here's some reasons why I believe this approach is better than having > to bump the nb_cfg value: > > 1) Simple integration. Before, the CMS had to increment the nb_cfg > value in the NB_Global table and wait for it to be propagated to the > Southbound database (ovn-northd and then ovn-controllers) Chassis > table in order to know the status of the Chassis. > > Now, it can just read the Chassis_Private table and compare the > alive_at column against the current time. I guess the only comment I have about this approach is that it doesn't feel completely OK to store a timestamp representation as string in "alive_at". I'm afraid this might be too inflexible and force CMSes to compare their local time with the value in this column. I know we discussed offline that using a counter that is periodically incremented by ovn-controller instead of a timestamp would complicate the implementation in Openstack but maybe other people on this mailing list have ideas on how to deal with this in a more generic way. > > 2) Less costy. Just one read from the db is needed, no writing. Code > using the nb_cfg mechanism had to implement a few safe-guard code to > make less error prone. See [1] and [2] for example. > > 3) Backwards compatibility. The patch [0] was moving the nb_cfg value > to new table so, systems relying on it would need to update their > code upon updating OVN. Nice! One more minor comment on the code, inline. Thanks, Dumitru > > To enable this new mechanism set an option called > chassis_liveness_interval=N to the "options" column in the NB_Global > table, where N is the time (in seconds) in which the ovn-controller > should updates the alive_at column. If not set or set to 0, the > mechanism is disabled. By default, it's disabled. > > [0] https://patchwork.ozlabs.org/patch/899608/ > [1] https://review.opendev.org/#/c/703612/ > [2] https://review.opendev.org/#/c/704530/ > > Co-authored-by: Han Zhou <[email protected]> > Co-authored-by: Numan Siddique <[email protected]> > Co-authored-by: Dumitru Ceara <[email protected]> > Signed-off-by: Lucas Alvares Gomes <[email protected]> > --- > controller/chassis.c | 56 +++++++++++++++++++++++++++++++++++-- > controller/chassis.h | 11 ++++++-- > controller/ovn-controller.c | 44 +++++++++++++++++++++++++++-- > lib/chassis-index.c | 26 +++++++++++++++++ > lib/chassis-index.h | 5 ++++ > northd/ovn-northd.c | 4 +++ > ovn-sb.ovsschema | 13 +++++++-- > ovn-sb.xml | 35 +++++++++++++++++++++++ > 8 files changed, 185 insertions(+), 9 deletions(-) > > diff --git a/controller/chassis.c b/controller/chassis.c > index 522893ead..7892cb966 100644 > --- a/controller/chassis.c > +++ b/controller/chassis.c > @@ -24,6 +24,7 @@ > #include "openvswitch/dynamic-string.h" > #include "openvswitch/vlog.h" > #include "openvswitch/ofp-parse.h" > +#include "openvswitch/poll-loop.h" > #include "lib/chassis-index.h" > #include "lib/ovn-sb-idl.h" > #include "ovn-controller.h" > @@ -47,11 +48,16 @@ struct chassis_info { > > /* True if Chassis SB record is initialized, false otherwise. */ > uint32_t id_inited : 1; > + > + /* Next livennes update time to update the 'alive_at' column of > + * Chassis_Private table. */ > + long long int next_liveness_update; > }; > > static struct chassis_info chassis_state = { > .id = DS_EMPTY_INITIALIZER, > .id_inited = false, > + .next_liveness_update = LLONG_MAX, > }; > > static void > @@ -581,18 +587,38 @@ chassis_update(const struct sbrec_chassis *chassis_rec, > free(encaps); > } > > +static void > +chassis_update_liveness_cfg(const struct sbrec_chassis_private *chassis_rec, > + struct chassis_info *ch_info, > + int chassis_liveness_interval) > +{ > + if (ch_info->next_liveness_update == LLONG_MAX || > + time_msec() > ch_info->next_liveness_update) { > + sbrec_chassis_private_set_alive_at( > + chassis_rec, > + xastrftime_msec("%Y-%m-%d %H:%M:%S", time_wall_msec(), true)); > + ch_info->next_liveness_update = > + time_msec() + chassis_liveness_interval * 1000; > + } > +} > + > /* Returns this chassis's Chassis record, if it is available. */ > const struct sbrec_chassis * > chassis_run(struct ovsdb_idl_txn *ovnsb_idl_txn, > struct ovsdb_idl_index *sbrec_chassis_by_name, > + struct ovsdb_idl_index *sbrec_chassis_private_by_name, > const struct ovsrec_open_vswitch_table *ovs_table, > const struct sbrec_chassis_table *chassis_table, > const char *chassis_id, > const struct ovsrec_bridge *br_int, > - const struct sset *transport_zones) > + const struct sset *transport_zones, > + const struct sbrec_chassis_private **chassis_private, > + int chassis_liveness_interval) > { > struct ovs_chassis_cfg ovs_cfg; > > + *chassis_private = NULL; > + > /* Get the chassis config from the ovs table. */ > ovs_chassis_cfg_init(&ovs_cfg); > if (!chassis_parse_ovs_config(ovs_table, br_int, &ovs_cfg)) { > @@ -616,6 +642,24 @@ chassis_run(struct ovsdb_idl_txn *ovnsb_idl_txn, > chassis_id); > } > > + /* Create the Chassis_Private entry if it's not there yet */ > + const struct sbrec_chassis_private *chassis_private_rec = > + chassis_private_lookup_by_name(sbrec_chassis_private_by_name, > + chassis_id); > + if (!chassis_private_rec && ovnsb_idl_txn) { > + chassis_private_rec = sbrec_chassis_private_insert(ovnsb_idl_txn); > + sbrec_chassis_private_set_name(chassis_private_rec, chassis_id); > + } > + > + /* Update the alive_at column from the Chassis_Private table if > + * the chassis liveness mechanism is enabled */ > + if (chassis_private_rec && ovnsb_idl_txn && chassis_liveness_interval) { > + chassis_update_liveness_cfg(chassis_private_rec, &chassis_state, > + chassis_liveness_interval); > + } > + > + *chassis_private = chassis_private_rec; > + > ovs_chassis_cfg_destroy(&ovs_cfg); > return chassis_rec; > } > @@ -669,7 +713,8 @@ chassis_get_mac(const struct sbrec_chassis *chassis_rec, > * required. */ > bool > chassis_cleanup(struct ovsdb_idl_txn *ovnsb_idl_txn, > - const struct sbrec_chassis *chassis_rec) > + const struct sbrec_chassis *chassis_rec, > + const struct sbrec_chassis_private *chassis_private_rec) > { > if (!chassis_rec) { > return true; > @@ -679,6 +724,7 @@ chassis_cleanup(struct ovsdb_idl_txn *ovnsb_idl_txn, > "ovn-controller: unregistering chassis > '%s'", > chassis_rec->name); > sbrec_chassis_delete(chassis_rec); > + sbrec_chassis_private_delete(chassis_private_rec); > } > return false; > } > @@ -695,3 +741,9 @@ chassis_get_id(void) > > return NULL; > } > + > +void chassis_wait(int chassis_liveness_interval) { > + if (chassis_liveness_interval) { > + poll_timer_wait_until(chassis_state.next_liveness_update); > + } > +} > diff --git a/controller/chassis.h b/controller/chassis.h > index 178d2957e..38a549553 100644 > --- a/controller/chassis.h > +++ b/controller/chassis.h > @@ -17,6 +17,7 @@ > #define OVN_CHASSIS_H 1 > > #include <stdbool.h> > +#include "lib/ovn-sb-idl.h" > > struct ovsdb_idl; > struct ovsdb_idl_index; > @@ -33,17 +34,21 @@ void chassis_register_ovs_idl(struct ovsdb_idl *); > const struct sbrec_chassis *chassis_run( > struct ovsdb_idl_txn *ovnsb_idl_txn, > struct ovsdb_idl_index *sbrec_chassis_by_name, > + struct ovsdb_idl_index *sbrec_chassis_private_by_name, > const struct ovsrec_open_vswitch_table *, > const struct sbrec_chassis_table *, > const char *chassis_id, const struct ovsrec_bridge *br_int, > - const struct sset *transport_zones); > + const struct sset *transport_zones, > + const struct sbrec_chassis_private **chassis_private, > + int chassis_liveness_interval); > bool chassis_cleanup(struct ovsdb_idl_txn *ovnsb_idl_txn, > - const struct sbrec_chassis *); > + const struct sbrec_chassis *, > + const struct sbrec_chassis_private *); > bool chassis_get_mac(const struct sbrec_chassis *chassis, > const char *bridge_mapping, > struct eth_addr *chassis_mac); > const char *chassis_get_id(void); > const char * get_chassis_mac_mappings(const struct smap *ext_ids); > - > +void chassis_wait(int chassis_liveness_interval); > > #endif /* controller/chassis.h */ > diff --git a/controller/ovn-controller.c b/controller/ovn-controller.c > index 4d245ca28..fd9d95da2 100644 > --- a/controller/ovn-controller.c > +++ b/controller/ovn-controller.c > @@ -155,6 +155,7 @@ update_sb_monitors(struct ovsdb_idl *ovnsb_idl, > struct ovsdb_idl_condition ce = OVSDB_IDL_CONDITION_INIT(&ce); > struct ovsdb_idl_condition ip_mcast = > OVSDB_IDL_CONDITION_INIT(&ip_mcast); > struct ovsdb_idl_condition igmp = OVSDB_IDL_CONDITION_INIT(&igmp); > + struct ovsdb_idl_condition chprv = OVSDB_IDL_CONDITION_INIT(&chprv); > > if (monitor_all) { > ovsdb_idl_condition_add_clause_true(&pb); > @@ -165,6 +166,7 @@ update_sb_monitors(struct ovsdb_idl *ovnsb_idl, > ovsdb_idl_condition_add_clause_true(&ce); > ovsdb_idl_condition_add_clause_true(&ip_mcast); > ovsdb_idl_condition_add_clause_true(&igmp); > + ovsdb_idl_condition_add_clause_true(&chprv); > goto out; > } > > @@ -196,6 +198,10 @@ update_sb_monitors(struct ovsdb_idl *ovnsb_idl, > &chassis->header_.uuid); > sbrec_igmp_group_add_clause_chassis(&igmp, OVSDB_F_EQ, > &chassis->header_.uuid); > + > + /* Monitors Chassis_Private record for current chassis only */ > + sbrec_chassis_private_add_clause_name(&chprv, OVSDB_F_EQ, > + chassis->name); > } > if (local_ifaces) { > const char *name; > @@ -229,6 +235,7 @@ out: > sbrec_controller_event_set_condition(ovnsb_idl, &ce); > sbrec_ip_multicast_set_condition(ovnsb_idl, &ip_mcast); > sbrec_igmp_group_set_condition(ovnsb_idl, &igmp); > + sbrec_chassis_private_set_condition(ovnsb_idl, &chprv); > ovsdb_idl_condition_destroy(&pb); > ovsdb_idl_condition_destroy(&lf); > ovsdb_idl_condition_destroy(&mb); > @@ -237,6 +244,7 @@ out: > ovsdb_idl_condition_destroy(&ce); > ovsdb_idl_condition_destroy(&ip_mcast); > ovsdb_idl_condition_destroy(&igmp); > + ovsdb_idl_condition_destroy(&chprv); > } > > static const char * > @@ -687,6 +695,15 @@ get_transport_zones(const struct > ovsrec_open_vswitch_table *ovs_table) > return smap_get_def(&cfg->external_ids, "ovn-transport-zones", ""); > } > > +static uint16_t > +get_chassis_liveness_interval(const struct sbrec_sb_global_table > + *sb_global_table) > +{ > + const struct sbrec_sb_global *sb > + = sbrec_sb_global_table_first(sb_global_table); > + return atoi(smap_get_def(&sb->options, "chassis_liveness_interval", > "0")); We have smap_get_int in smap.h which does exactly this with a bit more error checking. > +} > + > static void > ctrl_register_ovs_idl(struct ovsdb_idl *ovs_idl) > { > @@ -1760,6 +1777,8 @@ main(int argc, char *argv[]) > > struct ovsdb_idl_index *sbrec_chassis_by_name > = chassis_index_create(ovnsb_idl_loop.idl); > + struct ovsdb_idl_index *sbrec_chassis_private_by_name > + = chassis_private_index_create(ovnsb_idl_loop.idl); > struct ovsdb_idl_index *sbrec_multicast_group_by_name_datapath > = mcast_group_index_create(ovnsb_idl_loop.idl); > struct ovsdb_idl_index *sbrec_port_binding_by_name > @@ -1821,6 +1840,10 @@ main(int argc, char *argv[]) > ovsdb_idl_omit(ovnsb_idl_loop.idl, &sbrec_connection_col_status); > ovsdb_idl_omit(ovnsb_idl_loop.idl, &sbrec_connection_col_target); > > + /* Do not monitor the Chassis_Private external_ids column */ > + ovsdb_idl_omit(ovnsb_idl_loop.idl, > + &sbrec_chassis_private_col_external_ids); > + > update_sb_monitors(ovnsb_idl_loop.idl, NULL, NULL, NULL, false); > > stopwatch_create(CONTROLLER_LOOP_STOPWATCH_NAME, SW_MS); > @@ -1998,10 +2021,16 @@ main(int argc, char *argv[]) > process_br_int(ovs_idl_txn, bridge_table, ovs_table); > const char *chassis_id = get_ovs_chassis_id(ovs_table); > const struct sbrec_chassis *chassis = NULL; > + const struct sbrec_chassis_private *chassis_private = NULL; > + uint16_t chassis_liveness_interval = > get_chassis_liveness_interval( > + sbrec_sb_global_table_get(ovnsb_idl_loop.idl)); > if (chassis_id) { > chassis = chassis_run(ovnsb_idl_txn, sbrec_chassis_by_name, > + sbrec_chassis_private_by_name, > ovs_table, chassis_table, chassis_id, > - br_int, &transport_zones); > + br_int, &transport_zones, > + &chassis_private, > + chassis_liveness_interval); > } > > if (br_int) { > @@ -2164,6 +2193,10 @@ main(int argc, char *argv[]) > ofctrl_wait(); > pinctrl_wait(ovnsb_idl_txn); > } > + > + if (chassis) { > + chassis_wait(chassis_liveness_interval); > + } > } > > unixctl_server_run(unixctl); > @@ -2232,10 +2265,17 @@ main(int argc, char *argv[]) > ? chassis_lookup_by_name(sbrec_chassis_by_name, > chassis_id) > : NULL); > > + const struct sbrec_chassis_private *chassis_private > + = (chassis_id > + ? chassis_private_lookup_by_name( > + sbrec_chassis_private_by_name, chassis_id) > + : NULL); > + > /* Run all of the cleanup functions, even if one of them returns > * false. We're done if all of them return true. */ > done = binding_cleanup(ovnsb_idl_txn, port_binding_table, > chassis); > - done = chassis_cleanup(ovnsb_idl_txn, chassis) && done; > + done = chassis_cleanup(ovnsb_idl_txn, chassis, > + chassis_private) && done; > done = encaps_cleanup(ovs_idl_txn, br_int) && done; > done = igmp_group_cleanup(ovnsb_idl_txn, sbrec_igmp_group) && > done; > if (done) { > diff --git a/lib/chassis-index.c b/lib/chassis-index.c > index 39066f4cc..d9bbba8e4 100644 > --- a/lib/chassis-index.c > +++ b/lib/chassis-index.c > @@ -40,6 +40,32 @@ chassis_lookup_by_name(struct ovsdb_idl_index > *sbrec_chassis_by_name, > return retval; > } > > +struct ovsdb_idl_index * > +chassis_private_index_create(struct ovsdb_idl *idl) > +{ > + return ovsdb_idl_index_create1( > + idl, &sbrec_chassis_private_col_name); > +} > + > +/* Finds and returns the chassis with the given 'name', or NULL if no such > + * chassis exists. */ > +const struct sbrec_chassis_private * > +chassis_private_lookup_by_name( > + struct ovsdb_idl_index *sbrec_chassis_private_by_name, > + const char *name) > +{ > + struct sbrec_chassis_private *target = \ > + sbrec_chassis_private_index_init_row(sbrec_chassis_private_by_name); > + sbrec_chassis_private_index_set_name(target, name); > + > + struct sbrec_chassis_private *retval = sbrec_chassis_private_index_find( > + sbrec_chassis_private_by_name, target); > + > + sbrec_chassis_private_index_destroy_row(target); > + > + return retval; > +} > + > struct ovsdb_idl_index * > ha_chassis_group_index_create(struct ovsdb_idl *idl) > { > diff --git a/lib/chassis-index.h b/lib/chassis-index.h > index 302e5f0fd..77d2e1f99 100644 > --- a/lib/chassis-index.h > +++ b/lib/chassis-index.h > @@ -23,6 +23,11 @@ struct ovsdb_idl_index *chassis_index_create(struct > ovsdb_idl *); > const struct sbrec_chassis *chassis_lookup_by_name( > struct ovsdb_idl_index *sbrec_chassis_by_name, const char *name); > > +struct ovsdb_idl_index *chassis_private_index_create(struct ovsdb_idl *); > + > +const struct sbrec_chassis_private *chassis_private_lookup_by_name( > + struct ovsdb_idl_index *sbrec_chassis_private_by_name, const char *name); > + > struct ovsdb_idl_index *ha_chassis_group_index_create(struct ovsdb_idl *idl); > const struct sbrec_ha_chassis_group *ha_chassis_group_lookup_by_name( > struct ovsdb_idl_index *sbrec_ha_chassis_grp_by_name, const char *name); > diff --git a/northd/ovn-northd.c b/northd/ovn-northd.c > index 2580b4ec9..1b16f5ef6 100644 > --- a/northd/ovn-northd.c > +++ b/northd/ovn-northd.c > @@ -11555,6 +11555,10 @@ main(int argc, char *argv[]) > ovsdb_idl_add_column(ovnsb_idl_loop.idl, &sbrec_chassis_col_name); > ovsdb_idl_add_column(ovnsb_idl_loop.idl, > &sbrec_chassis_col_external_ids); > > + ovsdb_idl_add_table(ovnsb_idl_loop.idl, &sbrec_table_chassis_private); > + add_column_noalert(ovnsb_idl_loop.idl, > + &sbrec_chassis_private_col_external_ids); > + > ovsdb_idl_add_table(ovnsb_idl_loop.idl, &sbrec_table_ha_chassis); > add_column_noalert(ovnsb_idl_loop.idl, > &sbrec_ha_chassis_col_chassis); > diff --git a/ovn-sb.ovsschema b/ovn-sb.ovsschema > index d89f8dbbb..88da05849 100644 > --- a/ovn-sb.ovsschema > +++ b/ovn-sb.ovsschema > @@ -1,7 +1,7 @@ > { > "name": "OVN_Southbound", > - "version": "2.7.0", > - "cksum": "4286723485 21693", > + "version": "2.8.0", > + "cksum": "293994447 22064", > "tables": { > "SB_Global": { > "columns": { > @@ -43,6 +43,15 @@ > "max": "unlimited"}}}, > "isRoot": true, > "indexes": [["name"]]}, > + "Chassis_Private": { > + "columns": { > + "name": {"type": "string"}, > + "alive_at": {"type": "string"}, > + "external_ids": { > + "type": {"key": "string", "value": "string", > + "min": 0, "max": "unlimited"}}}, > + "isRoot": true, > + "indexes": [["name"]]}, > "Encap": { > "columns": { > "type": {"type": {"key": { > diff --git a/ovn-sb.xml b/ovn-sb.xml > index 3ae9d4f92..8c9822fdb 100644 > --- a/ovn-sb.xml > +++ b/ovn-sb.xml > @@ -366,6 +366,41 @@ > </group> > </table> > > + <table name="Chassis_Private" title="Chassis Private"> > + <p> > + Each row in this table maintains per chassis private data that are > + accessed only by the owning chassis (write only) and ovn-northd, not by > + any other chassis. These data are stored in this separate table > instead > + of the <ref table="Chassis"/> table for performance considerations: > + the rows in this table can be conditionally monitored by chassises so > + that each chassis only get update notifications for its own row, to > avoid > + unnecessary chassis private data update flooding in a large scale > + deployment. (Future: this separation can be avoided if ovsdb > conditional > + monitoring is supported on a set of columns) > + </p> > + > + <column name="name"> > + The name of the chassis that owns these chassis-private data. > + </column> > + > + <column name="alive_at"> > + A timestamp indicating the last time ovn-controller > + signalized it's alive. For monitoring purposes, setting the > + chassis_liveness_interval configuration to the OVN_Northhbound's > + <ref table="NB_Global" column="options"/> with an integer value (in > + seconds) causes the <code>ovn-controller</code> to update this > + column with a timestamp every N seconds. If the value is not set > + or 0, then <code>ovn-controller</code> doesn't update this column. > + </column> > + > + <group title="Common Columns"> > + The overall purpose of these columns is described under <code>Common > + Columns</code> at the beginning of this document. > + > + <column name="external_ids"/> > + </group> > + </table> > + > <table name="Encap" title="Encapsulation Types"> > <p> > The <ref column="encaps" table="Chassis"/> column in the <ref > _______________________________________________ dev mailing list [email protected] https://mail.openvswitch.org/mailman/listinfo/ovs-dev
