Acked-By: Venkata Anil <[email protected]> I have reviewed and tested the patch. It looks good to me.
Thanks Anil On Mon, Mar 26, 2018 at 11:09 AM, Anil Venkata <[email protected]> wrote: > > On Sat, Mar 24, 2018 at 7:06 AM, Han Zhou <[email protected]> wrote: > >> Anil, thanks for the review! >> >> On Thu, Mar 22, 2018 at 11:12 PM, Anil Venkata <[email protected]> >> wrote: >> > >> > Is it possible to add a test for this patch i.e testing if hypervisors >> correctly updating >> > Chassis_NB_Cfg's nb_cfg when a new value is assigned to NB_Global's >> nb_cfg? >> > Please see other comments inline >> > >> This mechanism has been used in some test cases (grep "wait=hv"), and >> this change doesn't change the expected behavior, so I didn't feel >> necessary to add new test cases. The impact of this change is basically the >> performance in a large scale environment. I tested it in ovn-scale-test, >> but it seems not feasible to add test cases here. >> >> > I have also manually tested the patch(i.e HV not receiving "update2" call > when other HVs update Chassis_NB_Cfg's nb_cfg). Thanks > > >> > Thanks >> > Anil >> > >> > On Thu, Mar 22, 2018 at 4:57 AM, Han Zhou <[email protected]> wrote: >> >> >> >> nb_cfg as a mechanism to "ping" OVN control plane is very useful >> >> in many ways. However, the current implementation will trigger >> >> update notifications flooding in the whole control plane. Each >> >> HV updates to SB the nb_cfg number and all these updates are >> >> notified to all the other HVs, which is O(n^2). Although updates >> >> are batched in fewers notifications than n^2, it still generates >> >> significant load on SB DB and also on ovn-controllers. >> >> >> >> To solve this problem and make the mechanism more useful in large >> >> scale producation deployment, this patch separates the per HV >> >> nb_cfg data in SB from the Chassis table to a separate table, so >> >> that each HV can conditionally monitor and get updates only for >> >> its own record. >> >> >> >> Test result shows great improvement: >> >> In a test environment with 1K sandbox HVs, and 10K ports created >> >> on 40 lswitches and 8 lrouters, do the sync test when the system >> >> is idle, with command: >> >> >> >> time ovn-nbctl --wait=hv sync >> >> >> >> Original result: >> >> real 4m52.926s >> >> user 0m0.328s >> >> sys 0m0.004s >> >> >> >> With this patch: >> >> real 0m11.405s >> >> user 0m0.316s >> >> sys 0m0.016s >> >> >> >> Signed-off-by: Han Zhou <[email protected]> >> >> --- >> >> >> >> Notes: >> >> v1 -> v2: >> >> - keep the old column in chassis table to avoid breaking upgrading >> >> - avoid the attempt of adding redundant entry during startup >> >> >> >> ovn/controller/chassis.c | 35 ++++++++++++++++++++++++++++++ >> ++++- >> >> ovn/controller/chassis.h | 9 ++++++--- >> >> ovn/controller/ovn-controller.c | 21 ++++++++++++++++----- >> >> ovn/northd/ovn-northd.c | 21 ++++++++++++++++++--- >> >> ovn/ovn-sb.ovsschema | 8 +++++++- >> >> ovn/ovn-sb.xml | 26 ++++++++++++++++++++++---- >> >> 6 files changed, 103 insertions(+), 17 deletions(-) >> >> >> >> diff --git a/ovn/controller/chassis.c b/ovn/controller/chassis.c >> >> index 6b5286a..6092dcc 100644 >> >> --- a/ovn/controller/chassis.c >> >> +++ b/ovn/controller/chassis.c >> >> @@ -72,12 +72,28 @@ get_cms_options(const struct smap *ext_ids) >> >> return smap_get_def(ext_ids, "ovn-cms-options", ""); >> >> } >> >> >> >> +static const struct sbrec_chassis_nb_cfg * >> >> +find_chassis_nb_cfg(struct ovsdb_idl *ovnsb_idl, const char >> *chassis_id) >> >> +{ >> >> + const struct sbrec_chassis_nb_cfg *chassis_nb_cfg_rec; >> >> + >> >> + SBREC_CHASSIS_NB_CFG_FOR_EACH (chassis_nb_cfg_rec, ovnsb_idl) { >> >> + if (!strcmp(chassis_nb_cfg_rec->chassis_name, chassis_id)) { >> >> + break; >> >> + } >> >> + } >> >> + >> >> + return chassis_nb_cfg_rec; >> >> +} >> >> + >> >> /* Returns this chassis's Chassis record, if it is available and is >> currently >> >> * amenable to a transaction. */ >> >> const struct sbrec_chassis * >> >> chassis_run(struct controller_ctx *ctx, const char *chassis_id, >> >> - const struct ovsrec_bridge *br_int) >> >> + const struct ovsrec_bridge *br_int, >> >> + const struct sbrec_chassis_nb_cfg **nb_cfg) >> >> { >> >> + *nb_cfg = NULL; >> >> if (!ctx->ovnsb_idl_txn) { >> >> return NULL; >> >> } >> >> @@ -135,8 +151,17 @@ chassis_run(struct controller_ctx *ctx, const >> char *chassis_id, >> >> ds_chomp(&iface_types, ','); >> >> const char *iface_types_str = ds_cstr(&iface_types); >> >> >> >> + const struct sbrec_chassis_nb_cfg *chassis_nb_cfg_rec >> >> + = find_chassis_nb_cfg(ctx->ovnsb_idl, chassis_id); >> >> + if (!chassis_nb_cfg_rec) { >> >> + chassis_nb_cfg_rec = sbrec_chassis_nb_cfg_insert(ct >> x->ovnsb_idl_txn); >> >> + sbrec_chassis_nb_cfg_set_chassis_name(chassis_nb_cfg_rec, >> chassis_id); >> >> + } >> >> + *nb_cfg = chassis_nb_cfg_rec; >> >> + >> > >> > >> > Can you please create this record after creation of chassis record i.e >> > move "sbrec_chassis_nb_cfg_insert" after >> > https://github.com/openvswitch/ovs/blob/master/ovn/ >> controller/chassis.c#L223 >> > >> The purpose is to make sure it is created, so the order seems doesn't >> matter. In fact they will be added in the same ovsdb transaction as a >> result. >> > > Ok > > >> >> >> const struct sbrec_chassis *chassis_rec >> >> = get_chassis(ctx->ovnsb_idl, chassis_id); >> >> + >> >> const char *encap_csum = smap_get_def(&cfg->external_ids, >> >> "ovn-encap-csum", "true"); >> >> if (chassis_rec) { >> >> @@ -256,6 +281,14 @@ chassis_cleanup(struct controller_ctx *ctx, >> >> "ovn-controller: unregistering >> chassis '%s'", >> >> chassis_rec->name); >> >> sbrec_chassis_delete(chassis_rec); >> >> + const struct sbrec_chassis_nb_cfg *chassis_nb_cfg_rec >> >> + = find_chassis_nb_cfg(ctx->ovnsb_idl, chassis_rec->name); >> >> + if (chassis_nb_cfg_rec) { >> >> + sbrec_chassis_nb_cfg_delete(chassis_nb_cfg_rec); >> >> + } else { >> >> + VLOG_WARN("Chassis_NB_Cfg record didn't exist for chassis >> '%s'", >> >> + chassis_rec->name); >> >> + } >> >> } >> >> return false; >> >> } >> >> diff --git a/ovn/controller/chassis.h b/ovn/controller/chassis.h >> >> index 2cc47fc..5ecf7f4 100644 >> >> --- a/ovn/controller/chassis.h >> >> +++ b/ovn/controller/chassis.h >> >> @@ -17,6 +17,7 @@ >> >> #define OVN_CHASSIS_H 1 >> >> >> >> #include <stdbool.h> >> >> +#include "ovn/lib/ovn-sb-idl.h" >> >> >> >> struct controller_ctx; >> >> struct ovsdb_idl; >> >> @@ -24,9 +25,11 @@ struct ovsrec_bridge; >> >> struct sbrec_chassis; >> >> >> >> void chassis_register_ovs_idl(struct ovsdb_idl *); >> >> -const struct sbrec_chassis *chassis_run(struct controller_ctx *, >> >> - const char *chassis_id, >> >> - const struct ovsrec_bridge >> *br_int); >> >> +const struct sbrec_chassis *chassis_run( >> >> + struct controller_ctx *, >> >> + const char *chassis_id, >> >> + const struct ovsrec_bridge *br_int, >> >> + const struct sbrec_chassis_nb_cfg **nb_cfg); >> >> bool chassis_cleanup(struct controller_ctx *, const struct >> sbrec_chassis *); >> >> >> >> #endif /* ovn/chassis.h */ >> >> diff --git a/ovn/controller/ovn-controller.c >> b/ovn/controller/ovn-controller.c >> >> index 7592bda..b2eb03b 100644 >> >> --- a/ovn/controller/ovn-controller.c >> >> +++ b/ovn/controller/ovn-controller.c >> >> @@ -150,6 +150,7 @@ update_sb_monitors(struct ovsdb_idl *ovnsb_idl, >> >> struct ovsdb_idl_condition mb = OVSDB_IDL_CONDITION_INIT(&mb); >> >> struct ovsdb_idl_condition mg = OVSDB_IDL_CONDITION_INIT(&mg); >> >> struct ovsdb_idl_condition dns = OVSDB_IDL_CONDITION_INIT(&dns); >> >> + struct ovsdb_idl_condition nbcfg = OVSDB_IDL_CONDITION_INIT(&nbcf >> g); >> >> sbrec_port_binding_add_clause_type(&pb, OVSDB_F_EQ, "patch"); >> >> /* XXX: We can optimize this, if we find a way to only monitor >> >> * ports that have a Gateway_Chassis that point's to our own >> >> @@ -172,6 +173,12 @@ update_sb_monitors(struct ovsdb_idl *ovnsb_idl, >> >> sbrec_port_binding_add_clause_options(&pb, OVSDB_F_INCLUDES, >> &l2); >> >> const struct smap l3 = SMAP_CONST1(&l3, "l3gateway-chassis", >> id); >> >> sbrec_port_binding_add_clause_options(&pb, OVSDB_F_INCLUDES, >> &l3); >> >> + >> >> + /* Monitors Chassis_NB_Cfg record for current chassis only */ >> >> + sbrec_chassis_nb_cfg_add_clause_chassis_name(&nbcfg, >> OVSDB_F_EQ, >> >> + chassis->name); >> >> + } else { >> >> + ovsdb_idl_condition_add_clause_true(&nbcfg); >> >> } >> >> if (local_ifaces) { >> >> const char *name; >> >> @@ -198,11 +205,13 @@ update_sb_monitors(struct ovsdb_idl *ovnsb_idl, >> >> sbrec_mac_binding_set_condition(ovnsb_idl, &mb); >> >> sbrec_multicast_group_set_condition(ovnsb_idl, &mg); >> >> sbrec_dns_set_condition(ovnsb_idl, &dns); >> >> + sbrec_chassis_nb_cfg_set_condition(ovnsb_idl, &nbcfg); >> >> ovsdb_idl_condition_destroy(&pb); >> >> ovsdb_idl_condition_destroy(&lf); >> >> ovsdb_idl_condition_destroy(&mb); >> >> ovsdb_idl_condition_destroy(&mg); >> >> ovsdb_idl_condition_destroy(&dns); >> >> + ovsdb_idl_condition_destroy(&nbcfg); >> >> } >> >> >> >> static const struct ovsrec_bridge * >> >> @@ -621,7 +630,7 @@ main(int argc, char *argv[]) >> >> create_ovnsb_indexes(ovnsb_idl_loop.idl); >> >> lport_init(ovnsb_idl_loop.idl); >> >> >> >> - ovsdb_idl_omit_alert(ovnsb_idl_loop.idl, >> &sbrec_chassis_col_nb_cfg); >> >> + ovsdb_idl_omit_alert(ovnsb_idl_loop.idl, >> &sbrec_chassis_nb_cfg_col_nb_cfg); >> >> update_sb_monitors(ovnsb_idl_loop.idl, NULL, NULL, NULL); >> >> ovsdb_idl_get_initial_snapshot(ovnsb_idl_loop.idl); >> >> >> >> @@ -685,8 +694,9 @@ main(int argc, char *argv[]) >> >> chassis_index_init(&chassis_index, ctx.ovnsb_idl); >> >> >> >> const struct sbrec_chassis *chassis = NULL; >> >> + const struct sbrec_chassis_nb_cfg *chassis_nb_cfg = NULL; >> >> if (chassis_id) { >> >> - chassis = chassis_run(&ctx, chassis_id, br_int); >> >> + chassis = chassis_run(&ctx, chassis_id, br_int, >> &chassis_nb_cfg); >> >> encaps_run(&ctx, br_int, chassis_id); >> >> bfd_calculate_active_tunnels(br_int, &active_tunnels); >> >> binding_run(&ctx, br_int, chassis, >> >> @@ -730,10 +740,11 @@ main(int argc, char *argv[]) >> >> >> >> hmap_destroy(&flow_table); >> >> } >> >> - if (ctx.ovnsb_idl_txn) { >> >> + if (ctx.ovnsb_idl_txn && chassis_nb_cfg) { >> >> int64_t cur_cfg = ofctrl_get_cur_cfg(); >> >> - if (cur_cfg && cur_cfg != chassis->nb_cfg) { >> >> - sbrec_chassis_set_nb_cfg(chassis, cur_cfg); >> >> + if (cur_cfg && cur_cfg != chassis_nb_cfg->nb_cfg) >> { >> >> + sbrec_chassis_nb_cfg_set_nb_c >> fg(chassis_nb_cfg, >> >> + cur_cfg); >> >> } >> >> } >> >> } >> >> diff --git a/ovn/northd/ovn-northd.c b/ovn/northd/ovn-northd.c >> >> index 3963810..9f4e0bc 100644 >> >> --- a/ovn/northd/ovn-northd.c >> >> +++ b/ovn/northd/ovn-northd.c >> >> @@ -6483,6 +6483,11 @@ static const char *rbac_chassis_auth[] = >> >> static const char *rbac_chassis_update[] = >> >> {"nb_cfg", "external_ids", "encaps", "vtep_logical_switches"}; >> >> >> >> +static const char *rbac_chassis_nb_cfg_auth[] = >> >> + {"chassis_name"}; >> >> +static const char *rbac_chassis_nb_cfg_update[] = >> >> + {"nb_cfg"}; >> >> + >> >> static const char *rbac_encap_auth[] = >> >> {"chassis_name"}; >> >> static const char *rbac_encap_update[] = >> >> @@ -6516,6 +6521,14 @@ static struct rbac_perm_cfg { >> >> .n_update = ARRAY_SIZE(rbac_chassis_update), >> >> .row = NULL >> >> },{ >> >> + .table = "Chassis_NB_Cfg", >> >> + .auth = rbac_chassis_nb_cfg_auth, >> >> + .n_auth = ARRAY_SIZE(rbac_chassis_nb_cfg_auth), >> >> + .insdel = true, >> >> + .update = rbac_chassis_nb_cfg_update, >> >> + .n_update = ARRAY_SIZE(rbac_chassis_nb_cfg_update), >> >> + .row = NULL >> >> + },{ >> >> .table = "Encap", >> >> .auth = rbac_encap_auth, >> >> .n_auth = ARRAY_SIZE(rbac_encap_auth), >> >> @@ -6676,9 +6689,9 @@ update_northbound_cfg(struct northd_context *ctx, >> >> /* Update northbound hv_cfg if appropriate. */ >> >> if (nbg) { >> >> /* Find minimum nb_cfg among all chassis. */ >> >> - const struct sbrec_chassis *chassis; >> >> + const struct sbrec_chassis_nb_cfg *chassis; >> >> int64_t hv_cfg = nbg->nb_cfg; >> >> - SBREC_CHASSIS_FOR_EACH (chassis, ctx->ovnsb_idl) { >> >> + SBREC_CHASSIS_NB_CFG_FOR_EACH (chassis, ctx->ovnsb_idl) { >> >> if (chassis->nb_cfg < hv_cfg) { >> >> hv_cfg = chassis->nb_cfg; >> >> } >> >> @@ -6911,9 +6924,11 @@ main(int argc, char *argv[]) >> >> add_column_noalert(ovnsb_idl_loop.idl, >> &sbrec_rbac_permission_col_update); >> >> >> >> ovsdb_idl_add_table(ovnsb_idl_loop.idl, &sbrec_table_chassis); >> >> - ovsdb_idl_add_column(ovnsb_idl_loop.idl, >> &sbrec_chassis_col_nb_cfg); >> >> ovsdb_idl_add_column(ovnsb_idl_loop.idl, >> &sbrec_chassis_col_name); >> >> >> >> + ovsdb_idl_add_table(ovnsb_idl_loop.idl, >> &sbrec_table_chassis_nb_cfg); >> >> + ovsdb_idl_add_column(ovnsb_idl_loop.idl, >> &sbrec_chassis_nb_cfg_col_nb_cfg); >> >> + >> >> /* Ensure that only a single ovn-northd is active in the >> deployment by >> >> * acquiring a lock called "ovn_northd" on the southbound database >> >> * and then only performing DB transactions if the lock is held. >> */ >> >> diff --git a/ovn/ovn-sb.ovsschema b/ovn/ovn-sb.ovsschema >> >> index 2abcc6b..39be87d 100644 >> >> --- a/ovn/ovn-sb.ovsschema >> >> +++ b/ovn/ovn-sb.ovsschema >> >> @@ -1,7 +1,7 @@ >> >> { >> >> "name": "OVN_Southbound", >> >> "version": "1.15.0", >> >> - "cksum": "70426956 13327", >> >> + "cksum": "1669322676 13561", >> >> "tables": { >> >> "SB_Global": { >> >> "columns": { >> >> @@ -36,6 +36,12 @@ >> >> "min": 0, "max": "unlimited"}}}, >> >> "isRoot": true, >> >> "indexes": [["name"]]}, >> >> + "Chassis_NB_Cfg": { >> >> + "columns": { >> >> + "chassis_name": {"type": "string"}, >> > >> > >> > Can we have chassis_id instead of chassis_name and add a reference to >> chassis table, >> > like how Gateway_Chassis table is referencing Chassis table >> > https://github.com/openvswitch/ovs/blob/master/ovn/ovn-sb. >> ovsschema#L258 >> > >> >> Good point. In fact I was doing the same way when firstly trying to >> implement this, but I figured out then for the RBAC to work (so that the >> owner HV can update its own row) we need an ID that client can tell, so I >> followed the same practics as how the Encap table is doing. >> > > Agree > > >> > >> > >> >> + "nb_cfg": {"type": {"key": "integer"}}}, >> >> + "isRoot": true, >> >> + "indexes": [["chassis_name"]]}, >> >> "Encap": { >> >> "columns": { >> >> "type": {"type": {"key": { >> >> diff --git a/ovn/ovn-sb.xml b/ovn/ovn-sb.xml >> >> index 6a8b818..40e94b5 100644 >> >> --- a/ovn/ovn-sb.xml >> >> +++ b/ovn/ovn-sb.xml >> >> @@ -212,10 +212,8 @@ >> >> </column> >> >> >> >> <column name="nb_cfg"> >> >> - Sequence number for the configuration. When >> <code>ovn-controller</code> >> >> - updates the configuration of a chassis from the contents of the >> >> - southbound database, it copies <ref table="SB_Global" >> column="nb_cfg"/> >> >> - from the <ref table="SB_Global"/> table into this column. >> >> + Deprecated. This column is replaced by the <ref >> table="Chassis_NB_Cfg" >> >> + column="nb_cfg"/> column of the <ref table="Chassis_NB_Cfg"/> >> table. >> >> </column> >> >> >> >> <column name="external_ids" key="ovn-bridge-mappings"> >> >> @@ -291,6 +289,26 @@ >> >> </group> >> >> </table> >> >> >> >> + <table name="Chassis_NB_Cfg" title="Chassis NB Config"> >> >> + <p> >> >> + Each row in this table maintains the nb_cfg number that the >> corresponding >> >> + chassis with <ref table="Chassis" column="name"/> in <ref >> table="Chassis"/> >> >> + has processed. This information is stored in this separate >> table for >> >> + performance considerations. >> >> + </p> >> >> + >> >> + <column name="nb_cfg"> >> >> + Sequence number for the configuration. When >> <code>ovn-controller</code> >> >> + updates the configuration of a chassis from the contents of the >> >> + southbound database, it copies <ref table="SB_Global" >> column="nb_cfg"/> >> >> + from the <ref table="SB_Global"/> table into this column. >> >> + </column> >> >> + >> >> + <column name="chassis_name"> >> >> + The name of the chassis that created this encap. >> >> + </column> >> >> + </table> >> >> + >> >> <table name="Encap" title="Encapsulation Types"> >> >> <p> >> >> The <ref column="encaps" table="Chassis"/> column in the <ref >> >> -- >> >> 2.1.0 >> >> >> >> _______________________________________________ >> >> dev mailing list >> >> [email protected] >> >> https://mail.openvswitch.org/mailman/listinfo/ovs-dev >> > >> > >> > > _______________________________________________ dev mailing list [email protected] https://mail.openvswitch.org/mailman/listinfo/ovs-dev
