On Wed, Jun 28, 2023 at 12:07 PM Dumitru Ceara <[email protected]> wrote:
>
> On 6/28/23 19:28, Han Zhou wrote:
> > On Wed, Jun 28, 2023 at 2:14 AM Dumitru Ceara <[email protected]> wrote:
> >>
> >> On 6/27/23 19:47, Han Zhou wrote:
> >>> On Tue, Jun 27, 2023 at 6:51 AM Dumitru Ceara <[email protected]> wrote:
> >>>>
> >>>> On 6/8/23 07:55, Han Zhou wrote:
> >>>>> On Wed, Jun 7, 2023 at 12:39 PM Numan Siddique <[email protected]> wrote:
> >>>>>>
> >>>>>> On Fri, Jun 2, 2023 at 12:13 AM Han Zhou <[email protected]> wrote:
> >>>>>>>
> >>>>>>> This patch introduces a change handler for NB logical_switch within the
> >>>>>>> 'northd' node. It specifically handles cases where logical switch ports
> >>>>>>> in the tracked logical switches are changed (added/updated/deleted).
> >>>>>>> Only regular logical switch ports - which are common for VMs/Pods - are
> >>>>>>> addressed. For other scenarios, it reverts to recompute.
> >>>>>>>
> >>>>>>> This update avoids recompute in the northd node (especially the
> >>>>>>> resource-intensive build_ports()) for NB changes of VIF
> >>>>>>> add/update/delete. However, it does not eliminate the need for lflow
> >>>>>>> recompute.
> >>>>>>>
> >>>>>>> Below are the performance test results simulating an ovn-k8s topology
> >>>>>>> of 500 nodes x 50 lsp per node:
> >>>>>>>
> >>>>>>> Before:
> >>>>>>> ovn-nbctl --wait=hv --print-wait-time lsp-del lsp_1_0001_01
> >>>>>>> ovn-northd completion: 955ms
> >>>>>>> ovn-nbctl --wait=hv --print-wait-time lsp-add ls_1_0001 lsp_1_0001_01
> >>>>>>>   -- lsp-set-addresses lsp_1_0001_01 "ff:f1:bb:00:01:01 1.0.1.101"
> >>>>>>>   -- lsp-set-port-security lsp_1_0001_01 "ff:f1:bb:00:01:01 1.0.1.101"
> >>>>>>> ovn-northd completion: 919ms
> >>>>>>>
> >>>>>>> After:
> >>>>>>> ovn-nbctl --wait=hv --print-wait-time lsp-del lsp_1_0001_01
> >>>>>>> ovn-northd completion: 776ms
> >>>>>>> ovn-nbctl --wait=hv --print-wait-time lsp-add ls_1_0001 lsp_1_0001_01
> >>>>>>>   -- lsp-set-addresses lsp_1_0001_01 "ff:f1:bb:00:01:01 1.0.1.101"
> >>>>>>>   -- lsp-set-port-security lsp_1_0001_01 "ff:f1:bb:00:01:01 1.0.1.101"
> >>>>>>> ovn-northd completion: 773ms
> >>>>>>>
> >>>>>>> Both addition and deletion show ~20% reduction of ovn-northd completion
> >>>>>>> time. Note: the test uses only 1 thread of ovn-northd for flow
> >>>>>>> recompute. Using multithreading should show a larger percentage of
> >>>>>>> improvement.
> >>>>>>>
> >>>>>>> Signed-off-by: Han Zhou <[email protected]>
> >>>>>>> Reviewed-by: Ales Musil <[email protected]>
> >>>>>>
> >>>>>> A few nits below.
> >>>>>>
> >>>>>> Acked-by: Numan Siddique <[email protected]>
> >>>>>>
> >>>>>> Numan
> >>>>>>
> >>>>
> >>>> Hi Han, Numan,
> >>>>
> >>>> First of all, sorry for the late review; I had these patches on my list
> >>>> for a while and only got to them after they were merged.
> >>>>
> >>>
> >>> Hi Dumitru, thanks for your diligent review; comments are always welcome
> >>> any time.
> >>>
> >>>>>>
> >>>>>>> ---
> >>>>>>>  lib/ovn-util.c           |  15 ++
> >>>>>>>  lib/ovn-util.h           |   1 +
> >>>>>>>  northd/en-northd.c       | 130 ++++++---
> >>>>>>>  northd/en-northd.h       |   2 +
> >>>>>>>  northd/inc-proc-northd.c |  12 +-
> >>>>>>>  northd/northd.c          | 567 +++++++++++++++++++++++++++++++++------
> >>>>>>>  northd/northd.h          |  27 +-
> >>>>>>>  tests/ovn-northd.at      |  58 ++++
> >>>>>>>  8 files changed, 685 insertions(+), 127 deletions(-)
> >>>>>>>
> >>>>
> >>>> [...]
> >>>>
> >>>>>>> +bool
> >>>>>>> +northd_handle_ls_changes(struct ovsdb_idl_txn *ovnsb_idl_txn,
> >>>>>>> +                         const struct northd_input *ni,
> >>>>>>> +                         struct northd_data *nd)
> >>>>>>> +{
> >>>>>>> +    const struct nbrec_logical_switch *changed_ls;
> >>>>>>> +    struct ls_change *ls_change = NULL;
> >>>>>>> +
> >>>>>>> +    NBREC_LOGICAL_SWITCH_TABLE_FOR_EACH_TRACKED (changed_ls,
> >>>>>>> +                                     ni->nbrec_logical_switch_table) {
> >>>>>>> +        ls_change = NULL;
> >>>>>>> +        if (nbrec_logical_switch_is_new(changed_ls) ||
> >>>>>>> +            nbrec_logical_switch_is_deleted(changed_ls)) {
> >>>>>>> +            goto fail;
> >>>>>>> +        }
> >>>>>>> +        struct ovn_datapath *od = ovn_datapath_find(
> >>>>>>> +            &nd->ls_datapaths.datapaths,
> >>>>>>> +            &changed_ls->header_.uuid);
> >>>>>>> +        if (!od) {
> >>>>>>> +            static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(1, 1);
> >>>>>>> +            VLOG_WARN_RL(&rl, "Internal error: a tracked updated LS doesn't "
> >>>>>>> +                         "exist in ls_datapaths: "UUID_FMT,
> >>>>>>> +                         UUID_ARGS(&changed_ls->header_.uuid));
> >>>>>>> +            goto fail;
> >>>>>>> +        }
> >>>>>>> +
> >>>>>>> +        /* Now only able to handle lsp changes. */
> >>>>>>> +        if (check_ls_changes_other_than_lsp(changed_ls)) {
> >>>>>>> +            goto fail;
> >>>>>>> +        }
> >>>>>>> +
> >>>>>>> +        ls_change = xzalloc(sizeof *ls_change);
> >>>>>>> +        ls_change->od = od;
> >>>>>>> +        ovs_list_init(&ls_change->added_ports);
> >>>>>>> +        ovs_list_init(&ls_change->deleted_ports);
> >>>>>>> +        ovs_list_init(&ls_change->updated_ports);
> >>>>>>> +
> >>>>>>> +        struct ovn_port *op;
> >>>>>>> +        HMAP_FOR_EACH (op, dp_node, &od->ports) {
> >>>>>>> +            op->visited = false;
> >>>>>>> +        }
> >>>>>>> +
> >>>>>>> +        /* Compare the individual ports in the old and new Logical
> >>>>>>> +         * Switches. */
> >>>>>>> +        for (size_t j = 0; j < changed_ls->n_ports; ++j) {
> >>>>>>> +            struct nbrec_logical_switch_port *new_nbsp = changed_ls->ports[j];
> >>>>>>> +            op = ovn_port_find_in_datapath(od, new_nbsp->name);
> >>>>>>> +
> >>>>>>> +            if (!op) {
> >>>>>>> +                if (!lsp_can_be_inc_processed(new_nbsp)) {
> >>>>>>> +                    goto fail;
> >>>>>>> +                }
> >>>>>>> +                op = ls_port_create(ovnsb_idl_txn, &nd->ls_ports,
> >>>>>>> +                                    new_nbsp->name, new_nbsp, od, NULL,
> >>>>>>> +                                    ni->sbrec_mirror_table,
> >>>>>>> +                                    ni->sbrec_chassis_table,
> >>>>>>> +                                    ni->sbrec_chassis_by_name,
> >>>>>>> +                                    ni->sbrec_chassis_by_hostname);
> >>>>>>> +                if (!op) {
> >>>>>>> +                    goto fail;
> >>>>>>> +                }
> >>>>>>> +                ovs_list_push_back(&ls_change->added_ports,
> >>>>>>> +                                   &op->list);
> >>>>>>> +            } else if (ls_port_has_changed(op->nbsp, new_nbsp)) {
> >>>>>>> +                /* Existing port updated */
> >>>>>>> +                bool temp = false;
> >>>>>>> +                if (lsp_is_type_changed(op->sb, new_nbsp, &temp) ||
> >>>>>>> +                    !op->lsp_can_be_inc_processed ||
> >>>>>>> +                    !lsp_can_be_inc_processed(new_nbsp)) {
> >>>>>>> +                    goto fail;
> >>>>>>> +                }
> >>>>>>> +                const struct sbrec_port_binding *sb = op->sb;
> >>>>>>> +                if (sset_contains(&nd->svc_monitor_lsps, new_nbsp->name)) {
> >>>>>>> +                    /* This port is used for svc monitor, which may be
> >>>>>>> +                     * impacted by this change. Fallback to recompute. */
> >>>>>>> +                    goto fail;
> >>>>>>> +                }
> >>>>>>> +                ovn_port_destroy(&nd->ls_ports, op);
> >>>>>>> +                op = ls_port_create(ovnsb_idl_txn, &nd->ls_ports,
> >>>>>>> +                                    new_nbsp->name, new_nbsp, od, sb,
> >>>>>>> +                                    ni->sbrec_mirror_table,
> >>>>>>> +                                    ni->sbrec_chassis_table,
> >>>>>>> +                                    ni->sbrec_chassis_by_name,
> >>>>>>> +                                    ni->sbrec_chassis_by_hostname);
> >>>>>>> +                if (!op) {
> >>>>>>> +                    goto fail;
> >>>>>>> +                }
> >>>>>>> +                ovs_list_push_back(&ls_change->updated_ports, &op->list);
> >>>>>>> +            }
> >>>>>>> +            op->visited = true;
> >>>>>>> +        }
> >>>>>>> +
> >>>>>>> +        /* Check for deleted ports */
> >>>>>>> +        HMAP_FOR_EACH_SAFE (op, dp_node, &od->ports) {
> >>>>>>> +            if (!op->visited) {
> >>>>>>> +                if (!op->lsp_can_be_inc_processed) {
> >>>>>>> +                    goto fail;
> >>>>>>> +                }
> >>>>>>> +                if (sset_contains(&nd->svc_monitor_lsps, op->key)) {
> >>>>>>> +                    /* This port was used for svc monitor, which may be
> >>>>>>> +                     * impacted by this deletion. Fallback to recompute. */
> >>>>>>> +                    goto fail;
> >>>>>>> +                }
> >>>>>>> +                ovs_list_push_back(&ls_change->deleted_ports,
> >>>>>>> +                                   &op->list);
> >>>>>>> +                hmap_remove(&nd->ls_ports, &op->key_node);
> >>>>>>> +                hmap_remove(&od->ports, &op->dp_node);
> >>>>>>> +                sbrec_port_binding_delete(op->sb);
> >>>>
> >>>> Quoting from ovsdb-idl.c:
> >>>>
> >>>>   /* Deletes 'row_' from its table.  May free 'row_', so it must not be
> >>>>    * accessed afterward.
> >>>>
> >>>> So, in theory, we shouldn't call delete here because we might end up
> >>>> failing for other ports/datapaths and falling back to recompute.
> >>>>
> >>>> As far as I can tell, only newly inserted rows may be deleted by the
> >>>> current IDL implementation, but I don't think it's safe to assume this
> >>>> will never change in the future.
> >>>
> >>> In general, when falling back to recompute in the middle of I-P, the
> >>> recompute function will do a full sync between desired states and
> >>> existing states.
> >>> Now that the existing states are partially synced during the I-P before
> >>> recompute, the recompute function will just see the synced part as
> >>> "already in sync" and only apply the rest of the changes. I think this
> >>> mechanism applies to add/update/delete the same way. Here for delete, if
> >>> the record is deleted from the IDL, it won't be found by the recompute
> >>> function, so there is no way to access it again. Did I miss something?
> >>>
> >>
> >> You're right, the ports we delete we first remove from the
> >> datapath->ports hmap. On the other hand, I think I spotted another
> >> leak.
> >>
> >> Deleted ports get pushed back to &ls_change->deleted_ports:
> >>
> >>         /* Check for deleted ports */
> >>         HMAP_FOR_EACH_SAFE (op, dp_node, &od->ports) {
> >>             if (!op->visited) {
> >>                 if (!op->lsp_can_be_inc_processed) {
> >>                     goto fail;
> >>                 }
> >>                 if (sset_contains(&nd->svc_monitor_lsps, op->key)) {
> >>                     /* This port was used for svc monitor, which may be
> >>                      * impacted by this deletion. Fallback to recompute. */
> >>                     goto fail;
> >>                 }
> >>                 ovs_list_push_back(&ls_change->deleted_ports,
> >>                                    &op->list);
> >>                 hmap_remove(&nd->ls_ports, &op->key_node);
> >>                 hmap_remove(&od->ports, &op->dp_node);
> >>                 sbrec_port_binding_delete(op->sb);
> >>                 delete_fdb_entry(ni->sbrec_fdb_by_dp_and_port, od->tunnel_key,
> >>                                  op->tunnel_key);
> >>             }
> >>         }
> >>
> >> [...]
> >>
> >> fail:
> >>     free(ls_change);
> >>
> >> If we deleted some ports and then got out of the loop early, then here we
> >> leak the ports stored in ls_change->deleted_ports. Or am I missing
> >> something?
> >
> > The next line under fail:, after free(ls_change), is:
> >     destroy_northd_data_tracked_changes(nd);
> >
> > The deleted ports will be destroyed here. Does this address your concern?
> >
>
> Not really, let me highlight the code I mean:
>
> https://github.com/ovn-org/ovn/blob/056f66e1db48c6d098ce3607ff596c1e3ce28dbb/northd/northd.c#L5132-L5171
>
> Now, if there are a couple of ports being deleted at the same time and
> the last one of them is used for svc monitor, then we remove these ports
> from the od's "ports" map and from "nd->ls_ports" and we add them to the
> "ls_change->deleted_ports" list (lines L5143-L5146). These ports are
> not referenced by anything else anymore except for
> "ls_change->deleted_ports". Also, they're not yet added to the northd
> data "nd". For the last one we break early because it's used for svc
> monitoring.
>
> We end up here:
>
> fail:
>     free(ls_change);
>     destroy_northd_data_tracked_changes(nd);
>     return false;
>
> The ports that were in "ls_change->deleted_ports" were not completely
> destroyed and are not reachable anymore, so they're leaked. Am I maybe
> missing a different cleanup path?
>

You are right!! Sorry for missing your point earlier. I will come up with
a fix and a test case to make sure this scenario is covered. Thanks for
pointing this out!

Han
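The fix Han mentions needs to tear down whatever was already moved onto
ls_change->deleted_ports before freeing the change record itself. A minimal,
self-contained C sketch of that cleanup pattern follows; the types and names
here (struct port, struct ls_change, ls_change_destroy, the destroyed_ports
counter) are simplified stand-ins for illustration, not OVN's actual
ovs_list/ovn_port API:

```c
#include <stdlib.h>

/* Simplified stand-ins for OVN's ovn_port and ls_change. */
struct port {
    int key;
    struct port *next;          /* stand-in for the ovs_list node */
};

struct ls_change {
    struct port *deleted_ports; /* ports already unlinked from od->ports */
};

static int destroyed_ports;     /* counts frees, for demonstration only */

static void
ls_change_add_deleted(struct ls_change *c, struct port *p)
{
    p->next = c->deleted_ports;
    c->deleted_ports = p;
}

/* On the failure path, free the ports accumulated so far too.  A bare
 * free(ls_change) would leak them, because nothing else references them
 * once they have been removed from od->ports and nd->ls_ports. */
static void
ls_change_destroy(struct ls_change *c)
{
    struct port *p = c->deleted_ports;
    while (p) {
        struct port *next = p->next;
        free(p);
        destroyed_ports++;
        p = next;
    }
    free(c);
}
```

The same idea applies to any "goto fail" path: whatever has been detached
from long-lived structures and parked on a temporary list must be destroyed
along with the list.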
> Regards,
> Dumitru
>
> > Thanks,
> > Han
> >
> >>
> >>>>
> >>>> We should probably accumulate all records that need to be deleted and
> >>>> delete them at the end of the whole I-P run. Do we need some I-P infra
> >>>> changes? Maybe users can provide a callback that can be called whenever
> >>>> an I-P run has finished?
> >>>>
> >>> I thought about this approach, but not for the reason above. I would do
> >>> the same for all DB changes, to avoid directly updating DBs in the I-P
> >>> engine. However, the change is going to be quite big, considering how the
> >>> current code base depends on the IDL. Changes to the IDL are all over the
> >>> place. In addition to the potential memory and CPU cost of maintaining the
> >>> delta, there may be other tricky things to handle, such as:
> >>> - Sometimes it is problematic to see the old data (instead of the updated
> >>>   data) in the IDL in the same round of I-P. (This has happened in
> >>>   ovn-controller due to an out-of-date IDL index, which was fixed
> >>>   accordingly.)
> >>> - Merging accumulated changes to the same DB record.
> >>>
> >>
> >> This is in line with an older proposal Mark had about abstracting out
> >> the IDL, I think. I agree it's a huge thing to do. On the other hand,
> >> the current IDL interactions that sometimes happen during I-P handlers
> >> and sometimes during recomputes worry me a bit too. It's hard to ensure
> >> that we don't introduce tricky bugs.
> >>
> >> Regards,
> >> Dumitru
> >>
> >>> So, I think it is better to avoid such a change unless it is really
> >>> necessary.
> >>>
> >>> Thanks,
> >>> Han
> >>>
> >>>> What do you think?
> >>>>
> >>>> Thanks,
> >>>> Dumitru
> >>>>
> >>>
> >>
> >
>

_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev
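As a rough illustration of the "accumulate DB deletions and apply them once
the I-P run has finished" idea discussed in the thread, here is a hedged,
self-contained sketch. Every name in it (pending_sb_ops,
pending_sb_ops_record, pending_sb_ops_commit) is invented for this example;
none of it is OVN's engine or IDL API:

```c
#include <stdlib.h>

/* A queue of row deletions recorded by change handlers instead of
 * calling sbrec_..._delete() directly on the IDL. */
struct pending_sb_ops {
    unsigned long long *rows;   /* stand-in for the rows' UUIDs */
    size_t n, allocated;
};

static void
pending_sb_ops_record(struct pending_sb_ops *q, unsigned long long row)
{
    if (q->n >= q->allocated) {
        q->allocated = q->allocated ? 2 * q->allocated : 4;
        q->rows = realloc(q->rows, q->allocated * sizeof *q->rows);
    }
    q->rows[q->n++] = row;
}

/* Called once, after all handlers (or a recompute) have finished, e.g.
 * from a "run finished" callback provided by the I-P engine.  This is
 * where the real row deletions would be issued inside the transaction,
 * so a mid-run fallback to recompute never observes half-applied
 * deletions.  Returns how many deletions were applied. */
static size_t
pending_sb_ops_commit(struct pending_sb_ops *q)
{
    size_t applied = q->n;
    q->n = 0;
    return applied;
}
```

As the thread notes, the real cost of this design is not the queue itself
but making every reader see a consistent view (old IDL data vs. queued
changes) within the same I-P round.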
