On 6/28/23 19:28, Han Zhou wrote:
> On Wed, Jun 28, 2023 at 2:14 AM Dumitru Ceara <[email protected]> wrote:
>>
>> On 6/27/23 19:47, Han Zhou wrote:
>>> On Tue, Jun 27, 2023 at 6:51 AM Dumitru Ceara <[email protected]> wrote:
>>>>
>>>> On 6/8/23 07:55, Han Zhou wrote:
>>>>> On Wed, Jun 7, 2023 at 12:39 PM Numan Siddique <[email protected]> wrote:
>>>>>>
>>>>>> On Fri, Jun 2, 2023 at 12:13 AM Han Zhou <[email protected]> wrote:
>>>>>>>
>>>>>>> This patch introduces a change handler for NB logical_switch within the
>>>>>>> 'northd' node. It specifically handles cases where logical switch ports
>>>>>>> in the tracked logical switches are changed (added/updated/deleted).
>>>>>>> Only regular logical switch ports - which are common for VMs/Pods - are
>>>>>>> addressed. For other scenarios, it reverts to recompute.
>>>>>>>
>>>>>>> This update avoids recompute in the northd node (especially the
>>>>>>> resource-intensive build_ports()) for NB changes of VIF
>>>>>>> add/update/delete. However, it does not eliminate the need for lflow
>>>>>>> recompute.
>>>>>>>
>>>>>>> Below are the performance test results simulating an ovn-k8s topology
>>>>>>> of 500 nodes x 50 lsp per node:
>>>>>>>
>>>>>>> Before:
>>>>>>> ovn-nbctl --wait=hv --print-wait-time lsp-del lsp_1_0001_01
>>>>>>> ovn-northd completion: 955ms
>>>>>>> ovn-nbctl --wait=hv --print-wait-time lsp-add ls_1_0001 lsp_1_0001_01
>>>>>>>     -- lsp-set-addresses lsp_1_0001_01 "ff:f1:bb:00:01:01 1.0.1.101"
>>>>>>>     -- lsp-set-port-security lsp_1_0001_01 "ff:f1:bb:00:01:01 1.0.1.101"
>>>>>>> ovn-northd completion: 919ms
>>>>>>>
>>>>>>> After:
>>>>>>> ovn-nbctl --wait=hv --print-wait-time lsp-del lsp_1_0001_01
>>>>>>> ovn-northd completion: 776ms
>>>>>>> ovn-nbctl --wait=hv --print-wait-time lsp-add ls_1_0001 lsp_1_0001_01
>>>>>>>     -- lsp-set-addresses lsp_1_0001_01 "ff:f1:bb:00:01:01 1.0.1.101"
>>>>>>>     -- lsp-set-port-security lsp_1_0001_01 "ff:f1:bb:00:01:01 1.0.1.101"
>>>>>>> ovn-northd completion: 773ms
>>>>>>>
>>>>>>> Both addition and deletion show ~20% reduction of ovn-northd completion
>>>>>>> time. Note: the test uses only 1 thread of ovn-northd for flow
>>>>>>> recompute. Using multithread should show a larger percentage of
>>>>>>> improvement.
>>>>>>>
>>>>>>> Signed-off-by: Han Zhou <[email protected]>
>>>>>>> Reviewed-by: Ales Musil <[email protected]>
>>>>>>
>>>>>> A few nits below.
>>>>>>
>>>>>> Acked-by: Numan Siddique <[email protected]>
>>>>>>
>>>>>> Numan
>>>>>>
>>>>
>>>> Hi Han, Numan,
>>>>
>>>> First of all, sorry for the late review, I had these patches on my list
>>>> for a while and only got to them after they were merged.
>>>>
>>>
>>> Hi Dumitru, thanks for your diligent review, and comments are always
>>> welcome any time.
>>>
>>>>>>
>>>>>>> ---
>>>>>>>  lib/ovn-util.c           |  15 ++
>>>>>>>  lib/ovn-util.h           |   1 +
>>>>>>>  northd/en-northd.c       | 130 ++++++---
>>>>>>>  northd/en-northd.h       |   2 +
>>>>>>>  northd/inc-proc-northd.c |  12 +-
>>>>>>>  northd/northd.c          | 567 +++++++++++++++++++++++++++++++++------
>>>>>>>  northd/northd.h          |  27 +-
>>>>>>>  tests/ovn-northd.at      |  58 ++++
>>>>>>>  8 files changed, 685 insertions(+), 127 deletions(-)
>>>>>>>
>>>>
>>>> [...]
>>>>
>>>>>>> +bool
>>>>>>> +northd_handle_ls_changes(struct ovsdb_idl_txn *ovnsb_idl_txn,
>>>>>>> +                         const struct northd_input *ni,
>>>>>>> +                         struct northd_data *nd)
>>>>>>> +{
>>>>>>> +    const struct nbrec_logical_switch *changed_ls;
>>>>>>> +    struct ls_change *ls_change = NULL;
>>>>>>> +
>>>>>>> +    NBREC_LOGICAL_SWITCH_TABLE_FOR_EACH_TRACKED (changed_ls,
>>>>>>> +                                     ni->nbrec_logical_switch_table) {
>>>>>>> +        ls_change = NULL;
>>>>>>> +        if (nbrec_logical_switch_is_new(changed_ls) ||
>>>>>>> +            nbrec_logical_switch_is_deleted(changed_ls)) {
>>>>>>> +            goto fail;
>>>>>>> +        }
>>>>>>> +        struct ovn_datapath *od = ovn_datapath_find(
>>>>>>> +            &nd->ls_datapaths.datapaths,
>>>>>>> +            &changed_ls->header_.uuid);
>>>>>>> +        if (!od) {
>>>>>>> +            static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(1, 1);
>>>>>>> +            VLOG_WARN_RL(&rl, "Internal error: a tracked updated LS "
>>>>>>> +                         "doesn't exist in ls_datapaths: "UUID_FMT,
>>>>>>> +                         UUID_ARGS(&changed_ls->header_.uuid));
>>>>>>> +            goto fail;
>>>>>>> +        }
>>>>>>> +
>>>>>>> +        /* Now only able to handle lsp changes. */
>>>>>>> +        if (check_ls_changes_other_than_lsp(changed_ls)) {
>>>>>>> +            goto fail;
>>>>>>> +        }
>>>>>>> +
>>>>>>> +        ls_change = xzalloc(sizeof *ls_change);
>>>>>>> +        ls_change->od = od;
>>>>>>> +        ovs_list_init(&ls_change->added_ports);
>>>>>>> +        ovs_list_init(&ls_change->deleted_ports);
>>>>>>> +        ovs_list_init(&ls_change->updated_ports);
>>>>>>> +
>>>>>>> +        struct ovn_port *op;
>>>>>>> +        HMAP_FOR_EACH (op, dp_node, &od->ports) {
>>>>>>> +            op->visited = false;
>>>>>>> +        }
>>>>>>> +
>>>>>>> +        /* Compare the individual ports in the old and new Logical
>>>>>>> +         * Switches */
>>>>>>> +        for (size_t j = 0; j < changed_ls->n_ports; ++j) {
>>>>>>> +            struct nbrec_logical_switch_port *new_nbsp =
>>>>>>> +                changed_ls->ports[j];
>>>>>>> +            op = ovn_port_find_in_datapath(od, new_nbsp->name);
>>>>>>> +
>>>>>>> +            if (!op) {
>>>>>>> +                if (!lsp_can_be_inc_processed(new_nbsp)) {
>>>>>>> +                    goto fail;
>>>>>>> +                }
>>>>>>> +                op = ls_port_create(ovnsb_idl_txn, &nd->ls_ports,
>>>>>>> +                                    new_nbsp->name, new_nbsp, od, NULL,
>>>>>>> +                                    ni->sbrec_mirror_table,
>>>>>>> +                                    ni->sbrec_chassis_table,
>>>>>>> +                                    ni->sbrec_chassis_by_name,
>>>>>>> +                                    ni->sbrec_chassis_by_hostname);
>>>>>>> +                if (!op) {
>>>>>>> +                    goto fail;
>>>>>>> +                }
>>>>>>> +                ovs_list_push_back(&ls_change->added_ports,
>>>>>>> +                                   &op->list);
>>>>>>> +            } else if (ls_port_has_changed(op->nbsp, new_nbsp)) {
>>>>>>> +                /* Existing port updated */
>>>>>>> +                bool temp = false;
>>>>>>> +                if (lsp_is_type_changed(op->sb, new_nbsp, &temp) ||
>>>>>>> +                    !op->lsp_can_be_inc_processed ||
>>>>>>> +                    !lsp_can_be_inc_processed(new_nbsp)) {
>>>>>>> +                    goto fail;
>>>>>>> +                }
>>>>>>> +                const struct sbrec_port_binding *sb = op->sb;
>>>>>>> +                if (sset_contains(&nd->svc_monitor_lsps,
>>>>>>> +                                  new_nbsp->name)) {
>>>>>>> +                    /* This port is used for svc monitor, which may be
>>>>>>> +                     * impacted by this change. Fallback to recompute. */
>>>>>>> +                    goto fail;
>>>>>>> +                }
>>>>>>> +                ovn_port_destroy(&nd->ls_ports, op);
>>>>>>> +                op = ls_port_create(ovnsb_idl_txn, &nd->ls_ports,
>>>>>>> +                                    new_nbsp->name, new_nbsp, od, sb,
>>>>>>> +                                    ni->sbrec_mirror_table,
>>>>>>> +                                    ni->sbrec_chassis_table,
>>>>>>> +                                    ni->sbrec_chassis_by_name,
>>>>>>> +                                    ni->sbrec_chassis_by_hostname);
>>>>>>> +                if (!op) {
>>>>>>> +                    goto fail;
>>>>>>> +                }
>>>>>>> +                ovs_list_push_back(&ls_change->updated_ports,
>>>>>>> +                                   &op->list);
>>>>>>> +            }
>>>>>>> +            op->visited = true;
>>>>>>> +        }
>>>>>>> +
>>>>>>> +        /* Check for deleted ports */
>>>>>>> +        HMAP_FOR_EACH_SAFE (op, dp_node, &od->ports) {
>>>>>>> +            if (!op->visited) {
>>>>>>> +                if (!op->lsp_can_be_inc_processed) {
>>>>>>> +                    goto fail;
>>>>>>> +                }
>>>>>>> +                if (sset_contains(&nd->svc_monitor_lsps, op->key)) {
>>>>>>> +                    /* This port was used for svc monitor, which may be
>>>>>>> +                     * impacted by this deletion. Fallback to
>>>>>>> +                     * recompute. */
>>>>>>> +                    goto fail;
>>>>>>> +                }
>>>>>>> +                ovs_list_push_back(&ls_change->deleted_ports,
>>>>>>> +                                   &op->list);
>>>>>>> +                hmap_remove(&nd->ls_ports, &op->key_node);
>>>>>>> +                hmap_remove(&od->ports, &op->dp_node);
>>>>>>> +                sbrec_port_binding_delete(op->sb);
>>>>
>>>> Quoting from ovsdb-idl.c:
>>>>
>>>> /* Deletes 'row_' from its table. May free 'row_', so it must not be
>>>>  * accessed afterward.
>>>>
>>>> So, in theory, we shouldn't call delete here because we might end up
>>>> failing for other ports/datapath and falling back to recompute.
>>>>
>>>> As far as I can tell, only newly inserted rows may be deleted by the
>>>> current IDL implementation but I don't think it's safe to assume this
>>>> will never change in the future.
>>>
>>> In general, when falling back to recompute in the middle of I-P, the
>>> recompute function will do a full sync between desired states and
>>> existing states.
>>> Now the existing states are partially synced during the I-P before
>>> recompute, so the recompute function will just see the synced part as
>>> "already in sync" and only apply the rest of the changes. I think this
>>> mechanism applies to add/update/delete the same way. Here for delete,
>>> if the record is deleted from the IDL, it won't be found by the
>>> recompute function, so there is no way to access it again. Did I miss
>>> something?
>>>
>>
>> You're right, we first remove the ports we delete from the
>> datapath->ports hmap. On the other hand, I think I spotted another
>> leak.
>>
>> Deleted ports get pushed back to &ls_change->deleted_ports:
>>
>>     /* Check for deleted ports */
>>     HMAP_FOR_EACH_SAFE (op, dp_node, &od->ports) {
>>         if (!op->visited) {
>>             if (!op->lsp_can_be_inc_processed) {
>>                 goto fail;
>>             }
>>             if (sset_contains(&nd->svc_monitor_lsps, op->key)) {
>>                 /* This port was used for svc monitor, which may be
>>                  * impacted by this deletion. Fallback to recompute. */
>>                 goto fail;
>>             }
>>             ovs_list_push_back(&ls_change->deleted_ports,
>>                                &op->list);
>>             hmap_remove(&nd->ls_ports, &op->key_node);
>>             hmap_remove(&od->ports, &op->dp_node);
>>             sbrec_port_binding_delete(op->sb);
>>             delete_fdb_entry(ni->sbrec_fdb_by_dp_and_port, od->tunnel_key,
>>                              op->tunnel_key);
>>         }
>>     }
>>
>> [...]
>>
>> fail:
>>     free(ls_change);
>>
>> If we deleted some ports and then got out of the loop early then here we
>> leak the ports stored in ls_change->deleted_ports. Or am I missing
>> something?
>
> The next line under "fail:", after free(ls_change), is:
>     destroy_northd_data_tracked_changes(nd);
>
> The deleted ports will be destroyed here. Does this address your concern?
>
Not really, let me highlight the code I mean:

https://github.com/ovn-org/ovn/blob/056f66e1db48c6d098ce3607ff596c1e3ce28dbb/northd/northd.c#L5132-L5171

Now, if there are a couple of ports being deleted at the same time and the
last one of them is used for svc monitor, then we remove these ports from
the od's "ports" map and from "nd->ls_ports", and we add them to the
"ls_change->deleted_ports" list (lines L5143-L5146). These ports are not
referenced by anything else anymore except for "ls_change->deleted_ports".
Also, they're not yet added to the northd data "nd". For the last one we
break early because it's used for svc monitoring. We end up here:

fail:
    free(ls_change);
    destroy_northd_data_tracked_changes(nd);
    return false;

The ports that were in "ls_change->deleted_ports" were not completely
destroyed and are not reachable anymore, so they're leaked. Am I maybe
missing a different cleanup path?

Regards,
Dumitru

> Thanks,
> Han
>
>>
>>>>
>>>> We should probably accumulate all records that need to be deleted and
>>>> delete them at the end of the whole I-P run. Do we need some I-P infra
>>>> changes? Maybe users can provide a callback that can be called whenever
>>>> an I-P run has finished?
>>>>
>>> I thought about this approach, but not for the reason above. I would do
>>> the same for all DB changes, to avoid directly updating DBs in the I-P
>>> engine. However, the change is going to be quite big, considering how
>>> the current code base depends on the IDL. Changes to the IDL are all
>>> over the place. In addition to the potential memory and CPU cost of
>>> maintaining the delta, there may be other tricky things to handle, such
>>> as:
>>> - Sometimes it is problematic to see the old data (instead of the
>>>   updated data) in the IDL in the same round of I-P. (This has happened
>>>   in ovn-controller due to an out-of-date IDL index, which was fixed
>>>   accordingly.)
>>> - Merging accumulated changes to the same DB record.
>>>
>>
>> This is in line with an older proposal Mark had about abstracting out
>> the IDL, I think. I agree it's a huge thing to do. On the other hand,
>> the current IDL interactions, which sometimes happen during I-P handlers
>> and sometimes during recomputes, worry me a bit too. It's hard to ensure
>> that we don't introduce tricky bugs.
>>
>> Regards,
>> Dumitru
>>
>>> So, I think it is better to avoid such a change unless it is really
>>> necessary.
>>>
>>> Thanks,
>>> Han
>>>
>>>> What do you think?
>>>>
>>>> Thanks,
>>>> Dumitru
>>>>
>>>
>>
>
_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev
