On Wed, Jun 28, 2023 at 1:17 PM Han Zhou <[email protected]> wrote:
>
> On Wed, Jun 28, 2023 at 12:07 PM Dumitru Ceara <[email protected]> wrote:
> >
> > On 6/28/23 19:28, Han Zhou wrote:
> > > On Wed, Jun 28, 2023 at 2:14 AM Dumitru Ceara <[email protected]> wrote:
> > >>
> > >> On 6/27/23 19:47, Han Zhou wrote:
> > >>> On Tue, Jun 27, 2023 at 6:51 AM Dumitru Ceara <[email protected]> wrote:
> > >>>>
> > >>>> On 6/8/23 07:55, Han Zhou wrote:
> > >>>>> On Wed, Jun 7, 2023 at 12:39 PM Numan Siddique <[email protected]> wrote:
> > >>>>>>
> > >>>>>> On Fri, Jun 2, 2023 at 12:13 AM Han Zhou <[email protected]> wrote:
> > >>>>>>>
> > >>>>>>> This patch introduces a change handler for NB logical_switch within the
> > >>>>>>> 'northd' node. It specifically handles cases where logical switch ports
> > >>>>>>> in the tracked logical switches are changed (added/updated/deleted).
> > >>>>>>> Only regular logical switch ports - which are common for VMs/Pods - are
> > >>>>>>> addressed. For other scenarios, it reverts to recompute.
> > >>>>>>>
> > >>>>>>> This update avoids recompute in the northd node (especially the
> > >>>>>>> resource-intensive build_ports()) for NB changes of VIF
> > >>>>>>> add/update/delete. However, it does not eliminate the need for lflow
> > >>>>>>> recompute.
> > >>>>>>>
> > >>>>>>> Below are the performance test results simulating an ovn-k8s topology of
> > >>>>>>> 500 nodes x 50 lsp per node:
> > >>>>>>>
> > >>>>>>> Before:
> > >>>>>>> ovn-nbctl --wait=hv --print-wait-time lsp-del lsp_1_0001_01
> > >>>>>>> ovn-northd completion: 955ms
> > >>>>>>> ovn-nbctl --wait=hv --print-wait-time lsp-add ls_1_0001 lsp_1_0001_01 -- lsp-set-addresses lsp_1_0001_01 "ff:f1:bb:00:01:01 1.0.1.101" -- lsp-set-port-security lsp_1_0001_01 "ff:f1:bb:00:01:01 1.0.1.101"
> > >>>>>>> ovn-northd completion: 919ms
> > >>>>>>>
> > >>>>>>> After:
> > >>>>>>> ovn-nbctl --wait=hv --print-wait-time lsp-del lsp_1_0001_01
> > >>>>>>> ovn-northd completion: 776ms
> > >>>>>>> ovn-nbctl --wait=hv --print-wait-time lsp-add ls_1_0001 lsp_1_0001_01 -- lsp-set-addresses lsp_1_0001_01 "ff:f1:bb:00:01:01 1.0.1.101" -- lsp-set-port-security lsp_1_0001_01 "ff:f1:bb:00:01:01 1.0.1.101"
> > >>>>>>> ovn-northd completion: 773ms
> > >>>>>>>
> > >>>>>>> Both addition and deletion show ~20% reduction of ovn-northd completion
> > >>>>>>> time. Note: the test uses only 1 thread of ovn-northd for flow
> > >>>>>>> recompute. Using multithread should show a larger percentage of
> > >>>>>>> improvement.
> > >>>>>>>
> > >>>>>>> Signed-off-by: Han Zhou <[email protected]>
> > >>>>>>> Reviewed-by: Ales Musil <[email protected]>
> > >>>>>>
> > >>>>>> Few nits below.
> > >>>>>>
> > >>>>>> Acked-by: Numan Siddique <[email protected]>
> > >>>>>>
> > >>>>>> Numan
> > >>>>>>
> > >>>>
> > >>>> Hi Han, Numan,
> > >>>>
> > >>>> First of all, sorry for the late review, I had these patches on my list
> > >>>> for a while and only got to them after they were merged.
> > >>>>
> > >>>
> > >>> Hi Dumitru, thanks for your diligent review; comments are always welcome
> > >>> any time.
> > >>>
> > >>>>>>
> > >>>>>>> ---
> > >>>>>>>  lib/ovn-util.c           |  15 ++
> > >>>>>>>  lib/ovn-util.h           |   1 +
> > >>>>>>>  northd/en-northd.c       | 130 ++++++---
> > >>>>>>>  northd/en-northd.h       |   2 +
> > >>>>>>>  northd/inc-proc-northd.c |  12 +-
> > >>>>>>>  northd/northd.c          | 567 +++++++++++++++++++++++++++++++++------
> > >>>>>>>  northd/northd.h          |  27 +-
> > >>>>>>>  tests/ovn-northd.at      |  58 ++++
> > >>>>>>>  8 files changed, 685 insertions(+), 127 deletions(-)
> > >>>>>>>
> > >>>>
> > >>>> [...]
> > >>>>
> > >>>>>>> +bool
> > >>>>>>> +northd_handle_ls_changes(struct ovsdb_idl_txn *ovnsb_idl_txn,
> > >>>>>>> +                         const struct northd_input *ni,
> > >>>>>>> +                         struct northd_data *nd)
> > >>>>>>> +{
> > >>>>>>> +    const struct nbrec_logical_switch *changed_ls;
> > >>>>>>> +    struct ls_change *ls_change = NULL;
> > >>>>>>> +
> > >>>>>>> +    NBREC_LOGICAL_SWITCH_TABLE_FOR_EACH_TRACKED (changed_ls,
> > >>>>>>> +                                     ni->nbrec_logical_switch_table) {
> > >>>>>>> +        ls_change = NULL;
> > >>>>>>> +        if (nbrec_logical_switch_is_new(changed_ls) ||
> > >>>>>>> +            nbrec_logical_switch_is_deleted(changed_ls)) {
> > >>>>>>> +            goto fail;
> > >>>>>>> +        }
> > >>>>>>> +        struct ovn_datapath *od = ovn_datapath_find(
> > >>>>>>> +            &nd->ls_datapaths.datapaths,
> > >>>>>>> +            &changed_ls->header_.uuid);
> > >>>>>>> +        if (!od) {
> > >>>>>>> +            static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(1, 1);
> > >>>>>>> +            VLOG_WARN_RL(&rl, "Internal error: a tracked updated LS doesn't "
> > >>>>>>> +                         "exist in ls_datapaths: "UUID_FMT,
> > >>>>>>> +                         UUID_ARGS(&changed_ls->header_.uuid));
> > >>>>>>> +            goto fail;
> > >>>>>>> +        }
> > >>>>>>> +
> > >>>>>>> +        /* Now only able to handle lsp changes. */
> > >>>>>>> +        if (check_ls_changes_other_than_lsp(changed_ls)) {
> > >>>>>>> +            goto fail;
> > >>>>>>> +        }
> > >>>>>>> +
> > >>>>>>> +        ls_change = xzalloc(sizeof *ls_change);
> > >>>>>>> +        ls_change->od = od;
> > >>>>>>> +        ovs_list_init(&ls_change->added_ports);
> > >>>>>>> +        ovs_list_init(&ls_change->deleted_ports);
> > >>>>>>> +        ovs_list_init(&ls_change->updated_ports);
> > >>>>>>> +
> > >>>>>>> +        struct ovn_port *op;
> > >>>>>>> +        HMAP_FOR_EACH (op, dp_node, &od->ports) {
> > >>>>>>> +            op->visited = false;
> > >>>>>>> +        }
> > >>>>>>> +
> > >>>>>>> +        /* Compare the individual ports in the old and new Logical Switches */
> > >>>>>>> +        for (size_t j = 0; j < changed_ls->n_ports; ++j) {
> > >>>>>>> +            struct nbrec_logical_switch_port *new_nbsp = changed_ls->ports[j];
> > >>>>>>> +            op = ovn_port_find_in_datapath(od, new_nbsp->name);
> > >>>>>>> +
> > >>>>>>> +            if (!op) {
> > >>>>>>> +                if (!lsp_can_be_inc_processed(new_nbsp)) {
> > >>>>>>> +                    goto fail;
> > >>>>>>> +                }
> > >>>>>>> +                op = ls_port_create(ovnsb_idl_txn, &nd->ls_ports,
> > >>>>>>> +                                    new_nbsp->name, new_nbsp, od, NULL,
> > >>>>>>> +                                    ni->sbrec_mirror_table,
> > >>>>>>> +                                    ni->sbrec_chassis_table,
> > >>>>>>> +                                    ni->sbrec_chassis_by_name,
> > >>>>>>> +                                    ni->sbrec_chassis_by_hostname);
> > >>>>>>> +                if (!op) {
> > >>>>>>> +                    goto fail;
> > >>>>>>> +                }
> > >>>>>>> +                ovs_list_push_back(&ls_change->added_ports,
> > >>>>>>> +                                   &op->list);
> > >>>>>>> +            } else if (ls_port_has_changed(op->nbsp, new_nbsp)) {
> > >>>>>>> +                /* Existing port updated */
> > >>>>>>> +                bool temp = false;
> > >>>>>>> +                if (lsp_is_type_changed(op->sb, new_nbsp, &temp) ||
> > >>>>>>> +                    !op->lsp_can_be_inc_processed ||
> > >>>>>>> +                    !lsp_can_be_inc_processed(new_nbsp)) {
> > >>>>>>> +                    goto fail;
> > >>>>>>> +                }
> > >>>>>>> +                const struct sbrec_port_binding *sb = op->sb;
> > >>>>>>> +                if (sset_contains(&nd->svc_monitor_lsps, new_nbsp->name)) {
> > >>>>>>> +                    /* This port is used for svc monitor, which may be impacted
> > >>>>>>> +                     * by this change. Fallback to recompute. */
> > >>>>>>> +                    goto fail;
> > >>>>>>> +                }
> > >>>>>>> +                ovn_port_destroy(&nd->ls_ports, op);
> > >>>>>>> +                op = ls_port_create(ovnsb_idl_txn, &nd->ls_ports,
> > >>>>>>> +                                    new_nbsp->name, new_nbsp, od, sb,
> > >>>>>>> +                                    ni->sbrec_mirror_table,
> > >>>>>>> +                                    ni->sbrec_chassis_table,
> > >>>>>>> +                                    ni->sbrec_chassis_by_name,
> > >>>>>>> +                                    ni->sbrec_chassis_by_hostname);
> > >>>>>>> +                if (!op) {
> > >>>>>>> +                    goto fail;
> > >>>>>>> +                }
> > >>>>>>> +                ovs_list_push_back(&ls_change->updated_ports, &op->list);
> > >>>>>>> +            }
> > >>>>>>> +            op->visited = true;
> > >>>>>>> +        }
> > >>>>>>> +
> > >>>>>>> +        /* Check for deleted ports */
> > >>>>>>> +        HMAP_FOR_EACH_SAFE (op, dp_node, &od->ports) {
> > >>>>>>> +            if (!op->visited) {
> > >>>>>>> +                if (!op->lsp_can_be_inc_processed) {
> > >>>>>>> +                    goto fail;
> > >>>>>>> +                }
> > >>>>>>> +                if (sset_contains(&nd->svc_monitor_lsps, op->key)) {
> > >>>>>>> +                    /* This port was used for svc monitor, which may be
> > >>>>>>> +                     * impacted by this deletion. Fallback to recompute. */
> > >>>>>>> +                    goto fail;
> > >>>>>>> +                }
> > >>>>>>> +                ovs_list_push_back(&ls_change->deleted_ports,
> > >>>>>>> +                                   &op->list);
> > >>>>>>> +                hmap_remove(&nd->ls_ports, &op->key_node);
> > >>>>>>> +                hmap_remove(&od->ports, &op->dp_node);
> > >>>>>>> +                sbrec_port_binding_delete(op->sb);
> > >>>>
> > >>>> Quoting from ovsdb-idl.c:
> > >>>>
> > >>>>     /* Deletes 'row_' from its table. May free 'row_', so it must not be
> > >>>>      * accessed afterward.
> > >>>>
> > >>>> So, in theory, we shouldn't call delete here because we might end up
> > >>>> failing for other ports/datapaths and falling back to recompute.
> > >>>>
> > >>>> As far as I can tell, only newly inserted rows may be deleted by the
> > >>>> current IDL implementation but I don't think it's safe to assume this
> > >>>> will never change in the future.
> > >>>
> > >>> In general, when falling back to recompute in the middle of I-P, the
> > >>> recompute function will do a full sync between desired states and existing
> > >>> states. If the existing states are partially synced during the I-P before
> > >>> recompute, the recompute function will just see the synced part as "already
> > >>> in sync" and only apply the rest of the changes. I think this mechanism
> > >>> applies to add/update/delete the same way. Here for delete, if the record
> > >>> is deleted from the IDL, it won't be found by the recompute function, so
> > >>> there is no way to access it again. Did I miss something?
> > >>>
> > >>
> > >> You're right, the ports we delete we first remove from the
> > >> datapath->ports hmap. On the other hand, I think I spotted another
> > >> leak.
> > >>
> > >> Deleted ports get pushed back to &ls_change->deleted_ports:
> > >>
> > >>         /* Check for deleted ports */
> > >>         HMAP_FOR_EACH_SAFE (op, dp_node, &od->ports) {
> > >>             if (!op->visited) {
> > >>                 if (!op->lsp_can_be_inc_processed) {
> > >>                     goto fail;
> > >>                 }
> > >>                 if (sset_contains(&nd->svc_monitor_lsps, op->key)) {
> > >>                     /* This port was used for svc monitor, which may be
> > >>                      * impacted by this deletion. Fallback to recompute. */
> > >>                     goto fail;
> > >>                 }
> > >>                 ovs_list_push_back(&ls_change->deleted_ports,
> > >>                                    &op->list);
> > >>                 hmap_remove(&nd->ls_ports, &op->key_node);
> > >>                 hmap_remove(&od->ports, &op->dp_node);
> > >>                 sbrec_port_binding_delete(op->sb);
> > >>                 delete_fdb_entry(ni->sbrec_fdb_by_dp_and_port, od->tunnel_key,
> > >>                                  op->tunnel_key);
> > >>             }
> > >>         }
> > >>
> > >> [...]
> > >>
> > >> fail:
> > >>     free(ls_change);
> > >>
> > >> If we deleted some ports and then got out of the loop early, then here we
> > >> leak the ports stored in ls_change->deleted_ports. Or am I missing
> > >> something?
> > >
> > > The next line under fail:, after free(ls_change), is:
> > >     destroy_northd_data_tracked_changes(nd);
> > >
> > > The deleted ports will be destroyed here. Does this address your concern?
> > >
> >
> > Not really, let me highlight the code I mean:
> >
> > https://github.com/ovn-org/ovn/blob/056f66e1db48c6d098ce3607ff596c1e3ce28dbb/northd/northd.c#L5132-L5171
> >
> > Now, if there are a couple of ports being deleted at the same time and
> > the last one of them is used for svc monitor, then we remove these ports
> > from the od's "ports" map and from "nd->ls_ports" and we add them to the
> > "ls_change->deleted_ports" list (lines L5143-L5146).  These ports are
> > not referenced by anything else anymore except for
> > "ls_change->deleted_ports".  Also, they're not yet added to the northd
> > data "nd".  For the last one we break early because it's used for svc
> > monitoring.
> >
> > We end up here:
> >
> > fail:
> >     free(ls_change);
> >     destroy_northd_data_tracked_changes(nd);
> >     return false;
> >
> > The ports that were in "ls_change->deleted_ports" were not completely
> > destroyed and are not reachable anymore, so they're leaked. Am I maybe
> > missing a different cleanup path?
> >
>
> You are right!! Sorry for missing your point earlier. I will come up with a
> fix and a test case to make sure this scenario is covered.
> Thanks for pointing this out!
> Han
>

Fix posted here:
https://patchwork.ozlabs.org/project/ovn/patch/[email protected]/

PTAL. Thanks again for catching it!
> > Regards,
> > Dumitru
> >
> > > Thanks,
> > > Han
> > >
> > >>
> > >>>>
> > >>>> We should probably accumulate all records that need to be deleted and
> > >>>> delete them at the end of the whole I-P run. Do we need some I-P infra
> > >>>> changes? Maybe users can provide a callback that can be called whenever
> > >>>> an I-P run has finished?
> > >>>>
> > >>> I thought about this approach, but not for the reason above. I would do the
> > >>> same for all DB changes to avoid directly updating DBs in the I-P engine.
> > >>> However, the change is going to be quite big, considering how the current
> > >>> code base depends on the IDL. Changes to the IDL are all over the place. In
> > >>> addition to the potential memory and CPU cost to maintain the delta, there
> > >>> may be other tricky things to handle, such as:
> > >>> - Sometimes it is problematic to see the old data (instead of the updated
> > >>>   data) in the IDL in the same round of I-P. (This has happened in
> > >>>   ovn-controller due to an out-of-date IDL index, which was fixed
> > >>>   accordingly.)
> > >>> - Merging accumulated changes to the same DB record.
> > >>>
> > >>
> > >> This is in line with an older proposal Mark had about abstracting out
> > >> the IDL, I think. I agree it's a huge thing to do. On the other hand,
> > >> the current IDL interactions that sometimes happen during I-P handlers
> > >> and sometimes during recomputes worry me a bit too. It's hard to ensure
> > >> that we don't introduce tricky bugs.
> > >>
> > >> Regards,
> > >> Dumitru
> > >>
> > >>> So, I think it is better to avoid such a change unless it is really
> > >>> necessary.
> > >>>
> > >>> Thanks,
> > >>> Han
> > >>>
> > >>>> What do you think?
> > >>>>
> > >>>> Thanks,
> > >>>> Dumitru
> > >>>>
> > >>>
> > >>
> > >
> >

_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev
