On Wed, Jun 28, 2023 at 12:07 PM Dumitru Ceara <[email protected]> wrote:
>
> On 6/28/23 19:28, Han Zhou wrote:
> > On Wed, Jun 28, 2023 at 2:14 AM Dumitru Ceara <[email protected]> wrote:
> >>
> >> On 6/27/23 19:47, Han Zhou wrote:
> >>> On Tue, Jun 27, 2023 at 6:51 AM Dumitru Ceara <[email protected]>
wrote:
> >>>>
> >>>> On 6/8/23 07:55, Han Zhou wrote:
> >>>>> On Wed, Jun 7, 2023 at 12:39 PM Numan Siddique <[email protected]>
wrote:
> >>>>>>
> >>>>>> On Fri, Jun 2, 2023 at 12:13 AM Han Zhou <[email protected]> wrote:
> >>>>>>>
> >>>>>>> This patch introduces a change handler for NB logical_switch
within
> >>> the
> >>>>>>> 'northd' node. It specifically handles cases where logical switch
> >>> ports
> >>>>>>> in the tracked logical switches are changed
(added/updated/deleted).
> >>>>>>> Only regular logical switch ports - which are common for VMs/Pods
-
> >>> are
> >>>>>>> addressed. For other scenarios, it reverts to recompute.
> >>>>>>>
> >>>>>>> This update avoids recompute in the northd node (especially the
> >>>>>>> resource-intensive build_ports()) for NB changes of VIF
> >>>>>>> add/update/delete.  However, it does not eliminate the need for
> > lflow
> >>>>>>> recompute.
> >>>>>>>
> >>>>>>> Below are the performance test results simulating an ovn-k8s
> > topology
> >>> of
> >>>>>>> 500 nodes x 50 lsp per node:
> >>>>>>>
> >>>>>>> Before:
> >>>>>>> ovn-nbctl --wait=hv --print-wait-time lsp-del lsp_1_0001_01
> >>>>>>>     ovn-northd completion:          955ms
> >>>>>>> ovn-nbctl --wait=hv --print-wait-time lsp-add ls_1_0001
> > lsp_1_0001_01
> >>>>> -- lsp-set-addresses lsp_1_0001_01 "ff:f1:bb:00:01:01 1.0.1.101" --
> >>>>> lsp-set-port-security lsp_1_0001_01 "ff:f1:bb:00:01:01 1.0.1.101"
> >>>>>>>     ovn-northd completion:          919ms
> >>>>>>>
> >>>>>>> After:
> >>>>>>> ovn-nbctl --wait=hv --print-wait-time lsp-del lsp_1_0001_01
> >>>>>>>     ovn-northd completion:          776ms
> >>>>>>> ovn-nbctl --wait=hv --print-wait-time lsp-add ls_1_0001
> > lsp_1_0001_01
> >>>>> -- lsp-set-addresses lsp_1_0001_01 "ff:f1:bb:00:01:01 1.0.1.101" --
> >>>>> lsp-set-port-security lsp_1_0001_01 "ff:f1:bb:00:01:01 1.0.1.101"
> >>>>>>>     ovn-northd completion:          773ms
> >>>>>>>
> >>>>>>> Both addition and deletion show ~20% reduction of ovn-northd
> >>> completion
> >>>>>>> time.  Note: the test uses only 1 thread of ovn-northd for flow
> >>>>>>> recompute. Using multithread should show a larger percentage of
> >>>>>>> improvement.
> >>>>>>>
> >>>>>>> Signed-off-by: Han Zhou <[email protected]>
> >>>>>>> Reviewed-by: Ales Musil <[email protected]>
> >>>>>>
> >>>>>> Few nits below.
> >>>>>>
> >>>>>> Acked-by: Numan Siddique <[email protected]>
> >>>>>>
> >>>>>> Numan
> >>>>>>
> >>>>
> >>>> Hi Han, Numan,
> >>>>
> >>>> First of all, sorry for the late review, I had these patches on my
list
> >>>> for a while and only got to them after they were merged.
> >>>>
> >>>
> >>> Hi Dumitru, thanks for your diligent review and comments are always
> > welcome
> >>> any time.
> >>>
> >>>>>>
> >>>>>>> ---
> >>>>>>>  lib/ovn-util.c           |  15 ++
> >>>>>>>  lib/ovn-util.h           |   1 +
> >>>>>>>  northd/en-northd.c       | 130 ++++++---
> >>>>>>>  northd/en-northd.h       |   2 +
> >>>>>>>  northd/inc-proc-northd.c |  12 +-
> >>>>>>>  northd/northd.c          | 567
> >>> +++++++++++++++++++++++++++++++++------
> >>>>>>>  northd/northd.h          |  27 +-
> >>>>>>>  tests/ovn-northd.at      |  58 ++++
> >>>>>>>  8 files changed, 685 insertions(+), 127 deletions(-)
> >>>>>>>
> >>>>
> >>>> [...]
> >>>>
> >>>>>>> +bool
> >>>>>>> +northd_handle_ls_changes(struct ovsdb_idl_txn *ovnsb_idl_txn,
> >>>>>>> +                         const struct northd_input *ni,
> >>>>>>> +                         struct northd_data *nd)
> >>>>>>> +{
> >>>>>>> +    const struct nbrec_logical_switch *changed_ls;
> >>>>>>> +    struct ls_change *ls_change = NULL;
> >>>>>>> +
> >>>>>>> +    NBREC_LOGICAL_SWITCH_TABLE_FOR_EACH_TRACKED (changed_ls,
> >>>>>>> +
> >>>>> ni->nbrec_logical_switch_table) {
> >>>>>>> +        ls_change = NULL;
> >>>>>>> +        if (nbrec_logical_switch_is_new(changed_ls) ||
> >>>>>>> +            nbrec_logical_switch_is_deleted(changed_ls)) {
> >>>>>>> +            goto fail;
> >>>>>>> +        }
> >>>>>>> +        struct ovn_datapath *od = ovn_datapath_find(
> >>>>>>> +                                    &nd->ls_datapaths.datapaths,
> >>>>>>> +                                    &changed_ls->header_.uuid);
> >>>>>>> +        if (!od) {
> >>>>>>> +            static struct vlog_rate_limit rl =
> >>> VLOG_RATE_LIMIT_INIT(1,
> >>>>> 1);
> >>>>>>> +            VLOG_WARN_RL(&rl, "Internal error: a tracked updated
LS
> >>>>> doesn't "
> >>>>>>> +                         "exist in ls_datapaths: "UUID_FMT,
> >>>>>>> +                         UUID_ARGS(&changed_ls->header_.uuid));
> >>>>>>> +            goto fail;
> >>>>>>> +        }
> >>>>>>> +
> >>>>>>> +        /* Now only able to handle lsp changes. */
> >>>>>>> +        if (check_ls_changes_other_than_lsp(changed_ls)) {
> >>>>>>> +            goto fail;
> >>>>>>> +        }
> >>>>>>> +
> >>>>>>> +        ls_change = xzalloc(sizeof *ls_change);
> >>>>>>> +        ls_change->od = od;
> >>>>>>> +        ovs_list_init(&ls_change->added_ports);
> >>>>>>> +        ovs_list_init(&ls_change->deleted_ports);
> >>>>>>> +        ovs_list_init(&ls_change->updated_ports);
> >>>>>>> +
> >>>>>>> +        struct ovn_port *op;
> >>>>>>> +        HMAP_FOR_EACH (op, dp_node, &od->ports) {
> >>>>>>> +            op->visited = false;
> >>>>>>> +        }
> >>>>>>> +
> >>>>>>> +        /* Compare the individual ports in the old and new
Logical
> >>>>> Switches */
> >>>>>>> +        for (size_t j = 0; j < changed_ls->n_ports; ++j) {
> >>>>>>> +            struct nbrec_logical_switch_port *new_nbsp =
> >>>>> changed_ls->ports[j];
> >>>>>>> +            op = ovn_port_find_in_datapath(od, new_nbsp->name);
> >>>>>>> +
> >>>>>>> +            if (!op) {
> >>>>>>> +                if (!lsp_can_be_inc_processed(new_nbsp)) {
> >>>>>>> +                    goto fail;
> >>>>>>> +                }
> >>>>>>> +                op = ls_port_create(ovnsb_idl_txn, &nd->ls_ports,
> >>>>>>> +                                    new_nbsp->name, new_nbsp, od,
> >>> NULL,
> >>>>>>> +                                    ni->sbrec_mirror_table,
> >>>>>>> +                                    ni->sbrec_chassis_table,
> >>>>>>> +                                    ni->sbrec_chassis_by_name,
> >>>>>>> +
 ni->sbrec_chassis_by_hostname);
> >>>>>>> +                if (!op) {
> >>>>>>> +                    goto fail;
> >>>>>>> +                }
> >>>>>>> +                ovs_list_push_back(&ls_change->added_ports,
> >>>>>>> +                                   &op->list);
> >>>>>>> +            } else if (ls_port_has_changed(op->nbsp, new_nbsp)) {
> >>>>>>> +                /* Existing port updated */
> >>>>>>> +                bool temp = false;
> >>>>>>> +                if (lsp_is_type_changed(op->sb, new_nbsp, &temp)
||
> >>>>>>> +                    !op->lsp_can_be_inc_processed ||
> >>>>>>> +                    !lsp_can_be_inc_processed(new_nbsp)) {
> >>>>>>> +                    goto fail;
> >>>>>>> +                }
> >>>>>>> +                const struct sbrec_port_binding *sb = op->sb;
> >>>>>>> +                if (sset_contains(&nd->svc_monitor_lsps,
> >>>>> new_nbsp->name)) {
> >>>>>>> +                    /* This port is used for svc monitor, which
may
> >>> be
> >>>>> impacted
> >>>>>>> +                     * by this change. Fallback to recompute. */
> >>>>>>> +                    goto fail;
> >>>>>>> +                }
> >>>>>>> +                ovn_port_destroy(&nd->ls_ports, op);
> >>>>>>> +                op = ls_port_create(ovnsb_idl_txn, &nd->ls_ports,
> >>>>>>> +                                    new_nbsp->name, new_nbsp, od,
> > sb,
> >>>>>>> +                                    ni->sbrec_mirror_table,
> >>>>>>> +                                    ni->sbrec_chassis_table,
> >>>>>>> +                                    ni->sbrec_chassis_by_name,
> >>>>>>> +
 ni->sbrec_chassis_by_hostname);
> >>>>>>> +                if (!op) {
> >>>>>>> +                    goto fail;
> >>>>>>> +                }
> >>>>>>> +                ovs_list_push_back(&ls_change->updated_ports,
> >>>>> &op->list);
> >>>>>>> +            }
> >>>>>>> +            op->visited = true;
> >>>>>>> +        }
> >>>>>>> +
> >>>>>>> +        /* Check for deleted ports */
> >>>>>>> +        HMAP_FOR_EACH_SAFE (op, dp_node, &od->ports) {
> >>>>>>> +            if (!op->visited) {
> >>>>>>> +                if (!op->lsp_can_be_inc_processed) {
> >>>>>>> +                    goto fail;
> >>>>>>> +                }
> >>>>>>> +                if (sset_contains(&nd->svc_monitor_lsps,
op->key))
> > {
> >>>>>>> +                    /* This port was used for svc monitor, which
> > may
> >>> be
> >>>>>>> +                     * impacted by this deletion. Fallback to
> >>>>> recompute. */
> >>>>>>> +                    goto fail;
> >>>>>>> +                }
> >>>>>>> +                ovs_list_push_back(&ls_change->deleted_ports,
> >>>>>>> +                                   &op->list);
> >>>>>>> +                hmap_remove(&nd->ls_ports, &op->key_node);
> >>>>>>> +                hmap_remove(&od->ports, &op->dp_node);
> >>>>>>> +                sbrec_port_binding_delete(op->sb);
> >>>>
> >>>> Quoting from ovsdb-idl.c:
> >>>>
> >>>> /* Deletes 'row_' from its table.  May free 'row_', so it must not be
> >>>>  * accessed afterward.
> >>>>
> >>>> So, in theory, we shouldn't call delete here because we might end up
> >>>> failing for other ports/datapath and falling back to recompute.
> >>>>
> >>>> As far as I can tell, only newly inserted rows may be deleted by the
> >>>> current IDL implementation but I don't think it's safe to assume this
> >>>> will never change in the future.
> >>>
> >>> In general, when falling back to recompute in the middle of I-P, the
> >>> recompute function will do a full sync between desired states and
> > existing
> >>> states. Now the existing states are partially synced during the I-P
> > before
> >>> recompute, the recompute function will just see the synced part as
> > "already
> >>> in sync" and only apply for the rest of the changes. I think this
> > mechanism
> >>> applies for add/update/delete the same way. Here for delete, if the
> > record
> >>> is deleted from IDL, it won't be found by the recompute function, so
> > there
> >>> is no way to access it again. Did I miss something?
> >>>
> >>
> >> You're right, the ports we delete we first remove from the
> >> datapath->ports hmap.  On the other hand, I think I spotted another
> >> leak.
> >>
> >> Deleted ports get pushed back to &ls_change->deleted_ports:
> >>
> >>         /* Check for deleted ports */
> >>         HMAP_FOR_EACH_SAFE (op, dp_node, &od->ports) {
> >>             if (!op->visited) {
> >>                 if (!op->lsp_can_be_inc_processed) {
> >>                     goto fail;
> >>                 }
> >>                 if (sset_contains(&nd->svc_monitor_lsps, op->key)) {
> >>                     /* This port was used for svc monitor, which may be
> >>                      * impacted by this deletion. Fallback to
recompute.
> > */
> >>                     goto fail;
> >>                 }
> >>                 ovs_list_push_back(&ls_change->deleted_ports,
> >>                                    &op->list);
> >>                 hmap_remove(&nd->ls_ports, &op->key_node);
> >>                 hmap_remove(&od->ports, &op->dp_node);
> >>                 sbrec_port_binding_delete(op->sb);
> >>                 delete_fdb_entry(ni->sbrec_fdb_by_dp_and_port,
> > od->tunnel_key,
> >>                                  op->tunnel_key);
> >>             }
> >>         }
> >>
> >> [...]
> >>
> >> fail:
> >>     free(ls_change);
> >>
> >> If we deleted some ports and then got out of the loop early then here
we
> >> leak the ports stored in ls_change->deleted_ports.  Or am I missing
> >> something?
> >
> > The next line under fail:, after free(ls_change), is:
> >     destroy_northd_data_tracked_changes(nd);
> >
> > The deleted ports will be destroyed here. Does this address your
concern?
> >
>
> Not really, let me highlight the code I mean:
>
>
https://github.com/ovn-org/ovn/blob/056f66e1db48c6d098ce3607ff596c1e3ce28dbb/northd/northd.c#L5132-L5171
>
> Now, if there are a couple of ports being deleted at the same time and
> the last one of them is used for svc monitor, then we remove these ports
> from the od's "ports" map and from "nd->ls_ports" and we add them to the
> "ls_change->deleted_ports" list (lines L5143-L5146).  These ports are
> not referenced by anything else anymore except for
> "ls_change->deleted_ports".  Also, they're not yet added to the northd
> data "nd".  For the last one we break early because it's used for svc
> monitoring.
>
> We end up here:
>
> fail:
>     free(ls_change);
>     destroy_northd_data_tracked_changes(nd);
>     return false;
>
> The ports that were in "ls_change->deleted_ports" were not completely
> destroyed and are not reachable anymore so they're leaked.  Am I maybe
> missing a different cleanup path?
>
You are right!! Sorry for missing your point earlier. I will come up with a
fix and a test case to make sure this scenario is covered.
Thanks for pointing this out!
Han

> Regards,
> Dumitru
>
> > Thanks,
> > Han
> >
> >>
> >>>>
> >>>> We should probably accumulate all records that need to be deleted and
> >>>> delete them at the end of the whole I-P run.  Do we need some I-P
infra
> >>>> changes?  Maybe users can provide a callback that can be called
> > whenever
> >>>> an I-P run has finished?
> >>>>
> >>> I thought about this approach, but not for the reason above. I would
do
> > the
> >>> same for all DB changes to avoid directly updating DBs in the I-P
> > engine.
> >>> However, the change is going to be quite big, considering how the
> > current
> >>> code base depends on the IDL. Changes to IDL are all over the place.
In
> >>> addition to the potential memory and CPU cost to maintain the delta,
> > there
> >>> may be other tricky things to handle, such as:
> >>> - Sometimes it is problematic to see the old data (instead of the
> > updated
> >>> ones) in the IDL in the same round of I-P. (this has happened in
> >>> ovn-controller due to out-of-date IDL index, which was fixed
> > accordingly)
> >>> - Merge accumulated changes to the same DB record.
> >>>
> >>
> >> This is inline with an older proposal Mark had about abstracting out
> >> the IDL, I think.  I agree it's a huge thing to do.  On the other hand,
> >> the current IDL interactions that sometimes happen during I-P handlers
> >> and sometimes during recomputes worry me a bit too.  It's hard to
ensure
> >> that we don't introduce tricky bugs.
> >>
> >> Regards,
> >> Dumitru
> >>
> >>> So, I think it is better to avoid such a change unless it is really
> >>> necessary.
> >>>
> >>> Thanks,
> >>> Han
> >>>
> >>>> What do you think?
> >>>>
> >>>> Thanks,
> >>>> Dumitru
> >>>>
> >>>
> >>
> >
>
_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to