On Wed, Jun 28, 2023 at 2:14 AM Dumitru Ceara <[email protected]> wrote:
>
> On 6/27/23 19:47, Han Zhou wrote:
> > On Tue, Jun 27, 2023 at 6:51 AM Dumitru Ceara <[email protected]> wrote:
> >>
> >> On 6/8/23 07:55, Han Zhou wrote:
> >>> On Wed, Jun 7, 2023 at 12:39 PM Numan Siddique <[email protected]> wrote:
> >>>>
> >>>> On Fri, Jun 2, 2023 at 12:13 AM Han Zhou <[email protected]> wrote:
> >>>>>
> >>>>> This patch introduces a change handler for NB logical_switch within
> >>>>> the 'northd' node. It specifically handles cases where logical
> >>>>> switch ports in the tracked logical switches are changed
> >>>>> (added/updated/deleted). Only regular logical switch ports - which
> >>>>> are common for VMs/Pods - are addressed. For other scenarios, it
> >>>>> reverts to recompute.
> >>>>>
> >>>>> This update avoids recompute in the northd node (especially the
> >>>>> resource-intensive build_ports()) for NB changes of VIF
> >>>>> add/update/delete.  However, it does not eliminate the need for
> >>>>> lflow recompute.
> >>>>>
> >>>>> Below are the performance test results simulating an ovn-k8s
> >>>>> topology of 500 nodes x 50 lsp per node:
> >>>>>
> >>>>> Before:
> >>>>> ovn-nbctl --wait=hv --print-wait-time lsp-del lsp_1_0001_01
> >>>>>     ovn-northd completion:          955ms
> >>>>> ovn-nbctl --wait=hv --print-wait-time lsp-add ls_1_0001 lsp_1_0001_01
> >>>>> -- lsp-set-addresses lsp_1_0001_01 "ff:f1:bb:00:01:01 1.0.1.101" --
> >>>>> lsp-set-port-security lsp_1_0001_01 "ff:f1:bb:00:01:01 1.0.1.101"
> >>>>>     ovn-northd completion:          919ms
> >>>>>
> >>>>> After:
> >>>>> ovn-nbctl --wait=hv --print-wait-time lsp-del lsp_1_0001_01
> >>>>>     ovn-northd completion:          776ms
> >>>>> ovn-nbctl --wait=hv --print-wait-time lsp-add ls_1_0001 lsp_1_0001_01
> >>>>> -- lsp-set-addresses lsp_1_0001_01 "ff:f1:bb:00:01:01 1.0.1.101" --
> >>>>> lsp-set-port-security lsp_1_0001_01 "ff:f1:bb:00:01:01 1.0.1.101"
> >>>>>     ovn-northd completion:          773ms
> >>>>>
> >>>>> Both addition and deletion show a ~20% reduction of ovn-northd
> >>>>> completion time.  Note: the test uses only 1 thread of ovn-northd
> >>>>> for flow recompute. Using multiple threads should show a larger
> >>>>> percentage of improvement.
> >>>>>
> >>>>> Signed-off-by: Han Zhou <[email protected]>
> >>>>> Reviewed-by: Ales Musil <[email protected]>
> >>>>
> >>>> Few nits below.
> >>>>
> >>>> Acked-by: Numan Siddique <[email protected]>
> >>>>
> >>>> Numan
> >>>>
> >>
> >> Hi Han, Numan,
> >>
> >> First of all, sorry for the late review, I had these patches on my list
> >> for a while and only got to them after they were merged.
> >>
> >
> > Hi Dumitru, thanks for your diligent review; comments are always
> > welcome at any time.
> >
> >>>>
> >>>>> ---
> >>>>>  lib/ovn-util.c           |  15 ++
> >>>>>  lib/ovn-util.h           |   1 +
> >>>>>  northd/en-northd.c       | 130 ++++++---
> >>>>>  northd/en-northd.h       |   2 +
> >>>>>  northd/inc-proc-northd.c |  12 +-
> >>>>>  northd/northd.c          | 567 +++++++++++++++++++++++++++++++++------
> >>>>>  northd/northd.h          |  27 +-
> >>>>>  tests/ovn-northd.at      |  58 ++++
> >>>>>  8 files changed, 685 insertions(+), 127 deletions(-)
> >>>>>
> >>
> >> [...]
> >>
> >>>>> +bool
> >>>>> +northd_handle_ls_changes(struct ovsdb_idl_txn *ovnsb_idl_txn,
> >>>>> +                         const struct northd_input *ni,
> >>>>> +                         struct northd_data *nd)
> >>>>> +{
> >>>>> +    const struct nbrec_logical_switch *changed_ls;
> >>>>> +    struct ls_change *ls_change = NULL;
> >>>>> +
> >>>>> +    NBREC_LOGICAL_SWITCH_TABLE_FOR_EACH_TRACKED (changed_ls,
> >>>>> +                                        ni->nbrec_logical_switch_table) {
> >>>>> +        ls_change = NULL;
> >>>>> +        if (nbrec_logical_switch_is_new(changed_ls) ||
> >>>>> +            nbrec_logical_switch_is_deleted(changed_ls)) {
> >>>>> +            goto fail;
> >>>>> +        }
> >>>>> +        struct ovn_datapath *od = ovn_datapath_find(
> >>>>> +                                    &nd->ls_datapaths.datapaths,
> >>>>> +                                    &changed_ls->header_.uuid);
> >>>>> +        if (!od) {
> >>>>> +            static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(1, 1);
> >>>>> +            VLOG_WARN_RL(&rl, "Internal error: a tracked updated LS doesn't "
> >>>>> +                         "exist in ls_datapaths: "UUID_FMT,
> >>>>> +                         UUID_ARGS(&changed_ls->header_.uuid));
> >>>>> +            goto fail;
> >>>>> +        }
> >>>>> +
> >>>>> +        /* Now only able to handle lsp changes. */
> >>>>> +        if (check_ls_changes_other_than_lsp(changed_ls)) {
> >>>>> +            goto fail;
> >>>>> +        }
> >>>>> +
> >>>>> +        ls_change = xzalloc(sizeof *ls_change);
> >>>>> +        ls_change->od = od;
> >>>>> +        ovs_list_init(&ls_change->added_ports);
> >>>>> +        ovs_list_init(&ls_change->deleted_ports);
> >>>>> +        ovs_list_init(&ls_change->updated_ports);
> >>>>> +
> >>>>> +        struct ovn_port *op;
> >>>>> +        HMAP_FOR_EACH (op, dp_node, &od->ports) {
> >>>>> +            op->visited = false;
> >>>>> +        }
> >>>>> +
> >>>>> +        /* Compare the individual ports in the old and new Logical
> >>>>> +         * Switches. */
> >>>>> +        for (size_t j = 0; j < changed_ls->n_ports; ++j) {
> >>>>> +            struct nbrec_logical_switch_port *new_nbsp = changed_ls->ports[j];
> >>>>> +            op = ovn_port_find_in_datapath(od, new_nbsp->name);
> >>>>> +
> >>>>> +            if (!op) {
> >>>>> +                if (!lsp_can_be_inc_processed(new_nbsp)) {
> >>>>> +                    goto fail;
> >>>>> +                }
> >>>>> +                op = ls_port_create(ovnsb_idl_txn, &nd->ls_ports,
> >>>>> +                                    new_nbsp->name, new_nbsp, od, NULL,
> >>>>> +                                    ni->sbrec_mirror_table,
> >>>>> +                                    ni->sbrec_chassis_table,
> >>>>> +                                    ni->sbrec_chassis_by_name,
> >>>>> +                                    ni->sbrec_chassis_by_hostname);
> >>>>> +                if (!op) {
> >>>>> +                    goto fail;
> >>>>> +                }
> >>>>> +                ovs_list_push_back(&ls_change->added_ports,
> >>>>> +                                   &op->list);
> >>>>> +            } else if (ls_port_has_changed(op->nbsp, new_nbsp)) {
> >>>>> +                /* Existing port updated */
> >>>>> +                bool temp = false;
> >>>>> +                if (lsp_is_type_changed(op->sb, new_nbsp, &temp) ||
> >>>>> +                    !op->lsp_can_be_inc_processed ||
> >>>>> +                    !lsp_can_be_inc_processed(new_nbsp)) {
> >>>>> +                    goto fail;
> >>>>> +                }
> >>>>> +                const struct sbrec_port_binding *sb = op->sb;
> >>>>> +                if (sset_contains(&nd->svc_monitor_lsps, new_nbsp->name)) {
> >>>>> +                    /* This port is used for svc monitor, which may be
> >>>>> +                     * impacted by this change. Fallback to recompute. */
> >>>>> +                    goto fail;
> >>>>> +                }
> >>>>> +                ovn_port_destroy(&nd->ls_ports, op);
> >>>>> +                op = ls_port_create(ovnsb_idl_txn, &nd->ls_ports,
> >>>>> +                                    new_nbsp->name, new_nbsp, od, sb,
> >>>>> +                                    ni->sbrec_mirror_table,
> >>>>> +                                    ni->sbrec_chassis_table,
> >>>>> +                                    ni->sbrec_chassis_by_name,
> >>>>> +                                    ni->sbrec_chassis_by_hostname);
> >>>>> +                if (!op) {
> >>>>> +                    goto fail;
> >>>>> +                }
> >>>>> +                ovs_list_push_back(&ls_change->updated_ports, &op->list);
> >>>>> +            }
> >>>>> +            op->visited = true;
> >>>>> +        }
> >>>>> +
> >>>>> +        /* Check for deleted ports */
> >>>>> +        HMAP_FOR_EACH_SAFE (op, dp_node, &od->ports) {
> >>>>> +            if (!op->visited) {
> >>>>> +                if (!op->lsp_can_be_inc_processed) {
> >>>>> +                    goto fail;
> >>>>> +                }
> >>>>> +                if (sset_contains(&nd->svc_monitor_lsps, op->key)) {
> >>>>> +                    /* This port was used for svc monitor, which may be
> >>>>> +                     * impacted by this deletion. Fallback to recompute. */
> >>>>> +                    goto fail;
> >>>>> +                }
> >>>>> +                ovs_list_push_back(&ls_change->deleted_ports,
> >>>>> +                                   &op->list);
> >>>>> +                hmap_remove(&nd->ls_ports, &op->key_node);
> >>>>> +                hmap_remove(&od->ports, &op->dp_node);
> >>>>> +                sbrec_port_binding_delete(op->sb);
> >>
> >> Quoting from ovsdb-idl.c:
> >>
> >> /* Deletes 'row_' from its table.  May free 'row_', so it must not be
> >>  * accessed afterward.
> >>
> >> So, in theory, we shouldn't call delete here because we might end up
> >> failing for other ports/datapath and falling back to recompute.
> >>
> >> As far as I can tell, only newly inserted rows may be deleted by the
> >> current IDL implementation but I don't think it's safe to assume this
> >> will never change in the future.
> >
> > In general, when falling back to recompute in the middle of I-P, the
> > recompute function will do a full sync between desired states and
> > existing states. When the existing states have been partially synced
> > during the I-P before the recompute, the recompute function will simply
> > see the synced part as "already in sync" and only apply the rest of the
> > changes. I think this mechanism applies to add/update/delete in the
> > same way. Here, for delete, if the record is deleted from the IDL, it
> > won't be found by the recompute function, so there is no way to access
> > it again. Did I miss something?
> >
>
> You're right, the ports we delete are first removed from the
> datapath->ports hmap.  On the other hand, I think I spotted another
> leak.
>
> Deleted ports get pushed back to &ls_change->deleted_ports:
>
>         /* Check for deleted ports */
>         HMAP_FOR_EACH_SAFE (op, dp_node, &od->ports) {
>             if (!op->visited) {
>                 if (!op->lsp_can_be_inc_processed) {
>                     goto fail;
>                 }
>                 if (sset_contains(&nd->svc_monitor_lsps, op->key)) {
>                     /* This port was used for svc monitor, which may be
>                      * impacted by this deletion. Fallback to recompute. */
>                     goto fail;
>                 }
>                 ovs_list_push_back(&ls_change->deleted_ports,
>                                    &op->list);
>                 hmap_remove(&nd->ls_ports, &op->key_node);
>                 hmap_remove(&od->ports, &op->dp_node);
>                 sbrec_port_binding_delete(op->sb);
>                 delete_fdb_entry(ni->sbrec_fdb_by_dp_and_port, od->tunnel_key,
>                                  op->tunnel_key);
>             }
>         }
>
> [...]
>
> fail:
>     free(ls_change);
>
> If we deleted some ports and then exited the loop early, we leak the
> ports stored in ls_change->deleted_ports here.  Or am I missing
> something?

The next line under fail:, after free(ls_change), is:
    destroy_northd_data_tracked_changes(nd);

The deleted ports will be destroyed here. Does this address your concern?

Thanks,
Han

>
> >>
> >> We should probably accumulate all records that need to be deleted and
> >> delete them at the end of the whole I-P run.  Do we need some I-P infra
> >> changes?  Maybe users can provide a callback that can be called
> >> whenever an I-P run has finished?
> >>
> > I thought about this approach, but not for the reason above. I would
> > do the same for all DB changes, to avoid directly updating DBs in the
> > I-P engine. However, the change is going to be quite big, considering
> > how the current code base depends on the IDL; changes to the IDL are
> > all over the place. In addition to the potential memory and CPU cost
> > of maintaining the delta, there may be other tricky things to handle,
> > such as:
> > - Sometimes it is problematic to see the old data (instead of the
> > updated data) in the IDL in the same round of I-P. (This has happened
> > in ovn-controller due to an out-of-date IDL index, which was fixed
> > accordingly.)
> > - Merging accumulated changes to the same DB record.
> >
>
> This is in line with an older proposal Mark had about abstracting out
> the IDL, I think.  I agree it's a huge thing to do.  On the other hand,
> the current IDL interactions that sometimes happen during I-P handlers
> and sometimes during recomputes worry me a bit too.  It's hard to ensure
> that we don't introduce tricky bugs.
>
> Regards,
> Dumitru
>
> > So, I think it is better to avoid such a change unless it is really
> > necessary.
> >
> > Thanks,
> > Han
> >
> >> What do you think?
> >>
> >> Thanks,
> >> Dumitru
> >>
> >
>
_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev
