On Wed, Jun 28, 2023 at 1:17 PM Han Zhou <[email protected]> wrote:
>
>
>
> On Wed, Jun 28, 2023 at 12:07 PM Dumitru Ceara <[email protected]> wrote:
> >
> > On 6/28/23 19:28, Han Zhou wrote:
> > > On Wed, Jun 28, 2023 at 2:14 AM Dumitru Ceara <[email protected]> wrote:
> > >>
> > >> On 6/27/23 19:47, Han Zhou wrote:
> > >>> On Tue, Jun 27, 2023 at 6:51 AM Dumitru Ceara <[email protected]> wrote:
> > >>>>
> > >>>> On 6/8/23 07:55, Han Zhou wrote:
> > >>>>> On Wed, Jun 7, 2023 at 12:39 PM Numan Siddique <[email protected]> wrote:
> > >>>>>>
> > >>>>>> On Fri, Jun 2, 2023 at 12:13 AM Han Zhou <[email protected]> wrote:
> > >>>>>>>
> > >>>>>>> This patch introduces a change handler for NB logical_switch within the
> > >>>>>>> 'northd' node. It specifically handles cases where logical switch ports
> > >>>>>>> in the tracked logical switches are changed (added/updated/deleted).
> > >>>>>>> Only regular logical switch ports - which are common for VMs/Pods - are
> > >>>>>>> addressed. For other scenarios, it reverts to recompute.
> > >>>>>>>
> > >>>>>>> This update avoids recompute in the northd node (especially the
> > >>>>>>> resource-intensive build_ports()) for NB changes of VIF
> > >>>>>>> add/update/delete.  However, it does not eliminate the need for lflow
> > >>>>>>> recompute.
> > >>>>>>>
> > >>>>>>> Below are the performance test results simulating an ovn-k8s topology of
> > >>>>>>> 500 nodes x 50 lsp per node:
> > >>>>>>>
> > >>>>>>> Before:
> > >>>>>>> ovn-nbctl --wait=hv --print-wait-time lsp-del lsp_1_0001_01
> > >>>>>>>     ovn-northd completion:          955ms
> > >>>>>>> ovn-nbctl --wait=hv --print-wait-time lsp-add ls_1_0001 lsp_1_0001_01
> > >>>>>>>     -- lsp-set-addresses lsp_1_0001_01 "ff:f1:bb:00:01:01 1.0.1.101"
> > >>>>>>>     -- lsp-set-port-security lsp_1_0001_01 "ff:f1:bb:00:01:01 1.0.1.101"
> > >>>>>>>     ovn-northd completion:          919ms
> > >>>>>>>
> > >>>>>>> After:
> > >>>>>>> ovn-nbctl --wait=hv --print-wait-time lsp-del lsp_1_0001_01
> > >>>>>>>     ovn-northd completion:          776ms
> > >>>>>>> ovn-nbctl --wait=hv --print-wait-time lsp-add ls_1_0001 lsp_1_0001_01
> > >>>>>>>     -- lsp-set-addresses lsp_1_0001_01 "ff:f1:bb:00:01:01 1.0.1.101"
> > >>>>>>>     -- lsp-set-port-security lsp_1_0001_01 "ff:f1:bb:00:01:01 1.0.1.101"
> > >>>>>>>     ovn-northd completion:          773ms
> > >>>>>>>
> > >>>>>>> Both addition and deletion show ~20% reduction of ovn-northd completion
> > >>>>>>> time.  Note: the test uses only 1 thread of ovn-northd for flow
> > >>>>>>> recompute. Using multithread should show a larger percentage of
> > >>>>>>> improvement.
> > >>>>>>>
> > >>>>>>> Signed-off-by: Han Zhou <[email protected]>
> > >>>>>>> Reviewed-by: Ales Musil <[email protected]>
> > >>>>>>
> > >>>>>> Few nits below.
> > >>>>>>
> > >>>>>> Acked-by: Numan Siddique <[email protected]>
> > >>>>>>
> > >>>>>> Numan
> > >>>>>>
> > >>>>
> > >>>> Hi Han, Numan,
> > >>>>
> > >>>> First of all, sorry for the late review, I had these patches on my list
> > >>>> for a while and only got to them after they were merged.
> > >>>>
> > >>>
> > >>> Hi Dumitru, thanks for your diligent review; comments are always
> > >>> welcome any time.
> > >>>
> > >>>>>>
> > >>>>>>> ---
> > >>>>>>>  lib/ovn-util.c           |  15 ++
> > >>>>>>>  lib/ovn-util.h           |   1 +
> > >>>>>>>  northd/en-northd.c       | 130 ++++++---
> > >>>>>>>  northd/en-northd.h       |   2 +
> > >>>>>>>  northd/inc-proc-northd.c |  12 +-
> > >>>>>>>  northd/northd.c          | 567 +++++++++++++++++++++++++++++++++------
> > >>>>>>>  northd/northd.h          |  27 +-
> > >>>>>>>  tests/ovn-northd.at      |  58 ++++
> > >>>>>>>  8 files changed, 685 insertions(+), 127 deletions(-)
> > >>>>>>>
> > >>>>
> > >>>> [...]
> > >>>>
> > >>>>>>> +bool
> > >>>>>>> +northd_handle_ls_changes(struct ovsdb_idl_txn *ovnsb_idl_txn,
> > >>>>>>> +                         const struct northd_input *ni,
> > >>>>>>> +                         struct northd_data *nd)
> > >>>>>>> +{
> > >>>>>>> +    const struct nbrec_logical_switch *changed_ls;
> > >>>>>>> +    struct ls_change *ls_change = NULL;
> > >>>>>>> +
> > >>>>>>> +    NBREC_LOGICAL_SWITCH_TABLE_FOR_EACH_TRACKED (changed_ls,
> > >>>>>>> +                                    ni->nbrec_logical_switch_table) {
> > >>>>>>> +        ls_change = NULL;
> > >>>>>>> +        if (nbrec_logical_switch_is_new(changed_ls) ||
> > >>>>>>> +            nbrec_logical_switch_is_deleted(changed_ls)) {
> > >>>>>>> +            goto fail;
> > >>>>>>> +        }
> > >>>>>>> +        struct ovn_datapath *od = ovn_datapath_find(
> > >>>>>>> +                                    &nd->ls_datapaths.datapaths,
> > >>>>>>> +                                    &changed_ls->header_.uuid);
> > >>>>>>> +        if (!od) {
> > >>>>>>> +            static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(1, 1);
> > >>>>>>> +            VLOG_WARN_RL(&rl, "Internal error: a tracked updated LS doesn't "
> > >>>>>>> +                         "exist in ls_datapaths: "UUID_FMT,
> > >>>>>>> +                         UUID_ARGS(&changed_ls->header_.uuid));
> > >>>>>>> +            goto fail;
> > >>>>>>> +        }
> > >>>>>>> +
> > >>>>>>> +        /* Now only able to handle lsp changes. */
> > >>>>>>> +        if (check_ls_changes_other_than_lsp(changed_ls)) {
> > >>>>>>> +            goto fail;
> > >>>>>>> +        }
> > >>>>>>> +
> > >>>>>>> +        ls_change = xzalloc(sizeof *ls_change);
> > >>>>>>> +        ls_change->od = od;
> > >>>>>>> +        ovs_list_init(&ls_change->added_ports);
> > >>>>>>> +        ovs_list_init(&ls_change->deleted_ports);
> > >>>>>>> +        ovs_list_init(&ls_change->updated_ports);
> > >>>>>>> +
> > >>>>>>> +        struct ovn_port *op;
> > >>>>>>> +        HMAP_FOR_EACH (op, dp_node, &od->ports) {
> > >>>>>>> +            op->visited = false;
> > >>>>>>> +        }
> > >>>>>>> +
> > >>>>>>> +        /* Compare the individual ports in the old and new Logical Switches */
> > >>>>>>> +        for (size_t j = 0; j < changed_ls->n_ports; ++j) {
> > >>>>>>> +            struct nbrec_logical_switch_port *new_nbsp =
> > >>>>>>> +                changed_ls->ports[j];
> > >>>>>>> +            op = ovn_port_find_in_datapath(od, new_nbsp->name);
> > >>>>>>> +
> > >>>>>>> +            if (!op) {
> > >>>>>>> +                if (!lsp_can_be_inc_processed(new_nbsp)) {
> > >>>>>>> +                    goto fail;
> > >>>>>>> +                }
> > >>>>>>> +                op = ls_port_create(ovnsb_idl_txn, &nd->ls_ports,
> > >>>>>>> +                                    new_nbsp->name, new_nbsp, od, NULL,
> > >>>>>>> +                                    ni->sbrec_mirror_table,
> > >>>>>>> +                                    ni->sbrec_chassis_table,
> > >>>>>>> +                                    ni->sbrec_chassis_by_name,
> > >>>>>>> +                                    ni->sbrec_chassis_by_hostname);
> > >>>>>>> +                if (!op) {
> > >>>>>>> +                    goto fail;
> > >>>>>>> +                }
> > >>>>>>> +                ovs_list_push_back(&ls_change->added_ports,
> > >>>>>>> +                                   &op->list);
> > >>>>>>> +            } else if (ls_port_has_changed(op->nbsp, new_nbsp)) {
> > >>>>>>> +                /* Existing port updated */
> > >>>>>>> +                bool temp = false;
> > >>>>>>> +                if (lsp_is_type_changed(op->sb, new_nbsp, &temp) ||
> > >>>>>>> +                    !op->lsp_can_be_inc_processed ||
> > >>>>>>> +                    !lsp_can_be_inc_processed(new_nbsp)) {
> > >>>>>>> +                    goto fail;
> > >>>>>>> +                }
> > >>>>>>> +                const struct sbrec_port_binding *sb = op->sb;
> > >>>>>>> +                if (sset_contains(&nd->svc_monitor_lsps, new_nbsp->name)) {
> > >>>>>>> +                    /* This port is used for svc monitor, which may be impacted
> > >>>>>>> +                     * by this change. Fallback to recompute. */
> > >>>>>>> +                    goto fail;
> > >>>>>>> +                }
> > >>>>>>> +                ovn_port_destroy(&nd->ls_ports, op);
> > >>>>>>> +                op = ls_port_create(ovnsb_idl_txn, &nd->ls_ports,
> > >>>>>>> +                                    new_nbsp->name, new_nbsp, od, sb,
> > >>>>>>> +                                    ni->sbrec_mirror_table,
> > >>>>>>> +                                    ni->sbrec_chassis_table,
> > >>>>>>> +                                    ni->sbrec_chassis_by_name,
> > >>>>>>> +                                    ni->sbrec_chassis_by_hostname);
> > >>>>>>> +                if (!op) {
> > >>>>>>> +                    goto fail;
> > >>>>>>> +                }
> > >>>>>>> +                ovs_list_push_back(&ls_change->updated_ports, &op->list);
> > >>>>>>> +            }
> > >>>>>>> +            op->visited = true;
> > >>>>>>> +        }
> > >>>>>>> +
> > >>>>>>> +        /* Check for deleted ports */
> > >>>>>>> +        HMAP_FOR_EACH_SAFE (op, dp_node, &od->ports) {
> > >>>>>>> +            if (!op->visited) {
> > >>>>>>> +                if (!op->lsp_can_be_inc_processed) {
> > >>>>>>> +                    goto fail;
> > >>>>>>> +                }
> > >>>>>>> +                if (sset_contains(&nd->svc_monitor_lsps, op->key)) {
> > >>>>>>> +                    /* This port was used for svc monitor, which may be
> > >>>>>>> +                     * impacted by this deletion. Fallback to recompute. */
> > >>>>>>> +                    goto fail;
> > >>>>>>> +                }
> > >>>>>>> +                ovs_list_push_back(&ls_change->deleted_ports,
> > >>>>>>> +                                   &op->list);
> > >>>>>>> +                hmap_remove(&nd->ls_ports, &op->key_node);
> > >>>>>>> +                hmap_remove(&od->ports, &op->dp_node);
> > >>>>>>> +                sbrec_port_binding_delete(op->sb);
> > >>>>
> > >>>> Quoting from ovsdb-idl.c:
> > >>>>
> > >>>> /* Deletes 'row_' from its table.  May free 'row_', so it must not be
> > >>>>  * accessed afterward.
> > >>>>
> > >>>> So, in theory, we shouldn't call delete here because we might end up
> > >>>> failing for other ports/datapath and falling back to recompute.
> > >>>>
> > >>>> As far as I can tell, only newly inserted rows may be deleted by the
> > >>>> current IDL implementation but I don't think it's safe to assume this
> > >>>> will never change in the future.
> > >>>
> > >>> In general, when falling back to recompute in the middle of I-P, the
> > >>> recompute function will do a full sync between desired states and existing
> > >>> states. Since the existing states are partially synced during the I-P
> > >>> before the recompute, the recompute function will just see the synced part
> > >>> as "already in sync" and only apply the rest of the changes. I think this
> > >>> mechanism applies to add/update/delete the same way. Here, for delete, if
> > >>> the record is deleted from the IDL, it won't be found by the recompute
> > >>> function, so there is no way to access it again. Did I miss something?
> > >>>
> > >>
> > >> You're right, the ports we delete are first removed from the
> > >> datapath->ports hmap.  On the other hand, I think I spotted another
> > >> leak.
> > >>
> > >> Deleted ports get pushed back to &ls_change->deleted_ports:
> > >>
> > >>         /* Check for deleted ports */
> > >>         HMAP_FOR_EACH_SAFE (op, dp_node, &od->ports) {
> > >>             if (!op->visited) {
> > >>                 if (!op->lsp_can_be_inc_processed) {
> > >>                     goto fail;
> > >>                 }
> > >>                 if (sset_contains(&nd->svc_monitor_lsps, op->key)) {
> > >>                     /* This port was used for svc monitor, which may be
> > >>                      * impacted by this deletion. Fallback to recompute. */
> > >>                     goto fail;
> > >>                 }
> > >>                 ovs_list_push_back(&ls_change->deleted_ports,
> > >>                                    &op->list);
> > >>                 hmap_remove(&nd->ls_ports, &op->key_node);
> > >>                 hmap_remove(&od->ports, &op->dp_node);
> > >>                 sbrec_port_binding_delete(op->sb);
> > >>                 delete_fdb_entry(ni->sbrec_fdb_by_dp_and_port, od->tunnel_key,
> > >>                                  op->tunnel_key);
> > >>             }
> > >>         }
> > >>
> > >> [...]
> > >>
> > >> fail:
> > >>     free(ls_change);
> > >>
> > >> If we deleted some ports and then got out of the loop early then here we
> > >> leak the ports stored in ls_change->deleted_ports.  Or am I missing
> > >> something?
> > >
> > > The next line under fail:, after free(ls_change), is:
> > >     destroy_northd_data_tracked_changes(nd);
> > >
> > > The deleted ports will be destroyed here. Does this address your concern?
> > >
> >
> > Not really, let me highlight the code I mean:
> >
> > https://github.com/ovn-org/ovn/blob/056f66e1db48c6d098ce3607ff596c1e3ce28dbb/northd/northd.c#L5132-L5171
> >
> > Now, if there are a couple of ports being deleted at the same time and
> > the last one of them is used for svc monitor, then we remove these ports
> > from the od's "ports" map and from "nd->ls_ports" and we add them to the
> > "ls_change->deleted_ports" list (lines L5143-L5146).  These ports are
> > not referenced by anything else anymore except for
> > "ls_change->deleted_ports".  Also, they're not yet added to the northd
> > data "nd".  For the last one we break early because it's used for svc
> > monitoring.
> >
> > We end up here:
> >
> > fail:
> >     free(ls_change);
> >     destroy_northd_data_tracked_changes(nd);
> >     return false;
> >
> > The ports that were in "ls_change->deleted_ports" were not completely
> > destroyed and are not reachable anymore so they're leaked.  Am I maybe
> > missing a different cleanup path?
> >
> You are right!! Sorry for missing your point earlier. I will come up with
> a fix and a test case to make sure this scenario is covered.
> Thanks for pointing this out!
> Han
>
Fix posted here:
https://patchwork.ozlabs.org/project/ovn/patch/[email protected]/
PTAL. Thanks again for catching it!
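
For reference, the shape of the cleanup needed on the "fail:" path is roughly
the following (a sketch only, with a made-up helper name; see the patch linked
above for the actual fix).  Ports already moved into ls_change->deleted_ports
have been unlinked from nd->ls_ports and od->ports, so they are unreachable
from anywhere else and must be freed explicitly before the tracked changes are
discarded:

    /* Sketch: release ports that are only referenced by
     * ls_change->deleted_ports. */
    static void
    ls_change_destroy_deleted_ports(struct ls_change *ls_change)
    {
        struct ovn_port *op;
        LIST_FOR_EACH_POP (op, list, &ls_change->deleted_ports) {
            /* Hypothetical destructor for a port that is no longer
             * linked into any hmap. */
            ovn_port_destroy_orphan(op);
        }
    }

called from the "fail:" label before free(ls_change) and
destroy_northd_data_tracked_changes(nd).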

> > Regards,
> > Dumitru
> >
> > > Thanks,
> > > Han
> > >
> > >>
> > >>>>
> > >>>> We should probably accumulate all records that need to be deleted and
> > >>>> delete them at the end of the whole I-P run.  Do we need some I-P infra
> > >>>> changes?  Maybe users can provide a callback that can be called whenever
> > >>>> an I-P run has finished?
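
To make the callback idea above concrete, a rough sketch (names are
illustrative only, this is not an existing inc-proc engine API):

    /* Sketch: users register hooks that the I-P engine invokes once at the
     * end of every engine_run(), after all change handlers and recomputes
     * have finished, so IDL deletions can be accumulated during handling
     * and applied in one place. */
    typedef void (*engine_run_done_cb)(void *aux);

    static struct {
        engine_run_done_cb cb;
        void *aux;
    } run_done_hooks[8];
    static size_t n_run_done_hooks;

    void
    engine_add_run_done_hook(engine_run_done_cb cb, void *aux)
    {
        ovs_assert(n_run_done_hooks < ARRAY_SIZE(run_done_hooks));
        run_done_hooks[n_run_done_hooks].cb = cb;
        run_done_hooks[n_run_done_hooks].aux = aux;
        n_run_done_hooks++;
    }

    /* Called at the end of engine_run(). */
    static void
    engine_run_done(void)
    {
        for (size_t i = 0; i < n_run_done_hooks; i++) {
            run_done_hooks[i].cb(run_done_hooks[i].aux);
        }
    }

northd could then queue port bindings to delete while handling changes and
only call sbrec_port_binding_delete() from its run-done hook.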
> > >>>>
> > >>> I thought about this approach, but not for the reason above. I would do the
> > >>> same for all DB changes to avoid directly updating DBs in the I-P engine.
> > >>> However, the change is going to be quite big, considering how the current
> > >>> code base depends on the IDL. Changes to IDL are all over the place. In
> > >>> addition to the potential memory and CPU cost to maintain the delta, there
> > >>> may be other tricky things to handle, such as:
> > >>> - Sometimes it is problematic to see the old data (instead of the updated
> > >>> ones) in the IDL in the same round of I-P. (this has happened in
> > >>> ovn-controller due to out-of-date IDL index, which was fixed accordingly)
> > >>> - Merge accumulated changes to the same DB record.
> > >>>
> > >>
> > >> This is in line with an older proposal Mark had about abstracting out
> > >> the IDL, I think.  I agree it's a huge thing to do.  On the other hand,
> > >> the current IDL interactions that sometimes happen during I-P handlers
> > >> and sometimes during recomputes worry me a bit too.  It's hard to ensure
> > >> that we don't introduce tricky bugs.
> > >>
> > >> Regards,
> > >> Dumitru
> > >>
> > >>> So, I think it is better to avoid such a change unless it is really
> > >>> necessary.
> > >>>
> > >>> Thanks,
> > >>> Han
> > >>>
> > >>>> What do you think?
> > >>>>
> > >>>> Thanks,
> > >>>> Dumitru
> > >>>>
> > >>>
> > >>
> > >
> >