On 6/27/23 19:47, Han Zhou wrote:
> On Tue, Jun 27, 2023 at 6:51 AM Dumitru Ceara <[email protected]> wrote:
>>
>> On 6/8/23 07:55, Han Zhou wrote:
>>> On Wed, Jun 7, 2023 at 12:39 PM Numan Siddique <[email protected]> wrote:
>>>>
>>>> On Fri, Jun 2, 2023 at 12:13 AM Han Zhou <[email protected]> wrote:
>>>>>
>>>>> This patch introduces a change handler for NB logical_switch within the
>>>>> 'northd' node. It specifically handles cases where logical switch ports
>>>>> in the tracked logical switches are changed (added/updated/deleted).
>>>>> Only regular logical switch ports - which are common for VMs/Pods - are
>>>>> addressed. For other scenarios, it reverts to recompute.
>>>>>
>>>>> This update avoids recompute in the northd node (especially the
>>>>> resource-intensive build_ports()) for NB changes of VIF
>>>>> add/update/delete.  However, it does not eliminate the need for lflow
>>>>> recompute.
>>>>>
>>>>> Below are the performance test results simulating an ovn-k8s topology of
>>>>> 500 nodes x 50 lsp per node:
>>>>>
>>>>> Before:
>>>>> ovn-nbctl --wait=hv --print-wait-time lsp-del lsp_1_0001_01
>>>>>     ovn-northd completion:          955ms
>>>>> ovn-nbctl --wait=hv --print-wait-time lsp-add ls_1_0001 lsp_1_0001_01 -- lsp-set-addresses lsp_1_0001_01 "ff:f1:bb:00:01:01 1.0.1.101" -- lsp-set-port-security lsp_1_0001_01 "ff:f1:bb:00:01:01 1.0.1.101"
>>>>>     ovn-northd completion:          919ms
>>>>>
>>>>> After:
>>>>> ovn-nbctl --wait=hv --print-wait-time lsp-del lsp_1_0001_01
>>>>>     ovn-northd completion:          776ms
>>>>> ovn-nbctl --wait=hv --print-wait-time lsp-add ls_1_0001 lsp_1_0001_01 -- lsp-set-addresses lsp_1_0001_01 "ff:f1:bb:00:01:01 1.0.1.101" -- lsp-set-port-security lsp_1_0001_01 "ff:f1:bb:00:01:01 1.0.1.101"
>>>>>     ovn-northd completion:          773ms
>>>>>
>>>>> Both addition and deletion show a ~20% reduction in ovn-northd completion
>>>>> time.  Note: the test uses only 1 thread of ovn-northd for flow
>>>>> recompute. Using multiple threads should show a larger improvement.
>>>>>
>>>>> Signed-off-by: Han Zhou <[email protected]>
>>>>> Reviewed-by: Ales Musil <[email protected]>
>>>>
>>>> A few nits below.
>>>>
>>>> Acked-by: Numan Siddique <[email protected]>
>>>>
>>>> Numan
>>>>
>>
>> Hi Han, Numan,
>>
>> First of all, sorry for the late review, I had these patches on my list
>> for a while and only got to them after they were merged.
>>
> 
> Hi Dumitru, thanks for your diligent review; comments are always welcome
> at any time.
> 
>>>>
>>>>> ---
>>>>>  lib/ovn-util.c           |  15 ++
>>>>>  lib/ovn-util.h           |   1 +
>>>>>  northd/en-northd.c       | 130 ++++++---
>>>>>  northd/en-northd.h       |   2 +
>>>>>  northd/inc-proc-northd.c |  12 +-
>>>>>  northd/northd.c          | 567 +++++++++++++++++++++++++++++++++------
>>>>>  northd/northd.h          |  27 +-
>>>>>  tests/ovn-northd.at      |  58 ++++
>>>>>  8 files changed, 685 insertions(+), 127 deletions(-)
>>>>>
>>
>> [...]
>>
>>>>> +bool
>>>>> +northd_handle_ls_changes(struct ovsdb_idl_txn *ovnsb_idl_txn,
>>>>> +                         const struct northd_input *ni,
>>>>> +                         struct northd_data *nd)
>>>>> +{
>>>>> +    const struct nbrec_logical_switch *changed_ls;
>>>>> +    struct ls_change *ls_change = NULL;
>>>>> +
>>>>> +    NBREC_LOGICAL_SWITCH_TABLE_FOR_EACH_TRACKED (changed_ls,
>>>>> +                                        ni->nbrec_logical_switch_table) {
>>>>> +        ls_change = NULL;
>>>>> +        if (nbrec_logical_switch_is_new(changed_ls) ||
>>>>> +            nbrec_logical_switch_is_deleted(changed_ls)) {
>>>>> +            goto fail;
>>>>> +        }
>>>>> +        struct ovn_datapath *od = ovn_datapath_find(
>>>>> +                                    &nd->ls_datapaths.datapaths,
>>>>> +                                    &changed_ls->header_.uuid);
>>>>> +        if (!od) {
>>>>> +            static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(1, 1);
>>>>> +            VLOG_WARN_RL(&rl, "Internal error: a tracked updated LS doesn't "
>>>>> +                         "exist in ls_datapaths: "UUID_FMT,
>>>>> +                         UUID_ARGS(&changed_ls->header_.uuid));
>>>>> +            goto fail;
>>>>> +        }
>>>>> +
>>>>> +        /* Now only able to handle lsp changes. */
>>>>> +        if (check_ls_changes_other_than_lsp(changed_ls)) {
>>>>> +            goto fail;
>>>>> +        }
>>>>> +
>>>>> +        ls_change = xzalloc(sizeof *ls_change);
>>>>> +        ls_change->od = od;
>>>>> +        ovs_list_init(&ls_change->added_ports);
>>>>> +        ovs_list_init(&ls_change->deleted_ports);
>>>>> +        ovs_list_init(&ls_change->updated_ports);
>>>>> +
>>>>> +        struct ovn_port *op;
>>>>> +        HMAP_FOR_EACH (op, dp_node, &od->ports) {
>>>>> +            op->visited = false;
>>>>> +        }
>>>>> +
>>>>> +        /* Compare the individual ports in the old and new Logical Switches */
>>>>> +        for (size_t j = 0; j < changed_ls->n_ports; ++j) {
>>>>> +            struct nbrec_logical_switch_port *new_nbsp = changed_ls->ports[j];
>>>>> +            op = ovn_port_find_in_datapath(od, new_nbsp->name);
>>>>> +
>>>>> +            if (!op) {
>>>>> +                if (!lsp_can_be_inc_processed(new_nbsp)) {
>>>>> +                    goto fail;
>>>>> +                }
>>>>> +                op = ls_port_create(ovnsb_idl_txn, &nd->ls_ports,
>>>>> +                                    new_nbsp->name, new_nbsp, od, NULL,
>>>>> +                                    ni->sbrec_mirror_table,
>>>>> +                                    ni->sbrec_chassis_table,
>>>>> +                                    ni->sbrec_chassis_by_name,
>>>>> +                                    ni->sbrec_chassis_by_hostname);
>>>>> +                if (!op) {
>>>>> +                    goto fail;
>>>>> +                }
>>>>> +                ovs_list_push_back(&ls_change->added_ports,
>>>>> +                                   &op->list);
>>>>> +            } else if (ls_port_has_changed(op->nbsp, new_nbsp)) {
>>>>> +                /* Existing port updated */
>>>>> +                bool temp = false;
>>>>> +                if (lsp_is_type_changed(op->sb, new_nbsp, &temp) ||
>>>>> +                    !op->lsp_can_be_inc_processed ||
>>>>> +                    !lsp_can_be_inc_processed(new_nbsp)) {
>>>>> +                    goto fail;
>>>>> +                }
>>>>> +                const struct sbrec_port_binding *sb = op->sb;
>>>>> +                if (sset_contains(&nd->svc_monitor_lsps, new_nbsp->name)) {
>>>>> +                    /* This port is used for svc monitor, which may be
>>>>> +                     * impacted by this change. Fallback to recompute. */
>>>>> +                    goto fail;
>>>>> +                }
>>>>> +                ovn_port_destroy(&nd->ls_ports, op);
>>>>> +                op = ls_port_create(ovnsb_idl_txn, &nd->ls_ports,
>>>>> +                                    new_nbsp->name, new_nbsp, od, sb,
>>>>> +                                    ni->sbrec_mirror_table,
>>>>> +                                    ni->sbrec_chassis_table,
>>>>> +                                    ni->sbrec_chassis_by_name,
>>>>> +                                    ni->sbrec_chassis_by_hostname);
>>>>> +                if (!op) {
>>>>> +                    goto fail;
>>>>> +                }
>>>>> +                ovs_list_push_back(&ls_change->updated_ports, &op->list);
>>>>> +            }
>>>>> +            op->visited = true;
>>>>> +        }
>>>>> +
>>>>> +        /* Check for deleted ports */
>>>>> +        HMAP_FOR_EACH_SAFE (op, dp_node, &od->ports) {
>>>>> +            if (!op->visited) {
>>>>> +                if (!op->lsp_can_be_inc_processed) {
>>>>> +                    goto fail;
>>>>> +                }
>>>>> +                if (sset_contains(&nd->svc_monitor_lsps, op->key)) {
>>>>> +                    /* This port was used for svc monitor, which may
>>>>> +                     * be impacted by this deletion. Fallback to
>>>>> +                     * recompute. */
>>>>> +                    goto fail;
>>>>> +                }
>>>>> +                ovs_list_push_back(&ls_change->deleted_ports,
>>>>> +                                   &op->list);
>>>>> +                hmap_remove(&nd->ls_ports, &op->key_node);
>>>>> +                hmap_remove(&od->ports, &op->dp_node);
>>>>> +                sbrec_port_binding_delete(op->sb);
>>
>> Quoting from ovsdb-idl.c:
>>
>> /* Deletes 'row_' from its table.  May free 'row_', so it must not be
>>  * accessed afterward.
>>
>> So, in theory, we shouldn't call delete here because we might end up
>> failing for other ports/datapath and falling back to recompute.
>>
>> As far as I can tell, only newly inserted rows may be deleted by the
>> current IDL implementation but I don't think it's safe to assume this
>> will never change in the future.
> 
> In general, when falling back to recompute in the middle of I-P, the
> recompute function will do a full sync between the desired state and the
> existing state.  If the existing state was partially synced during the I-P
> before the recompute, the recompute function will just see the synced part
> as "already in sync" and only apply the rest of the changes. I think this
> mechanism applies to add/update/delete the same way. Here, for delete, if
> the record is deleted from the IDL, it won't be found by the recompute
> function, so there is no way to access it again. Did I miss something?
> 

You're right, the ports we delete are first removed from the
datapath->ports hmap.  On the other hand, I think I spotted another
leak.

Deleted ports get pushed back to &ls_change->deleted_ports:

        /* Check for deleted ports */
        HMAP_FOR_EACH_SAFE (op, dp_node, &od->ports) {
            if (!op->visited) {
                if (!op->lsp_can_be_inc_processed) {
                    goto fail;
                }
                if (sset_contains(&nd->svc_monitor_lsps, op->key)) {
                    /* This port was used for svc monitor, which may be
                     * impacted by this deletion. Fallback to recompute. */
                    goto fail;
                }
                ovs_list_push_back(&ls_change->deleted_ports,
                                   &op->list);
                hmap_remove(&nd->ls_ports, &op->key_node);
                hmap_remove(&od->ports, &op->dp_node);
                sbrec_port_binding_delete(op->sb);
                delete_fdb_entry(ni->sbrec_fdb_by_dp_and_port, od->tunnel_key,
                                 op->tunnel_key);
            }
        }

[...]

fail:
    free(ls_change);

If we deleted some ports and then got out of the loop early, then here we
leak the ports stored in ls_change->deleted_ports.  Or am I missing
something?
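
If so, the fix would be for the fail path to walk deleted_ports and
destroy each entry before (or instead of) the bare free(ls_change).  A
minimal standalone sketch of that cleanup, using simplified stand-ins
for struct ovs_list and struct ovn_port (the type names and the helper
here are illustrative, not the real northd ones):

```c
#include <assert.h>
#include <stdlib.h>

/* Simplified stand-ins for OVN's intrusive list and port structs.  The
 * real code uses struct ovs_list and struct ovn_port; this only shows
 * the shape of the cleanup. */
struct list_node {
    struct list_node *next;
};

struct port {
    struct list_node list;      /* First member, so the cast below is valid. */
    int id;
};

/* Hypothetical fail-path helper: the ports on deleted_ports were already
 * unlinked from nd->ls_ports and od->ports, so nothing else owns them;
 * destroy each one before freeing ls_change itself. */
static int
destroy_deleted_ports(struct list_node *head)
{
    int n_destroyed = 0;
    struct list_node *node = head->next;

    while (node) {
        struct list_node *next = node->next;
        struct port *port = (struct port *) node;
        free(port);             /* Real code would call the ovn_port destructor. */
        n_destroyed++;
        node = next;
    }
    head->next = NULL;          /* Leave the list empty and consistent. */
    return n_destroyed;
}
```

The same walk-and-destroy would apply to added_ports and updated_ports
if the failure can happen after entries were queued on those lists too.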

>>
>> We should probably accumulate all records that need to be deleted and
>> delete them at the end of the whole I-P run.  Do we need some I-P infra
>> changes?  Maybe users can provide a callback that can be called whenever
>> an I-P run has finished?
>>
> I thought about this approach, but not for the reason above. I would do the
> same for all DB changes, to avoid directly updating DBs in the I-P engine.
> However, the change is going to be quite big, considering how the current
> code base depends on the IDL: changes to the IDL are all over the place. In
> addition to the potential memory and CPU cost of maintaining the delta,
> there may be other tricky things to handle, such as:
> - Sometimes it is problematic to see the old data (instead of the updated
> data) in the IDL in the same round of I-P. (This has happened in
> ovn-controller due to an out-of-date IDL index, which was fixed
> accordingly.)
> - Merging accumulated changes to the same DB record.
> 

This is in line with an older proposal Mark had about abstracting out
the IDL, I think.  I agree it's a huge thing to do.  On the other hand,
the current IDL interactions, which sometimes happen during I-P handlers
and sometimes during recomputes, worry me a bit too.  It's hard to ensure
that we don't introduce tricky bugs.

Regards,
Dumitru

> So, I think it is better to avoid such a change unless it is really
> necessary.
> 
> Thanks,
> Han
> 
>> What do you think?
>>
>> Thanks,
>> Dumitru
>>
> 

_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev
