On 6/27/23 19:47, Han Zhou wrote:
> On Tue, Jun 27, 2023 at 6:51 AM Dumitru Ceara <[email protected]> wrote:
>>
>> On 6/8/23 07:55, Han Zhou wrote:
>>> On Wed, Jun 7, 2023 at 12:39 PM Numan Siddique <[email protected]> wrote:
>>>>
>>>> On Fri, Jun 2, 2023 at 12:13 AM Han Zhou <[email protected]> wrote:
>>>>>
>>>>> This patch introduces a change handler for NB logical_switch within the
>>>>> 'northd' node. It specifically handles cases where logical switch ports
>>>>> in the tracked logical switches are changed (added/updated/deleted).
>>>>> Only regular logical switch ports - which are common for VMs/Pods - are
>>>>> addressed. For other scenarios, it reverts to recompute.
>>>>>
>>>>> This update avoids recompute in the northd node (especially the
>>>>> resource-intensive build_ports()) for NB changes of VIF
>>>>> add/update/delete. However, it does not eliminate the need for lflow
>>>>> recompute.
>>>>>
>>>>> Below are the performance test results simulating an ovn-k8s topology of
>>>>> 500 nodes x 50 lsp per node:
>>>>>
>>>>> Before:
>>>>> ovn-nbctl --wait=hv --print-wait-time lsp-del lsp_1_0001_01
>>>>> ovn-northd completion: 955ms
>>>>> ovn-nbctl --wait=hv --print-wait-time lsp-add ls_1_0001 lsp_1_0001_01
>>>>> -- lsp-set-addresses lsp_1_0001_01 "ff:f1:bb:00:01:01 1.0.1.101" --
>>>>> lsp-set-port-security lsp_1_0001_01 "ff:f1:bb:00:01:01 1.0.1.101"
>>>>> ovn-northd completion: 919ms
>>>>>
>>>>> After:
>>>>> ovn-nbctl --wait=hv --print-wait-time lsp-del lsp_1_0001_01
>>>>> ovn-northd completion: 776ms
>>>>> ovn-nbctl --wait=hv --print-wait-time lsp-add ls_1_0001 lsp_1_0001_01
>>>>> -- lsp-set-addresses lsp_1_0001_01 "ff:f1:bb:00:01:01 1.0.1.101" --
>>>>> lsp-set-port-security lsp_1_0001_01 "ff:f1:bb:00:01:01 1.0.1.101"
>>>>> ovn-northd completion: 773ms
>>>>>
>>>>> Both addition and deletion show a ~20% reduction of ovn-northd
>>>>> completion time. Note: the test uses only 1 thread of ovn-northd for
>>>>> flow recompute. Using multiple threads should show a larger
>>>>> improvement.
>>>>>
>>>>> Signed-off-by: Han Zhou <[email protected]>
>>>>> Reviewed-by: Ales Musil <[email protected]>
>>>>
>>>> Few nits below.
>>>>
>>>> Acked-by: Numan Siddique <[email protected]>
>>>>
>>>> Numan
>>>>
>>
>> Hi Han, Numan,
>>
>> First of all, sorry for the late review, I had these patches on my list
>> for a while and only got to them after they were merged.
>>
>
> Hi Dumitru, thanks for your diligent review; comments are always welcome,
> any time.
>
>>>>
>>>>> ---
>>>>> lib/ovn-util.c | 15 ++
>>>>> lib/ovn-util.h | 1 +
>>>>> northd/en-northd.c | 130 ++++++---
>>>>> northd/en-northd.h | 2 +
>>>>> northd/inc-proc-northd.c | 12 +-
>>>>> northd/northd.c | 567 +++++++++++++++++++++++++++++++++------
>>>>> northd/northd.h | 27 +-
>>>>> tests/ovn-northd.at | 58 ++++
>>>>> 8 files changed, 685 insertions(+), 127 deletions(-)
>>>>>
>>
>> [...]
>>
>>>>> +bool
>>>>> +northd_handle_ls_changes(struct ovsdb_idl_txn *ovnsb_idl_txn,
>>>>> + const struct northd_input *ni,
>>>>> + struct northd_data *nd)
>>>>> +{
>>>>> + const struct nbrec_logical_switch *changed_ls;
>>>>> + struct ls_change *ls_change = NULL;
>>>>> +
>>>>> + NBREC_LOGICAL_SWITCH_TABLE_FOR_EACH_TRACKED (changed_ls,
>>>>> + ni->nbrec_logical_switch_table) {
>>>>> + ls_change = NULL;
>>>>> + if (nbrec_logical_switch_is_new(changed_ls) ||
>>>>> + nbrec_logical_switch_is_deleted(changed_ls)) {
>>>>> + goto fail;
>>>>> + }
>>>>> + struct ovn_datapath *od = ovn_datapath_find(
>>>>> + &nd->ls_datapaths.datapaths,
>>>>> + &changed_ls->header_.uuid);
>>>>> + if (!od) {
>>>>> + static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(1, 1);
>>>>> + VLOG_WARN_RL(&rl, "Internal error: a tracked updated LS doesn't "
>>>>> + "exist in ls_datapaths: "UUID_FMT,
>>>>> + UUID_ARGS(&changed_ls->header_.uuid));
>>>>> + goto fail;
>>>>> + }
>>>>> +
>>>>> + /* Now only able to handle lsp changes. */
>>>>> + if (check_ls_changes_other_than_lsp(changed_ls)) {
>>>>> + goto fail;
>>>>> + }
>>>>> +
>>>>> + ls_change = xzalloc(sizeof *ls_change);
>>>>> + ls_change->od = od;
>>>>> + ovs_list_init(&ls_change->added_ports);
>>>>> + ovs_list_init(&ls_change->deleted_ports);
>>>>> + ovs_list_init(&ls_change->updated_ports);
>>>>> +
>>>>> + struct ovn_port *op;
>>>>> + HMAP_FOR_EACH (op, dp_node, &od->ports) {
>>>>> + op->visited = false;
>>>>> + }
>>>>> +
>>>>> + /* Compare the individual ports in the old and new Logical Switches */
>>>>> + for (size_t j = 0; j < changed_ls->n_ports; ++j) {
>>>>> + struct nbrec_logical_switch_port *new_nbsp = changed_ls->ports[j];
>>>>> + op = ovn_port_find_in_datapath(od, new_nbsp->name);
>>>>> +
>>>>> + if (!op) {
>>>>> + if (!lsp_can_be_inc_processed(new_nbsp)) {
>>>>> + goto fail;
>>>>> + }
>>>>> + op = ls_port_create(ovnsb_idl_txn, &nd->ls_ports,
>>>>> + new_nbsp->name, new_nbsp, od, NULL,
>>>>> + ni->sbrec_mirror_table,
>>>>> + ni->sbrec_chassis_table,
>>>>> + ni->sbrec_chassis_by_name,
>>>>> + ni->sbrec_chassis_by_hostname);
>>>>> + if (!op) {
>>>>> + goto fail;
>>>>> + }
>>>>> + ovs_list_push_back(&ls_change->added_ports,
>>>>> + &op->list);
>>>>> + } else if (ls_port_has_changed(op->nbsp, new_nbsp)) {
>>>>> + /* Existing port updated */
>>>>> + bool temp = false;
>>>>> + if (lsp_is_type_changed(op->sb, new_nbsp, &temp) ||
>>>>> + !op->lsp_can_be_inc_processed ||
>>>>> + !lsp_can_be_inc_processed(new_nbsp)) {
>>>>> + goto fail;
>>>>> + }
>>>>> + const struct sbrec_port_binding *sb = op->sb;
>>>>> + if (sset_contains(&nd->svc_monitor_lsps, new_nbsp->name)) {
>>>>> + /* This port is used for svc monitor, which may be
>>>>> + * impacted by this change. Fallback to recompute. */
>>>>> + goto fail;
>>>>> + }
>>>>> + ovn_port_destroy(&nd->ls_ports, op);
>>>>> + op = ls_port_create(ovnsb_idl_txn, &nd->ls_ports,
>>>>> + new_nbsp->name, new_nbsp, od, sb,
>>>>> + ni->sbrec_mirror_table,
>>>>> + ni->sbrec_chassis_table,
>>>>> + ni->sbrec_chassis_by_name,
>>>>> + ni->sbrec_chassis_by_hostname);
>>>>> + if (!op) {
>>>>> + goto fail;
>>>>> + }
>>>>> + ovs_list_push_back(&ls_change->updated_ports, &op->list);
>>>>> + }
>>>>> + op->visited = true;
>>>>> + }
>>>>> +
>>>>> + /* Check for deleted ports */
>>>>> + HMAP_FOR_EACH_SAFE (op, dp_node, &od->ports) {
>>>>> + if (!op->visited) {
>>>>> + if (!op->lsp_can_be_inc_processed) {
>>>>> + goto fail;
>>>>> + }
>>>>> + if (sset_contains(&nd->svc_monitor_lsps, op->key)) {
>>>>> + /* This port was used for svc monitor, which may be
>>>>> + * impacted by this deletion. Fallback to recompute. */
>>>>> + goto fail;
>>>>> + }
>>>>> + ovs_list_push_back(&ls_change->deleted_ports,
>>>>> + &op->list);
>>>>> + hmap_remove(&nd->ls_ports, &op->key_node);
>>>>> + hmap_remove(&od->ports, &op->dp_node);
>>>>> + sbrec_port_binding_delete(op->sb);
>>
>> Quoting from ovsdb-idl.c:
>>
>> /* Deletes 'row_' from its table. May free 'row_', so it must not be
>> * accessed afterward.
>>
>> So, in theory, we shouldn't call delete here because we might end up
>> failing for other ports/datapath and falling back to recompute.
>>
>> As far as I can tell, only newly inserted rows may be deleted by the
>> current IDL implementation but I don't think it's safe to assume this
>> will never change in the future.
>
> In general, when falling back to recompute in the middle of I-P, the
> recompute function does a full sync between desired states and existing
> states. If the existing states were partially synced during the I-P before
> the recompute, the recompute function will just see the synced part as
> "already in sync" and apply only the rest of the changes. I think this
> mechanism applies to add/update/delete the same way. Here, for delete, if
> the record is deleted from the IDL, it won't be found by the recompute
> function, so there is no way to access it again. Did I miss something?
>
You're right, the ports we delete are first removed from the
datapath->ports hmap. On the other hand, I think I spotted another
leak.
Deleted ports get pushed back to &ls_change->deleted_ports:
    /* Check for deleted ports */
    HMAP_FOR_EACH_SAFE (op, dp_node, &od->ports) {
        if (!op->visited) {
            if (!op->lsp_can_be_inc_processed) {
                goto fail;
            }
            if (sset_contains(&nd->svc_monitor_lsps, op->key)) {
                /* This port was used for svc monitor, which may be
                 * impacted by this deletion. Fallback to recompute. */
                goto fail;
            }
            ovs_list_push_back(&ls_change->deleted_ports, &op->list);
            hmap_remove(&nd->ls_ports, &op->key_node);
            hmap_remove(&od->ports, &op->dp_node);
            sbrec_port_binding_delete(op->sb);
            delete_fdb_entry(ni->sbrec_fdb_by_dp_and_port, od->tunnel_key,
                             op->tunnel_key);
        }
    }
[...]
fail:
    free(ls_change);
If we deleted some ports and then bailed out of the loop early, then here
we leak the ports stored in ls_change->deleted_ports. Or am I missing
something?
>>
>> We should probably accumulate all records that need to be deleted and
>> delete them at the end of the whole I-P run. Do we need some I-P infra
>> changes? Maybe users can provide a callback that can be called whenever
>> an I-P run has finished?
>>
> I thought about this approach, but not for the reason above. I would do the
> same for all DB changes, to avoid directly updating DBs in the I-P engine.
> However, the change is going to be quite big, considering how the current
> code base depends on the IDL: changes to the IDL are all over the place. In
> addition to the potential memory and CPU cost of maintaining the delta,
> there may be other tricky things to handle, such as:
> - Sometimes it is problematic to see the old data (instead of the updated
> data) in the IDL in the same round of I-P. (This has happened in
> ovn-controller due to an out-of-date IDL index, which was fixed
> accordingly.)
> - Merging accumulated changes to the same DB record.
>
This is in line with an older proposal Mark had about abstracting out
the IDL, I think. I agree it's a huge thing to do. On the other hand,
the current IDL interactions, which sometimes happen during I-P handlers
and sometimes during recomputes, worry me a bit too. It's hard to ensure
that we don't introduce tricky bugs.
Regards,
Dumitru
> So, I think it is better to avoid such a change unless it is really
> necessary.
>
> Thanks,
> Han
>
>> What do you think?
>>
>> Thanks,
>> Dumitru
>>
>
_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev