On 10/3/25 10:06 PM, Mark Michelson via dev wrote: > Thanks Lucas and Numan. > > I merged this to main. > > On Thu, Sep 25, 2025 at 10:25 AM Numan Siddique <[email protected]> wrote: >> >> On Thu, Sep 25, 2025 at 8:16 AM Lucas Vargas Dias via dev >> <[email protected]> wrote: >>> >>> Change the logic to save sbflow uuid and just update if >>> the lflow is reused. Otherwise, it's removed. >>> Also, reduce sbflow searching with uuidset instead of >>> searching through all lflow table. >>> Add lflow states: >>> LFLOW_STALE - Lflow is not relevant >>> LFLOW_TO_SYNC - Lflow needs to be synced with SB DB >>> LFLOW_SYNCED - Lflow is synced with SB SB >>> >>> It generates the following results in a scenario with: >>> - LSPs: 56548 >>> - LRPs: 27304 >>> - LRs: 7922 >>> - LSs: 28602 >>> >>> Without the commit: >>> ovn-nbctl --no-leader-only --print-wait-time --wait=sb lr-add lr9-1 >>> Time spent on processing nb_cfg 275438: >>> ovn-northd delay before processing: 16069ms >>> ovn-northd completion: 32828ms >>> ovn-nbctl --no-leader-only --print-wait-time --wait=sb ls-add bar9-1 >>> Time spent on processing nb_cfg 275439: >>> ovn-northd delay before processing: 15019ms >>> ovn-northd completion: 33207ms >>> >>> ovn-nbctl --no-leader-only --print-wait-time --wait=sb lrp-add lr9-1 >>> rp9-1-bar 00:01:ff:22:00:01 192.168.10.1/24 2801:80:3eaf:4401::1/64 >>> Time spent on processing nb_cfg 275440: >>> ovn-northd delay before processing: 14784ms >>> ovn-northd completion: 33577ms >>> >>> ovn-nbctl --no-leader-only --print-wait-time --wait=sb lsp-add bar9-1 >>> bar-rp9-1 -- set Logical_Switch_Port bar-rp9-1 type=router >>> options:router-port=rp9-1-bar -- lsp-set-addresses bar-rp9-1 router >>> Time spent on processing nb_cfg 275441: >>> ovn-northd delay before processing: 14598ms >>> ovn-northd completion: 31942ms >>> >>> With the commit: >>> ovn-nbctl --no-leader-only --print-wait-time --wait=sb lr-add lr9-1 >>> Time spent on processing nb_cfg 275401: >>> ovn-northd delay before processing: 12602ms >>> ovn-northd completion: 26103ms >>> ovn-nbctl --no-leader-only --print-wait-time --wait=sb ls-add bar9-1 >>> Time spent on processing nb_cfg 275402: >>> ovn-northd delay before processing: 12639ms >>> ovn-northd completion: 26759ms >>> ovn-nbctl --no-leader-only --print-wait-time --wait=sb lrp-add lr9-1 >>> rp9-1-bar 00:01:ff:22:00:01 192.168.10.1/24 2801:80:3eaf:4401::1/64 >>> Time spent on processing nb_cfg 275403: >>> ovn-northd delay before processing: 11874ms >>> ovn-northd completion: 29733ms >>> ovn-nbctl --no-leader-only --print-wait-time --wait=sb lsp-add bar9-1 >>> bar-rp9-1 -- set Logical_Switch_Port bar-rp9-1 type=router >>> options:router-port=rp9-1-bar -- lsp-set-addresses bar-rp9-1 router >>> Time spent on processing nb_cfg 275404: >>> ovn-northd delay before processing: 4058ms >>> ovn-northd completion: 17323ms >>> >>> Signed-off-by: Lucas Vargas Dias <[email protected]> >> >> Thanks for addressing the comments and for the great improvements. >> >> Acked-by: Numan Siddique <[email protected]> >> >> Numan >> >>> --- >>> northd/en-lflow.c | 2 +- >>> northd/lflow-mgr.c | 108 ++++++++++++++++++++++++-------------------- >>> northd/lflow-mgr.h | 10 +++- >>> northd/northd.c | 8 ++-- >>> tests/ovn-northd.at | 32 +++++++++++++ >>> 5 files changed, 106 insertions(+), 54 deletions(-) >>> >>> diff --git a/northd/en-lflow.c b/northd/en-lflow.c >>> index 50570b611..13c5e3119 100644 >>> --- a/northd/en-lflow.c >>> +++ b/northd/en-lflow.c >>> @@ -122,7 +122,7 @@ en_lflow_run(struct engine_node *node, void *data) >>> stopwatch_start(BUILD_LFLOWS_STOPWATCH_NAME, time_msec()); >>> >>> struct lflow_data *lflow_data = data; >>> - lflow_table_clear(lflow_data->lflow_table); >>> + lflow_table_clear(lflow_data->lflow_table, false); >>> lflow_reset_northd_refs(&lflow_input); >>> lflow_ref_clear(lflow_input.igmp_lflow_ref); >>> >>> diff --git a/northd/lflow-mgr.c b/northd/lflow-mgr.c >>> index 6a66a9718..5a989fe19 100644 >>> --- a/northd/lflow-mgr.c >>> +++ b/northd/lflow-mgr.c >>> @@ -26,6 +26,7 @@ >>> #include "lflow-mgr.h" >>> #include "lib/ovn-parallel-hmap.h" >>> #include "lib/ovn-util.h" >>> +#include "lib/uuidset.h" >>> >>> VLOG_DEFINE_THIS_MODULE(lflow_mgr); >>> >>> @@ -37,7 +38,8 @@ static void ovn_lflow_init(struct ovn_lflow *, struct >>> ovn_datapath *od, >>> uint16_t priority, char *match, >>> char *actions, char *io_port, >>> char *ctrl_meter, char *stage_hint, >>> - const char *where, const char *flow_desc); >>> + const char *where, const char *flow_desc, >>> + struct uuid sbuuid); >>> static struct ovn_lflow *ovn_lflow_find(const struct hmap *lflows, >>> enum ovn_stage stage, >>> uint16_t priority, const char >>> *match, >>> @@ -147,6 +149,13 @@ static struct ovs_mutex >>> lflow_hash_locks[LFLOW_HASH_LOCK_MASK + 1]; >>> */ >>> extern struct ovs_mutex fake_hash_mutex; >>> >>> + >>> +enum ovn_lflow_state { >>> + LFLOW_STALE, >>> + LFLOW_TO_SYNC, >>> + LFLOW_SYNCED, >>> +}; >>> + >>> /* Represents a logical ovn flow (lflow). >>> * >>> * A logical flow with match 'M' and actions 'A' - L(M, A) is created >>> @@ -181,14 +190,7 @@ struct ovn_lflow { >>> struct hmap dp_refcnts_map; /* Maintains the number of times this >>> ovn_lflow >>> * is referenced by a given datapath. >>> * Contains 'struct dp_refcnt' in the map. >>> */ >>> -}; >>> - >>> -/* Logical flow table. */ >>> -struct lflow_table { >>> - struct hmap entries; /* hmap of lflows. */ >>> - struct hmap ls_dp_groups; /* hmap of logical switch dp groups. */ >>> - struct hmap lr_dp_groups; /* hmap of logical router dp groups. */ >>> - ssize_t max_seen_lflow_size; >>> + enum ovn_lflow_state sync_state; >>> }; >>> >>> struct lflow_table * >>> @@ -210,10 +212,14 @@ lflow_table_init(struct lflow_table *lflow_table) >>> } >>> >>> void >>> -lflow_table_clear(struct lflow_table *lflow_table) >>> +lflow_table_clear(struct lflow_table *lflow_table, bool destroy_all) >>> { >>> struct ovn_lflow *lflow; >>> HMAP_FOR_EACH_SAFE (lflow, hmap_node, &lflow_table->entries) { >>> + if (!destroy_all) { >>> + lflow->sync_state = LFLOW_STALE; >>> + continue; >>> + } >>> ovn_lflow_destroy(lflow_table, lflow); >>> } >>> >>> @@ -224,7 +230,7 @@ lflow_table_clear(struct lflow_table *lflow_table) >>> void >>> lflow_table_destroy(struct lflow_table *lflow_table) >>> { >>> - lflow_table_clear(lflow_table); >>> + lflow_table_clear(lflow_table, true); >>> hmap_destroy(&lflow_table->entries); >>> ovn_dp_groups_destroy(&lflow_table->ls_dp_groups); >>> ovn_dp_groups_destroy(&lflow_table->lr_dp_groups); >>> @@ -257,16 +263,42 @@ lflow_table_sync_to_sb(struct lflow_table >>> *lflow_table, >>> const struct sbrec_logical_flow_table >>> *sb_flow_table, >>> const struct sbrec_logical_dp_group_table >>> *dpgrp_table) >>> { >>> + struct uuidset sb_uuid_set = UUIDSET_INITIALIZER(&sb_uuid_set); >>> struct hmap lflows_temp = HMAP_INITIALIZER(&lflows_temp); >>> struct hmap *lflows = &lflow_table->entries; >>> struct ovn_lflow *lflow; >>> + const struct sbrec_logical_flow *sbflow; >>> >>> fast_hmap_size_for(&lflows_temp, >>> lflow_table->max_seen_lflow_size); >>> >>> + HMAP_FOR_EACH_SAFE (lflow, hmap_node, lflows) { >>> + if (lflow->sync_state == LFLOW_STALE) { >>> + ovn_lflow_destroy(lflow_table, lflow); >>> + continue; >>> + } >>> + sbflow = NULL; >>> + if (!uuid_is_zero(&lflow->sb_uuid)) { >>> + sbflow = sbrec_logical_flow_table_get_for_uuid(sb_flow_table, >>> + >>> &lflow->sb_uuid); >>> + } >>> + sync_lflow_to_sb(lflow, ovnsb_txn, lflow_table, ls_datapaths, >>> + lr_datapaths, ovn_internal_version_changed, >>> + sbflow, dpgrp_table); >>> + uuidset_insert(&sb_uuid_set, &lflow->sb_uuid); >>> + hmap_remove(lflows, &lflow->hmap_node); >>> + hmap_insert(&lflows_temp, &lflow->hmap_node, >>> + hmap_node_hash(&lflow->hmap_node)); >>> + }
Hi, Lucas, Numan, Mark, others. This change introduces a significant performance issue when northd is restarted or when one northd goes on standby and the other becomes active. The problem is that the code above is using lflow->sb_uuid that is always zero for logical flows not created by the currently active northd process. So, if the flows existed in the database before the current process became active, it will remove all of these flows and re-create anew. The issue can be easily reproduced in the sanbdbox by adding some resources, then stopping northd and restarting it. Checking the database file we can see all the logical flows re-created. In ovn-heater setup this behavior is causing a significant iteration time spike for up to 140 seconds in 500-node cluster-density test every time SB leader changes and a new northd becomes active. It overwrites about 200 MB of logical flows, making Sb ovsdb-server send 200 x 150-ish (clients) MB of data momentarily consuming 30 GB of RAM. All ovn-controllers also spend 20-30 seconds processing the new update. We still need a way to search for logical flows in the southbound database directly, as not every lflow was created by the current process. We need to find a way to do that on top of this change or revert it. I assume, revert might be easier in the short term as serching Sb is exacly what this change was trying to avoid. Best regards, Ilya Maximets. _______________________________________________ dev mailing list [email protected] https://mail.openvswitch.org/mailman/listinfo/ovs-dev
