Hi Aleksandra, Ilya, On 3/11/25 12:01 AM, Ilya Maximets wrote: > On 3/7/25 16:51, Aleksandra Rukomoinikova wrote: >> Hi Ilya! >> >> I’m not entirely clear on your point about the impact of this change on >> performance. >> Formally, without my changes, two strings are already added to the hash: >> action and clone, >> which are usually much longer than just the stage name. >> I don’t think the recalculation of flows based on hashes is particularly >> significant. >> Could you explain the impact? Is this somehow related to datapath groups? > > Each lflow is getting hashed and compared against existing ones right after > creation. At least in the past, that was a fairly CPU intensive operation. > Especially because in some cases northd generated a large amount of duplicated > logical flows, so the actual number of comparisons was significantly larger > than the total number of logical flows that ended up in the southbound DB. > > So, even a smallest of changes in a way lflows are hashed and compared would > result in very significant changes in the overall performance of ovn-northd. > > However, over time we optimized different aspects of lflow generation > including > optimized lflow generation for datapath groups in places where northd knows > that all the flows will indeed be grouped. And there are not so many cases > now where duplicates can be generated, so it seems to be less of a concern in > the latest versions of OVN. > >> >> I also thought about an alternative approach where we would know the hashes >> for each >> stage name in advance and simply add this hash instead of computing it from >> the string, >> since it’s a stateful value. However, ovn_stage_to_str doesn’t access the >> database but >> retrieves the string from memory, and this didn’t seem like a significant >> optimization to me. >> >> I conducted some research on the impact of this patch on the performance of >> upstream version. >> In a setup with 6000 logical switches and 15,000 ports, I collected some >> metrics regarding performance. >> >> The average utilization metrics for hash maps during recomputation were: >> • Without the patch: >> hmap_expand: 5928.750/sec (average), 98.8125/sec (overall average) >> >> • With the patch: >> hmap_expand: 5921.117/sec (average), 98.6853/sec (overall average) >> > > This one is not a very useful metric. At least for this measurement. > >> Statistics for lflows_to_sb: >> • With the patch: 154.000000 msec >> • Without the patch: 151.000000 msec >> >> These results seem to fall within the range of normal deviation. > > It seems, you're right. I did a few tests of my own and I also see an > insignificant performance difference in most cases. It is a little surprising > to not see much difference, but maybe it shouldn't bee. All the changes we > made > in the last few years indeed decreased the importance of the lflow hashing and > comparisons. > > I would like to run a larger set of ovn-heater tests though before concluding > that > there is actually no real difference. However, ... > > It seems like an OVN_INTERNAL_MINOR_VER wasn't increased for a long time now > and > it supposed to be increased every time logical pipeline stages are > added/modified. > That might be the root of the issue you're trying to solve. > > When the internal version changes, that should trigger updates for all the > stage > names and hints and the source locators for all the logical flows, see the > 'if (ovn_internal_version_changed)' branch in the sync_lflow_to_sb(). > > I think, we just need to bump that number and maybe add some build assertions, > so we do not forget about it again... >
That's what the comment above the OVN_INTERNAL_MINOR_VER definition says, yes: /* Increment this for any logical flow changes, if an existing OVN action is * modified or a stage is added to a logical pipeline. * * This value is also used to handle some backward compatibility during * upgrading. It should never decrease or rewind. */ So we could try to add a compile time check to make sure we didn't forget to bump this. However, ... :) > What do you think? Dumitru, what's your take on this? > The definition of ovn_get_internal_version() is: /* Returns the OVN version. The caller must free the returned value. */ char * ovn_get_internal_version(void) { return xasprintf("%s-%s-%d.%d", OVN_PACKAGE_VERSION, sbrec_get_db_version(), N_OVNACTS, OVN_INTERNAL_MINOR_VER); } I'd expect production upgrades move to a new version so OVN_PACKAGE_VERSION should change, or am I missing something? > Best regards, Ilya Maximets. > Thanks, Dumitru >> >> >>> On 6 Mar 2025, at 16:50, Ilya Maximets <i.maxim...@ovn.org> wrote: >>> >>> On 3/6/25 09:41, Alexandra Rukomoinikova wrote: >>>> When changing the order of stages in the LogicalFlow >>>> table in SB, some rules will be added with an incorrect >>>> stage-name in the external-ids of existing datapath groups. >>>> When adding a stage, new lflows in a new table with the same >>>> match and action that already existed with the same table-id >>>> will remain in the database with unupdated stage names. >>>> >>>> Signed-off-by: Alexandra Rukomoinikova <arukomoinikova@k2.cloud> >>>> --- >>>> lib/ovn-util.c | 5 ++++- >>>> lib/ovn-util.h | 2 +- >>>> northd/lflow-mgr.c | 4 +++- >>>> 3 files changed, 8 insertions(+), 3 deletions(-) >>>> >>>> diff --git a/lib/ovn-util.c b/lib/ovn-util.c >>>> index c65b36bb5..d1572d5ec 100644 >>>> --- a/lib/ovn-util.c >>>> +++ b/lib/ovn-util.c >>>> @@ -646,16 +646,19 @@ sbrec_logical_flow_hash(const struct >>>> sbrec_logical_flow *lf) >>>> { >>>> return ovn_logical_flow_hash(lf->table_id, >>>> ovn_pipeline_from_name(lf->pipeline), >>>> + smap_get_def(&lf->external_ids, >>>> + "stage-name", ""), >>>> lf->priority, lf->match, >>>> lf->actions); >>>> } >>> >>> Hi, Alexandra. Thanks for the patch! >>> >>> Though, IIRC, it was an intentional decision to not include the stage >>> name in the hash/comparison to avoid a significant performance impact >>> that this hashmap lookup and the string hashing bring. >>> >>> The issue only affects debugging capabilities, but the performance >>> impact of hashing the name on each database update and comparing extra >>> strings on lflow generation as well as re-writing most of the flows >>> on stage updates is significant and affects every operation in northd. >>> >>> Did you run some scale tests with this change? What's the impact in >>> current OVN? >>> >>> Best regards, Ilya Maximets. >> > _______________________________________________ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev