Hi Aleksandra, Ilya,

On 3/11/25 12:01 AM, Ilya Maximets wrote:
> On 3/7/25 16:51, Aleksandra Rukomoinikova wrote:
>> Hi Ilya!
>>
>> I’m not entirely clear on your point about the impact of this change on 
>> performance.
>> Formally, without my changes, two strings are already added to the hash: 
>> action and clone,
>> which are usually much longer than just the stage name.
>> I don’t think the recalculation of flows based on hashes is particularly 
>> significant. 
>> Could you explain the impact? Is this somehow related to datapath groups?
> 
> Each lflow is getting hashed and compared against existing ones right after
> creation.  At least in the past, that was a fairly CPU intensive operation.
> Especially because in some cases northd generated a large amount of duplicated
> logical flows, so the actual number of comparisons was significantly larger
> than the total number of logical flows that ended up in the southbound DB.
> 
> So, even a smallest of changes in a way lflows are hashed and compared would
> result in very significant changes in the overall performance of ovn-northd.
> 
> However, over time we optimized different aspects of lflow generation 
> including
> optimized lflow generation for datapath groups in places where northd knows
> that all the flows will indeed be grouped.  And there are not so many cases
> now where duplicates can be generated, so it seems to be less of a concern in
> the latest versions of OVN.
> 
>>
>> I also thought about an alternative approach where we would know the hashes 
>> for each
>> stage name in advance and simply add this hash instead of computing it from 
>> the string,
>> since it’s a stateful value. However, ovn_stage_to_str doesn’t access the 
>> database but
>> retrieves the string from memory, and this didn’t seem like a significant 
>> optimization to me.
>>
>> I conducted some research on the impact of this patch on the performance of 
>> upstream version.
>> In a setup with 6000 logical switches and 15,000 ports, I collected some 
>> metrics regarding performance.
>>
>> The average utilization metrics for hash maps during recomputation were:
>>  • Without the patch:
>> hmap_expand: 5928.750/sec (average), 98.8125/sec (overall average)
>>
>>  • With the patch:
>> hmap_expand: 5921.117/sec (average), 98.6853/sec (overall average)
>>
> 
> This one is not a very useful metric.  At least for this measurement.
> 
>> Statistics for lflows_to_sb:
>>  • With the patch: 154.000000 msec
>>  • Without the patch: 151.000000 msec
>>
>> These results seem to fall within the range of normal deviation.
> 
> It seems, you're right.  I did a few tests of my own and I also see an
> insignificant performance difference in most cases.  It is a little surprising
> to not see much difference, but maybe it shouldn't bee.  All the changes we 
> made
> in the last few years indeed decreased the importance of the lflow hashing and
> comparisons.
> 
> I would like to run a larger set of ovn-heater tests though before concluding 
> that
> there is actually no real difference.  However, ...
> 
> It seems like an OVN_INTERNAL_MINOR_VER wasn't increased for a long time now 
> and
> it supposed to be increased every time logical pipeline stages are 
> added/modified.
> That might be the root of the issue you're trying to solve.
> 
> When the internal version changes, that should trigger updates for all the 
> stage
> names and hints and the source locators for all the logical flows, see the
> 'if (ovn_internal_version_changed)' branch in the sync_lflow_to_sb().
> 
> I think, we just need to bump that number and maybe add some build assertions,
> so we do not forget about it again...
> 

That's what the comment above the OVN_INTERNAL_MINOR_VER definition
says, yes:

/* Increment this for any logical flow changes, if an existing OVN action is
 * modified or a stage is added to a logical pipeline.
 *
 * This value is also used to handle some backward compatibility during
 * upgrading. It should never decrease or rewind. */

So we could try to add a compile time check to make sure we didn't
forget to bump this.  However, ... :)

> What do you think?  Dumitru, what's your take on this?
> 

The definition of ovn_get_internal_version() is:

/* Returns the OVN version. The caller must free the returned value. */
char *
ovn_get_internal_version(void)
{
    return xasprintf("%s-%s-%d.%d", OVN_PACKAGE_VERSION,
                     sbrec_get_db_version(),
                     N_OVNACTS, OVN_INTERNAL_MINOR_VER);
}

I'd expect production upgrades move to a new version so OVN_PACKAGE_VERSION
should change, or am I missing something?

> Best regards, Ilya Maximets.
> 

Thanks,
Dumitru

>>
>>
>>> On 6 Mar 2025, at 16:50, Ilya Maximets <i.maxim...@ovn.org> wrote:
>>>
>>> On 3/6/25 09:41, Alexandra Rukomoinikova wrote:
>>>> When changing the order of stages in the LogicalFlow
>>>> table in SB, some rules will be added with an incorrect
>>>> stage-name in the external-ids of existing datapath groups.
>>>> When adding a stage, new lflows in a new table with the same
>>>> match and action that already existed with the same table-id
>>>> will remain in the database with unupdated stage names.
>>>>
>>>> Signed-off-by: Alexandra Rukomoinikova <arukomoinikova@k2.cloud>
>>>> ---
>>>> lib/ovn-util.c     | 5 ++++-
>>>> lib/ovn-util.h     | 2 +-
>>>> northd/lflow-mgr.c | 4 +++-
>>>> 3 files changed, 8 insertions(+), 3 deletions(-)
>>>>
>>>> diff --git a/lib/ovn-util.c b/lib/ovn-util.c
>>>> index c65b36bb5..d1572d5ec 100644
>>>> --- a/lib/ovn-util.c
>>>> +++ b/lib/ovn-util.c
>>>> @@ -646,16 +646,19 @@ sbrec_logical_flow_hash(const struct 
>>>> sbrec_logical_flow *lf)
>>>> {
>>>>     return ovn_logical_flow_hash(lf->table_id,
>>>>                                  ovn_pipeline_from_name(lf->pipeline),
>>>> +                                 smap_get_def(&lf->external_ids,
>>>> +                                              "stage-name", ""),
>>>>                                  lf->priority, lf->match,
>>>>                                  lf->actions);
>>>> }
>>>
>>> Hi, Alexandra.  Thanks for the patch!
>>>
>>> Though, IIRC, it was an intentional decision to not include the stage
>>> name in the hash/comparison to avoid a significant performance impact
>>> that this hashmap lookup and the string hashing bring.
>>>
>>> The issue only affects debugging capabilities, but the performance
>>> impact of hashing the name on each database update and comparing extra
>>> strings on lflow generation as well as re-writing most of the flows
>>> on stage updates is significant and affects every operation in northd.
>>>
>>> Did you run some scale tests with this change?  What's the impact in
>>> current OVN?
>>>
>>> Best regards, Ilya Maximets.
>>
> 

_______________________________________________
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to