On 3/11/25 11:40 AM, Ilya Maximets wrote:
> On 3/11/25 11:11, Dumitru Ceara wrote:
>> Hi Aleksandra, Ilya,
>>
>> On 3/11/25 12:01 AM, Ilya Maximets wrote:
>>> On 3/7/25 16:51, Aleksandra Rukomoinikova wrote:
>>>> Hi Ilya!
>>>>
>>>> I’m not entirely clear on your point about the impact of this change on 
>>>> performance.
>>>> Formally, without my changes, two strings are already added to the hash: 
>>>> action and clone,
>>>> which are usually much longer than just the stage name.
>>>> I don’t think the recalculation of flows based on hashes is particularly 
>>>> significant. 
>>>> Could you explain the impact? Is this somehow related to datapath groups?
>>>
>>> Each lflow is getting hashed and compared against existing ones right after
>>> creation.  At least in the past, that was a fairly CPU intensive operation.
>>> Especially because in some cases northd generated a large amount of 
>>> duplicated
>>> logical flows, so the actual number of comparisons was significantly larger
>>> than the total number of logical flows that ended up in the southbound DB.
>>>
>>> So, even a smallest of changes in a way lflows are hashed and compared would
>>> result in very significant changes in the overall performance of ovn-northd.
>>>
>>> However, over time we optimized different aspects of lflow generation 
>>> including
>>> optimized lflow generation for datapath groups in places where northd knows
>>> that all the flows will indeed be grouped.  And there are not so many cases
>>> now where duplicates can be generated, so it seems to be less of a concern 
>>> in
>>> the latest versions of OVN.
>>>
>>>>
>>>> I also thought about an alternative approach where we would know the 
>>>> hashes for each
>>>> stage name in advance and simply add this hash instead of computing it 
>>>> from the string,
>>>> since it’s a stateful value. However, ovn_stage_to_str doesn’t access the 
>>>> database but
>>>> retrieves the string from memory, and this didn’t seem like a significant 
>>>> optimization to me.
>>>>
>>>> I conducted some research on the impact of this patch on the performance 
>>>> of upstream version.
>>>> In a setup with 6000 logical switches and 15,000 ports, I collected some 
>>>> metrics regarding performance.
>>>>
>>>> The average utilization metrics for hash maps during recomputation were:
>>>>  • Without the patch:
>>>> hmap_expand: 5928.750/sec (average), 98.8125/sec (overall average)
>>>>
>>>>  • With the patch:
>>>> hmap_expand: 5921.117/sec (average), 98.6853/sec (overall average)
>>>>
>>>
>>> This one is not a very useful metric.  At least for this measurement.
>>>
>>>> Statistics for lflows_to_sb:
>>>>  • With the patch: 154.000000 msec
>>>>  • Without the patch: 151.000000 msec
>>>>
>>>> These results seem to fall within the range of normal deviation.
>>>
>>> It seems, you're right.  I did a few tests of my own and I also see an
>>> insignificant performance difference in most cases.  It is a little 
>>> surprising
>>> to not see much difference, but maybe it shouldn't bee.  All the changes we 
>>> made
>>> in the last few years indeed decreased the importance of the lflow hashing 
>>> and
>>> comparisons.
>>>
>>> I would like to run a larger set of ovn-heater tests though before 
>>> concluding that
>>> there is actually no real difference.  However, ...
>>>
>>> It seems like an OVN_INTERNAL_MINOR_VER wasn't increased for a long time 
>>> now and
>>> it supposed to be increased every time logical pipeline stages are 
>>> added/modified.
>>> That might be the root of the issue you're trying to solve.
>>>
>>> When the internal version changes, that should trigger updates for all the 
>>> stage
>>> names and hints and the source locators for all the logical flows, see the
>>> 'if (ovn_internal_version_changed)' branch in the sync_lflow_to_sb().
>>>
>>> I think, we just need to bump that number and maybe add some build 
>>> assertions,
>>> so we do not forget about it again...
>>>
>>
>> That's what the comment above the OVN_INTERNAL_MINOR_VER definition
>> says, yes:
>>
>> /* Increment this for any logical flow changes, if an existing OVN action is
>>  * modified or a stage is added to a logical pipeline.
>>  *
>>  * This value is also used to handle some backward compatibility during
>>  * upgrading. It should never decrease or rewind. */
>>
>> So we could try to add a compile time check to make sure we didn't
>> forget to bump this.  However, ... :)
>>
>>> What do you think?  Dumitru, what's your take on this?
>>>
>>
>> The definition of ovn_get_internal_version() is:
>>
>> /* Returns the OVN version. The caller must free the returned value. */
>> char *
>> ovn_get_internal_version(void)
>> {
>>     return xasprintf("%s-%s-%d.%d", OVN_PACKAGE_VERSION,
>>                      sbrec_get_db_version(),
>>                      N_OVNACTS, OVN_INTERNAL_MINOR_VER);
>> }
>>
>> I'd expect production upgrades move to a new version so OVN_PACKAGE_VERSION
>> should change, or am I missing something?
> 
> AFAICT, we backport pipeline stage changes from time to time, and the internal
> version will not change in this case, if the downstream consumer of those
> changes (distribution packages) do not have custom patches to bump the 
> internal
> minor version.  Same will be true for someone who is following a bleeding edge
> main branch for one reason or another.
> 
> So, it's true that this issue should not appear while updating between 
> official
> OVN upstream releases (minor or major), but it may be seen mid-way, and so it
> can be seen in distributions.
> 

That's a good point.

> That's also assuming that 'ovn_internal_version_changed' logic actually works
> as expected (I didn't test).
> 

It seems to work.. it's a bit awkward because it checks if the version
in NB.NB_Global.options:northd_internal_version changed.  I would've
expected it to check the SB.SB_Global table.  However, that's not wrong
as ovn-northd (old version too) syncs the value from SB options to NB
options.

Aleksandra, I guess, we need to understand in which cases you hit this
problem?  Is it after an upgrade?  Were you upgrading between
minor/major upstream OVN versions?  Or upgrading mid-way like Ilya
mentioned above.

If it's the former we must have a hidden bug somewhere.

If it's the latter maybe it's enough to just bump OVN_INTERNAL_MINOR_VER
as Ilya suggested and add a compile time check.

What do you think?

>>
>>> Best regards, Ilya Maximets.
>>>
>>
>> Thanks,
>> Dumitru
>>
>>>>
>>>>
>>>>> On 6 Mar 2025, at 16:50, Ilya Maximets <i.maxim...@ovn.org> wrote:
>>>>>
>>>>> On 3/6/25 09:41, Alexandra Rukomoinikova wrote:
>>>>>> When changing the order of stages in the LogicalFlow
>>>>>> table in SB, some rules will be added with an incorrect
>>>>>> stage-name in the external-ids of existing datapath groups.
>>>>>> When adding a stage, new lflows in a new table with the same
>>>>>> match and action that already existed with the same table-id
>>>>>> will remain in the database with unupdated stage names.
>>>>>>
>>>>>> Signed-off-by: Alexandra Rukomoinikova <arukomoinikova@k2.cloud>
>>>>>> ---
>>>>>> lib/ovn-util.c     | 5 ++++-
>>>>>> lib/ovn-util.h     | 2 +-
>>>>>> northd/lflow-mgr.c | 4 +++-
>>>>>> 3 files changed, 8 insertions(+), 3 deletions(-)
>>>>>>
>>>>>> diff --git a/lib/ovn-util.c b/lib/ovn-util.c
>>>>>> index c65b36bb5..d1572d5ec 100644
>>>>>> --- a/lib/ovn-util.c
>>>>>> +++ b/lib/ovn-util.c
>>>>>> @@ -646,16 +646,19 @@ sbrec_logical_flow_hash(const struct 
>>>>>> sbrec_logical_flow *lf)
>>>>>> {
>>>>>>     return ovn_logical_flow_hash(lf->table_id,
>>>>>>                                  ovn_pipeline_from_name(lf->pipeline),
>>>>>> +                                 smap_get_def(&lf->external_ids,
>>>>>> +                                              "stage-name", ""),
>>>>>>                                  lf->priority, lf->match,
>>>>>>                                  lf->actions);
>>>>>> }
>>>>>
>>>>> Hi, Alexandra.  Thanks for the patch!
>>>>>
>>>>> Though, IIRC, it was an intentional decision to not include the stage
>>>>> name in the hash/comparison to avoid a significant performance impact
>>>>> that this hashmap lookup and the string hashing bring.
>>>>>
>>>>> The issue only affects debugging capabilities, but the performance
>>>>> impact of hashing the name on each database update and comparing extra
>>>>> strings on lflow generation as well as re-writing most of the flows
>>>>> on stage updates is significant and affects every operation in northd.
>>>>>
>>>>> Did you run some scale tests with this change?  What's the impact in
>>>>> current OVN?
>>>>>
>>>>> Best regards, Ilya Maximets.
>>>>
>>>
>>
> 

_______________________________________________
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to