Hi Aleksandra,

I see you posted [0] to add checksum checks for northd stages.

We plan to bump OVN_INTERNAL_MINOR_VER as a side effect of a different
change anyway [1] - the v5 of that patch will likely be backported to
all stable branches.

I'm considering moving "[PATCH ovn] northd: Fixed stage name mapping in
lflow table." [2] to "Rejected" in patchwork as [0] will cover these
cases in the future.

What do you think?

Thanks,
Dumitru

[0]
https://patchwork.ozlabs.org/project/ovn/patch/20250312105127.66815-1-arukomoinikova@k2.cloud/
[1]
https://patchwork.ozlabs.org/project/ovn/patch/20250311212935.1826800-1-rosema...@redhat.com/#3477523
[2]
https://patchwork.ozlabs.org/project/ovn/patch/20250306084202.4593-1-arukomoinikova@k2.cloud/

On 3/11/25 5:28 PM, Ilya Maximets wrote:
> On 3/11/25 16:58, Aleksandra Rukomoinikova wrote:
>> *Good evening,*
>>
>> You are absolutely correct about the Bump version being helpful. I observed 
>> this
>> while working on the intermediate update.
>>
>> Perhaps we should consider this and calculate the checksum similarly to the 
>> base,
>> but specifically for the order of stages in northd?
> 
> I'd suggest we add a new PHONY target in the Makefile.am for this.  See the
> check-ifconfig target for example.  It will run for every build.
> And we can do a checksum comparison trick similarly as we do for database
> schema checksums.  For example, we could add something like:
> "The current pipeline checksum is '1317960254 5844'" to the comment above
> the definition of OVN_INTERNAL_MINOR_VER.  And our new build target would
> do something like "grep '^\s*PIPELINE_STAGE(' northd/northd.h | cksum",
> then grep out the checksum value from the comment and compare them, failing
> the build if they do not match.
> 
> This will ensure that every time someone touches the pipeline definitions,
> they will have to update the checksum above OVN_INTERNAL_MINOR_VER definition
> and so they will consider updating the version itself.
> 
> What do you think?
> 
> Best regards, Ilya Maximets.
> 
>>
>> I must admit, the manual Bump version process is a bit confusing in my 
>> opinion.
>>
>>> On 11 Mar 2025, at 14:03, Dumitru Ceara <dce...@redhat.com> wrote:
>>>
>>> On 3/11/25 11:40 AM, Ilya Maximets wrote:
>>>> On 3/11/25 11:11, Dumitru Ceara wrote:
>>>>> Hi Aleksandra, Ilya,
>>>>>
>>>>> On 3/11/25 12:01 AM, Ilya Maximets wrote:
>>>>>> On 3/7/25 16:51, Aleksandra Rukomoinikova wrote:
>>>>>>> Hi Ilya!
>>>>>>>
>>>>>>> I’m not entirely clear on your point about the impact of this change on 
>>>>>>> performance.
>>>>>>> Formally, without my changes, two strings are already added to the 
>>>>>>> hash: action and clone,
>>>>>>> which are usually much longer than just the stage name.
>>>>>>> I don’t think the recalculation of flows based on hashes is 
>>>>>>> particularly significant. 
>>>>>>> Could you explain the impact? Is this somehow related to datapath 
>>>>>>> groups?
>>>>>>
>>>>>> Each lflow is getting hashed and compared against existing ones right 
>>>>>> after
>>>>>> creation.  At least in the past, that was a fairly CPU intensive 
>>>>>> operation.
>>>>>> Especially because in some cases northd generated a large amount of 
>>>>>> duplicated
>>>>>> logical flows, so the actual number of comparisons was significantly 
>>>>>> larger
>>>>>> than the total number of logical flows that ended up in the southbound 
>>>>>> DB.
>>>>>>
>>>>>> So, even a smallest of changes in a way lflows are hashed and compared 
>>>>>> would
>>>>>> result in very significant changes in the overall performance of 
>>>>>> ovn-northd.
>>>>>>
>>>>>> However, over time we optimized different aspects of lflow generation 
>>>>>> including
>>>>>> optimized lflow generation for datapath groups in places where northd 
>>>>>> knows
>>>>>> that all the flows will indeed be grouped.  And there are not so many 
>>>>>> cases
>>>>>> now where duplicates can be generated, so it seems to be less of a 
>>>>>> concern in
>>>>>> the latest versions of OVN.
>>>>>>
>>>>>>>
>>>>>>> I also thought about an alternative approach where we would know the 
>>>>>>> hashes for each
>>>>>>> stage name in advance and simply add this hash instead of computing it 
>>>>>>> from the string,
>>>>>>> since it’s a stateful value. However, ovn_stage_to_str doesn’t access 
>>>>>>> the database but
>>>>>>> retrieves the string from memory, and this didn’t seem like a 
>>>>>>> significant optimization to me.
>>>>>>>
>>>>>>> I conducted some research on the impact of this patch on the 
>>>>>>> performance of upstream version.
>>>>>>> In a setup with 6000 logical switches and 15,000 ports, I collected 
>>>>>>> some metrics regarding performance.
>>>>>>>
>>>>>>> The average utilization metrics for hash maps during recomputation were:
>>>>>>>  • Without the patch:
>>>>>>> hmap_expand: 5928.750/sec (average), 98.8125/sec (overall average)
>>>>>>>
>>>>>>>  • With the patch:
>>>>>>> hmap_expand: 5921.117/sec (average), 98.6853/sec (overall average)
>>>>>>>
>>>>>>
>>>>>> This one is not a very useful metric.  At least for this measurement.
>>>>>>
>>>>>>> Statistics for lflows_to_sb:
>>>>>>>  • With the patch: 154.000000 msec
>>>>>>>  • Without the patch: 151.000000 msec
>>>>>>>
>>>>>>> These results seem to fall within the range of normal deviation.
>>>>>>
>>>>>> It seems, you're right.  I did a few tests of my own and I also see an
>>>>>> insignificant performance difference in most cases.  It is a little 
>>>>>> surprising
>>>>>> to not see much difference, but maybe it shouldn't bee.  All the changes 
>>>>>> we made
>>>>>> in the last few years indeed decreased the importance of the lflow 
>>>>>> hashing and
>>>>>> comparisons.
>>>>>>
>>>>>> I would like to run a larger set of ovn-heater tests though before 
>>>>>> concluding that
>>>>>> there is actually no real difference.  However, ...
>>>>>>
>>>>>> It seems like an OVN_INTERNAL_MINOR_VER wasn't increased for a long time 
>>>>>> now and
>>>>>> it supposed to be increased every time logical pipeline stages are 
>>>>>> added/modified.
>>>>>> That might be the root of the issue you're trying to solve.
>>>>>>
>>>>>> When the internal version changes, that should trigger updates for all 
>>>>>> the stage
>>>>>> names and hints and the source locators for all the logical flows, see 
>>>>>> the
>>>>>> 'if (ovn_internal_version_changed)' branch in the sync_lflow_to_sb().
>>>>>>
>>>>>> I think, we just need to bump that number and maybe add some build 
>>>>>> assertions,
>>>>>> so we do not forget about it again...
>>>>>>
>>>>>
>>>>> That's what the comment above the OVN_INTERNAL_MINOR_VER definition
>>>>> says, yes:
>>>>>
>>>>> /* Increment this for any logical flow changes, if an existing OVN action 
>>>>> is
>>>>> * modified or a stage is added to a logical pipeline.
>>>>> *
>>>>> * This value is also used to handle some backward compatibility during
>>>>> * upgrading. It should never decrease or rewind. */
>>>>>
>>>>> So we could try to add a compile time check to make sure we didn't
>>>>> forget to bump this.  However, ... :)
>>>>>
>>>>>> What do you think?  Dumitru, what's your take on this?
>>>>>>
>>>>>
>>>>> The definition of ovn_get_internal_version() is:
>>>>>
>>>>> /* Returns the OVN version. The caller must free the returned value. */
>>>>> char *
>>>>> ovn_get_internal_version(void)
>>>>> {
>>>>>    return xasprintf("%s-%s-%d.%d", OVN_PACKAGE_VERSION,
>>>>>                     sbrec_get_db_version(),
>>>>>                     N_OVNACTS, OVN_INTERNAL_MINOR_VER);
>>>>> }
>>>>>
>>>>> I'd expect production upgrades move to a new version so 
>>>>> OVN_PACKAGE_VERSION
>>>>> should change, or am I missing something?
>>>>
>>>> AFAICT, we backport pipeline stage changes from time to time, and the 
>>>> internal
>>>> version will not change in this case, if the downstream consumer of those
>>>> changes (distribution packages) do not have custom patches to bump the 
>>>> internal
>>>> minor version.  Same will be true for someone who is following a bleeding 
>>>> edge
>>>> main branch for one reason or another.
>>>>
>>>> So, it's true that this issue should not appear while updating between 
>>>> official
>>>> OVN upstream releases (minor or major), but it may be seen mid-way, and so 
>>>> it
>>>> can be seen in distributions.
>>>>
>>>
>>> That's a good point.
>>>
>>>> That's also assuming that 'ovn_internal_version_changed' logic actually 
>>>> works
>>>> as expected (I didn't test).
>>>>
>>>
>>> It seems to work.. it's a bit awkward because it checks if the version
>>> in NB.NB_Global.options:northd_internal_version changed.  I would've
>>> expected it to check the SB.SB_Global table.  However, that's not wrong
>>> as ovn-northd (old version too) syncs the value from SB options to NB
>>> options.
>>>
>>> Aleksandra, I guess, we need to understand in which cases you hit this
>>> problem?  Is it after an upgrade?  Were you upgrading between
>>> minor/major upstream OVN versions?  Or upgrading mid-way like Ilya
>>> mentioned above.
>>>
>>> If it's the former we must have a hidden bug somewhere.
>>>
>>> If it's the latter maybe it's enough to just bump OVN_INTERNAL_MINOR_VER
>>> as Ilya suggested and add a compile time check.
>>>
>>> What do you think?
>>>
>>>>>
>>>>>> Best regards, Ilya Maximets.
>>>>>>
>>>>>
>>>>> Thanks,
>>>>> Dumitru
>>>>>
>>>>>>>
>>>>>>>
>>>>>>>> On 6 Mar 2025, at 16:50, Ilya Maximets <i.maxim...@ovn.org> wrote:
>>>>>>>>
>>>>>>>> On 3/6/25 09:41, Alexandra Rukomoinikova wrote:
>>>>>>>>> When changing the order of stages in the LogicalFlow
>>>>>>>>> table in SB, some rules will be added with an incorrect
>>>>>>>>> stage-name in the external-ids of existing datapath groups.
>>>>>>>>> When adding a stage, new lflows in a new table with the same
>>>>>>>>> match and action that already existed with the same table-id
>>>>>>>>> will remain in the database with unupdated stage names.
>>>>>>>>>
>>>>>>>>> Signed-off-by: Alexandra Rukomoinikova <arukomoinikova@k2.cloud>
>>>>>>>>> ---
>>>>>>>>> lib/ovn-util.c     | 5 ++++-
>>>>>>>>> lib/ovn-util.h     | 2 +-
>>>>>>>>> northd/lflow-mgr.c | 4 +++-
>>>>>>>>> 3 files changed, 8 insertions(+), 3 deletions(-)
>>>>>>>>>
>>>>>>>>> diff --git a/lib/ovn-util.c b/lib/ovn-util.c
>>>>>>>>> index c65b36bb5..d1572d5ec 100644
>>>>>>>>> --- a/lib/ovn-util.c
>>>>>>>>> +++ b/lib/ovn-util.c
>>>>>>>>> @@ -646,16 +646,19 @@ sbrec_logical_flow_hash(const struct 
>>>>>>>>> sbrec_logical_flow *lf)
>>>>>>>>> {
>>>>>>>>>     return ovn_logical_flow_hash(lf->table_id,
>>>>>>>>>                                  ovn_pipeline_from_name(lf->pipeline),
>>>>>>>>> +                                 smap_get_def(&lf->external_ids,
>>>>>>>>> +                                              "stage-name", ""),
>>>>>>>>>                                  lf->priority, lf->match,
>>>>>>>>>                                  lf->actions);
>>>>>>>>> }
>>>>>>>>
>>>>>>>> Hi, Alexandra.  Thanks for the patch!
>>>>>>>>
>>>>>>>> Though, IIRC, it was an intentional decision to not include the stage
>>>>>>>> name in the hash/comparison to avoid a significant performance impact
>>>>>>>> that this hashmap lookup and the string hashing bring.
>>>>>>>>
>>>>>>>> The issue only affects debugging capabilities, but the performance
>>>>>>>> impact of hashing the name on each database update and comparing extra
>>>>>>>> strings on lflow generation as well as re-writing most of the flows
>>>>>>>> on stage updates is significant and affects every operation in northd.
>>>>>>>>
>>>>>>>> Did you run some scale tests with this change?  What's the impact in
>>>>>>>> current OVN?
>>>>>>>>
>>>>>>>> Best regards, Ilya Maximets.
>>
> 

_______________________________________________
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to