On 2/5/24 09:23, Dumitru Ceara wrote:
> On 2/5/24 08:13, Han Zhou wrote:
>> On Sun, Feb 4, 2024 at 9:26 PM Numan Siddique <num...@ovn.org> wrote:
>>>
>>> On Sun, Feb 4, 2024 at 9:53 PM Han Zhou <hz...@ovn.org> wrote:
>>>>
>>>> On Sun, Feb 4, 2024 at 5:46 AM Ilya Maximets <i.maxim...@ovn.org> wrote:
>>>>>
>>>>>>>>
>>>>>>>>>  35 files changed, 9681 insertions(+), 4645 deletions(-)
>>>>>>>>
>>>>>>>> I had another look at this series and acked the remaining
>>>>>>>> patches.  I just had some minor comments that can be easily fixed
>>>>>>>> when applying the patches to the main branch.
>>>>>>>>
>>>>>>>> Thanks for all the work on this!  It was a very large change but
>>>>>>>> it improves northd performance significantly.  I just hope we
>>>>>>>> don't introduce too many bugs.  Hopefully the time we have until
>>>>>>>> release will allow us to further test this change on the 24.03
>>>>>>>> branch.
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>> Dumitru
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Thanks a lot Dumitru and Han for the reviews and patience.
>>>>>>>
>>>>>>> I addressed the comments and applied the patches to main and also
>>>>>>> to branch-24.03.
>>>>>>>
>>>>>>> @Han - I know you wanted to take another look into v6.  I didn't
>>>>>>> want to delay further as branch-24.03 was created.  I'm more than
>>>>>>> happy to submit follow-up patches if you have any comments to
>>>>>>> address.  Please let me know.
>>>>>>>
>>>>>>
>>>>>> Hi Numan,
>>>>>>
>>>>>> I was writing the reply and saw your email just now.  Thanks a lot
>>>>>> for the huge effort you put into achieving this great optimization.
>>>>>> I only left one comment, on the implicit dependency left for the
>>>>>> en_lrnat -> en_lflow.  Feel free to address it with a follow-up; no
>>>>>> need to block the branching.  And take my Ack for the series with
>>>>>> that addressed.
>>>>>>
>>>>>> Acked-by: Han Zhou <hzhou at ovn.org>
>>>>>
>>>>>
>>>>> Hi, Numan, Dumitru and Han.
>>>>>
>>>>> I see a huge negative performance impact, most likely from this set, on
>>>>> ovn-heater's cluster-density tests.  The memory consumption on northd
> 
> Thanks for reporting this, Ilya!
> 
>>>>> jumped about 4x and it constantly recomputes due to failures of the
>>>>> port_group handler:
>>>>>
>>>>> 2024-02-03T11:09:12.441Z|01680|inc_proc_eng|INFO|node: lflow, recompute (failed handler for input port_group) took 9762ms
>>>>> 2024-02-03T11:09:12.444Z|01681|timeval|WARN|Unreasonably long 9898ms poll interval (5969ms user, 1786ms system)
>>>>> ...
>>>>> 2024-02-03T11:09:23.770Z|01690|inc_proc_eng|INFO|node: lflow, recompute (failed handler for input port_group) took 9014ms
>>>>> 2024-02-03T11:09:23.773Z|01691|timeval|WARN|Unreasonably long 9118ms poll interval (5376ms user, 1515ms system)
>>>>> ...
>>>>> 2024-02-03T11:09:36.692Z|01699|inc_proc_eng|INFO|node: lflow, recompute (failed handler for input port_group) took 10695ms
>>>>> 2024-02-03T11:09:36.696Z|01700|timeval|WARN|Unreasonably long 10890ms poll interval (6085ms user, 2745ms system)
>>>>> ...
>>>>> 2024-02-03T11:09:49.133Z|01708|inc_proc_eng|INFO|node: lflow, recompute (failed handler for input port_group) took 9985ms
>>>>> 2024-02-03T11:09:49.137Z|01709|timeval|WARN|Unreasonably long 10108ms poll interval (5521ms user, 2440ms system)
>>>>>
>>>>> That increases the 95th percentile of ovn-installed latency in the
>>>>> 500-node cluster-density test from 3.6 seconds last week to 21.5
>>>>> seconds this week.
>>>>>
>>>>> I think, this should be a release blocker.
>>>>>
>>>>> Memory usage is also very concerning.  Unfortunately, it is not tied
>>>>> to the cluster-density test; the same 4-5x RSS jump is also seen in
>>>>> other tests like density-heavy.  Last week the RSS of ovn-northd in
>>>>> the 500-node cluster-density test was between 1.5 and 2.5 GB; this
>>>>> week we have a range between 5.5 and 8.5 GB.
>>>>>
>>>>> I would consider this as a release blocker as well.
>>>>>
> 
> I agree, we shouldn't release 24.03.0 unless these two issues are
> (sufficiently) addressed.  We do have until March 1st (official release
> date) to do that or to revert any patches that cause regressions.
> 
>>>>>
>>>>> I don't have direct evidence that this particular series is the
>>>>> culprit, but it looks like the most likely candidate.  I can dig more
>>>>> into the investigation on Monday.
>>>>>
>>>>> Best regards, Ilya Maximets.
>>>>
>>>> Thanks Ilya for reporting this.  The reported latency and 4x RSS
>>>> increases are a little surprising to me.  I did test this series with
>>>> my scale test scripts for recompute performance regression; it showed
>>>> a 10+% increase in latency.  I even dug a little into it and noticed a
>>>> ~5% increase caused by the hmap used to maintain the lflows in each
>>>> lflow_ref.  This was discussed in the code review for an earlier
>>>> version (v2/v3).  Overall it didn't look very bad, given that we now
>>>> handle the most common scenarios incrementally and it is reasonable to
>>>> have some cost for maintaining the references/index for incremental
>>>> processing.  I wonder if my test scenario was too simple (it didn't
>>>> include LBs) to find the problems, so today I did another test
>>>> including an LB group with 1k LBs applied to 100 node-LS & GRs, plus
>>>> another 1k LBs per node-LS & GR (101k LBs in total).  I did see more
>>>> performance penalty, but still within ~20%, while for memory I didn't
>>>> notice a significant increase (<10%).  I believe I am missing some
>>>> specific scenario that had the big impact in the ovn-heater tests.
>>>> Please share if you dig out more clues.
>>>
>>> Hi Ilya,
>>>
>>> Thanks for reporting these details.
>>>
>>> I had a look at this regression.  There is a significant increase in
>>> the lflow recompute time (around 4x) in my local testing, and this is
>>> definitely not acceptable.
>>>
>>> In this particular cluster-density test, whenever a port group is
>>> created it results in a full recompute, and now that the recompute
>>> time has increased, it has a cumulative effect on the latency of the
>>> test.
>>>
>>> The dp reference counting [1] added in v4 of this series has
>>> introduced this regression (both CPU and memory).  I'm working on a
>>> fix and I think I should be able to address this soon.
>>>
>>> [1] - https://github.com/ovn-org/ovn/blob/main/northd/lflow-mgr.c#L696
>>>
>>>
>> Indeed, the same concern I mentioned when reviewing v5:
>> https://mail.openvswitch.org/pipermail/ovs-dev/2024-January/411194.html

Yeah.  In this cluster-density 500-node test, dp_refcnt_use() allocates
120 M references and puts them into hash maps.  That alone takes several
GB of memory, and it is the largest consumer according to massif: it
accounts for ~45% of total memory allocations.  Another ~25% is taken by
the hash maps where the refcounts are stored.

There is a simple optimization that can be made: don't allocate a refcount
the first time we try to add a reference; the original bitmap should be
enough.  Only if we try to use it again while it is already set in the
bitmap do we allocate.  That saves a lot of space, because 80% of all
refcounts in that test are never used more than once.  However, it might
not be enough; it needs testing.  Ideally, the whole structure needs to go
or be replaced by something different.  Frequent iterations over the
bitmaps and per-datapath*per-lflow memory allocations do not scale.  Even
if we can make it work reasonably well for 500 nodes, it will cause
problems at higher scale.
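
To make the idea concrete, here is a minimal sketch of what the lazy
allocation could look like.  To be clear, this is only an illustration:
every name below is made up and does not match the actual
northd/lflow-mgr.c code (which keeps its dp_refcnt entries in an OVS hmap
and uses lib/bitmap.h); a plain linked list and hand-rolled bit helpers
are used only to keep the example standalone, and it hasn't been tested
or profiled.

/* Hypothetical sketch of lazy per-datapath refcounts for one lflow. */
#include <limits.h>
#include <stdbool.h>
#include <stdlib.h>

#define WORD_BITS (sizeof(unsigned long) * CHAR_BIT)

struct dp_extra_ref {           /* Allocated only for datapaths referenced */
    struct dp_extra_ref *next;  /* more than once by the same lflow.       */
    size_t dp_index;
    unsigned int count;         /* References beyond the bitmap bit.       */
};

struct lflow_dp_refs {
    unsigned long *bitmap;      /* One bit per datapath: referenced or not. */
    struct dp_extra_ref *extra; /* Lazily allocated extra reference counts. */
};

static bool
bit_is_set(const unsigned long *bm, size_t i)
{
    return (bm[i / WORD_BITS] >> (i % WORD_BITS)) & 1UL;
}

static void
bit_assign(unsigned long *bm, size_t i, bool value)
{
    if (value) {
        bm[i / WORD_BITS] |= 1UL << (i % WORD_BITS);
    } else {
        bm[i / WORD_BITS] &= ~(1UL << (i % WORD_BITS));
    }
}

static struct dp_extra_ref *
extra_ref_find(struct lflow_dp_refs *refs, size_t dp_index)
{
    for (struct dp_extra_ref *r = refs->extra; r; r = r->next) {
        if (r->dp_index == dp_index) {
            return r;
        }
    }
    return NULL;
}

/* Adds one reference from the lflow to datapath 'dp_index'.  The first
 * reference only sets a bit; memory is allocated only when the same
 * datapath is referenced again (the rare case). */
static void
dp_ref_use(struct lflow_dp_refs *refs, size_t dp_index)
{
    if (!bit_is_set(refs->bitmap, dp_index)) {
        bit_assign(refs->bitmap, dp_index, true);  /* Common case: no malloc. */
        return;
    }

    struct dp_extra_ref *r = extra_ref_find(refs, dp_index);
    if (!r) {
        r = calloc(1, sizeof *r);
        r->dp_index = dp_index;
        r->next = refs->extra;
        refs->extra = r;
    }
    r->count++;
}

/* Drops one reference.  Returns true if the lflow no longer references
 * the datapath at all, i.e. the bitmap bit was cleared. */
static bool
dp_ref_unuse(struct lflow_dp_refs *refs, size_t dp_index)
{
    struct dp_extra_ref *r = extra_ref_find(refs, dp_index);

    if (r) {
        if (--r->count == 0) {          /* Only the bitmap bit remains. */
            struct dp_extra_ref **p = &refs->extra;
            while (*p != r) {
                p = &(*p)->next;
            }
            *p = r->next;
            free(r);
        }
        return false;
    }

    bit_assign(refs->bitmap, dp_index, false);     /* Last reference gone. */
    return true;
}

The point is that the allocation and the lookup move off the common path
entirely: only the ~20% of references that hit the same datapath more
than once would pay for an allocation, if the numbers above hold.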

>>
>> I just realized that my earlier test had too many individual LBs, which
>> hid this bottleneck.  With a small change to my script, using just a
>> single LB group that contains 1k LBs applied to 1k node-LS & GRs, I did
>> reproduce the performance regression, with a >50% increase in both
>> latency and memory during recompute.  The ovn-heater test you ran may
>> have more LB groups, which may have made it even worse.
>>
> 
> 
> Unfortunately, ovn-heater runs weekly on the latest version of the
> upstream branch, so it can catch patches that introduce regressions only
> after they get merged.
> 
> That's due to the fact that we run ovn-heater downstream inside our lab,
> and it's not that straightforward to automatically run it on each patch
> submission we get upstream.

FWIW, the memory usage jump is also present in density-heavy tests, and
those were performed for this patch set for the stats in the cover letter.
The week-to-week RSS jump for the density-heavy test is from 1.5 GB to
5-7 GB.

> 
> Maybe it would be an idea to integrate some of Han's performance testing
> scripts into the set of tests we already have in the upstream repo,
> ovn-performance.at [0], and run those in GitHub Actions too.
> 
> [0] https://github.com/ovn-org/ovn/blob/main/tests/ovn-performance.at
> 
> Han, others, what do you think?

Define "performance", i.e. what exactly we want to test in such a CI?
I don't think we can test latency, for example, because all the public
runners are very different and generally weak.  There will be a lot of
noise in the test results.  I also don't think we have enough memory
available to run things with high enough scale, even if they are
isolated to a single set of OVN daemons.  And low scale may not give
any good indicators for performance issues.

> 
>> Looking forward to the solution.
>>
> 
> Numan, please let me know if there's anything I can assist with.
> 
> Regards,
> Dumitru
> 
