On 2/5/24 15:45, Ilya Maximets wrote:
> On 2/5/24 11:34, Ilya Maximets wrote:
>> On 2/5/24 09:23, Dumitru Ceara wrote:
>>> On 2/5/24 08:13, Han Zhou wrote:
>>>> On Sun, Feb 4, 2024 at 9:26 PM Numan Siddique <num...@ovn.org> wrote:
>>>>>
>>>>> On Sun, Feb 4, 2024 at 9:53 PM Han Zhou <hz...@ovn.org> wrote:
>>>>>>
>>>>>> On Sun, Feb 4, 2024 at 5:46 AM Ilya Maximets <i.maxim...@ovn.org> wrote:
>>>>>>>
>>>>>>>>>>
>>>>>>>>>>>  35 files changed, 9681 insertions(+), 4645 deletions(-)
>>>>>>>>>>
>>>>>>>>>> I had another look at this series and acked the remaining
>>>>>>>>>> patches.  I just had some minor comments that can be easily
>>>>>>>>>> fixed when applying the patches to the main branch.
>>>>>>>>>>
>>>>>>>>>> Thanks for all the work on this!  It was a very large change
>>>>>>>>>> but it improves northd performance significantly.  I just hope
>>>>>>>>>> we don't introduce too many bugs.  Hopefully the time we have
>>>>>>>>>> until release will allow us to further test this change on the
>>>>>>>>>> 24.03 branch.
>>>>>>>>>>
>>>>>>>>>> Regards,
>>>>>>>>>> Dumitru
>>>>>>>>>
>>>>>>>>> Thanks a lot Dumitru and Han for the reviews and patience.
>>>>>>>>>
>>>>>>>>> I addressed the comments and applied the patches to main and
>>>>>>>>> also to branch-24.03.
>>>>>>>>>
>>>>>>>>> @Han - I know you wanted to take another look into v6.  I
>>>>>>>>> didn't want to delay further as branch-24.03 was created.  I'm
>>>>>>>>> more than happy to submit follow-up patches if you have any
>>>>>>>>> comments to address.  Please let me know.
>>>>>>>>>
>>>>>>>>
>>>>>>>> Hi Numan,
>>>>>>>>
>>>>>>>> I was writing the reply and saw your email just now.  Thanks a
>>>>>>>> lot for taking on the huge effort to achieve this great
>>>>>>>> optimization.  I only left one comment, on the implicit
>>>>>>>> dependency left for en_lrnat -> en_lflow.  Feel free to address
>>>>>>>> it with a followup; no need to block the branching.  And take my
>>>>>>>> Ack for the series with that addressed.
>>>>>>>>
>>>>>>>> Acked-by: Han Zhou <hzhou at ovn.org>
>>>>>>>
>>>>>>>
>>>>>>> Hi, Numan, Dumitru and Han.
>>>>>>>
>>>>>>> I see a huge negative performance impact, most likely from this
>>>>>>> set, on ovn-heater's cluster-density tests.  The memory
>>>>>>> consumption on northd
>>>
>>> Thanks for reporting this, Ilya!
>>>
>>>>>>> jumped about 4x and it constantly recomputes due to failures of
>>>>>>> the port_group handler:
>>>>>>>
>>>>>>> 2024-02-03T11:09:12.441Z|01680|inc_proc_eng|INFO|node: lflow, recompute (failed handler for input port_group) took 9762ms
>>>>>>> 2024-02-03T11:09:12.444Z|01681|timeval|WARN|Unreasonably long 9898ms poll interval (5969ms user, 1786ms system)
>>>>>>> ...
>>>>>>> 2024-02-03T11:09:23.770Z|01690|inc_proc_eng|INFO|node: lflow, recompute (failed handler for input port_group) took 9014ms
>>>>>>> 2024-02-03T11:09:23.773Z|01691|timeval|WARN|Unreasonably long 9118ms poll interval (5376ms user, 1515ms system)
>>>>>>> ...
>>>>>>> 2024-02-03T11:09:36.692Z|01699|inc_proc_eng|INFO|node: lflow, recompute (failed handler for input port_group) took 10695ms
>>>>>>> 2024-02-03T11:09:36.696Z|01700|timeval|WARN|Unreasonably long 10890ms poll interval (6085ms user, 2745ms system)
>>>>>>> ...
>>>>>>> 2024-02-03T11:09:49.133Z|01708|inc_proc_eng|INFO|node: lflow, recompute (failed handler for input port_group) took 9985ms
>>>>>>> 2024-02-03T11:09:49.137Z|01709|timeval|WARN|Unreasonably long 10108ms poll interval (5521ms user, 2440ms system)
>>>>>>>
>>>>>>> That increases the 95th percentile ovn-installed latency in the
>>>>>>> 500-node cluster-density test from 3.6 seconds last week to 21.5
>>>>>>> seconds this week.
>>>>>>>
>>>>>>> I think this should be a release blocker.
>>>>>>>
>>>>>>> Memory usage is also very concerning.  Unfortunately, it is not
>>>>>>> tied to the cluster-density test.  The same 4-5x RSS jump is also
>>>>>>> seen in other tests like density-heavy.  Last week the RSS of
>>>>>>> ovn-northd in the 500-node cluster-density test was between 1.5
>>>>>>> and 2.5 GB; this week we have a range between 5.5 and 8.5 GB.
>>>>>>>
>>>>>>> I would consider this a release blocker as well.
>>>>>>>
>>>
>>> I agree, we shouldn't release 24.03.0 unless these two issues are
>>> (sufficiently) addressed.  We do have until March 1st (official release
>>> date) to do that or to revert any patches that cause regressions.
>>>
>>>>>>>
>>>>>>> I don't have direct evidence that this particular series is the
>>>>>>> culprit, but it looks like the most likely candidate.  I can dig
>>>>>>> more into the investigation on Monday.
>>>>>>>
>>>>>>> Best regards, Ilya Maximets.
>>>>>>
>>>>>> Thanks Ilya for reporting this.  The 95th percentile latency and 4x
>>>>>> RSS increases are a little surprising to me.  I did test this series
>>>>>> with my scale test scripts for recompute performance regression; it
>>>>>> showed a 10+% increase in latency.  I even dug a little into it and
>>>>>> noticed a ~5% increase caused by the hmap used to maintain the lflows
>>>>>> in each lflow_ref.  This was discussed in the code review for an
>>>>>> earlier version (v2/v3).  Overall it didn't look too bad, given that
>>>>>> we now handle the most common scenarios incrementally, and it is
>>>>>> reasonable to have some cost for maintaining the references/index for
>>>>>> incremental processing.  I wonder if my test scenario was too simple
>>>>>> (it didn't have LBs included) to find the problems, so today I did
>>>>>> another test by including an LB group with 1k LBs applied to 100
>>>>>> node-LS & GR, and another 1k LBs per node-LS & GR (101k LBs in
>>>>>> total).  I did see more performance penalty, but still within ~20%,
>>>>>> while for memory I didn't notice a significant increase (<10%).  I
>>>>>> believe I am missing some specific scenario that had the big impact
>>>>>> in the ovn-heater tests.  Please share if you dig out more clues.
>>>>>
>>>>> Hi Ilya,
>>>>>
>>>>> Thanks for reporting these details.
>>>>>
>>>>> I had a look at this regression.  There is a significant increase in
>>>>> the lflow recompute time (around 4x) in my local testing, and this is
>>>>> definitely not acceptable.
>>>>>
>>>>> In this particular cluster-density test, whenever a port group is
>>>>> created it results in a full recompute, and now that the recompute
>>>>> time has increased, it has a cumulative effect on the latency of the
>>>>> test.
>>>>>
>>>>> The dp reference counting [1] added in v4 of this series has
>>>>> introduced this regression (both CPU and memory).  I'm working on a
>>>>> fix and I think I should be able to address it soon.
>>>>>
>>>>> [1] - https://github.com/ovn-org/ovn/blob/main/northd/lflow-mgr.c#L696
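
As a side note on the mechanism: the "failed handler for input port_group"
log lines earlier in the thread come from the incremental-processing
fallback, where a change handler that cannot apply an input change
incrementally makes the engine fall back to a full recompute of the node.
Here is a tiny sketch of that pattern, with hypothetical names rather than
the real ovn-northd inc-proc engine API:

    /* Hypothetical sketch of the incremental-processing fallback; names
     * are illustrative and do not match the real ovn-northd engine API. */
    #include <stdbool.h>
    #include <stdio.h>

    struct engine_node {
        const char *name;
        /* Rebuilds the node's data from scratch (expensive). */
        void (*full_recompute)(struct engine_node *);
        /* Tries to apply one input change incrementally.  Returns false
         * when the change cannot be handled that way. */
        bool (*handle_input_change)(struct engine_node *, const char *input);
    };

    static void
    engine_run(struct engine_node *node, const char *changed_input)
    {
        if (node->handle_input_change
            && node->handle_input_change(node, changed_input)) {
            return;  /* Handled incrementally: the cheap path. */
        }
        /* Handler missing or failed: fall back to a full recompute.  This
         * is what produces log lines like
         * "node: lflow, recompute (failed handler for input port_group)". */
        printf("node: %s, recompute (failed handler for input %s)\n",
               node->name, changed_input);
        node->full_recompute(node);
    }

With the port_group handler failing on every port group creation, each such
event costs a full lflow recompute, which is why the larger recompute time
accumulates across the whole test.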
>>>>>
>>>>>
>>>> Indeed, this is the same concern I mentioned when reviewing v5:
>>>> https://mail.openvswitch.org/pipermail/ovs-dev/2024-January/411194.html
>>
>> Yeah.  In this cluster-density 500-node test, dp_refcnt_use() allocates
>> 120 M references and puts them into hash maps.  That alone takes several
>> GB of memory, and it is the largest consumer according to massif: it
>> accounts for ~45% of total memory allocations.  Another ~25% is taken by
>> the hash maps where the refcounts are stored.
>>
>> There is a simple optimization that can be made - do not allocate a
>> refcount the first time we try to add one; the original bitmap should be
>> enough.  Only if we try to use it again while it is already in the
>> bitmap do we allocate.  That saves a lot of space, because 80% of all
>> refcounts in that test are never used more than once.  However, it might
>> not be enough; it needs testing.  Ideally, the whole structure needs to
>> go or be replaced by something different.  Frequent iterations over the
>> bitmaps and per-datapath*per-lflow memory allocations do not scale.
>> Even if we can make it work reasonably well for 500 nodes, it will cause
>> problems at higher scale.
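
To make that optimization concrete, below is a minimal standalone sketch of
the idea: the first reference to a datapath only sets the existing bitmap
bit, and a counted entry is allocated only when the same datapath is
referenced again.  All names (struct lflow_dp_refs, dp_ref_use(),
dp_ref_release()) are hypothetical, and a plain pointer array stands in for
the hash maps used in northd/lflow-mgr.c; the actual prototype linked below
may differ.

    /* Hypothetical sketch of lazily allocated per-datapath refcounts. */
    #include <stdbool.h>
    #include <stdint.h>
    #include <stdlib.h>

    struct dp_refcnt {
        size_t refcnt;      /* Extra references beyond the bitmap bit. */
    };

    struct lflow_dp_refs {
        uint64_t *dp_bitmap;        /* One bit per datapath: referenced at all. */
        struct dp_refcnt **counts;  /* Lazily allocated, indexed by datapath. */
        size_t n_dps;               /* Sizes both arrays. */
    };

    static bool
    dp_bit_is_set(const uint64_t *bm, size_t i)
    {
        return (bm[i / 64] >> (i % 64)) & 1;
    }

    static void
    dp_bit_set(uint64_t *bm, size_t i)
    {
        bm[i / 64] |= UINT64_C(1) << (i % 64);
    }

    static void
    dp_bit_clear(uint64_t *bm, size_t i)
    {
        bm[i / 64] &= ~(UINT64_C(1) << (i % 64));
    }

    /* Records one more use of datapath 'dp'.  The first use only sets the
     * bitmap bit; a counter is allocated only when the datapath is reused. */
    static void
    dp_ref_use(struct lflow_dp_refs *refs, size_t dp)
    {
        if (!dp_bit_is_set(refs->dp_bitmap, dp)) {
            dp_bit_set(refs->dp_bitmap, dp);
            return;
        }
        if (!refs->counts[dp]) {
            refs->counts[dp] = calloc(1, sizeof *refs->counts[dp]);
        }
        refs->counts[dp]->refcnt++;
    }

    /* Drops one use of datapath 'dp'.  Returns true if that was the last
     * use, i.e. the bitmap bit itself was cleared. */
    static bool
    dp_ref_release(struct lflow_dp_refs *refs, size_t dp)
    {
        struct dp_refcnt *c = refs->counts[dp];

        if (c) {
            if (!--c->refcnt) {
                free(c);
                refs->counts[dp] = NULL;
            }
            return false;
        }
        dp_bit_clear(refs->dp_bitmap, dp);
        return true;
    }

With 80% of the refcounts in that test used only once, most of the 120 M
allocations simply never happen under this scheme.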
> 
> FWIW, I have a small prototype for this partial solution that seems to
> noticeably decrease the memory usage here:
>   
> https://github.com/igsilya/ovn/commit/5ac3a78e287708fd571344c74a021e418fde31a4
> 
> I need to run this through the scale test to get actual results, though.

So, the results for the cluster-density 500-node test are below:

          | Avg. Poll Intervals | Total test time | northd RSS
----------+---------------------+-----------------+------------
Last week |     1.5 seconds     |  1005 seconds   |   2.5 GB
This week |       6 seconds     |  2246 seconds   |   8.5 GB
Patch     |     2.5 seconds     |  1170 seconds   |   3.1 GB

So, the patch does help a lot.  But we still have a 66% increase in
the poll intervals on northd, i.e. a 66% increase in recompute time,
and about a 25% increase in memory usage.
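
For reference, those percentages follow directly from the table above,
taking last week's numbers as the baseline:

    poll interval:  2.5 s  / 1.5 s  = ~1.67  ->  ~66% increase
    northd RSS:     3.1 GB / 2.5 GB =  1.24  ->  ~25% increase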

I know that Numan is working on an alternative solution that removes
the reference counting altogether, so I'll hold off on sending an
official patch.  Removing the refcounting, if possible, sounds like a
better approach in general.

Best regards, Ilya Maximets.