Re: [ovs-dev] [PATCH ovn] ovn-northd: Fix extremely inefficient usage of lflow hash map.

Ilya Maximets Tue, 24 Aug 2021 04:46:44 -0700

On 8/24/21 1:18 PM, Anton Ivanov wrote:
> 
> On 24/08/2021 12:05, Ilya Maximets wrote:
>> On 8/24/21 7:36 AM, Anton Ivanov wrote:
>>> On 23/08/2021 22:36, Ilya Maximets wrote:
>>>> On 8/23/21 10:37 PM, Anton Ivanov wrote:
>>>>> On 23/08/2021 21:26, Ilya Maximets wrote:
>>>>>> On 8/23/21 10:20 PM, Anton Ivanov wrote:
>>>>>>> Should not be the case.
>>>>>>>
>>>>>>> The map is pre-sized for the size from the previous iterations.
>>>>>>>
>>>>>>> Line 12861 in my tree which is probably a few commits out of date:
>>>>>>>
>>>>>>>        fast_hmap_size_for(&lflows, max_seen_lflow_size);
>>>>>>>
>>>>>>> And immediately after building the lflows:
>>>>>>>
>>>>>>>        if (hmap_count(&lflows) > max_seen_lflow_size) {
>>>>>>>            max_seen_lflow_size = hmap_count(&lflows);
>>>>>>>        }
>>>>>>>
>>>>>>> So the map SHOULD be sized correctly - to the most recent seen lflow 
>>>>>>> count.
>>>>>> Please, re-read the commit message.  It's a first run of the loop
>>>>>> and the 'max_seen_lflow_size' is default 128 at this point.
>>>>> Ack,
>>>>>
>>>>> Not using auto-resizing in single threaded mode is a bug. Thanks for 
>>>>> fixing it.
>>>>>
>>>>>   From that perspective the patch is a straight +1 from me.
>>>>>
>>>>>   From the perspective of the use case stated in the commit message- I am 
>>>>> not sure it addresses it.
>>>>>
>>>>> If northd is in single-threaded mode and is tackling a GIGANTIC> 
>>>>> database, it may never complete the first iteration before the
>>>>> expiration of the timers and everyone deciding that northd is AWOL.
>>>> Well, how do you suggest to fix that?  Obviously, we can always create
>>>> a database that northd will never be able to process in a reasonable
>>>> amount of time.  And it doesn't matter if it's single- or multi-threaded.
>>>>
>>>> In this case NbDB is only 9MB in size, which is very reasonable, and
>>>> northd on my laptop takes more than 15 minutes to process it (I killed
>>>> it at this point).  With the patch applied it took only 11 seconds.
>>>> So, for me, this patch pretty much fixes the issue.  11 seconds is not
>>>> that bad, e.g. ovn-k8s configures inactivity probes for clients to 180.
>>>> It would be great to reduce, but we're not there yet.
>>>>
>>>>> In that case, if it is multi-threaded from the start, it should probably
>>>>> grab the sizing from the lflow table hash in south db. That would be a
>>>>> good approximation for the first run.
>>>> This will not work for a case where SbDB is empty for any reason while
>>>> NbDB is not.  And there is still a case where northd initially connects
>>>> to semi-empty databases and after few iterations NbDB receives a big
>>>> transaction and generates a big update for northd.
>>> A partial fix is to resize to optimum size the hash after lflow processing.
>> I'm not sure I understand what you mean here, because resizing after
>> lflow processing will not help.  The lflow processing itself is the
>> very slow part that we're trying to make faster here.
> 
> That can be the case only with dpgroups. Otherwise lflows are just a 
> destination to dump data and the bucket sizing is irrelevant because there is 
> never any lookup inside lflows during processing. The lflow map is just used 
> to store data. So if it is suboptimal at the exit of build_lflows() the 
> resize will fix it before the first lookup shortly thereafter.
> 
> Are you running it with dpgroups enabled? In that case there are lookups 
> inside lflows during build which happen under a per-bucket lock. So in 
> addition to suboptimal size when searching the contention depends on the 
> number of buckets. If they are too few, the system becomes heavily contended 
> resulting in ridiculous computation sizes.


Oh, I see.  Indeed, without dp-groups there is no lookup during
lflow build.  I missed that.  So, yes, I agree that for a case
without dp-groups, re-sizing after lflow processing should work.
We need that for parallel case.

Current patch (use hmap_insert() that resizes if needed) helps
for all non-parallel cases.

I'm mostly running dp-groups + non-parallel which is a default
case for ovn-heater/ovn-k8s.

> 
> For the case of "dpgroups + parallel + first iteration + pre-existing large 
> database" there is no cure short of pre-allocating the hash to maximum size.

Yeah, dp-groups + parallel is a hard case.

> 
> I am scale testing that as well as resize (for non-dp-groups cases) at 
> present.
> 
> Brgds,
> 
> A.
> 
>>
>>> If the sizing was correct - 99.9% of the case this will be a noop.
>>>
>>> If the sizing was incorrect, it will be resized so that the DP searches and 
>>> all other ops which were recently added for flow reduction will work 
>>> optimally.
>>>
>>> This still does not work for lflow compute with dpgroups + parallel upon 
>>> initial connect and without a SB database to use for size guidance. It will 
>>> work for all other cases.
>>>
>>> I will send two separate patches to address the cases which can be easily 
>>> addressed and see what can be done with the dp+parallel upon initial 
>>> connect to an empty sb database.
>>>
>>> Brgds,
>>>
>>> A
>>>
>>>>> A.
>>>>>
>>>>>>> A.
>>>>>>>
>>>>>>> On 23/08/2021 21:02, Ilya Maximets wrote:
>>>>>>>> 'lflow_map' is never expanded because northd always uses fast
>>>>>>>> insertion.  This leads to the case where we have a hash map
>>>>>>>> with only 128 initial buckets and every ovn_lflow_find() ends
>>>>>>>> up iterating over n_lflows / 128 entries.  It's thousands of
>>>>>>>> logical flows or even more.  For example, it takes forever for
>>>>>>>> ovn-northd to start up with the Northbound Db from the 120 node
>>>>>>>> density-heavy test from ovn-heater, because every lookup is slower
>>>>>>>> than previous one.  I aborted the process after 15 minutes of
>>>>>>>> waiting, because there was no sign that it will converge.  With
>>>>>>>> this change applied the loop completes in only 11 seconds.
>>>>>>>>
>>>>>>>> Hash map will be pre-allocated to the maximum seen number of
>>>>>>>> logical flows on a second iteration, but this doesn't help for
>>>>>>>> the first iteration when northd first time connects to a big
>>>>>>>> Northbound database, which is a common case during failover or
>>>>>>>> cluster upgrade.  And there is an even trickier case where big
>>>>>>>> NbDB transaction that explodes the number of logical flows received
>>>>>>>> on not the first run.
>>>>>>>>
>>>>>>>> We can't expand the hash map in case of parallel build, so this
>>>>>>>> should be fixed separately.
>>>>>>>>
>>>>>>>> CC: Anton Ivanov <[email protected]>
>>>>>>>> Fixes: 74daa0607c7f ("ovn-northd: Introduce parallel lflow build")
>>>>>>>> Signed-off-by: Ilya Maximets <[email protected]>
>>>>>>>> ---
>>>>>>>>      northd/ovn-northd.c | 6 +++++-
>>>>>>>>      1 file changed, 5 insertions(+), 1 deletion(-)
>>>>>>>>
>>>>>>>> diff --git a/northd/ovn-northd.c b/northd/ovn-northd.c
>>>>>>>> index 3d8e21a4f..40cf957c0 100644
>>>>>>>> --- a/northd/ovn-northd.c
>>>>>>>> +++ b/northd/ovn-northd.c
>>>>>>>> @@ -4387,7 +4387,11 @@ do_ovn_lflow_add(struct hmap *lflow_map, struct 
>>>>>>>> ovn_datapath *od,
>>>>>>>>                         nullable_xstrdup(ctrl_meter),
>>>>>>>>                         ovn_lflow_hint(stage_hint), where);
>>>>>>>>          hmapx_add(&lflow->od_group, od);
>>>>>>>> -    hmap_insert_fast(lflow_map, &lflow->hmap_node, hash);
>>>>>>>> +    if (!use_parallel_build) {
>>>>>>>> +        hmap_insert(lflow_map, &lflow->hmap_node, hash);
>>>>>>>> +    } else {
>>>>>>>> +        hmap_insert_fast(lflow_map, &lflow->hmap_node, hash);
>>>>>>>> +    }
>>>>>>>>      }
>>>>>>>>        /* Adds a row with the specified contents to the Logical_Flow 
>>>>>>>> table. */
>>

_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] [PATCH ovn] ovn-northd: Fix extremely inefficient usage of lflow hash map.

Reply via email to