On Tue, Aug 24, 2021 at 2:20 PM Anton Ivanov
<[email protected]> wrote:
>
> On 24/08/2021 19:16, Ilya Maximets wrote:
> > On 8/24/21 7:48 PM, Numan Siddique wrote:
> >> On Tue, Aug 24, 2021 at 1:24 PM Anton Ivanov
> >> <[email protected]> wrote:
> >>>
> >>> On 24/08/2021 17:35, Ilya Maximets wrote:
> >>>> On 8/24/21 6:25 PM, Numan Siddique wrote:
> >>>>> On Tue, Aug 24, 2021 at 7:56 AM Anton Ivanov
> >>>>> <[email protected]> wrote:
> >>>>>> On 24/08/2021 12:46, Ilya Maximets wrote:
> >>>>>>> On 8/24/21 1:18 PM, Anton Ivanov wrote:
> >>>>>>>> On 24/08/2021 12:05, Ilya Maximets wrote:
> >>>>>>>>> On 8/24/21 7:36 AM, Anton Ivanov wrote:
> >>>>>>>>>> On 23/08/2021 22:36, Ilya Maximets wrote:
> >>>>>>>>>>> On 8/23/21 10:37 PM, Anton Ivanov wrote:
> >>>>>>>>>>>> On 23/08/2021 21:26, Ilya Maximets wrote:
> >>>>>>>>>>>>> On 8/23/21 10:20 PM, Anton Ivanov wrote:
> >>>>>>>>>>>>>> Should not be the case.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> The map is pre-sized for the size from the previous iterations.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Line 12861 in my tree, which is probably a few commits out of
> >>>>>>>>>>>>>> date:
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> fast_hmap_size_for(&lflows, max_seen_lflow_size);
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> And immediately after building the lflows:
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> if (hmap_count(&lflows) > max_seen_lflow_size) {
> >>>>>>>>>>>>>>     max_seen_lflow_size = hmap_count(&lflows);
> >>>>>>>>>>>>>> }
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> So the map SHOULD be sized correctly - to the most recently
> >>>>>>>>>>>>>> seen lflow count.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Please, re-read the commit message.  It's the first run of the
> >>>>>>>>>>>>> loop, and 'max_seen_lflow_size' is the default 128 at this point.
> >>>>>>>>>>>>
> >>>>>>>>>>>> Ack.
> >>>>>>>>>>>>
> >>>>>>>>>>>> Not using auto-resizing in single-threaded mode is a bug.  Thanks
> >>>>>>>>>>>> for fixing it.
> >>>>>>>>>>>>
> >>>>>>>>>>>> From that perspective the patch is a straight +1 from me.
> >>>>>>>>>>>>
> >>>>>>>>>>>> From the perspective of the use case stated in the commit
> >>>>>>>>>>>> message - I am not sure it addresses it.
> >>>>>>>>>>>>
> >>>>>>>>>>>> If northd is in single-threaded mode and is tackling a GIGANTIC
> >>>>>>>>>>>> database, it may never complete the first iteration before the
> >>>>>>>>>>>> expiration of the timers and everyone deciding that northd is
> >>>>>>>>>>>> AWOL.
> >>>>>>>>>>>
> >>>>>>>>>>> Well, how do you suggest to fix that?  Obviously, we can always
> >>>>>>>>>>> create a database that northd will never be able to process in a
> >>>>>>>>>>> reasonable amount of time.  And it doesn't matter if it's
> >>>>>>>>>>> single- or multi-threaded.
> >>>>>>>>>>>
> >>>>>>>>>>> In this case NbDB is only 9MB in size, which is very reasonable,
> >>>>>>>>>>> and northd on my laptop takes more than 15 minutes to process it
> >>>>>>>>>>> (I killed it at this point).  With the patch applied it took
> >>>>>>>>>>> only 11 seconds.  So, for me, this patch pretty much fixes the
> >>>>>>>>>>> issue.  11 seconds is not that bad, e.g. ovn-k8s configures
> >>>>>>>>>>> inactivity probes for clients to 180 seconds.  It would be great
> >>>>>>>>>>> to reduce that, but we're not there yet.
> >>>>>>>>>>>
> >>>>>>>>>>>> In that case, if it is multi-threaded from the start, it should
> >>>>>>>>>>>> probably grab the sizing from the lflow table hash in south db.
> >>>>>>>>>>>> That would be a good approximation for the first run.
> >>>>>>>>>>>
> >>>>>>>>>>> This will not work for a case where SbDB is empty for any reason
> >>>>>>>>>>> while NbDB is not.  And there is still a case where northd
> >>>>>>>>>>> initially connects to semi-empty databases and after a few
> >>>>>>>>>>> iterations NbDB receives a big transaction and generates a big
> >>>>>>>>>>> update for northd.
> >>>>>>>>>> A partial fix is to resize the hash to optimum size after lflow
> >>>>>>>>>> processing.
> >>>>>>>>>
> >>>>>>>>> I'm not sure I understand what you mean here, because resizing
> >>>>>>>>> after lflow processing will not help.  The lflow processing itself
> >>>>>>>>> is the very slow part that we're trying to make faster here.
> >>>>>>>>
> >>>>>>>> That can be the case only with dpgroups.  Otherwise lflows are just
> >>>>>>>> a destination to dump data, and the bucket sizing is irrelevant
> >>>>>>>> because there is never any lookup inside lflows during processing.
> >>>>>>>> The lflow map is just used to store data.  So if it is suboptimal
> >>>>>>>> at the exit of build_lflows(), the resize will fix it before the
> >>>>>>>> first lookup shortly thereafter.
> >>>>>>>>
> >>>>>>>> Are you running it with dpgroups enabled?  In that case there are
> >>>>>>>> lookups inside lflows during build which happen under a per-bucket
> >>>>>>>> lock.  So in addition to the suboptimal size when searching, the
> >>>>>>>> contention depends on the number of buckets.  If they are too few,
> >>>>>>>> the system becomes heavily contended, resulting in ridiculous
> >>>>>>>> computation times.
> >>>>>>>
> >>>>>>> Oh, I see.  Indeed, without dp-groups there is no lookup during
> >>>>>>> lflow build.  I missed that.  So, yes, I agree that for a case
> >>>>>>> without dp-groups, re-sizing after lflow processing should work.
> >>>>>>> We need that for the parallel case.
> >>>>>>>
> >>>>>>> The current patch (use hmap_insert() that resizes if needed) helps
> >>>>>>> for all non-parallel cases.
> >>>>>>
> >>>>>> Indeed.  It should go in.
> >>>>>>
> >>>>> Why can't we have hmap_insert() for both parallel and non-parallel
> >>>>> configs to start with, and switch over to hmap_insert_fast() when
> >>>>> ovn-northd has successfully connected to the SB DB and has
> >>>>> approximated the more accurate hmap size?
> >>>> We can't use hmap_insert() for parallel, because resize of a hash map
> >>>> will crash threads that are working on it at the moment, IIUC.
> >>>
> >>> We actually can for the non-dp-groups case, but the merge will become
> >>> much slower.  Example - the last version of the snapshot and monitor
> >>> parallelization for OVS.  It no longer uses pre-sized fixed hmaps
> >>> because it is nearly impossible to presize an RFC7047 structure
> >>> correctly.
> >>>
> >>> For the dp-groups case we can't.  Locking of the lflows which are used
> >>> for lookups is per hash bucket.  You cannot change the number of
> >>> buckets in the middle of the run, and other locking mechanisms will be
> >>> more coarse.  So the marginal benefit with dp-groups at present will
> >>> become none (or even negative).
> >>>
> >>>> We could disable parallel for the first several iterations, but still,
> >>>> this doesn't cover all the cases, i.e. the one where we have a big
> >>>> update from the NbDB with a fairly small DB before that.
> >>>
> >>> I am working on that.  Let me collect some stat data for the current
> >>> OVN master with all the recent changes.  They have changed the "heat
> >>> map" for the dp-groups case.
> >>>
> >>> We may need to do some adjustments to what and where we parallelize in
> >>> the dp-groups case (if at all).
> >>
> >> Ok.  You're fine with the proposed patch here, right?
> >>
> >> @Ilya - Can you please update the commit message to include that the
> >> issue is seen with dp groups enabled?
> >
> > I'm not sure about that, because the issue will still be there, just in
> > a different place.  We will not have lookups during construction of new
> > logical flows, but we will have them while trying to reconcile existing
> > SbDB flows, and this will be slow too, i.e. the loop
> > '/* Push changes to the Logical_Flow table to database. */' will be a
> > few thousand times slower.  So, the issue is less severe, but it is
> > still there even with dp groups disabled.
Right.  Thanks for pointing this out.  I applied this patch to the main
branch and backported it to branch-21.06.

Numan

>
> I am OK with the log entry as is.  In the absence of a
> hmap_expand(&lflows) right after the computation it is valid for both
> dp_groups and non dp_groups cases.
>
> >
> >> I guess you can share the commit message here.  No need to submit v2
> >> for that.
> >>
> >> Thanks
> >> Numan
> >>
> >>> A.
> >>>
> >>>>> Thanks
> >>>>> Numan
> >>>>>
> >>>>>> I will sort out the other cases to the extent possible.
> >>>>>>
> >>>>>> Brgds,
> >>>>>>
> >>>>>> A.
> >>>>>>
> >>>>>>> I'm mostly running dp-groups + non-parallel, which is a default
> >>>>>>> case for ovn-heater/ovn-k8s.
> >>>>>>>
> >>>>>>>> For the case of "dpgroups + parallel + first iteration +
> >>>>>>>> pre-existing large database" there is no cure short of
> >>>>>>>> pre-allocating the hash to maximum size.
> >>>>>>>
> >>>>>>> Yeah, dp-groups + parallel is a hard case.
> >>>>>>>
> >>>>>>>> I am scale testing that as well as resize (for non-dp-groups cases)
> >>>>>>>> at present.
> >>>>>>>>
> >>>>>>>> Brgds,
> >>>>>>>>
> >>>>>>>> A.
> >>>>>>>>
> >>>>>>>>>> If the sizing was correct, in 99.9% of cases this will be a noop.
> >>>>>>>>>>
> >>>>>>>>>> If the sizing was incorrect, it will be resized so that the DP
> >>>>>>>>>> searches and all other ops which were recently added for flow
> >>>>>>>>>> reduction will work optimally.
> >>>>>>>>>>
> >>>>>>>>>> This still does not work for lflow compute with dpgroups +
> >>>>>>>>>> parallel upon initial connect and without a SB database to use
> >>>>>>>>>> for size guidance.  It will work for all other cases.
> >>>>>>>>>>
> >>>>>>>>>> I will send two separate patches to address the cases which can
> >>>>>>>>>> be easily addressed, and see what can be done with dp+parallel
> >>>>>>>>>> upon initial connect to an empty SB database.
> >>>>>>>>>>
> >>>>>>>>>> Brgds,
> >>>>>>>>>>
> >>>>>>>>>> A
> >>>>>>>>>>
> >>>>>>>>>>>> A.
> >>>>>>>>>>>>
> >>>>>>>>>>>>>> A.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> On 23/08/2021 21:02, Ilya Maximets wrote:
> >>>>>>>>>>>>>>> 'lflow_map' is never expanded because northd always uses fast
> >>>>>>>>>>>>>>> insertion.  This leads to the case where we have a hash map
> >>>>>>>>>>>>>>> with only 128 initial buckets and every ovn_lflow_find() ends
> >>>>>>>>>>>>>>> up iterating over n_lflows / 128 entries.  It's thousands of
> >>>>>>>>>>>>>>> logical flows or even more.  For example, it takes forever
> >>>>>>>>>>>>>>> for ovn-northd to start up with the Northbound DB from the
> >>>>>>>>>>>>>>> 120 node density-heavy test from ovn-heater, because every
> >>>>>>>>>>>>>>> lookup is slower than the previous one.  I aborted the
> >>>>>>>>>>>>>>> process after 15 minutes of waiting, because there was no
> >>>>>>>>>>>>>>> sign that it would converge.  With this change applied the
> >>>>>>>>>>>>>>> loop completes in only 11 seconds.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> The hash map will be pre-allocated to the maximum seen
> >>>>>>>>>>>>>>> number of logical flows on a second iteration, but this
> >>>>>>>>>>>>>>> doesn't help for the first iteration, when northd connects
> >>>>>>>>>>>>>>> for the first time to a big Northbound database, which is a
> >>>>>>>>>>>>>>> common case during failover or cluster upgrade.  And there
> >>>>>>>>>>>>>>> is an even trickier case where a big NbDB transaction
> >>>>>>>>>>>>>>> explodes the number of logical flows received on a run other
> >>>>>>>>>>>>>>> than the first.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> We can't expand the hash map in case of parallel build, so
> >>>>>>>>>>>>>>> this should be fixed separately.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> CC: Anton Ivanov <[email protected]>
> >>>>>>>>>>>>>>> Fixes: 74daa0607c7f ("ovn-northd: Introduce parallel lflow build")
> >>>>>>>>>>>>>>> Signed-off-by: Ilya Maximets <[email protected]>
> >>>>>>>>>>>>>>> ---
> >>>>>>>>>>>>>>>   northd/ovn-northd.c | 6 +++++-
> >>>>>>>>>>>>>>>   1 file changed, 5 insertions(+), 1 deletion(-)
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> diff --git a/northd/ovn-northd.c b/northd/ovn-northd.c
> >>>>>>>>>>>>>>> index 3d8e21a4f..40cf957c0 100644
> >>>>>>>>>>>>>>> --- a/northd/ovn-northd.c
> >>>>>>>>>>>>>>> +++ b/northd/ovn-northd.c
> >>>>>>>>>>>>>>> @@ -4387,7 +4387,11 @@ do_ovn_lflow_add(struct hmap *lflow_map, struct ovn_datapath *od,
> >>>>>>>>>>>>>>>                                nullable_xstrdup(ctrl_meter),
> >>>>>>>>>>>>>>>                                ovn_lflow_hint(stage_hint), where);
> >>>>>>>>>>>>>>>      hmapx_add(&lflow->od_group, od);
> >>>>>>>>>>>>>>> -    hmap_insert_fast(lflow_map, &lflow->hmap_node, hash);
> >>>>>>>>>>>>>>> +    if (!use_parallel_build) {
> >>>>>>>>>>>>>>> +        hmap_insert(lflow_map, &lflow->hmap_node, hash);
> >>>>>>>>>>>>>>> +    } else {
> >>>>>>>>>>>>>>> +        hmap_insert_fast(lflow_map, &lflow->hmap_node, hash);
> >>>>>>>>>>>>>>> +    }
> >>>>>>>>>>>>>>>  }
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>  /* Adds a row with the specified contents to the Logical_Flow table. */
> >>>>>> --
> >>>>>> Anton R. Ivanov
> >>>>>> Cambridgegreys Limited. Registered in England. Company Number 10273661
> >>>>>> https://www.cambridgegreys.com/
> >>>>>>
> >>>>>> _______________________________________________
> >>>>>> dev mailing list
> >>>>>> [email protected]
> >>>>>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
> >>> --
> >>> Anton R. Ivanov
> >>> Cambridgegreys Limited. Registered in England.
> >>> Company Number 10273661
> >>> https://www.cambridgegreys.com/
> >>>
> >>> _______________________________________________
> >>> dev mailing list
> >>> [email protected]
> >>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
> >>>
> >
>
> --
> Anton R. Ivanov
> Cambridgegreys Limited. Registered in England. Company Number 10273661
> https://www.cambridgegreys.com/
>
> _______________________________________________
> dev mailing list
> [email protected]
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>
_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev
