On Tue, Aug 24, 2021 at 1:24 PM Anton Ivanov
<[email protected]> wrote:
>
>
> On 24/08/2021 17:35, Ilya Maximets wrote:
> > On 8/24/21 6:25 PM, Numan Siddique wrote:
> >> On Tue, Aug 24, 2021 at 7:56 AM Anton Ivanov
> >> <[email protected]> wrote:
> >>>
> >>> On 24/08/2021 12:46, Ilya Maximets wrote:
> >>>> On 8/24/21 1:18 PM, Anton Ivanov wrote:
> >>>>> On 24/08/2021 12:05, Ilya Maximets wrote:
> >>>>>> On 8/24/21 7:36 AM, Anton Ivanov wrote:
> >>>>>>> On 23/08/2021 22:36, Ilya Maximets wrote:
> >>>>>>>> On 8/23/21 10:37 PM, Anton Ivanov wrote:
> >>>>>>>>> On 23/08/2021 21:26, Ilya Maximets wrote:
> >>>>>>>>>> On 8/23/21 10:20 PM, Anton Ivanov wrote:
> >>>>>>>>>>> Should not be the case.
> >>>>>>>>>>>
> >>>>>>>>>>> The map is pre-sized for the size from the previous iterations.
> >>>>>>>>>>>
> >>>>>>>>>>> Line 12861 in my tree, which is probably a few commits out of date:
> >>>>>>>>>>>
> >>>>>>>>>>>     fast_hmap_size_for(&lflows, max_seen_lflow_size);
> >>>>>>>>>>>
> >>>>>>>>>>> And immediately after building the lflows:
> >>>>>>>>>>>
> >>>>>>>>>>>     if (hmap_count(&lflows) > max_seen_lflow_size) {
> >>>>>>>>>>>         max_seen_lflow_size = hmap_count(&lflows);
> >>>>>>>>>>>     }
> >>>>>>>>>>>
> >>>>>>>>>>> So the map SHOULD be sized correctly - to the most recently seen
> >>>>>>>>>>> lflow count.
> >>>>>>>>>> Please, re-read the commit message.  It's the first run of the
> >>>>>>>>>> loop, and 'max_seen_lflow_size' is the default 128 at this point.
> >>>>>>>>> Ack.
> >>>>>>>>>
> >>>>>>>>> Not using auto-resizing in single-threaded mode is a bug.  Thanks
> >>>>>>>>> for fixing it.
> >>>>>>>>>
> >>>>>>>>> From that perspective the patch is a straight +1 from me.
> >>>>>>>>>
> >>>>>>>>> From the perspective of the use case stated in the commit
> >>>>>>>>> message - I am not sure it addresses it.
> >>>>>>>>>
> >>>>>>>>> If northd is in single-threaded mode and is tackling a GIGANTIC
> >>>>>>>>> database, it may never complete the first iteration before the
> >>>>>>>>> expiration of the timers and everyone deciding that northd is AWOL.
> >>>>>>>> Well, how do you suggest to fix that?  Obviously, we can always
> >>>>>>>> create a database that northd will never be able to process in a
> >>>>>>>> reasonable amount of time.  And it doesn't matter if it's single-
> >>>>>>>> or multi-threaded.
> >>>>>>>>
> >>>>>>>> In this case NbDB is only 9MB in size, which is very reasonable,
> >>>>>>>> and northd on my laptop takes more than 15 minutes to process it
> >>>>>>>> (I killed it at this point).  With the patch applied it took only
> >>>>>>>> 11 seconds.  So, for me, this patch pretty much fixes the issue.
> >>>>>>>> 11 seconds is not that bad, e.g. ovn-k8s configures inactivity
> >>>>>>>> probes for clients to 180 seconds.  It would be great to reduce
> >>>>>>>> that further, but we're not there yet.
> >>>>>>>>
> >>>>>>>>> In that case, if it is multi-threaded from the start, it should
> >>>>>>>>> probably grab the sizing from the lflow table hash in the south
> >>>>>>>>> db.  That would be a good approximation for the first run.
> >>>>>>>> This will not work for a case where SbDB is empty for any reason
> >>>>>>>> while NbDB is not.  And there is still a case where northd
> >>>>>>>> initially connects to semi-empty databases and after a few
> >>>>>>>> iterations NbDB receives a big transaction and generates a big
> >>>>>>>> update for northd.
> >>>>>>> A partial fix is to resize the hash to the optimum size after lflow
> >>>>>>> processing.
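For readers skimming the thread, the pattern Anton describes is roughly the
following - a sketch, not the literal northd code: pre-size the map from the
previous run's high-water mark, build with fast inserts, then do a single
resize before the first lookup.  hmap_expand() is the stock OVS helper that
grows the bucket array to match the current element count; build_lflows()
stands in here for the whole lflow generation step.

    /* Pre-size from the last iteration's high-water mark. */
    fast_hmap_size_for(&lflows, max_seen_lflow_size);

    build_lflows(&lflows);      /* uses hmap_insert_fast() throughout */

    /* One-time fix-up if the guess was too low; effectively a no-op
     * when the pre-sizing was already correct. */
    hmap_expand(&lflows);

    if (hmap_count(&lflows) > max_seen_lflow_size) {
        max_seen_lflow_size = hmap_count(&lflows);
    }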
> >>>>>> I'm not sure I understand what you mean here, because resizing after
> >>>>>> lflow processing will not help.  The lflow processing itself is the
> >>>>>> very slow part that we're trying to make faster here.
> >>>>> That can be the case only with dpgroups.  Otherwise lflows are just a
> >>>>> destination to dump data into, and the bucket sizing is irrelevant,
> >>>>> because there is never any lookup inside lflows during processing.
> >>>>> The lflow map is just used to store data.  So if it is suboptimal at
> >>>>> the exit of build_lflows(), the resize will fix it before the first
> >>>>> lookup shortly thereafter.
> >>>>>
> >>>>> Are you running it with dpgroups enabled?  In that case there are
> >>>>> lookups inside lflows during the build, which happen under a
> >>>>> per-bucket lock.  So in addition to the suboptimal size when
> >>>>> searching, the contention depends on the number of buckets.  If there
> >>>>> are too few, the system becomes heavily contended, resulting in
> >>>>> ridiculous computation times.
> >>>> Oh, I see.  Indeed, without dp-groups there is no lookup during the
> >>>> lflow build.  I missed that.  So, yes, I agree that for a case without
> >>>> dp-groups, re-sizing after lflow processing should work.  We need that
> >>>> for the parallel case.
> >>>>
> >>>> The current patch (use hmap_insert() that resizes if needed) helps
> >>>> for all non-parallel cases.
> >>> Indeed.  It should go in.
> >>>
> >> Why can't we have hmap_insert() for both parallel and non-parallel
> >> configs to start with and switch over to hmap_insert_fast() once
> >> ovn-northd has successfully connected to the SB DB and has approximated
> >> a more accurate hmap size?
> > We can't use hmap_insert() for parallel, because a resize of a hash map
> > will crash the threads that are working on it at the moment, IIUC.
>
> We actually can for the non-dp-groups case, but the merge will become much
> slower.  Example - the last version of the snapshot and monitor
> parallelization for OVS.  It no longer uses pre-sized fixed hmaps, because
> it is nearly impossible to pre-size an RFC 7047 structure correctly.
>
> For the dp-groups case we can't.  Locking of the lflows which are used for
> lookups is per hash bucket.  You cannot change the number of buckets in
> the middle of the run, and other locking mechanisms would be more coarse.
> So the marginal benefit with dp-groups at present would become none (or
> things would even get slower).
>
> >
> > We could disable parallel for the first several iterations, but still,
> > this doesn't cover all the cases, i.e. the one where we have a big
> > update from the NbDB with a fairly small DB before that.
>
> I am working on that.  Let me collect some stat data for the current OVN
> master with all the recent changes.  They have changed the "heat map" for
> the dp-groups case.
>
> We may need to do some adjustments to what and where we parallelize in the
> dp-groups case (if at all).
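To illustrate why the bucket count must stay frozen during a parallel build,
here is a rough shape of per-bucket locking.  The names (bucket_locks,
insert_locked) and the layout are illustrative assumptions, not the actual
northd locking code:

    #include "openvswitch/hmap.h"
    #include "ovs-thread.h"

    /* One mutex per hash bucket; the array is allocated when the map is
     * sized and is only valid while the number of buckets is unchanged. */
    static struct ovs_mutex *bucket_locks;

    static void
    insert_locked(struct hmap *map, struct hmap_node *node, uint32_t hash)
    {
        /* The bucket index is 'hash & map->mask', i.e. it depends on the
         * current number of buckets.  hmap_insert() may expand the map and
         * reallocate the bucket array, changing 'mask' under the feet of
         * every other worker, so only hmap_insert_fast() is usable here -
         * and hence the map cannot grow mid-run.  (The hmap's shared
         * element counter needs separate care in real code; that detail is
         * elided in this sketch.) */
        struct ovs_mutex *lock = &bucket_locks[hash & map->mask];

        ovs_mutex_lock(lock);
        hmap_insert_fast(map, node, hash);
        ovs_mutex_unlock(lock);
    }

Any coarser scheme (one lock for the whole map, or per-slice locks) would
serialize the very lookups the parallel build is supposed to speed up.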
Ok.  You're fine with the proposed patch here, right?

@Ilya - Can you please update the commit message to include that the issue
is seen with dp groups enabled?  I guess you can share the commit message
here.  No need to submit a v2 for that.

Thanks
Numan

>
> A.
>
>
> >> Thanks
> >> Numan
> >>
> >>> I will sort out the other cases to the extent possible.
> >>>
> >>> Brgds,
> >>>
> >>> A.
> >>>
> >>>> I'm mostly running dp-groups + non-parallel, which is the default
> >>>> case for ovn-heater/ovn-k8s.
> >>>>
> >>>>> For the case of "dpgroups + parallel + first iteration + pre-existing
> >>>>> large database" there is no cure short of pre-allocating the hash to
> >>>>> its maximum size.
> >>>> Yeah, dp-groups + parallel is a hard case.
> >>>>
> >>>>> I am scale-testing that, as well as the resize (for non-dp-groups
> >>>>> cases), at present.
> >>>>>
> >>>>> Brgds,
> >>>>>
> >>>>> A.
> >>>>>
> >>>>>>> If the sizing was correct - 99.9% of the time this will be a no-op.
> >>>>>>>
> >>>>>>> If the sizing was incorrect, it will be resized so that the DP
> >>>>>>> searches and all other ops which were recently added for flow
> >>>>>>> reduction will work optimally.
> >>>>>>>
> >>>>>>> This still does not work for lflow compute with dpgroups + parallel
> >>>>>>> upon initial connect and without a SB database to use for size
> >>>>>>> guidance.  It will work for all other cases.
> >>>>>>>
> >>>>>>> I will send two separate patches to address the cases which can be
> >>>>>>> easily addressed, and see what can be done with dpgroups + parallel
> >>>>>>> upon initial connect to an empty SB database.
> >>>>>>>
> >>>>>>> Brgds,
> >>>>>>>
> >>>>>>> A
> >>>>>>>
> >>>>>>>>> A.
> >>>>>>>>>
> >>>>>>>>>>> A.
> >>>>>>>>>>>
> >>>>>>>>>>> On 23/08/2021 21:02, Ilya Maximets wrote:
> >>>>>>>>>>>> 'lflow_map' is never expanded because northd always uses fast
> >>>>>>>>>>>> insertion.  This leads to the case where we have a hash map
> >>>>>>>>>>>> with only 128 initial buckets and every ovn_lflow_find() ends
> >>>>>>>>>>>> up iterating over n_lflows / 128 entries.  That's thousands of
> >>>>>>>>>>>> logical flows or even more.  For example, it takes forever for
> >>>>>>>>>>>> ovn-northd to start up with the Northbound DB from the 120-node
> >>>>>>>>>>>> density-heavy test from ovn-heater, because every lookup is
> >>>>>>>>>>>> slower than the previous one.  I aborted the process after 15
> >>>>>>>>>>>> minutes of waiting, because there was no sign that it would
> >>>>>>>>>>>> converge.  With this change applied, the loop completes in only
> >>>>>>>>>>>> 11 seconds.
> >>>>>>>>>>>>
> >>>>>>>>>>>> The hash map will be pre-allocated to the maximum seen number
> >>>>>>>>>>>> of logical flows on the second iteration, but this doesn't help
> >>>>>>>>>>>> for the first iteration, when northd connects for the first
> >>>>>>>>>>>> time to a big Northbound database, which is a common case
> >>>>>>>>>>>> during failover or cluster upgrade.  And there is an even
> >>>>>>>>>>>> trickier case, where a big NbDB transaction explodes the number
> >>>>>>>>>>>> of logical flows on a run that is not the first one.
> >>>>>>>>>>>>
> >>>>>>>>>>>> We can't expand the hash map in case of parallel build, so this
> >>>>>>>>>>>> should be fixed separately.
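To put rough numbers on the "n_lflows / 128" point above (illustrative
figures, not measurements from the ovn-heater test): with the bucket count
stuck at 128, the average chain that ovn_lflow_find() walks is n/128 nodes,
so building a map of n flows with one lookup per insertion costs roughly

    sum_{i=1..n} i/128  ~=  n^2 / 256   node visits.

For n = 1,000,000 logical flows that is on the order of 4 * 10^9 visits,
versus roughly n (a few million) when the bucket count keeps pace with the
element count.  That is why every lookup is slower than the previous one and
the total run time grows quadratically instead of linearly.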
> >>>>>>>>>>>>
> >>>>>>>>>>>> CC: Anton Ivanov <[email protected]>
> >>>>>>>>>>>> Fixes: 74daa0607c7f ("ovn-northd: Introduce parallel lflow build")
> >>>>>>>>>>>> Signed-off-by: Ilya Maximets <[email protected]>
> >>>>>>>>>>>> ---
> >>>>>>>>>>>>  northd/ovn-northd.c | 6 +++++-
> >>>>>>>>>>>>  1 file changed, 5 insertions(+), 1 deletion(-)
> >>>>>>>>>>>>
> >>>>>>>>>>>> diff --git a/northd/ovn-northd.c b/northd/ovn-northd.c
> >>>>>>>>>>>> index 3d8e21a4f..40cf957c0 100644
> >>>>>>>>>>>> --- a/northd/ovn-northd.c
> >>>>>>>>>>>> +++ b/northd/ovn-northd.c
> >>>>>>>>>>>> @@ -4387,7 +4387,11 @@ do_ovn_lflow_add(struct hmap *lflow_map, struct ovn_datapath *od,
> >>>>>>>>>>>>                                 nullable_xstrdup(ctrl_meter),
> >>>>>>>>>>>>                                 ovn_lflow_hint(stage_hint), where);
> >>>>>>>>>>>>      hmapx_add(&lflow->od_group, od);
> >>>>>>>>>>>> -    hmap_insert_fast(lflow_map, &lflow->hmap_node, hash);
> >>>>>>>>>>>> +    if (!use_parallel_build) {
> >>>>>>>>>>>> +        hmap_insert(lflow_map, &lflow->hmap_node, hash);
> >>>>>>>>>>>> +    } else {
> >>>>>>>>>>>> +        hmap_insert_fast(lflow_map, &lflow->hmap_node, hash);
> >>>>>>>>>>>> +    }
> >>>>>>>>>>>>  }
> >>>>>>>>>>>>
> >>>>>>>>>>>>  /* Adds a row with the specified contents to the Logical_Flow table. */
> >>> --
> >>> Anton R. Ivanov
> >>> Cambridgegreys Limited. Registered in England. Company Number 10273661
> >>> https://www.cambridgegreys.com/
>
> --
> Anton R. Ivanov
> Cambridgegreys Limited. Registered in England. Company Number 10273661
> https://www.cambridgegreys.com/

_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev
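A closing note on why the one-line branch in the patch is sufficient for the
single-threaded case: in OVS, hmap_insert() is simply hmap_insert_fast() plus
an automatic expansion check.  Paraphrased from lib/hmap.h (an approximation
from memory, not a verbatim quote):

    /* hmap_insert(map, node, hash) resolves to hmap_insert_at(), which
     * inserts and then grows the bucket array once the average chain
     * length exceeds about two entries per bucket. */
    static inline void
    hmap_insert_at(struct hmap *hmap, struct hmap_node *node, size_t hash,
                   const char *where)
    {
        hmap_insert_fast(hmap, node, hash);
        if (hmap->n / 2 > hmap->mask) {
            hmap_expand_at(hmap, where);    /* reallocates the buckets */
        }
    }

hmap_insert_fast() skips the expansion check entirely, which is what kept
'lflow_map' pinned at its initial 128 buckets before this patch.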
