On 8/24/22 17:43, Ilya Maximets wrote:
> On 8/23/22 16:42, Dumitru Ceara wrote:
>> After removing the possibility of disabling logical datapath groups the
>> above mentioned state should only be relevant when ovn-northd is started
>> with more than one lflow processing threads (--n-threads > 1), to avoid
>> a very inefficient first lflow processing run due to a suboptimally
>> sized hmap.
>>
>> There's no need to ever go back to this state now that we don't allow
>> users to disable logical datapath groups. Moreover, the previous code
>> was wrongfully setting the state to STATE_INIT_HASH_SIZES at every
>> iteration, essentially disabling parallelization.
>>
>> Fixes: 9dea0c090e91 ("nb: Remove possibility of disabling logical datapath
>> groups.")
>> Reported-by: Ilya Maximets <[email protected]>
>> Signed-off-by: Dumitru Ceara <[email protected]>
>> ---
>
> Hi, Dumitru. Thanks for the patch!
> I've been using it in multiple scale tests since it was posted,
> and it does re-enable parallel lflow build.
>
> One issue though. Not sure how to fix that or what is the desired
> behavior, but what I noticed is:
>
> 1. Cluster with 3 northd instances starts up.
> 2. After a while all of them has been active at least once, due
> to leadership transfers or other things.
> 3. If one instance stays active for long, a lot of database changes
> can be accumulated.
> 4. Once the leadership flips, new active northd is not prepared for
> the amount of logical flows it suddenly needs to manage. So,
> it starts parallel processing with hmap significantly smaller
> than it actually needs right now. That iteration takes noticeably
> more time than a usual one. After the first iteration after
> becoming active, it can process lflows with a normal speed.
>
> So, I wonder if we can fall into a catastrophic scenario where
> the current hash map size is very small and amount of flows after
> re-connection is very high, so threads are only fighting for locks
> all the time.
>
> Do you have some thoughts on this?
>
> Should we disable parallelism when northd goes into stand-up state?
* I meant 'standby', of course. :/
>
> Best regards, Ilya Maximets.
>
>> northd/northd.c | 10 ----------
>> 1 file changed, 10 deletions(-)
>>
>> diff --git a/northd/northd.c b/northd/northd.c
>> index 7e2681865..33943bfaf 100644
>> --- a/northd/northd.c
>> +++ b/northd/northd.c
>> @@ -14164,15 +14164,6 @@ void run_update_worker_pool(int n_threads)
>> }
>> }
>>
>> -static void init_worker_pool(void)
>> -{
>> - /* If parallelization is enabled, make sure locks are initialized. */
>> - if (parallelization_state != STATE_NULL) {
>> - lflow_hash_lock_init();
>> - parallelization_state = STATE_INIT_HASH_SIZES;
>> - }
>> -}
>> -
>> static void
>> build_mcast_groups(struct lflow_input *data,
>> const struct hmap *datapaths,
>> @@ -15762,7 +15753,6 @@ void northd_run(struct northd_input *input_data,
>> struct ovsdb_idl_txn *ovnsb_txn)
>> {
>> stopwatch_start(OVNNB_DB_RUN_STOPWATCH_NAME, time_msec());
>> - init_worker_pool();
>> ovnnb_db_run(input_data, data, ovnnb_txn, ovnsb_txn,
>> input_data->sbrec_chassis_by_name,
>> input_data->sbrec_chassis_by_hostname);
>
_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev