Re: [ovs-dev] [PATCH ovn] northd: Don't get stuck in the STATE_INIT_HASH_SIZES state.

Ilya Maximets Wed, 24 Aug 2022 08:49:28 -0700

On 8/24/22 17:43, Ilya Maximets wrote:
> On 8/23/22 16:42, Dumitru Ceara wrote:
>> After removing the possibility of disabling logical datapath groups the
>> above mentioned state should only be relevant when ovn-northd is started
>> with more than one lflow processing threads (--n-threads > 1), to avoid
>> a very inefficient first lflow processing run due to a suboptimally
>> sized hmap.
>>
>> There's no need to ever go back to this state now that we don't allow
>> users to disable logical datapath groups.  Moreover, the previous code
>> was wrongfully setting the state to STATE_INIT_HASH_SIZES at every
>> iteration, essentially disabling parallelization.
>>
>> Fixes: 9dea0c090e91 ("nb: Remove possibility of disabling logical datapath 
>> groups.")
>> Reported-by: Ilya Maximets <[email protected]>
>> Signed-off-by: Dumitru Ceara <[email protected]>
>> ---
> 
> Hi, Dumitru.  Thanks for the patch!
> I've been using it in multiple scale tests since it was posted,
> and it does re-enable parallel lflow build.
> 
> One issue though.  Not sure how to fix that or what is the desired
> behavior, but what I noticed is:
> 
>  1. Cluster with 3 northd instances starts up.
>  2. After a while all of them has been active at least once, due
>     to leadership transfers or other things.
>  3. If one instance stays active for long, a lot of database changes
>     can be accumulated.
>  4. Once the leadership flips, new active northd is not prepared for
>     the amount of logical flows it suddenly needs to manage.  So,
>     it starts parallel processing with hmap significantly smaller
>     than it actually needs right now.  That iteration takes noticeably
>     more time than a usual one.  After the first iteration after
>     becoming active, it can process lflows with a normal speed.
> 
> So, I wonder if we can fall into a catastrophic scenario where
> the current hash map size is very small and amount of flows after
> re-connection is very high, so threads are only fighting for locks
> all the time.
> 
> Do you have some thoughts on this?
> 
> Should we disable parallelism when northd goes into stand-up state?


* I meant 'standby', of course.  :/

> 
> Best regards, Ilya Maximets.
> 
>>  northd/northd.c | 10 ----------
>>  1 file changed, 10 deletions(-)
>>
>> diff --git a/northd/northd.c b/northd/northd.c
>> index 7e2681865..33943bfaf 100644
>> --- a/northd/northd.c
>> +++ b/northd/northd.c
>> @@ -14164,15 +14164,6 @@ void run_update_worker_pool(int n_threads)
>>      }
>>  }
>>  
>> -static void init_worker_pool(void)
>> -{
>> -    /* If parallelization is enabled, make sure locks are initialized. */
>> -    if (parallelization_state != STATE_NULL) {
>> -        lflow_hash_lock_init();
>> -        parallelization_state = STATE_INIT_HASH_SIZES;
>> -    }
>> -}
>> -
>>  static void
>>  build_mcast_groups(struct lflow_input *data,
>>                     const struct hmap *datapaths,
>> @@ -15762,7 +15753,6 @@ void northd_run(struct northd_input *input_data,
>>                  struct ovsdb_idl_txn *ovnsb_txn)
>>  {
>>      stopwatch_start(OVNNB_DB_RUN_STOPWATCH_NAME, time_msec());
>> -    init_worker_pool();
>>      ovnnb_db_run(input_data, data, ovnnb_txn, ovnsb_txn,
>>                   input_data->sbrec_chassis_by_name,
>>                   input_data->sbrec_chassis_by_hostname);
> 

_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] [PATCH ovn] northd: Don't get stuck in the STATE_INIT_HASH_SIZES state.

Reply via email to