On 30 Mar 2023, at 17:57, Jon Kohler wrote:

>> On Mar 28, 2023, at 1:32 PM, Mike Pattrick <[email protected]> wrote:
>>
>> On Mon, Mar 27, 2023 at 7:25 AM Eelco Chaudron <[email protected]> wrote:
>>>
>>> Make the read of the current seq->value atomic, i.e., not needing to
>>> acquire the global mutex when reading it. On 64-bit systems, this
>>> incurs no overhead, and it will avoid the mutex and potentially
>>> a system call.
>>>
>>> For incrementing the value followed by waking up the threads, we are
>>> still taking the mutex, so the current behavior is not changing. The
>>> seq_read() behavior is already defined as, "Returns seq's current
>>> sequence number (which could change immediately)". So the change
>>> should not impact the current behavior.
>>>
>>> Signed-off-by: Eelco Chaudron <[email protected]>
>>> ---
>>> lib/ovs-rcu.c |    2 +-
>>> lib/seq.c     |   37 ++++++++++++++++---------------------
>>> lib/seq.h     |    1 -
>>> 3 files changed, 17 insertions(+), 23 deletions(-)
>>>
>>> diff --git a/lib/ovs-rcu.c b/lib/ovs-rcu.c
>>> index 946aa04d1..9e07d9bab 100644
>>> --- a/lib/ovs-rcu.c
>>> +++ b/lib/ovs-rcu.c
>>> @@ -170,7 +170,7 @@ ovsrcu_try_quiesce(void)
>>>     ovs_assert(!single_threaded());
>>>     perthread = ovsrcu_perthread_get();
>>>     if (!seq_try_lock()) {
>>> -        perthread->seqno = seq_read_protected(global_seqno);
>>> +        perthread->seqno = seq_read(global_seqno);
>>>         if (perthread->cbset) {
>>>             ovsrcu_flush_cbset__(perthread, true);
>>>         }
>>> diff --git a/lib/seq.c b/lib/seq.c
>>> index 99e5bf8bd..22646f3f8 100644
>>> --- a/lib/seq.c
>>> +++ b/lib/seq.c
>>> @@ -32,7 +32,7 @@ COVERAGE_DEFINE(seq_change);
>>>
>>> /* A sequence number object. */
>>> struct seq {
>>> -    uint64_t value OVS_GUARDED;
>>> +    atomic_uint64_t value;
>>>     struct hmap waiters OVS_GUARDED; /* Contains 'struct seq_waiter's. */
>>> };
>>>
>>> @@ -72,6 +72,7 @@ static void seq_wake_waiters(struct seq *) 
>>> OVS_REQUIRES(seq_mutex);
>>> struct seq * OVS_EXCLUDED(seq_mutex)
>>> seq_create(void)
>>> {
>>> +    uint64_t seq_value;
>>>     struct seq *seq;
>>>
>>>     seq_init();
>>> @@ -81,7 +82,8 @@ seq_create(void)
>>>     COVERAGE_INC(seq_change);
>>>
>>>     ovs_mutex_lock(&seq_mutex);
>>
>> I think it's also possible to get rid of the mutex here. Operations
>> like modifying a bridge or interface can result in calling seq_create
>> dozens of times, which could potentially lead to contention in cases
>> when a lot of interfaces are constantly added or removed. I'm mainly
>> thinking about kubernetes here.
>>
>> Other than that, I found this patch to reduce locking on seq_mutex
>> pretty significantly with a userspace workload:
>>
>> Before: 468727.2/sec
>> After: 229265.8/sec
>>
>> The rest of this patch looks good to me, but what do you think about:
>>
>> diff --git a/lib/seq.c b/lib/seq.c
>> index 22646f3f8..09fdccce5 100644
>> --- a/lib/seq.c
>> +++ b/lib/seq.c
>> @@ -57,7 +57,7 @@ struct seq_thread {
>>
>> static struct ovs_mutex seq_mutex = OVS_MUTEX_INITIALIZER;
>>
>> -static uint64_t seq_next OVS_GUARDED_BY(seq_mutex) = 1;
>> +static atomic_uint64_t seq_next = 1;
>>
>> static pthread_key_t seq_thread_key;
>>
>> @@ -81,11 +81,9 @@ seq_create(void)
>>
>>     COVERAGE_INC(seq_change);
>>
>> -    ovs_mutex_lock(&seq_mutex);
>> -    seq_value = seq_next++;
>> +    seq_value = atomic_count_inc64(&seq_next);
>>     atomic_store_relaxed(&seq->value, seq_value);
>>     hmap_init(&seq->waiters);
>> -    ovs_mutex_unlock(&seq_mutex);
>>
>>     return seq;
>> }
>> @@ -128,7 +126,7 @@ void
>> seq_change_protected(struct seq *seq)
>>     OVS_REQUIRES(seq_mutex)
>> {
>> -    uint64_t seq_value = seq_next++;
>> +    uint64_t seq_value = atomic_count_inc64(&seq_next);
>>
>>     COVERAGE_INC(seq_change);
>>
>
> Mike - Seems like a good idea at first blush?
>
> Eelco,
> This patch has perfect timing, as we’re currently chasing down some
> overhead on large systems, e.g. quad socket or newer chips with many
> dozens/hundreds of processors per system. In these systems, we see
> a lot of overhead from the revalidator threads coming online and
> constantly competing for the mutex in seq_read(), seq_woke(), and
> seq_change(). This leads to ovs-vswitchd being on-CPU constantly at
> somewhere between 20-80% of one core as per top. This time comes from
> spending a ton of time contending for the lock, then just futex'ing in
> and out of sleep as a result.
>
> Along these same lines of thinking, would it be possible to get rid of
> this mutex in seq_change() too? Perhaps we could have some atomic val
> that indicated if there was a waiter, and if so, then take the mutex
> to deal with the wakeup, but if not, don't take the mutex at all?

I’m not an expert on the seq code, but quickly looking at it, the mutex 
protects seq->waiters. Just keeping a waiter count outside of the mutex 
will probably not work, as there is still a race condition when adding a new 
waiter.

Maybe you could work on a patch and see if there are any corner cases with this 
idea…

> That's just trying to further optimize; even with this patch there are
> gains from easing contention on the mutex from all of the read calls, so
> great work!
>
> While I'm on the thought: The next point of contention specifically in
> the revalidator case that I'm tracking is under nl_dump_next() where
> we contend for dump->mutex just to do a short lived update to
> dump->status. Perhaps that can be a spinlock? I'll start a separate
> thread after I try that out.

I do not like spinlocks in userspace, as they can cause long stalls when the 
thread holding the lock is scheduled off the CPU.

From the code, it looks like we might only need dump->mutex to protect 
nl_dump_refill() and the other nl_* operations, not the actual dump->status 
updates. With some careful thought, we could probably update it atomically 
with the atomic_compare_exchange functions.

//Eelco

_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev
