> On Mar 30, 2023, at 11:57 AM, Jon Kohler <[email protected]> wrote:
> 
> 
> 
>> On Mar 28, 2023, at 1:32 PM, Mike Pattrick <[email protected]> wrote:
>> 
>> On Mon, Mar 27, 2023 at 7:25 AM Eelco Chaudron <[email protected]> wrote:
>>> 
>>> Make the read of the current seq->value atomic, i.e., not needing to
>>> acquire the global mutex when reading it. On 64-bit systems, this
>>> incurs no overhead, and it will avoid the mutex and potentially
>>> a system call.
>>> 
>>> For incrementing the value followed by waking up the threads, we are
>>> still taking the mutex, so the current behavior is not changing. The
>>> seq_read() behavior is already defined as, "Returns seq's current
>>> sequence number (which could change immediately)". So the change
>>> should not impact the current behavior.
>>> 
>>> Signed-off-by: Eelco Chaudron <[email protected]>
>>> ---
>>> lib/ovs-rcu.c |    2 +-
>>> lib/seq.c     |   37 ++++++++++++++++---------------------
>>> lib/seq.h     |    1 -
>>> 3 files changed, 17 insertions(+), 23 deletions(-)
>>> 
>>> diff --git a/lib/ovs-rcu.c b/lib/ovs-rcu.c
>>> index 946aa04d1..9e07d9bab 100644
>>> --- a/lib/ovs-rcu.c
>>> +++ b/lib/ovs-rcu.c
>>> @@ -170,7 +170,7 @@ ovsrcu_try_quiesce(void)
>>>    ovs_assert(!single_threaded());
>>>    perthread = ovsrcu_perthread_get();
>>>    if (!seq_try_lock()) {
>>> -        perthread->seqno = seq_read_protected(global_seqno);
>>> +        perthread->seqno = seq_read(global_seqno);
>>>        if (perthread->cbset) {
>>>            ovsrcu_flush_cbset__(perthread, true);
>>>        }
>>> diff --git a/lib/seq.c b/lib/seq.c
>>> index 99e5bf8bd..22646f3f8 100644
>>> --- a/lib/seq.c
>>> +++ b/lib/seq.c
>>> @@ -32,7 +32,7 @@ COVERAGE_DEFINE(seq_change);
>>> 
>>> /* A sequence number object. */
>>> struct seq {
>>> -    uint64_t value OVS_GUARDED;
>>> +    atomic_uint64_t value;
>>>    struct hmap waiters OVS_GUARDED; /* Contains 'struct seq_waiter's. */
>>> };
>>> 
>>> @@ -72,6 +72,7 @@ static void seq_wake_waiters(struct seq *) 
>>> OVS_REQUIRES(seq_mutex);
>>> struct seq * OVS_EXCLUDED(seq_mutex)
>>> seq_create(void)
>>> {
>>> +    uint64_t seq_value;
>>>    struct seq *seq;
>>> 
>>>    seq_init();
>>> @@ -81,7 +82,8 @@ seq_create(void)
>>>    COVERAGE_INC(seq_change);
>>> 
>>>    ovs_mutex_lock(&seq_mutex);
>> 
>> I think it's also possible to get rid of the mutex here. Operations
>> like modifying a bridge or interface can result in calling seq_create
>> dozens of times, which could potentially lead to contention in cases
>> when a lot of interfaces are constantly added or removed. I'm mainly
>> thinking about kubernetes here.
>> 
>> Other than that, I found this patch to reduce locking on seq_mutex
>> pretty significantly with a userspace workload:
>> 
>> Before: 468727.2 lock acquisitions/sec
>> After: 229265.8 lock acquisitions/sec
>> 
>> The rest of this patch looks good to me, but what do you think about:
>> 
>> diff --git a/lib/seq.c b/lib/seq.c
>> index 22646f3f8..09fdccce5 100644
>> --- a/lib/seq.c
>> +++ b/lib/seq.c
>> @@ -57,7 +57,7 @@ struct seq_thread {
>> 
>> static struct ovs_mutex seq_mutex = OVS_MUTEX_INITIALIZER;
>> 
>> -static uint64_t seq_next OVS_GUARDED_BY(seq_mutex) = 1;
>> +static atomic_uint64_t seq_next = 1;
>> 
>> static pthread_key_t seq_thread_key;
>> 
>> @@ -81,11 +81,9 @@ seq_create(void)
>> 
>>    COVERAGE_INC(seq_change);
>> 
>> -    ovs_mutex_lock(&seq_mutex);
>> -    seq_value = seq_next++;
>> +    seq_value = atomic_count_inc64(&seq_next);
>>    atomic_store_relaxed(&seq->value, seq_value);
>>    hmap_init(&seq->waiters);
>> -    ovs_mutex_unlock(&seq_mutex);
>> 
>>    return seq;
>> }
>> @@ -128,7 +126,7 @@ void
>> seq_change_protected(struct seq *seq)
>>    OVS_REQUIRES(seq_mutex)
>> {
>> -    uint64_t seq_value = seq_next++;
>> +    uint64_t seq_value = atomic_count_inc64(&seq_next);
>> 
>>    COVERAGE_INC(seq_change);
>> 
> 
> Mike, this seems like a good idea at first blush.
> 
> Eelco, 
> This patch has perfect timing, as we’re currently chasing down some
> overhead on large systems, e.g. quad socket or newer chips with many
> dozens/hundreds of processors per system. In these systems, we see
> a lot of overhead from the revalidator threads coming online and
> constantly competing for the mutex in seq_read(), seq_woke(), and
> seq_change(). This keeps ovs-vswitchd on-CPU constantly at somewhere
> between 20-80% of one core as per top, with most of that time spent
> contending for the lock and then futex'ing in and out of sleep as a
> result.
> 
> Along these same lines of thinking, would it be possible to get rid of
> this mutex in seq_change() too? Perhaps we could have some atomic value
> indicating whether there is a waiter: if so, take the mutex to deal
> with the wakeup, but if not, don't take the mutex at all.
> 
> That's just trying to optimize further; even with this patch alone
> there are gains from easing contention on the mutex from all of the
> read calls, so great work!
> 
> While I'm on the thought: the next point of contention in the
> revalidator case that I'm tracking is under nl_dump_next(), where we
> contend for dump->mutex just to do a short-lived update to
> dump->status. Perhaps that could be a spinlock? I'll start a separate
> thread after I try that out.
> 
> Jon

Your change inspired me to dig a bit deeper. The revalidator_sweep()
call that the revalidator threads make ends up calling ovsrcu_quiesce()
in a loop, under which we see quite a bit of contention with a lot of
futex sleep/wake activity.

With your change, we get rid of the potential futex under seq_read(),
but we still acquire the mutex in seq_change() as well as potentially in
ovsrcu_flush_cbset(). So your change cuts the potential locking per
ovsrcu_quiesce() from 3x to 2x, and the following change brings it down
to 1x by doing the callback flush and the seq change under a single lock
acquisition.

Thoughts?

Happy to send this out as a separate change on top of yours if you’d like.

--- a/lib/ovs-rcu.c
+++ b/lib/ovs-rcu.c
@@ -155,10 +155,12 @@ ovsrcu_quiesce(void)
 
     perthread = ovsrcu_perthread_get();
     perthread->seqno = seq_read(global_seqno);
+    seq_lock();
     if (perthread->cbset) {
-        ovsrcu_flush_cbset(perthread);
+        ovsrcu_flush_cbset__(perthread, true);
     }
-    seq_change(global_seqno);
+    seq_change_protected(global_seqno);
+    seq_unlock();
 
     ovsrcu_quiesced();
 }

_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev
