> On Mar 28, 2023, at 1:32 PM, Mike Pattrick <[email protected]> wrote:
>
> On Mon, Mar 27, 2023 at 7:25 AM Eelco Chaudron <[email protected]> wrote:
>>
>> Make the read of the current seq->value atomic, i.e., not needing to
>> acquire the global mutex when reading it. On 64-bit systems, this
>> incurs no overhead, and it will avoid the mutex and potentially
>> a system call.
>>
>> For incrementing the value followed by waking up the threads, we are
>> still taking the mutex, so the current behavior is not changing. The
>> seq_read() behavior is already defined as, "Returns seq's current
>> sequence number (which could change immediately)". So the change
>> should not impact the current behavior.
>>
>> Signed-off-by: Eelco Chaudron <[email protected]>
>> ---
>>  lib/ovs-rcu.c |  2 +-
>>  lib/seq.c     | 37 ++++++++++++++++---------------------
>>  lib/seq.h     |  1 -
>>  3 files changed, 17 insertions(+), 23 deletions(-)
>>
>> diff --git a/lib/ovs-rcu.c b/lib/ovs-rcu.c
>> index 946aa04d1..9e07d9bab 100644
>> --- a/lib/ovs-rcu.c
>> +++ b/lib/ovs-rcu.c
>> @@ -170,7 +170,7 @@ ovsrcu_try_quiesce(void)
>>      ovs_assert(!single_threaded());
>>      perthread = ovsrcu_perthread_get();
>>      if (!seq_try_lock()) {
>> -        perthread->seqno = seq_read_protected(global_seqno);
>> +        perthread->seqno = seq_read(global_seqno);
>>          if (perthread->cbset) {
>>              ovsrcu_flush_cbset__(perthread, true);
>>          }
>> diff --git a/lib/seq.c b/lib/seq.c
>> index 99e5bf8bd..22646f3f8 100644
>> --- a/lib/seq.c
>> +++ b/lib/seq.c
>> @@ -32,7 +32,7 @@ COVERAGE_DEFINE(seq_change);
>>
>>  /* A sequence number object. */
>>  struct seq {
>> -    uint64_t value OVS_GUARDED;
>> +    atomic_uint64_t value;
>>      struct hmap waiters OVS_GUARDED;  /* Contains 'struct seq_waiter's. */
>>  };
>>
>> @@ -72,6 +72,7 @@ static void seq_wake_waiters(struct seq *)
>>      OVS_REQUIRES(seq_mutex);
>>  struct seq * OVS_EXCLUDED(seq_mutex)
>>  seq_create(void)
>>  {
>> +    uint64_t seq_value;
>>      struct seq *seq;
>>
>>      seq_init();
>> @@ -81,7 +82,8 @@ seq_create(void)
>>      COVERAGE_INC(seq_change);
>>
>>      ovs_mutex_lock(&seq_mutex);
>
> I think it's also possible to get rid of the mutex here. Operations
> like modifying a bridge or interface can result in calling seq_create
> dozens of times, which could potentially lead to contention in cases
> when a lot of interfaces are constantly added or removed. I'm mainly
> thinking about kubernetes here.
>
> Other than that, I found this patch to reduce locking on seq_mutex
> pretty significantly with a userspace workload:
>
> Before: 468727.2/sec
> After:  229265.8/sec
>
> The rest of this patch looks good to me, but what do you think about:
>
> diff --git a/lib/seq.c b/lib/seq.c
> index 22646f3f8..09fdccce5 100644
> --- a/lib/seq.c
> +++ b/lib/seq.c
> @@ -57,7 +57,7 @@ struct seq_thread {
>
>  static struct ovs_mutex seq_mutex = OVS_MUTEX_INITIALIZER;
>
> -static uint64_t seq_next OVS_GUARDED_BY(seq_mutex) = 1;
> +static atomic_uint64_t seq_next = 1;
>
>  static pthread_key_t seq_thread_key;
>
> @@ -81,11 +81,9 @@ seq_create(void)
>
>      COVERAGE_INC(seq_change);
>
> -    ovs_mutex_lock(&seq_mutex);
> -    seq_value = seq_next++;
> +    seq_value = atomic_count_inc64(&seq_next);
>      atomic_store_relaxed(&seq->value, seq_value);
>      hmap_init(&seq->waiters);
> -    ovs_mutex_unlock(&seq_mutex);
>
>      return seq;
>  }
> @@ -128,7 +126,7 @@ void
>  seq_change_protected(struct seq *seq)
>      OVS_REQUIRES(seq_mutex)
>  {
> -    uint64_t seq_value = seq_next++;
> +    uint64_t seq_value = atomic_count_inc64(&seq_next);
>
>      COVERAGE_INC(seq_change);
>
Mike - Seems like a good idea at first blush.

Eelco,

This patch has perfect timing, as we're currently chasing down some
overhead on large systems, e.g. quad-socket machines or newer chips
with many dozens or hundreds of processors per system. On these
systems, we see a lot of overhead from the revalidator threads coming
online and constantly competing for the mutex in seq_read(),
seq_woke(), and seq_change(). This leads to ovs-vswitchd being on-CPU
constantly, at somewhere between 20-80% of one core per top. Most of
that time is spent contending for the lock and then futex'ing in and
out of sleep as a result.

Along the same lines of thinking, would it be possible to get rid of
this mutex in seq_change() too? Perhaps we could have some atomic
value that indicates whether there is a waiter: if so, take the mutex
to deal with the wakeup, but if not, don't take the mutex at all.
That's just a further optimization; even with this patch alone there
are gains from easing contention on the mutex from all of the read
calls, so great work!

While I'm on the thought: the next point of contention in the
revalidator case that I'm tracking is under nl_dump_next(), where we
contend for dump->mutex just to do a short-lived update to
dump->status. Perhaps that could be a spinlock? I'll start a separate
thread after I try that out.

Jon

> _______________________________________________
> dev mailing list
> [email protected]
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
