> On Mar 28, 2023, at 1:32 PM, Mike Pattrick <[email protected]> wrote:
>
> On Mon, Mar 27, 2023 at 7:25 AM Eelco Chaudron <[email protected]> wrote:
>>
>> Make the read of the current seq->value atomic, i.e., not needing to
>> acquire the global mutex when reading it. On 64-bit systems, this
>> incurs no overhead, and it will avoid the mutex and potentially
>> a system call.
>>
>> For incrementing the value followed by waking up the threads, we are
>> still taking the mutex, so the current behavior is not changing. The
>> seq_read() behavior is already defined as, "Returns seq's current
>> sequence number (which could change immediately)". So the change
>> should not impact the current behavior.
>>
>> Signed-off-by: Eelco Chaudron <[email protected]>
>> ---
>>  lib/ovs-rcu.c |  2 +-
>>  lib/seq.c     | 37 ++++++++++++++++---------------------
>>  lib/seq.h     |  1 -
>>  3 files changed, 17 insertions(+), 23 deletions(-)
>>
>> diff --git a/lib/ovs-rcu.c b/lib/ovs-rcu.c
>> index 946aa04d1..9e07d9bab 100644
>> --- a/lib/ovs-rcu.c
>> +++ b/lib/ovs-rcu.c
>> @@ -170,7 +170,7 @@ ovsrcu_try_quiesce(void)
>>      ovs_assert(!single_threaded());
>>      perthread = ovsrcu_perthread_get();
>>      if (!seq_try_lock()) {
>> -        perthread->seqno = seq_read_protected(global_seqno);
>> +        perthread->seqno = seq_read(global_seqno);
>>          if (perthread->cbset) {
>>              ovsrcu_flush_cbset__(perthread, true);
>>          }
>> diff --git a/lib/seq.c b/lib/seq.c
>> index 99e5bf8bd..22646f3f8 100644
>> --- a/lib/seq.c
>> +++ b/lib/seq.c
>> @@ -32,7 +32,7 @@ COVERAGE_DEFINE(seq_change);
>>
>>  /* A sequence number object. */
>>  struct seq {
>> -    uint64_t value OVS_GUARDED;
>> +    atomic_uint64_t value;
>>      struct hmap waiters OVS_GUARDED;  /* Contains 'struct seq_waiter's. */
>>  };
>>
>> @@ -72,6 +72,7 @@ static void seq_wake_waiters(struct seq *)
>>      OVS_REQUIRES(seq_mutex);
>>  struct seq * OVS_EXCLUDED(seq_mutex)
>>  seq_create(void)
>>  {
>> +    uint64_t seq_value;
>>      struct seq *seq;
>>
>>      seq_init();
>> @@ -81,7 +82,8 @@ seq_create(void)
>>      COVERAGE_INC(seq_change);
>>
>>      ovs_mutex_lock(&seq_mutex);
>
> I think it's also possible to get rid of the mutex here. Operations
> like modifying a bridge or interface can result in calling seq_create
> dozens of times, which could potentially lead to contention in cases
> when a lot of interfaces are constantly added or removed. I'm mainly
> thinking about kubernetes here.
>
> Other than that, I found this patch to reduce locking on seq_mutex
> pretty significantly with a userspace workload:
>
> Before: 468727.2/sec
> After:  229265.8/sec
>
> The rest of this patch looks good to me, but what do you think about:
>
> diff --git a/lib/seq.c b/lib/seq.c
> index 22646f3f8..09fdccce5 100644
> --- a/lib/seq.c
> +++ b/lib/seq.c
> @@ -57,7 +57,7 @@ struct seq_thread {
>
>  static struct ovs_mutex seq_mutex = OVS_MUTEX_INITIALIZER;
>
> -static uint64_t seq_next OVS_GUARDED_BY(seq_mutex) = 1;
> +static atomic_uint64_t seq_next = 1;
>
>  static pthread_key_t seq_thread_key;
>
> @@ -81,11 +81,9 @@ seq_create(void)
>
>      COVERAGE_INC(seq_change);
>
> -    ovs_mutex_lock(&seq_mutex);
> -    seq_value = seq_next++;
> +    seq_value = atomic_count_inc64(&seq_next);
>      atomic_store_relaxed(&seq->value, seq_value);
>      hmap_init(&seq->waiters);
> -    ovs_mutex_unlock(&seq_mutex);
>
>      return seq;
>  }
> @@ -128,7 +126,7 @@ void
>  seq_change_protected(struct seq *seq)
>      OVS_REQUIRES(seq_mutex)
>  {
> -    uint64_t seq_value = seq_next++;
> +    uint64_t seq_value = atomic_count_inc64(&seq_next);
>
>      COVERAGE_INC(seq_change);
>
Mike - Seems like a good idea at first blush.

Eelco,

This patch has perfect timing, as we're currently chasing down some
overhead on large systems, e.g. quad-socket machines or newer chips
with many dozens or hundreds of processors per system. On these
systems, we see a lot of overhead from the revalidator threads coming
online and constantly competing for the mutex in seq_read(),
seq_woke(), and seq_change(). This leads to ovs-vswitchd being on-CPU
constantly, at somewhere between 20-80% of one core per top. Most of
that time is spent contending for the lock and then futex'ing in and
out of sleep as a result.

Along the same lines of thinking, would it be possible to get rid of
this mutex in seq_change() too? Perhaps we could have some atomic
value that indicates whether there is a waiter: if so, take the mutex
to deal with the wakeup, but if not, don't take the mutex at all.
That's just a further optimization; even with this patch alone there
are gains from easing contention on the mutex from all of the read
calls, so great work!

While I'm on the thought: the next point of contention in the
revalidator case that I'm tracking is under nl_dump_next(), where we
contend for dump->mutex just to do a short-lived update to
dump->status. Perhaps that could be a spinlock? I'll start a separate
thread after I try that out.

Jon

> _______________________________________________
> dev mailing list
> [email protected]
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
