On 5/15/23 16:24, Eelco Chaudron wrote:
>
>
> On 4 May 2023, at 0:55, Ilya Maximets wrote:
>
>> On 3/27/23 13:25, Eelco Chaudron wrote:
>>> Make the read of the current seq->value atomic, i.e., not needing to
>>> acquire the global mutex when reading it.  On 64-bit systems, this
>>> incurs no overhead, and it will avoid the mutex and potentially
>>> a system call.
>>>
>>> For incrementing the value followed by waking up the threads, we are
>>> still taking the mutex, so the current behavior is not changing.  The
>>> seq_read() behavior is already defined as, "Returns seq's current
>>> sequence number (which could change immediately)".  So the change
>>> should not impact the current behavior.
>>>
>>> Signed-off-by: Eelco Chaudron <[email protected]>
>>> ---
>>>  lib/ovs-rcu.c |  2 +-
>>>  lib/seq.c     | 37 ++++++++++++++++---------------------
>>>  lib/seq.h     |  1 -
>>>  3 files changed, 17 insertions(+), 23 deletions(-)
>>>
>>> diff --git a/lib/ovs-rcu.c b/lib/ovs-rcu.c
>>> index 946aa04d1..9e07d9bab 100644
>>> --- a/lib/ovs-rcu.c
>>> +++ b/lib/ovs-rcu.c
>>> @@ -170,7 +170,7 @@ ovsrcu_try_quiesce(void)
>>>      ovs_assert(!single_threaded());
>>>      perthread = ovsrcu_perthread_get();
>>>      if (!seq_try_lock()) {
>>> -        perthread->seqno = seq_read_protected(global_seqno);
>>> +        perthread->seqno = seq_read(global_seqno);
>>>          if (perthread->cbset) {
>>>              ovsrcu_flush_cbset__(perthread, true);
>>>          }
>>> diff --git a/lib/seq.c b/lib/seq.c
>>> index 99e5bf8bd..22646f3f8 100644
>>> --- a/lib/seq.c
>>> +++ b/lib/seq.c
>>> @@ -32,7 +32,7 @@ COVERAGE_DEFINE(seq_change);
>>>
>>>  /* A sequence number object. */
>>>  struct seq {
>>> -    uint64_t value OVS_GUARDED;
>>> +    atomic_uint64_t value;
>>>      struct hmap waiters OVS_GUARDED; /* Contains 'struct seq_waiter's. */
>>>  };
>>>
>>> @@ -72,6 +72,7 @@ static void seq_wake_waiters(struct seq *) OVS_REQUIRES(seq_mutex);
>>>  struct seq * OVS_EXCLUDED(seq_mutex)
>>>  seq_create(void)
>>>  {
>>> +    uint64_t seq_value;
>>>      struct seq *seq;
>>>
>>>      seq_init();
>>> @@ -81,7 +82,8 @@ seq_create(void)
>>>      COVERAGE_INC(seq_change);
>>>
>>>      ovs_mutex_lock(&seq_mutex);
>>> -    seq->value = seq_next++;
>>> +    seq_value = seq_next++;
>>> +    atomic_store_relaxed(&seq->value, seq_value);
>>>      hmap_init(&seq->waiters);
>>>      ovs_mutex_unlock(&seq_mutex);
>>>
>>> @@ -126,9 +128,11 @@ void
>>>  seq_change_protected(struct seq *seq)
>>>      OVS_REQUIRES(seq_mutex)
>>>  {
>>> +    uint64_t seq_value = seq_next++;
>>> +
>>>      COVERAGE_INC(seq_change);
>>>
>>> -    seq->value = seq_next++;
>>> +    atomic_store_relaxed(&seq->value, seq_value);
>>>      seq_wake_waiters(seq);
>>>  }
>>>
>>> @@ -143,18 +147,6 @@ seq_change(struct seq *seq)
>>>      ovs_mutex_unlock(&seq_mutex);
>>>  }
>>>
>>> -/* Returns 'seq''s current sequence number (which could change immediately).
>>> - *
>>> - * seq_read() and seq_wait() can be used together to yield a race-free wakeup
>>> - * when an object changes, even without an ability to lock the object.  See
>>> - * Usage in seq.h for details. */
>>> -uint64_t
>>> -seq_read_protected(const struct seq *seq)
>>> -    OVS_REQUIRES(seq_mutex)
>>> -{
>>> -    return seq->value;
>>> -}
>>> -
>>>  /* Returns 'seq''s current sequence number (which could change immediately).
>>>   *
>>>   * seq_read() and seq_wait() can be used together to yield a race-free wakeup
>>> @@ -162,14 +154,15 @@ seq_read_protected(const struct seq *seq)
>>>   * Usage in seq.h for details. */
>>>  uint64_t
>>>  seq_read(const struct seq *seq)
>>> -    OVS_EXCLUDED(seq_mutex)
>>>  {
>>>      uint64_t value;
>>>
>>> -    ovs_mutex_lock(&seq_mutex);
>>> -    value = seq_read_protected(seq);
>>> -    ovs_mutex_unlock(&seq_mutex);
>>> -
>>> +    /* We use atomic_read(), memory_order_seq_cst, to simulate the full
>>> +     * memory barrier you would get with locking and unlocking the global
>>> +     * mutex.
>>
>> Hi, Eelco.  I'm not sure this is actually correct.
>>
>> We're performing a memory_order_seq_cst operation on the value.
>>
>> Sequential consistency means just acquire-release semantics in our case,
>> since we're pairing with non-seq-cst operations.  And acquire-release
>> semantics works between acquire loads and release stores on the same
>> atomic.  But all the stores to the value we have with this change are
>> relaxed.  And the reader doesn't take the mutex.  Meaning there is no
>> acquire-release pair that can synchronize the memory between threads.
>> A memory_order_seq_cst read on its own can't really guarantee any
>> synchronization.
>>
>> Also, the documentation for the sequence number library explicitly says:
>> """
>> seq_change() synchronizes with seq_read() and
>> seq_wait() on the same variable in release-acquire fashion.  That
>> is, all effects of the memory accesses performed by a thread prior
>> to seq_change() are visible to the threads returning from
>> seq_read() or seq_wait() observing that change.
>> """
>>
>> And I don't think that's the case with this change anymore.
>
> It was more than a year ago that I did this patch, so I can't really
> remember my reasoning for doing it this way and thinking it was fine :)
>
> I think it might have been the following:
>
> 1) Before increasing the value in seq_change() there is a full memory
>    barrier due to taking the lock.
> 2a) Now if we read the atomic seq in seq_read() and this new value was
>     incremented in the seq_change() above, the memory should be in sync.
> 2b) If we read it before we increase it, it's the old value, and it
>     should not matter whether the read has happened or not.
>
> Maybe I'm missing something obvious; if so, let me know.
The problem here is not the value of the sequence number, but the fact that
the pair seq_change() + seq_read() is documented as an acquire-release
synchronization pair.  Meaning that these functions have to provide memory
synchronization on unrelated memory locations.  And we don't have an
acquire-release pair of operations that would ensure that, IIUC.

It might work with current implementations on well-known architectures today,
but I don't think any of these operations is required to always perform a
full memory barrier regardless of the memory location.

Might be fixed by using release semantics for the stores though, I guess.

Best regards, Ilya Maximets.

_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev
