On 5/16/23 10:20, Eelco Chaudron wrote:
> 
> 
> On 15 May 2023, at 17:47, Ilya Maximets wrote:
> 
>> On 5/15/23 16:24, Eelco Chaudron wrote:
>>>
>>>
>>> On 4 May 2023, at 0:55, Ilya Maximets wrote:
>>>
>>>> On 3/27/23 13:25, Eelco Chaudron wrote:
>>>>> Make the read of the current seq->value atomic, i.e., not needing to
>>>>> acquire the global mutex when reading it. On 64-bit systems, this
>>>>> incurs no overhead, and it will avoid the mutex and potentially
>>>>> a system call.
>>>>>
>>>>> For incrementing the value followed by waking up the threads, we are
>>>>> still taking the mutex, so the current behavior is not changing. The
>>>>> seq_read() behavior is already defined as, "Returns seq's current
>>>>> sequence number (which could change immediately)". So the change
>>>>> should not impact the current behavior.
>>>>>
>>>>> Signed-off-by: Eelco Chaudron <[email protected]>
>>>>> ---
>>>>>  lib/ovs-rcu.c |    2 +-
>>>>>  lib/seq.c     |   37 ++++++++++++++++---------------------
>>>>>  lib/seq.h     |    1 -
>>>>>  3 files changed, 17 insertions(+), 23 deletions(-)
>>>>>
>>>>> diff --git a/lib/ovs-rcu.c b/lib/ovs-rcu.c
>>>>> index 946aa04d1..9e07d9bab 100644
>>>>> --- a/lib/ovs-rcu.c
>>>>> +++ b/lib/ovs-rcu.c
>>>>> @@ -170,7 +170,7 @@ ovsrcu_try_quiesce(void)
>>>>>      ovs_assert(!single_threaded());
>>>>>      perthread = ovsrcu_perthread_get();
>>>>>      if (!seq_try_lock()) {
>>>>> -        perthread->seqno = seq_read_protected(global_seqno);
>>>>> +        perthread->seqno = seq_read(global_seqno);
>>>>>          if (perthread->cbset) {
>>>>>              ovsrcu_flush_cbset__(perthread, true);
>>>>>          }
>>>>> diff --git a/lib/seq.c b/lib/seq.c
>>>>> index 99e5bf8bd..22646f3f8 100644
>>>>> --- a/lib/seq.c
>>>>> +++ b/lib/seq.c
>>>>> @@ -32,7 +32,7 @@ COVERAGE_DEFINE(seq_change);
>>>>>
>>>>>  /* A sequence number object. */
>>>>>  struct seq {
>>>>> -    uint64_t value OVS_GUARDED;
>>>>> +    atomic_uint64_t value;
>>>>>      struct hmap waiters OVS_GUARDED; /* Contains 'struct seq_waiter's. */
>>>>>  };
>>>>>
>>>>> @@ -72,6 +72,7 @@ static void seq_wake_waiters(struct seq *) OVS_REQUIRES(seq_mutex);
>>>>>  struct seq * OVS_EXCLUDED(seq_mutex)
>>>>>  seq_create(void)
>>>>>  {
>>>>> +    uint64_t seq_value;
>>>>>      struct seq *seq;
>>>>>
>>>>>      seq_init();
>>>>> @@ -81,7 +82,8 @@ seq_create(void)
>>>>>      COVERAGE_INC(seq_change);
>>>>>
>>>>>      ovs_mutex_lock(&seq_mutex);
>>>>> -    seq->value = seq_next++;
>>>>> +    seq_value = seq_next++;
>>>>> +    atomic_store_relaxed(&seq->value, seq_value);
>>>>>      hmap_init(&seq->waiters);
>>>>>      ovs_mutex_unlock(&seq_mutex);
>>>>>
>>>>> @@ -126,9 +128,11 @@ void
>>>>>  seq_change_protected(struct seq *seq)
>>>>>      OVS_REQUIRES(seq_mutex)
>>>>>  {
>>>>> +    uint64_t seq_value = seq_next++;
>>>>> +
>>>>>      COVERAGE_INC(seq_change);
>>>>>
>>>>> -    seq->value = seq_next++;
>>>>> +    atomic_store_relaxed(&seq->value, seq_value);
>>>>>      seq_wake_waiters(seq);
>>>>>  }
>>>>>
>>>>> @@ -143,18 +147,6 @@ seq_change(struct seq *seq)
>>>>>      ovs_mutex_unlock(&seq_mutex);
>>>>>  }
>>>>>
>>>>> -/* Returns 'seq''s current sequence number (which could change immediately).
>>>>> - *
>>>>> - * seq_read() and seq_wait() can be used together to yield a race-free wakeup
>>>>> - * when an object changes, even without an ability to lock the object.  See
>>>>> - * Usage in seq.h for details. */
>>>>> -uint64_t
>>>>> -seq_read_protected(const struct seq *seq)
>>>>> -    OVS_REQUIRES(seq_mutex)
>>>>> -{
>>>>> -    return seq->value;
>>>>> -}
>>>>> -
>>>>>  /* Returns 'seq''s current sequence number (which could change immediately).
>>>>>   *
>>>>>   * seq_read() and seq_wait() can be used together to yield a race-free wakeup
>>>>> @@ -162,14 +154,15 @@ seq_read_protected(const struct seq *seq)
>>>>>   * Usage in seq.h for details. */
>>>>>  uint64_t
>>>>>  seq_read(const struct seq *seq)
>>>>> -    OVS_EXCLUDED(seq_mutex)
>>>>>  {
>>>>>      uint64_t value;
>>>>>
>>>>> -    ovs_mutex_lock(&seq_mutex);
>>>>> -    value = seq_read_protected(seq);
>>>>> -    ovs_mutex_unlock(&seq_mutex);
>>>>> -
>>>>> +    /* We use atomic_read(), memory_order_seq_cst, to simulate the full
>>>>> +     * memory barrier you would get with locking and unlocking the global
>>>>> +     * mutex.
>>>>
>>>> Hi, Eelco.   I'm not sure this is actually correct.
>>>>
>>>> We're performing memory_order_seq_cst operation on the value.
>>>>
>>>> Sequential consistency means just acquire-release semantics in our case,
>>>> since we're pairing with non-seq-cst operations.  And acquire-release
>>>> semantics works between acquire loads and release stores on the same
>>>> atomic.  But all the loads on the value we have with this change are
>>>> relaxed.  And the reader doesn't take the mutex.  Meaning there is no
>>>> acquire-release pair that can synchronize the memory between threads.
>>>> A memory_order_seq_cst read on its own can't really guarantee any
>>>> synchronization.
>>>>
>>>> Also, the documentation for the sequence number library explicitly
>>>> says that
>>>> """
>>>>  seq_change() synchronizes with seq_read() and
>>>>  seq_wait() on the same variable in release-acquire fashion.  That
>>>>  is, all effects of the memory accesses performed by a thread prior
>>>>  to seq_change() are visible to the threads returning from
>>>>  seq_read() or seq_wait() observing that change.
>>>> """
>>>>
>>>> And I don't think that's the case with this change anymore.
>>>
>>> It was more than a year ago that I did this patch, so I can’t really
>>> remember my reasoning for doing it this way and why I thought it was
>>> fine :)
>>>
>>> I think it might have been the following:
>>>
>>> 1) Before increasing the value in seq_change() there is a full memory
>>>    barrier due to taking the lock.
>>> 2a) Now if we read the atomic seq in seq_read() and this new value was
>>>     incremented in seq_change() above, the memory should be in sync.
>>> 2b) If we read it before we increase it, it’s the old value and it
>>>     should not matter if the read has happened or not.
>>>
>>> Maybe I miss something obvious, if so let me know.
>>
>> The problem here is not the value of the sequence number, but the
>> fact that the pair seq_change() + seq_read() is documented as an
>> acquire-release synchronization pair.  Meaning that these functions
>> have to provide memory synchronization on unrelated memory locations.
> 
> Yes, that’s what I tried to explain, but looks like I failed ;) I think
> it’s a bit less restricted, as it is only for the sequence number
> returned by seq_read() compared to the new sequence number generated by
> seq_change().
> 
>> And we don't have an acquire-release pair of operations that would
>> ensure that, IIUC.  It might work with current implementations for
>> well known architectures today, but I don't think it's required for
>> any of these operations to always perform full memory barriers
>> regardless of memory location.
> 
> Yes, you are right, I think the memory barrier is not a requirement for the 
> mutex_lock(), I guess only for the mutex_unlock().
> 
>> Might be fixed by using release semantics for stores though, I guess.
> 
> So something like this could solve it:
> 
> void
> seq_change_protected(struct seq *seq)
>     OVS_REQUIRES(seq_mutex)
> {
>     uint64_t seq_value = seq_next++;
> 
>     COVERAGE_INC(seq_change);
> 
>     atomic_thread_fence(memory_order_release);
>     atomic_store_relaxed(&seq->value, seq_value);
>     seq_wake_waiters(seq);
> }
> 
> uint64_t
> seq_read(const struct seq *seq)
> {
>     uint64_t value;
> 
>     /* Note that the odd CONST_CAST() is here to keep sparse happy. */
>     atomic_read_relaxed(&CONST_CAST(struct seq *, seq)->value, &value);
>     atomic_thread_fence(memory_order_acquire);
>     return value;
> }
> 
> What do you think?

Why not just explicit read/store with acquire-release?  I'm not sure why
we need to use a fence.  Could you elaborate?


> 
>> Best regards, Ilya Maximets.
> 

_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev
