On 3/7/23 18:26, Richard Henderson wrote:
On 3/7/23 09:00, Paolo Bonzini wrote:
while QSLIST_REMOVE_HEAD in the dequeuing thread is not ordered at all:
    y.store(0, mo_relaxed);     // QSLIST_REMOVE_HEAD
    x.store(0, mo_release);     // fetch_and
As I read aio_bh_queue, this is exactly the situation you describe in
patch 1 justifying the introduction of the new barriers.
Only store-store reordering is required between QSLIST_REMOVE_HEAD and
atomic_fetch_and(); and that one *is* blocked by atomic_fetch_and(),
since mo_seq_cst is a superset of mo_release. The new barriers
are needed for store-load reordering.
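To make that concrete, here is a minimal sketch of the dequeue side in
plain C11 atomics, in the same style as the y.store()/x.store() model
quoted above (the names are placeholders, not the real fields):

    #include <stdatomic.h>

    atomic_int list_head, flags;   /* stand-ins for the list and the flags word */

    void dequeue_side(void)
    {
        /* QSLIST_REMOVE_HEAD: a plain/relaxed store */
        atomic_store_explicit(&list_head, 0, memory_order_relaxed);

        /* seq-cst RMW: its store half is at least a release store, so the
         * relaxed store above cannot sink below it; store-store ordering
         * holds with no extra barrier.  What it does *not* give is
         * store-load ordering against later loads of other locations.   */
        atomic_fetch_and_explicit(&flags, ~1, memory_order_seq_cst);
    }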
In patch 1 you say:
# in C11, sequentially consistent atomics (except for seq-cst fences)
# only affect the ordering of sequentially consistent operations.
and the store in QSLIST_REMOVE_HEAD is not a sequentially consistent
operation.
Therefore by your own logic we must have a separate barrier here.
You're right that there is a contradiction, but it's the comment that is
wrong, not the code. The right text should be
---
in C11, with the exception of seq-cst fences, the order established by
sequentially consistent atomics does not propagate to other memory
accesses on either side of the seq-cst atomic. As far as those are
concerned, loads performed by a seq-cst atomic are just acquire loads,
and stores are just release stores. Even though loads that occur after
an RMW operation cannot move above the load, they can still sneak above
the store!
---
The first sentence is a more accurate way to say the same thing. The
rest says what happens (mo_seq_cst downgrades to mo_acq_rel as
far as those other accesses are concerned) and I missed the opportunity
to include it because it didn't matter for the cases where the series
added barriers---except now it matters for *not* adding barriers.
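As a tiny illustration of the downgrade (placeholder variables again):

    #include <stdatomic.h>

    atomic_int x, y;

    void example(void)
    {
        (void)atomic_fetch_add_explicit(&x, 1, memory_order_seq_cst);
        int v = atomic_load_explicit(&y, memory_order_relaxed);
        /* The load of y cannot move above the *load* half of the RMW,
         * which is an acquire load, but it can move above the *store*
         * half: it may be satisfied before other CPUs can see the
         * increment of x.  Only a seq-cst fence between the two would
         * forbid that.                                                  */
        (void)v;
    }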
Now, I also said that I don't understand mo_acq_rel, and that is indeed
true. Instead, in this case I'm basically treating mo_seq_cst as a
superset of mo_release (on the dequeue side) and mo_acquire (on the
enqueue side), individually.
I wonder if your definition/description of smp_mb__before_rmw() isn't
actively misleading in this case.
- We could drop it entirely and be less confusing, by not having to
explain it.
- We could define it as signal_barrier() for all hosts, simply to fix the
compiler-theoretic reordering problem.
The case that I was imagining for smp_mb__before_rmw() is something like
this:
    wake_me = true;
    smp_mb__before_rmw();
    if (qatomic_xchg(&can_sleep, true)) { ... }
where you really need a full barrier.
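Modeled with C11 atomics, again with placeholder names, the reordering
that the barrier prevents is roughly:

    #include <stdatomic.h>
    #include <stdbool.h>

    atomic_bool wake_me, can_sleep;

    void waker(void)
    {
        atomic_store_explicit(&wake_me, true, memory_order_relaxed);
        /* Without a full fence here, the store above is ordered only by
         * the *release* half of the RMW below, so it may become visible
         * to other CPUs after the RMW has already read can_sleep (the
         * classic store-load reordering).  smp_mb__before_rmw() supplies
         * the full barrier on hosts where the RMW instruction alone
         * does not.                                                     */
        atomic_thread_fence(memory_order_seq_cst);  /* smp_mb__before_rmw() */
        if (atomic_exchange_explicit(&can_sleep, true, memory_order_seq_cst)) {
            /* ... */
        }
    }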
Paolo