On 3/7/23 09:00, Paolo Bonzini wrote:
while QSLIST_REMOVE_HEAD in the dequeuing thread is not ordered at all:
y.store(0, mo_relaxed); // QSLIST_REMOVE_HEAD
x.store(0, mo_release); // fetch_and
As I read aio_bh_queue, this is exactly the situation you describe in patch 1 justifying
the introduction of the new barriers.
Only store-store reordering is required between QSLIST_REMOVE_HEAD and atomic_fetch_and();
and that one *is* blocked by atomic_fetch_and(), since mo_seq_cst is a superset of both
mo_release. The new barriers are needed for store-load reordering.
In patch 1 you say:
# in C11, sequentially consistent atomics (except for seq-cst fences)
# only affect the ordering of sequentially consistent operations.
and the store in QSLIST_REMOVE_HEAD is not a sequentially consistent operation.
Therefore by your own logic we must have a separate barrier here.
I think this is mostly a compiler-theoretic problem, because hardware will see barriers
that affect all memory operations, not just seq-cst. Only compilers can be so literal as
to decide to reschedule the two because they have different source-level model.
I wonder if your definition/description of smp_mb__before_rmw() isn't actively misleading
in this case.
- We could drop it entirely and be less confusing, by not having to explain it.
- We could define it as signal_barrier() for all hosts, simply to fix the
compiler-theoretic reordering problem.
Thoughts?
r~