On 3/5/23 20:32, Richard Henderson wrote:
> On 3/3/23 09:19, Paolo Bonzini wrote:
>>      unsigned old_flags;
>>      /*
>>       * The memory barrier makes sure that:
>>       * 1. any writes needed by the callback are visible from the callback
>>       * after aio_bh_dequeue() returns bh.
>>       * 2. ctx is loaded before the callback has a chance to execute and bh
>>       * could be freed.
>>       */
>>      old_flags = qatomic_fetch_or(&bh->flags, BH_PENDING | new_flags);
>> +    smp_mb__after_rmw();
>> +
>>      if (!(old_flags & BH_PENDING)) {
>>          QSLIST_INSERT_HEAD_ATOMIC(&ctx->bh_list, bh, next);
>>      }
>> @@ -107,14 +109,15 @@ static QEMUBH *aio_bh_dequeue(BHList *head, unsigned *flags)
>>      QSLIST_REMOVE_HEAD(head, next);
>>      /*
>>       * The memory barrier is paired with aio_bh_enqueue() and it
>>       * ensures that the callback sees all writes done by the scheduling
>>       * thread. It also ensures that the scheduling thread sees the cleared
>>       * flag before bh->cb has run, and thus will call aio_notify again if
>>       * necessary.
>>       */
> Is it really paired with aio_bh_enqueue?
> Seems like the enqueue barrier really is for aio_bh_poll, and the
> dequeue barrier is for the callback.
The documentation has been quite obsolete since the introduction of
bh_list. The logic is as follows:
aio_bh_enqueue()
    load bh->ctx
    set PENDING flag                              // (0)
    if flag was not set
        // bh becomes visible to dequeuing thread here:
        insert into bh_list                       // (1)
    aio_notify
        // Write bh->flags and bh_list before ctx->notified
        smp_wmb()                                 // (2)
        set notified to true
        // Write notified before reading notify_me
        smp_mb()                                  // (3)
        if notify_me then event_notifier_set()
and on the dequeue side it's tricky to see why all barriers are
needed; it boils down to the busy-wait polling done at the beginning
of aio_poll():
aio_poll()
    compute approximate timeout (*unordered* read of bh_list)
    enable notify_me
    // Write notify_me before reading notified
    smp_mb()                                      // synchronizes with (3)
    if notified then timeout = 0
    ctx->fdmon_ops->wait(timeout)
    disable notify_me
    aio_notify_accept()
        set notified to false
        // Write ctx->notified before reading e.g. bh_list
        smp_mb()                                  // synchronizes with (2)
    aio_bh_poll()
        QSLIST_MOVE_ATOMIC                        // synchronizes with (1)
        aio_bh_dequeue
            remove from head
            clear PENDING/SCHEDULED/IDLE          // synchronizes with (0)
        if flags were set
            aio_bh_call
That is:
for synchronization point (0)
- the main function here is to ensure that aio_bh_dequeue() removes the bh
  from the list before the PENDING flag is cleared, otherwise enqueuers can
  clobber the list; and, vice versa on the enqueue side, that the flag is set
  before the insertion.  For this, it is enough that qatomic_fetch_and() is
  "release" and qatomic_fetch_or() is "acquire" (and they are).
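
For concreteness, this flag hand-off alone could be written as the following
standalone C11 sketch (hypothetical names, not the actual QEMU code, which
uses qatomic_fetch_or()/qatomic_fetch_and() and is at least this strong):

/* Sketch of point (0) only: the flags word hands the list off between
 * the enqueuing and the dequeuing side. */
#include <stdatomic.h>

enum { BH_PENDING = 1 };

static atomic_uint flags;

/* Enqueue side: the insertion that follows must not be reordered before
 * the RMW, so "acquire" is enough. */
int try_claim(void)
{
    unsigned old = atomic_fetch_or_explicit(&flags, BH_PENDING,
                                            memory_order_acquire);
    return !(old & BH_PENDING);   /* only the winner inserts into the list */
}

/* Dequeue side: the removal that precedes must not be reordered after
 * the RMW, so "release" is enough. */
unsigned clear_flags(void)
{
    /* the node has already been removed from the list at this point */
    return atomic_fetch_and_explicit(&flags, ~(unsigned)BH_PENDING,
                                     memory_order_release);
}
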
for synchronization point (1)
- the bottom half machinery between aio_bh_enqueue and aio_bh_poll only
needs release-to-acquire synchronization, and that is provided by (1).
This is also where ordering ensures that bh->ctx is loaded before the
callback executes (which could free bh).
- between enqueuers, setting the PENDING flag only needs to be done before
inserting into bh_list to avoid double insertion (and of course needs to
be done atomically); for this purpose, QSLIST_INSERT_HEAD_ATOMIC needs to
be "release" (which it is).
Also the call to aio_notify() in aio_bh_enqueue() is unconditional, so
aio_bh_dequeue() need not protect against missed calls to aio_notify().
aio_notify() only synchronizes with aio_poll() and aio_notify_accept().
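
Going back to (1), the release-to-acquire hand-off through the list head can
be sketched in standalone C11 atomics like this (hypothetical names; the real
code uses the QSLIST_*_ATOMIC macros, which provide the same ordering):

/* Sketch of point (1) only: publish a node with a release CAS on the
 * list head, consume the whole list with an acquire exchange. */
#include <stdatomic.h>
#include <stddef.h>

struct node {
    struct node *next;
    void (*cb)(void *opaque);
    void *opaque;
};

static _Atomic(struct node *) list_head;

/* Everything written to *n before push() is published by the release CAS
 * and therefore visible to whoever later acquires the head. */
void push(struct node *n)
{
    struct node *old = atomic_load_explicit(&list_head, memory_order_relaxed);
    do {
        n->next = old;
    } while (!atomic_compare_exchange_weak_explicit(&list_head, &old, n,
                                                    memory_order_release,
                                                    memory_order_relaxed));
}

/* The acquire exchange pairs with the release CAS above, so the callbacks
 * see fully initialized nodes. */
struct node *take_all(void)
{
    return atomic_exchange_explicit(&list_head, NULL, memory_order_acquire);
}
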
for synchronization point (2)
- This cannot be just a smp_rmb() because the timeout that was computed
earlier was not ordered against either notified or notify_me.
- This is a smp_mb() for generality; as far as bh_list is
concerned it could be smp_mb__before_rmw().
Synchronization point (3) is pretty mundane in comparison.
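
Seen together, the notified/notify_me handshake behind (2) and (3) is the
classic store-buffering pattern: each side writes its own flag, issues a full
barrier, then reads the other side's flag, so at least one of the two always
notices the other.  A minimal standalone C11 sketch of the same shape
(hypothetical names, not the actual QEMU code) would be:

#include <stdatomic.h>
#include <stdbool.h>

static atomic_bool notified;   /* "a bottom half was scheduled"          */
static atomic_int  notify_me;  /* "a waiter is about to block, kick it"  */

/* Notifier side, cf. aio_notify(). */
void notify(void)
{
    /* release ordering plays the role of smp_wmb() at (2) */
    atomic_store_explicit(&notified, true, memory_order_release);
    /* full barrier at (3): write notified before reading notify_me */
    atomic_thread_fence(memory_order_seq_cst);
    if (atomic_load_explicit(&notify_me, memory_order_relaxed)) {
        /* event_notifier_set() would go here */
    }
}

/* Waiter side, cf. the top of aio_poll(); returns true if the timeout
 * must be forced to zero. */
bool poll_begin(void)
{
    atomic_fetch_add_explicit(&notify_me, 1, memory_order_relaxed);
    /* full barrier pairing with (3): write notify_me before reading notified */
    atomic_thread_fence(memory_order_seq_cst);
    return atomic_load_explicit(&notified, memory_order_relaxed);
}
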
So I'm dropping this patch; I have prepared a documentation patch instead.
Paolo