On 3/5/23 20:32, Richard Henderson wrote:
> On 3/3/23 09:19, Paolo Bonzini wrote:
>>      unsigned old_flags;
>>      /*
>>       * The memory barrier makes sure that:
>>       * 1. any writes needed by the callback are visible from the callback
>>       * after aio_bh_dequeue() returns bh.
>>       * 2. ctx is loaded before the callback has a chance to execute and bh
>>       * could be freed.
>>       */
>>      old_flags = qatomic_fetch_or(&bh->flags, BH_PENDING | new_flags);
>> +    smp_mb__after_rmw();
>> +
>>      if (!(old_flags & BH_PENDING)) {
>>          QSLIST_INSERT_HEAD_ATOMIC(&ctx->bh_list, bh, next);
>>      }
>> @@ -107,14 +109,15 @@ static QEMUBH *aio_bh_dequeue(BHList *head, unsigned *flags)
>>      QSLIST_REMOVE_HEAD(head, next);
>>      /*
>>       * The memory barrier is paired with aio_bh_enqueue() and it
>>       * ensures that the callback sees all writes done by the scheduling
>>       * thread. It also ensures that the scheduling thread sees the cleared
>>       * flag before bh->cb has run, and thus will call aio_notify again if
>>       * necessary.
>>       */
> Is it really paired with aio_bh_enqueue?
> Seems like the enqueue barrier really is for aio_bh_poll, and the
> dequeue barrier is for the callback.
The documentation has been quite obsolete since the introduction of
bh_list. The logic is as follows:
aio_bh_enqueue()
    load bh->ctx
    set PENDING flag                              // (0)
    if flag was not set
        // bh becomes visible to dequeuing thread here:
        insert into bh_list                       // (1)
    aio_notify
        // Write bh->flags and bh_list before ctx->notified
        smp_wmb()                                 // (2)
        set notified to true
        // Write notified before reading notify_me
        smp_mb()                                  // (3)
        if notify_me then event_notifier_set()
and on the dequeue side it's tricky to see why all barriers are
needed; it boils down to the busy-wait polling done at the beginning
of aio_poll():
aio_poll()
    compute approximate timeout (*unordered* read of bh_list)
    enable notify_me
    // Write notify_me before reading notified
    smp_mb()                                      // synchronizes with (3)
    if notified then timeout = 0
    ctx->fdmon_ops->wait(timeout)
    disable notify_me
    aio_notify_accept()
        set notified to false
        // Write ctx->notified before reading e.g. bh_list
        smp_mb()                                  // synchronizes with (2)
    aio_bh_poll()
        QSLIST_MOVE_ATOMIC                        // synchronizes with (1)
        aio_bh_dequeue
            remove from head
            clear PENDING/SCHEDULED/IDLE          // synchronizes with (0)
        if flags were set
            aio_bh_call
That is:
for synchronization point (0)
- the main function here is to ensure that aio_bh_dequeue() removes the bh
  from the list before the PENDING flag is cleared, otherwise enqueuers can
  clobber the list; and, vice versa on the enqueue side, that the flag is set
  before the insertion.  For this, it is enough that qatomic_fetch_and() is
  "release" and qatomic_fetch_or() is "acquire" (and they are).
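
For concreteness, this flag hand-off alone could be written as the following
standalone C11 sketch (hypothetical names, not the actual QEMU code, which
uses qatomic_fetch_or()/qatomic_fetch_and() and is at least this strong):

/* Sketch of point (0) only: the flags word hands the list off between
 * the enqueuing and the dequeuing side. */
#include <stdatomic.h>

enum { BH_PENDING = 1 };

static atomic_uint flags;

/* Enqueue side: the insertion that follows must not be reordered before
 * the RMW, so "acquire" is enough. */
int try_claim(void)
{
    unsigned old = atomic_fetch_or_explicit(&flags, BH_PENDING,
                                            memory_order_acquire);
    return !(old & BH_PENDING);   /* only the winner inserts into the list */
}

/* Dequeue side: the removal that precedes must not be reordered after
 * the RMW, so "release" is enough. */
unsigned clear_flags(void)
{
    /* the node has already been removed from the list at this point */
    return atomic_fetch_and_explicit(&flags, ~(unsigned)BH_PENDING,
                                     memory_order_release);
}
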
for synchronization point (1)
- the bottom half machinery between aio_bh_enqueue and aio_bh_poll only
needs release-to-acquire synchronization, and that is provided by (1).
This is also where ordering ensures that bh->ctx is loaded before the
callback executes (which could free bh).
- between enqueuers, setting the PENDING flag only needs to be done before
inserting into bh_list to avoid double insertion (and of course needs to
be done atomically); for this purpose, QSLIST_INSERT_HEAD_ATOMIC needs to
be "release" (which it is).
Also the call to aio_notify() in aio_bh_enqueue() is unconditional, so
aio_bh_dequeue() need not protect against missed calls to aio_notify().
aio_notify() only synchronizes with aio_poll() and aio_notify_accept().
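
Going back to (1), the release-to-acquire hand-off through the list head can
be sketched in standalone C11 atomics like this (hypothetical names; the real
code uses the QSLIST_*_ATOMIC macros, which provide the same ordering):

/* Sketch of point (1) only: publish a node with a release CAS on the
 * list head, consume the whole list with an acquire exchange. */
#include <stdatomic.h>
#include <stddef.h>

struct node {
    struct node *next;
    void (*cb)(void *opaque);
    void *opaque;
};

static _Atomic(struct node *) list_head;

/* Everything written to *n before push() is published by the release CAS
 * and therefore visible to whoever later acquires the head. */
void push(struct node *n)
{
    struct node *old = atomic_load_explicit(&list_head, memory_order_relaxed);
    do {
        n->next = old;
    } while (!atomic_compare_exchange_weak_explicit(&list_head, &old, n,
                                                    memory_order_release,
                                                    memory_order_relaxed));
}

/* The acquire exchange pairs with the release CAS above, so the callbacks
 * see fully initialized nodes. */
struct node *take_all(void)
{
    return atomic_exchange_explicit(&list_head, NULL, memory_order_acquire);
}
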
for synchronization point (2)
- This cannot be just a smp_rmb() because the timeout that was computed
earlier was not ordered against either notified or notify_me.
- This is a smp_mb() for generality; as far as bh_list is
concerned it could be smp_mb__before_rmw().
Synchronization point (3) is pretty mundane in comparison.
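
Seen together, the notified/notify_me handshake behind (2) and (3) is the
classic store-buffering pattern: each side writes its own flag, issues a full
barrier, then reads the other side's flag, so at least one of the two always
notices the other.  A minimal standalone C11 sketch of the same shape
(hypothetical names, not the actual QEMU code) would be:

#include <stdatomic.h>
#include <stdbool.h>

static atomic_bool notified;   /* "a bottom half was scheduled"          */
static atomic_int  notify_me;  /* "a waiter is about to block, kick it"  */

/* Notifier side, cf. aio_notify(). */
void notify(void)
{
    /* release ordering plays the role of smp_wmb() at (2) */
    atomic_store_explicit(&notified, true, memory_order_release);
    /* full barrier at (3): write notified before reading notify_me */
    atomic_thread_fence(memory_order_seq_cst);
    if (atomic_load_explicit(&notify_me, memory_order_relaxed)) {
        /* event_notifier_set() would go here */
    }
}

/* Waiter side, cf. the top of aio_poll(); returns true if the timeout
 * must be forced to zero. */
bool poll_begin(void)
{
    atomic_fetch_add_explicit(&notify_me, 1, memory_order_relaxed);
    /* full barrier pairing with (3): write notify_me before reading notified */
    atomic_thread_fence(memory_order_seq_cst);
    return atomic_load_explicit(&notified, memory_order_relaxed);
}
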
So I'm dropping this patch; I have prepared a documentation patch instead.
Paolo