On 5/11/25 08:08, Akihiko Odaki wrote:
@@ -387,12 +365,17 @@ void qemu_event_set(QemuEvent *ev)
      assert(ev->initialized);
/*
-     * Pairs with both qemu_event_reset() and qemu_event_wait().
+     * Pairs with qemu_event_wait() (on Linux) and qemu_event_reset().
       *
       * qemu_event_set has release semantics, but because it *loads*
       * ev->value we need a full memory barrier here.
+     *
+     * qemu_event_wait() do not have a paired barrier on non-Linux because
+     * ev->lock ensures ordering.
       */

Thanks for trying to write down the idea.  This works, but there's
still room for improvement. :)  The insight is that the logic becomes:

  if (done == 0) {                  done = 1;
    <read ev->value>                ev->value = EV_SET;
    ev->value = EV_FREE;
    if (done == 0) {
      pthread_cond_wait();          pthread_cond_broadcast();
    }
  }

which is a lot simplified compared to the futex case because all the
differences between EV_FREE and EV_BUSY are replaced by the condition
variable.  In order to avoid the slow path, all that's needed is to
ensure that if qemu_event_reset() sees EV_SET, the caller also sees
done == 1 in the second "if".

The futex case needs qatomic_or() because there can be a concurrent
EV_FREE->EV_BUSY, and a smp_mb() because qemu_event_set() needs to
*read* ev->value.  The non-futex case instead it can do this:

  if (done == 0) {                    done = 1;
    v = load_acquire(ev->value); <--- store_release(value, EV_SET);
    ev->value = v | EV_FREE;
    if (done == 0) {
      pthread_cond_wait();          pthread_cond_broadcast();
    }
  }

where load<---store indicates that the load on the left reads the
value that the store writes, and the store "happens before" the load.
In other words:

- because v reads EV_SET, everything before the store_release is ordered
before everything that follows the load_acquire

- therefore if v reads EV_SET, the pthread_cond_wait() won't be reached
and there's no possibility of a hang (I think :))


This becomes the following patch:

diff --git a/util/event.c b/util/event.c
index 20853d61a7c..489cec08de7 100644
--- a/util/event.c
+++ b/util/event.c
@@ -47,18 +47,14 @@ void qemu_event_set(QemuEvent *ev)
 {
     assert(ev->initialized);
+#ifdef HAVE_FUTEX
     /*
-     * Pairs with qemu_event_wait() (on Linux) and qemu_event_reset().
-     *
+     * Pairs with qemu_event_wait() and qemu_event_reset().
      * qemu_event_set has release semantics, but because it *loads*
      * ev->value we need a full memory barrier here.
-     *
-     * qemu_event_wait() do not have a paired barrier on non-Linux because
-     * ev->lock ensures ordering.
      */
     smp_mb();
-#ifdef HAVE_FUTEX
     if (qatomic_read(&ev->value) != EV_SET) {
         int old = qatomic_xchg(&ev->value, EV_SET);
@@ -71,7 +67,8 @@ void qemu_event_set(QemuEvent *ev)
     }
 #else
     pthread_mutex_lock(&ev->lock);
-    qatomic_set(&ev->value, EV_SET);
+    /* Pairs with qemu_event_reset()'s load acquire.  */
+    qatomic_store_release(&ev->value, EV_SET);
     pthread_cond_broadcast(&ev->cond);
     pthread_mutex_unlock(&ev->lock);
 #endif
@@ -81,6 +78,7 @@ void qemu_event_reset(QemuEvent *ev)
 {
     assert(ev->initialized);
+#ifdef HAVE_FUTEX
     /*
      * If there was a concurrent reset (or even reset+wait),
      * do nothing.  Otherwise change EV_SET->EV_FREE.
@@ -92,6 +90,28 @@ void qemu_event_reset(QemuEvent *ev)
      * Pairs with the first memory barrier in qemu_event_set().
      */
     smp_mb__after_rmw();
+#else
+    /*
+     * If futexes are not available, there are no EV_FREE->EV_BUSY
+     * transitions because wakeups are done entirely through the
+     * condition variable.  Since qatomic_set() always writes EV_FREE
+     * the load seems useless, but in reality its acquire synchronizes
+     * with qemu_event_set()'s store-release: if qemu_event_reset()
+     * sees EV_SET here, then the caller will certainly see a
+     * successful condition and skip qemu_event_wait():
+     *
+     * done = 1;                 if (done == 0)
+     * qemu_event_set() {          qemu_event_reset() {
+     *   lock();
+     *   ev->value = EV_SET ----->     v = ev->value
+     *                                 ev->value = v | EV_FREE
+     *   cond_broadcast()
+     *   unlock();                 }
+     * }                           if (done == 0)
+     *                               // qemu_event_wait() not called
+     */
+    qatomic_set(&ev->value, qatomic_load_acquire(&ev->value) | EV_FREE);
+#endif
 }
void qemu_event_wait(QemuEvent *ev)


You don't need to post v4 (which would include the above diff in
this patch, in qemu-thread-posix.c); I can post the above change
as a follow up too.  What you have now is not wrong, it's only the
explanation that isn't too precise.

Paolo


Reply via email to