On 5/11/25 08:08, Akihiko Odaki wrote:
@@ -387,12 +365,17 @@ void qemu_event_set(QemuEvent *ev)
assert(ev->initialized);
/*
- * Pairs with both qemu_event_reset() and qemu_event_wait().
+ * Pairs with qemu_event_wait() (on Linux) and qemu_event_reset().
*
* qemu_event_set has release semantics, but because it *loads*
* ev->value we need a full memory barrier here.
+ *
+ * qemu_event_wait() do not have a paired barrier on non-Linux because
+ * ev->lock ensures ordering.
*/
Thanks for trying to write down the idea. This works, but there's
still room for improvement. :) The insight is that the logic becomes:
if (done == 0) { done = 1;
<read ev->value> ev->value = EV_SET;
ev->value = EV_FREE;
if (done == 0) {
pthread_cond_wait(); pthread_cond_broadcast();
}
}
which is a lot simplified compared to the futex case because all the
differences between EV_FREE and EV_BUSY are replaced by the condition
variable. In order to avoid the slow path, all that's needed is to
ensure that if qemu_event_reset() sees EV_SET, the caller also sees
done == 1 in the second "if".
The futex case needs qatomic_or() because there can be a concurrent
EV_FREE->EV_BUSY, and a smp_mb() because qemu_event_set() needs to
*read* ev->value. The non-futex case instead it can do this:
if (done == 0) { done = 1;
v = load_acquire(ev->value); <--- store_release(value, EV_SET);
ev->value = v | EV_FREE;
if (done == 0) {
pthread_cond_wait(); pthread_cond_broadcast();
}
}
where load<---store indicates that the load on the left reads the
value that the store writes, and the store "happens before" the load.
In other words:
- because v reads EV_SET, everything before the store_release is ordered
before everything that follows the load_acquire
- therefore if v reads EV_SET, the pthread_cond_wait() won't be reached
and there's no possibility of a hang (I think :))
This becomes the following patch:
diff --git a/util/event.c b/util/event.c
index 20853d61a7c..489cec08de7 100644
--- a/util/event.c
+++ b/util/event.c
@@ -47,18 +47,14 @@ void qemu_event_set(QemuEvent *ev)
{
assert(ev->initialized);
+#ifdef HAVE_FUTEX
/*
- * Pairs with qemu_event_wait() (on Linux) and qemu_event_reset().
- *
+ * Pairs with qemu_event_wait() and qemu_event_reset().
* qemu_event_set has release semantics, but because it *loads*
* ev->value we need a full memory barrier here.
- *
- * qemu_event_wait() do not have a paired barrier on non-Linux because
- * ev->lock ensures ordering.
*/
smp_mb();
-#ifdef HAVE_FUTEX
if (qatomic_read(&ev->value) != EV_SET) {
int old = qatomic_xchg(&ev->value, EV_SET);
@@ -71,7 +67,8 @@ void qemu_event_set(QemuEvent *ev)
}
#else
pthread_mutex_lock(&ev->lock);
- qatomic_set(&ev->value, EV_SET);
+ /* Pairs with qemu_event_reset()'s load acquire. */
+ qatomic_store_release(&ev->value, EV_SET);
pthread_cond_broadcast(&ev->cond);
pthread_mutex_unlock(&ev->lock);
#endif
@@ -81,6 +78,7 @@ void qemu_event_reset(QemuEvent *ev)
{
assert(ev->initialized);
+#ifdef HAVE_FUTEX
/*
* If there was a concurrent reset (or even reset+wait),
* do nothing. Otherwise change EV_SET->EV_FREE.
@@ -92,6 +90,28 @@ void qemu_event_reset(QemuEvent *ev)
* Pairs with the first memory barrier in qemu_event_set().
*/
smp_mb__after_rmw();
+#else
+ /*
+ * If futexes are not available, there are no EV_FREE->EV_BUSY
+ * transitions because wakeups are done entirely through the
+ * condition variable. Since qatomic_set() always writes EV_FREE
+ * the load seems useless, but in reality its acquire synchronizes
+ * with qemu_event_set()'s store-release: if qemu_event_reset()
+ * sees EV_SET here, then the caller will certainly see a
+ * successful condition and skip qemu_event_wait():
+ *
+ * done = 1; if (done == 0)
+ * qemu_event_set() { qemu_event_reset() {
+ * lock();
+ * ev->value = EV_SET -----> v = ev->value
+ * ev->value = v | EV_FREE
+ * cond_broadcast()
+ * unlock(); }
+ * } if (done == 0)
+ * // qemu_event_wait() not called
+ */
+ qatomic_set(&ev->value, qatomic_load_acquire(&ev->value) | EV_FREE);
+#endif
}
void qemu_event_wait(QemuEvent *ev)
You don't need to post v4 (which would include the above diff in
this patch, in qemu-thread-posix.c); I can post the above change
as a follow up too. What you have now is not wrong, it's only the
explanation that isn't too precise.
Paolo