On 2025/05/16 23:53, Peter Xu wrote:
On Fri, May 16, 2025 at 02:34:33PM +0900, Akihiko Odaki wrote:
On 2025/05/15 2:06, Peter Xu wrote:
On Wed, May 14, 2025 at 04:34:33PM +0900, Akihiko Odaki wrote:
On 2025/05/13 23:39, 'Peter Xu' via devel wrote:
On Sun, May 11, 2025 at 03:08:18PM +0900, Akihiko Odaki wrote:
futex(2) - Linux manual page
https://man7.org/linux/man-pages/man2/futex.2.html
Note that a wake-up can also be caused by common futex usage patterns
in unrelated code that happened to have previously used the futex
word's memory location (e.g., typical futex-based implementations of
Pthreads mutexes can cause this under some conditions). Therefore,
callers should always conservatively assume that a return value of 0
can mean a spurious wake-up, and use the futex word's value (i.e.,
the user-space synchronization scheme) to decide whether to continue
to block or not.
I'm just curious - do you know when this will happen?
AFAIU, QEMU always uses futexes on private mappings, and afaict the kernel
internally indexes a futex by an (mm, HVA) tuple. Hence, I don't see how it
can get spurious wakeups. And _if_ it happens, since the mm pointer can't
change, it must mean the HVA of the futex word is reused, which sounds like
a UAF user bug to me instead.
[1]
I checked the man-pages git repo, this line was introduced in:
https://github.com/mkerrisk/man-pages/commit/4b35dc5dabcf356ce6dcb1f949f7b00e76c7587d
I also haven't found details in the commit message on why that paragraph
was added.
And..
Signed-off-by: Akihiko Odaki <akihiko.od...@daynix.com>
---
include/qemu/futex.h | 9 +++++++++
tests/unit/test-aio-multithread.c | 4 +++-
util/qemu-thread-posix.c | 28 ++++++++++++++++------------
3 files changed, 28 insertions(+), 13 deletions(-)
diff --git a/include/qemu/futex.h b/include/qemu/futex.h
index 91ae88966e12..f57774005330 100644
--- a/include/qemu/futex.h
+++ b/include/qemu/futex.h
@@ -24,6 +24,15 @@ static inline void qemu_futex_wake(void *f, int n)
qemu_futex(f, FUTEX_WAKE, n, NULL, NULL, 0);
}
+/*
+ * Note that a wake-up can also be caused by common futex usage patterns in
+ * unrelated code that happened to have previously used the futex word's
+ * memory location (e.g., typical futex-based implementations of Pthreads
+ * mutexes can cause this under some conditions). Therefore, callers should
.. another thing that was unclear to me: the comment mentions "typical
futex-based implementations of pthreads mutexes..", but qemu_futex_wait()
uses the raw futex without any pthread implementation. Does that mean
whatever might cause a spurious wakeup may not be applicable here?
No. The man-page mentions "unrelated code that happened to have previously
used the futex word's memory location", so it doesn't matter whether we use
pthread here.
libpthread and even this QemuEvent follow the "common futex usage", so we
should do what is written in the man page.
Unfortunately the man page does not describe the "common futex usage
pattern". It looks as follows:
Assume there are two threads, one atomic variable, and one futex.
Thread A does the following:
A1. Read the atomic variable.
A2. Go to A5 if the atomic variable is zero.
A3. Wait using the futex.
A4. Go to A1.
A5. Free the atomic variable and the futex.
Thread B does the following:
B1. Set the atomic variable to zero.
B2. Wake up using the futex.
In this example, the execution may happen in the following order:
B1 -> A1 -> A2 -> A5 -> B2
Here, B2 will cause a spurious wake up of QemuEvent if the freed memory gets
reused for QemuEvent.
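For illustration, the two threads above could be sketched in C roughly as follows. This is only a minimal sketch under stated assumptions: it targets Linux and the raw futex system call, and the names demo_obj, demo_futex, demo_thread_a and demo_thread_b are made up for this example; they are not QEMU or libpthread code.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <linux/futex.h>
#include <stdatomic.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical object: the atomic variable doubles as the futex word. */
struct demo_obj {
    _Atomic uint32_t word;   /* nonzero: keep waiting, zero: done */
};

/* Thin wrapper around the raw futex system call (Linux only). */
static long demo_futex(void *addr, int op, uint32_t val)
{
    return syscall(SYS_futex, addr, op, val, NULL, NULL, 0);
}

/* Thread A: steps A1 to A5. */
static void demo_thread_a(struct demo_obj *obj)
{
    assert(obj != NULL);
    for (;;) {
        uint32_t v = atomic_load(&obj->word);    /* A1 */
        if (v == 0) {                            /* A2 */
            break;
        }
        /*
         * A3: sleeps only if the word still equals v. The call may also
         * return because of a signal, a value mismatch, or a wake-up that
         * was meant for an earlier user of this address.
         */
        demo_futex(&obj->word, FUTEX_WAIT, v);
        /* A4: loop and re-read the word instead of trusting the return. */
    }
    free(obj);                                   /* A5 */
}

/* Thread B: steps B1 and B2. Returns the number of waiters woken by B2. */
static long demo_thread_b(struct demo_obj *obj)
{
    assert(obj != NULL);
    void *addr = &obj->word;      /* remember the address before B1 */
    atomic_store(&obj->word, 0);                 /* B1 */
    /*
     * Thread A may already have run A1, A2 and A5 at this point, so addr
     * may refer to freed (and possibly reused) memory. The wake below then
     * reaches whoever waits on that address now.
     */
    return demo_futex(addr, FUTEX_WAKE, 1);      /* B2 */
}
```

In a real program the two functions would run on separate threads (for example via pthread_create()); the loop in demo_thread_a is what makes a stray wake-up harmless for the waiter.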
This is true.
That said, following my earlier statement at [1] above, I think A5 is the
UAF bug I mentioned: it frees the lock object while an existing user
(Thread B) is still accessing it.
IMHO, the userapp should make sure the object will never be freed if
there's any possible user of it, and that includes a waker like Thread B.
For futex, the futex word (which is the important bit here, relevant to
possible spurious wakeups) is part of the lock object; hence if the lock
object isn't freed too early it won't ever get reused, and then there
should be no chance of spurious wakeups in the futex context.
It is a UAF, but it is by design and not a bug.
The principle of the futex design is to use atomic memory operations to
manage the state instead of using a system call, which is more expensive.
This principle motivates tolerating spurious wakeups. If wakeup system calls
after free are forbidden, a thread will need to use an (expensive) system
call to ensure the wakeup actually happened before freeing. Instead, we can
tolerate spurious wakeups without causing buggy behavior by making the
waiting thread perform (cheaper) atomic memory reads to verify the expected
state.
Right, that's also my understanding that it's by design for futex from
kernel POV.
I think it also makes sense from the userspace POV; it is a common truth
that atomic memory operations are cheaper than system calls.
What I am not yet sure about is whether it's by design for a userapp to use
it in a way that lets a spurious wakeup happen. In that regard, I still
think maybe we shouldn't have that paragraph in the man page at all, or at
least it could be made clearer there.
Eliminating spurious wakeups requires removing the paragraph from the
man page and updating all libraries (including libpthread) not to make
spurious wakeups, which takes a long time. We need to prepare for
spurious wakeups for now.
I agree the man page can be clearer; the paragraph assumes readers simply
follow what it says, but applying it requires more insight. I also had to
spend some time understanding the QemuEvent code and the libpthread
code, which would have been unnecessary if the man page described "the
common futex usage pattern".
So now the question is, do we have a use case where QEMU needs to free
a qemu_futex_*() API based lock _before_ any wakeups?
We need to care about external libraries that may use futex, and at least
I know libpthread can cause spurious wakeups.
Regards,
Akihiko Odaki