On Fri, May 16, 2025 at 02:34:33PM +0900, Akihiko Odaki wrote:
> On 2025/05/15 2:06, Peter Xu wrote:
> > On Wed, May 14, 2025 at 04:34:33PM +0900, Akihiko Odaki wrote:
> > > On 2025/05/13 23:39, 'Peter Xu' via devel wrote:
> > > > On Sun, May 11, 2025 at 03:08:18PM +0900, Akihiko Odaki wrote:
> > > > > futex(2) - Linux manual page
> > > > > https://man7.org/linux/man-pages/man2/futex.2.html
> > > > > > > Note that a wake-up can also be caused by common futex usage patterns
> > > > > > in unrelated code that happened to have previously used the futex
> > > > > > word's memory location (e.g., typical futex-based implementations of
> > > > > > Pthreads mutexes can cause this under some conditions).  Therefore,
> > > > > > callers should always conservatively assume that a return value of 0
> > > > > > can mean a spurious wake-up, and use the futex word's value (i.e.,
> > > > > > the user-space synchronization scheme) to decide whether to continue
> > > > > > to block or not.
> > > > 
> > > > I'm just curious - do you know when this will happen?
> > > > 
> > > > AFAIU, QEMU uses futex always on private mappings, internally futex does
> > > > use (mm, HVA) tuple to index a futex, afaict.  Hence, I don't see how it
> > > > can get spurious wakeups..  And _if_ it happens, since mm pointer can't
> > > > change it must mean the HVA of the futex word is reused, it sounds like an
> > > > UAF user bug to me instead.
> > 
> > [1]
> > 
> > > > 
> > > > I checked the man-pages git repo, this line was introduced in:
> > > > 
> > > > https://github.com/mkerrisk/man-pages/commit/4b35dc5dabcf356ce6dcb1f949f7b00e76c7587d
> > > > 
> > > > I also didn't see details yet in commit message on why that paragraph was
> > > > added.
> > > > 
> > > > And..
> > > > 
> > > > > 
> > > > > Signed-off-by: Akihiko Odaki <akihiko.od...@daynix.com>
> > > > > ---
> > > > >    include/qemu/futex.h              |  9 +++++++++
> > > > >    tests/unit/test-aio-multithread.c |  4 +++-
> > > > >    util/qemu-thread-posix.c          | 28 ++++++++++++++++------------
> > > > >    3 files changed, 28 insertions(+), 13 deletions(-)
> > > > > 
> > > > > diff --git a/include/qemu/futex.h b/include/qemu/futex.h
> > > > > index 91ae88966e12..f57774005330 100644
> > > > > --- a/include/qemu/futex.h
> > > > > +++ b/include/qemu/futex.h
> > > > > @@ -24,6 +24,15 @@ static inline void qemu_futex_wake(void *f, int n)
> > > > >        qemu_futex(f, FUTEX_WAKE, n, NULL, NULL, 0);
> > > > >    }
> > > > > +/*
> > > > > > + * Note that a wake-up can also be caused by common futex usage patterns in
> > > > > > + * unrelated code that happened to have previously used the futex word's
> > > > > > + * memory location (e.g., typical futex-based implementations of Pthreads
> > > > > > + * mutexes can cause this under some conditions).  Therefore, callers should
> > > > 
> > > > .. another thing that was unclear to me is, here it's mentioning "typical
> > > > futex-based implementations of pthreads mutexes..", but here
> > > > qemu_futex_wait() is using raw futex without any pthread impl.  Does it
> > > > also mean that this may not be applicable to whatever might cause a
> > > > spurious wakeup?
> > > 
> > > No. The man-page mentions "unrelated code that happened to have previously
> > > used the futex word's memory location", so it doesn't matter whether we use
> > > pthread here.
> > > 
> > > libpthread and even this QemuEvent follows the "common futex usage" so we
> > > should do what is written in the man page.
> > > 
> > > Unfortunately the man page does not describe the "common futex usage
> > > pattern". It looks like as follows:
> > > 
> > > Assume there are two threads, one atomic variable, and one futex.
> > > 
> > > Thread A does the following:
> > > A1. Read the atomic variable.
> > > A2. Go A5 if the atomic variable is zero.
> > > A3. Wait using the futex.
> > > A4. Go A1.
> > > A5. Free the atomic variable and the futex.
> > > 
> > > Thread B does the following:
> > > B1. Set the atomic variable to zero.
> > > B2. Wake up using the futex.
> > > 
> > > In this example, the execution may happen in the following order:
> > > B1 -> A1 -> A2 -> A5 -> B2
> > > 
> > > Here, B2 will cause a spurious wake up of QemuEvent if the freed memory gets
> > > reused for QemuEvent.
> > 
> > This is true.
> > 
> > Said that, if to follow my previous statement at [1] above, here I think A5
> > is the UAF bug I mentioned, trying to free the lock object with existing
> > user (Thread B) accessing the object.
> > 
> > IMHO, the userapp should make sure the object will never be freed if
> > there's any possible user of it, and that includes a waker like Thread B.
> > 
> > For futex, the futex word (which is the important bit here relevant to
> > possible spurious wakeups) is part of the lock object, hence if the lock
> > object isn't freed too early it won't ever get reused, and then there
> > should have no chance of spurious wakeups in the futex context.
> 
> It is a UAF, but it is by design and not a bug.
> 
> The principle of the futex design is to use atomic memory operations to
> manage the state instead of using a system call, which is more expensive.
> 
> This principle motivates tolerating spurious wakeups. If wakeup system calls
> after free are forbidden, a thread will need to use a (expensive) system
> call to ensure the wake up actually happened before freeing. Instead, we can
> tolerate spurious wakeups without causing a buggy behavior by making the
> waiting thread perform (cheaper) atomic memory reads to verify the expected
> state.

Right, my understanding is also that it's by design for futex from the
kernel's POV.
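
For reference, the "re-check the state after every wake-up" pattern described
above can be sketched roughly as below.  This is only an illustration under
stated assumptions: the EV_SET/EV_BUSY states and the helper names are made up
for this example and are not QemuEvent's actual implementation; only the
futex(2) syscall and the FUTEX_WAIT_PRIVATE/FUTEX_WAKE_PRIVATE operations are
real.  Linux only.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <linux/futex.h>
#include <stdatomic.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical two-state futex word; not QemuEvent's real encoding. */
enum { EV_SET = 0, EV_BUSY = 1 };

/* The kernel treats the futex word as a plain 32-bit integer. */
static_assert(sizeof(_Atomic uint32_t) == sizeof(uint32_t),
              "futex word must be 32 bits");

/* Thin wrapper over futex(2): sleep only if *word still equals expected. */
static long futex_wait_private(_Atomic uint32_t *word, uint32_t expected)
{
    return syscall(SYS_futex, (uint32_t *)word, FUTEX_WAIT_PRIVATE,
                   expected, NULL, NULL, 0);
}

/* Thin wrapper over futex(2): wake up to n waiters sleeping on word. */
static long futex_wake_private(_Atomic uint32_t *word, int n)
{
    return syscall(SYS_futex, (uint32_t *)word, FUTEX_WAKE_PRIVATE,
                   n, NULL, NULL, 0);
}

/*
 * Block until the word reads EV_SET.  A return from futex_wait_private()
 * is treated purely as a hint: whether it was a real wake-up, a spurious
 * one, EAGAIN or EINTR, the loop re-reads the word and decides from that.
 */
static void event_wait(_Atomic uint32_t *word)
{
    uint32_t v;

    while ((v = atomic_load_explicit(word, memory_order_acquire)) != EV_SET) {
        futex_wait_private(word, v);
    }
}

/* Publish the state first, then wake every waiter. */
static void event_set(_Atomic uint32_t *word)
{
    atomic_store_explicit(word, EV_SET, memory_order_release);
    futex_wake_private(word, INT_MAX);
}
```

With a loop like this, a stray FUTEX_WAKE costs one extra atomic read and one
more trip around the loop, which is the trade-off the quoted paragraph argues
for.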

What I am not yet sure about is whether it's by design for a userapp to use
it in a way that lets a spurious wakeup happen.  In that regard, I still
think maybe we shouldn't have that paragraph in the man page at all, or at
least it could be made clearer there.

So now the question is, do we have such a use case, where QEMU needs to free
a qemu_futex_*() API based lock _before_ any wakeups?

QEMU only has two locks implemented directly on top of futex, which are
QemuEvent and QemuLockCnt.  From what I can tell from how they're used (not
a lot), neither of them uses this as a feature, so IIUC it means QEMU should
still be free from such UAF issues, and it's definitely not a feature either.
Hence if any spurious wakeup happened in QEMU, it's a real bug.
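
To make the lifetime rule concrete (the lock object is not freed while any
user, including a waker, may still touch it), here is a minimal sketch with a
reference-counted object.  All names here (struct event_obj, event_obj_new(),
event_obj_put(), waker_finish()) are hypothetical and exist only for this
example; nothing like this is claimed to be in the QEMU tree.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical lock object: the futex word lives inside it. */
struct event_obj {
    atomic_uint refs;        /* one reference per waiter or waker */
    _Atomic uint32_t word;   /* futex word: 1 = busy, 0 = set */
};

/* Allocate the object with one reference per expected user. */
static struct event_obj *event_obj_new(unsigned users)
{
    struct event_obj *ev;

    assert(users > 0);
    ev = malloc(sizeof(*ev));
    if (ev) {
        atomic_init(&ev->refs, users);
        atomic_init(&ev->word, 1);
    }
    return ev;
}

/* Drop one reference; returns true if this call freed the object. */
static bool event_obj_put(struct event_obj *ev)
{
    if (atomic_fetch_sub_explicit(&ev->refs, 1, memory_order_acq_rel) == 1) {
        free(ev);
        return true;
    }
    return false;
}

/*
 * Waker side: publish the state, issue the wake-up, and only then drop
 * the reference, so the futex word cannot be reused before the wake.
 */
static bool waker_finish(struct event_obj *ev)
{
    atomic_store_explicit(&ev->word, 0, memory_order_release);
    /* A FUTEX_WAKE on &ev->word would go here; ev is still alive. */
    return event_obj_put(ev);
}
```

In this scheme the waiter calls event_obj_put() after it has seen word == 0,
and whichever side drops the last reference frees the memory, so a wake-up
never lands on a reused address.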

From that POV, IMHO it would make more sense to allow spurious wakeups
iff they're proved to be necessary (e.g. when QEMU wants to make it a feature
not a bug), and I also worry that we're copying the man page content into
the QEMU tree when it may not accurately apply to QEMU's context at all.

Thanks,

-- 
Peter Xu

