[valgrind] [Bug 445743] "The impossible happened: mutex is locked simultaneously by two threads" while using mutexes with priority inheritance and signals
https://bugs.kde.org/show_bug.cgi?id=445743 Paul Floyd changed: What|Removed |Added Status|CONFIRMED |RESOLVED Resolution|--- |FIXED --- Comment #21 from Paul Floyd --- commit 94a41e050c92a504515c66ebdaf84cad91c869bb (HEAD -> master, origin/master, origin/HEAD) Author: Paul Floyd Date: Tue Feb 28 22:17:25 2023 +0100 -- You are receiving this mail because: You are watching all bug changes.
[valgrind] [Bug 445743] "The impossible happened: mutex is locked simultaneously by two threads" while using mutexes with priority inheritance and signals
https://bugs.kde.org/show_bug.cgi?id=445743 --- Comment #20 from Paul Floyd --- (In reply to Nick Briggs from comment #19) > c.f. restart_syscall(2) in the Linux kernel -- > https://man7.org/linux/man-pages/man2/restart_syscall.2.html That might be useful for Linux if/when someone implements it. I don't know if it will help with the way that we handle syscalls and signals, but it looks like it will be useful for timed syscalls. -- You are receiving this mail because: You are watching all bug changes.
[valgrind] [Bug 445743] "The impossible happened: mutex is locked simultaneously by two threads" while using mutexes with priority inheritance and signals
https://bugs.kde.org/show_bug.cgi?id=445743 --- Comment #19 from Nick Briggs --- c.f. restart_syscall(2) in the Linux kernel -- https://man7.org/linux/man-pages/man2/restart_syscall.2.html -- You are receiving this mail because: You are watching all bug changes.
[valgrind] [Bug 445743] "The impossible happened: mutex is locked simultaneously by two threads" while using mutexes with priority inheritance and signals
https://bugs.kde.org/show_bug.cgi?id=445743 --- Comment #18 from Paul Floyd --- In the FreeBSD kernel it looks like the mutex functions are only restartable if they don't have a timeout. The 3 cases that I see are UMTX_OP_MUTEX_LOCK UMTX_OP_MUTEX_TRYLOCK and UMTX_OP_MUTEX_WAIT when the timeout argument is null The commit below seems to fix this issue on FreeBSD. commit d8cc70f6d5a37b9b773c85f8666d6c5f7f65a29b (HEAD -> master, origin/master, origin/HEAD) Author: Paul Floyd Date: Sun Dec 25 10:18:51 2022 +0100 FreeBSD only: fix 445743 Restart 3 of the umtx_op mutex operations if they are interrupted. -- You are receiving this mail because: You are watching all bug changes.
[valgrind] [Bug 445743] "The impossible happened: mutex is locked simultaneously by two threads" while using mutexes with priority inheritance and signals
https://bugs.kde.org/show_bug.cgi?id=445743 Bart Van Assche changed: What|Removed |Added Assignee|[email protected] |[email protected] |m | -- You are receiving this mail because: You are watching all bug changes.
[valgrind] [Bug 445743] "The impossible happened: mutex is locked simultaneously by two threads" while using mutexes with priority inheritance and signals
https://bugs.kde.org/show_bug.cgi?id=445743 --- Comment #17 from Paul Floyd --- One thing that I'm not sure about is whether restartable syscalls re-read their userland arguments or whether they reuse the copyin()'d arguments. -- You are receiving this mail because: You are watching all bug changes.
[valgrind] [Bug 445743] "The impossible happened: mutex is locked simultaneously by two threads" while using mutexes with priority inheritance and signals
https://bugs.kde.org/show_bug.cgi?id=445743 --- Comment #16 from Paul Floyd --- (In reply to Nick Briggs from comment #15) > Also note: drd/tests/pth_mutex_signal when run under simple memcheck, fails > because pthread_mutex_lock() returns EINTR, while POSIX says of it (and > related functions) "These functions shall not return an error code of > [EINTR]." The big question is why Linux and FreeBSD aren't restarting their respective _umtx_op/futex system calls. Both are using internal kernel values of errno that indicate that the sycall should be restarted, ERESTARTNOINTR on Linux and ERESTART on FreeBSD. When running standalone, the signal handler will create a sigframe with the return trampoline doing a sigreturn to the syscall. When running under Valgrind, things are a lot different. We will have replaced the client signal handler by async_signalhandler. So when the signal arrives, this is what gets called, with a sigframe for sigreturn back to the syscall. But we will never return from async_signalhandler. Instead async_signalhandler calls fixup_guest_state_after_syscall_interrupted *** which does not handle kernel restartable syscalls *** async_signalhandler then calls deliver_signal which synthesizes an extended sigframe with out own return trampoline that does a (fake) sigreturn. async_sighandler jumps back to the scheduler, which will then run the guest signal handler. That will return via our trampoline to our (fake) sigreturn. This fixes up the Virtual CPU to make it look like the client is returning to the syscall but then (fake) sigreturn just sets the status (so there is no real syscall) and returns back to the Valgrind scheduler. That all takes us back to where fixup_guest_state_after_syscall_interrupted said to go after the sigreturn. We'd like it to restart the syscall, like when running standalone. But that doesn't seem to be the case at the moment. -- You are receiving this mail because: You are watching all bug changes.
[valgrind] [Bug 445743] "The impossible happened: mutex is locked simultaneously by two threads" while using mutexes with priority inheritance and signals
https://bugs.kde.org/show_bug.cgi?id=445743 Nick Briggs changed: What|Removed |Added CC||[email protected] ||m --- Comment #15 from Nick Briggs --- Also note: drd/tests/pth_mutex_signal when run under simple memcheck, fails because pthread_mutex_lock() returns EINTR, while POSIX says of it (and related functions) "These functions shall not return an error code of [EINTR]." -- You are receiving this mail because: You are watching all bug changes.
[valgrind] [Bug 445743] "The impossible happened: mutex is locked simultaneously by two threads" while using mutexes with priority inheritance and signals
https://bugs.kde.org/show_bug.cgi?id=445743
--- Comment #14 from Paul Floyd ---
If I add to the previous patch
*flags |= SfMayBlock;
* start add*
if ((ARG2 & (VKI_FUTEX_PRIVATE_FLAG|VKI_FUTEX_LOCK_PI)) ==
(VKI_FUTEX_PRIVATE_FLAG|VKI_FUTEX_LOCK_PI)) {
*flags |= SfKernelRestart;
}
*end add *
to static void futex_pre_helper ( ThreadId tid, SyscallArgLayout* layout,
SyscallArgs* arrghs, SyscallStatus* status,
UWord* flags, Bool is_time64 )
then I get
paulf> ../../vg-in-place --tool=drd -q ./pth_mutex_signal
mutex initialized
thread attributes initialized
thread created
sleeping
signalling
sleeping
nullHandler running
unlocking
contender locked mutex
contender unlocking mutex
contender unlocked mutex
joining thread
No error
--
You are receiving this mail because:
You are watching all bug changes.
[valgrind] [Bug 445743] "The impossible happened: mutex is locked simultaneously by two threads" while using mutexes with priority inheritance and signals
https://bugs.kde.org/show_bug.cgi?id=445743 --- Comment #13 from Paul Floyd --- Created attachment 149549 --> https://bugs.kde.org/attachment.cgi?id=149549&action=edit Patch for FreeBSD This patch fixes the drd/tests/pth_mutex_signal testcase on FreeBSD. I think that it should extend easily to Linux. I need to do more testing and it would also be good to find out a list of all syscalls that use ERESTARTNOINTR (Linux) and ERESTART (FreeBSD). -- You are receiving this mail because: You are watching all bug changes.
[valgrind] [Bug 445743] "The impossible happened: mutex is locked simultaneously by two threads" while using mutexes with priority inheritance and signals
https://bugs.kde.org/show_bug.cgi?id=445743
--- Comment #12 from Paul Floyd ---
Looks similar on FreeBSD. truss output, standalone
78094: thr_kill(133780,SIGRT59) = 0 (0x0)
78094: _umtx_op(0x800684008,UMTX_OP_MUTEX_LOCK,0x0,0x0,0x0) ERESTART
78094: SIGNAL 124 (SIGRT59) code=SI_LWP pid=78094 uid=501
78094: write(2,"sleeping\n",9) = 9 (0x9)
78094: sigprocmask(SIG_SETMASK,{
SIGHUP|SIGINT|SIGQUIT|SIGILL|SIGTRAP|SIGABRT|SIGEMT|SIGFPE|SIGKILL|SIGBUS|SIGSEGV|SIGSYS|SIGPIPE|SIGALRM|SIGTERM|SIGURG|SIGSTOP|SIGTSTP|SIGCONT|SIGCHLD|SIGTTIN|SIGTTOU|SIGIO|SIGXCPU|SIGXFSZ|SIGVTALRM|SIGPROF|SIGWINCH|SIGINFO|SIGUSR1|SIGUSR2
},0x0) = 0 (0x0)
78094: write(2,"nullHandler running\n",20) = 20 (0x14)
78094: sigreturn(0x7fffdfffd670) EJUSTRETURN
78094: nanosleep({ 1.0 })= 0 (0x0)
78094: write(2,"unlocking\n",10) = 10 (0xa)
78094: _umtx_op(0x800684008,UMTX_OP_MUTEX_UNLOCK,0x0,0x0,0x0) = 0 (0x0)
78094: _umtx_op(0x800684008,UMTX_OP_MUTEX_LOCK,0x0,0x0,0x0) ERESTART
78094: _umtx_op(0x800684008,UMTX_OP_MUTEX_LOCK,0x0,0x0,0x0) = 0 (0x0)
78094: write(2,"contender locked mutex\n",23)= 23 (0x17)
Without pthread_mutexattr_setprotocol(&mutex_attr, PTHREAD_PRIO_INHERIT);
78129: thr_kill(133828,SIGRT59) = 0 (0x0)
78129: _umtx_op(0x800684008,UMTX_OP_MUTEX_WAIT,0x0,0x0,0x0) ERR#4 'Interrupted
system call'
78129: SIGNAL 124 (SIGRT59) code=SI_LWP pid=78129 uid=501
78129: write(2,"sleeping\n",9) = 9 (0x9)
78129: sigprocmask(SIG_SETMASK,{
SIGHUP|SIGINT|SIGQUIT|SIGILL|SIGTRAP|SIGABRT|SIGEMT|SIGFPE|SIGKILL|SIGBUS|SIGSEGV|SIGSYS|SIGPIPE|SIGALRM|SIGTERM|SIGURG|SIGSTOP|SIGTSTP|SIGCONT|SIGCHLD|SIGTTIN|SIGTTOU|SIGIO|SIGXCPU|SIGXFSZ|SIGVTALRM|SIGPROF|SIGWINCH|SIGINFO|SIGUSR1|SIGUSR2
},0x0) = 0 (0x0)
78129: write(2,"nullHandler running\n",20) = 20 (0x14)
78129: sigreturn(0x7fffdfffd640) EJUSTRETURN
78129: nanosleep({ 1.0 })= 0 (0x0)
78129: write(2,"unlocking\n",10) = 10 (0xa)
78129: _umtx_op(0x800684008,UMTX_OP_MUTEX_WAKE2,0x0,0x0,0x0) = 0 (0x0)
78129: _umtx_op(0x800684008,UMTX_OP_MUTEX_WAIT,0x0,0x0,0x0) = 0 (0x0)
78129: write(2,"contender locked mutex\n",23)= 23 (0x17)
This time under DRD the mutex does return an error code - invalid argument.
--
You are receiving this mail because:
You are watching all bug changes.
[valgrind] [Bug 445743] "The impossible happened: mutex is locked simultaneously by two threads" while using mutexes with priority inheritance and signals
https://bugs.kde.org/show_bug.cgi?id=445743
--- Comment #11 from Paul Floyd ---
I think that I understand the problem, at least on Linux.
PTHREAD_PRIO_INHERIT causes the type of mutex to change so that it will return
ERESTARTNOINTR if interrupted.
I see, with strace
14177 <... futex resumed> ) = ? ERESTARTNOINTR (To be restarted)
Standalone I see
20015 rt_sigreturn({mask=[]}) = 202
20015 futex(0x7fffd070, FUTEX_LOCK_PI_PRIVATE, 1, NULL
20014 <... nanosleep resumed> NULL) = 0
The problem is that the Valgrind sigreturn mechanism hijacks the return address
so we no longer return to the guest futex syscall. That means that the futex
doesn't get restarted.
Even worse, since the futex was supposed to be restarted the return code to
userland is zero, so we don't even see that it failed.
I don't see any way to fix this at the moment on Linux.
--
You are receiving this mail because:
You are watching all bug changes.
[valgrind] [Bug 445743] "The impossible happened: mutex is locked simultaneously by two threads" while using mutexes with priority inheritance and signals
https://bugs.kde.org/show_bug.cgi?id=445743
--- Comment #10 from Paul Floyd ---
On FreeBSD, single stepping out after the user interrupt I get here
sleep_in_kernel:
if (abstime == NULL)
ret = __thr_umutex_lock(&m->m_lock, id); // RETURN HERE UNDER
VG
else if (__predict_false(abstime->tv_nsec < 0 ||
abstime->tv_nsec >= 10))
ret = EINVAL;
else
ret = __thr_umutex_timedlock(&m->m_lock, id, abstime);
done:
if (ret == 0 || ret == EOWNERDEAD) { // RETURN HERE STANDALONE
enqueue_mutex(curthread, m, ret);
if (ret == EOWNERDEAD)
m->m_lock.m_flags |= UMUTEX_NONCONSISTENT;
}
return (ret);
}
Retcode is 4, invalid.
--
You are receiving this mail because:
You are watching all bug changes.
[valgrind] [Bug 445743] "The impossible happened: mutex is locked simultaneously by two threads" while using mutexes with priority inheritance and signals
https://bugs.kde.org/show_bug.cgi?id=445743 Alexander Kanavin changed: What|Removed |Added CC||[email protected] --- Comment #9 from Alexander Kanavin --- Yocto project started getting this too with valgrind 3.19.0 on linux: drd/tests/pth_mutex_signal.stderr.diff --- pth_mutex_signal.stderr.exp 2018-03-09 12:34:56.0 + +++ pth_mutex_signal.stderr.out 2022-04-27 07:32:55.31000 + @@ -6,10 +6,42 @@ signalling sleeping nullHandler running -unlocking +Thread 2: +The impossible happened: mutex is locked simultaneously by two threads: mutex 0x, recursion count 1, owner 1. + at 0x: pthread_mutex_lock (drd_pthread_intercepts.c:?) + by 0x: contender_start (pth_mutex_signal.c:?) + by 0x: vgDrd_thread_wrapper (drd_pthread_intercepts.c:?) + by 0x: start_thread + by 0x: clone (in /...libc...) +mutex 0x was first observed at: + at 0x: pthread_mutex_init (drd_pthread_intercepts.c:?) + by 0x: main (pth_mutex_signal.c:?) + contender locked mutex contender unlocking mutex contender unlocked mutex +unlocking +Thread 1: +Mutex not locked by calling thread: mutex 0x, recursion count 1, owner 2. + at 0x: pthread_mutex_unlock (drd_pthread_intercepts.c:?) + by 0x: main (pth_mutex_signal.c:?) +mutex 0x was first observed at: + at 0x: pthread_mutex_init (drd_pthread_intercepts.c:?) + by 0x: main (pth_mutex_signal.c:?) + joining thread +Mutex still locked at thread exit: mutex 0x, recursion count 1, owner 2. + at 0x: pthread_join (drd_pthread_intercepts.c:?) + by 0x: main (pth_mutex_signal.c:?) +mutex 0x was first observed at: + at 0x: pthread_mutex_init (drd_pthread_intercepts.c:?) + by 0x: main (pth_mutex_signal.c:?) + +Destroying locked mutex: mutex 0x, recursion count 1, owner 0. + at 0x: main (pth_mutex_signal.c:?) +mutex 0x was first observed at: + at 0x: pthread_mutex_init (drd_pthread_intercepts.c:?) + by 0x: main (pth_mutex_signal.c:?) + -ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0) +ERROR SUMMARY: 4 errors from 4 contexts (suppressed: 0 from 0) -- You are receiving this mail because: You are watching all bug changes.
[valgrind] [Bug 445743] "The impossible happened: mutex is locked simultaneously by two threads" while using mutexes with priority inheritance and signals
https://bugs.kde.org/show_bug.cgi?id=445743 --- Comment #8 from Paul Floyd --- On FreeBSD I get mutex initialized thread attributes initialized thread created sleeping signalling sleeping nullHandler running Assertion failed: (ret == 0), function contender_start, file pth_mutex_signal.c, line 23. I'll move to the valgrind developers list for further discussion - will update here as necessary. -- You are receiving this mail because: You are watching all bug changes.
[valgrind] [Bug 445743] "The impossible happened: mutex is locked simultaneously by two threads" while using mutexes with priority inheritance and signals
https://bugs.kde.org/show_bug.cgi?id=445743
Bart Van Assche changed:
What|Removed |Added
Status|REPORTED|CONFIRMED
Ever confirmed|0 |1
--- Comment #7 from Bart Van Assche ---
Please take a look at commit bf0579a44a44 ("drd: Add a test program that
interrupts pthread_mutex_lock()"). If I run the new test program (the program
attached to this bug with minor changes) natively then the text "unlock"
appears before the text "contender locked mutex". If I run that program under
memcheck then the order of these two output lines is reversed. So I think this
indicates a bug in the Valgrind core.
--
You are receiving this mail because:
You are watching all bug changes.
[valgrind] [Bug 445743] "The impossible happened: mutex is locked simultaneously by two threads" while using mutexes with priority inheritance and signals
https://bugs.kde.org/show_bug.cgi?id=445743 --- Comment #6 from Paul Floyd --- Unfortunately this doesn't seem to help on Linux. The problem seems to be that after the signal contender doesn't return to waiting. -- You are receiving this mail because: You are watching all bug changes.
[valgrind] [Bug 445743] "The impossible happened: mutex is locked simultaneously by two threads" while using mutexes with priority inheritance and signals
https://bugs.kde.org/show_bug.cgi?id=445743 --- Comment #5 from Paul Floyd --- And this post on SO covers why the contender mutex lock fails. https://stackoverflow.com/questions/12781944/pthread-mutex-lock-returns-invalid-argument -- You are receiving this mail because: You are watching all bug changes.
[valgrind] [Bug 445743] "The impossible happened: mutex is locked simultaneously by two threads" while using mutexes with priority inheritance and signals
https://bugs.kde.org/show_bug.cgi?id=445743
--- Comment #4 from Paul Floyd ---
I can also reproduce the problem on FreeBSD.
What I see so far is that main runs all the way to pthread_join. contender does
not run at all, despite the sleep. Only after main is blocked in the
pthread_join does contender run. When it does run, the pthread_mutex_lock
fails.
If I change contender to be
void *contender_start(void *arg)
{
pthread_mutex_t *mutex = arg;
fprintf(stderr, "contender trying to lock mutex\n");
while (pthread_mutex_lock(mutex))
{
perror("contender error locking mutex:");
}
fprintf(stderr, "contender locked mutex\n");
fprintf(stderr, "contender unlocking mutex\n");
pthread_mutex_unlock(mutex);
fprintf(stderr, "contender unlocked mutex\n");
return NULL;
}
then I get
mutex initialized
main mutex locked
thread attributes initialized
contender trying to lock mutex
thread created
sleeping
signalling
joining thread
nullHandler running
contender error locking mutex:: Invalid argument
with the perror printed just once
I'll need to do a fairly long debugging session inside DRD and Valgrind core to
see what is happening there.
with --trace-mutex=yes
==5143== [2] post_mutex_lock error checking mutex 0x7fc000290 rc 1 owner 1
(locking failed)
... mutex for perror ...
==5143== [2] mutex_trylock error checking mutex 0x7fc000290 rc 1 owner 1
--
You are receiving this mail because:
You are watching all bug changes.
[valgrind] [Bug 445743] "The impossible happened: mutex is locked simultaneously by two threads" while using mutexes with priority inheritance and signals
https://bugs.kde.org/show_bug.cgi?id=445743 --- Comment #3 from Paul Floyd --- I tried running both the the standalone and DRD under strace. Under DRD I see the main thread getting as far as the kill, then I see futex(0x4a814c0, FUTEX_WAIT_PRIVATE, 2, NULL) = -1 EAGAIN (Resource temporarily unavailable) (which is one of DRD's mutex lock calls I think) Need to debug more. -- You are receiving this mail because: You are watching all bug changes.
[valgrind] [Bug 445743] "The impossible happened: mutex is locked simultaneously by two threads" while using mutexes with priority inheritance and signals
https://bugs.kde.org/show_bug.cgi?id=445743 --- Comment #2 from Paul Floyd --- Created attachment 143748 --> https://bugs.kde.org/attachment.cgi?id=143748&action=edit reproducer with DRD extras I tried to merge the extras (wrapper + mutex + CV) that DRD adds. However this does not reproduce the problem. -- You are receiving this mail because: You are watching all bug changes.
[valgrind] [Bug 445743] "The impossible happened: mutex is locked simultaneously by two threads" while using mutexes with priority inheritance and signals
https://bugs.kde.org/show_bug.cgi?id=445743 --- Comment #1 from Paul Floyd --- Interesting. With Helgrind I get ==13857== Thread #2: Bug in libpthread: write lock granted on mutex/rwlock which is currently wr-held by a different thread ==13857==at 0x403586F: mutex_lock_WRK (hg_intercepts.c:942) ==13857==by 0x4012E2: contender_start (sched.c:16) ==13857==by 0x40384A6: mythread_wrapper (hg_intercepts.c:406) ==13857==by 0x4A2BDD4: start_thread (in /usr/lib64/libpthread-2.17.so) ==13857==by 0x4D3DEAC: clone (in /usr/lib64/libc-2.17.so) This only happens when the mutex has the PTHREAD_PRIO_INHERIT attribute. This doesn't seem to be a new problem - I can reproduce with Valgrind 3.13. The Helgrind code also implies that there could be a race condition between the intercept passing the call on to pthread_mutex_lock and Helgrind registering the return from pthread_mutex_lock. That could be affected by changing the priority. The other possibility is that DRD uses its own CV and mutex to ensure that the thread is joinable before it runs. There could be interference between the guest mutex with elevated priority and this CV/mutex. -- You are receiving this mail because: You are watching all bug changes.
[valgrind] [Bug 445743] "The impossible happened: mutex is locked simultaneously by two threads" while using mutexes with priority inheritance and signals
https://bugs.kde.org/show_bug.cgi?id=445743 Paul Floyd changed: What|Removed |Added CC||[email protected] -- You are receiving this mail because: You are watching all bug changes.
[valgrind] [Bug 445743] "The impossible happened: mutex is locked simultaneously by two threads" while using mutexes with priority inheritance and signals
https://bugs.kde.org/show_bug.cgi?id=445743 Vladimir Nikolov changed: What|Removed |Added CC||[email protected] -- You are receiving this mail because: You are watching all bug changes.
