On 01/07/16 11:56, Sergey Fedorov wrote:
> On 30/06/16 13:35, Sergey Fedorov wrote:
>> On 30/06/16 13:32, Alex Bennée wrote:
>>> Sergey Fedorov <serge.f...@gmail.com> writes:
>>>
>>>> On 29/06/16 19:17, Alex Bennée wrote:
>>>>> So I think there is a deadlock we can get with the async work:
>>>>>
>>>>> (gdb) thread apply all bt
>>>>>
>>>>> Thread 11 (Thread 0x7ffefeca7700 (LWP 2912)):
>>>>> #0  pthread_cond_wait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
>>>>> #1  0x00005555555cb777 in wait_cpu_work () at /home/alex/lsrc/qemu/qemu.git/linux-user/main.c:155
>>>>> #2  0x00005555555a0cee in wait_safe_cpu_work () at /home/alex/lsrc/qemu/qemu.git/cpu-exec-common.c:87
>>>>> #3  0x00005555555cb8fe in cpu_exec_end (cpu=0x555555bb67e0) at /home/alex/lsrc/qemu/qemu.git/linux-user/main.c:222
>>>>> #4  0x00005555555cc7a7 in cpu_loop (env=0x555555bbea58) at /home/alex/lsrc/qemu/qemu.git/linux-user/main.c:749
>>>>> #5  0x00005555555db0b2 in clone_func (arg=0x7fffffffc9c0) at /home/alex/lsrc/qemu/qemu.git/linux-user/syscall.c:5424
>>>>> #6  0x00007ffff6bed6fa in start_thread (arg=0x7ffefeca7700) at pthread_create.c:333
>>>>> #7  0x00007ffff6923b5d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
>>>>>
>>>>> <a bunch of other threads doing the same and then...>
>>>>>
>>>>> Thread 3 (Thread 0x7ffff7f38700 (LWP 2904)):
>>>>> #0  0x00005555555faf5d in safe_syscall_base ()
>>>>> #1  0x00005555555cfeaf in safe_futex (uaddr=0x7ffff528a0a4, op=128, val=1, timeout=0x0, uaddr2=0x0, val3=-162668384) at /home/alex/lsrc/qemu/qemu.git/linux-user/syscall.c:706
>>>>> #2  0x00005555555dd7cc in do_futex (uaddr=4132298916, op=128, val=1, timeout=0, uaddr2=0, val3=-162668384) at /home/alex/lsrc/qemu/qemu.git/linux-user/syscall.c:6246
>>>>> #3  0x00005555555e8cdb in do_syscall (cpu_env=0x555555a81118, num=240, arg1=-162668380, arg2=128, arg3=1, arg4=0, arg5=0, arg6=-162668384, arg7=0, arg8=0) at /home/alex/lsrc/qemu/qemu.git/linux-user/syscall.c:10642
>>>>> #4  0x00005555555cd20e in cpu_loop (env=0x555555a81118) at /home/alex/lsrc/qemu/qemu.git/linux-user/main.c:883
>>>>> #5  0x00005555555db0b2 in clone_func (arg=0x7fffffffc9c0) at /home/alex/lsrc/qemu/qemu.git/linux-user/syscall.c:5424
>>>>> #6  0x00007ffff6bed6fa in start_thread (arg=0x7ffff7f38700) at pthread_create.c:333
>>>>> #7  0x00007ffff6923b5d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
>>>>>
>>>>> So everything is stalled waiting for this thread to wake up and drain
>>>>> its queue. So for linux-user I think we need some mechanism to kick
>>>>> these syscalls, which I assume means throwing a signal at the thread.
>>>> Nice catch! How did you get it?
>>> Running pigz (armhf, debian) to compress stuff.
>>>
>>>> We always go through cpu_exec_end() before serving a guest syscall
>>>> and always go through cpu_exec_start() before entering the guest
>>>> code execution loop. If we always schedule safe work on the current
>>>> thread's queue, then I think there's a way to make it safe and avoid
>>>> kicking syscalls.
>>> Not let the signals complete until safe work is done?
>> I'm thinking of waiting for completion of safe work in cpu_exec_start()
>> as well as in cpu_exec_end().
> I found a mistake in my code which causes deadlocks in my run of pigz.
> Could you also try running it after applying the following patch?
>
> diff --git a/linux-user/main.c b/linux-user/main.c
> index 6da3bb32186b..1dca55145c56 100644
> --- a/linux-user/main.c
> +++ b/linux-user/main.c
> @@ -214,7 +214,7 @@ static inline void cpu_exec_end(CPUState *cpu)
>      cpu->running = false;
>      tcg_pending_cpus--;
>      if (!tcg_pending_cpus) {
> -        pthread_cond_broadcast(&exclusive_cond);
> +        signal_cpu_work();
>      }
>      exclusive_idle();
>      flush_queued_work(cpu);
>
Or better this one:

diff --git a/linux-user/main.c b/linux-user/main.c
index 6da3bb32186b..aba550b69611 100644
--- a/linux-user/main.c
+++ b/linux-user/main.c
@@ -111,7 +111,6 @@ static pthread_mutex_t cpu_list_mutex = PTHREAD_MUTEX_INITIALIZER;
 static pthread_mutex_t exclusive_lock = PTHREAD_MUTEX_INITIALIZER;
 static pthread_cond_t exclusive_cond = PTHREAD_COND_INITIALIZER;
 static pthread_cond_t exclusive_resume = PTHREAD_COND_INITIALIZER;
-static pthread_cond_t work_cond = PTHREAD_COND_INITIALIZER;
 static bool exclusive_pending;
 
 /* Make sure everything is in a consistent state for calling fork(). */
@@ -140,7 +139,6 @@ void fork_end(int child)
         pthread_mutex_init(&cpu_list_mutex, NULL);
         pthread_cond_init(&exclusive_cond, NULL);
         pthread_cond_init(&exclusive_resume, NULL);
-        pthread_cond_init(&work_cond, NULL);
         qemu_mutex_init(&tcg_ctx.tb_ctx.tb_lock);
         gdbserver_fork(thread_cpu);
     } else {
@@ -151,12 +149,12 @@ void fork_end(int child)
 
 void wait_cpu_work(void)
 {
-    pthread_cond_wait(&work_cond, &exclusive_lock);
+    pthread_cond_wait(&exclusive_cond, &exclusive_lock);
 }
 
 void signal_cpu_work(void)
 {
-    pthread_cond_broadcast(&work_cond);
+    pthread_cond_broadcast(&exclusive_cond);
 }
 
 /* Wait for pending exclusive operations to complete. The exclusive lock
@@ -214,7 +212,7 @@ static inline void cpu_exec_end(CPUState *cpu)
     cpu->running = false;
     tcg_pending_cpus--;
     if (!tcg_pending_cpus) {
-        pthread_cond_broadcast(&exclusive_cond);
+        signal_cpu_work();
     }
     exclusive_idle();
     flush_queued_work(cpu);
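By the way, waiting for safe work completion in cpu_exec_start() as well,
as discussed above, could look roughly like the sketch below. This is
only an illustration of the intent, not a tested patch: the current shape
of cpu_exec_start() and the reuse of flush_queued_work() and
wait_safe_cpu_work() here are assumptions based on the backtrace and the
diffs above.

/* Sketch only: wait for pending safe work before entering guest code,
 * mirroring what cpu_exec_end() already does on exit. */
static inline void cpu_exec_start(CPUState *cpu)
{
    pthread_mutex_lock(&exclusive_lock);
    /* Don't start while an exclusive section is pending. */
    exclusive_idle();
    /* Run work queued for this CPU, then wait until all safe work has
     * drained, so that no thread enters guest code while safe work is
     * still outstanding.  wait_safe_cpu_work() sleeps on the same
     * condvar under exclusive_lock, so the broadcast from the last CPU
     * leaving cpu_exec_end() wakes these waiters too. */
    flush_queued_work(cpu);
    wait_safe_cpu_work();
    cpu->running = true;
    tcg_pending_cpus++;
    pthread_mutex_unlock(&exclusive_lock);
}

The point would be that a thread can then never enter guest code while
safe work is still pending, so only threads stuck in blocking syscalls
remain to worry about.

Thanks,
Sergey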