On 02/22/13 18:33, Jan Kiszka wrote:
> On 2013-02-20 11:28, Stefan Hajnoczi wrote:
>> Convert iohandler_select_fill() and iohandler_select_poll() to use
>> GPollFD instead of rfds/wfds/xfds.
>
> Since this commit, I'm getting QEMU lock-ups, apparently slirp is
> involved (the Linux guest tries to start its network at this point):
>
> (gdb) thread apply all bt
>
> Thread 3 (Thread 0x7fffed0e3700 (LWP 26788)):
> #0  __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:136
> #1  0x00007ffff44c7294 in _L_lock_999 () from /lib64/libpthread.so.0
> #2  0x00007ffff44c70aa in __pthread_mutex_lock (mutex=0x5555560afcc0) at pthread_mutex_lock.c:61
> #3  0x00005555558945e9 in qemu_mutex_lock (mutex=<value optimized out>) at /data/qemu/util/qemu-thread-posix.c:57
> #4  0x00005555557d9da5 in kvm_cpu_exec (env=0x55555689c9f0) at /data/qemu/kvm-all.c:1564
> #5  0x0000555555780091 in qemu_kvm_cpu_thread_fn (arg=0x55555689c9f0) at /data/qemu/cpus.c:759
> #6  0x00007ffff44c4a3f in start_thread (arg=0x7fffed0e3700) at pthread_create.c:297
> #7  0x00007ffff2fb871d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
> #8  0x0000000000000000 in ?? ()
>
> Thread 2 (Thread 0x7fffed8e4700 (LWP 26787)):
> #0  sem_timedwait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/sem_timedwait.S:103
> #1  0x0000555555894a93 in qemu_sem_timedwait (sem=0x555556090020, ms=<value optimized out>) at /data/qemu/util/qemu-thread-posix.c:237
> #2  0x000055555575116e in worker_thread (unused=<value optimized out>) at /data/qemu/thread-pool.c:88
> #3  0x00007ffff44c4a3f in start_thread (arg=0x7fffed8e4700) at pthread_create.c:297
> #4  0x00007ffff2fb871d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
> #5  0x0000000000000000 in ?? ()
>
> Thread 1 (Thread 0x7ffff7fab760 (LWP 26784)):
> #0  0x00007ffff44cc763 in recvfrom () at ../sysdeps/unix/syscall-template.S:82
> #1  0x000055555574b67d in recvfrom (so=0x555556bd2f50) at /usr/include/bits/socket2.h:77
> #2  sorecvfrom (so=0x555556bd2f50) at /data/qemu/slirp/socket.c:498
> #3  0x000055555574a160 in slirp_pollfds_poll (pollfds=0x555556511240, select_error=0) at /data/qemu/slirp/slirp.c:619
> #4  0x000055555570ec99 in main_loop_wait (nonblocking=<value optimized out>) at /data/qemu/main-loop.c:514
> #5  0x000055555577821d in main_loop (argc=<value optimized out>, argv=<value optimized out>, envp=<value optimized out>) at /data/qemu/vl.c:2002
> #6  main (argc=<value optimized out>, argv=<value optimized out>, envp=<value optimized out>) at /data/qemu/vl.c:4334
>
> Thread 1 blocks in recvfrom, not returning, not releasing the global lock.
>
> Any idea?
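For context, patch 06/10 replaces the rfds/wfds/xfds fd_sets in the
iohandler code with a GPollFD array: a fill step appends one GPollFD per
registered handler, and a poll step reads back the revents and dispatches
the callbacks. The snippet below is only a simplified sketch of that
pattern, with assumed names and details (IOHandler, pollfds_idx, ...),
not the actual QEMU code:

/* Simplified sketch of the GPollFD-based fill/poll split; illustrative
 * only, names and fields are assumptions, not the actual QEMU iohandler
 * code. */
#include <glib.h>

typedef struct IOHandler {
    int fd;
    void (*fd_read)(void *opaque);
    void (*fd_write)(void *opaque);
    void *opaque;
    int pollfds_idx;            /* index into the shared pollfds array */
    struct IOHandler *next;
} IOHandler;

/* fill step: append one GPollFD per handler and remember its index */
static void iohandler_fill(GArray *pollfds, IOHandler *list)
{
    IOHandler *ioh;

    for (ioh = list; ioh; ioh = ioh->next) {
        GPollFD pfd = {
            .fd = ioh->fd,
            .events = (ioh->fd_read ? G_IO_IN | G_IO_HUP | G_IO_ERR : 0) |
                      (ioh->fd_write ? G_IO_OUT : 0),
        };
        ioh->pollfds_idx = pollfds->len;
        g_array_append_val(pollfds, pfd);
    }
}

/* poll step: look up revents via the stored index and dispatch */
static void iohandler_poll(GArray *pollfds, IOHandler *list, int ret)
{
    IOHandler *ioh;

    if (ret <= 0) {
        return;
    }
    for (ioh = list; ioh; ioh = ioh->next) {
        GPollFD *pfd = &g_array_index(pollfds, GPollFD, ioh->pollfds_idx);

        if ((pfd->revents & (G_IO_IN | G_IO_HUP | G_IO_ERR)) && ioh->fd_read) {
            ioh->fd_read(ioh->opaque);
        }
        if ((pfd->revents & G_IO_OUT) && ioh->fd_write) {
            ioh->fd_write(ioh->opaque);
        }
    }
}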
Well, I guess recvfrom() shouldn't block in slirp / thread 1, so maybe
slirp_pollfds_poll() finds readiness where there is none; in that case we
possibly shouldn't be calling sorecvfrom() at all. sorecvfrom() belongs to
the UDP branch of slirp_pollfds_poll().

Could this be related to the change we discussed in
<http://thread.gmane.org/gmane.comp.emulators.qemu/192801/focus=193181>?
I guess trace calls would be handy...

FWIW, I find it interesting that slirp doesn't hang after patch 05/10
(the slirp conversion) but only here: this patch (06/10) converts the
qemu iohandlers. It looks as if qemu_iohandler_poll(), which is called
just before slirp_pollfds_poll() in main_loop_wait(), "stole" data from
slirp. A mixup between file descriptors?
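Whatever the exact cause, the immediate hang is sorecvfrom() doing a
blocking recvfrom() on readiness that turned out to be spurious. Just to
illustrate that mechanism in isolation (a generic standalone example, not
slirp code and not a proposed fix): on a blocking socket a false
"readable" indication sleeps until data really arrives, whereas a
non-blocking read surfaces it as EAGAIN:

/* Standalone illustration: recvfrom() on a blocking UDP socket with no
 * pending data sleeps until a datagram arrives, which matches the
 * thread 1 backtrace above.  With MSG_DONTWAIT the same spurious
 * "readable" indication comes back as EAGAIN instead of a hang.
 * Generic example, not slirp's sorecvfrom(). */
#include <errno.h>
#include <stdio.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <unistd.h>

int main(void)
{
    int s = socket(AF_INET, SOCK_DGRAM, 0);
    struct sockaddr_in from;
    socklen_t fromlen = sizeof(from);
    char buf[1500];
    ssize_t n;

    if (s < 0) {
        perror("socket");
        return 1;
    }

    /* Pretend poll() claimed the socket is readable although it is not:
     * without MSG_DONTWAIT this call would block indefinitely. */
    n = recvfrom(s, buf, sizeof(buf), MSG_DONTWAIT,
                 (struct sockaddr *)&from, &fromlen);
    if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK)) {
        printf("no data after all; a blocking recvfrom() would hang here\n");
    }

    close(s);
    return 0;
}

Laszlo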