On Thu, Apr 06, 2017 at 02:24:07PM +0200, Alban Browaeys wrote:
> Bcache backing partition bcache0 triggers an udev add event that is handled
> by multipathd.
> Somewhat the other "bare" paritions sda do not.
>
> The issue is when this event triggers the thread lock itself since commit
> c6a18f4541d0a161e2f5fed8c67d9732bf512b37 "fix INIT_REQUESTED_UDEV code" .
> This change in "uev_update_path" moved "uev_add_path(uev, vecs);" under the
> fast lock (non recursive)
> "lock(>lock);". As uev_add_path too calls "lock(>lock);"
> multipathd hangs in this second call
> in the same thread.
Oops. I'll post a fix for this shortly.
Thanks
-Ben
> Then "multipathd list paths" or other multipathd commands returns timeout.
> This also postpone systemd shutdown/reboot by a minute while it waits for
> multipathd service to stop.
>
> The backtrace was:
> (gdb) t a a bt
>
> Thread 6 (Thread 0x7f922663c700 (LWP 545)):
> #0 syscall () at ../sysdeps/unix/sysv/linux/x86_64/syscall.S:38
> #1 0x7f9228bd3602 in ?? () from /usr/lib/x86_64-linux-gnu/liburcu.so.4
> #2 0x7f92289ba424 in start_thread (arg=0x7f922663c700) at
> pthread_create.c:333
> #3 0x7f922825c9bf in clone () at
> ../sysdeps/unix/sysv/linux/x86_64/clone.S:105
>
> Thread 5 (Thread 0x7f9229734700 (LWP 543)):
> #0 __lll_lock_wait () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
> #1 0x7f92289bcb85 in __GI___pthread_mutex_lock (mutex=0x556fefc43080) at
> ../nptl/pthread_mutex_lock.c:80
> #2 0x556fedcbe42d in lock (a=0x556fefc43080) at ../libmultipath/lock.h:12
> #3 uev_add_path (vecs=0x556fefc43080, uev=, uev= out>) at main.c:627
> #4 0x556fedcbe9c9 in uev_update_path (uev=0x7f9220001510,
> vecs=0x556fefc43080) at main.c:998
> #5 0x556fedcbecdb in uev_trigger (uev=0x7f9220001510,
> trigger_data=0x556fefc43080) at main.c:1146
> #6 0x7f92292091b2 in service_uevq (tmpq=tmpq@entry=0x7f9229733b10) at
> uevent.c:89
> #7 0x7f9229209280 in uevent_dispatch (uev_trigger=,
> trigger_data=) at uevent.c:145
> #8 0x556fedcbc2cc in uevqloop (ap=0x556fefc43080) at main.c:1177
> #9 0x7f92289ba424 in start_thread (arg=0x7f9229734700) at
> pthread_create.c:333
> #10 0x7f922825c9bf in clone () at
> ../sysdeps/unix/sysv/linux/x86_64/clone.S:105
>
> Thread 4 (Thread 0x7f9229745700 (LWP 542)):
> #0 __lll_lock_wait () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
> #1 0x7f92289bcb85 in __GI___pthread_mutex_lock (mutex=0x556fefc43080) at
> ../nptl/pthread_mutex_lock.c:80
> #2 0x556fedcbfb45 in lock (a=0x556fefc43080) at ../libmultipath/lock.h:12
> #3 checkerloop (ap=0x556fefc43080) at main.c:1827
> #4 0x7f92289ba424 in start_thread (arg=0x7f9229745700) at
> pthread_create.c:333
> #5 0x7f922825c9bf in clone () at
> ../sysdeps/unix/sysv/linux/x86_64/clone.S:105
>
> Thread 3 (Thread 0x7f9229810700 (LWP 541)):
> #0 0x7f9228253611 in __GI_ppoll (fds=0x7f92180021e0, nfds=nfds@entry=1,
> timeout=, timeout@entry=0x556fedecc020 ,
> sigmask=sigmask@entry=0x7f922980fa60) at ../sysdeps/unix/sysv/linux/ppoll.c:39
> #1 0x556fedcc13ba in ppoll (__ss=0x7f922980fa60,
> __timeout=0x556fedecc020 , __nfds=1, __fds=) at
> /usr/include/x86_64-linux-gnu/bits/poll2.h:77
> #2 uxsock_listen (uxsock_trigger=0x556fedcbb520 ,
> trigger_data=0x556fefc43080) at uxlsnr.c:204
> #3 0x556fedcbbd5a in uxlsnrloop (ap=0x556fefc43080) at main.c:1239
> #4 0x7f92289ba424 in start_thread (arg=0x7f9229810700) at
> pthread_create.c:333
> #5 0x7f922825c9bf in clone () at
> ../sysdeps/unix/sysv/linux/x86_64/clone.S:105
>
> Thread 2 (Thread 0x7f9229851700 (LWP 540)):
> #0 0x7f922825354d in poll () at ../sysdeps/unix/syscall-template.S:84
> #1 0x7f9229209f3a in poll (__timeout=, __nfds=1,
> __fds=0x7f9229850a88) at /usr/include/x86_64-linux-gnu/bits/poll2.h:46
> #2 uevent_listen (udev=0x556fefbec040) at uevent.c:515
> #3 0x556fedcbc235 in ueventloop (ap=0x556fefbec040) at main.c:1166
> #4 0x7f92289ba424 in start_thread (arg=0x7f9229851700) at
> pthread_create.c:333
> #5 0x7f922825c9bf in clone () at
> ../sysdeps/unix/sysv/linux/x86_64/clone.S:105
>
> Thread 1 (Thread 0x7f9229746f00 (LWP 537)):
> #0 pthread_cond_wait@@GLIBC_2.3.2 () at
> ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
> #1 0x556fedcc0aba in child (param=) at main.c:2407
> #2 0x556fedcbb0df in main (argc=, argv=0x7fff81f9a0d8) at
> main.c:2664
>
> As a local workaround I moved "uev_add_path" in "uev_update_path" back out of
> the lock umbrella
> while I keep it under pp->initialized check.
> https://bugs.debian.org/cgi-bin/bugreport.cgi?att=1;bug=859157;filename=fix_uev_update_path_udevadd_recursive_lock_deadlock.diff;msg=5
> This change fixes the reboot delay but I have no multipath setup thus cannot
> detect any regressions.
>
> -Alban
--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel