Re: [dm-devel] [PATCH 6/9] multipathd: drop lock before calling uev_add_path

2017-04-07 Thread Alban Browaeys
Le vendredi 07 avril 2017 à 01:16 -0500, Benjamin Marzinski a écrit :
> commit c6a18f4541d0a161e2f5fed8c67d9732bf512b37 made uev_update_path
> call uev_add_path while holding the vecs lock, which is deadlocks,
> since
> uev_add_path grabs the vecs lock itself.
> 
> Cc: Alban Browaeys <pra...@yahoo.com>
> Signed-off-by: Benjamin Marzinski <bmarz...@redhat.com>
> ---
>  multipathd/main.c | 6 --
>  1 file changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/multipathd/main.c b/multipathd/main.c
> index f671d58..f4ff13e 100644
> --- a/multipathd/main.c
> +++ b/multipathd/main.c
> @@ -974,6 +974,7 @@ uev_update_path (struct uevent *uev, struct
> vectors * vecs)
>   struct path * pp;
>   struct config *conf;
>   int disable_changed_wwids;
> + int needs_reinit = 0;
>  
>   conf = get_multipath_config();
>   disable_changed_wwids = conf->disable_changed_wwids;
> @@ -1009,7 +1010,7 @@ uev_update_path (struct uevent *uev, struct
> vectors * vecs)
>   }
>  
>   if (pp->initialized == INIT_REQUESTED_UDEV)
> - retval = uev_add_path(uev, vecs, 1);
> + needs_reinit = 1;
>   else if (mpp && ro >= 0) {
>   condlog(2, "%s: update path write_protect to
> '%d' (uevent)", uev->kernel, ro);
>  
> @@ -1041,7 +1042,8 @@ out:
>  
>   condlog(0, "%s: spurious uevent, path not found",
> uev->kernel);
>   }
> -
> + if (needs_reinit)
> + retval = uev_add_path(uev, vecs, 1);
>   return retval;
>  }


Thanks a lot. 
The patch fixes the hang in multipathd.
I made a minor modification for the test as I test above multipath-
tools 0.6.4. 
I stripped the need_do_map parameter from uev_add_path.

Thanks
Alban

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel

[dm-devel] multipathd: locks itself in udev trigger

2017-04-06 Thread Alban Browaeys
Bcache backing partition bcache0 triggers an udev add event that is handled by 
multipathd.
 Somewhat the other "bare" paritions sda do not.

The issue is when this event triggers the thread lock itself since commit
 c6a18f4541d0a161e2f5fed8c67d9732bf512b37 "fix INIT_REQUESTED_UDEV code" .
This change in "uev_update_path" moved "uev_add_path(uev, vecs);" under the 
fast lock (non recursive)
"lock(>lock);".  As uev_add_path too calls "lock(>lock);" 
multipathd hangs in this second call
in the same thread.

Then "multipathd list paths" or other multipathd commands returns timeout.
This also postpone systemd shutdown/reboot by a minute while it waits for 
multipathd service to stop.
 
The backtrace was:
(gdb) t a a bt

Thread 6 (Thread 0x7f922663c700 (LWP 545)):
#0  syscall () at ../sysdeps/unix/sysv/linux/x86_64/syscall.S:38
#1  0x7f9228bd3602 in ?? () from /usr/lib/x86_64-linux-gnu/liburcu.so.4
#2  0x7f92289ba424 in start_thread (arg=0x7f922663c700) at 
pthread_create.c:333
#3  0x7f922825c9bf in clone () at 
../sysdeps/unix/sysv/linux/x86_64/clone.S:105

Thread 5 (Thread 0x7f9229734700 (LWP 543)):
#0  __lll_lock_wait () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
#1  0x7f92289bcb85 in __GI___pthread_mutex_lock (mutex=0x556fefc43080) at 
../nptl/pthread_mutex_lock.c:80
#2  0x556fedcbe42d in lock (a=0x556fefc43080) at ../libmultipath/lock.h:12
#3  uev_add_path (vecs=0x556fefc43080, uev=, uev=) at main.c:627
#4  0x556fedcbe9c9 in uev_update_path (uev=0x7f9220001510, 
vecs=0x556fefc43080) at main.c:998
#5  0x556fedcbecdb in uev_trigger (uev=0x7f9220001510, 
trigger_data=0x556fefc43080) at main.c:1146
#6  0x7f92292091b2 in service_uevq (tmpq=tmpq@entry=0x7f9229733b10) at 
uevent.c:89
#7  0x7f9229209280 in uevent_dispatch (uev_trigger=, 
trigger_data=) at uevent.c:145
#8  0x556fedcbc2cc in uevqloop (ap=0x556fefc43080) at main.c:1177
#9  0x7f92289ba424 in start_thread (arg=0x7f9229734700) at 
pthread_create.c:333
#10 0x7f922825c9bf in clone () at 
../sysdeps/unix/sysv/linux/x86_64/clone.S:105

Thread 4 (Thread 0x7f9229745700 (LWP 542)):
#0  __lll_lock_wait () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
#1  0x7f92289bcb85 in __GI___pthread_mutex_lock (mutex=0x556fefc43080) at 
../nptl/pthread_mutex_lock.c:80
#2  0x556fedcbfb45 in lock (a=0x556fefc43080) at ../libmultipath/lock.h:12
#3  checkerloop (ap=0x556fefc43080) at main.c:1827
#4  0x7f92289ba424 in start_thread (arg=0x7f9229745700) at 
pthread_create.c:333
#5  0x7f922825c9bf in clone () at 
../sysdeps/unix/sysv/linux/x86_64/clone.S:105

Thread 3 (Thread 0x7f9229810700 (LWP 541)):
#0  0x7f9228253611 in __GI_ppoll (fds=0x7f92180021e0, nfds=nfds@entry=1, 
timeout=, timeout@entry=0x556fedecc020 , 
sigmask=sigmask@entry=0x7f922980fa60) at ../sysdeps/unix/sysv/linux/ppoll.c:39
#1  0x556fedcc13ba in ppoll (__ss=0x7f922980fa60, __timeout=0x556fedecc020 
, __nfds=1, __fds=) at 
/usr/include/x86_64-linux-gnu/bits/poll2.h:77
#2  uxsock_listen (uxsock_trigger=0x556fedcbb520 , 
trigger_data=0x556fefc43080) at uxlsnr.c:204
#3  0x556fedcbbd5a in uxlsnrloop (ap=0x556fefc43080) at main.c:1239
#4  0x7f92289ba424 in start_thread (arg=0x7f9229810700) at 
pthread_create.c:333
#5  0x7f922825c9bf in clone () at 
../sysdeps/unix/sysv/linux/x86_64/clone.S:105

Thread 2 (Thread 0x7f9229851700 (LWP 540)):
#0  0x7f922825354d in poll () at ../sysdeps/unix/syscall-template.S:84
#1  0x7f9229209f3a in poll (__timeout=, __nfds=1, 
__fds=0x7f9229850a88) at /usr/include/x86_64-linux-gnu/bits/poll2.h:46
#2  uevent_listen (udev=0x556fefbec040) at uevent.c:515
#3  0x556fedcbc235 in ueventloop (ap=0x556fefbec040) at main.c:1166
#4  0x7f92289ba424 in start_thread (arg=0x7f9229851700) at 
pthread_create.c:333
#5  0x7f922825c9bf in clone () at 
../sysdeps/unix/sysv/linux/x86_64/clone.S:105

Thread 1 (Thread 0x7f9229746f00 (LWP 537)):
#0  pthread_cond_wait@@GLIBC_2.3.2 () at 
../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1  0x556fedcc0aba in child (param=) at main.c:2407
#2  0x556fedcbb0df in main (argc=, argv=0x7fff81f9a0d8) at 
main.c:2664

As a local workaround I moved "uev_add_path" in "uev_update_path" back out of 
the lock umbrella
 while I keep it under pp->initialized check.
https://bugs.debian.org/cgi-bin/bugreport.cgi?att=1;bug=859157;filename=fix_uev_update_path_udevadd_recursive_lock_deadlock.diff;msg=5
This change fixes the reboot delay but I have no multipath setup thus cannot 
detect any regressions.

-Alban

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel