Re: [dm-devel] [PATCH v3 1/7] libmultipath: fix tur checker locking
On Tue, Feb 13, 2018 at 11:20:39AM +0100, Hannes Reinecke wrote: > On 02/13/2018 04:42 AM, Benjamin Marzinski wrote: > I would rather set 'ct->running' from within the thread; otherwise there > is a chance that pthread_create() returns without error, but the thread > itself hasn't been run yet. That worry is why I put ct->running where I did. If pthread_create returns without an error, but ct->running isn't set yet because the thread sets it, and it hasn't run yet, then it can appear to the checker as if the thread has already completed. This can cause it to return the results of the thread before it actually has completed, and potentially even start up a new thread, which would really mess with the value of ct->running. On the other hand, if you set running before starting the thread, the checker won't do anything until the it hits the timeout, at which point it will cancel the thread. We are guaranteed that if pthread_create returns successfully, we have the correct Thread ID to cancel the thread with, so there's no problem with cancelling a thread that hasn't run. Am I missing something here? -Ben > > Cheers, > > Hannes > -- > Dr. Hannes Reinecke Teamlead Storage & Networking > h...@suse.de +49 911 74053 688 > SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg > GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton > HRB 21284 (AG Nürnberg) > > -- > dm-devel mailing list > dm-devel@redhat.com > https://www.redhat.com/mailman/listinfo/dm-devel -- dm-devel mailing list dm-devel@redhat.com https://www.redhat.com/mailman/listinfo/dm-devel
Re: [dm-devel] [PATCH v3 1/7] libmultipath: fix tur checker locking
On 02/13/2018 04:42 AM, Benjamin Marzinski wrote: > Commit 6e2423fd fixed a bug where the tur checker could cancel a > detached thread after it had exitted. However in fixing this, the new > code grabbed a mutex (to call condlog) while holding a spin_lock. To > deal with this, I've done away with the holder spin_lock completely, and > replaced it with two atomic variables, based on a suggestion by Martin > Wilck. > > The holder variable works exactly like before. When the checker is > initialized, it is set to 1. When a thread is created it is incremented. > When either the thread or the checker are done with the context, they > atomically decrement the holder variable and check its value. If it > is 0, they free the context. If it is 1, they never touch the context > again. > > The other variable has changed. First, ct->running and ct->thread have > switched uses. ct->thread is now only ever accessed by the checker, > never the thread. If it is non-NULL, a thread has started up, but > hasn't been dealt with by the checker yet. It is also obviously used > by the checker to cancel the thread. > > ct->running is now an atomic variable. When the thread is started > it is set to 1. When the checker wants to kill a thread, it atomically > sets the value to 0 and reads the previous value. If it was 1, > the checker cancels the thread. If it was 0, the nothing needs to be > done. After the checker has dealt with the thread, it sets ct->thread > to NULL. > > Right before the thread finishes and pops the cleanup handler, it > atomically sets the value of ct->running to 0 and reads the previous > value. If it was 1, the thread just pops the cleanup handler and exits. > If it was 0, then the checker is trying to cancel the thread, and so the > thread calls pause(), which is a cancellation point. > > Cc: Martin Wilck> Cc: Bart Van Assche > Signed-off-by: Benjamin Marzinski > --- > libmultipath/checkers/tur.c | 107 > ++-- > 1 file changed, 43 insertions(+), 64 deletions(-) > [ .. ] > @@ -391,19 +368,17 @@ int libcheck_check(struct checker * c) > ct->state = PATH_UNCHECKED; > ct->fd = c->fd; > ct->timeout = c->timeout; > - pthread_spin_lock(>hldr_lock); > - ct->holders++; > - pthread_spin_unlock(>hldr_lock); > + uatomic_add(>holders, 1); > + uatomic_set(>running, 1); > tur_set_async_timeout(c); > setup_thread_attr(, 32 * 1024, 1); > r = pthread_create(>thread, , tur_thread, ct); > pthread_attr_destroy(); > if (r) { > - pthread_spin_lock(>hldr_lock); > - ct->holders--; > - pthread_spin_unlock(>hldr_lock); > - pthread_mutex_unlock(>lock); > + uatomic_sub(>holders, 1); > + uatomic_set(>running, 0); > ct->thread = 0; > + pthread_mutex_unlock(>lock); > condlog(3, "%s: failed to start tur thread, using" > " sync mode", tur_devt(devt, sizeof(devt), ct)); > return tur_check(c->fd, c->timeout, I would rather set 'ct->running' from within the thread; otherwise there is a chance that pthread_create() returns without error, but the thread itself hasn't been run yet. Cheers, Hannes -- Dr. Hannes ReineckeTeamlead Storage & Networking h...@suse.de +49 911 74053 688 SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton HRB 21284 (AG Nürnberg) -- dm-devel mailing list dm-devel@redhat.com https://www.redhat.com/mailman/listinfo/dm-devel
Re: [dm-devel] [PATCH v3 1/7] libmultipath: fix tur checker locking
On Mon, 2018-02-12 at 21:42 -0600, Benjamin Marzinski wrote: > Commit 6e2423fd fixed a bug where the tur checker could cancel a > detached thread after it had exitted. However in fixing this, the new > code grabbed a mutex (to call condlog) while holding a spin_lock. To > deal with this, I've done away with the holder spin_lock completely, > and > replaced it with two atomic variables, based on a suggestion by > Martin > Wilck. > > The holder variable works exactly like before. When the checker is > initialized, it is set to 1. When a thread is created it is > incremented. > When either the thread or the checker are done with the context, they > atomically decrement the holder variable and check its value. If it > is 0, they free the context. If it is 1, they never touch the context > again. > > The other variable has changed. First, ct->running and ct->thread > have > switched uses. ct->thread is now only ever accessed by the checker, > never the thread. If it is non-NULL, a thread has started up, but > hasn't been dealt with by the checker yet. It is also obviously used > by the checker to cancel the thread. > > ct->running is now an atomic variable. When the thread is started > it is set to 1. When the checker wants to kill a thread, it > atomically > sets the value to 0 and reads the previous value. If it was 1, > the checker cancels the thread. If it was 0, the nothing needs to be > done. After the checker has dealt with the thread, it sets ct- > >thread > to NULL. > > Right before the thread finishes and pops the cleanup handler, it > atomically sets the value of ct->running to 0 and reads the previous > value. If it was 1, the thread just pops the cleanup handler and > exits. > If it was 0, then the checker is trying to cancel the thread, and so > the > thread calls pause(), which is a cancellation point. > > Cc: Martin Wilck> Cc: Bart Van Assche > Signed-off-by: Benjamin Marzinski > --- Reviewed-by: Martin Wilck Side note: a little voice in my head keeps whispering that it should be possible to solve this problem with just one atomic variable instead of two ("holders" and "running"). But I'm fine with the current approach for now. -- Dr. Martin Wilck , Tel. +49 (0)911 74053 2107 SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton HRB 21284 (AG Nürnberg) -- dm-devel mailing list dm-devel@redhat.com https://www.redhat.com/mailman/listinfo/dm-devel
[dm-devel] [PATCH v3 1/7] libmultipath: fix tur checker locking
Commit 6e2423fd fixed a bug where the tur checker could cancel a detached thread after it had exitted. However in fixing this, the new code grabbed a mutex (to call condlog) while holding a spin_lock. To deal with this, I've done away with the holder spin_lock completely, and replaced it with two atomic variables, based on a suggestion by Martin Wilck. The holder variable works exactly like before. When the checker is initialized, it is set to 1. When a thread is created it is incremented. When either the thread or the checker are done with the context, they atomically decrement the holder variable and check its value. If it is 0, they free the context. If it is 1, they never touch the context again. The other variable has changed. First, ct->running and ct->thread have switched uses. ct->thread is now only ever accessed by the checker, never the thread. If it is non-NULL, a thread has started up, but hasn't been dealt with by the checker yet. It is also obviously used by the checker to cancel the thread. ct->running is now an atomic variable. When the thread is started it is set to 1. When the checker wants to kill a thread, it atomically sets the value to 0 and reads the previous value. If it was 1, the checker cancels the thread. If it was 0, the nothing needs to be done. After the checker has dealt with the thread, it sets ct->thread to NULL. Right before the thread finishes and pops the cleanup handler, it atomically sets the value of ct->running to 0 and reads the previous value. If it was 1, the thread just pops the cleanup handler and exits. If it was 0, then the checker is trying to cancel the thread, and so the thread calls pause(), which is a cancellation point. Cc: Martin WilckCc: Bart Van Assche Signed-off-by: Benjamin Marzinski --- libmultipath/checkers/tur.c | 107 ++-- 1 file changed, 43 insertions(+), 64 deletions(-) diff --git a/libmultipath/checkers/tur.c b/libmultipath/checkers/tur.c index b4a5cb2..9155960 100644 --- a/libmultipath/checkers/tur.c +++ b/libmultipath/checkers/tur.c @@ -15,6 +15,7 @@ #include #include #include +#include #include "checkers.h" @@ -44,7 +45,6 @@ struct tur_checker_context { pthread_t thread; pthread_mutex_t lock; pthread_cond_t active; - pthread_spinlock_t hldr_lock; int holders; char message[CHECKER_MSG_LEN]; }; @@ -74,13 +74,12 @@ int libcheck_init (struct checker * c) ct->state = PATH_UNCHECKED; ct->fd = -1; - ct->holders = 1; + uatomic_set(>holders, 1); pthread_cond_init_mono(>active); pthread_mutexattr_init(); pthread_mutexattr_settype(, PTHREAD_MUTEX_RECURSIVE); pthread_mutex_init(>lock, ); pthread_mutexattr_destroy(); - pthread_spin_init(>hldr_lock, PTHREAD_PROCESS_PRIVATE); c->context = ct; return 0; @@ -90,7 +89,6 @@ static void cleanup_context(struct tur_checker_context *ct) { pthread_mutex_destroy(>lock); pthread_cond_destroy(>active); - pthread_spin_destroy(>hldr_lock); free(ct); } @@ -99,16 +97,14 @@ void libcheck_free (struct checker * c) if (c->context) { struct tur_checker_context *ct = c->context; int holders; - pthread_t thread; - - pthread_spin_lock(>hldr_lock); - ct->holders--; - holders = ct->holders; - thread = ct->thread; - pthread_spin_unlock(>hldr_lock); - if (holders) - pthread_cancel(thread); - else + int running; + + running = uatomic_xchg(>running, 0); + if (running) + pthread_cancel(ct->thread); + ct->thread = 0; + holders = uatomic_sub_return(>holders, 1); + if (!holders) cleanup_context(ct); c->context = NULL; } @@ -220,26 +216,12 @@ static void cleanup_func(void *data) { int holders; struct tur_checker_context *ct = data; - pthread_spin_lock(>hldr_lock); - ct->holders--; - holders = ct->holders; - ct->thread = 0; - pthread_spin_unlock(>hldr_lock); + + holders = uatomic_sub_return(>holders, 1); if (!holders) cleanup_context(ct); } -static int tur_running(struct tur_checker_context *ct) -{ - pthread_t thread; - - pthread_spin_lock(>hldr_lock); - thread = ct->thread; - pthread_spin_unlock(>hldr_lock); - - return thread != 0; -} - static void copy_msg_to_tcc(void *ct_p, const char *msg) { struct tur_checker_context *ct = ct_p; @@ -252,7 +234,7 @@ static void copy_msg_to_tcc(void *ct_p, const char *msg) static void *tur_thread(void *ctx) { struct tur_checker_context *ct = ctx; -