Re: [dm-devel] [PATCH v3 1/7] libmultipath: fix tur checker locking

2018-02-13 Thread Benjamin Marzinski
On Tue, Feb 13, 2018 at 11:20:39AM +0100, Hannes Reinecke wrote:
> On 02/13/2018 04:42 AM, Benjamin Marzinski wrote:

> I would rather set 'ct->running' from within the thread; otherwise there
> is a chance that pthread_create() returns without error, but the thread
> itself hasn't been run yet.

That worry is why I put ct->running where I did. If pthread_create
returns without an error, but ct->running isn't set yet because the
thread sets it, and it hasn't run yet, then it can appear to the checker
as if the thread has already completed.  This can cause it to return the
results of the thread before it actually has completed, and potentially
even start up a new thread, which would really mess with the value of
ct->running.

On the other hand, if you set running before starting the thread, the
checker won't do anything until the it hits the timeout, at which point
it will cancel the thread.  We are guaranteed that if pthread_create
returns successfully, we have the correct Thread ID to cancel the thread
with, so there's no problem with cancelling a thread that hasn't run.

Am I missing something here?

-Ben

> 
> Cheers,
> 
> Hannes
> -- 
> Dr. Hannes Reinecke  Teamlead Storage & Networking
> h...@suse.de +49 911 74053 688
> SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
> GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
> HRB 21284 (AG Nürnberg)
> 
> --
> dm-devel mailing list
> dm-devel@redhat.com
> https://www.redhat.com/mailman/listinfo/dm-devel

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel


Re: [dm-devel] [PATCH v3 1/7] libmultipath: fix tur checker locking

2018-02-13 Thread Hannes Reinecke
On 02/13/2018 04:42 AM, Benjamin Marzinski wrote:
> Commit 6e2423fd fixed a bug where the tur checker could cancel a
> detached thread after it had exitted. However in fixing this, the new
> code grabbed a mutex (to call condlog) while holding a spin_lock.  To
> deal with this, I've done away with the holder spin_lock completely, and
> replaced it with two atomic variables, based on a suggestion by Martin
> Wilck.
> 
> The holder variable works exactly like before.  When the checker is
> initialized, it is set to 1. When a thread is created it is incremented.
> When either the thread or the checker are done with the context, they
> atomically decrement the holder variable and check its value. If it
> is 0, they free the context. If it is 1, they never touch the context
> again.
> 
> The other variable has changed. First, ct->running and ct->thread have
> switched uses. ct->thread is now only ever accessed by the checker,
> never the thread.  If it is non-NULL, a thread has started up, but
> hasn't been dealt with by the checker yet. It is also obviously used
> by the checker to cancel the thread.
> 
> ct->running is now an atomic variable.  When the thread is started
> it is set to 1. When the checker wants to kill a thread, it atomically
> sets the value to 0 and reads the previous value.  If it was 1,
> the checker cancels the thread. If it was 0, the nothing needs to be
> done.  After the checker has dealt with the thread, it sets ct->thread
> to NULL.
> 
> Right before the thread finishes and pops the cleanup handler, it
> atomically sets the value of ct->running to 0 and reads the previous
> value. If it was 1, the thread just pops the cleanup handler and exits.
> If it was 0, then the checker is trying to cancel the thread, and so the
> thread calls pause(), which is a cancellation point.
> 
> Cc: Martin Wilck 
> Cc: Bart Van Assche 
> Signed-off-by: Benjamin Marzinski 
> ---
>  libmultipath/checkers/tur.c | 107 
> ++--
>  1 file changed, 43 insertions(+), 64 deletions(-)
> 
[ .. ]
> @@ -391,19 +368,17 @@ int libcheck_check(struct checker * c)
>   ct->state = PATH_UNCHECKED;
>   ct->fd = c->fd;
>   ct->timeout = c->timeout;
> - pthread_spin_lock(>hldr_lock);
> - ct->holders++;
> - pthread_spin_unlock(>hldr_lock);
> + uatomic_add(>holders, 1);
> + uatomic_set(>running, 1);
>   tur_set_async_timeout(c);
>   setup_thread_attr(, 32 * 1024, 1);
>   r = pthread_create(>thread, , tur_thread, ct);
>   pthread_attr_destroy();
>   if (r) {
> - pthread_spin_lock(>hldr_lock);
> - ct->holders--;
> - pthread_spin_unlock(>hldr_lock);
> - pthread_mutex_unlock(>lock);
> + uatomic_sub(>holders, 1);
> + uatomic_set(>running, 0);
>   ct->thread = 0;
> + pthread_mutex_unlock(>lock);
>   condlog(3, "%s: failed to start tur thread, using"
>   " sync mode", tur_devt(devt, sizeof(devt), ct));
>   return tur_check(c->fd, c->timeout,
I would rather set 'ct->running' from within the thread; otherwise there
is a chance that pthread_create() returns without error, but the thread
itself hasn't been run yet.

Cheers,

Hannes
-- 
Dr. Hannes ReineckeTeamlead Storage & Networking
h...@suse.de   +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel

Re: [dm-devel] [PATCH v3 1/7] libmultipath: fix tur checker locking

2018-02-13 Thread Martin Wilck
On Mon, 2018-02-12 at 21:42 -0600, Benjamin Marzinski wrote:
> Commit 6e2423fd fixed a bug where the tur checker could cancel a
> detached thread after it had exitted. However in fixing this, the new
> code grabbed a mutex (to call condlog) while holding a spin_lock.  To
> deal with this, I've done away with the holder spin_lock completely,
> and
> replaced it with two atomic variables, based on a suggestion by
> Martin
> Wilck.
> 
> The holder variable works exactly like before.  When the checker is
> initialized, it is set to 1. When a thread is created it is
> incremented.
> When either the thread or the checker are done with the context, they
> atomically decrement the holder variable and check its value. If it
> is 0, they free the context. If it is 1, they never touch the context
> again.
> 
> The other variable has changed. First, ct->running and ct->thread
> have
> switched uses. ct->thread is now only ever accessed by the checker,
> never the thread.  If it is non-NULL, a thread has started up, but
> hasn't been dealt with by the checker yet. It is also obviously used
> by the checker to cancel the thread.
> 
> ct->running is now an atomic variable.  When the thread is started
> it is set to 1. When the checker wants to kill a thread, it
> atomically
> sets the value to 0 and reads the previous value.  If it was 1,
> the checker cancels the thread. If it was 0, the nothing needs to be
> done.  After the checker has dealt with the thread, it sets ct-
> >thread
> to NULL.
> 
> Right before the thread finishes and pops the cleanup handler, it
> atomically sets the value of ct->running to 0 and reads the previous
> value. If it was 1, the thread just pops the cleanup handler and
> exits.
> If it was 0, then the checker is trying to cancel the thread, and so
> the
> thread calls pause(), which is a cancellation point.
> 
> Cc: Martin Wilck 
> Cc: Bart Van Assche 
> Signed-off-by: Benjamin Marzinski 
> ---

Reviewed-by: Martin Wilck 

Side note: a little voice in my head keeps whispering that it should be
possible to solve this problem with just one atomic variable instead of
two ("holders" and "running"). But I'm fine with the current approach
for now.

-- 
Dr. Martin Wilck , Tel. +49 (0)911 74053 2107
SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel

[dm-devel] [PATCH v3 1/7] libmultipath: fix tur checker locking

2018-02-12 Thread Benjamin Marzinski
Commit 6e2423fd fixed a bug where the tur checker could cancel a
detached thread after it had exitted. However in fixing this, the new
code grabbed a mutex (to call condlog) while holding a spin_lock.  To
deal with this, I've done away with the holder spin_lock completely, and
replaced it with two atomic variables, based on a suggestion by Martin
Wilck.

The holder variable works exactly like before.  When the checker is
initialized, it is set to 1. When a thread is created it is incremented.
When either the thread or the checker are done with the context, they
atomically decrement the holder variable and check its value. If it
is 0, they free the context. If it is 1, they never touch the context
again.

The other variable has changed. First, ct->running and ct->thread have
switched uses. ct->thread is now only ever accessed by the checker,
never the thread.  If it is non-NULL, a thread has started up, but
hasn't been dealt with by the checker yet. It is also obviously used
by the checker to cancel the thread.

ct->running is now an atomic variable.  When the thread is started
it is set to 1. When the checker wants to kill a thread, it atomically
sets the value to 0 and reads the previous value.  If it was 1,
the checker cancels the thread. If it was 0, the nothing needs to be
done.  After the checker has dealt with the thread, it sets ct->thread
to NULL.

Right before the thread finishes and pops the cleanup handler, it
atomically sets the value of ct->running to 0 and reads the previous
value. If it was 1, the thread just pops the cleanup handler and exits.
If it was 0, then the checker is trying to cancel the thread, and so the
thread calls pause(), which is a cancellation point.

Cc: Martin Wilck 
Cc: Bart Van Assche 
Signed-off-by: Benjamin Marzinski 
---
 libmultipath/checkers/tur.c | 107 ++--
 1 file changed, 43 insertions(+), 64 deletions(-)

diff --git a/libmultipath/checkers/tur.c b/libmultipath/checkers/tur.c
index b4a5cb2..9155960 100644
--- a/libmultipath/checkers/tur.c
+++ b/libmultipath/checkers/tur.c
@@ -15,6 +15,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "checkers.h"
 
@@ -44,7 +45,6 @@ struct tur_checker_context {
pthread_t thread;
pthread_mutex_t lock;
pthread_cond_t active;
-   pthread_spinlock_t hldr_lock;
int holders;
char message[CHECKER_MSG_LEN];
 };
@@ -74,13 +74,12 @@ int libcheck_init (struct checker * c)
 
ct->state = PATH_UNCHECKED;
ct->fd = -1;
-   ct->holders = 1;
+   uatomic_set(>holders, 1);
pthread_cond_init_mono(>active);
pthread_mutexattr_init();
pthread_mutexattr_settype(, PTHREAD_MUTEX_RECURSIVE);
pthread_mutex_init(>lock, );
pthread_mutexattr_destroy();
-   pthread_spin_init(>hldr_lock, PTHREAD_PROCESS_PRIVATE);
c->context = ct;
 
return 0;
@@ -90,7 +89,6 @@ static void cleanup_context(struct tur_checker_context *ct)
 {
pthread_mutex_destroy(>lock);
pthread_cond_destroy(>active);
-   pthread_spin_destroy(>hldr_lock);
free(ct);
 }
 
@@ -99,16 +97,14 @@ void libcheck_free (struct checker * c)
if (c->context) {
struct tur_checker_context *ct = c->context;
int holders;
-   pthread_t thread;
-
-   pthread_spin_lock(>hldr_lock);
-   ct->holders--;
-   holders = ct->holders;
-   thread = ct->thread;
-   pthread_spin_unlock(>hldr_lock);
-   if (holders)
-   pthread_cancel(thread);
-   else
+   int running;
+
+   running = uatomic_xchg(>running, 0);
+   if (running)
+   pthread_cancel(ct->thread);
+   ct->thread = 0;
+   holders = uatomic_sub_return(>holders, 1);
+   if (!holders)
cleanup_context(ct);
c->context = NULL;
}
@@ -220,26 +216,12 @@ static void cleanup_func(void *data)
 {
int holders;
struct tur_checker_context *ct = data;
-   pthread_spin_lock(>hldr_lock);
-   ct->holders--;
-   holders = ct->holders;
-   ct->thread = 0;
-   pthread_spin_unlock(>hldr_lock);
+
+   holders = uatomic_sub_return(>holders, 1);
if (!holders)
cleanup_context(ct);
 }
 
-static int tur_running(struct tur_checker_context *ct)
-{
-   pthread_t thread;
-
-   pthread_spin_lock(>hldr_lock);
-   thread = ct->thread;
-   pthread_spin_unlock(>hldr_lock);
-
-   return thread != 0;
-}
-
 static void copy_msg_to_tcc(void *ct_p, const char *msg)
 {
struct tur_checker_context *ct = ct_p;
@@ -252,7 +234,7 @@ static void copy_msg_to_tcc(void *ct_p, const char *msg)
 static void *tur_thread(void *ctx)
 {
struct tur_checker_context *ct = ctx;
-