Re: [dm-devel] [PATCH v2 1/7] libmultipath: fix tur checker locking
On Sat, 2018-02-10 at 17:11 +0100, Martin Wilck wrote:
> On Fri, 2018-02-09 at 18:36 -0600, Benjamin Marzinski wrote:
> > On Sat, Feb 10, 2018 at 12:36:05AM +0100, Martin Wilck wrote:
> > > On Sat, 2018-02-10 at 00:28 +0100, Martin Wilck wrote:
> > > > Maybe it's easier than we thought. Attached is a patch on top of
> > > > yours that I think might work, please have a look.
> > >
> > > That one didn't even compile. This one is better.
> > >
> > > Martin
> >
> > How about this one instead. The idea is that once we are in the cleanup
> > handler, we just cleanup and exit. But before we enter it, we atomically
> > exchange running, and if running was 0, we pause(), since the checker is
> > either about to cancel us, or already has.
>
> Yes, that should work. Nice.

... but I just realized that we don't rcu_register_thread() the TUR
thread. Maybe we should if we use RCU primitives?

Martin

--
Dr. Martin Wilck, Tel. +49 (0)911 74053 2107
SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel
Re: [dm-devel] [PATCH v2 1/7] libmultipath: fix tur checker locking
On Fri, 2018-02-09 at 18:36 -0600, Benjamin Marzinski wrote:
> On Sat, Feb 10, 2018 at 12:36:05AM +0100, Martin Wilck wrote:
> > On Sat, 2018-02-10 at 00:28 +0100, Martin Wilck wrote:
> > > Maybe it's easier than we thought. Attached is a patch on top of
> > > yours that I think might work, please have a look.
> >
> > That one didn't even compile. This one is better.
> >
> > Martin
>
> How about this one instead. The idea is that once we are in the cleanup
> handler, we just cleanup and exit. But before we enter it, we atomically
> exchange running, and if running was 0, we pause(), since the checker is
> either about to cancel us, or already has.

Yes, that should work. Nice.

Regards,
Martin
Re: [dm-devel] [PATCH v2 1/7] libmultipath: fix tur checker locking
On Fri, 2018-02-09 at 18:17 -0600, Benjamin Marzinski wrote:
> On Sat, Feb 10, 2018 at 12:36:05AM +0100, Martin Wilck wrote:
> > On Sat, 2018-02-10 at 00:28 +0100, Martin Wilck wrote:
> > > Maybe it's easier than we thought. Attached is a patch on top of
> > > yours that I think might work, please have a look.
> >
> > That one didn't even compile. This one is better.
> >
> > Martin
>
> So if we have this ordering
>
> - checker calls uatomic_xchg() which returns 1 and then gets scheduled
> - thread calls uatomic_set() and then runs till it terminates
> - checker calls pthread_cancel()
>
> You will get Bart's original bug.

Yes, I realized that overnight :-( I shouldn't post stuff like this
around midnight. But I have another idea in my mind.

Martin

> I realize that having the condlog() after the uatomic_set() in the
> thread makes this unlikely, but I don't like races like this. I would be
> happier with simply taking the original code and moving the condlog(),
> if neither of my other two options are acceptable.
>
> -Ben
Re: [dm-devel] [PATCH v2 1/7] libmultipath: fix tur checker locking
On Sat, Feb 10, 2018 at 12:36:05AM +0100, Martin Wilck wrote:
> On Sat, 2018-02-10 at 00:28 +0100, Martin Wilck wrote:
> > Maybe it's easier than we thought. Attached is a patch on top of
> > yours that I think might work, please have a look.
>
> That one didn't even compile. This one is better.
>
> Martin

How about this one instead. The idea is that once we are in the cleanup
handler, we just cleanup and exit. But before we enter it, we atomically
exchange running, and if running was 0, we pause(), since the checker is
either about to cancel us, or already has.

diff --git a/libmultipath/checkers/tur.c b/libmultipath/checkers/tur.c
index 894ad41..3774a17 100644
--- a/libmultipath/checkers/tur.c
+++ b/libmultipath/checkers/tur.c
@@ -214,15 +214,12 @@ retry:
 
 static void cleanup_func(void *data)
 {
-	int running, holders;
+	int holders;
 	struct tur_checker_context *ct = data;
 
-	running = uatomic_xchg(&ct->running, 0);
 	holders = uatomic_sub_return(&ct->holders, 1);
 	if (!holders)
 		cleanup_context(ct);
-	if (!running)
-		pause();
 }
 
 static int tur_running(struct tur_checker_context *ct)
@@ -242,7 +239,7 @@ static void copy_msg_to_tcc(void *ct_p, const char *msg)
 static void *tur_thread(void *ctx)
 {
 	struct tur_checker_context *ct = ctx;
-	int state;
+	int state, running;
 	char devt[32];
 
 	condlog(3, "%s: tur checker starting up",
@@ -268,6 +265,11 @@ static void *tur_thread(void *ctx)
 
 	condlog(3, "%s: tur checker finished, state %s",
 		tur_devt(devt, sizeof(devt), ct), checker_state_name(state));
+
+	running = uatomic_xchg(&ct->running, 0);
+	if (!running)
+		pause();
+
 	tur_thread_cleanup_pop(ct);
 
 	return ((void *)0);
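For illustration, the handshake described in this message can be tried outside multipath-tools. The following is a minimal sketch, not the tur.c code: it assumes C11 <stdatomic.h> (atomic_exchange) in place of liburcu's uatomic_xchg(), and every name in it (demo_ctx, demo_thread, demo_stop, run_handshake, the gate field) is hypothetical. The gate exists only so the caller can force one ordering or the other.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <unistd.h>

/* Hypothetical stand-in for struct tur_checker_context. */
struct demo_ctx {
	atomic_int running;	/* 1 until either side claims the shutdown */
	atomic_int gate;	/* demo hook: holds the thread back until set */
	atomic_int cleaned;	/* set by the cleanup handler, last access to ctx */
	pthread_t thread;
};

static void demo_cleanup(void *data)
{
	struct demo_ctx *ct = data;

	atomic_store(&ct->cleaned, 1);
}

static void *demo_thread(void *data)
{
	struct demo_ctx *ct = data;

	pthread_cleanup_push(demo_cleanup, ct);
	/* Stands in for the real check. sched_yield() is not a cancellation point. */
	while (!atomic_load(&ct->gate))
		sched_yield();

	/* Claim the shutdown before entering the cleanup handler. */
	if (!atomic_exchange(&ct->running, 0))
		for (;;)
			pause();	/* checker got there first: wait to be cancelled */
	pthread_cleanup_pop(1);
	return NULL;
}

/* Checker side: returns 1 if it had to cancel the thread, 0 otherwise. */
static int demo_stop(struct demo_ctx *ct)
{
	if (atomic_exchange(&ct->running, 0)) {
		pthread_cancel(ct->thread);
		return 1;
	}
	return 0;
}

/* Runs one round with a detached thread and a forced ordering. */
int run_handshake(int checker_first)
{
	static struct demo_ctx ct;
	pthread_attr_t attr;
	int rc, cancelled;

	atomic_store(&ct.running, 1);
	atomic_store(&ct.gate, !checker_first);
	atomic_store(&ct.cleaned, 0);

	pthread_attr_init(&attr);
	pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_DETACHED);
	rc = pthread_create(&ct.thread, &attr, demo_thread, &ct);
	pthread_attr_destroy(&attr);
	assert(rc == 0);

	if (!checker_first)
		while (atomic_load(&ct.running))
			sched_yield();	/* let the thread claim the shutdown first */
	cancelled = demo_stop(&ct);
	atomic_store(&ct.gate, 1);

	while (!atomic_load(&ct.cleaned))
		sched_yield();	/* wait until the cleanup handler has run */
	return cancelled;
}
```

In this sketch run_handshake(1) makes the checker win the exchange, so the thread parks in pause() until the cancel arrives, while run_handshake(0) lets the thread win, so the checker never touches the stale thread ID. In both orderings the cleanup handler runs exactly once and never blocks.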
Re: [dm-devel] [PATCH v2 1/7] libmultipath: fix tur checker locking
On Sat, Feb 10, 2018 at 12:36:05AM +0100, Martin Wilck wrote:
> On Sat, 2018-02-10 at 00:28 +0100, Martin Wilck wrote:
> > Maybe it's easier than we thought. Attached is a patch on top of
> > yours that I think might work, please have a look.
>
> That one didn't even compile. This one is better.
>
> Martin

So if we have this ordering

- checker calls uatomic_xchg() which returns 1 and then gets scheduled
- thread calls uatomic_set() and then runs till it terminates
- checker calls pthread_cancel()

You will get Bart's original bug.

I realize that having the condlog() after the uatomic_set() in the
thread makes this unlikely, but I don't like races like this. I would be
happier with simply taking the original code and moving the condlog(),
if neither of my other two options are acceptable.

-Ben
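The three-step ordering above can be replayed deterministically in a single thread. This is only a model of the interleaving, under the assumption that C11 atomic_exchange()/atomic_store() stand in for uatomic_xchg()/uatomic_set(); the function name cancel_hits_exited_thread and its parameter are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>

/*
 * Replays: checker exchanges, thread finishes, checker cancels.
 * Returns 1 if the checker would call pthread_cancel() on a thread
 * that has already exited, 0 otherwise.
 */
int cancel_hits_exited_thread(int thread_uses_xchg)
{
	atomic_int running = 1;
	int checker_saw, thread_exited;

	/* Step 1: the checker claims the shutdown, then loses the CPU. */
	checker_saw = atomic_exchange(&running, 0);
	assert(checker_saw == 1);

	/* Step 2: the thread reaches its end. */
	if (thread_uses_xchg) {
		/* Seeing 0 means a cancel is coming, so the thread would pause(). */
		thread_exited = atomic_exchange(&running, 0) != 0;
	} else {
		/* A plain store discards the old value; the thread just exits. */
		atomic_store(&running, 0);
		thread_exited = 1;
	}

	/* Step 3: the checker resumes and cancels because it saw 1. */
	return checker_saw && thread_exited;
}
```

With the plain store the function returns 1, which is the stale pthread_cancel() described above. With an exchange on the thread side it returns 0, because the thread learns that the checker got there first and waits to be cancelled.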
Re: [dm-devel] [PATCH v2 1/7] libmultipath: fix tur checker locking
On Fri, 2018-02-09 at 17:04 -0600, Benjamin Marzinski wrote:
> Urgh. You're right. If it is in the cleanup helper because it already
> has been cancelled, then the pause isn't going to get cancelled. So much
> for my quick rewrite.

Maybe it's easier than we thought. Attached is a patch on top of yours
that I think might work, please have a look.

It's quite late here, so I'll need to ponder your alternatives below
the other day.

Cheers
Martin

> That leaves three options.
>
> 1. have either the thread or the checker detach the thread (depending
>    on which one exits first)
> 2. make the checker always cancel and detach the thread. This simplifies
>    the code, but there will be zombie threads hanging around between
>    calls to the checker.
> 3. just move the condlog
>
> I really don't care which one we pick anymore.
>
> -Ben

From 831ef27b41858fa248201b74f2dd8ea5b7c4aece Mon Sep 17 00:00:00 2001
From: Martin Wilck
Date: Sat, 10 Feb 2018 00:22:17 +0100
Subject: [PATCH] tur checker: make sure pthread_cancel isn't called for exited
 thread

If we enter the cleanup function as the result of a pthread_cancel by another
thread, we don't need to wait for a cancellation any more. If we exit
regularly, just tell the other thread not to try to cancel us.
---
 libmultipath/checkers/tur.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/libmultipath/checkers/tur.c b/libmultipath/checkers/tur.c
index 894ad41c89c3..31a87d2b5cf2 100644
--- a/libmultipath/checkers/tur.c
+++ b/libmultipath/checkers/tur.c
@@ -221,8 +221,6 @@ static void cleanup_func(void *data)
 	holders = uatomic_sub_return(&ct->holders, 1);
 	if (!holders)
 		cleanup_context(ct);
-	if (!running)
-		pause();
 }
 
 static int tur_running(struct tur_checker_context *ct)
@@ -266,6 +264,9 @@ static void *tur_thread(void *ctx)
 	pthread_cond_signal(&ct->active);
 	pthread_mutex_unlock(&ct->lock);
 
+	/* Tell main checker thread not to cancel us, as we exit anyway */
+	running = uatomic_xchg(&ct->running, 0);
+
 	condlog(3, "%s: tur checker finished, state %s",
 		tur_devt(devt, sizeof(devt), ct), checker_state_name(state));
 	tur_thread_cleanup_pop(ct);
--
2.16.1
Re: [dm-devel] [PATCH v2 1/7] libmultipath: fix tur checker locking
On Sat, 2018-02-10 at 00:28 +0100, Martin Wilck wrote:
> Maybe it's easier than we thought. Attached is a patch on top of
> yours that I think might work, please have a look.

That one didn't even compile. This one is better.

Martin

From afb9c7de3658d49c4f28f6b9ee618a87b806ecdd Mon Sep 17 00:00:00 2001
From: Martin Wilck
Date: Sat, 10 Feb 2018 00:22:17 +0100
Subject: [PATCH] tur checker: make sure pthread_cancel isn't called for exited
 thread

If we enter the cleanup function as the result of a pthread_cancel by another
thread, we don't need to wait for a cancellation any more. If we exit
regularly, just tell the other thread not to try to cancel us.
---
 libmultipath/checkers/tur.c | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/libmultipath/checkers/tur.c b/libmultipath/checkers/tur.c
index 894ad41c89c3..5d2b36bfa883 100644
--- a/libmultipath/checkers/tur.c
+++ b/libmultipath/checkers/tur.c
@@ -214,15 +214,13 @@ retry:
 
 static void cleanup_func(void *data)
 {
-	int running, holders;
+	int holders;
 	struct tur_checker_context *ct = data;
 
-	running = uatomic_xchg(&ct->running, 0);
+	uatomic_set(&ct->running, 0);
 	holders = uatomic_sub_return(&ct->holders, 1);
 	if (!holders)
 		cleanup_context(ct);
-	if (!running)
-		pause();
 }
 
 static int tur_running(struct tur_checker_context *ct)
@@ -266,6 +264,9 @@ static void *tur_thread(void *ctx)
 	pthread_cond_signal(&ct->active);
 	pthread_mutex_unlock(&ct->lock);
 
+	/* Tell main checker thread not to cancel us, as we exit anyway */
+	uatomic_set(&ct->running, 0);
+
 	condlog(3, "%s: tur checker finished, state %s",
 		tur_devt(devt, sizeof(devt), ct), checker_state_name(state));
 	tur_thread_cleanup_pop(ct);
--
2.16.1
Re: [dm-devel] [PATCH v2 1/7] libmultipath: fix tur checker locking
On Fri, Feb 09, 2018 at 09:30:56PM +0100, Martin Wilck wrote:
> On Thu, 2018-02-08 at 17:56 -0600, Benjamin Marzinski wrote:
> > ct->running is now an atomic variable. When the thread is started
> > it is set to 1. When the checker wants to kill a thread, it atomically
> > sets the value to 0 and reads the previous value. If it was 1,
> > the checker cancels the thread. If it was 0, then nothing needs to be
> > done. After the checker has dealt with the thread, it sets ct->thread
> > to NULL.
> >
> > When the thread is done, it atomically sets the value of ct->running
> > to 0 and reads the previous value. If it was 1, the thread just exits.
> > If it was 0, then the checker is trying to cancel the thread, and so
> > the thread calls pause(), which is a cancellation point.
>
> I'm missing one thing here. My poor brain is aching.
>
> cleanup_func() can be entered in two ways: a) if the thread has been
> cancelled and passed a cancellation point already, or b) if it exits
> normally and calls pthread_cleanup_pop().
> In case b), waiting for the cancellation request by calling pause()
> makes sense to me. But in case a), the thread has already seen the
> cancellation request - wouldn't calling pause() cause it to sleep
> forever?

Urgh. You're right. If it is in the cleanup helper because it already
has been cancelled, then the pause isn't going to get cancelled. So much
for my quick rewrite.

That leaves three options.

1. have either the thread or the checker detach the thread (depending
   on which one exits first)
2. make the checker always cancel and detach the thread. This simplifies
   the code, but there will be zombie threads hanging around between
   calls to the checker.
3. just move the condlog

I really don't care which one we pick anymore.

-Ben
Re: [dm-devel] [PATCH v2 1/7] libmultipath: fix tur checker locking
On Thu, 2018-02-08 at 17:56 -0600, Benjamin Marzinski wrote:
> ct->running is now an atomic variable. When the thread is started
> it is set to 1. When the checker wants to kill a thread, it atomically
> sets the value to 0 and reads the previous value. If it was 1,
> the checker cancels the thread. If it was 0, then nothing needs to be
> done. After the checker has dealt with the thread, it sets ct->thread
> to NULL.
>
> When the thread is done, it atomically sets the value of ct->running
> to 0 and reads the previous value. If it was 1, the thread just exits.
> If it was 0, then the checker is trying to cancel the thread, and so
> the thread calls pause(), which is a cancellation point.

I'm missing one thing here. My poor brain is aching.

cleanup_func() can be entered in two ways: a) if the thread has been
cancelled and passed a cancellation point already, or b) if it exits
normally and calls pthread_cleanup_pop().
In case b), waiting for the cancellation request by calling pause()
makes sense to me. But in case a), the thread has already seen the
cancellation request - wouldn't calling pause() cause it to sleep
forever?

Martin
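The two ways into a cleanup handler can be observed with a small standalone program. This is a sketch under stated assumptions rather than anything from tur.c: it uses plain POSIX threads with C11 atomics, the threads are joinable so the demo can wait for them (the tur checker thread is detached), and the names count_cleanup, exits_normally, gets_cancelled and count_cleanup_paths are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <unistd.h>

static atomic_int cleanup_calls;	/* how often the handler ran */
static atomic_int handler_pushed;	/* second thread has pushed its handler */

static void count_cleanup(void *data)
{
	(void)data;
	atomic_fetch_add(&cleanup_calls, 1);
}

/* Case b): normal exit, the handler runs from pthread_cleanup_pop(1). */
static void *exits_normally(void *arg)
{
	pthread_cleanup_push(count_cleanup, arg);
	pthread_cleanup_pop(1);
	return NULL;
}

/* Case a): the handler runs because a cancel was acted on in pause(). */
static void *gets_cancelled(void *arg)
{
	pthread_cleanup_push(count_cleanup, arg);
	atomic_store(&handler_pushed, 1);
	for (;;)
		pause();	/* cancellation point */
	pthread_cleanup_pop(0);	/* never reached, closes the push scope */
	return NULL;
}

/* Runs both cases and returns how many times the handler was invoked. */
int count_cleanup_paths(void)
{
	pthread_t t;
	void *res;
	int rc;

	atomic_store(&cleanup_calls, 0);
	atomic_store(&handler_pushed, 0);

	rc = pthread_create(&t, NULL, exits_normally, NULL);
	assert(rc == 0);
	pthread_join(t, NULL);

	rc = pthread_create(&t, NULL, gets_cancelled, NULL);
	assert(rc == 0);
	while (!atomic_load(&handler_pushed))
		sched_yield();
	pthread_cancel(t);
	pthread_join(t, &res);
	assert(res == PTHREAD_CANCELED);

	return atomic_load(&cleanup_calls);
}
```

The same handler runs once per case. In case a) it runs while the cancellation is already being processed, so a pause() placed inside the handler would have no further cancel request to wake it, which is the concern raised above.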
Re: [dm-devel] [PATCH v2 1/7] libmultipath: fix tur checker locking
On Fri, 2018-02-09 at 11:26 -0600, Benjamin Marzinski wrote:
> On Fri, Feb 09, 2018 at 04:15:34PM +, Bart Van Assche wrote:
> > On Thu, 2018-02-08 at 17:56 -0600, Benjamin Marzinski wrote:
> > >  static void cleanup_func(void *data)
> > >  {
> > > -	int holders;
> > > +	int running, holders;
> > >  	struct tur_checker_context *ct = data;
> > > -	pthread_spin_lock(&ct->hldr_lock);
> > > -	ct->holders--;
> > > -	holders = ct->holders;
> > > -	ct->thread = 0;
> > > -	pthread_spin_unlock(&ct->hldr_lock);
> > > +
> > > +	running = uatomic_xchg(&ct->running, 0);
> > > +	holders = uatomic_sub_return(&ct->holders, 1);
> > >  	if (!holders)
> > >  		cleanup_context(ct);
> > > +	if (!running)
> > > +		pause();
> > >  }
> >
> > Hello Ben,
> >
> > Why has the pause() call been added? I think it is safe to call
> > pthread_cancel() for a non-detached thread that has finished so I don't
> > think that pause() call is necessary.
>
> Martin objected to having the threads getting detached as part of
> cancelling them (I think. I'm a little fuzzy on what he didn't like).
> But he definitely said he preferred the thread to start detached, so in
> this version, it does. That's why we need the pause(). If he's fine with
> the threads getting detached later, I will happily replace the pause()
> with
>
> 	if (running)
> 		pthread_detach(pthread_self());
>
> and add pthread_detach(ct->thread) after the calls to
> pthread_cancel(ct->thread). Otherwise we need the pause() to solve your
> original bug.

Ah, thanks, I had overlooked that the tur checker detaches the checker
thread. Have you considered adding a comment above the pause() call that
explains the purpose of that call?

Thanks,

Bart.
Re: [dm-devel] [PATCH v2 1/7] libmultipath: fix tur checker locking
On Fri, Feb 09, 2018 at 04:15:34PM +, Bart Van Assche wrote:
> On Thu, 2018-02-08 at 17:56 -0600, Benjamin Marzinski wrote:
> >  static void cleanup_func(void *data)
> >  {
> > -	int holders;
> > +	int running, holders;
> >  	struct tur_checker_context *ct = data;
> > -	pthread_spin_lock(&ct->hldr_lock);
> > -	ct->holders--;
> > -	holders = ct->holders;
> > -	ct->thread = 0;
> > -	pthread_spin_unlock(&ct->hldr_lock);
> > +
> > +	running = uatomic_xchg(&ct->running, 0);
> > +	holders = uatomic_sub_return(&ct->holders, 1);
> >  	if (!holders)
> >  		cleanup_context(ct);
> > +	if (!running)
> > +		pause();
> >  }
>
> Hello Ben,
>
> Why has the pause() call been added? I think it is safe to call
> pthread_cancel() for a non-detached thread that has finished so I don't
> think that pause() call is necessary.

Martin objected to having the threads getting detached as part of
cancelling them (I think. I'm a little fuzzy on what he didn't like).
But he definitely said he preferred the thread to start detached, so in
this version, it does. That's why we need the pause(). If he's fine with
the threads getting detached later, I will happily replace the pause()
with

	if (running)
		pthread_detach(pthread_self());

and add pthread_detach(ct->thread) after the calls to
pthread_cancel(ct->thread). Otherwise we need the pause() to solve your
original bug.

As an aside, Martin, if your problem is the thread detaching itself, we
can skip that if we are fine with a zombie thread hanging around until
the next time we call libcheck_check() or libcheck_free(). Then the
checker can always be in charge of detaching the thread.

> >  static int tur_running(struct tur_checker_context *ct)
> >  {
> > -	pthread_t thread;
> > -
> > -	pthread_spin_lock(&ct->hldr_lock);
> > -	thread = ct->thread;
> > -	pthread_spin_unlock(&ct->hldr_lock);
> > -
> > -	return thread != 0;
> > +	return (uatomic_read(&ct->running) != 0);
> >  }
>
> Is such a one line function really useful?

Nope. I just left it there to keep the number of changes that the patch
makes lower, to make it more straightforward to review. I'm fine with
inlining it.

> I think the code will be easier to read if this function is inlined
> into its callers.
>
> > @@ -418,8 +396,12 @@ int libcheck_check(struct checker * c)
> >  		    (tur_status == PATH_PENDING || tur_status == PATH_UNCHECKED)) {
> >  			condlog(3, "%s: tur checker still running",
> >  				tur_devt(devt, sizeof(devt), ct));
> > -			ct->running = 1;
> >  			tur_status = PATH_PENDING;
> > +		} else {
> > +			int running = uatomic_xchg(&ct->running, 0);
> > +			if (running)
> > +				pthread_cancel(ct->thread);
> > +			ct->thread = 0;
> >  		}
> >  	}
>
> Why has this pthread_cancel() call been added? I think that else clause
> can only be reached if ct->running == 0 so I don't think that the
> pthread_cancel() call will ever be reached.

It can be reached if ct->running is 1, as long as tur_status has been
updated. In practice this means that the thread should have done
everything it needs to do, and all that's left is for it to shut down.
However, if the thread doesn't shut itself down before the next time you
call libcheck_check(), the checker will give up and return PATH_TIMEOUT.
It seems pretty unlikely that this will happen, since there should be a
significant delay before calling libcheck_check() again. This
theoretical race has been in the code for a while, and AFAIK, it's never
occurred. But there definitely is the possibility that the thread will
still be running at the end of libcheck_check(), and it doesn't hurt
things to forcibly shut the thread down, if it is.

-Ben

> Thanks,
>
> Bart.
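The detach-based alternative mentioned in this message (the thread detaches itself if it wins the exchange, otherwise the checker cancels and detaches it) might look roughly as follows. This is a hedged sketch and not the patch that was posted: it assumes a thread created joinable, C11 atomic_exchange() in place of uatomic_xchg(), and hypothetical names (detach_ctx, detach_thread, detach_stop, run_detach_variant). The gate and done fields exist only to make the ordering controllable in the demo.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>

/* Hypothetical stand-in for struct tur_checker_context. */
struct detach_ctx {
	atomic_int running;	/* 1 until either side claims the shutdown */
	atomic_int gate;	/* demo hook: holds the thread back until set */
	atomic_int done;	/* thread has finished using the context */
	pthread_t thread;
};

static void *detach_thread(void *data)
{
	struct detach_ctx *ct = data;

	/* Stands in for the real check; contains no cancellation point. */
	while (!atomic_load(&ct->gate))
		sched_yield();

	/* If nobody is cancelling us, nobody will detach us either. */
	if (atomic_exchange(&ct->running, 0))
		pthread_detach(pthread_self());
	atomic_store(&ct->done, 1);	/* last access to ct */
	return NULL;
}

/* Checker side: returns 1 if it cancelled and detached the thread. */
static int detach_stop(struct detach_ctx *ct)
{
	if (!atomic_exchange(&ct->running, 0))
		return 0;	/* thread finished first and detached itself */
	/* The thread is still joinable here, so its ID remains valid. */
	pthread_cancel(ct->thread);
	pthread_detach(ct->thread);
	return 1;
}

/* Runs one round with a forced ordering. */
int run_detach_variant(int checker_first)
{
	static struct detach_ctx ct;
	int rc, stopped;

	atomic_store(&ct.running, 1);
	atomic_store(&ct.gate, !checker_first);
	atomic_store(&ct.done, 0);

	rc = pthread_create(&ct.thread, NULL, detach_thread, &ct);
	assert(rc == 0);

	if (!checker_first)
		while (atomic_load(&ct.running))
			sched_yield();	/* let the thread claim the shutdown first */
	stopped = detach_stop(&ct);
	atomic_store(&ct.gate, 1);

	while (!atomic_load(&ct.done))
		sched_yield();
	return stopped;
}
```

Exactly one side ends up calling pthread_detach() in either ordering, so no pause() is needed. The trade-off discussed in the thread remains: the thread is not detached from the start, which is the property Martin was said to prefer.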
Re: [dm-devel] [PATCH v2 1/7] libmultipath: fix tur checker locking
On Thu, 2018-02-08 at 17:56 -0600, Benjamin Marzinski wrote:
>  static void cleanup_func(void *data)
>  {
> -	int holders;
> +	int running, holders;
>  	struct tur_checker_context *ct = data;
> -	pthread_spin_lock(&ct->hldr_lock);
> -	ct->holders--;
> -	holders = ct->holders;
> -	ct->thread = 0;
> -	pthread_spin_unlock(&ct->hldr_lock);
> +
> +	running = uatomic_xchg(&ct->running, 0);
> +	holders = uatomic_sub_return(&ct->holders, 1);
>  	if (!holders)
>  		cleanup_context(ct);
> +	if (!running)
> +		pause();
>  }

Hello Ben,

Why has the pause() call been added? I think it is safe to call
pthread_cancel() for a non-detached thread that has finished so I don't
think that pause() call is necessary.

>  static int tur_running(struct tur_checker_context *ct)
>  {
> -	pthread_t thread;
> -
> -	pthread_spin_lock(&ct->hldr_lock);
> -	thread = ct->thread;
> -	pthread_spin_unlock(&ct->hldr_lock);
> -
> -	return thread != 0;
> +	return (uatomic_read(&ct->running) != 0);
>  }

Is such a one line function really useful? I think the code will be
easier to read if this function is inlined into its callers.

> @@ -418,8 +396,12 @@ int libcheck_check(struct checker * c)
>  		    (tur_status == PATH_PENDING || tur_status == PATH_UNCHECKED)) {
>  			condlog(3, "%s: tur checker still running",
>  				tur_devt(devt, sizeof(devt), ct));
> -			ct->running = 1;
>  			tur_status = PATH_PENDING;
> +		} else {
> +			int running = uatomic_xchg(&ct->running, 0);
> +			if (running)
> +				pthread_cancel(ct->thread);
> +			ct->thread = 0;
>  		}
>  	}

Why has this pthread_cancel() call been added? I think that else clause
can only be reached if ct->running == 0 so I don't think that the
pthread_cancel() call will ever be reached.

Thanks,

Bart.