Hi Minh This solution will probably work quite well but there is still a minor risk that the mutex will be taken before the thread is cancelled and there is also a risk (see Note2) "Asynchronous cancelability means that the thread can be canceled at any time (usually immediately, but the system does not guarantee this)"
Instead: In the thread a cancellation function that unlocks the mutex can be pushed just after the mutex is locked. This function takes the mutex as in-parameter and its only function is to unlock the mutex. Instead of unlocking the mutex by calling osaf_mutex_unlock_ordie(&ntfimcn_mutex) as in the current code the cancellation function is popped and executed (it is unlocking the mutex) See pthread_cleanup_push() and pthread_cleanup_pop(). With this solution the cancellation type should not be changed to ASYNCHORNOUS it shall remain PTHREAD_CANCEL_DEFERRED. This will guarantee that the thread will never be terminated with the mutex locked. Note1: Also with this solution the mutex shall be taken in the main thread before rc = pthread_cancel(ipar.thread); and released just after. Note2: I found that changing to cancellation type ASYNCHORNOUS is not safe because only a few are async-cancel-safe and for example waitpid() is not one of the safe fuctions! Thanks Lennart > -----Original Message----- > From: Minh Chau [mailto:[email protected]] > Sent: den 22 juni 2017 06:14 > To: Lennart Lund <[email protected]>; > [email protected] > Cc: [email protected]; Minh Hon Chau > <[email protected]> > Subject: [PATCH 1/1] ntfd: Ensure mutex is not taken after cnsurvail_thread > is canceled [#2508] > > In the scenario of shutting down SC while SC switchover is on going, > ntfd coredump is generated due to failure of pthread_mutex_destroy() > with errorcode:16(EBUSY). That means the mutex had been taken and > was not unlocked at the time phtread_mutex_destroy() is called. > > One solution is adding mutex protection for pthread_cancel() so that > there's no cancellation request if cnsurvail_thread() is taking mutex, > or cnsurvail_thread() can not take mutex if the thread cancellation > request is issued. That also needs the cnsurvail_thread to have the > cancellation type as ASYNCHORNOUS. Otherwise the same coredump issue > still occurs since the cancellation request is deffered (cancellation > type as PTHREAD_CANCEL_DEFERRED set by default) > --- > src/ntf/ntfd/ntfs_imcnutil.c | 6 ++++++ > 1 file changed, 6 insertions(+) > > diff --git a/src/ntf/ntfd/ntfs_imcnutil.c b/src/ntf/ntfd/ntfs_imcnutil.c > index dd27a255c..672a3910e 100644 > --- a/src/ntf/ntfd/ntfs_imcnutil.c > +++ b/src/ntf/ntfd/ntfs_imcnutil.c > @@ -238,6 +238,10 @@ static void *cnsurvail_thread(void *_init_params) > > TRACE_ENTER(); > > + pthread_setcancelstate(PTHREAD_CANCEL_ENABLE, &status); > + pthread_setcanceltype(PTHREAD_CANCEL_ASYNCHRONOUS, > &status); > + TRACE("old cancellation type: %d", status); > + > while (1) { > osaf_mutex_lock_ordie(&ntfimcn_mutex); > pid = create_imcnprocess(ipar->ha_state); > @@ -344,7 +348,9 @@ int stop_ntfimcn(void) > > if (ipar.ha_state != 0) { > TRACE("%s: Cancel the imcn surveillance thread", > __FUNCTION__); > + osaf_mutex_lock_ordie(&ntfimcn_mutex); > rc = pthread_cancel(ipar.thread); > + osaf_mutex_unlock_ordie(&ntfimcn_mutex); > if (rc != 0) > osaf_abort(rc); > rc = pthread_join(ipar.thread, &join_ret); > -- > 2.11.0 ------------------------------------------------------------------------------ Check out the vibrant tech community on one of the world's most engaging tech sites, Slashdot.org! http://sdm.link/slashdot _______________________________________________ Opensaf-devel mailing list [email protected] https://lists.sourceforge.net/lists/listinfo/opensaf-devel
