Hi Minh

This solution will probably work quite well but there is still a minor risk 
that the mutex will be taken before the thread is cancelled and there is also a 
risk (see Note2)
"Asynchronous cancelability means that the thread can be canceled at any time 
(usually immediately, but the system does not guarantee this)"

Instead:
In the thread a cancellation function that unlocks the mutex can be pushed just 
after the mutex is locked. This function takes the mutex as in-parameter and 
its only function is to unlock the mutex. Instead of unlocking the mutex by 
calling osaf_mutex_unlock_ordie(&ntfimcn_mutex) as in the current code the 
cancellation function is popped and executed (it is unlocking the mutex) See 
pthread_cleanup_push() and pthread_cleanup_pop(). With this solution the 
cancellation type should not be changed to ASYNCHORNOUS it shall remain 
PTHREAD_CANCEL_DEFERRED. This will guarantee that the thread will never be 
terminated with the mutex locked.
Note1: Also with this solution the mutex shall be taken in the main thread 
before rc = pthread_cancel(ipar.thread); and released just after.

Note2: I found that changing to cancellation type ASYNCHORNOUS is not safe 
because only a few are async-cancel-safe and for example waitpid() is not one 
of the safe fuctions!

Thanks
Lennart

> -----Original Message-----
> From: Minh Chau [mailto:[email protected]]
> Sent: den 22 juni 2017 06:14
> To: Lennart Lund <[email protected]>;
> [email protected]
> Cc: [email protected]; Minh Hon Chau
> <[email protected]>
> Subject: [PATCH 1/1] ntfd: Ensure mutex is not taken after cnsurvail_thread
> is canceled [#2508]
> 
> In the scenario of shutting down SC while SC switchover is on going,
> ntfd coredump is generated due to failure of pthread_mutex_destroy()
> with errorcode:16(EBUSY). That means the mutex had been taken and
> was not unlocked at the time phtread_mutex_destroy() is called.
> 
> One solution is adding mutex protection for pthread_cancel() so that
> there's no cancellation request if cnsurvail_thread() is taking mutex,
> or cnsurvail_thread() can not take mutex if the thread cancellation
> request is issued. That also needs the cnsurvail_thread to have the
> cancellation type as ASYNCHORNOUS. Otherwise the same coredump issue
> still occurs since the cancellation request is deffered (cancellation
> type as PTHREAD_CANCEL_DEFERRED set by default)
> ---
>  src/ntf/ntfd/ntfs_imcnutil.c | 6 ++++++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/src/ntf/ntfd/ntfs_imcnutil.c b/src/ntf/ntfd/ntfs_imcnutil.c
> index dd27a255c..672a3910e 100644
> --- a/src/ntf/ntfd/ntfs_imcnutil.c
> +++ b/src/ntf/ntfd/ntfs_imcnutil.c
> @@ -238,6 +238,10 @@ static void *cnsurvail_thread(void *_init_params)
> 
>       TRACE_ENTER();
> 
> +     pthread_setcancelstate(PTHREAD_CANCEL_ENABLE, &status);
> +     pthread_setcanceltype(PTHREAD_CANCEL_ASYNCHRONOUS,
> &status);
> +     TRACE("old cancellation type: %d", status);
> +
>       while (1) {
>               osaf_mutex_lock_ordie(&ntfimcn_mutex);
>               pid = create_imcnprocess(ipar->ha_state);
> @@ -344,7 +348,9 @@ int stop_ntfimcn(void)
> 
>       if (ipar.ha_state != 0) {
>               TRACE("%s: Cancel the imcn surveillance thread",
> __FUNCTION__);
> +             osaf_mutex_lock_ordie(&ntfimcn_mutex);
>               rc = pthread_cancel(ipar.thread);
> +             osaf_mutex_unlock_ordie(&ntfimcn_mutex);
>               if (rc != 0)
>                       osaf_abort(rc);
>               rc = pthread_join(ipar.thread, &join_ret);
> --
> 2.11.0


------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
_______________________________________________
Opensaf-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/opensaf-devel

Reply via email to