When an SC is shut down while an SC switchover is in progress, ntfd
dumps core because pthread_mutex_destroy() fails with error code 16
(EBUSY). This means the mutex was still locked when
pthread_mutex_destroy() was called.

One solution is to hold the mutex around pthread_cancel(), so that no
cancellation request is issued while cnsurvail_thread() holds the
mutex, and cnsurvail_thread() cannot take the mutex while the
cancellation request is being issued. This also requires
cnsurvail_thread() to use the PTHREAD_CANCEL_ASYNCHRONOUS cancellation
type. Otherwise the same coredump can still occur, because with the
default type, PTHREAD_CANCEL_DEFERRED, the cancellation request is
deferred until the thread next reaches a cancellation point, by which
time it may hold the mutex again.
---
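For reviewers, the pattern described above can be sketched outside of
ntfd roughly as follows. This is only an illustration under stated
assumptions, not the OpenSAF code: demo_mutex, demo_worker() and
run_cancel_demo() are hypothetical names, plain pthread calls stand in
for the osaf_*_ordie() wrappers, and the worker simply idles in sleep()
between iterations. Build with -pthread.

```c
#include <assert.h>
#include <pthread.h>
#include <unistd.h>

/* Hypothetical stand-ins for ntfimcn_mutex and the state it protects. */
static pthread_mutex_t demo_mutex = PTHREAD_MUTEX_INITIALIZER;
static int iterations; /* protected by demo_mutex */

/* Hypothetical stand-in for cnsurvail_thread(): works briefly under the
 * mutex, then idles outside it. */
static void *demo_worker(void *arg)
{
	int old;

	(void)arg;
	/* Allow cancellation to act immediately instead of waiting for a
	 * cancellation point (the default is PTHREAD_CANCEL_DEFERRED). */
	pthread_setcancelstate(PTHREAD_CANCEL_ENABLE, &old);
	pthread_setcanceltype(PTHREAD_CANCEL_ASYNCHRONOUS, &old);

	for (;;) {
		pthread_mutex_lock(&demo_mutex);
		iterations++;
		pthread_mutex_unlock(&demo_mutex);
		sleep(1);
	}
}

/* Starts the worker, cancels it while holding the mutex, joins it and
 * returns the result of pthread_mutex_destroy() (0 on success). */
int run_cancel_demo(void)
{
	pthread_t tid;
	void *joined = NULL;
	int seen = 0;
	int rc;

	rc = pthread_create(&tid, NULL, demo_worker, NULL);
	if (rc != 0)
		return rc;

	/* Wait until the worker has completed at least one iteration. */
	while (seen == 0) {
		pthread_mutex_lock(&demo_mutex);
		seen = iterations;
		pthread_mutex_unlock(&demo_mutex);
		if (seen == 0)
			usleep(1000);
	}

	/* Holding the mutex here means the worker cannot be inside its
	 * critical section when the cancellation request is sent. */
	pthread_mutex_lock(&demo_mutex);
	rc = pthread_cancel(tid);
	pthread_mutex_unlock(&demo_mutex);
	if (rc != 0)
		return rc;

	rc = pthread_join(tid, &joined);
	if (rc != 0)
		return rc;
	assert(joined == PTHREAD_CANCELED);

	/* The mutex is unlocked now, so destroying it should succeed. */
	return pthread_mutex_destroy(&demo_mutex);
}
```

Calling run_cancel_demo() once from main() and checking for a zero
return value is enough to exercise the sketch. It destroys the static
mutex, so it is not meant to be called a second time.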
src/ntf/ntfd/ntfs_imcnutil.c | 6 ++++++
1 file changed, 6 insertions(+)
diff --git a/src/ntf/ntfd/ntfs_imcnutil.c b/src/ntf/ntfd/ntfs_imcnutil.c
index dd27a255c..672a3910e 100644
--- a/src/ntf/ntfd/ntfs_imcnutil.c
+++ b/src/ntf/ntfd/ntfs_imcnutil.c
@@ -238,6 +238,10 @@ static void *cnsurvail_thread(void *_init_params)
 	TRACE_ENTER();
+	pthread_setcancelstate(PTHREAD_CANCEL_ENABLE, &status);
+	pthread_setcanceltype(PTHREAD_CANCEL_ASYNCHRONOUS, &status);
+	TRACE("old cancellation type: %d", status);
+
 	while (1) {
 		osaf_mutex_lock_ordie(&ntfimcn_mutex);
 		pid = create_imcnprocess(ipar->ha_state);
@@ -344,7 +348,9 @@ int stop_ntfimcn(void)
 	if (ipar.ha_state != 0) {
 		TRACE("%s: Cancel the imcn surveillance thread", __FUNCTION__);
+		osaf_mutex_lock_ordie(&ntfimcn_mutex);
 		rc = pthread_cancel(ipar.thread);
+		osaf_mutex_unlock_ordie(&ntfimcn_mutex);
 		if (rc != 0)
 			osaf_abort(rc);
 		rc = pthread_join(ipar.thread, &join_ret);
--
2.11.0