Comments:
1.
The explanation indicates that the problem is not very easy to understand. In 
order to save future maintainers some work the explanation below can be added 
to for example the README file.
In the saNtfFinalize() you can add a comment saying that the mutex must not be 
locked and a reference to the README file can be given.

2.
Add test-case(s) in the UML test suite. Maybe the test code attached to the 
ticket can  be used.

Thanks
Lennart

> -----Original Message-----
> From: [email protected] [mailto:[email protected]]
> Sent: den 12 oktober 2015 13:23
> To: Lennart Lund; Minh Chau H
> Cc: [email protected]
> Subject: [PATCH 1 of 1] ntfa: fix deadlock situation due to saNtfFinalize
> [#1521]
> 
>  osaf/libs/agents/saf/ntfa/ntfa_api.c |  4 ++--
>  1 files changed, 2 insertions(+), 2 deletions(-)
> 
> 
> Generally any API called after finalize returns with BAD_HANDLE.
> But in a multi-threaded application, deadlock situation may arise between
> saNtfFinalize()
> with saNtfNotificationUnsubscribe(),
> or with saNtfStateChangeNotificationAllocate(),
> or with and saNtfPtrValAllocate().
> 
> After getting successful response from the server, saNtfFinalize() takes lock
> on ntfa_cb.cb_lock
> and tries to free all the handles and resources. Here it gets block in
> ntfa_hdl_rec_del()
> ---->ncshm_destroy_hdl() (actually in in ncshm_destroy_hdl()---
> >hm_block_me())) as client_handle
> is in use by other threads and waits for it to get released. At the same time,
> one of the above three APIS may be waiting for taking the lock on
> ntfa_cb.cb_lock which is
> not avialable as it is taken by saNtfFinalize().
> 
> From the solution perspective, the LEAP API where santfFinalize() gets
> blocked wants other threads to
> complete thier functionality and release the handle. In this situation, new 
> API
> calls
> will return with BAD_HANDLE (LEAP functionality take_handle() fails) and
> existing API calls which has
> already taken the client_handle will be able to give back client_handle (LEAP
> functionality
> give_handle()). So if the lock is removed from Finalize(), other APIS will be
> able to complete their
> call and will return handle one by one which eventually will unblock
> saNtfFinalize().
> 
> diff --git a/osaf/libs/agents/saf/ntfa/ntfa_api.c
> b/osaf/libs/agents/saf/ntfa/ntfa_api.c
> --- a/osaf/libs/agents/saf/ntfa/ntfa_api.c
> +++ b/osaf/libs/agents/saf/ntfa/ntfa_api.c
> @@ -1187,7 +1187,6 @@ SaAisErrorT saNtfFinalize(SaNtfHandleT n
>               rc = SA_AIS_ERR_NO_RESOURCES;
> 
>       if (rc == SA_AIS_OK) {
> -
>       osafassert(pthread_mutex_lock(&ntfa_cb.cb_lock) == 0);
> 
>       /** delete the hdl rec
>           ** including all resources allocated by client if MDS send is
>           ** succesful.
> @@ -1197,7 +1196,6 @@ SaAisErrorT saNtfFinalize(SaNtfHandleT n
>                       TRACE_1("ntfa_hdl_rec_del
> failed");
>                       rc = SA_AIS_ERR_BAD_HANDLE;
>               }
> -
>       osafassert(pthread_mutex_unlock(&ntfa_cb.cb_lock) == 0);
>       }
> 
>   done_give_hdl:
> @@ -2739,6 +2737,7 @@ SaAisErrorT saNtfNotificationUnsubscribe
> 
>       ntfHandle = ntfHandleGet(subscriptionId);
>       if (ntfHandle == 0) {
> +             TRACE_1("ntfHandleGet failed, subscription not
> exist");
>               rc = SA_AIS_ERR_NOT_EXIST;
>               goto done;
>       }
> @@ -2772,6 +2771,7 @@ SaAisErrorT saNtfNotificationUnsubscribe
>               /* Send a sync MDS message to obtain a log
> stream id */
>               rc = ntfa_mds_msg_sync_send(&ntfa_cb,
> &msg, &o_msg, timeout);
>               if (rc != NCSCC_RC_SUCCESS) {
> +
>       TRACE_1("ntfa_mds_msg_sync_send failed with: %u",rc);
>                       rc = SA_AIS_ERR_TRY_AGAIN;
>                       goto done_give_hdl;
>               }

------------------------------------------------------------------------------
_______________________________________________
Opensaf-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/opensaf-devel

Reply via email to