Hi Lennart,

I have published v2 of the patch with following changes:
1)Added a cautionary Note in the header of saNtfFinalize().
Commit log has enough details, I think, about the problem and solution 
and it will get committed with the fix, so a separate README is not needed.
2)Added two test cases for saNtfFinalize().

Thanks,
Praveen

On 13-Oct-15 2:13 PM, Lennart Lund wrote:
> Comments:
> 1.
> The explanation indicates that the problem is not very easy to understand. In 
> order to save future maintainers some work the explanation below can be added 
> to for example the README file.
> In the saNtfFinalize() you can add a comment saying that the mutex must not 
> be locked and a reference to the README file can be given.
>
> 2.
> Add test-case(s) in the UML test suite. Maybe the test code attached to the 
> ticket can  be used.
>
> Thanks
> Lennart
>
>> -----Original Message-----
>> From: [email protected] [mailto:[email protected]]
>> Sent: den 12 oktober 2015 13:23
>> To: Lennart Lund; Minh Chau H
>> Cc: [email protected]
>> Subject: [PATCH 1 of 1] ntfa: fix deadlock situation due to saNtfFinalize
>> [#1521]
>>
>>   osaf/libs/agents/saf/ntfa/ntfa_api.c |  4 ++--
>>   1 files changed, 2 insertions(+), 2 deletions(-)
>>
>>
>> Generally any API called after finalize returns with BAD_HANDLE.
>> But in a multi-threaded application, deadlock situation may arise between
>> saNtfFinalize()
>> with saNtfNotificationUnsubscribe(),
>> or with saNtfStateChangeNotificationAllocate(),
>> or with and saNtfPtrValAllocate().
>>
>> After getting successful response from the server, saNtfFinalize() takes lock
>> on ntfa_cb.cb_lock
>> and tries to free all the handles and resources. Here it gets block in
>> ntfa_hdl_rec_del()
>> ---->ncshm_destroy_hdl() (actually in in ncshm_destroy_hdl()---
>>> hm_block_me())) as client_handle
>> is in use by other threads and waits for it to get released. At the same 
>> time,
>> one of the above three APIS may be waiting for taking the lock on
>> ntfa_cb.cb_lock which is
>> not avialable as it is taken by saNtfFinalize().
>>
>>  From the solution perspective, the LEAP API where santfFinalize() gets
>> blocked wants other threads to
>> complete thier functionality and release the handle. In this situation, new 
>> API
>> calls
>> will return with BAD_HANDLE (LEAP functionality take_handle() fails) and
>> existing API calls which has
>> already taken the client_handle will be able to give back client_handle (LEAP
>> functionality
>> give_handle()). So if the lock is removed from Finalize(), other APIS will be
>> able to complete their
>> call and will return handle one by one which eventually will unblock
>> saNtfFinalize().
>>
>> diff --git a/osaf/libs/agents/saf/ntfa/ntfa_api.c
>> b/osaf/libs/agents/saf/ntfa/ntfa_api.c
>> --- a/osaf/libs/agents/saf/ntfa/ntfa_api.c
>> +++ b/osaf/libs/agents/saf/ntfa/ntfa_api.c
>> @@ -1187,7 +1187,6 @@ SaAisErrorT saNtfFinalize(SaNtfHandleT n
>>              rc = SA_AIS_ERR_NO_RESOURCES;
>>
>>      if (rc == SA_AIS_OK) {
>> -
>>      osafassert(pthread_mutex_lock(&ntfa_cb.cb_lock) == 0);
>>
>>      /** delete the hdl rec
>>            ** including all resources allocated by client if MDS send is
>>            ** succesful.
>> @@ -1197,7 +1196,6 @@ SaAisErrorT saNtfFinalize(SaNtfHandleT n
>>                      TRACE_1("ntfa_hdl_rec_del
>> failed");
>>                      rc = SA_AIS_ERR_BAD_HANDLE;
>>              }
>> -
>>      osafassert(pthread_mutex_unlock(&ntfa_cb.cb_lock) == 0);
>>      }
>>
>>    done_give_hdl:
>> @@ -2739,6 +2737,7 @@ SaAisErrorT saNtfNotificationUnsubscribe
>>
>>      ntfHandle = ntfHandleGet(subscriptionId);
>>      if (ntfHandle == 0) {
>> +            TRACE_1("ntfHandleGet failed, subscription not
>> exist");
>>              rc = SA_AIS_ERR_NOT_EXIST;
>>              goto done;
>>      }
>> @@ -2772,6 +2771,7 @@ SaAisErrorT saNtfNotificationUnsubscribe
>>              /* Send a sync MDS message to obtain a log
>> stream id */
>>              rc = ntfa_mds_msg_sync_send(&ntfa_cb,
>> &msg, &o_msg, timeout);
>>              if (rc != NCSCC_RC_SUCCESS) {
>> +
>>      TRACE_1("ntfa_mds_msg_sync_send failed with: %u",rc);
>>                      rc = SA_AIS_ERR_TRY_AGAIN;
>>                      goto done_give_hdl;
>>              }

------------------------------------------------------------------------------
_______________________________________________
Opensaf-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/opensaf-devel

Reply via email to