On 13/04/16 15:43, praveen malviya wrote:
>
>
> On 12-Apr-16 10:24 PM, minh chau wrote:
>>
>>
>> On 12/04/16 21:49, praveen malviya wrote:
>>>
>>>
>>> On 12-Apr-16 3:56 PM, minh chau wrote:
>>>> Hi Praveen
>>>>
>>>> The NTF server also accepts an initialize request (here it comes from
>>>> reinitializeClient() after headless) if the NTF server has not yet
>>>> initialized with CLM.
>>>> So after headless, this situation will most likely happen. The recovery
>>>> would succeed, but what if the NTF server afterwards notifies the agent
>>>> that it is no longer a member? Could a subscriber be left waiting for
>>>> notifications while the agent is not a member anymore?
>>>>
>>> Only one event can lead to this: OpenSAF being stopped on the node,
>>> since admin operations are not available in headless state.
>>> But this is a limitation of the whole headless solution in every service:
>>> the CLM status of a client node is not recovered at each director, and
>>> clients are recovered very early, at the MDS up event of the service.
>>>
>> [Minh] Actually, this situation also happens in non-headless. While a
>> client is subscribed for notifications, lock a CLM node. The client
>> will not be informed of SA_AIS_ERR_UNAVAILABLE if its filter does not
>> match any notification. It has to wait until the CLM node is unlocked
>> and a matching notification arrives; only then does saNtfDispatch return
>> SA_AIS_ERR_UNAVAILABLE. As long as the filter does not match, the client
>> keeps waiting and can't finalize its handle.
>> If this situation is solved in non-headless, the headless problem stated
>> above should be solved by the same solution.
>>
> [Praveen] Not only in NTFSv: the same logic of waiting for an event to get
> unblocked from poll() applies to applications of all the other services,
> as all SAF services are integrated with CLMSv. I do not know whether one
> should poll indefinitely or not, and with a finite poll time, what an
> application must do after poll times out.
>
> But I think that from the SAF perspective this still cannot be classified
> as a problem. The reason is that the life cycle of any such application is
> monitored by AMF, and AMF terminates such a process as part of CLM node
> eviction. Also, CLM provides the tracker interface for exactly this purpose.
> At the same time, I have observed that the AMF spec is clearer about
> ERR_UNAVAILABLE, as it states in section 7.2.1 on page 243:
> ================================
> However, there are a few special situations in which processes may 
> call Availability Management Framework API functions.
> • An Availability Management Framework API function is called by a 
> process nearly at the same time when the node exits the cluster and 
> the Availability Management Framework area server on the node has not 
> yet terminated the process.
> ..........
> =================================
> For the cases mentioned above AMF will return ERR_UNAVAILABLE. So it
> seems ERR_UNAVAILABLE is meant for such special cases, and any
> application must rely on its own subscription to CLMSv, or the admin will
> have to take care of this.
> I will check whether other SAF documents, like the C programming doc and
> the overview doc, mention anything in this context.
[Minh] I think an application can be purely an NTF client that does not
have to initialize with AMF, or maybe I don't understand your idea.
Let's look at this example: run a subscriber with filter "ABC", lock the
CLM node, then unlock it again. Then some application in the cluster raises
notification ABC.
With the current implementation, this subscriber gets ERR_UNAVAILABLE
when notification ABC arrives in its mailbox, so it ends up losing
notification ABC.
But if NTF reported ERR_UNAVAILABLE right after the CLM node is locked, the
subscriber could finalize its handle with NTF earlier. It could then wait
somehow until the CLM node is unlocked again, or it could initialize with
CLMSv to learn when the node becomes a member again. After the unlock in the
example above, the subscriber (with a new handle) is ready to receive
notifications, and when notification ABC comes, it receives it. I guess this
is the idea mentioned in the NTF spec:

"If the cluster node rejoins the cluster membership, processes 
executing on the cluster node will be able to reinitialize new library 
handles and use the entire set of Notification Service APIs that operate 
on these new handles; however, invocation of APIs that operate on 
handles acquired by any process before the cluster node left the 
membership will continue to fail with SA_AIS_ERR_UNAVAILABLE with the 
exception of saNtfFinalize(), which is used to free the library handles 
and all resources associated with these handles. Hence, it is 
recommended for the processes to finalize the library handles as soon as 
the processes detect that the cluster node left the membership."
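
To make the flow I have in mind concrete, here is a minimal sketch of how a subscriber could react to dispatch results. It is not the ntfa code: rc_t, stats_t and replay_dispatch() are hypothetical names, the return codes are local stand-ins for the SaAisErrorT values, and the real calls (saNtfFinalize(), then a new saNtfInitialize() and saNtfNotificationSubscribe() once the node is a member again) are only represented by counters.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-ins for the SaAisErrorT codes discussed above (not the saAis.h values). */
typedef enum { RC_OK, RC_ERR_BAD_HANDLE, RC_ERR_UNAVAILABLE } rc_t;

/* Counters describing what the subscriber did during a replay. */
typedef struct {
    int delivered;      /* dispatch calls that succeeded */
    int finalized;      /* handles released after ERR_UNAVAILABLE */
    int reinitialized;  /* new handles created after the node rejoined */
    bool stopped;       /* true if the loop gave up (handle unusable) */
} stats_t;

/*
 * Replay a scripted sequence of dispatch results.
 * Assumption: after ERR_UNAVAILABLE the application finalizes at once,
 * waits for the node to rejoin (the wait is not modelled here), then
 * initializes and subscribes again before the next dispatch.
 */
stats_t replay_dispatch(const rc_t *script, size_t n)
{
    stats_t s = {0, 0, 0, false};
    assert(script != NULL || n == 0);

    for (size_t i = 0; i < n && !s.stopped; i++) {
        switch (script[i]) {
        case RC_OK:
            s.delivered++;
            break;
        case RC_ERR_UNAVAILABLE:
            /* Old handle is dead: free it, then start over with a new one. */
            s.finalized++;
            s.reinitialized++;
            break;
        case RC_ERR_BAD_HANDLE:
        default:
            /* Assumed: handle already gone on the agent side, so stop. */
            s.stopped = true;
            break;
        }
    }
    return s;
}
```

The point of the sketch is only the branch on ERR_UNAVAILABLE: the earlier the subscriber sees it, the earlier it can finalize and be ready with a new handle when the node is unlocked.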

Thanks,
Minh
>
>
>> Another issue, not related to this ticket: ntftool does not
>> handle SA_AIS_ERR_UNAVAILABLE. I get ntfsubscriber looping indefinitely
>> in saNtfDispatch() calls once it receives SA_AIS_ERR_UNAVAILABLE.
>>
> [Praveen] I will fix this as part of #1745.
>
>
> Thanks,
> Praveen
>> Thanks,
>> Minh
>>>
>>> Thanks,
>>> Praveen
>>>> Thanks,
>>>> Minh
>>>>
>>>> On 11/04/16 15:46, praveen.malv...@oracle.com wrote:
>>>>> osaf/libs/agents/saf/ntfa/ntfa_api.c |  28 ++++++++++++++++++----------
>>>>>   1 files changed, 18 insertions(+), 10 deletions(-)
>>>>>
>>>>>
>>>>> During headless state, OpenSAF may get stopped on a payload with an NTF
>>>>> app running.
>>>>> Since OpenSAF is not running on the payload, an A.01.02 NTF client
>>>>> should not be served on this node and should not be recovered. After the
>>>>> first controller comes up, the A.01.02 client will not be recovered and
>>>>> the application will get SA_AIS_ERR_UNAVAILABLE, upon which it can call
>>>>> saNtfFinalize() to free the resources.
>>>>>
>>>>> diff --git a/osaf/libs/agents/saf/ntfa/ntfa_api.c b/osaf/libs/agents/saf/ntfa/ntfa_api.c
>>>>> --- a/osaf/libs/agents/saf/ntfa/ntfa_api.c
>>>>> +++ b/osaf/libs/agents/saf/ntfa/ntfa_api.c
>>>>> @@ -966,7 +966,8 @@ SaAisErrorT reinitializeClient(ntfa_clie
>>>>>       }
>>>>>       if ((rc = o_msg->info.api_resp_info.rc) != SA_AIS_OK) {
>>>>>           TRACE("info.api_resp_info.rc:%u",
>>>>> o_msg->info.api_resp_info.rc);
>>>>> -        rc = SA_AIS_ERR_BAD_HANDLE;
>>>>> +        if (rc != SA_AIS_ERR_UNAVAILABLE)
>>>>> +            rc = SA_AIS_ERR_BAD_HANDLE;
>>>>>           goto done;
>>>>>       }
>>>>> @@ -1033,7 +1034,8 @@ SaAisErrorT recoverReader(ntfa_client_hd
>>>>>       osafassert(o_msg != NULL);
>>>>>       if ((rc = o_msg->info.api_resp_info.rc) != SA_AIS_OK) {
>>>>>           TRACE("o_msg->info.api_resp_info.rc:%u",
>>>>> o_msg->info.api_resp_info.rc);
>>>>> -        rc = SA_AIS_ERR_BAD_HANDLE;
>>>>> +        if (rc != SA_AIS_ERR_UNAVAILABLE)
>>>>> +            rc = SA_AIS_ERR_BAD_HANDLE;
>>>>>           goto done;
>>>>>       }
>>>>> @@ -1108,7 +1110,8 @@ SaAisErrorT recoverSubscriber(ntfa_clien
>>>>>       if ((rc = o_msg->info.api_resp_info.rc) != SA_AIS_OK) {
>>>>>           TRACE("o_msg->info.api_resp_info.rc:%u",
>>>>> o_msg->info.api_resp_info.rc);
>>>>> -        rc = SA_AIS_ERR_BAD_HANDLE;
>>>>> +        if (rc != SA_AIS_ERR_UNAVAILABLE)
>>>>> +            rc = SA_AIS_ERR_BAD_HANDLE;
>>>>>           goto done;
>>>>>       }
>>>>> @@ -1437,7 +1440,7 @@ SaAisErrorT saNtfDispatch(SaNtfHandleT n
>>>>>       if (!hdl_rec->valid) {
>>>>>           /* recovery */
>>>>>           if ((rc = recoverClient(hdl_rec)) != SA_AIS_OK) {
>>>>> -            if ((rc == SA_AIS_ERR_BAD_HANDLE) || (rc ==
>>>>> SA_AIS_ERR_UNAVAILABLE)) {
>>>>> +            if (rc == SA_AIS_ERR_BAD_HANDLE) {
>>>>>                   ncshm_give_hdl(ntfHandle);
>>>>> osafassert(pthread_mutex_lock(&ntfa_cb.cb_lock) == 0);
>>>>> ntfa_hdl_rec_force_del(&ntfa_cb.client_list, hdl_rec);
>>>>> @@ -1445,6 +1448,11 @@ SaAisErrorT saNtfDispatch(SaNtfHandleT n
>>>>>                   ntfa_shutdown(false);
>>>>>                   goto done;
>>>>>               }
>>>>> +            if (rc == SA_AIS_ERR_UNAVAILABLE) {
>>>>> +                TRACE("Node not CLM member or stale client");
>>>>> +                ncshm_give_hdl(ntfHandle);
>>>>> +                goto done;
>>>>> +            }
>>>>>           }
>>>>>       }
>>>>> @@ -1807,7 +1815,7 @@ SaAisErrorT saNtfNotificationSend(SaNtfN
>>>>>           if ((rc = recoverClient(client_rec)) != SA_AIS_OK) {
>>>>>               ncshm_give_hdl(client_handle);
>>>>>               ncshm_give_hdl(notificationHandle);
>>>>> -            if ((rc == SA_AIS_ERR_BAD_HANDLE) || (rc ==
>>>>> SA_AIS_ERR_UNAVAILABLE)) {
>>>>> +            if (rc == SA_AIS_ERR_BAD_HANDLE) {
>>>>> osafassert(pthread_mutex_lock(&ntfa_cb.cb_lock) == 0);
>>>>> ntfa_hdl_rec_force_del(&ntfa_cb.client_list,
>>>>> client_rec);
>>>>> osafassert(pthread_mutex_unlock(&ntfa_cb.cb_lock) ==
>>>>> 0);
>>>>> @@ -2153,7 +2161,7 @@ SaAisErrorT saNtfNotificationSubscribe(c
>>>>>           if (notificationFilterHandles->alarmFilterHandle)
>>>>>
>>>>> ncshm_give_hdl(notificationFilterHandles->alarmFilterHandle);
>>>>>       }
>>>>> -    if (recovery_failed && ((rc == SA_AIS_ERR_BAD_HANDLE) || (rc ==
>>>>> SA_AIS_ERR_UNAVAILABLE))) {
>>>>> +    if (recovery_failed && (rc == SA_AIS_ERR_BAD_HANDLE)) {
>>>>> osafassert(pthread_mutex_lock(&ntfa_cb.cb_lock) == 0);
>>>>>           ntfa_hdl_rec_force_del(&ntfa_cb.client_list, 
>>>>> client_hdl_rec);
>>>>> osafassert(pthread_mutex_unlock(&ntfa_cb.cb_lock) == 0);
>>>>> @@ -3355,7 +3363,7 @@ SaAisErrorT saNtfNotificationUnsubscribe
>>>>>       if (!client_hdl_rec->valid && getServerState() ==
>>>>> NTFA_NTFSV_UP) {
>>>>>           if ((rc = recoverClient(client_hdl_rec)) != SA_AIS_OK) {
>>>>> -            if ((rc == SA_AIS_ERR_BAD_HANDLE) || (rc ==
>>>>> SA_AIS_ERR_UNAVAILABLE)) {
>>>>> +            if (rc == SA_AIS_ERR_BAD_HANDLE) {
>>>>>                   ncshm_give_hdl(ntfHandle);
>>>>> osafassert(pthread_mutex_lock(&ntfa_cb.cb_lock) == 0);
>>>>> ntfa_hdl_rec_force_del(&ntfa_cb.client_list,
>>>>> client_hdl_rec);
>>>>> @@ -3517,7 +3525,7 @@ done_give_client_hdl:
>>>>>       }
>>>>> ncshm_give_hdl(notificationFilterHandles->alarmFilterHandle);
>>>>> -    if (recovery_failed && ((rc == SA_AIS_ERR_BAD_HANDLE) || (rc ==
>>>>> SA_AIS_ERR_UNAVAILABLE))) {
>>>>> +    if (recovery_failed && (rc == SA_AIS_ERR_BAD_HANDLE)) {
>>>>> osafassert(pthread_mutex_lock(&ntfa_cb.cb_lock) == 0);
>>>>>           ntfa_hdl_rec_force_del(&ntfa_cb.client_list, 
>>>>> client_hdl_rec);
>>>>> osafassert(pthread_mutex_unlock(&ntfa_cb.cb_lock) == 0);
>>>>> @@ -3621,7 +3629,7 @@ SaAisErrorT saNtfNotificationReadFinaliz
>>>>>       if (!client_hdl_rec->valid && getServerState() ==
>>>>> NTFA_NTFSV_UP) {
>>>>>           if ((rc = recoverClient(client_hdl_rec)) != SA_AIS_OK) {
>>>>> -            if ((rc == SA_AIS_ERR_BAD_HANDLE) || (rc ==
>>>>> SA_AIS_ERR_UNAVAILABLE)) {
>>>>> +            if (rc == SA_AIS_ERR_BAD_HANDLE) {
>>>>> ncshm_give_hdl(client_hdl_rec->local_hdl);
>>>>>                   ncshm_give_hdl(readhandle);
>>>>> osafassert(pthread_mutex_lock(&ntfa_cb.cb_lock) == 0);
>>>>> @@ -3699,7 +3707,7 @@ SaAisErrorT saNtfNotificationReadNext(Sa
>>>>>           if ((rc = recoverClient(client_hdl_rec)) != SA_AIS_OK) {
>>>>> ncshm_give_hdl(client_hdl_rec->local_hdl);
>>>>>               ncshm_give_hdl(readHandle);
>>>>> -            if ((rc == SA_AIS_ERR_BAD_HANDLE) || (rc ==
>>>>> SA_AIS_ERR_UNAVAILABLE)) {
>>>>> +            if (rc == SA_AIS_ERR_BAD_HANDLE) {
>>>>> osafassert(pthread_mutex_lock(&ntfa_cb.cb_lock) == 0);
>>>>> ntfa_hdl_rec_force_del(&ntfa_cb.client_list,
>>>>> client_hdl_rec);
>>>>> osafassert(pthread_mutex_unlock(&ntfa_cb.cb_lock) ==
>>>>> 0);
>>>>>
>>>>
>>>
>>
>

_______________________________________________
Opensaf-devel mailing list
Opensaf-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-devel
