Hi Praveen,

Would you consider a quick patch in which the agent posts a dummy callback to the client's mailbox once it detects that the node is a non-member, so that the NTF client can finalize its handle right after that? Otherwise, as your explanation below implies, an NTF user has an implicit dependency on AMF or CLM in this case, and that should be documented.
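
To make the idea concrete, here is a minimal sketch in plain C. It is not OpenSAF code: a pipe stands in for the client's mailbox and selection object, and `mailbox_t`, `agent_post_non_member()`, `client_wait_and_dispatch()` and the `RC_*` codes are hypothetical names invented for illustration only.

```c
#include <assert.h>
#include <poll.h>
#include <stdbool.h>
#include <unistd.h>

/* Hypothetical stand-ins for SaAisErrorT values (not the real saAis.h ones). */
typedef enum { RC_OK, RC_ERR_LIBRARY, RC_ERR_TIMEOUT, RC_ERR_UNAVAILABLE } rc_t;

/* Hypothetical mailbox: the read end of a pipe plays the selection object. */
typedef struct { int rd; int wr; } mailbox_t;

/* One-byte message tags, assumed for this sketch only. */
enum { MSG_NOTIFICATION = 'N', MSG_DUMMY_NON_MEMBER = 'X' };

static bool mailbox_open(mailbox_t *mb)
{
	int fds[2];
	if (pipe(fds) != 0)
		return false;
	mb->rd = fds[0];
	mb->wr = fds[1];
	return true;
}

static void mailbox_close(mailbox_t *mb)
{
	close(mb->rd);
	close(mb->wr);
}

/* Agent side: on learning the node left the membership, post a dummy
 * callback so that a client blocked in poll() wakes up. */
static bool agent_post_non_member(mailbox_t *mb)
{
	char tag = MSG_DUMMY_NON_MEMBER;
	return write(mb->wr, &tag, 1) == 1;
}

/* Client side: wait on the selection object, then dispatch one message.
 * The dummy callback is reported as "unavailable", which is the cue for
 * the application to finalize its handle. */
static rc_t client_wait_and_dispatch(mailbox_t *mb, int timeout_ms)
{
	struct pollfd pfd = { .fd = mb->rd, .events = POLLIN };
	int n = poll(&pfd, 1, timeout_ms);
	if (n < 0)
		return RC_ERR_LIBRARY;
	if (n == 0)
		return RC_ERR_TIMEOUT;

	char tag;
	if (read(mb->rd, &tag, 1) != 1)
		return RC_ERR_LIBRARY;
	return (tag == MSG_DUMMY_NON_MEMBER) ? RC_ERR_UNAVAILABLE : RC_OK;
}
```

With something along these lines, a subscriber whose filter never matches would still be woken from poll() as soon as the node leaves the membership, and could finalize instead of waiting for a notification that may never come.
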
Thanks,
Minh

On 14/04/16 07:01, minh chau wrote:
>
> On 13/04/16 15:43, praveen malviya wrote:
>>
>> On 12-Apr-16 10:24 PM, minh chau wrote:
>>>
>>> On 12/04/16 21:49, praveen malviya wrote:
>>>>
>>>> On 12-Apr-16 3:56 PM, minh chau wrote:
>>>>> Hi Praveen
>>>>>
>>>>> NTF server also accepts initialize request (and here it comes from
>>>>> reinitializeClient() after headless) if NTF server has not
>>>>> initialized with CLM.
>>>>> So after headless, this situation will most likely happen. The
>>>>> recovery would succeed, but after that what if NTF server notifies
>>>>> the agent it is no longer a member, could a subscriber be waiting
>>>>> for notification while agent is not a member anymore?
>>>>>
>>>> There is only one event that can lead to this and that is OpenSAF stop
>>>> on the node as admin operations are not available in headless state.
>>>> But this is the limitation of whole headless solution in every service
>>>> as there is no recovery of CLM status of client node at each director
>>>> and also recovery of clients is being done very early at MDS up event
>>>> of the service.
>>>>
>>> [Minh] Actually, in non-headless this situation also happens. When
>>> client is subscribing for notification, lock a clm node. This client
>>> will not be informed error code SA_AIS_ERR_UNAVAILABLE if its filter
>>> does not match to any notifications. It has to wait until clm node is
>>> unlocked and there is notification to come, so saNtfDispatch will
>>> return SA_AIS_ERR_UNAVAILABLE. But if filter does not match, this
>>> client will be waiting and can't finalize handle.
>>> If this situation is solved in non-headless, the problem stated
>>> above in headless should also be solved by the same solution.
>>>
>> [Praveen] Not only in NTFSv, same logic of waiting for an event to get
>> unblocked from poll() is valid for all the other services
>> applications also as all SAF services are integrated with CLMSv.
>> I do not know whether one should poll indefinitely or not and in case of
>> finite poll time what an application must do after poll times out.
>>
>> But I think, from SAF perspective still this cannot be classified as
>> a problem. The reason is any such application's life cycle is
>> monitored by AMF and AMF terminates such process as part of CLM node
>> eviction. Also CLM provides tracker interface for this purpose only.
>> At the same time, I have observed that for ERR_UNAVAILABLE AMF spec
>> is particularly more clear as it states on section 7.2.1 on page 243
>> ================================
>> However, there are a few special situations in which processes may
>> call Availability Management Framework API functions.
>> • An Availability Management Framework API function is called by a
>> process nearly at the same time when the node exits the cluster and
>> the Availability Management Framework area server on the node has not
>> yet terminated the process.
>> ..........
>> =================================
>> And for above mentioned cases AMF will return ERR_UNAVAILABLE. So it
>> seems ERR_UNAVAILABLE is meant for such special cases. So any
>> application must rely on its own subscription to CLMSv. Or Admin will
>> have to take care of this.
>> I will check other SAF documents like C programming doc and overview
>> doc if something in this context is mentioned.
> [Minh] I think application can be purely NTF client only which does
> not have to initialize with AMF, or maybe I don't understand your idea.
> Let's look at this example: Running subscriber with filter "ABC", lock
> CLM node, unlock CLM node again. Then some applications in cluster
> raise notification ABC.
> With current implementation, this subscriber get notified
> ERR_UNAVAILABLE when notification ABC coming to its mailbox, thus it
> eventually lost this notification ABC.
> But if NTF notified ERR_UNAVAILABLE after locking CLM node, this
> subscriber can earlier finalize its handle with NTF.
> It can wait by somehow until CLM node is unlocked again, or it can
> initialize CLMsv to know when a node becoming a member again. After
> unlock CLM as above example, this subscriber is ready to receive
> notification and when notification ABC comes, subscriber can receive
> it. And I guess this is the idea mentioned in NTF spec:
>
> "If the cluster node rejoins the cluster membership, processes
> executing on the cluster node will be able to reinitialize new library
> handles and use the entire set of Notification Service APIs that
> operate on these new handles; however, invocation of APIs that operate
> on handles acquired by any process before the cluster node left the
> membership will continue to fail with SA_AIS_ERR_UNAVAILABLE with the
> exception of saNtfFinalize(), which is used to free the library
> handles and all resources associated with these handles. Hence, it is
> recommended for the processes to finalize the library handles as soon
> as the processes detect that the cluster node left the membership."
>
> Thanks,
> Minh
>
>>
>>> Another issue but not relate to this ticket, that ntftool does not
>>> handle SA_AIS_ERR_UNAVAILABLE. I get ntfsubscriber indefinite loop in
>>> calling saNtfDispatch() when ntfsubscriber receives
>>> SA_AIS_ERR_UNAVAILABLE.
>>>
>> [Praveen] I will fix this as a part of #1745.
>>
>> Thanks,
>> Praveen
>>> Thanks,
>>> Minh
>>>>
>>>> Thanks,
>>>> Praveen
>>>>> Thanks,
>>>>> Minh
>>>>>
>>>>> On 11/04/16 15:46, praveen.malv...@oracle.com wrote:
>>>>>> osaf/libs/agents/saf/ntfa/ntfa_api.c | 28 ++++++++++++++++++----------
>>>>>> 1 files changed, 18 insertions(+), 10 deletions(-)
>>>>>>
>>>>>> During headless state, OpenSAF may get stopped on payload with
>>>>>> NTF app running.
>>>>>> Since OpenSAF is not running on the payload, any A.01.02 NTF client
>>>>>> should not be served on this node and this client should not be
>>>>>> recovered.
>>>>>> After first controller comes up, A.01.02 client will not be
>>>>>> recovered and application will get SA_AIS_ERR_UNAVAILABLE upon
>>>>>> which an app can call saNtfFinalize() for freeing the resources.
>>>>>>
>>>>>> diff --git a/osaf/libs/agents/saf/ntfa/ntfa_api.c b/osaf/libs/agents/saf/ntfa/ntfa_api.c
>>>>>> --- a/osaf/libs/agents/saf/ntfa/ntfa_api.c
>>>>>> +++ b/osaf/libs/agents/saf/ntfa/ntfa_api.c
>>>>>> @@ -966,7 +966,8 @@ SaAisErrorT reinitializeClient(ntfa_clie
>>>>>>  	}
>>>>>>  	if ((rc = o_msg->info.api_resp_info.rc) != SA_AIS_OK) {
>>>>>>  		TRACE("info.api_resp_info.rc:%u", o_msg->info.api_resp_info.rc);
>>>>>> -		rc = SA_AIS_ERR_BAD_HANDLE;
>>>>>> +		if (rc != SA_AIS_ERR_UNAVAILABLE)
>>>>>> +			rc = SA_AIS_ERR_BAD_HANDLE;
>>>>>>  		goto done;
>>>>>>  	}
>>>>>> @@ -1033,7 +1034,8 @@ SaAisErrorT recoverReader(ntfa_client_hd
>>>>>>  	osafassert(o_msg != NULL);
>>>>>>  	if ((rc = o_msg->info.api_resp_info.rc) != SA_AIS_OK) {
>>>>>>  		TRACE("o_msg->info.api_resp_info.rc:%u", o_msg->info.api_resp_info.rc);
>>>>>> -		rc = SA_AIS_ERR_BAD_HANDLE;
>>>>>> +		if (rc != SA_AIS_ERR_UNAVAILABLE)
>>>>>> +			rc = SA_AIS_ERR_BAD_HANDLE;
>>>>>>  		goto done;
>>>>>>  	}
>>>>>> @@ -1108,7 +1110,8 @@ SaAisErrorT recoverSubscriber(ntfa_clien
>>>>>>  	if ((rc = o_msg->info.api_resp_info.rc) != SA_AIS_OK) {
>>>>>>  		TRACE("o_msg->info.api_resp_info.rc:%u", o_msg->info.api_resp_info.rc);
>>>>>> -		rc = SA_AIS_ERR_BAD_HANDLE;
>>>>>> +		if (rc != SA_AIS_ERR_UNAVAILABLE)
>>>>>> +			rc = SA_AIS_ERR_BAD_HANDLE;
>>>>>>  		goto done;
>>>>>>  	}
>>>>>> @@ -1437,7 +1440,7 @@ SaAisErrorT saNtfDispatch(SaNtfHandleT n
>>>>>>  	if (!hdl_rec->valid) {
>>>>>>  		/* recovery */
>>>>>>  		if ((rc = recoverClient(hdl_rec)) != SA_AIS_OK) {
>>>>>> -			if ((rc == SA_AIS_ERR_BAD_HANDLE) || (rc == SA_AIS_ERR_UNAVAILABLE)) {
>>>>>> +			if (rc == SA_AIS_ERR_BAD_HANDLE) {
>>>>>>  				ncshm_give_hdl(ntfHandle);
>>>>>>  				osafassert(pthread_mutex_lock(&ntfa_cb.cb_lock) == 0);
>>>>>>  				ntfa_hdl_rec_force_del(&ntfa_cb.client_list, hdl_rec);
>>>>>> @@ -1445,6 +1448,11 @@ SaAisErrorT saNtfDispatch(SaNtfHandleT n
>>>>>>  				ntfa_shutdown(false);
>>>>>>  				goto done;
>>>>>>  			}
>>>>>> +			if (rc == SA_AIS_ERR_UNAVAILABLE) {
>>>>>> +				TRACE("Node not CLM member or stale client");
>>>>>> +				ncshm_give_hdl(ntfHandle);
>>>>>> +				goto done;
>>>>>> +			}
>>>>>>  		}
>>>>>>  	}
>>>>>> @@ -1807,7 +1815,7 @@ SaAisErrorT saNtfNotificationSend(SaNtfN
>>>>>>  	if ((rc = recoverClient(client_rec)) != SA_AIS_OK) {
>>>>>>  		ncshm_give_hdl(client_handle);
>>>>>>  		ncshm_give_hdl(notificationHandle);
>>>>>> -		if ((rc == SA_AIS_ERR_BAD_HANDLE) || (rc == SA_AIS_ERR_UNAVAILABLE)) {
>>>>>> +		if (rc == SA_AIS_ERR_BAD_HANDLE) {
>>>>>>  			osafassert(pthread_mutex_lock(&ntfa_cb.cb_lock) == 0);
>>>>>>  			ntfa_hdl_rec_force_del(&ntfa_cb.client_list, client_rec);
>>>>>>  			osafassert(pthread_mutex_unlock(&ntfa_cb.cb_lock) == 0);
>>>>>> @@ -2153,7 +2161,7 @@ SaAisErrorT saNtfNotificationSubscribe(c
>>>>>>  		if (notificationFilterHandles->alarmFilterHandle)
>>>>>>  			ncshm_give_hdl(notificationFilterHandles->alarmFilterHandle);
>>>>>>  	}
>>>>>> -	if (recovery_failed && ((rc == SA_AIS_ERR_BAD_HANDLE) || (rc == SA_AIS_ERR_UNAVAILABLE))) {
>>>>>> +	if (recovery_failed && (rc == SA_AIS_ERR_BAD_HANDLE)) {
>>>>>>  		osafassert(pthread_mutex_lock(&ntfa_cb.cb_lock) == 0);
>>>>>>  		ntfa_hdl_rec_force_del(&ntfa_cb.client_list, client_hdl_rec);
>>>>>>  		osafassert(pthread_mutex_unlock(&ntfa_cb.cb_lock) == 0);
>>>>>> @@ -3355,7 +3363,7 @@ SaAisErrorT saNtfNotificationUnsubscribe
>>>>>>  	if (!client_hdl_rec->valid && getServerState() == NTFA_NTFSV_UP) {
>>>>>>  		if ((rc = recoverClient(client_hdl_rec)) != SA_AIS_OK) {
>>>>>> -			if ((rc == SA_AIS_ERR_BAD_HANDLE) || (rc == SA_AIS_ERR_UNAVAILABLE)) {
>>>>>> +			if (rc == SA_AIS_ERR_BAD_HANDLE) {
>>>>>>  				ncshm_give_hdl(ntfHandle);
>>>>>>  				osafassert(pthread_mutex_lock(&ntfa_cb.cb_lock) == 0);
>>>>>>  				ntfa_hdl_rec_force_del(&ntfa_cb.client_list,
>>>>>>  					client_hdl_rec);
>>>>>> @@ -3517,7 +3525,7 @@ done_give_client_hdl:
>>>>>>  	}
>>>>>>  	ncshm_give_hdl(notificationFilterHandles->alarmFilterHandle);
>>>>>> -	if (recovery_failed && ((rc == SA_AIS_ERR_BAD_HANDLE) || (rc == SA_AIS_ERR_UNAVAILABLE))) {
>>>>>> +	if (recovery_failed && (rc == SA_AIS_ERR_BAD_HANDLE)) {
>>>>>>  		osafassert(pthread_mutex_lock(&ntfa_cb.cb_lock) == 0);
>>>>>>  		ntfa_hdl_rec_force_del(&ntfa_cb.client_list, client_hdl_rec);
>>>>>>  		osafassert(pthread_mutex_unlock(&ntfa_cb.cb_lock) == 0);
>>>>>> @@ -3621,7 +3629,7 @@ SaAisErrorT saNtfNotificationReadFinaliz
>>>>>>  	if (!client_hdl_rec->valid && getServerState() == NTFA_NTFSV_UP) {
>>>>>>  		if ((rc = recoverClient(client_hdl_rec)) != SA_AIS_OK) {
>>>>>> -			if ((rc == SA_AIS_ERR_BAD_HANDLE) || (rc == SA_AIS_ERR_UNAVAILABLE)) {
>>>>>> +			if (rc == SA_AIS_ERR_BAD_HANDLE) {
>>>>>>  				ncshm_give_hdl(client_hdl_rec->local_hdl);
>>>>>>  				ncshm_give_hdl(readhandle);
>>>>>>  				osafassert(pthread_mutex_lock(&ntfa_cb.cb_lock) == 0);
>>>>>> @@ -3699,7 +3707,7 @@ SaAisErrorT saNtfNotificationReadNext(Sa
>>>>>>  	if ((rc = recoverClient(client_hdl_rec)) != SA_AIS_OK) {
>>>>>>  		ncshm_give_hdl(client_hdl_rec->local_hdl);
>>>>>>  		ncshm_give_hdl(readHandle);
>>>>>> -		if ((rc == SA_AIS_ERR_BAD_HANDLE) || (rc == SA_AIS_ERR_UNAVAILABLE)) {
>>>>>> +		if (rc == SA_AIS_ERR_BAD_HANDLE) {
>>>>>>  			osafassert(pthread_mutex_lock(&ntfa_cb.cb_lock) == 0);
>>>>>>  			ntfa_hdl_rec_force_del(&ntfa_cb.client_list, client_hdl_rec);
>>>>>>  			osafassert(pthread_mutex_unlock(&ntfa_cb.cb_lock) == 0);
>>>>>>
>>>>>
>>>>
>>>
>>
>