Regarding ticket [#1781], I think that one requires some more thought. First of all, do we want to assign the STANDBY role to the OpenSAF directors running on a CLM locked node? If we do, then CLM ought to provide service to middleware clients running on a locked node, which means CLM must differentiate between middleware and non-middleware clients.
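To make the distinction concrete, here is a minimal sketch of the decision I have in mind. All names in it (ClientKind, NodeAdminState, ClmAnswer, AnswerFor) are made up for illustration and are not existing OpenSAF code. It also assumes that non-middleware clients on a locked node would keep getting an "unavailable" answer, which is my assumption and not something settled in the ticket:

```cpp
// Hypothetical types for illustration only; none of these exist in OpenSAF.
enum class ClientKind { kMiddleware, kApplication };
enum class NodeAdminState { kUnlocked, kLocked };
enum class ClmAnswer { kOk, kUnavailable };

// Sketch of the proposed policy: an unlocked node serves every client.
// A locked node still serves middleware clients (so that a director
// running there could take the STANDBY role), while application clients
// are assumed to get "unavailable".
constexpr ClmAnswer AnswerFor(NodeAdminState node, ClientKind client) {
  return (node == NodeAdminState::kUnlocked ||
          client == ClientKind::kMiddleware)
             ? ClmAnswer::kOk
             : ClmAnswer::kUnavailable;
}
```

With such a rule, a director on a locked node would get through CLM initialization instead of retrying forever, while ordinary applications on that node would see no change.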
By the way, saClmInitialize_4() and saClmSelectionObjectGet() should not return ERR_UNAVAILABLE on configured non-member nodes; they shall only return ERR_UNAVAILABLE on unconfigured nodes.

regards,
Anders Widell

On 07/20/2016 12:55 PM, praveen malviya wrote:
> Hi Minh,
>
> For the ticket #1812 I had given one comment.
> It was:
> "In the fix of ticket #1781, it has been suggested for spare controllers
> to init with CLM before becoming AMF role aware. Here also the same problem
> will come. If a spare is running on a CLM locked node then it will never
> come out of avd_clm_init() as ERR_UNAVAILABLE is handled there for
> reinit. Also the admin will not be able to unlock it until one of the
> controllers joins. In this particular case AMFD must exit instead of
> indefinitely calling the CLM init API.
> Also this fix will cause re-init with CLM in the non-headless case also. I
> think in the non-headless case it will be good to initialize with CLM in a
> separate thread"
>
> I think it is valid here also, or is it handled?
> Thanks,
> Praveen
> On 11-Jul-16 1:01 PM, Minh Hon Chau wrote:
>>  osaf/services/saf/amf/amfd/clm.cc         |  103 ++++++++++++++++++++++++++----
>>  osaf/services/saf/amf/amfd/include/cb.h   |    7 +-
>>  osaf/services/saf/amf/amfd/include/clm.h  |    6 +-
>>  osaf/services/saf/amf/amfd/include/ntf.h  |    3 +
>>  osaf/services/saf/amf/amfd/main.cc        |   20 ++++-
>>  osaf/services/saf/amf/amfd/ntf.cc         |   85 +++++++++++++++++++++++++
>>  osaf/services/saf/amf/amfd/role.cc        |   15 +---
>>  7 files changed, 204 insertions(+), 35 deletions(-)
>>
>>
>> In new controller reallocation scenario with roaming sc feature, if immnd
>> dies in the node becoming active, the circular dependencies among Opensaf
>> services appear, which leads eventually to a reboot.
>>
>> The dependencies are:
>> .clmd can not use IMM services since immnd dies
>> .immnd needs to be restarted by amfnd
>> .amfnd is hanging since amfnd is calling CLM services
>> .amfd is also hanging since amfd is calling CLM and NTF services
>> .ntfd is hanging due to logd's dependencies on IMM
>>
>> The problem could be solved if:
>> . amfd initializes NTF, CLM handle in thread in initialization phase
>> . amfnd initializes CLM in thread if amfnd receives clm bad handle
>>
>> Since amfnd has already initialized CLM in thread upon receiving clm bad
>> handle, this patch does initialize CLM, NTF in thread at amfd side. Also,
>> threading initialization in this patch can be refactored later by utilizing
>> the support of #1609
>>
>> diff --git a/osaf/services/saf/amf/amfd/clm.cc b/osaf/services/saf/amf/amfd/clm.cc
>> --- a/osaf/services/saf/amf/amfd/clm.cc
>> +++ b/osaf/services/saf/amf/amfd/clm.cc
>> @@ -386,14 +386,26 @@ static const SaClmCallbacksT_4 clm_callb
>>    /*.saClmClusterTrackCallback =*/ clm_track_cb
>>  };
>>
>> -SaAisErrorT avd_clm_init(void)
>> +SaAisErrorT avd_clm_init(AVD_CL_CB* cb)
>>  {
>> -  SaAisErrorT error = SA_AIS_OK;
>> +  SaAisErrorT error = SA_AIS_OK;
>> +  SaClmHandleT clm_handle = 0;
>> +  SaSelectionObjectT sel_obj = 0;
>>
>> +  cb->clmHandle = 0;
>> +  cb->clm_sel_obj = 0;
>>    TRACE_ENTER();
>> +  /*
>> +   * TODO: This CLM initialization thread can be re-factored
>> +   * after having osaf dedicated thread, so that all APIs calls
>> +   * to external service can be automatically retried with result
>> +   * code (TRY_AGAIN, TIMEOUT, UNAVAILABLE), or reinitialized within
>> +   * BAD_HANDLE. Also, duplicated codes in initialization thread
>> +   * will be moved to osaf dedicated thread
>> +   */
>>    for (;;) {
>>      SaVersionT Version = { 'B', 4, 1 };
>> -    error = saClmInitialize_4(&avd_cb->clmHandle, &clm_callbacks, &Version);
>> +    error = saClmInitialize_4(&clm_handle, &clm_callbacks, &Version);
>>      if (error == SA_AIS_ERR_TRY_AGAIN ||
>>          error == SA_AIS_ERR_TIMEOUT ||
>>          error == SA_AIS_ERR_UNAVAILABLE) {
>> @@ -404,15 +416,21 @@ SaAisErrorT avd_clm_init(void)
>>        osaf_nanosleep(&kHundredMilliseconds);
>>        continue;
>>      }
>> -    if (error == SA_AIS_OK) break;
>> -    LOG_ER("Failed to Initialize with CLM: %u", error);
>> +    if (error == SA_AIS_OK) {
>> +      break;
>> +    }else {
>> +      LOG_ER("Failed to Initialize with CLM: %u", error);
>> +      goto done;
>> +    }
>> +  }
>> +  cb->clmHandle = clm_handle;
>> +  error = saClmSelectionObjectGet(cb->clmHandle, &sel_obj);
>> +  if (error != SA_AIS_OK) {
>> +    LOG_ER("Failed to get selection object from CLM %u", error);
>> +    cb->clmHandle = 0;
>>      goto done;
>>    }
>> -  error = saClmSelectionObjectGet(avd_cb->clmHandle, &avd_cb->clm_sel_obj);
>> -  if (SA_AIS_OK != error) {
>> -    LOG_ER("Failed to get selection object from CLM %u", error);
>> -    goto done;
>> -  }
>> +  cb->clm_sel_obj = sel_obj;
>>
>>    TRACE("Successfully initialized CLM");
>>
>> @@ -428,10 +446,15 @@ SaAisErrorT avd_clm_track_start(void)
>>
>>    TRACE_ENTER();
>>    error = saClmClusterTrack_4(avd_cb->clmHandle, trackFlags, nullptr);
>> -  if (SA_AIS_OK != error)
>> -    LOG_ER("Failed to start cluster tracking %u", error);
>> -
>> -  TRACE_LEAVE();
>> +  if (error != SA_AIS_OK) {
>> +    if (error == SA_AIS_ERR_TRY_AGAIN || error == SA_AIS_ERR_TIMEOUT ||
>> +        error == SA_AIS_ERR_UNAVAILABLE) {
>> +      LOG_WA("Failed to start cluster tracking %u", error);
>> +    } else {
>> +      LOG_ER("Failed to start cluster tracking %u", error);
>> +    }
>> +  }
>> +  TRACE_LEAVE();
>>    return error;
>>  }
>>
>> @@ -468,3 +491,55 @@ void clm_node_terminate(AVD_AVND *node)
>>    else
>>      TRACE("Waiting for the pending SU presence state updates");
>>  }
>> +
>> +static void* avd_clm_init_thread(void* arg)
>> +{
>> +  TRACE_ENTER();
>> +  AVD_CL_CB* cb = static_cast<AVD_CL_CB*>(arg);
>> +  SaAisErrorT error = SA_AIS_OK;
>> +
>> +  if (avd_clm_init(cb) != SA_AIS_OK) {
>> +    LOG_ER("avd_clm_init FAILED");
>> +    goto done;
>> +  }
>> +
>> +  if (cb->avail_state_avd == SA_AMF_HA_ACTIVE) {
>> +    for (;;) {
>> +      error = avd_clm_track_start();
>> +      if (error == SA_AIS_ERR_TRY_AGAIN ||
>> +          error == SA_AIS_ERR_TIMEOUT ||
>> +          error == SA_AIS_ERR_UNAVAILABLE) {
>> +        osaf_nanosleep(&kHundredMilliseconds);
>> +        continue;
>> +      }
>> +      if (error == SA_AIS_OK) {
>> +        break;
>> +      } else {
>> +        LOG_ER("avd_clm_track_start FAILED, error: %u", error);
>> +        goto done;
>> +      }
>> +    }
>> +  }
>> +
>> +done:
>> +  TRACE_LEAVE();
>> +  return nullptr;
>> +}
>> +
>> +SaAisErrorT avd_start_clm_init_bg(void)
>> +{
>> +  pthread_t thread;
>> +  pthread_attr_t attr;
>> +  pthread_attr_init(&attr);
>> +  pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_DETACHED);
>> +
>> +  if (pthread_create(&thread, &attr, avd_clm_init_thread, avd_cb) != 0) {
>> +    LOG_ER("pthread_create FAILED: %s", strerror(errno));
>> +    exit(EXIT_FAILURE);
>> +  }
>> +
>> +  pthread_attr_destroy(&attr);
>> +  return SA_AIS_OK;
>> +}
>> +
>> +
>> diff --git a/osaf/services/saf/amf/amfd/include/cb.h b/osaf/services/saf/amf/amfd/include/cb.h
>> --- a/osaf/services/saf/amf/amfd/include/cb.h
>> +++ b/osaf/services/saf/amf/amfd/include/cb.h
>> @@ -47,6 +47,7 @@
>>
>>  #include <list>
>>  #include <queue>
>> +#include <atomic>
>>
>>  class AVD_SI;
>>  class AVD_AVND;
>> @@ -193,7 +194,7 @@ typedef struct cl_cb_tag {
>>      since the cluster boot time */
>>
>>    /********** NTF related params ***********/
>> -  SaNtfHandleT ntfHandle;
>> +  std::atomic<SaNtfHandleT> ntfHandle;
>>
>>    /********Peer AvD related*********************/
>>    AVD_EXT_COMP_INFO ext_comp_info;
>> @@ -207,8 +208,8 @@ typedef struct cl_cb_tag {
>>    bool is_implementer;
>>
>>    /* Clm stuff */
>> -  SaClmHandleT clmHandle;
>> -  SaSelectionObjectT clm_sel_obj;
>> +  std::atomic<SaClmHandleT> clmHandle;
>> +  std::atomic<SaSelectionObjectT> clm_sel_obj;
>>
>>    bool fully_initialized;
>>    bool swap_switch; /* true - In middle of role switch. */
>> diff --git a/osaf/services/saf/amf/amfd/include/clm.h b/osaf/services/saf/amf/amfd/include/clm.h
>> --- a/osaf/services/saf/amf/amfd/include/clm.h
>> +++ b/osaf/services/saf/amf/amfd/include/clm.h
>> @@ -21,10 +21,14 @@
>>  #ifndef _AVD_CLM_H
>>  #define _AVD_CLM_H
>>
>> -extern SaAisErrorT avd_clm_init(void);
>> +struct cl_cb_tag;
>> +
>> +
>> +extern SaAisErrorT avd_clm_init(struct cl_cb_tag*);
>>  extern SaAisErrorT avd_clm_track_start(void);
>>  extern SaAisErrorT avd_clm_track_stop(void);
>>  extern void clm_node_terminate(AVD_AVND *node);
>> +extern SaAisErrorT avd_start_clm_init_bg(void);
>>
>>  #endif
>>
>> diff --git a/osaf/services/saf/amf/amfd/include/ntf.h b/osaf/services/saf/amf/amfd/include/ntf.h
>> --- a/osaf/services/saf/amf/amfd/include/ntf.h
>> +++ b/osaf/services/saf/amf/amfd/include/ntf.h
>> @@ -105,4 +105,7 @@ void avd_alarm_clear(const SaNameT *name
>>
>>  void avd_send_error_report_ntf(const SaNameT *name, SaAmfRecommendedRecoveryT recovery);
>>
>> +extern SaAisErrorT avd_ntf_init(struct cl_cb_tag*);
>> +extern SaAisErrorT avd_start_ntf_init_bg(void);
>> +
>>  #endif
>> diff --git a/osaf/services/saf/amf/amfd/main.cc b/osaf/services/saf/amf/amfd/main.cc
>> --- a/osaf/services/saf/amf/amfd/main.cc
>> +++ b/osaf/services/saf/amf/amfd/main.cc
>> @@ -578,12 +578,21 @@ static uint32_t initialize(void)
>>
>>    // CLM init is independent of this SC's role. Init with CLM early.
>>
>> -  if (avd_clm_init() != SA_AIS_OK) {
>> +  cb->avail_state_avd = role;
>> +
>> +  if (avd_start_clm_init_bg() != SA_AIS_OK) {
>>      LOG_EM("avd_clm_init FAILED");
>>      rc = NCSCC_RC_FAILURE;
>>      goto done;
>>    }
>>
>> +  // Initialize NTF handle in thread
>> +  if (avd_start_ntf_init_bg() != SA_AIS_OK) {
>> +    LOG_EM("avd_start_ntf_init_bg FAILED");
>> +    goto done;
>> +  }
>> +
>> +
>>    if ((rc = initialize_for_assignment(cb, role)) != NCSCC_RC_SUCCESS) {
>>      LOG_ER("initialize_for_assignment FAILED %u", (unsigned) rc);
>> @@ -633,11 +642,14 @@ static void main_loop(void)
>>    while (1) {
>>      fds[FD_MBCSV].fd = cb->mbcsv_sel_obj;
>>      fds[FD_MBCSV].events = POLLIN;
>> -    fds[FD_CLM].fd = cb->clm_sel_obj;
>> -    fds[FD_CLM].events = POLLIN;
>>      fds[FD_IMM].fd = cb->imm_sel_obj; // IMM fd must be last in array
>>      fds[FD_IMM].events = POLLIN;
>> -
>> +
>> +    if (cb->clmHandle != 0) {
>> +      fds[FD_CLM].fd = cb->clm_sel_obj;
>> +      fds[FD_CLM].events = POLLIN;
>> +    }
>> +
>>      if (cb->immOiHandle != 0) {
>>        fds[FD_IMM].fd = cb->imm_sel_obj;
>>        fds[FD_IMM].events = POLLIN;
>> diff --git a/osaf/services/saf/amf/amfd/ntf.cc b/osaf/services/saf/amf/amfd/ntf.cc
>> --- a/osaf/services/saf/amf/amfd/ntf.cc
>> +++ b/osaf/services/saf/amf/amfd/ntf.cc
>> @@ -25,6 +25,7 @@
>>  #include <logtrace.h>
>>  #include <util.h>
>>  #include <ntf.h>
>> +#include "osaf_time.h"
>>
>>
>>  /*****************************************************************************
>>    Name : avd_send_comp_inst_failed_alarm
>> @@ -572,6 +573,12 @@ uint32_t sendAlarmNotificationAvd(AVD_CL
>>      return status;
>>    }
>>
>> +  if (avd_cb->ntfHandle == 0) {
>> +    LOG_ER("NTF handle has not been initialized, alarm notification "
>> +           "for (%s) will be lost", ntf_object.value);
>> +    return status;
>> +  }
>> +
>>    if (type != 0) {
>>      add_info_items = 1;
>>      allocation_size = SA_NTF_ALLOC_SYSTEM_LIMIT;
>> @@ -660,6 +667,13 @@ uint32_t sendStateChangeNotificationAvd(
>>      LOG_WA("State change notification lost for '%s'", ntf_object.value);
>>      return status;
>>    }
>> +
>> +  if (avd_cb->ntfHandle == 0) {
>> +    LOG_WA("NTF handle has not been initialized, state change notification "
>> +           "for (%s) will be lost", ntf_object.value);
>> +    return status;
>> +  }
>> +
>>    if (additional_info_is_present == true) {
>>      add_info_items = 1;
>>      allocation_size = SA_NTF_ALLOC_SYSTEM_LIMIT;
>> @@ -770,4 +784,75 @@ void avd_send_error_report_ntf(const SaN
>>    TRACE_LEAVE();
>>  }
>>
>> +SaAisErrorT avd_ntf_init(AVD_CL_CB* cb)
>> +{
>> +  SaAisErrorT error = SA_AIS_OK;
>> +  SaNtfHandleT ntf_handle;
>> +  TRACE_ENTER();
>>
>> +  // reset handle
>> +  cb->ntfHandle = 0;
>> +
>> +  /*
>> +   * TODO: to be re-factored as CLM initialization thread
>> +   */
>> +  for (;;) {
>> +    SaVersionT ntfVersion = { 'A', 0x01, 0x01 };
>> +
>> +    error = saNtfInitialize(&ntf_handle, NULL, &ntfVersion);
>> +    if (error == SA_AIS_ERR_TRY_AGAIN ||
>> +        error == SA_AIS_ERR_TIMEOUT ||
>> +        error == SA_AIS_ERR_UNAVAILABLE) {
>> +      if (error != SA_AIS_ERR_TRY_AGAIN) {
>> +        LOG_WA("saNtfInitialize returned %u",
>> +               (unsigned) error);
>> +      }
>> +      osaf_nanosleep(&kHundredMilliseconds);
>> +      continue;
>> +    }
>> +    if (error == SA_AIS_OK) {
>> +      break;
>> +    } else {
>> +      LOG_ER("Failed to Initialize with NTF: %u", error);
>> +      goto done;
>> +    }
>> +  }
>> +  cb->ntfHandle = ntf_handle;
>> +  TRACE("Successfully initialized NTF");
>> +
>> +done:
>> +  TRACE_LEAVE();
>> +  return error;
>> +}
>> +
>> +static void* avd_ntf_init_thread(void* arg)
>> +{
>> +  TRACE_ENTER();
>> +  AVD_CL_CB* cb = static_cast<AVD_CL_CB*>(arg);
>> +
>> +  if (avd_ntf_init(cb) != SA_AIS_OK) {
>> +    LOG_ER("avd_clm_init FAILED");
>> +    goto done;
>> +  }
>> +
>> +done:
>> +  TRACE_LEAVE();
>> +  return nullptr;
>> +}
>> +
>> +SaAisErrorT avd_start_ntf_init_bg(void)
>> +{
>> +  pthread_t thread;
>> +  pthread_attr_t attr;
>> +  pthread_attr_init(&attr);
>> +  pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_DETACHED);
>> +
>> +  if (pthread_create(&thread, &attr, avd_ntf_init_thread, avd_cb) != 0) {
>> +    LOG_ER("pthread_create FAILED: %s", strerror(errno));
>> +    exit(EXIT_FAILURE);
>> +  }
>> +
>> +  pthread_attr_destroy(&attr);
>> +
>> +  return SA_AIS_OK;
>> +}
>> diff --git a/osaf/services/saf/amf/amfd/role.cc b/osaf/services/saf/amf/amfd/role.cc
>> --- a/osaf/services/saf/amf/amfd/role.cc
>> +++ b/osaf/services/saf/amf/amfd/role.cc
>> @@ -174,9 +174,8 @@ void avd_role_change_evh(AVD_CL_CB *cb,
>>  uint32_t initialize_for_assignment(cl_cb_tag* cb, SaAmfHAStateT ha_state)
>>  {
>>    TRACE_ENTER2("ha_state = %d", static_cast<int>(ha_state));
>> -  SaVersionT ntfVersion = {'A', 0x01, 0x01};
>>    uint32_t rc = NCSCC_RC_SUCCESS;
>> -  SaAisErrorT error;
>> +
>>    if (cb->fully_initialized) goto done;
>>    cb->avail_state_avd = ha_state;
>>    if (ha_state == SA_AMF_HA_QUIESCED) {
>> @@ -199,12 +198,7 @@ uint32_t initialize_for_assignment(cl_cb
>>      rc = NCSCC_RC_FAILURE;
>>      goto done;
>>    }
>> -  if ((error = saNtfInitialize(&cb->ntfHandle, nullptr, &ntfVersion)) !=
>> -      SA_AIS_OK) {
>> -    LOG_ER("saNtfInitialize Failed (%u)", error);
>> -    rc = NCSCC_RC_FAILURE;
>> -    goto done;
>> -  }
>> +
>>    if ((rc = avd_mds_set_vdest_role(cb, ha_state)) != NCSCC_RC_SUCCESS) {
>>      LOG_ER("avd_mds_set_vdest_role FAILED");
>>      goto done;
>> @@ -273,11 +267,6 @@ uint32_t avd_active_role_initialization(
>>
>>    avd_imm_update_runtime_attrs();
>>
>> -  if (avd_clm_track_start() != SA_AIS_OK) {
>> -    LOG_ER("avd_clm_track_start FAILED");
>> -    goto done;
>> -  }
>> -
>>    status = NCSCC_RC_SUCCESS;
>>  done:
>>    TRACE_LEAVE();
>>
> ------------------------------------------------------------------------------
> What NetFlow Analyzer can do for you? Monitors network bandwidth and traffic
> patterns at an interface-level. Reveals which users, apps, and protocols are
> consuming the most bandwidth. Provides multi-vendor support for NetFlow,
> J-Flow, sFlow and other flows. Make informed decisions using capacity planning
> reports. http://sdm.link/zohodev2dev
> _______________________________________________
> Opensaf-devel mailing list
> Opensaf-devel@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/opensaf-devel