Hi Praveen,
I am thinking of creating a CLM job (like the IMM update and NTF send
jobs) in order to stop tracking at the non-active AMFD.
Would that be a better version than this patch, since AMFD would still
have a chance to stop tracking until it is promoted back to the active
role?
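
To make the idea a bit more concrete, here is a rough sketch of what
such a job could look like. It assumes a simple queue that the main
thread drains when it is idle. All names here (Job, JobResult,
ClmTrackStopJob, JobQueue) are hypothetical and are not existing AMFD
code; the two callbacks stand in for "is this AMFD active?" and "try
saClmClusterTrackStop() once and report whether it succeeded".

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <memory>
#include <utility>

// Hypothetical stand-ins; none of these names come from the OpenSAF source.
enum class JobResult { kDone, kRetryLater };

// A queued unit of work that the main thread executes when it is idle.
class Job {
 public:
  virtual ~Job() = default;
  virtual JobResult Execute() = 0;
};

// Retries "stop tracking" until it succeeds or the role is active again.
class ClmTrackStopJob : public Job {
 public:
  ClmTrackStopJob(std::function<bool()> is_active,
                  std::function<bool()> try_stop)
      : is_active_(std::move(is_active)), try_stop_(std::move(try_stop)) {}

  JobResult Execute() override {
    // Promoted back to active: tracking is wanted again, so drop the job
    // without calling stop at all.
    if (is_active_()) return JobResult::kDone;
    // try_stop_ is assumed to return true once the stop call succeeded.
    return try_stop_() ? JobResult::kDone : JobResult::kRetryLater;
  }

 private:
  std::function<bool()> is_active_;
  std::function<bool()> try_stop_;
};

// Minimal FIFO queue; a job stays at the head until it reports kDone.
class JobQueue {
 public:
  void Push(std::unique_ptr<Job> job) {
    assert(job != nullptr);
    jobs_.push_back(std::move(job));
  }

  // Executes the head job once. Returns true if the job finished and was
  // removed, false if the queue was empty or the job must be retried.
  bool RunOne() {
    if (jobs_.empty()) return false;
    if (jobs_.front()->Execute() == JobResult::kRetryLater) return false;
    jobs_.pop_front();
    return true;
  }

  bool Empty() const { return jobs_.empty(); }

 private:
  std::deque<std::unique_ptr<Job>> jobs_;
};
```

Since the job only ever runs on the main thread, it cannot race with a
track start during promotion, which was the concern with the separate
thread option.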
Thanks,
Minh
On 18/11/16 11:38, minh chau wrote:
> Hi Praveen,
>
> This has been seen once while swapping the 2N OpenSAF SI. It was just
> an error indicating that AMFD failed to stop the track callback; the
> si-swap still succeeded afterwards. The CLM service could be busy
> during switch-over, but AMF (as well as other services) is supposed to
> handle that return code properly, as AMFD already does in track start.
> Further testing exposed a problem caused by the CLM track callback
> being received on the standby AMFD.
>
> AMFD cannot be kept busy just trying to stop the CLM track. The first
> option is for AMFD to start another thread which just tries to stop the
> track callback, but that is likely to generate more problems if this
> standby AMFD is promoted back to active while that thread is still
> trying to stop the track. At that point the main thread would be
> starting the track while the other thread is trying to stop it.
>
> The second option, as in the patch, is that AMFD retries stopping the
> track in the callback, which is where the problem shows up. When the
> standby AMFD switches back to active, we could stop and start the track
> again, but there seems to be no point in doing so. I hope this has less
> impact on existing behavior; do you see any problems?
>
> Regarding a standby controller with locked/disabled CLM, I think it is
> mentioned (in #1828 or somewhere else?) that this matter requires more
> thought as to whether we want an HA role assigned to AMFD on a
> non-member node. It seems to be beyond the scope of this ticket.
>
> Thanks,
> Minh
>
> On 17/11/16 22:03, praveen malviya wrote:
>> Hi Minh,
>>
>> We have not seen this issue earlier. Why is CLMA returning timeout
>> when it is busy?
>> If we allow the standby controller to continue tracking the node,
>> then I think (though I am not fully sure) it will add one more
>> client for the validation step, and this client does not exist locally.
>> We also need to evaluate the situation where the standby controller is
>> CLM locked or disabled.
>>
>> Thanks,
>> Praveen
>>
>>
>>
>>
>> On 17-Nov-16 3:44 AM, minh chau wrote:
>>> Hi all,
>>>
>>> Has anyone had a chance to review this patch?
>>>
>>> Thanks,
>>> Minh
>>>
>>> On 14/11/16 15:27, Minh Hon Chau wrote:
>>>> osaf/services/saf/amf/amfd/clm.cc | 37
>>>> +++++++++++++++++++++++---------
>>>> osaf/services/saf/amf/amfd/include/cb.h | 1 +
>>>> osaf/services/saf/amf/amfd/role.cc | 16 +++++++-------
>>>> 3 files changed, 35 insertions(+), 19 deletions(-)
>>>>
>>>>
>>>> In controller failover/switchover, the active AMFD sometimes fails to
>>>> stop the CLM track callback. Therefore, when this AMFD becomes standby,
>>>> it can continue receiving CLM track callbacks and trigger operations
>>>> that should only be executed on the active AMFD.
>>>>
>>>> diff --git a/osaf/services/saf/amf/amfd/clm.cc
>>>> b/osaf/services/saf/amf/amfd/clm.cc
>>>> --- a/osaf/services/saf/amf/amfd/clm.cc
>>>> +++ b/osaf/services/saf/amf/amfd/clm.cc
>>>> @@ -219,7 +219,13 @@ static void clm_track_cb(const SaClmClus
>>>> LOG_ER("ClmTrackCallback received in error");
>>>> goto done;
>>>> }
>>>> -
>>>> + if (avd_cb->avail_state_avd != SA_AMF_HA_ACTIVE) {
>>>> + if (avd_cb->is_clm_track_started == true) {
>>>> + LOG_NO("Retry to stop clm track with AMFD state(%d)",
>>>> avd_cb->avail_state_avd);
>>>> + avd_clm_track_stop();
>>>> + }
>>>> + goto done;
>>>> + }
>>>> /*
>>>> ** The CLM cluster can be larger than the AMF cluster thus it is
>>>> not an
>>>> ** error if the corresponding AMF node cannot be found.
>>>> @@ -394,6 +400,7 @@ SaAisErrorT avd_clm_init(AVD_CL_CB* cb)
>>>> cb->clmHandle = 0;
>>>> cb->clm_sel_obj = 0;
>>>> + cb->is_clm_track_started = false;
>>>> TRACE_ENTER();
>>>> /*
>>>> * TODO: This CLM initialization thread can be re-factored
>>>> @@ -453,6 +460,8 @@ SaAisErrorT avd_clm_track_start(void)
>>>> } else {
>>>> LOG_ER("Failed to start cluster tracking %u", error);
>>>> }
>>>> + } else {
>>>> + avd_cb->is_clm_track_started = true;
>>>> }
>>>> TRACE_LEAVE();
>>>> return error;
>>>> @@ -460,17 +469,23 @@ SaAisErrorT avd_clm_track_start(void)
>>>> SaAisErrorT avd_clm_track_stop(void)
>>>> {
>>>> - SaAisErrorT error = SA_AIS_OK;
>>>> + SaAisErrorT error = SA_AIS_OK;
>>>> + TRACE_ENTER();
>>>> + error = saClmClusterTrackStop(avd_cb->clmHandle);
>>>> + if (error != SA_AIS_OK) {
>>>> + if (error == SA_AIS_ERR_TRY_AGAIN || error ==
>>>> SA_AIS_ERR_TIMEOUT ||
>>>> + error == SA_AIS_ERR_UNAVAILABLE) {
>>>> + LOG_WA("Failed to stop cluster tracking %u", error);
>>>> + } else {
>>>> + LOG_ER("Failed to stop cluster tracking %u", error);
>>>> + }
>>>> + } else {
>>>> + TRACE("Sucessfully stops cluster tracking");
>>>> + avd_cb->is_clm_track_started = false;
>>>> + }
>>>> - TRACE_ENTER();
>>>> - error = saClmClusterTrackStop(avd_cb->clmHandle);
>>>> - if (SA_AIS_OK != error)
>>>> - LOG_ER("Failed to stop cluster tracking %u", error);
>>>> - else
>>>> - TRACE("Sucessfully stops cluster tracking");
>>>> -
>>>> - TRACE_LEAVE();
>>>> - return error;
>>>> + TRACE_LEAVE();
>>>> + return error;
>>>> }
>>>> void clm_node_terminate(AVD_AVND *node)
>>>> diff --git a/osaf/services/saf/amf/amfd/include/cb.h
>>>> b/osaf/services/saf/amf/amfd/include/cb.h
>>>> --- a/osaf/services/saf/amf/amfd/include/cb.h
>>>> +++ b/osaf/services/saf/amf/amfd/include/cb.h
>>>> @@ -210,6 +210,7 @@ typedef struct cl_cb_tag {
>>>> /* Clm stuff */
>>>> std::atomic<SaClmHandleT> clmHandle;
>>>> std::atomic<SaSelectionObjectT> clm_sel_obj;
>>>> + bool is_clm_track_started;
>>>> bool fully_initialized;
>>>> bool swap_switch; /* true - In middle of role switch. */
>>>> diff --git a/osaf/services/saf/amf/amfd/role.cc
>>>> b/osaf/services/saf/amf/amfd/role.cc
>>>> --- a/osaf/services/saf/amf/amfd/role.cc
>>>> +++ b/osaf/services/saf/amf/amfd/role.cc
>>>> @@ -1050,9 +1050,7 @@ uint32_t amfd_switch_actv_qsd(AVD_CL_CB
>>>> /* Mark AVD as Quiesced. */
>>>> cb->avail_state_avd = SA_AMF_HA_QUIESCED;
>>>>
>>>> - if (avd_clm_track_stop() != SA_AIS_OK) {
>>>> - LOG_ER("ClmTrack stop failed");
>>>> - }
>>>> + avd_clm_track_stop();
>>>> /* Go ahead and set mds role as already the NCS SU has been
>>>> switched */
>>>> if (NCSCC_RC_SUCCESS != (rc = avd_mds_set_vdest_role(cb,
>>>> SA_AMF_HA_QUIESCED))) {
>>>> @@ -1260,11 +1258,13 @@ uint32_t amfd_switch_stdby_actv(AVD_CL_C
>>>> if (NCSCC_RC_SUCCESS != avd_rde_set_role(SA_AMF_HA_ACTIVE)) {
>>>> LOG_ER("rde role change failed from stdy -> Active");
>>>> }
>>>> -
>>>> - if(avd_clm_track_start() != SA_AIS_OK) {
>>>> - LOG_ER("Switch Standby --> Active, clm track start failed");
>>>> - avd_d2d_chg_role_rsp(cb, NCSCC_RC_FAILURE, SA_AMF_HA_ACTIVE);
>>>> - return NCSCC_RC_FAILURE;
>>>> + // reuse clm track start
>>>> + if (avd_cb->is_clm_track_started == false) {
>>>> + if(avd_clm_track_start() != SA_AIS_OK) {
>>>> + LOG_ER("Switch Standby --> Active, clm track start
>>>> failed");
>>>> + avd_d2d_chg_role_rsp(cb, NCSCC_RC_FAILURE,
>>>> SA_AMF_HA_ACTIVE);
>>>> + return NCSCC_RC_FAILURE;
>>>> + }
>>>> }
>>>> /* Send the message to other avd for role change rsp as
>>>> success */
>>>>
>>>
>>
>