Hi Minh,

Please see my responses inline, marked with [Praveen].

Thanks,
Praveen

On 31-Aug-16 11:19 AM, minh chau wrote:
> Hi Praveen
>
> I think @admin_ng needs to be restored as well as
> @ng_using_saAmfSGAdminState, since I just found that the @admin_ng check
> is required in avd_node_down_appl_susi_failover(), which happens if a
> node having a pending csi callback reboots.

[Praveen] Inside avd_node_down_appl_susi_failover(), the reasons for calling process_su_si_response_for_ng() in the non-headless case are:
- If this happens to be the last node in the nodegroup on which the nodegroup operation was going on, then move the node/NG from SHUTTING_DOWN to LOCKED, clear ng_using_saAmfSGAdminState and respond to IMM, because we will not get any su_si_assign() event to trigger it again.
- Even if this is not the last node, this node should still move from SHUTTING_DOWN to LOCKED, because there will not be any further trigger (when the whole SG is mapped in the NG).

Can we think of a possibility of making it an OR with @ng_using_saAmfSGAdminState here also, in avd_node_down_appl_susi_failover()?

a) With this small fix in the headless case:
After headless, if some node faults again, AMFD will still call process_su_si_response_for_ng() for the case where the whole SG is mapped. But what about the other case, when the whole SG is not mapped? In this case node_fail_su_oper() would have moved at least the node from SHUTTING_DOWN to LOCKED. Then how do we move the NG from SHUTTING_DOWN to LOCKED, given that admin_ng is NULL and ng_using_saAmfSGAdminState is not set for this case? I think this can be done deductively, using the headless state variable and other facts like node->admin_node_pend_cbk.invocation and ng->admin_ng_pend_cbk.admin_oper after headless, to cross-verify whether this really is the context of an admin operation from before headless or from after headless. Based on this deduction, we need to move the node from SHUTTING_DOWN to LOCKED.

b) With this small fix in the non-headless case:
There should not be any impact, because admin_ng is not NULL and AMFD was already making the call.

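
To make the OR condition and the two bullets above concrete, here is a minimal sketch. It assumes simplified stand-in types: Node, Nodegroup, ServiceGroup and the helpers needs_ng_response_processing(), finish_shutdown() and handle_node_down() are hypothetical and are not the real AMFD classes or functions. Only the field names admin_ng, ng_using_saAmfSGAdminState and sg_ncs_spec come from this thread. The rule used to detect the "last node" is also an assumption.

```cpp
// Illustrative sketch only: simplified stand-ins, not the real AVD_* types.
#include <algorithm>
#include <vector>

enum class AdminState { UNLOCKED, SHUTTING_DOWN, LOCKED };

struct Nodegroup;

struct Node {                      // hypothetical stand-in for an AMF node
  AdminState admin_state = AdminState::UNLOCKED;
  Nodegroup* admin_ng = nullptr;   // set while an NG admin operation is pending
};

struct Nodegroup {                 // hypothetical stand-in for a nodegroup
  AdminState admin_state = AdminState::UNLOCKED;
  std::vector<Node*> nodes;
};

struct ServiceGroup {              // hypothetical stand-in for an SG
  bool sg_ncs_spec = false;        // true for middleware SGs
  bool ng_using_saAmfSGAdminState = false;
};

// Same shape as the check quoted from su_si_assign() in this thread:
// enter the nodegroup flow if EITHER marker is present.
bool needs_ng_response_processing(const Node& node, const ServiceGroup& sg) {
  return !sg.sg_ncs_spec &&
         (node.admin_ng != nullptr || sg.ng_using_saAmfSGAdminState);
}

// Complete a pending shutdown: SHUTTING_DOWN becomes LOCKED.
void finish_shutdown(AdminState& state) {
  if (state == AdminState::SHUTTING_DOWN) state = AdminState::LOCKED;
}

// Sketch of the node-down path: the faulted node always leaves
// SHUTTING_DOWN; the NG follows only when no other node is still pending.
void handle_node_down(Node& node, ServiceGroup& sg) {
  if (!needs_ng_response_processing(node, sg)) return;
  Nodegroup* ng = node.admin_ng;
  finish_shutdown(node.admin_state);
  node.admin_ng = nullptr;
  if (ng == nullptr) return;  // e.g. admin_ng was not restored after headless
  // Assumption: a node is "pending" while its admin_ng still points at ng.
  const bool others_pending =
      std::any_of(ng->nodes.begin(), ng->nodes.end(),
                  [ng](const Node* n) { return n->admin_ng == ng; });
  if (!others_pending) {
    finish_shutdown(ng->admin_state);
    sg.ng_using_saAmfSGAdminState = false;  // NG operation is complete
  }
}
```

Note that when admin_ng is nullptr the sketch can only move the node, not the NG, which is exactly the gap raised in a) above.
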
Thanks,
Praveen

> I will use SG_FSM_ADMIN to differentiate cases 1 and 2.
>
> Thanks,
> Minh
> On 29/08/16 16:25, praveen malviya wrote:
>> Hi Minh,
>>
>> Please see inline with [Praveen]
>>
>> Thanks,
>> Praveen
>>
>> On 29-Aug-16 5:57 AM, minh chau wrote:
>>> Hi Praveen,
>>>
>>> Thanks for looking through the patch.
>>> The potential problem with restoring the nodegroup is that a nodegroup
>>> is allowed to be created in LOCKED state while its SUs still have
>>> assignments, which could cause an ambiguity for AMFD after headless.
>>> For example:
>>> Suppose SU4 is hosted on PL4 and SU5 is hosted on PL5; SU4 has the
>>> active assignment and SU5 has the standby assignment.
>>> case 1: Create nodegroup (PL4 + PL5) with LOCKED, lock PL5, lock PL4,
>>> delay quiesced csi cbk, stop SC, restart SC.
>> [Praveen] In this case, after headless the SG FSM will not be in
>> SG_ADMIN state, because the payloads are being locked one by one. So
>> this case is distinguishable as not being an NG operation: the SG is
>> not in SG_ADMIN state even though the SG is fully assigned in the NG.
>>> case 2: Create nodegroup (PL4 + PL5) with LOCKED, lock nodegroup, delay
>>> quiesced csi cbk, stop SC, restart SC.
>> [Praveen] In this case we have the following information after
>> headless:
>> - SG is in SG_ADMIN state.
>> - NG is in SHUTTING_DOWN or LOCKED state.
>> - Nodes are in SHUTTING_DOWN or LOCKED state.
>> - The SG FSM remains in SG_ADMIN state only in case of an admin
>>   operation on the SG. But after headless the SG is not found in
>>   UNLOCKED state, and one NG and its nodes are found in
>>   LOCKED/SHUTTING_DOWN state.
>> I think with the above information, AMFD can set
>> @ng_using_saAmfSGAdminState and set the SG admin state to SHUTTING_DOWN
>> or LOCKED. Restoring admin_ng is not required because in su_si_assign()
>> there is an OR condition between @ng_using_saAmfSGAdminState and
>> @admin_ng for calling process_su_si_response_for_ng().
>> Also, the checks on admin_ng are used only for updating counters
>> related to completion of admin operations, which is not required
>> after headless.
>>
>>>
>>> If case 2 actually happened before headless, then @admin_ng and
>>> @ng_using_saAmfSGAdminState need to be restored, otherwise
>>> process_su_si_response_for_ng() won't be called, saAmfSGAdminState
>>> remains LOCKED, and the SG is still not in STABLE state.
>>>
>>> But in both cases, after headless, AMFD sees that all PLs are LOCKED,
>>> the nodegroup is LOCKED, and SU4 has a pending quiesced csi cbk, so
>>> they run into the same code flow. In case 1, @admin_ng and
>>> @ng_using_saAmfSGAdminState should not be set, since case 1 was not a
>>> nodegroup operation before headless.
>>> I have run a test of both cases and they work with the patch
>>> attached to the ticket, but it still looks like a potential problem:
>>> since not all cases are transparent to AMFD after headless, @admin_ng
>>> and @ng_using_saAmfSGAdminState may get hit at some points in case 1.
>> [Praveen] Besides the above cases, there remains only one case: when
>> the operation was initiated on the NG and the SG is partially mapped
>> in the NG. In this case, after headless the SG can be in only two
>> states, either SU_OPER or SG_REALIGN. In both cases I think we do not
>> need to restore @ng_using_saAmfSGAdminState and @admin_ng, because we
>> do not need to enter process_su_si_response_for_ng(). In sg_2n_fsm,
>> susi_success_su_oper() moves the node from SHUTTING_DOWN to LOCKED.
>>
>> Have I missed any other case?
>>
>>>
>>> If case 1 looks ok to you from the nodegroup point of view, then I will
>>> float the patch for review.
>>>
>>> Thanks,
>>> Minh
>>>
>>>
>>> On 26/08/16 16:08, praveen malviya wrote:
>>>> Hi,
>>>>
>>>> I have gone through the amfd traces. The patch for NG also seems to
>>>> be ok, but some minor improvements can be made.
>>>>
>>>> As pointed out by Minh, when the whole SG is mapped in the NG (say
>>>> case a), AMFD uses the SG_ADMIN flow and the SG admin state, without
>>>> exposing it to the user through IMM, for the 2N model. In the other
>>>> case, when only one SU is assigned in the NG (say case b), there
>>>> should not be any problem because the operation depends fully on the
>>>> NG admin state. Since case b) does not use the SG admin state or
>>>> ng_using_saAmfSGAdminState, it should work fine.
>>>>
>>>> I think we can use the following facts and functions to improve the
>>>> patch, and with that, restoring ng_using_saAmfSGAdminState from IMM
>>>> may not be required:
>>>> 1) In a normal cluster, if a controller switchover/failover happens
>>>> while an NG operation is going on, the standby controller continues
>>>> the admin operation with the information it gets through CKPT updates
>>>> in dec_sg_admin_state() and dec_ng_admin_state(). The active
>>>> controller never checkpoints ng_using_saAmfSGAdminState; it is
>>>> deduced in these functions. The situation after headless is almost
>>>> like that.
>>>> I think case a) requires more parameters when a shutdown operation is
>>>> going on (the admin state of the NG is still SHUTTING_DOWN) and the
>>>> system becomes headless; the lock operation does not. In the shutdown
>>>> operation, AMFD has to ensure the transition of the NG and nodes to
>>>> LOCKED state.
>>>>
>>>> 2) As with controller failover/switchover, after headless we are not
>>>> bound to reply to IMM for admin operation completion. So we need to
>>>> analyse whether we need to restore node->admin_ng. Half of the code
>>>> in process_su_si_response_for_ng() is for tracking the state of the
>>>> admin operation so that AMFD can reply to IMM, and this is not
>>>> required after headless.
>>>>
>>>> I think the problem is not that complex, as it is valid only for 2N
>>>> models and only in case a).
>>>>
>>>> Thanks,
>>>> Praveen
>>>>
>>>> On 25-Aug-16 6:38 PM, minh chau wrote:
>>>>> Hi,
>>>>>
>>>>> The test failed for two reasons:
>>>>> 1. There are two places where the nodegroup operation borrows the 2N
>>>>> SG FSM, but the AdminState of the SG is not stored to IMM:
>>>>> saAmfSGAdminState = ng->saAmfNGAdminState;
>>>>> ...
>>>>> su->sg_of_su->saAmfSGAdminState = SA_AMF_ADMIN_UNLOCKED;
>>>>>
>>>>> This setting needs to be done through AVD_SG::set_admin_state().
>>>>>
>>>>> 2. When the su_si assignment response is received after headless,
>>>>> @admin_ng and @ng_using_saAmfSGAdminState have not been restored.
>>>>> They need to be restored somehow. Since a nodegroup is allowed to be
>>>>> created in any adminState, there can be a case where the nodegroup
>>>>> is created with AdminState LOCKED but the SUs belonging to it still
>>>>> have assignments, so the adminState of the nodegroup can't be used.
>>>>> It seems admin_ng and ng_using_saAmfSGAdminState need to be stored
>>>>> to IMM?
>>>>> @Praveen: any suggestions?
>>>>>
>>>>> } else if ((su->sg_of_su->sg_ncs_spec == false) &&
>>>>>            ((su->su_on_node->admin_ng != nullptr) ||
>>>>>             (su->sg_of_su->ng_using_saAmfSGAdminState == true))) {
>>>>>   AVD_AMF_NG *ng = su->su_on_node->admin_ng;
>>>>>   //Got response from AMFND for assignments decrement su_cnt_admin_oper.
>>>>>   if ((ng != nullptr) &&
>>>>>       (((((ng->admin_ng_pend_cbk.admin_oper == SA_AMF_ADMIN_SHUTDOWN) ||
>>>>>           (ng->admin_ng_pend_cbk.admin_oper == SA_AMF_ADMIN_LOCK)) &&
>>>>>          (su->saAmfSUNumCurrActiveSIs == 0) &&
>>>>>          (su->saAmfSUNumCurrStandbySIs == 0) &&
>>>>>          (AVSV_SUSI_ACT_DEL == n2d_msg->msg_info.n2d_su_si_assign.msg_act))) ||
>>>>>        (ng->admin_ng_pend_cbk.admin_oper == SA_AMF_ADMIN_UNLOCK))) {
>>>>>     su->su_on_node->su_cnt_admin_oper--;
>>>>>     TRACE("node:'%s', su_cnt_admin_oper:%u",
>>>>>           su->su_on_node->name.c_str(), su->su_on_node->su_cnt_admin_oper);
>>>>>   }
>>>>>   process_su_si_response_for_ng(su, SA_AIS_OK);
>>>>>
>>>>> On 25/08/16 21:36, Nagendra Kumar wrote:
>>>>>> Further testing results:
>>>>>> Node group lock has resulted in the SG becoming unstable. Logs and
>>>>>> configuration file attached.
>>>>>>
>>>>>> Configuration: SC-1, PL-3 and PL-4.
>>>>>>
>>>>>> Steps:
>>>>>>
>>>>>> 1. Unlock SU1 (on PL-3), SU2 and SU3 (both on PL-4).
>>>>>> 2. Create a node group of PL-3 and PL-4.
>>>>>> 3. Lock the node group:
>>>>>> amf-adm lock safAmfNodeGroup=nagu,safAmfCluster=myAmfCluster
>>>>>> 4. Hold the csi set callback in gdb, stop SC-1, respond OK from the
>>>>>> csi set callback, and then start SC-1.
>>>>>> >>>>>> SG becomes unstable if you try to unlock the Node group: >>>>>> Aug 25 16:57:06 PM_SC-1 osafamfd[2166]: NO >>>>>> 'safSg=AmfDemo_2N,safApp=AmfDemo1' is in unstable/transition state >>>>>> >>>>>> >>>>>> Thanks >>>>>> -Nagu >>>>>> >>>>>>> -----Original Message----- >>>>>>> From: Nagendra Kumar >>>>>>> Sent: 24 August 2016 16:58 >>>>>>> To: Minh Hon Chau; hans.nordeb...@ericsson.com; Praveen Malviya; >>>>>>> gary....@dektech.com.au; long.hb.ngu...@dektech.com.au >>>>>>> Cc: opensaf-devel@lists.sourceforge.net >>>>>>> Subject: Re: [devel] [PATCH 2 of 2] AMFND: Admin operation >>>>>>> continuation if >>>>>>> csi callback completes during headless [#1725 part 1] V1 >>>>>>> >>>>>>> The below is the assignments after the test case (SU2 has standby >>>>>>> assignment): >>>>>>> >>>>>>> PM_SC-1:/home/nagu/views/staging-1725 # /etc/init.d/opensafd status >>>>>>> safSISU=safSu=SU2\,safSg=AmfDemo_2N\,safApp=AmfDemo1,safSi=AmfDe >>>>>>> mo1,safApp=AmfDemo1 >>>>>>> saAmfSISUHAState=STANDBY(2) >>>>>>> safSISU=safSu=PL- >>>>>>> 3\,safSg=NoRed\,safApp=OpenSAF,safSi=NoRed3,safApp=OpenSAF >>>>>>> saAmfSISUHAState=ACTIVE(1) >>>>>>> safSISU=safSu=PL- >>>>>>> 4\,safSg=NoRed\,safApp=OpenSAF,safSi=NoRed2,safApp=OpenSAF >>>>>>> saAmfSISUHAState=ACTIVE(1) >>>>>>> safSISU=safSu=SC- >>>>>>> 2\,safSg=NoRed\,safApp=OpenSAF,safSi=NoRed4,safApp=OpenSAF >>>>>>> saAmfSISUHAState=ACTIVE(1) >>>>>>> safSISU=safSu=SC- >>>>>>> 1\,safSg=NoRed\,safApp=OpenSAF,safSi=NoRed1,safApp=OpenSAF >>>>>>> saAmfSISUHAState=ACTIVE(1) >>>>>>> safSISU=safSu=SC-2\,safSg=2N\,safApp=OpenSAF,safSi=SC- >>>>>>> 2N,safApp=OpenSAF >>>>>>> saAmfSISUHAState=STANDBY(2) >>>>>>> safSISU=safSu=SC-1\,safSg=2N\,safApp=OpenSAF,safSi=SC- >>>>>>> 2N,safApp=OpenSAF >>>>>>> saAmfSISUHAState=ACTIVE(1) >>>>>>> >>>>>>> Thanks >>>>>>> -Nagu >>>>>>> >>>>>>>> -----Original Message----- >>>>>>>> From: Nagendra Kumar >>>>>>>> Sent: 24 August 2016 16:55 >>>>>>>> To: Minh Hon Chau; hans.nordeb...@ericsson.com; Praveen Malviya; >>>>>>>> 
gary....@dektech.com.au; long.hb.ngu...@dektech.com.au >>>>>>>> Cc: opensaf-devel@lists.sourceforge.net >>>>>>>> Subject: Re: [devel] [PATCH 2 of 2] AMFND: Admin operation >>>>>>>> continuation if csi callback completes during headless [#1725 >>>>>>>> part 1] >>>>>>>> V1 >>>>>>>> >>>>>>>> Hi Minh, >>>>>>>> With 1725_phase_1_V2.tgz, the below email TC has failed. Please >>>>>>> find >>>>>>>> the traces attached along with the configuration in the ticket. >>>>>>>> >>>>>>>> Thanks >>>>>>>> -Nagu >>>>>>>> >>>>>>>>> -----Original Message----- >>>>>>>>> From: Nagendra Kumar >>>>>>>>> Sent: 23 August 2016 15:15 >>>>>>>>> To: Minh Hon Chau; hans.nordeb...@ericsson.com; Praveen Malviya; >>>>>>>>> gary....@dektech.com.au; long.hb.ngu...@dektech.com.au >>>>>>>>> Cc: opensaf-devel@lists.sourceforge.net >>>>>>>>> Subject: Re: [devel] [PATCH 2 of 2] AMFND: Admin operation >>>>>>>>> continuation if csi callback completes during headless [#1725 part >>>>>>>>> 1] >>>>>>>>> V1 >>>>>>>>> >>>>>>>>> Hi Minh, >>>>>>>>> The following SU lock case is not working. This issue will >>>>>>>>> exist >>>>>>>>> for all the flows, so please check. >>>>>>>>> >>>>>>>>> Configuration and traces attached in the ticket. >>>>>>>>> >>>>>>>>> Steps: >>>>>>>>> 1. Start SC-1, SC-2, PL-3 and PL-4. 
Run the following command: >>>>>>>>> immcfg -f /tmp/AppConfig-2N-1725.xml amf-adm unlock-in >>>>>>>>> safSu=SU1,safSg=AmfDemo_2N,safApp=AmfDemo1 >>>>>>>>> amf-adm unlock-in safSu=SU2,safSg=AmfDemo_2N,safApp=AmfDemo1 >>>>>>>>> amf-adm unlock-in safSu=SU3,safSg=AmfDemo_2N,safApp=AmfDemo1 >>>>>>>>> amf-adm unlock safSu=SU1,safSg=AmfDemo_2N,safApp=AmfDemo1 >>>>>>>>> amf-adm unlock safSu=SU2,safSg=AmfDemo_2N,safApp=AmfDemo1 >>>>>>>>> amf-adm unlock safSu=SU3,safSg=AmfDemo_2N,safApp=AmfDemo1 >>>>>>>>> >>>>>>>>> Assignments are: >>>>>>>>> PM_SC-1:/home/nagu/views/staging-1725 # /etc/init.d/opensafd >>>>>>>>> status >>>>>>>>> safSISU=safSu=SC- >>>>>>>>> 1\,safSg=NoRed\,safApp=OpenSAF,safSi=NoRed1,safApp=OpenSAF >>>>>>>>> saAmfSISUHAState=ACTIVE(1) >>>>>>>>> safSISU=safSu=SC-1\,safSg=2N\,safApp=OpenSAF,safSi=SC- >>>>>>>>> 2N,safApp=OpenSAF >>>>>>>>> saAmfSISUHAState=ACTIVE(1) >>>>>>>>> safSISU=safSu=SC- >>>>>>>>> 2\,safSg=NoRed\,safApp=OpenSAF,safSi=NoRed2,safApp=OpenSAF >>>>>>>>> saAmfSISUHAState=ACTIVE(1) >>>>>>>>> safSISU=safSu=SC-2\,safSg=2N\,safApp=OpenSAF,safSi=SC- >>>>>>>>> 2N,safApp=OpenSAF >>>>>>>>> saAmfSISUHAState=STANDBY(2) >>>>>>>>> safSISU=safSu=PL- >>>>>>>>> 4\,safSg=NoRed\,safApp=OpenSAF,safSi=NoRed3,safApp=OpenSAF >>>>>>>>> saAmfSISUHAState=ACTIVE(1) >>>>>>>>> safSISU=safSu=PL- >>>>>>>>> 3\,safSg=NoRed\,safApp=OpenSAF,safSi=NoRed4,safApp=OpenSAF >>>>>>>>> saAmfSISUHAState=ACTIVE(1) >>>>>>>>> >>>>>>> safSISU=safSu=SU2\,safSg=AmfDemo_2N\,safApp=AmfDemo1,safSi=AmfDe >>>>>>>>> mo1,safApp=AmfDemo1 >>>>>>>>> saAmfSISUHAState=STANDBY(2) >>>>>>>>> >>>>>>> safSISU=safSu=SU1\,safSg=AmfDemo_2N\,safApp=AmfDemo1,safSi=AmfDe >>>>>>>>> mo1,safApp=AmfDemo1 >>>>>>>>> saAmfSISUHAState=ACTIVE(1) >>>>>>>>> >>>>>>>>> 2. Issue lock on SU1. >>>>>>>>> amf-adm lock safSu=SU1,safSg=AmfDemo_2N,safApp=AmfDemo1 >>>>>>>>> And keep gdb in csi_set callback. Stop SC-1 and SC-2. >>>>>>>>> Send Ok from csi_set callback. >>>>>>>>> >>>>>>>>> 3. Start SC-1 and SC-2. >>>>>>>>> >>>>>>>>> 4. 
Assignment to components of SU2 is not given and assignments of >>>>>>>>> SU2 still shows Standby. >>>>>>>>> PM_SC-1:/home/nagu/views/staging-1725 # /etc/init.d/opensafd >>>>>>>>> status >>>>>>>>> >>>>>>> safSISU=safSu=SU2\,safSg=AmfDemo_2N\,safApp=AmfDemo1,safSi=AmfDe >>>>>>>>> mo1,safApp=AmfDemo1 >>>>>>>>> saAmfSISUHAState=STANDBY(2) >>>>>>>>> safSISU=safSu=SC-2\,safSg=2N\,safApp=OpenSAF,safSi=SC- >>>>>>>>> 2N,safApp=OpenSAF >>>>>>>>> saAmfSISUHAState=STANDBY(2) >>>>>>>>> safSISU=safSu=SC- >>>>>>>>> 1\,safSg=NoRed\,safApp=OpenSAF,safSi=NoRed1,safApp=OpenSAF >>>>>>>>> saAmfSISUHAState=ACTIVE(1) >>>>>>>>> safSISU=safSu=PL- >>>>>>>>> 4\,safSg=NoRed\,safApp=OpenSAF,safSi=NoRed3,safApp=OpenSAF >>>>>>>>> saAmfSISUHAState=ACTIVE(1) >>>>>>>>> safSISU=safSu=PL- >>>>>>>>> 3\,safSg=NoRed\,safApp=OpenSAF,safSi=NoRed2,safApp=OpenSAF >>>>>>>>> saAmfSISUHAState=ACTIVE(1) >>>>>>>>> safSISU=safSu=SC- >>>>>>>>> 2\,safSg=NoRed\,safApp=OpenSAF,safSi=NoRed4,safApp=OpenSAF >>>>>>>>> saAmfSISUHAState=ACTIVE(1) >>>>>>>>> safSISU=safSu=SC-1\,safSg=2N\,safApp=OpenSAF,safSi=SC- >>>>>>>>> 2N,safApp=OpenSAF >>>>>>>>> saAmfSISUHAState=ACTIVE(1) >>>>>>>>> >>>>>>>>> >>>>>>>>> Thanks >>>>>>>>> -Nagu >>>>>>>>> >>>>>>>>>> -----Original Message----- >>>>>>>>>> From: Minh Hon Chau [mailto:minh.c...@dektech.com.au] >>>>>>>>>> Sent: 05 August 2016 02:50 >>>>>>>>>> To: hans.nordeb...@ericsson.com; Nagendra Kumar; Praveen Malviya; >>>>>>>>>> gary....@dektech.com.au; long.hb.ngu...@dektech.com.au; >>>>>>>>>> minh.c...@dektech.com.au >>>>>>>>>> Cc: opensaf-devel@lists.sourceforge.net >>>>>>>>>> Subject: [PATCH 2 of 2] AMFND: Admin operation continuation if >>>>>>>>>> csi >>>>>>>>> callback >>>>>>>>>> completes during headless [#1725 part 1] V1 >>>>>>>>>> >>>>>>>>>> osaf/services/saf/amf/amfnd/di.cc | 199 >>>>>>>>>> +++++++++++++++++- >>>>>>> --- >>>>>>>> -- >>>>>>>>> -- >>>>>>>>>> osaf/services/saf/amf/amfnd/include/avnd_di.h | 1 + >>>>>>>>>> 2 files changed, 134 insertions(+), 66 deletions(-) >>>>>>>>>> 
>>>>>>>>>> >>>>>>>>>> The patch buffers susi_resp_msg during headless stage and resend >>>>>>>>>> it to AMFD after headless. >>>>>>>>>> >>>>>>>>>> diff --git a/osaf/services/saf/amf/amfnd/di.cc >>>>>>>>>> b/osaf/services/saf/amf/amfnd/di.cc >>>>>>>>>> --- a/osaf/services/saf/amf/amfnd/di.cc >>>>>>>>>> +++ b/osaf/services/saf/amf/amfnd/di.cc >>>>>>>>>> @@ -804,11 +804,6 @@ uint32_t avnd_di_susi_resp_send(AVND_CB >>>>>>>>>> if (cb->term_state == >>>>>>>>>> AVND_TERM_STATE_OPENSAF_SHUTDOWN_STARTED) >>>>>>>>>> return rc; >>>>>>>>>> >>>>>>>>>> - if (cb->is_avd_down == true) { >>>>>>>>>> - m_AVND_SU_ALL_SI_RESET(su); >>>>>>>>>> - return rc; >>>>>>>>>> - } >>>>>>>>>> - >>>>>>>>>> // should be in assignment pending state to be here >>>>>>>>>> osafassert(m_AVND_SU_IS_ASSIGN_PEND(su)); >>>>>>>>>> >>>>>>>>>> @@ -819,64 +814,76 @@ uint32_t avnd_di_susi_resp_send(AVND_CB >>>>>>>>>> TRACE_ENTER2("Sending Resp su=%s, si=%s, curr_state=%u, >>>>>>>>>> prv_state=%u", su->name.value, curr_si->name.value,curr_si- >>>>>>>>>>> curr_state,curr_si->prv_state); >>>>>>>>>> /* populate the susi resp msg */ >>>>>>>>>> msg.info.avd = new AVSV_DND_MSG(); >>>>>>>>>> - msg.type = AVND_MSG_AVD; >>>>>>>>>> - msg.info.avd->msg_type = AVSV_N2D_INFO_SU_SI_ASSIGN_MSG; >>>>>>>>>> - msg.info.avd->msg_info.n2d_su_si_assign.msg_id = ++(cb- >>>>>>>>>>> snd_msg_id); >>>>>>>>>> - msg.info.avd->msg_info.n2d_su_si_assign.node_id = cb- >>>>>>>>>>> node_info.nodeId; >>>>>>>>>> - if (si) { >>>>>>>>>> - msg.info.avd->msg_info.n2d_su_si_assign.single_csi = >>>>>>>>>> - ((si->single_csi_add_rem_in_si == >>>>>>>>>> AVSV_SUSI_ACT_BASE) >>>>>>> ? >>>>>>>>>> false : true); >>>>>>>>>> - } >>>>>>>>>> - TRACE("curr_assign_state '%u'", >>>>>>>>>> curr_si->curr_assign_state); >>>>>>>>>> - msg.info.avd->msg_info.n2d_su_si_assign.msg_act = >>>>>>>>>> - (m_AVND_SU_SI_CURR_ASSIGN_STATE_IS_ASSIGNED(curr_si) >>>>>>>> || >>>>>>>>>> - >>>>>>> m_AVND_SU_SI_CURR_ASSIGN_STATE_IS_ASSIGNING(curr_si)) >>>>>>>> ? 
>>>>>>>>>> - ((!curr_si->prv_state) ? AVSV_SUSI_ACT_ASGN : >>>>>>>>>> AVSV_SUSI_ACT_MOD) : AVSV_SUSI_ACT_DEL; >>>>>>>>>> - msg.info.avd->msg_info.n2d_su_si_assign.su_name = su->name; >>>>>>>>>> - if (si) { >>>>>>>>>> - msg.info.avd->msg_info.n2d_su_si_assign.si_name = >>>>>>>>>> si- >>>>>>>> name; >>>>>>>>>> - if (AVSV_SUSI_ACT_ASGN == >>>>>>>>>> si->single_csi_add_rem_in_si) { >>>>>>>>>> - TRACE("si->curr_assign_state '%u'", curr_si- >>>>>>>>>>> curr_assign_state); >>>>>>>>>> - >>>>>>>>>> msg.info.avd->msg_info.n2d_su_si_assign.msg_act = >>>>>>>>>> - >>>>>>>>>> (m_AVND_SU_SI_CURR_ASSIGN_STATE_IS_ASSIGNED(curr_si) || >>>>>>>>>> - >>>>>>>>>> m_AVND_SU_SI_CURR_ASSIGN_STATE_IS_ASSIGNING(curr_si)) ? >>>>>>>>>> - AVSV_SUSI_ACT_ASGN : >>>>>>>>>> AVSV_SUSI_ACT_DEL; >>>>>>>>>> - } >>>>>>>>>> - } >>>>>>>>>> - msg.info.avd->msg_info.n2d_su_si_assign.ha_state = >>>>>>>>>> - (SA_AMF_HA_QUIESCING == curr_si->curr_state) ? >>>>>>>>>> SA_AMF_HA_QUIESCED : curr_si->curr_state; >>>>>>>>>> - msg.info.avd->msg_info.n2d_su_si_assign.error = >>>>>>>>>> - (m_AVND_SU_SI_CURR_ASSIGN_STATE_IS_ASSIGNED(curr_si) >>>>>>>> || >>>>>>>>>> - >>>>>>> m_AVND_SU_SI_CURR_ASSIGN_STATE_IS_REMOVED(curr_si)) >>>>>>>> ? >>>>>>>>>> NCSCC_RC_SUCCESS : NCSCC_RC_FAILURE; >>>>>>>>>> + msg.type = AVND_MSG_AVD; >>>>>>>>>> + msg.info.avd->msg_type = AVSV_N2D_INFO_SU_SI_ASSIGN_MSG; >>>>>>>>>> + msg.info.avd->msg_info.n2d_su_si_assign.node_id = cb- >>>>>>>>>>> node_info.nodeId; >>>>>>>>>> + if (si) { >>>>>>>>>> + msg.info.avd->msg_info.n2d_su_si_assign.single_csi = >>>>>>>>>> + ((si->single_csi_add_rem_in_si == >>>>>>>>>> AVSV_SUSI_ACT_BASE) ? false : true); >>>>>>>>>> + } >>>>>>>>>> + TRACE("curr_assign_state '%u'", curr_si->curr_assign_state); >>>>>>>>>> + msg.info.avd->msg_info.n2d_su_si_assign.msg_act = >>>>>>>>>> + >>>>>>>>>> (m_AVND_SU_SI_CURR_ASSIGN_STATE_IS_ASSIGNED(curr_si) || >>>>>>>>>> + >>>>>>>>>> m_AVND_SU_SI_CURR_ASSIGN_STATE_IS_ASSIGNING(curr_si)) ? >>>>>>>>>> + ((!curr_si->prv_state) ? 
>>>>>>>>>> AVSV_SUSI_ACT_ASGN : AVSV_SUSI_ACT_MOD) : AVSV_SUSI_ACT_DEL; >>>>>>>>>> + msg.info.avd->msg_info.n2d_su_si_assign.su_name = su->name; >>>>>>>>>> + if (si) { >>>>>>>>>> + msg.info.avd->msg_info.n2d_su_si_assign.si_name = si- >>>>>>>>>>> name; >>>>>>>>>> + if (AVSV_SUSI_ACT_ASGN == >>>>>>>>>> si->single_csi_add_rem_in_si) { >>>>>>>>>> + TRACE("si->curr_assign_state '%u'", curr_si- >>>>>>>>>>> curr_assign_state); >>>>>>>>>> + msg.info.avd- >>>>>>>>>>> msg_info.n2d_su_si_assign.msg_act = >>>>>>>>>> + >>>>>>>>>> (m_AVND_SU_SI_CURR_ASSIGN_STATE_IS_ASSIGNED(curr_si) || >>>>>>>>>> + >>>>>>>>>> m_AVND_SU_SI_CURR_ASSIGN_STATE_IS_ASSIGNING(curr_si)) ? >>>>>>>>>> + AVSV_SUSI_ACT_ASGN : >>>>>>>>>> AVSV_SUSI_ACT_DEL; >>>>>>>>>> + } >>>>>>>>>> + } >>>>>>>>>> + msg.info.avd->msg_info.n2d_su_si_assign.ha_state = >>>>>>>>>> + (SA_AMF_HA_QUIESCING == curr_si->curr_state) ? >>>>>>>>>> SA_AMF_HA_QUIESCED : curr_si->curr_state; >>>>>>>>>> + msg.info.avd->msg_info.n2d_su_si_assign.error = >>>>>>>>>> + >>>>>>>>>> (m_AVND_SU_SI_CURR_ASSIGN_STATE_IS_ASSIGNED(curr_si) || >>>>>>>>>> + >>>>>>>>>> m_AVND_SU_SI_CURR_ASSIGN_STATE_IS_REMOVED(curr_si)) ? >>>>>>>>>> +NCSCC_RC_SUCCESS : NCSCC_RC_FAILURE; >>>>>>>>>> >>>>>>>>>> - if (msg.info.avd->msg_info.n2d_su_si_assign.msg_act == >>>>>>>>>> AVSV_SUSI_ACT_ASGN) >>>>>>>>>> - osafassert(si); >>>>>>>>>> + if (msg.info.avd->msg_info.n2d_su_si_assign.msg_act == >>>>>>>>>> AVSV_SUSI_ACT_ASGN) >>>>>>>>>> + osafassert(si); >>>>>>>>>> >>>>>>>>>> - /* send the msg to AvD */ >>>>>>>>>> - TRACE("Sending. 
msg_id'%u', node_id'%u', msg_act'%u', >>>>>>>>>> su'%s', >>>>>>>>> si'%s', >>>>>>>>>> ha_state'%u', error'%u', single_csi'%u'", >>>>>>>>>> - msg.info.avd->msg_info.n2d_su_si_assign.msg_id, >>>>>>>> msg.info.avd- >>>>>>>>>>> msg_info.n2d_su_si_assign.node_id, >>>>>>>>>> - msg.info.avd->msg_info.n2d_su_si_assign.msg_act, >>>>>>>>> msg.info.avd- >>>>>>>>>>> msg_info.n2d_su_si_assign.su_name.value, >>>>>>>>>> - msg.info.avd->msg_info.n2d_su_si_assign.si_name.value, >>>>>>>>>> msg.info.avd->msg_info.n2d_su_si_assign.ha_state, >>>>>>>>>> - msg.info.avd->msg_info.n2d_su_si_assign.error, >>>>>>> msg.info.avd- >>>>>>>>>>> msg_info.n2d_su_si_assign.single_csi); >>>>>>>>>> + /* send the msg to AvD */ >>>>>>>>>> + TRACE("Sending. msg_id'%u', node_id'%u', msg_act'%u', >>>>>>>>>> su'%s', >>>>>>>>>> si'%s', ha_state'%u', error'%u', single_csi'%u'", >>>>>>>>>> + msg.info.avd->msg_info.n2d_su_si_assign.msg_id, >>>>>>>>>> msg.info.avd->msg_info.n2d_su_si_assign.node_id, >>>>>>>>>> + msg.info.avd->msg_info.n2d_su_si_assign.msg_act, >>>>>>>>>> msg.info.avd->msg_info.n2d_su_si_assign.su_name.value, >>>>>>>>>> + msg.info.avd->msg_info.n2d_su_si_assign.si_name.value, >>>>>>>>>> msg.info.avd->msg_info.n2d_su_si_assign.ha_state, >>>>>>>>>> + msg.info.avd->msg_info.n2d_su_si_assign.error, >>>>>>>>>> +msg.info.avd->msg_info.n2d_su_si_assign.single_csi); >>>>>>>>>> >>>>>>>>>> - if ((su->si_list.n_nodes > 1) && (si == nullptr)) { >>>>>>>>>> - if >>>>>>>>>> (msg.info.avd->msg_info.n2d_su_si_assign.msg_act == >>>>>>>>>> AVSV_SUSI_ACT_DEL) >>>>>>>>>> - LOG_NO("Removed 'all SIs' from '%s'", >>>>>>>>>> su->name.value); >>>>>>>>>> + if ((su->si_list.n_nodes > 1) && (si == nullptr)) { >>>>>>>>>> + if (msg.info.avd->msg_info.n2d_su_si_assign.msg_act == >>>>>>>>>> AVSV_SUSI_ACT_DEL) >>>>>>>>>> + LOG_NO("Removed 'all SIs' from '%s'", su- >>>>>>>>>>> name.value); >>>>>>>>>> - if >>>>>>>>>> (msg.info.avd->msg_info.n2d_su_si_assign.msg_act == >>>>>>>>>> AVSV_SUSI_ACT_MOD) >>>>>>>>>> - LOG_NO("Assigned 'all 
SIs' %s of '%s'", >>>>>>>>>> - ha_state[msg.info.avd- >>>>>>>>>>> msg_info.n2d_su_si_assign.ha_state], >>>>>>>>>> - su->name.value); >>>>>>>>>> - } >>>>>>>>>> + if (msg.info.avd->msg_info.n2d_su_si_assign.msg_act == >>>>>>>>>> AVSV_SUSI_ACT_MOD) >>>>>>>>>> + LOG_NO("Assigned 'all SIs' %s of '%s'", >>>>>>>>>> + ha_state[msg.info.avd- >>>>>>>>>>> msg_info.n2d_su_si_assign.ha_state], >>>>>>>>>> + su->name.value); >>>>>>>>>> + } >>>>>>>>>> >>>>>>>>>> - rc = avnd_di_msg_send(cb, &msg); >>>>>>>>>> - if (NCSCC_RC_SUCCESS == rc) >>>>>>>>>> - msg.info.avd = 0; >>>>>>>>>> - >>>>>>>>>> - /* we have completed the SU SI msg processing */ >>>>>>>>>> - if (su_assign_state_is_stable(su)) >>>>>>>>>> - m_AVND_SU_ASSIGN_PEND_RESET(su); >>>>>>>>>> - m_AVND_SU_ALL_SI_RESET(su); >>>>>>>>>> + if (cb->is_avd_down == true) { >>>>>>>>>> + // We are in headless, buffer this msg >>>>>>>>>> + msg.info.avd->msg_info.n2d_su_si_assign.msg_id = 0; >>>>>>>>>> + if (avnd_diq_rec_add(cb, &msg) == nullptr) { >>>>>>>>>> + rc = NCSCC_RC_FAILURE; >>>>>>>>>> + } >>>>>>>>>> + m_AVND_SU_ALL_SI_RESET(su); >>>>>>>>>> + LOG_NO("avnd_di_susi_resp_send() deferred as AMF >>>>>>>>>> director is offline"); >>>>>>>>>> + } else { >>>>>>>>>> + // We are in normal cluster, send msg to director >>>>>>>>>> + msg.info.avd->msg_info.n2d_su_si_assign.msg_id = ++(cb- >>>>>>>>>>> snd_msg_id); >>>>>>>>>> + /* send the msg to AvD */ >>>>>>>>>> + rc = avnd_di_msg_send(cb, &msg); >>>>>>>>>> + if (NCSCC_RC_SUCCESS == rc) >>>>>>>>>> + msg.info.avd = 0; >>>>>>>>>> + /* we have completed the SU SI msg processing */ >>>>>>>>>> + if (su_assign_state_is_stable(su)) { >>>>>>>>>> + m_AVND_SU_ASSIGN_PEND_RESET(su); >>>>>>>>>> + } >>>>>>>>>> + m_AVND_SU_ALL_SI_RESET(su); >>>>>>>>>> + } >>>>>>>>>> >>>>>>>>>> /* free the contents of avnd message */ >>>>>>>>>> avnd_msg_content_free(cb, &msg); @@ -1255,14 +1262,7 @@ >>>>>>>>>> void >>>>>>>>>> avnd_diq_rec_del(AVND_CB *cb, AVND_ >>>>>>>>>> /* stop the AvD msg response timer */ >>>>>>>>>> if 
(m_AVND_TMR_IS_ACTIVE(rec->resp_tmr)) { >>>>>>>>>> m_AVND_TMR_MSG_RESP_STOP(cb, *rec); >>>>>>>>>> - // Resend msgs from queue because amfd dropped during >>>>>>>>>> sync >>>>>>>>>> - if ((cb->dnd_list.head != nullptr)) { >>>>>>>>>> - TRACE("retransmit message to amfd"); >>>>>>>>>> - AVND_DND_MSG_LIST *pending_rec = 0; >>>>>>>>>> - for (pending_rec = cb->dnd_list.head; pending_rec != >>>>>>>>>> nullptr; pending_rec = pending_rec->next) { >>>>>>>>>> - avnd_diq_rec_send(cb, pending_rec); >>>>>>>>>> - } >>>>>>>>>> - } >>>>>>>>>> + avnd_diq_rec_send_buffered_msg(cb); >>>>>>>>>> /* resend pg start track */ >>>>>>>>>> avnd_di_resend_pg_start_track(cb); >>>>>>>>>> } >>>>>>>>>> @@ -1275,6 +1275,73 @@ void avnd_diq_rec_del(AVND_CB *cb, >>>>>>> AVND_ >>>>>>>>>> TRACE_LEAVE(); >>>>>>>>>> return; >>>>>>>>>> } >>>>>>>>>> >>>>>>> +/************************************************************ >>>>>>>>>> **************** >>>>>>>>>> + Name : avnd_diq_rec_send_buffered_msg >>>>>>>>>> + >>>>>>>>>> + Description : Resend buffered msg >>>>>>>>>> + >>>>>>>>>> + Arguments : cb - ptr to the AvND control block >>>>>>>>>> + >>>>>>>>>> + Return Values : None. >>>>>>>>>> + >>>>>>>>>> + Notes : None. 
>>>>>>>>>> >>>>>>> +************************************************************* >>>>>>>>>> ********** >>>>>>>>>> +*******/ void avnd_diq_rec_send_buffered_msg(AVND_CB *cb) { >>>>>>>>>> + TRACE_ENTER(); >>>>>>>>>> + // Resend msgs from queue because amfnd dropped during >>>>>>>>>> headless >>>>>>>>>> + // or headless-synchronization >>>>>>>>>> + if ((cb->dnd_list.head != nullptr)) { >>>>>>>>>> + AVND_DND_MSG_LIST *pending_rec = 0; >>>>>>>>>> + TRACE("Attach msg_id of buffered msg"); >>>>>>>>>> + bool found = true; >>>>>>>>>> + while (found) { >>>>>>>>>> + found = false; >>>>>>>>>> + for (pending_rec = cb->dnd_list.head; pending_rec != >>>>>>>>>> nullptr; pending_rec = pending_rec->next) { >>>>>>>>>> + if (pending_rec->msg.type == >>>>>>>>>> AVND_MSG_AVD) { >>>>>>>>>> + // At this moment, only oper_state >>>>>>>>>> msg needs to report to director >>>>>>>>>> + if (pending_rec->msg.info.avd- >>>>>>>>>>> msg_type == AVSV_N2D_INFO_SU_SI_ASSIGN_MSG && >>>>>>>>>> + pending_rec->msg.info.avd- >>>>>>>>>>> msg_info.n2d_su_si_assign.msg_id == 0) { >>>>>>>>>> + m_AVND_DIQ_REC_POP(cb, >>>>>>>>>> pending_rec); #if 0 >>>>>>>>>> + // only resend if this SUSI >>>>>>>>>> does exist >>>>>>>>>> + AVND_SU *su = >>>>>>>>>> m_AVND_SUDB_REC_GET(cb->sudb, >>>>>>>>>> + pending_rec- >>>>>>>>>>> msg.info.avd->msg_info.n2d_su_si_assign.su_name); >>>>>>>>>> + if (su != nullptr && su- >>>>>>>>>>> si_list.n_nodes > 0) { #endif >>>>>>>>>> + pending_rec- >>>>>>>>>>> msg.info.avd->msg_info.n2d_su_si_assign.msg_id = >>>>>>>>>>> ++(cb->snd_msg_id); >>>>>>>>>> + >>>>>>>>>> m_AVND_DIQ_REC_PUSH(cb, pending_rec); >>>>>>>>>> + LOG_NO("Found and >>>>>>>>>> resend buffered su_si_assign msg for SU:'%s', " >>>>>>>>>> + >>>>>>>>>> "SI:'%s', ha_state:'%u', msg_act:'%u', single_csi:'%u', " >>>>>>>>>> + >>>>>>>>>> "error:'%u', msg_id:'%u'", >>>>>>>>>> + >>>>>>>>>> pending_rec->msg.info.avd- >>>>>>>>>>> msg_info.n2d_su_si_assign.su_name.value, >>>>>>>>>> + >>>>>>>>>> pending_rec->msg.info.avd- >>>>>>>>>>> 
msg_info.n2d_su_si_assign.si_name.value, >>>>>>>>>> + >>>>>>>>>> pending_rec->msg.info.avd->msg_info.n2d_su_si_assign.ha_state, >>>>>>>>>> + >>>>>>>>>> pending_rec->msg.info.avd->msg_info.n2d_su_si_assign.msg_act, >>>>>>>>>> + >>>>>>>>>> pending_rec->msg.info.avd->msg_info.n2d_su_si_assign.single_csi, >>>>>>>>>> + >>>>>>>>>> pending_rec->msg.info.avd->msg_info.n2d_su_si_assign.error, >>>>>>>>>> + >>>>>>>>>> pending_rec->msg.info.avd->msg_info.n2d_su_si_assign.msg_id); >>>>>>>>>> + >>>>>>>>>> +#if 0 >>>>>>>>>> + } else { >>>>>>>>>> + >>>>>>>>>> avnd_msg_content_free(cb, &pending_rec->msg); >>>>>>>>>> + delete pending_rec; >>>>>>>>>> + pending_rec = cb- >>>>>>>>>>> dnd_list.head; >>>>>>>>>> + } >>>>>>>>>> +#endif >>>>>>>>>> + found = true; >>>>>>>>>> + } >>>>>>>>>> + } >>>>>>>>>> + } >>>>>>>>>> + } >>>>>>>>>> + TRACE("retransmit message to amfd"); >>>>>>>>>> + for (pending_rec = cb->dnd_list.head; pending_rec != >>>>>>>>>> nullptr; >>>>>>>>>> pending_rec = pending_rec->next) { >>>>>>>>>> + avnd_diq_rec_send(cb, pending_rec); >>>>>>>>>> + } >>>>>>>>>> + } >>>>>>>>>> + TRACE_LEAVE(); >>>>>>>>>> + return; >>>>>>>>>> +} >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>> /************************************************************* >>>>>>>>>> *************** >>>>>>>>>> Name : avnd_diq_rec_send >>>>>>>>>> diff --git a/osaf/services/saf/amf/amfnd/include/avnd_di.h >>>>>>>>>> b/osaf/services/saf/amf/amfnd/include/avnd_di.h >>>>>>>>>> --- a/osaf/services/saf/amf/amfnd/include/avnd_di.h >>>>>>>>>> +++ b/osaf/services/saf/amf/amfnd/include/avnd_di.h >>>>>>>>>> @@ -79,6 +79,7 @@ void avnd_di_msg_ack_process(struct avnd void >>>>>>>>>> avnd_diq_del(struct avnd_cb_tag *); AVND_DND_MSG_LIST >>>>>>>>>> *avnd_diq_rec_add(struct avnd_cb_tag *cb, AVND_MSG *msg); void >>>>>>>>>> avnd_diq_rec_del(struct avnd_cb_tag *cb, AVND_DND_MSG_LIST >>>>>>> *rec); >>>>>>>>>> +void avnd_diq_rec_send_buffered_msg(struct avnd_cb_tag *cb); >>>>>>>>>> uint32_t avnd_diq_rec_send(struct avnd_cb_tag *cb, >>>>>>>>> 
>>>>>>>>>> AVND_DND_MSG_LIST *rec);
>>>>>>>>>> uint32_t avnd_di_reg_su_rsp_snd(struct avnd_cb_tag *cb, SaNameT *su_name, uint32_t ret_code);
>>>>>>>>>> uint32_t avnd_di_ack_nack_msg_send(struct avnd_cb_tag *cb, uint32_t rcv_id, uint32_t view_num);
>>>>>>>>> ------------------------------------------------------------------------------
>>>>>>>>> _______________________________________________
>>>>>>>>> Opensaf-devel mailing list
>>>>>>>>> Opensaf-devel@lists.sourceforge.net
>>>>>>>>> https://lists.sourceforge.net/lists/listinfo/opensaf-devel