Hi Praveen,

I have just attached part 2 of #1725 to the ticket; it supports node restart/poweroff faults while headless. Could you help review it? I will update the patch that adds the logging, as you suggested.
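For reference, the logging Praveen asks for in point 4 of the quoted review (entity name plus admin state, along the lines of "After headless state admin op on '%s' is continuing") might look roughly like the sketch below. It is a minimal, self-contained illustration only: `AdminState`, `AdminEntity`, `admin_state_name()` and `log_admin_op_after_headless()` are hypothetical names, not OpenSAF AMFD types or functions, and a real implementation would write through `LOG_NO()` rather than return strings.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical admin states, named after the SA Forum AMF administrative states.
enum class AdminState { kUnlocked, kLocked, kLockedInstantiation, kShuttingDown };

// Hypothetical helper: printable name for an admin state.
const char* admin_state_name(AdminState s) {
  switch (s) {
    case AdminState::kUnlocked: return "UNLOCKED";
    case AdminState::kLocked: return "LOCKED";
    case AdminState::kLockedInstantiation: return "LOCKED-INSTANTIATION";
    case AdminState::kShuttingDown: return "SHUTTING-DOWN";
  }
  return "UNKNOWN";
}

// Hypothetical view of an AMF entity (SU, SG, node, ...) after AMFD has
// re-read its runtime state at the end of the headless period.
struct AdminEntity {
  std::string dn;             // e.g. "safSu=SU1,safSg=AmfDemo_2N,safApp=AmfDemo1"
  AdminState admin_state;
  bool admin_op_in_progress;  // assumed: an admin op was interrupted by headless
};

// Returns one log line per entity whose admin operation is continuing.
// A real AMFD implementation would emit these with LOG_NO() instead.
std::vector<std::string> log_admin_op_after_headless(
    const std::vector<AdminEntity>& entities) {
  std::vector<std::string> lines;
  for (const auto& e : entities) {
    if (!e.admin_op_in_progress) continue;
    std::ostringstream os;
    os << "After headless state admin op on '" << e.dn
       << "' is continuing, admin state: " << admin_state_name(e.admin_state);
    lines.push_back(os.str());
  }
  return lines;
}
```

How `admin_op_in_progress` would be derived is left open here; one option, per point 2 of the review, is to treat an SG whose FSM state is not STABLE after headless as having an admin operation in progress.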

Thanks for reviewing,
Minh

On 24/08/16 16:07, praveen malviya wrote:
> Hi Minh,
>
> Please see my responses marked [Praveen].
>
>
> Thanks,
> Praveen
>
> On 23-Aug-16 7:18 PM, minh chau wrote:
>> Hi Praveen,
>>
>> Let me copy your questions here and answer them in email, so that it is
>> easier to comment inline; please see [Minh].
>>
>> Thanks,
>> Minh
>>
>> -----------------------------
>>
>> Hi Minh,
>> I am going through the patches in 1725_phase1.tgz. Some initial comments:
>> 1) In patch 2, avnd_diq_rec_send_buffered_msg() checks for the presence
>> of the SUSI and only then sends the buffered message to AMFD. If removal
>> of assignments completes during headless, AMFND deletes the SUSIs in
>> su_si_oper_done(). So AMFND will never send the assignment message and
>> the admin operation will not continue.
>>
>> [Minh]: If AMFND deletes all SUSIs during headless, then there will not
>> be any assignment to send in the state_info message to AMFD after
>> headless. However, in all the 2N admin operations I have been testing,
>> the assignment removal sequence is the last step of admin LOCK/SHUTDOWN.
>> If AMFND deletes the SUSIs while headless, that also means the prior
>> steps of the admin sequence had been done before headless. In this case,
>> that is equivalent to completion of the admin operation.
>>
> [Praveen] Yes, in this case it is not needed because by this time the
> standby SU has become active.
> But in some cases AMFD performs failover/switchover based on the status
> of assignment removal, particularly when a fault happens during an admin
> op. As of now I do not know how to reproduce this scenario without
> faults, but with faults it is possible. Since the patch is not for admin
> op + faults, it can be left for the future.
>> 2) In patch 1, I think after headless we will not get any invocation id
>> for the admin operation that was going on before headless.
>> Since AMF is continuing the admin operation, we should somehow prevent
>> other admin operations from starting, by setting some magic number for
>> the invocation id or in some other way.
>>
>> [Minh]: If AMF is continuing the admin operation after headless, the SG
>> FSM state should not be STABLE, so I think checking (sg_fsm_state ==
>> AVD_SG_FSM_STABLE) should be enough to reject a new admin operation?
>>
>>
>> 3) If suswitch is in the TOGGLED state, then I think we should crosscheck
>> that there are at least two SUs having assignments. The reason is that if
>> this flag remains TOGGLED and the admin op does not continue, then there
>> is very little chance that it will get reset, as it is used only in the
>> si-swap flow.
>>
>> [Minh]: Yes, I don't particularly like writing this osafAmfSUSwitch to
>> IMM. The only test case that failed was 144 (test list attached to the
>> ticket).
>> Test 144 is: swap SI, delay the CSI STANDBY callback in SU4, stop SCs,
>> restart SCs, reboot PL5. And I ran into the code line which requires
>> suswitch:
>>
>> void SG_2N::node_fail_su_oper(AVD_SU *su) {
>>
>> ....
>>     /* the SU has standby SI assignments. if the other SUs switch field
>>      * is true, it is in service, having quiesced assigning state.
>>      * Send D2N-INFO_SU_SI_ASSIGN modify active all to the other SU.
>>      * Change switch field to false. Change state to SG_realign.
>>      * Free all the SI assignments to this SU.
>>      */
>>     if ((su_oper_list_front()->su_switch == AVSV_SI_TOGGLE_SWITCH)
>>         && (su_oper_list_front()->saAmfSuReadinessState ==
>>             SA_AMF_READINESS_IN_SERVICE)) {
>>
>> I think the *crosscheck* is actually a deduction of @su_switch from
>> whatever states AMFD receives after headless. If the *crosscheck* is
>> possible, then su_switch would not need to be checkpointed to the
>> standby AMFD either.
>> In non-headless, we always need the standby AMFD to have all states kept
>> up to date by checkpointing, so that if the active AMFD goes away, the
>> standby AMFD can take over using these checkpointed states.
>> Now, in headless, we also have to write these states somewhere (here,
>> IMM) so that the new active AMFD can use them.
>> It would be best if su_switch were recoverable from a set of other
>> states, but it is not easy to prove that it is recoverable in all 2N
>> si-swap scenarios. If you think removing osafAmfSUSwitch is really
>> needed, then I think this needs to be looked at more thoroughly later?
>>
>> 4) Assignments are in progress here. This could be because of an admin
>> operation or faults. AMFD should call one function here, like
>> log_admin_op(). This function would search for the entity that is under
>> admin operation and log details like:
>>  - "After headless state admin op on '%s' is continuing" in syslog.
>>  - Traces for the SUSI states which are not assigned.
>>
>> [Minh]: Agreed, some sort of logging like this is a good idea. I think
>> it is best to introduce this logging in the patch "[PATCH 4 of 4] AMFD:
>> Validate headless cached RTA read from IMM [#1725]".
>> And maybe I need more details of what you would like to log.
> [Praveen] I think, to start with, only the name of the entity and its
> admin state can be logged.
>>
>> Thanks,
>> Praveen
>>
>> ---------------------
>>
>> On 23/08/16 21:03, minh chau wrote:
>>> Hi Nagu,
>>>
>>> In the trace you provided, I see that SU2/SU3 become IN_SERVICE late.
>>> If there is a delay in PL4 joining the cluster after headless in your
>>> test, then you could also see it in the latest patches (longDN rebased
>>> version).
>>> I'm looking into this issue.
>>>
>>> Thanks.
>>> Minh
>>>
>>> On 23/08/16 20:24, Nagendra Kumar wrote:
>>>> Please ignore TC #2, my mistake.
>>>>
>>>> Thanks
>>>> -Nagu
>>>>
>>>>> -----Original Message-----
>>>>> From: Nagendra Kumar
>>>>> Sent: 23 August 2016 15:49
>>>>> To: Minh Hon Chau; hans.nordeb...@ericsson.com; Praveen Malviya;
>>>>> gary....@dektech.com.au; long.hb.ngu...@dektech.com.au
>>>>> Cc: opensaf-devel@lists.sourceforge.net
>>>>> Subject: RE: [PATCH 2 of 2] AMFND: Admin operation continuation if csi
>>>>> callback completes during headless [#1725 part 1] V1
>>>>>
>>>>> Please consider the previous TC as TC #1.
>>>>>
>>>>> TC #2: Same configuration as TC #1. Logs are attached to the ticket
>>>>> under TC #2.
>>>>>
>>>>> Steps:
>>>>> 1. Same as step #1 of TC #1.
>>>>> 2. After locking SU1, add a delay in avnd_evt_avd_info_su_si_assign_evh
>>>>>    and stop SC-1 and SC-2.
>>>>> 3. Start SC-1 and SC-2. SU1 is still in the quiesced state. Ideally, it
>>>>>    should have no assignment and SU3 should have got the assignment.
>>>>>
>>>>> safSISU=safSu=SU3\,safSg=AmfDemo_2N\,safApp=AmfDemo1,safSi=AmfDemo1,safApp=AmfDemo1
>>>>>         saAmfSISUHAState=STANDBY(2)
>>>>> safSISU=safSu=SU2\,safSg=AmfDemo_2N\,safApp=AmfDemo1,safSi=AmfDemo1,safApp=AmfDemo1
>>>>>         saAmfSISUHAState=ACTIVE(1)
>>>>> safSISU=safSu=PL-4\,safSg=NoRed\,safApp=OpenSAF,safSi=NoRed4,safApp=OpenSAF
>>>>>         saAmfSISUHAState=ACTIVE(1)
>>>>> safSISU=safSu=SC-1\,safSg=NoRed\,safApp=OpenSAF,safSi=NoRed1,safApp=OpenSAF
>>>>>         saAmfSISUHAState=ACTIVE(1)
>>>>> safSISU=safSu=SC-1\,safSg=2N\,safApp=OpenSAF,safSi=SC-2N,safApp=OpenSAF
>>>>>         saAmfSISUHAState=ACTIVE(1)
>>>>> safSISU=safSu=SC-2\,safSg=NoRed\,safApp=OpenSAF,safSi=NoRed3,safApp=OpenSAF
>>>>>         saAmfSISUHAState=ACTIVE(1)
>>>>> safSISU=safSu=SC-2\,safSg=2N\,safApp=OpenSAF,safSi=SC-2N,safApp=OpenSAF
>>>>>         saAmfSISUHAState=STANDBY(2)
>>>>> safSISU=safSu=PL-3\,safSg=NoRed\,safApp=OpenSAF,safSi=NoRed2,safApp=OpenSAF
>>>>>         saAmfSISUHAState=ACTIVE(1)
>>>>>
>>>>> After that, PL-3 rebooted with the following logs:
>>>>> Aug 23 15:31:52 PM_PL-3 osafamfwd[18056]: TIMEOUT receiving AMF health check request, generating core for amfnd
>>>>> Aug 23 15:31:52 PM_PL-3 osafamfwd[18056]: Last received healthcheck cnt=82 at Tue Aug 23 15:30:52 2016
>>>>> Aug 23 15:31:52 PM_PL-3 osafamfwd[18056]: Rebooting OpenSAF NodeId = 0 EE Name = No EE Mapped, Reason: AMFND unresponsive, AMFWDOG initiated system reboot, OwnNodeId = 131855, SupervisionTime = 60
>>>>> Aug 23 15:31:52 PM_PL-3 opensaf_reboot: Rebooting local node; timeout=60
>>>>>
>>>>> Thanks
>>>>> -Nagu
>>>>>
>>>>>> -----Original Message-----
>>>>>> From: Nagendra Kumar
>>>>>> Sent: 23 August 2016 15:19
>>>>>> To: Minh Hon Chau; hans.nordeb...@ericsson.com; Praveen Malviya;
>>>>>> gary....@dektech.com.au; long.hb.ngu...@dektech.com.au
>>>>>> Cc: opensaf-devel@lists.sourceforge.net
>>>>>> Subject: RE: [PATCH 2 of 2] AMFND: Admin operation continuation if csi
>>>>>> callback completes during headless [#1725 part 1] V1
>>>>>>
>>>>>> Please note that this is on changeset 7846:31417997c82f and that I
>>>>>> have applied the patch of ticket #1894.
>>>>>>
>>>>>> Thanks
>>>>>> -Nagu
>>>>>>> -----Original Message-----
>>>>>>> From: Nagendra Kumar
>>>>>>> Sent: 23 August 2016 15:15
>>>>>>> To: Minh Hon Chau; hans.nordeb...@ericsson.com; Praveen Malviya;
>>>>>>> gary....@dektech.com.au; long.hb.ngu...@dektech.com.au
>>>>>>> Cc: opensaf-devel@lists.sourceforge.net
>>>>>>> Subject: RE: [PATCH 2 of 2] AMFND: Admin operation continuation if
>>>>>>> csi callback completes during headless [#1725 part 1] V1
>>>>>>>
>>>>>>> Hi Minh,
>>>>>>>     The following SU lock case is not working. This issue will exist
>>>>>>> for all the flows, so please check.
>>>>>>>
>>>>>>> Configuration and traces are attached to the ticket.
>>>>>>>
>>>>>>> Steps:
>>>>>>> 1. Start SC-1, SC-2, PL-3 and PL-4.
>>>>>>>    Run the following commands:
>>>>>>>    immcfg -f /tmp/AppConfig-2N-1725.xml
>>>>>>>    amf-adm unlock-in safSu=SU1,safSg=AmfDemo_2N,safApp=AmfDemo1
>>>>>>>    amf-adm unlock-in safSu=SU2,safSg=AmfDemo_2N,safApp=AmfDemo1
>>>>>>>    amf-adm unlock-in safSu=SU3,safSg=AmfDemo_2N,safApp=AmfDemo1
>>>>>>>    amf-adm unlock safSu=SU1,safSg=AmfDemo_2N,safApp=AmfDemo1
>>>>>>>    amf-adm unlock safSu=SU2,safSg=AmfDemo_2N,safApp=AmfDemo1
>>>>>>>    amf-adm unlock safSu=SU3,safSg=AmfDemo_2N,safApp=AmfDemo1
>>>>>>>
>>>>>>>    Assignments are:
>>>>>>> PM_SC-1:/home/nagu/views/staging-1725 # /etc/init.d/opensafd status
>>>>>>> safSISU=safSu=SC-1\,safSg=NoRed\,safApp=OpenSAF,safSi=NoRed1,safApp=OpenSAF
>>>>>>>         saAmfSISUHAState=ACTIVE(1)
>>>>>>> safSISU=safSu=SC-1\,safSg=2N\,safApp=OpenSAF,safSi=SC-2N,safApp=OpenSAF
>>>>>>>         saAmfSISUHAState=ACTIVE(1)
>>>>>>> safSISU=safSu=SC-2\,safSg=NoRed\,safApp=OpenSAF,safSi=NoRed2,safApp=OpenSAF
>>>>>>>         saAmfSISUHAState=ACTIVE(1)
>>>>>>> safSISU=safSu=SC-2\,safSg=2N\,safApp=OpenSAF,safSi=SC-2N,safApp=OpenSAF
>>>>>>>         saAmfSISUHAState=STANDBY(2)
>>>>>>> safSISU=safSu=PL-4\,safSg=NoRed\,safApp=OpenSAF,safSi=NoRed3,safApp=OpenSAF
>>>>>>>         saAmfSISUHAState=ACTIVE(1)
>>>>>>> safSISU=safSu=PL-3\,safSg=NoRed\,safApp=OpenSAF,safSi=NoRed4,safApp=OpenSAF
>>>>>>>         saAmfSISUHAState=ACTIVE(1)
>>>>>>> safSISU=safSu=SU2\,safSg=AmfDemo_2N\,safApp=AmfDemo1,safSi=AmfDemo1,safApp=AmfDemo1
>>>>>>>         saAmfSISUHAState=STANDBY(2)
>>>>>>> safSISU=safSu=SU1\,safSg=AmfDemo_2N\,safApp=AmfDemo1,safSi=AmfDemo1,safApp=AmfDemo1
>>>>>>>         saAmfSISUHAState=ACTIVE(1)
>>>>>>>
>>>>>>> 2. Issue a lock on SU1:
>>>>>>>    amf-adm lock safSu=SU1,safSg=AmfDemo_2N,safApp=AmfDemo1
>>>>>>>    Hold the csi_set callback in gdb. Stop SC-1 and SC-2.
>>>>>>>    Then send OK from the csi_set callback.
>>>>>>>
>>>>>>> 3. Start SC-1 and SC-2.
>>>>>>>
>>>>>>> 4. No assignment is given to the components of SU2, and the
>>>>>>>    assignment of SU2 still shows STANDBY.
>>>>>>> PM_SC-1:/home/nagu/views/staging-1725 # /etc/init.d/opensafd status
>>>>>>> safSISU=safSu=SU2\,safSg=AmfDemo_2N\,safApp=AmfDemo1,safSi=AmfDemo1,safApp=AmfDemo1
>>>>>>>         saAmfSISUHAState=STANDBY(2)
>>>>>>> safSISU=safSu=SC-2\,safSg=2N\,safApp=OpenSAF,safSi=SC-2N,safApp=OpenSAF
>>>>>>>         saAmfSISUHAState=STANDBY(2)
>>>>>>> safSISU=safSu=SC-1\,safSg=NoRed\,safApp=OpenSAF,safSi=NoRed1,safApp=OpenSAF
>>>>>>>         saAmfSISUHAState=ACTIVE(1)
>>>>>>> safSISU=safSu=PL-4\,safSg=NoRed\,safApp=OpenSAF,safSi=NoRed3,safApp=OpenSAF
>>>>>>>         saAmfSISUHAState=ACTIVE(1)
>>>>>>> safSISU=safSu=PL-3\,safSg=NoRed\,safApp=OpenSAF,safSi=NoRed2,safApp=OpenSAF
>>>>>>>         saAmfSISUHAState=ACTIVE(1)
>>>>>>> safSISU=safSu=SC-2\,safSg=NoRed\,safApp=OpenSAF,safSi=NoRed4,safApp=OpenSAF
>>>>>>>         saAmfSISUHAState=ACTIVE(1)
>>>>>>> safSISU=safSu=SC-1\,safSg=2N\,safApp=OpenSAF,safSi=SC-2N,safApp=OpenSAF
>>>>>>>         saAmfSISUHAState=ACTIVE(1)
>>>>>>>
>>>>>>>
>>>>>>> Thanks
>>>>>>> -Nagu
>>>>>>>
>>>>>>>> -----Original Message-----
>>>>>>>> From: Minh Hon Chau [mailto:minh.c...@dektech.com.au]
>>>>>>>> Sent: 05 August 2016 02:50
>>>>>>>> To: hans.nordeb...@ericsson.com; Nagendra Kumar; Praveen Malviya;
>>>>>>>> gary....@dektech.com.au; long.hb.ngu...@dektech.com.au;
>>>>>>>> minh.c...@dektech.com.au
>>>>>>>> Cc: opensaf-devel@lists.sourceforge.net
>>>>>>>> Subject: [PATCH 2 of 2] AMFND: Admin operation continuation if csi
>>>>>>>> callback completes during headless [#1725 part 1] V1
>>>>>>>>
>>>>>>>>  osaf/services/saf/amf/amfnd/di.cc             |  199 +++++++++++++++++--------
>>>>>>>>  osaf/services/saf/amf/amfnd/include/avnd_di.h |    1 +
>>>>>>>>  2 files changed, 134 insertions(+), 66 deletions(-)
>>>>>>>>
>>>>>>>>
>>>>>>>> The patch buffers susi_resp_msg during the headless stage and resends
>>>>>>>> it to AMFD after headless.
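The buffer-and-resend idea described above can be modelled in a few lines. This is a simplified sketch, not the OpenSAF code: `MsgType`, `PendingMsg`, `NodeDirectorLink` and their members are hypothetical stand-ins for `AVND_DND_MSG_LIST`, `cb->dnd_list`, `cb->snd_msg_id` and `cb->is_avd_down`, and acknowledgements are not modelled. It mirrors what the diff below appears to do: while the director is down, a SU_SI_ASSIGN response is queued with `msg_id` 0; once the director is back, each such message gets a fresh id and is moved to the tail of the queue, so ids stay increasing in queue order, and the whole queue is then retransmitted.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <iterator>
#include <list>
#include <string>
#include <vector>

// Hypothetical, simplified stand-ins for the AMFND structures in the patch.
enum class MsgType { kSuSiAssign, kOther };

struct PendingMsg {
  MsgType type;
  std::string su_name;
  uint32_t msg_id;  // 0 = buffered during headless, no id assigned yet
};

struct NodeDirectorLink {
  bool director_down = false;   // plays the role of cb->is_avd_down
  uint32_t snd_msg_id = 0;      // plays the role of cb->snd_msg_id
  std::list<PendingMsg> queue;  // plays the role of cb->dnd_list (unacked msgs)

  // Roughly the patched avnd_di_susi_resp_send(): buffer with id 0 while
  // headless, otherwise number the message and queue it (the actual
  // transmission is not modelled).
  void susi_resp_send(const std::string& su) {
    if (director_down) {
      queue.push_back({MsgType::kSuSiAssign, su, 0});
    } else {
      queue.push_back({MsgType::kSuSiAssign, su, ++snd_msg_id});
    }
  }

  // Roughly avnd_diq_rec_send_buffered_msg(): number the buffered messages,
  // move them to the tail in their original order, then return the ids in
  // the order they would be retransmitted.
  std::vector<uint32_t> send_buffered() {
    std::list<PendingMsg> buffered;
    for (auto it = queue.begin(); it != queue.end();) {
      auto next = std::next(it);
      if (it->type == MsgType::kSuSiAssign && it->msg_id == 0) {
        buffered.splice(buffered.end(), queue, it);  // "pop" from the queue
      }
      it = next;
    }
    for (auto& m : buffered) m.msg_id = ++snd_msg_id;
    queue.splice(queue.end(), buffered);  // "push" back onto the tail

    std::vector<uint32_t> resent;
    for (const auto& m : queue) resent.push_back(m.msg_id);
    // Moving to the tail (rather than numbering in place) keeps the ids
    // strictly increasing along the queue.
    for (std::size_t i = 1; i < resent.size(); ++i) assert(resent[i - 1] < resent[i]);
    return resent;
  }
};
```

Like the V1 patch, this model resends every buffered response; it does not address Praveen's point 1 about responses whose SUSI has already been deleted, and in the diff below the corresponding SUSI-presence check is compiled out with `#if 0`.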
>>>>>>>>
>>>>>>>> diff --git a/osaf/services/saf/amf/amfnd/di.cc b/osaf/services/saf/amf/amfnd/di.cc
>>>>>>>> --- a/osaf/services/saf/amf/amfnd/di.cc
>>>>>>>> +++ b/osaf/services/saf/amf/amfnd/di.cc
>>>>>>>> @@ -804,11 +804,6 @@ uint32_t avnd_di_susi_resp_send(AVND_CB
>>>>>>>>      if (cb->term_state == AVND_TERM_STATE_OPENSAF_SHUTDOWN_STARTED)
>>>>>>>>          return rc;
>>>>>>>>
>>>>>>>> -    if (cb->is_avd_down == true) {
>>>>>>>> -        m_AVND_SU_ALL_SI_RESET(su);
>>>>>>>> -        return rc;
>>>>>>>> -    }
>>>>>>>> -
>>>>>>>>      // should be in assignment pending state to be here
>>>>>>>>      osafassert(m_AVND_SU_IS_ASSIGN_PEND(su));
>>>>>>>>
>>>>>>>> @@ -819,64 +814,76 @@ uint32_t avnd_di_susi_resp_send(AVND_CB
>>>>>>>>      TRACE_ENTER2("Sending Resp su=%s, si=%s, curr_state=%u, prv_state=%u", su->name.value, curr_si->name.value,curr_si->curr_state,curr_si->prv_state);
>>>>>>>>      /* populate the susi resp msg */
>>>>>>>>      msg.info.avd = new AVSV_DND_MSG();
>>>>>>>> -    msg.type = AVND_MSG_AVD;
>>>>>>>> -    msg.info.avd->msg_type = AVSV_N2D_INFO_SU_SI_ASSIGN_MSG;
>>>>>>>> -    msg.info.avd->msg_info.n2d_su_si_assign.msg_id = ++(cb->snd_msg_id);
>>>>>>>> -    msg.info.avd->msg_info.n2d_su_si_assign.node_id = cb->node_info.nodeId;
>>>>>>>> -    if (si) {
>>>>>>>> -        msg.info.avd->msg_info.n2d_su_si_assign.single_csi =
>>>>>>>> -            ((si->single_csi_add_rem_in_si == AVSV_SUSI_ACT_BASE) ? false : true);
>>>>>>>> -    }
>>>>>>>> -    TRACE("curr_assign_state '%u'", curr_si->curr_assign_state);
>>>>>>>> -    msg.info.avd->msg_info.n2d_su_si_assign.msg_act =
>>>>>>>> -        (m_AVND_SU_SI_CURR_ASSIGN_STATE_IS_ASSIGNED(curr_si) ||
>>>>>>>> -         m_AVND_SU_SI_CURR_ASSIGN_STATE_IS_ASSIGNING(curr_si)) ?
>>>>>>>> -        ((!curr_si->prv_state) ? AVSV_SUSI_ACT_ASGN : AVSV_SUSI_ACT_MOD) : AVSV_SUSI_ACT_DEL;
>>>>>>>> -    msg.info.avd->msg_info.n2d_su_si_assign.su_name = su->name;
>>>>>>>> -    if (si) {
>>>>>>>> -        msg.info.avd->msg_info.n2d_su_si_assign.si_name = si->name;
>>>>>>>> -        if (AVSV_SUSI_ACT_ASGN == si->single_csi_add_rem_in_si) {
>>>>>>>> -            TRACE("si->curr_assign_state '%u'", curr_si->curr_assign_state);
>>>>>>>> -            msg.info.avd->msg_info.n2d_su_si_assign.msg_act =
>>>>>>>> -                (m_AVND_SU_SI_CURR_ASSIGN_STATE_IS_ASSIGNED(curr_si) ||
>>>>>>>> -                 m_AVND_SU_SI_CURR_ASSIGN_STATE_IS_ASSIGNING(curr_si)) ?
>>>>>>>> -                AVSV_SUSI_ACT_ASGN : AVSV_SUSI_ACT_DEL;
>>>>>>>> -        }
>>>>>>>> -    }
>>>>>>>> -    msg.info.avd->msg_info.n2d_su_si_assign.ha_state =
>>>>>>>> -        (SA_AMF_HA_QUIESCING == curr_si->curr_state) ? SA_AMF_HA_QUIESCED : curr_si->curr_state;
>>>>>>>> -    msg.info.avd->msg_info.n2d_su_si_assign.error =
>>>>>>>> -        (m_AVND_SU_SI_CURR_ASSIGN_STATE_IS_ASSIGNED(curr_si) ||
>>>>>>>> -         m_AVND_SU_SI_CURR_ASSIGN_STATE_IS_REMOVED(curr_si)) ? NCSCC_RC_SUCCESS : NCSCC_RC_FAILURE;
>>>>>>>> +    msg.type = AVND_MSG_AVD;
>>>>>>>> +    msg.info.avd->msg_type = AVSV_N2D_INFO_SU_SI_ASSIGN_MSG;
>>>>>>>> +    msg.info.avd->msg_info.n2d_su_si_assign.node_id = cb->node_info.nodeId;
>>>>>>>> +    if (si) {
>>>>>>>> +        msg.info.avd->msg_info.n2d_su_si_assign.single_csi =
>>>>>>>> +            ((si->single_csi_add_rem_in_si == AVSV_SUSI_ACT_BASE) ? false : true);
>>>>>>>> +    }
>>>>>>>> +    TRACE("curr_assign_state '%u'", curr_si->curr_assign_state);
>>>>>>>> +    msg.info.avd->msg_info.n2d_su_si_assign.msg_act =
>>>>>>>> +        (m_AVND_SU_SI_CURR_ASSIGN_STATE_IS_ASSIGNED(curr_si) ||
>>>>>>>> +         m_AVND_SU_SI_CURR_ASSIGN_STATE_IS_ASSIGNING(curr_si)) ?
>>>>>>>> +        ((!curr_si->prv_state) ? AVSV_SUSI_ACT_ASGN : AVSV_SUSI_ACT_MOD) : AVSV_SUSI_ACT_DEL;
>>>>>>>> +    msg.info.avd->msg_info.n2d_su_si_assign.su_name = su->name;
>>>>>>>> +    if (si) {
>>>>>>>> +        msg.info.avd->msg_info.n2d_su_si_assign.si_name = si->name;
>>>>>>>> +        if (AVSV_SUSI_ACT_ASGN == si->single_csi_add_rem_in_si) {
>>>>>>>> +            TRACE("si->curr_assign_state '%u'", curr_si->curr_assign_state);
>>>>>>>> +            msg.info.avd->msg_info.n2d_su_si_assign.msg_act =
>>>>>>>> +                (m_AVND_SU_SI_CURR_ASSIGN_STATE_IS_ASSIGNED(curr_si) ||
>>>>>>>> +                 m_AVND_SU_SI_CURR_ASSIGN_STATE_IS_ASSIGNING(curr_si)) ?
>>>>>>>> +                AVSV_SUSI_ACT_ASGN : AVSV_SUSI_ACT_DEL;
>>>>>>>> +        }
>>>>>>>> +    }
>>>>>>>> +    msg.info.avd->msg_info.n2d_su_si_assign.ha_state =
>>>>>>>> +        (SA_AMF_HA_QUIESCING == curr_si->curr_state) ? SA_AMF_HA_QUIESCED : curr_si->curr_state;
>>>>>>>> +    msg.info.avd->msg_info.n2d_su_si_assign.error =
>>>>>>>> +        (m_AVND_SU_SI_CURR_ASSIGN_STATE_IS_ASSIGNED(curr_si) ||
>>>>>>>> +         m_AVND_SU_SI_CURR_ASSIGN_STATE_IS_REMOVED(curr_si)) ?
>>>>>>>> +        NCSCC_RC_SUCCESS : NCSCC_RC_FAILURE;
>>>>>>>>
>>>>>>>> -    if (msg.info.avd->msg_info.n2d_su_si_assign.msg_act == AVSV_SUSI_ACT_ASGN)
>>>>>>>> -        osafassert(si);
>>>>>>>> +    if (msg.info.avd->msg_info.n2d_su_si_assign.msg_act == AVSV_SUSI_ACT_ASGN)
>>>>>>>> +        osafassert(si);
>>>>>>>>
>>>>>>>> -    /* send the msg to AvD */
>>>>>>>> -    TRACE("Sending. msg_id'%u', node_id'%u', msg_act'%u', su'%s', si'%s', ha_state'%u', error'%u', single_csi'%u'",
>>>>>>>> -        msg.info.avd->msg_info.n2d_su_si_assign.msg_id, msg.info.avd->msg_info.n2d_su_si_assign.node_id,
>>>>>>>> -        msg.info.avd->msg_info.n2d_su_si_assign.msg_act, msg.info.avd->msg_info.n2d_su_si_assign.su_name.value,
>>>>>>>> -        msg.info.avd->msg_info.n2d_su_si_assign.si_name.value, msg.info.avd->msg_info.n2d_su_si_assign.ha_state,
>>>>>>>> -        msg.info.avd->msg_info.n2d_su_si_assign.error, msg.info.avd->msg_info.n2d_su_si_assign.single_csi);
>>>>>>>> +    /* send the msg to AvD */
>>>>>>>> +    TRACE("Sending. msg_id'%u', node_id'%u', msg_act'%u', su'%s', si'%s', ha_state'%u', error'%u', single_csi'%u'",
>>>>>>>> +        msg.info.avd->msg_info.n2d_su_si_assign.msg_id, msg.info.avd->msg_info.n2d_su_si_assign.node_id,
>>>>>>>> +        msg.info.avd->msg_info.n2d_su_si_assign.msg_act, msg.info.avd->msg_info.n2d_su_si_assign.su_name.value,
>>>>>>>> +        msg.info.avd->msg_info.n2d_su_si_assign.si_name.value, msg.info.avd->msg_info.n2d_su_si_assign.ha_state,
>>>>>>>> +        msg.info.avd->msg_info.n2d_su_si_assign.error,
>>>>>>>> +        msg.info.avd->msg_info.n2d_su_si_assign.single_csi);
>>>>>>>>
>>>>>>>> -    if ((su->si_list.n_nodes > 1) && (si == nullptr)) {
>>>>>>>> -        if (msg.info.avd->msg_info.n2d_su_si_assign.msg_act == AVSV_SUSI_ACT_DEL)
>>>>>>>> -            LOG_NO("Removed 'all SIs' from '%s'", su->name.value);
>>>>>>>> +    if ((su->si_list.n_nodes > 1) && (si == nullptr)) {
>>>>>>>> +        if (msg.info.avd->msg_info.n2d_su_si_assign.msg_act == AVSV_SUSI_ACT_DEL)
>>>>>>>> +            LOG_NO("Removed 'all SIs' from '%s'", su->name.value);
>>>>>>>> -        if (msg.info.avd->msg_info.n2d_su_si_assign.msg_act == AVSV_SUSI_ACT_MOD)
>>>>>>>> -            LOG_NO("Assigned 'all SIs' %s of '%s'",
>>>>>>>> -                ha_state[msg.info.avd->msg_info.n2d_su_si_assign.ha_state],
>>>>>>>> -                su->name.value);
>>>>>>>> -    }
>>>>>>>> +        if (msg.info.avd->msg_info.n2d_su_si_assign.msg_act == AVSV_SUSI_ACT_MOD)
>>>>>>>> +            LOG_NO("Assigned 'all SIs' %s of '%s'",
>>>>>>>> +                ha_state[msg.info.avd->msg_info.n2d_su_si_assign.ha_state],
>>>>>>>> +                su->name.value);
>>>>>>>> +    }
>>>>>>>>
>>>>>>>> -    rc = avnd_di_msg_send(cb, &msg);
>>>>>>>> -    if (NCSCC_RC_SUCCESS == rc)
>>>>>>>> -        msg.info.avd = 0;
>>>>>>>> -
>>>>>>>> -    /* we have completed the SU SI msg processing */
>>>>>>>> -    if (su_assign_state_is_stable(su))
>>>>>>>> -        m_AVND_SU_ASSIGN_PEND_RESET(su);
>>>>>>>> -    m_AVND_SU_ALL_SI_RESET(su);
>>>>>>>> +    if (cb->is_avd_down == true) {
>>>>>>>> +        // We are in headless, buffer this msg
>>>>>>>> +        msg.info.avd->msg_info.n2d_su_si_assign.msg_id = 0;
>>>>>>>> +        if (avnd_diq_rec_add(cb, &msg) == nullptr) {
>>>>>>>> +            rc = NCSCC_RC_FAILURE;
>>>>>>>> +        }
>>>>>>>> +        m_AVND_SU_ALL_SI_RESET(su);
>>>>>>>> +        LOG_NO("avnd_di_susi_resp_send() deferred as AMF director is offline");
>>>>>>>> +    } else {
>>>>>>>> +        // We are in normal cluster, send msg to director
>>>>>>>> +        msg.info.avd->msg_info.n2d_su_si_assign.msg_id = ++(cb->snd_msg_id);
>>>>>>>> +        /* send the msg to AvD */
>>>>>>>> +        rc = avnd_di_msg_send(cb, &msg);
>>>>>>>> +        if (NCSCC_RC_SUCCESS == rc)
>>>>>>>> +            msg.info.avd = 0;
>>>>>>>> +        /* we have completed the SU SI msg processing */
>>>>>>>> +        if (su_assign_state_is_stable(su)) {
>>>>>>>> +            m_AVND_SU_ASSIGN_PEND_RESET(su);
>>>>>>>> +        }
>>>>>>>> +        m_AVND_SU_ALL_SI_RESET(su);
>>>>>>>> +    }
>>>>>>>>
>>>>>>>>      /* free the contents of avnd message */
>>>>>>>>      avnd_msg_content_free(cb, &msg);
>>>>>>>> @@ -1255,14 +1262,7 @@ void avnd_diq_rec_del(AVND_CB *cb, AVND_
>>>>>>>>      /* stop the AvD msg response timer */
>>>>>>>>      if (m_AVND_TMR_IS_ACTIVE(rec->resp_tmr)) {
>>>>>>>>          m_AVND_TMR_MSG_RESP_STOP(cb, *rec);
>>>>>>>> -        // Resend msgs from queue because amfd dropped during sync
>>>>>>>> -        if ((cb->dnd_list.head != nullptr)) {
>>>>>>>> -            TRACE("retransmit message to amfd");
>>>>>>>> -            AVND_DND_MSG_LIST *pending_rec = 0;
>>>>>>>> -            for (pending_rec = cb->dnd_list.head; pending_rec != nullptr; pending_rec = pending_rec->next) {
>>>>>>>> -                avnd_diq_rec_send(cb, pending_rec);
>>>>>>>> -            }
>>>>>>>> -        }
>>>>>>>> +        avnd_diq_rec_send_buffered_msg(cb);
>>>>>>>>          /* resend pg start track */
>>>>>>>>          avnd_di_resend_pg_start_track(cb);
>>>>>>>>      }
>>>>>>>> @@ -1275,6 +1275,73 @@ void avnd_diq_rec_del(AVND_CB *cb, AVND_
>>>>>>>>      TRACE_LEAVE();
>>>>>>>>      return;
>>>>>>>>  }
>>>>>>>> +/*****************************************************************************
>>>>>>>> +  Name          : avnd_diq_rec_send_buffered_msg
>>>>>>>> +
>>>>>>>> +  Description   : Resend buffered msg
>>>>>>>> +
>>>>>>>> +  Arguments     : cb - ptr to the AvND control block
>>>>>>>> +
>>>>>>>> +  Return Values : None.
>>>>>>>> +
>>>>>>>> +  Notes         : None.
>>>>>>>> +*****************************************************************************/
>>>>>>>> +void avnd_diq_rec_send_buffered_msg(AVND_CB *cb) {
>>>>>>>> +    TRACE_ENTER();
>>>>>>>> +    // Resend msgs from queue because amfnd dropped during headless
>>>>>>>> +    // or headless-synchronization
>>>>>>>> +    if ((cb->dnd_list.head != nullptr)) {
>>>>>>>> +        AVND_DND_MSG_LIST *pending_rec = 0;
>>>>>>>> +        TRACE("Attach msg_id of buffered msg");
>>>>>>>> +        bool found = true;
>>>>>>>> +        while (found) {
>>>>>>>> +            found = false;
>>>>>>>> +            for (pending_rec = cb->dnd_list.head; pending_rec != nullptr; pending_rec = pending_rec->next) {
>>>>>>>> +                if (pending_rec->msg.type == AVND_MSG_AVD) {
>>>>>>>> +                    // At this moment, only oper_state msg needs to report to director
>>>>>>>> +                    if (pending_rec->msg.info.avd->msg_type == AVSV_N2D_INFO_SU_SI_ASSIGN_MSG &&
>>>>>>>> +                        pending_rec->msg.info.avd->msg_info.n2d_su_si_assign.msg_id == 0) {
>>>>>>>> +                        m_AVND_DIQ_REC_POP(cb, pending_rec);
>>>>>>>> +#if 0
>>>>>>>> +                        // only resend if this SUSI does exist
>>>>>>>> +                        AVND_SU *su = m_AVND_SUDB_REC_GET(cb->sudb,
>>>>>>>> +                            pending_rec->msg.info.avd->msg_info.n2d_su_si_assign.su_name);
>>>>>>>> +                        if (su != nullptr && su->si_list.n_nodes > 0) {
>>>>>>>> +#endif
>>>>>>>> +                        pending_rec->msg.info.avd->msg_info.n2d_su_si_assign.msg_id = ++(cb->snd_msg_id);
>>>>>>>> +                        m_AVND_DIQ_REC_PUSH(cb, pending_rec);
>>>>>>>> +                        LOG_NO("Found and resend buffered su_si_assign msg for SU:'%s', "
>>>>>>>> +                            "SI:'%s', ha_state:'%u', msg_act:'%u', single_csi:'%u', "
>>>>>>>> +                            "error:'%u', msg_id:'%u'",
>>>>>>>> +                            pending_rec->msg.info.avd->msg_info.n2d_su_si_assign.su_name.value,
>>>>>>>> +                            pending_rec->msg.info.avd->msg_info.n2d_su_si_assign.si_name.value,
>>>>>>>> +                            pending_rec->msg.info.avd->msg_info.n2d_su_si_assign.ha_state,
>>>>>>>> +                            pending_rec->msg.info.avd->msg_info.n2d_su_si_assign.msg_act,
>>>>>>>> +                            pending_rec->msg.info.avd->msg_info.n2d_su_si_assign.single_csi,
>>>>>>>> +                            pending_rec->msg.info.avd->msg_info.n2d_su_si_assign.error,
>>>>>>>> +                            pending_rec->msg.info.avd->msg_info.n2d_su_si_assign.msg_id);
>>>>>>>> +
>>>>>>>> +#if 0
>>>>>>>> +                        } else {
>>>>>>>> +                            avnd_msg_content_free(cb, &pending_rec->msg);
>>>>>>>> +                            delete pending_rec;
>>>>>>>> +                            pending_rec = cb->dnd_list.head;
>>>>>>>> +                        }
>>>>>>>> +#endif
>>>>>>>> +                        found = true;
>>>>>>>> +                    }
>>>>>>>> +                }
>>>>>>>> +            }
>>>>>>>> +        }
>>>>>>>> +        TRACE("retransmit message to amfd");
>>>>>>>> +        for (pending_rec = cb->dnd_list.head; pending_rec != nullptr; pending_rec = pending_rec->next) {
>>>>>>>> +            avnd_diq_rec_send(cb, pending_rec);
>>>>>>>> +        }
>>>>>>>> +    }
>>>>>>>> +    TRACE_LEAVE();
>>>>>>>> +    return;
>>>>>>>> +}
>>>>>>>>
>>>>>>>>  /*****************************************************************************
>>>>>>>>    Name          : avnd_diq_rec_send
>>>>>>>> diff --git a/osaf/services/saf/amf/amfnd/include/avnd_di.h b/osaf/services/saf/amf/amfnd/include/avnd_di.h
>>>>>>>> --- a/osaf/services/saf/amf/amfnd/include/avnd_di.h
>>>>>>>> +++ b/osaf/services/saf/amf/amfnd/include/avnd_di.h
>>>>>>>> @@ -79,6 +79,7 @@ void avnd_di_msg_ack_process(struct avnd
>>>>>>>>  void avnd_diq_del(struct avnd_cb_tag *);
>>>>>>>>  AVND_DND_MSG_LIST *avnd_diq_rec_add(struct avnd_cb_tag *cb, AVND_MSG *msg);
>>>>>>>>  void avnd_diq_rec_del(struct avnd_cb_tag *cb, AVND_DND_MSG_LIST *rec);
>>>>>>>> +void avnd_diq_rec_send_buffered_msg(struct avnd_cb_tag *cb);
>>>>>>>>  uint32_t avnd_diq_rec_send(struct avnd_cb_tag *cb, AVND_DND_MSG_LIST *rec);
>>>>>>>>  uint32_t avnd_di_reg_su_rsp_snd(struct avnd_cb_tag *cb, SaNameT *su_name, uint32_t ret_code);
>>>>>>>>  uint32_t avnd_di_ack_nack_msg_send(struct avnd_cb_tag *cb, uint32_t rcv_id, uint32_t view_num);
>>>
>>
>
------------------------------------------------------------------------------
_______________________________________________
Opensaf-devel mailing list
Opensaf-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-devel