Hi Nagu, Thanks for your update, I hope I could float the patch so everyone can do code review, but let's wait for your testing first.
Thanks, Minh On 22/09/16 21:22, Nagendra Kumar wrote: > Hi Minh, > > I have tested following scenarios till now and works well: > 1. Faults under standalone system. Act and Std SUs are there and cluster went > headless. Faults occurred in Act and Standby separately and together with > recovery as restart, su f/o, node f/o, node s/o. Faults are also mixed with > two SGs. > 2. During headless, the escalations from comp restart->su restart->su f/o -> > node f/o. Two SG was used for testings. When cluster recovers, the system > works well as expected. > > SG and node Auto repair was enabled and disabled in many test cases. > > Further testing will continue: > 1. Faults during assignments. > 2. Faults during admin operations. > > Thanks > -Nagu > >> -----Original Message----- >> From: minh chau [mailto:minh.c...@dektech.com.au] >> Sent: 15 September 2016 12:27 >> To: Nagendra Kumar; hans.nordeb...@ericsson.com; Praveen Malviya; >> gary....@dektech.com.au; long.hb.ngu...@dektech.com.au >> Cc: opensaf-devel@lists.sourceforge.net >> Subject: Re: [devel] [PATCH 2 of 4] AMFND: Admin operation continuation if >> csi completes during headless [#1725 part 1] V1 >> >> Hi Nagu, >> >> Yes you are right. The "component failover" is unstable, I think I will post >> my >> analysis of component failover problem to #1902 after you have verified the >> other recoveries. >> >> Thanks, >> Minh >> >> On 15/09/16 16:51, Nagendra Kumar wrote: >>> Hi Minh, >>> @2.a.) and @2.b.) are working except "Component Failover" >> as recovery. Other recovery like SU Failover, etc are working fine with >> 1725_pending_review.tgz and 07_no_recovery_if_no_pending_susi.diff. >>> Please confirm. >>> >>> Thanks >>> -Nagu >>> >>>> -----Original Message----- >>>> From: Nagendra Kumar >>>> Sent: 15 September 2016 12:13 >>>> To: minh chau; hans.nordeb...@ericsson.com; Praveen Malviya; >>>> gary....@dektech.com.au; long.hb.ngu...@dektech.com.au >>>> Cc: opensaf-devel@lists.sourceforge.net >>>> Subject: Re: [devel] [PATCH 2 of 4] AMFND: Admin operation >>>> continuation if csi completes during headless [#1725 part 1] V1 >>>> >>>> Hi Minh, >>>>>> If there's no any major problem, can we make SI Dep as last phase? >>>> Yes, absolutely. There is no problem. >>>>>> If I am right, I think you are testing @2.a) - and *fault* has just >>>>>> been as >>>> node reboot/powered-off by user during headless. >>>> Yes, you are right. >>>> >>>> Thanks >>>> -Nagu >>>> >>>>> -----Original Message----- >>>>> From: minh chau [mailto:minh.c...@dektech.com.au] >>>>> Sent: 14 September 2016 17:54 >>>>> To: Nagendra Kumar; hans.nordeb...@ericsson.com; Praveen Malviya; >>>>> gary....@dektech.com.au; long.hb.ngu...@dektech.com.au >>>>> Cc: opensaf-devel@lists.sourceforge.net >>>>> Subject: Re: [devel] [PATCH 2 of 4] AMFND: Admin operation >>>>> continuation if csi completes during headless [#1725 part 1] V1 >>>>> >>>>> Hi Nagu, >>>>> >>>>> I have proposed to change the order on 28 Jul: >>>>> >>>>> ============== >>>>> >>>>> I would like to change the above orders of implementation: >>>>> @0. We are here now: No admin op continuation, no recovery on faults >>>>> during headless. >>>>> Since componentRestart/suRestart has no impact on recovery after >>>>> headless, faults during headless here mean: failover escalation, >>>>> node reboot/powered-off by user during headless. Faults are >>>>> different phenomenons but they all result in loss of SUSI. Having >>>>> #1902 will remove the major impact of a node reboot due to immediate >>>>> escalation and AMF also has to deal with the loss of SUSI the same >>>>> as without >>>>> #1902 plus failover escalation >>>>> >>>>> @1. Admin op continuation without required recovery on faults during >>>>> headless >>>>> @1.a) All CSI(s) callback completes during headless, but SUSI states >>>>> are still QUIESCED/QUIESCING >>>>> @1.b) One of CSI(s) callback is still ongoing after headless (AMFD >>>>> would have to wait for it?) >>>>> >>>>> @2. Recovery on faults. (Doing fault recovery needs to consider >>>>> admin op continuation which would have been implemented in step @1) >>>>> Need >>>>> #1902 >>>>> @2.a.) Faults in normal flow: No admin op continuation is required >>>>> after headless, but fault did happen during headless >>>>> @2.b.) Faults happen during admin operation while headless, after >>>>> headless AMFD needs to consider a recovery on fault together with >>>>> admin op continuation. >>>>> >>>>> @3. @1 + @2 + With SI Dep. >>>>> >>>>> =============== >>>>> I thought we have followed the above order so far? Because part 1 >>>>> was acked, which is "@1. Admin op continuation without required >>>>> recovery on faults during headless" >>>>> If there's no any major problem, can we make SI Dep as last phase? >>>>> If I am right, I think you are testing @2.a) - and *fault* has just >>>>> been as node reboot/powered-off by user during headless. >>>>> >>>>> Thanks, >>>>> Minh >>>>> >>>>> On 14/09/16 21:48, Nagendra Kumar wrote: >>>>>> Hi Minh, >>>>>> If it is not tested, then it is fine. But, we had added (#1) the >>>>> following in the ticket #1725 on 27 Jul : >>>>>> =========================================== >>>>>> Nagendra Kumar - 2016-07-27 >>>>>> >>>>>> For 2N red model, implementation can be done in the following >>>>>> phased >>>>> manner. >>>>>> It has advantages of being logically segregated and it continues >>>>>> from >>>>> where we left in 5.0. >>>>>> (Phases #1, #2 and #3 is more related to ticket #1725 and phases #4 >>>>>> and #5 are related to #1902) >>>>>> >>>>>> 1. Node restart escalation (with and without SI Dep). >>>>>> 2. Without Si Dep : Admin op (no faults/escalations). >>>>>> 3. Without Si Dep : Admin Op + node restart faults/escalations during >>>>> headless. >>>>>> 4. Without Si Dep : >>>>>> a.) All faults in normal flows. >>>>>> b.) All faults during admin operation(minus node reboot >>>>>> during headless >>>>> as covered in #3). >>>>>> 5. With Si Dep : #2, #3 and #4. >>>>>> >>>>>> Since 5.0 already has immediate escalation model (component and >>>>>> node >>>>> restart/reboot), so #1, #2 and #3 completes left over portion of >>>>> headless contribution in 5.0 with that model. >>>>>> ====================================== >>>>>> >>>>>> Thanks >>>>>> -Nagu >>>>>> >>>>>>> -----Original Message----- >>>>>>> From: minh chau [mailto:minh.c...@dektech.com.au] >>>>>>> Sent: 14 September 2016 17:05 >>>>>>> To: Nagendra Kumar; hans.nordeb...@ericsson.com; Praveen >> Malviya; >>>>>>> gary....@dektech.com.au; long.hb.ngu...@dektech.com.au >>>>>>> Cc: opensaf-devel@lists.sourceforge.net >>>>>>> Subject: Re: [devel] [PATCH 2 of 4] AMFND: Admin operation >>>>>>> continuation if csi completes during headless [#1725 part 1] V1 >>>>>>> >>>>>>> Hi Nagu, >>>>>>> >>>>>>> SI Dep is the last phase of implementation of headless recovery, >>>>>>> its support is not included in all patches attached in ticket #1725. >>>>>>> >>>>>>> Thanks, >>>>>>> Minh >>>>>>> >>>>>>> On 14/09/16 21:21, Nagendra Kumar wrote: >>>>>>>> Hi Minh, >>>>>>>> Have you tested Si Dep (2N Red model) for "node >> restart test >>>>>>> cases" ? I can't see it in the test case doc. >>>>>>>> Thanks >>>>>>>> -Nagu >>>>>>>> >>>>>>>>> -----Original Message----- >>>>>>>>> From: Nagendra Kumar >>>>>>>>> Sent: 13 September 2016 11:20 >>>>>>>>> To: minh chau; hans.nordeb...@ericsson.com; Praveen Malviya; >>>>>>>>> gary....@dektech.com.au; long.hb.ngu...@dektech.com.au >>>>>>>>> Cc: opensaf-devel@lists.sourceforge.net >>>>>>>>> Subject: Re: [devel] [PATCH 2 of 4] AMFND: Admin operation >>>>>>>>> continuation if csi completes during headless [#1725 part 1] V1 >>>>>>>>> >>>>>>>>> Hi Minh, >>>>>>>>> I have tested these scenarios again and it works well. >>>>>>>>> >>>>>>>>> Thanks >>>>>>>>> -Nagu >>>>>>>>> >>>>>>>>>> -----Original Message----- >>>>>>>>>> From: minh chau [mailto:minh.c...@dektech.com.au] >>>>>>>>>> Sent: 12 September 2016 11:53 >>>>>>>>>> To: Nagendra Kumar; hans.nordeb...@ericsson.com; Praveen >>>>> Malviya; >>>>>>>>>> gary....@dektech.com.au; long.hb.ngu...@dektech.com.au >>>>>>>>>> Cc: opensaf-devel@lists.sourceforge.net >>>>>>>>>> Subject: Re: [PATCH 2 of 4] AMFND: Admin operation continuation >>>>>>>>>> if csi completes during headless [#1725 part 1] V1 >>>>>>>>>> >>>>>>>>>> Hi Nagu, >>>>>>>>>> >>>>>>>>>> One bug get hit by your configuration, where the absent SUSIs >>>>>>>>>> are found after headless but no real SUSIs are available also. >>>>>>>>>> In this case I think that AMFD can do like a fresh assignment. >>>>>>>>>> I attach the patch to ticket #1725, please help to test again. >>>>>>>>>> >>>>>>>>>> Thanks, >>>>>>>>>> Minh >>>>>>>>>> >>>>>>>>>> On 12/09/16 11:09, minh chau wrote: >>>>>>>>>>> Hi Nagu, >>>>>>>>>>> >>>>>>>>>>> I'm running the tests with this configuration and will get >>>>>>>>>>> back to >>>> you. >>>>>>>>>>> Thanks, >>>>>>>>>>> Minh >>>>>>>>>>> >>>>>>>>>>> On 09/09/16 22:26, Nagendra Kumar wrote: >>>>>>>>>>>> Hi Minh, >>>>>>>>>>>> I am using 1725_pending_review.tgz >>>>>>>>>>>> (1725_02_V2_bugfix_01_resend_buffer_in_set_leds.diff, >>>>>>>>>>>> 1725_02_V2_bugfix_02_honor_clusterinit_nodesync_timer.diff, >>>>>>>>>>>> 1725_02_V2_bugfix_03_restore_ng_admin.diff, >>>>>>>>>>>> 1725_03_V4_failover_absent_susi_longDn.diff, >>>>>>>>>>>> 1725_04_V2_headless_validation.diff, >>>>>>>>>>>> 1725_05_V2_resend_oper_state.diff, >>>>>>>>>>>> 1725_06a_fullscope_escalation_headless.diff). >>>>>>>>>>>> >>>>>>>>>>>> I am doing basic node reboot validation testing with no faults. >>>>>>>>>>>> >>>>>>>>>>>> Configuration: SU1(act) and SU2(stanby) both on PL-3. >>>>>>>>>>>> >>>>>>>>>>>> TC #1: Start SC-1, PL-3 and PL-5: Unlock SU1 and SU2. Stop >>>>>>>>>>>> SC-1 and stop PL-3, start PL-3 and start SC-1. >>>>>>>>>>>> After SC-1 and PL-3 comes back, ideally SU1 and SU2 should >>>>>>>>>>>> get assignments as Act and Std, but no assignment are being >>>>>>>>>>>> given to SUs on PL-3 and it shows following in status: >>>>>>>>>>>> >>>>>>>>>>>> Only Su2 has Std assignment. >>>>>>>>>>>> >>>>>>>>>>>> safSISU=safSu=SC- >>>>>>>>>> 1\,safSg=NoRed\,safApp=OpenSAF,safSi=NoRed1,safApp=O >>>>>>>>>>>> penSAF >>>>>>>>>>>> >>>>>>>>>>>> saAmfSISUHAState=ACTIVE(1) >>>>>>>>>>>> safSISU=safSu=PL- >>>>>>>>>> 5\,safSg=NoRed\,safApp=OpenSAF,safSi=NoRed2,safApp=O >>>>>>>>>>>> penSAF >>>>>>>>>>>> >>>>>>>>>>>> saAmfSISUHAState=ACTIVE(1) >>>>>>>>>>>> >> safSISU=safSu=SU2\,safSg=AmfDemo_2N\,safApp=AmfDemo1,safSi=AmfDe >>>>>>>>>> mo1,s >>>>>>>>>>>> afApp=AmfDemo1 >>>>>>>>>>>> >>>>>>>>>>>> saAmfSISUHAState=STANDBY(2) >>>>>>>>>>>> safSISU=safSu=SC-1\,safSg=2N\,safApp=OpenSAF,safSi=SC- >>>>>>>>>> 2N,safApp=OpenSAF >>>>>>>>>>>> saAmfSISUHAState=ACTIVE(1) >>>>>>>>>>>> safSISU=safSu=PL- >>>>>>>>>> 3\,safSg=NoRed\,safApp=OpenSAF,safSi=NoRed3,safApp=O >>>>>>>>>>>> penSAF >>>>>>>>>>>> >>>>>>>>>>>> saAmfSISUHAState=ACTIVE(1) >>>>>>>>>>>> >>>>>>>>>>>> TC #2: Configuration same as TC#1. Stop PL-3 and don't start. >>>>>>>>>>>> The same issue: >>>>>>>>>>>> safSISU=safSu=PL- >>>>>>>>>> 5\,safSg=NoRed\,safApp=OpenSAF,safSi=NoRed3,safApp=O >>>>>>>>>>>> penSAF >>>>>>>>>>>> >>>>>>>>>>>> saAmfSISUHAState=ACTIVE(1) >>>>>>>>>>>> >> safSISU=safSu=SU2\,safSg=AmfDemo_2N\,safApp=AmfDemo1,safSi=AmfDe >>>>>>>>>> mo1,s >>>>>>>>>>>> afApp=AmfDemo1 >>>>>>>>>>>> >>>>>>>>>>>> saAmfSISUHAState=STANDBY(2) >>>>>>>>>>>> safSISU=safSu=SC- >>>>>>>>>> 1\,safSg=NoRed\,safApp=OpenSAF,safSi=NoRed2,safApp=O >>>>>>>>>>>> penSAF >>>>>>>>>>>> >>>>>>>>>>>> saAmfSISUHAState=ACTIVE(1) >>>>>>>>>>>> safSISU=safSu=SC-1\,safSg=2N\,safApp=OpenSAF,safSi=SC- >>>>>>>>>> 2N,safApp=OpenSAF >>>>>>>>>>>> saAmfSISUHAState=ACTIVE(1) >>>>>>>>>>>> >>>>>>>>>>>> TC #3: Configured SU1(Act) on PL-3 and SU2(Std) on PL-4. >>>>>>>>>>>> Stop SC-1, stop PL-3 and PL-4, but PL-5 is running. start >>>>>>>>>>>> SC-1, the same issue. >>>>>>>>>>>> >>>>>>>>>>>> TC #4: Same as TC #3, but SU3 configured on PL-5 as spare. >>>>>>>>>>>> SU3 doesn't get any assignment and Sg is unstable. >>>>>>>>>>>> >>>>>>>>>>>> Thanks >>>>>>>>>>>> -Nagu >>>>>>>>>>>> >>>>>>>>>>>>> -----Original Message----- >>>>>>>>>>>>> From: Minh Hon Chau [mailto:minh.c...@dektech.com.au] >>>>>>>>>>>>> Sent: 18 August 2016 05:46 >>>>>>>>>>>>> To: hans.nordeb...@ericsson.com; Nagendra Kumar; Praveen >>>>>>>>> Malviya; >>>>>>>>>>>>> gary....@dektech.com.au; long.hb.ngu...@dektech.com.au; >>>>>>>>>>>>> minh.c...@dektech.com.au >>>>>>>>>>>>> Cc: opensaf-devel@lists.sourceforge.net >>>>>>>>>>>>> Subject: [PATCH 2 of 4] AMFND: Admin operation continuation >>>>>>>>>>>>> if csi completes during headless [#1725 part 1] V1 >>>>>>>>>>>>> >>>>>>>>>>>>> osaf/services/saf/amf/amfnd/di.cc | 199 >>>>>>>>>>>>> +++++++++++++++++-------- >>>>>>>>>>>>> osaf/services/saf/amf/amfnd/include/avnd_di.h | 1 + >>>>>>>>>>>>> 2 files changed, 134 insertions(+), 66 deletions(-) >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> There're two options basically that AMFD can continue admin >>>>>>>>>>>>> operation wih completed csi(s) >>>>>>>>>>>>> >>>>>>>>>>>>> First: AMFD can use the sync SUSI fsm state as latest, AMFD >>>>>>>>>>>>> then has to explore its SUSI assignments with adminStates of >>>>>>>>>>>>> relevant entities to determine which SU should be on call of >>>>> susi_success(). >>>>>>>>>>>>> Deeper level of exploration for csi addition. It also >>>>>>>>>>>>> depends on SG Fsm state which is being used variously in >>>>>>>>>>>>> different SG >>>> types. >>>>>>>>>>>>> Second: AMFD uses the SUSI fsm state read from IMM as >>>>>>>>>>>>> latest, and AMFND needs to resend susi_resp messages which >>>>>>>>>>>>> were deferred during headless so that AMFD can continue the >>>>>>>>>>>>> admin operation >>>>>>>>> sequence. >>>>>>>>>>>>> Both cases of csi completion [during or after] headless can >>>>>>>>>>>>> run in the same code flow. >>>>>>>>>>>>> >>>>>>>>>>>>> The patch buffers susi_resp_msg during headless stage and >>>>>>>>>>>>> resend it to AMFD after headless. There could be a chance >>>>>>>>>>>>> that AMFND sent out susi response message but AMFD could >> not >>>>> receive >>>>>>>>>>>>> or process it. This case could be seen as a defect, which >>>>>>>>>>>>> can be fixed by securing the result of sending susi_resp >>>>>>>>>>>>> message from AMFND toward >>>>>>>>> AMFD. >>>>>>>>>>>>> diff --git a/osaf/services/saf/amf/amfnd/di.cc >>>>>>>>>>>>> b/osaf/services/saf/amf/amfnd/di.cc >>>>>>>>>>>>> --- a/osaf/services/saf/amf/amfnd/di.cc >>>>>>>>>>>>> +++ b/osaf/services/saf/amf/amfnd/di.cc >>>>>>>>>>>>> @@ -805,11 +805,6 @@ uint32_t >>>>> avnd_di_susi_resp_send(AVND_CB >>>>>>>>>>>>> if (cb->term_state == >>>>>>>>>>>>> AVND_TERM_STATE_OPENSAF_SHUTDOWN_STARTED) >>>>>>>>>>>>> return rc; >>>>>>>>>>>>> >>>>>>>>>>>>> - if (cb->is_avd_down == true) { >>>>>>>>>>>>> - m_AVND_SU_ALL_SI_RESET(su); >>>>>>>>>>>>> - return rc; >>>>>>>>>>>>> - } >>>>>>>>>>>>> - >>>>>>>>>>>>> // should be in assignment pending state to be here >>>>>>>>>>>>> osafassert(m_AVND_SU_IS_ASSIGN_PEND(su)); >>>>>>>>>>>>> >>>>>>>>>>>>> @@ -820,64 +815,76 @@ uint32_t >>>>>>> avnd_di_susi_resp_send(AVND_CB >>>>>>>>>>>>> TRACE_ENTER2("Sending Resp su=%s, si=%s, >>>>>>>>>>>>> curr_state=%u, prv_state=%u", su->name.value, >>>>>>>>>>>>> curr_si->name.value,curr_si- >>>>>>>>>>>>>> curr_state,curr_si->prv_state); >>>>>>>>>>>>> /* populate the susi resp msg */ >>>>>>>>>>>>> msg.info.avd = new AVSV_DND_MSG(); >>>>>>>>>>>>> - msg.type = AVND_MSG_AVD; >>>>>>>>>>>>> - msg.info.avd->msg_type = >>>>>>>>> AVSV_N2D_INFO_SU_SI_ASSIGN_MSG; >>>>>>>>>>>>> - msg.info.avd->msg_info.n2d_su_si_assign.msg_id = >> ++(cb- >>>>>>>>>>>>>> snd_msg_id); >>>>>>>>>>>>> - msg.info.avd->msg_info.n2d_su_si_assign.node_id = cb- >>>>>>>>>>>>>> node_info.nodeId; >>>>>>>>>>>>> - if (si) { >>>>>>>>>>>>> - msg.info.avd->msg_info.n2d_su_si_assign.single_csi = >>>>>>>>>>>>> - ((si->single_csi_add_rem_in_si == >>>>>>>>>>>>> AVSV_SUSI_ACT_BASE) ? >>>>>>>>>>>>> false : true); >>>>>>>>>>>>> - } >>>>>>>>>>>>> - TRACE("curr_assign_state '%u'", curr_si- >>>>> curr_assign_state); >>>>>>>>>>>>> - msg.info.avd->msg_info.n2d_su_si_assign.msg_act = >>>>>>>>>>>>> - >> (m_AVND_SU_SI_CURR_ASSIGN_STATE_IS_ASSIGNED(curr_si) >>>> || >>>>>>>>>>>>> - >> m_AVND_SU_SI_CURR_ASSIGN_STATE_IS_ASSIGNING(curr_si)) >>>> ? >>>>>>>>>>>>> - ((!curr_si->prv_state) ? AVSV_SUSI_ACT_ASGN : >>>>>>>>>>>>> AVSV_SUSI_ACT_MOD) : AVSV_SUSI_ACT_DEL; >>>>>>>>>>>>> - msg.info.avd->msg_info.n2d_su_si_assign.su_name = su- >>>>>>>> name; >>>>>>>>>>>>> - if (si) { >>>>>>>>>>>>> - msg.info.avd->msg_info.n2d_su_si_assign.si_name = si- >>> name; >>>>>>>>>>>>> - if (AVSV_SUSI_ACT_ASGN == >>>>>>>>>>>>> si->single_csi_add_rem_in_si) { >>>>>>>>>>>>> - TRACE("si->curr_assign_state '%u'", >>>>>>>>>>>>> curr_si- >>>>>>>>>>>>>> curr_assign_state); >>>>>>>>>>>>> - msg.info.avd->msg_info.n2d_su_si_assign.msg_act = >>>>>>>>>>>>> - >>>>>>>>>>>>> (m_AVND_SU_SI_CURR_ASSIGN_STATE_IS_ASSIGNED(curr_si) >> || >>>>>>>>>>>>> - >>>>>>>>>>>>> >> m_AVND_SU_SI_CURR_ASSIGN_STATE_IS_ASSIGNING(curr_si)) ? >>>>>>>>>>>>> - AVSV_SUSI_ACT_ASGN : >>>>>>>>>>>>> AVSV_SUSI_ACT_DEL; >>>>>>>>>>>>> - } >>>>>>>>>>>>> - } >>>>>>>>>>>>> - msg.info.avd->msg_info.n2d_su_si_assign.ha_state = >>>>>>>>>>>>> - (SA_AMF_HA_QUIESCING == curr_si->curr_state) ? >>>>>>>>>>>>> SA_AMF_HA_QUIESCED : curr_si->curr_state; >>>>>>>>>>>>> - msg.info.avd->msg_info.n2d_su_si_assign.error = >>>>>>>>>>>>> - >> (m_AVND_SU_SI_CURR_ASSIGN_STATE_IS_ASSIGNED(curr_si) >>>> || >>>>>>>>>>>>> - >> m_AVND_SU_SI_CURR_ASSIGN_STATE_IS_REMOVED(curr_si)) >>>> ? >>>>>>>>>>>>> NCSCC_RC_SUCCESS : NCSCC_RC_FAILURE; >>>>>>>>>>>>> + msg.type = AVND_MSG_AVD; >>>>>>>>>>>>> + msg.info.avd->msg_type = >>>>>>> AVSV_N2D_INFO_SU_SI_ASSIGN_MSG; >>>>>>>>>>>>> + msg.info.avd->msg_info.n2d_su_si_assign.node_id = cb- >>>>>>>>>>>>>> node_info.nodeId; >>>>>>>>>>>>> + if (si) { >>>>>>>>>>>>> + msg.info.avd->msg_info.n2d_su_si_assign.single_csi = >>>>>>>>>>>>> + ((si->single_csi_add_rem_in_si == >>>>>>>>>>>>> AVSV_SUSI_ACT_BASE) ? false : true); >>>>>>>>>>>>> + } >>>>>>>>>>>>> + TRACE("curr_assign_state '%u'", curr_si- >>> curr_assign_state); >>>>>>>>>>>>> + msg.info.avd->msg_info.n2d_su_si_assign.msg_act = >>>>>>>>>>>>> + >>>>>>>>>>>>> >>>> (m_AVND_SU_SI_CURR_ASSIGN_STATE_IS_ASSIGNED(curr_si) >>>>> || >>>>>>>>>>>>> + >>>>>>>>>>>>> >>>> m_AVND_SU_SI_CURR_ASSIGN_STATE_IS_ASSIGNING(curr_si)) >>>>> ? >>>>>>>>>>>>> + ((!curr_si->prv_state) ? >>>>>>>>>>>>> AVSV_SUSI_ACT_ASGN : AVSV_SUSI_ACT_MOD) : >>>>>>>>> AVSV_SUSI_ACT_DEL; >>>>>>>>>>>>> + msg.info.avd->msg_info.n2d_su_si_assign.su_name = su- >>>>>> name; >>>>>>>>>>>>> + if (si) { >>>>>>>>>>>>> + msg.info.avd->msg_info.n2d_su_si_assign.si_name = >>>>>>>>>>>>> + si- >>>>>>>>>>>>>> name; >>>>>>>>>>>>> + if (AVSV_SUSI_ACT_ASGN == >>>>>>>>>>>>> + si->single_csi_add_rem_in_si) >>>> { >>>>>>>>>>>>> + TRACE("si->curr_assign_state '%u'", curr_si- >>>>>>>>>>>>>> curr_assign_state); >>>>>>>>>>>>> + msg.info.avd- >>>>>>>>>>>>>> msg_info.n2d_su_si_assign.msg_act = >>>>>>>>>>>>> + >>>>>>>>>>>>> >>>> (m_AVND_SU_SI_CURR_ASSIGN_STATE_IS_ASSIGNED(curr_si) >>>>> || >>>>>>>>>>>>> + >>>>>>>>>>>>> >>>> m_AVND_SU_SI_CURR_ASSIGN_STATE_IS_ASSIGNING(curr_si)) >>>>> ? >>>>>>>>>>>>> + AVSV_SUSI_ACT_ASGN : >>>>>>>>>>>>> AVSV_SUSI_ACT_DEL; >>>>>>>>>>>>> + } >>>>>>>>>>>>> + } >>>>>>>>>>>>> + msg.info.avd->msg_info.n2d_su_si_assign.ha_state = >>>>>>>>>>>>> + (SA_AMF_HA_QUIESCING == curr_si->curr_state) ? >>>>>>>>>>>>> SA_AMF_HA_QUIESCED : curr_si->curr_state; >>>>>>>>>>>>> + msg.info.avd->msg_info.n2d_su_si_assign.error = >>>>>>>>>>>>> + >>>>>>>>>>>>> >>>> (m_AVND_SU_SI_CURR_ASSIGN_STATE_IS_ASSIGNED(curr_si) >>>>> || >>>>>>>>>>>>> + >>>>>>>>>>>>> >>>> m_AVND_SU_SI_CURR_ASSIGN_STATE_IS_REMOVED(curr_si)) >>>>> ? >>>>>>>>>>>>> +NCSCC_RC_SUCCESS : NCSCC_RC_FAILURE; >>>>>>>>>>>>> >>>>>>>>>>>>> - if (msg.info.avd->msg_info.n2d_su_si_assign.msg_act == >>>>>>>>>>>>> AVSV_SUSI_ACT_ASGN) >>>>>>>>>>>>> - osafassert(si); >>>>>>>>>>>>> + if (msg.info.avd->msg_info.n2d_su_si_assign.msg_act == >>>>>>>>>>>>> AVSV_SUSI_ACT_ASGN) >>>>>>>>>>>>> + osafassert(si); >>>>>>>>>>>>> >>>>>>>>>>>>> - /* send the msg to AvD */ >>>>>>>>>>>>> - TRACE("Sending. msg_id'%u', node_id'%u', msg_act'%u', >>>>>>>>>>>>> su'%s', si'%s', >>>>>>>>>>>>> ha_state'%u', error'%u', single_csi'%u'", >>>>>>>>>>>>> - msg.info.avd->msg_info.n2d_su_si_assign.msg_id, >>>>>>>>>>>>> msg.info.avd- >>>>>>>>>>>>>> msg_info.n2d_su_si_assign.node_id, >>>>>>>>>>>>> - msg.info.avd->msg_info.n2d_su_si_assign.msg_act, >>>>>>>>>>>>> msg.info.avd- >>>>>>>>>>>>>> msg_info.n2d_su_si_assign.su_name.value, >>>>>>>>>>>>> - msg.info.avd->msg_info.n2d_su_si_assign.si_name.value, >>>>>>>>>>>>> msg.info.avd->msg_info.n2d_su_si_assign.ha_state, >>>>>>>>>>>>> - msg.info.avd->msg_info.n2d_su_si_assign.error, >>>>>>>>>>>>> msg.info.avd- >>>>>>>>>>>>>> msg_info.n2d_su_si_assign.single_csi); >>>>>>>>>>>>> + /* send the msg to AvD */ >>>>>>>>>>>>> + TRACE("Sending. msg_id'%u', node_id'%u', msg_act'%u', >>>>>>>>>>>>> + su'%s', >>>>>>>>>>>>> si'%s', ha_state'%u', error'%u', single_csi'%u'", >>>>>>>>>>>>> + msg.info.avd->msg_info.n2d_su_si_assign.msg_id, >>>>>>>>>>>>> msg.info.avd->msg_info.n2d_su_si_assign.node_id, >>>>>>>>>>>>> + msg.info.avd->msg_info.n2d_su_si_assign.msg_act, >>>>>>>>>>>>> msg.info.avd->msg_info.n2d_su_si_assign.su_name.value, >>>>>>>>>>>>> + msg.info.avd->msg_info.n2d_su_si_assign.si_name.value, >>>>>>>>>>>>> msg.info.avd->msg_info.n2d_su_si_assign.ha_state, >>>>>>>>>>>>> + msg.info.avd->msg_info.n2d_su_si_assign.error, >>>>>>>>>>>>> +msg.info.avd->msg_info.n2d_su_si_assign.single_csi); >>>>>>>>>>>>> >>>>>>>>>>>>> - if ((su->si_list.n_nodes > 1) && (si == nullptr)) { >>>>>>>>>>>>> - if (msg.info.avd- >>> msg_info.n2d_su_si_assign.msg_act >>>> == >>>>>>>>>>>>> AVSV_SUSI_ACT_DEL) >>>>>>>>>>>>> - LOG_NO("Removed 'all SIs' from '%s'", >>>>>>>>>>>>> su->name.value); >>>>>>>>>>>>> + if ((su->si_list.n_nodes > 1) && (si == nullptr)) { >>>>>>>>>>>>> + if (msg.info.avd->msg_info.n2d_su_si_assign.msg_act >>>>>>>>>>>>> + == >>>>>>>>>>>>> AVSV_SUSI_ACT_DEL) >>>>>>>>>>>>> + LOG_NO("Removed 'all SIs' from '%s'", su- >>>>>>>>>>>>>> name.value); >>>>>>>>>>>>> - if (msg.info.avd- >>> msg_info.n2d_su_si_assign.msg_act >>>> == >>>>>>>>>>>>> AVSV_SUSI_ACT_MOD) >>>>>>>>>>>>> - LOG_NO("Assigned 'all SIs' %s of '%s'", >>>>>>>>>>>>> - ha_state[msg.info.avd- >>>>>>>>>>>>>> msg_info.n2d_su_si_assign.ha_state], >>>>>>>>>>>>> - su->name.value); >>>>>>>>>>>>> - } >>>>>>>>>>>>> + if (msg.info.avd->msg_info.n2d_su_si_assign.msg_act >>>>>>>>>>>>> + == >>>>>>>>>>>>> AVSV_SUSI_ACT_MOD) >>>>>>>>>>>>> + LOG_NO("Assigned 'all SIs' %s of '%s'", >>>>>>>>>>>>> + ha_state[msg.info.avd- >>>>>>>>>>>>>> msg_info.n2d_su_si_assign.ha_state], >>>>>>>>>>>>> + su->name.value); >>>>>>>>>>>>> + } >>>>>>>>>>>>> >>>>>>>>>>>>> - rc = avnd_di_msg_send(cb, &msg); >>>>>>>>>>>>> - if (NCSCC_RC_SUCCESS == rc) >>>>>>>>>>>>> - msg.info.avd = 0; >>>>>>>>>>>>> - >>>>>>>>>>>>> - /* we have completed the SU SI msg processing */ >>>>>>>>>>>>> - if (su_assign_state_is_stable(su)) >>>>>>>>>>>>> - m_AVND_SU_ASSIGN_PEND_RESET(su); >>>>>>>>>>>>> - m_AVND_SU_ALL_SI_RESET(su); >>>>>>>>>>>>> + if (cb->is_avd_down == true) { >>>>>>>>>>>>> + // We are in headless, buffer this msg >>>>>>>>>>>>> + msg.info.avd->msg_info.n2d_su_si_assign.msg_id = 0; >>>>>>>>>>>>> + if (avnd_diq_rec_add(cb, &msg) == nullptr) { >>>>>>>>>>>>> + rc = NCSCC_RC_FAILURE; >>>>>>>>>>>>> + } >>>>>>>>>>>>> + m_AVND_SU_ALL_SI_RESET(su); >>>>>>>>>>>>> + LOG_NO("avnd_di_susi_resp_send() deferred as AMF >>>>>>>>>>>>> director is offline"); >>>>>>>>>>>>> + } else { >>>>>>>>>>>>> + // We are in normal cluster, send msg to director >>>>>>>>>>>>> + msg.info.avd->msg_info.n2d_su_si_assign.msg_id = >>>>>>>>>>>>> + ++(cb- >>>>>>>>>>>>>> snd_msg_id); >>>>>>>>>>>>> + /* send the msg to AvD */ >>>>>>>>>>>>> + rc = avnd_di_msg_send(cb, &msg); >>>>>>>>>>>>> + if (NCSCC_RC_SUCCESS == rc) >>>>>>>>>>>>> + msg.info.avd = 0; >>>>>>>>>>>>> + /* we have completed the SU SI msg processing */ >>>>>>>>>>>>> + if (su_assign_state_is_stable(su)) { >>>>>>>>>>>>> + m_AVND_SU_ASSIGN_PEND_RESET(su); >>>>>>>>>>>>> + } >>>>>>>>>>>>> + m_AVND_SU_ALL_SI_RESET(su); >>>>>>>>>>>>> + } >>>>>>>>>>>>> >>>>>>>>>>>>> /* free the contents of avnd message */ >>>>>>>>>>>>> avnd_msg_content_free(cb, &msg); @@ -1256,14 >>>>>>>>>>>>> +1263,7 >>>>> @@ >>>>>>>>> void >>>>>>>>>>>>> avnd_diq_rec_del(AVND_CB *cb, AVND_ >>>>>>>>>>>>> /* stop the AvD msg response timer */ >>>>>>>>>>>>> if (m_AVND_TMR_IS_ACTIVE(rec->resp_tmr)) { >>>>>>>>>>>>> m_AVND_TMR_MSG_RESP_STOP(cb, *rec); >>>>>>>>>>>>> - // Resend msgs from queue because amfd dropped during >>>>>>>>>>>>> sync >>>>>>>>>>>>> - if ((cb->dnd_list.head != nullptr)) { >>>>>>>>>>>>> - TRACE("retransmit message to amfd"); >>>>>>>>>>>>> - AVND_DND_MSG_LIST *pending_rec = 0; >>>>>>>>>>>>> - for (pending_rec = cb->dnd_list.head; pending_rec != >>>>>>>>>>>>> nullptr; pending_rec = pending_rec->next) { >>>>>>>>>>>>> - avnd_diq_rec_send(cb, pending_rec); >>>>>>>>>>>>> - } >>>>>>>>>>>>> - } >>>>>>>>>>>>> + avnd_diq_rec_send_buffered_msg(cb); >>>>>>>>>>>>> /* resend pg start track */ >>>>>>>>>>>>> avnd_di_resend_pg_start_track(cb); >>>>>>>>>>>>> } >>>>>>>>>>>>> @@ -1276,6 +1276,73 @@ void avnd_diq_rec_del(AVND_CB >>>> *cb, >>>>>>>>>> AVND_ >>>>>>>>>>>>> TRACE_LEAVE(); >>>>>>>>>>>>> return; >>>>>>>>>>>>> } >>>>>>>>>>>>> >> +/************************************************************ >>>>>>>>>>>>> **************** >>>>>>>>>>>>> + Name : avnd_diq_rec_send_buffered_msg >>>>>>>>>>>>> + >>>>>>>>>>>>> + Description : Resend buffered msg >>>>>>>>>>>>> + >>>>>>>>>>>>> + Arguments : cb - ptr to the AvND control block >>>>>>>>>>>>> + >>>>>>>>>>>>> + Return Values : None. >>>>>>>>>>>>> + >>>>>>>>>>>>> + Notes : None. >>>>>>>>>>>>> >> +************************************************************* >>>>>>>>>>>>> ********** >>>>>>>>>>>>> +*******/ void avnd_diq_rec_send_buffered_msg(AVND_CB >>>> *cb) >>>>> { >>>>>>>>>>>>> + TRACE_ENTER(); >>>>>>>>>>>>> + // Resend msgs from queue because amfnd dropped during >>>>>>>>> headless >>>>>>>>>>>>> + // or headless-synchronization >>>>>>>>>>>>> + if ((cb->dnd_list.head != nullptr)) { >>>>>>>>>>>>> + AVND_DND_MSG_LIST *pending_rec = 0; >>>>>>>>>>>>> + TRACE("Attach msg_id of buffered msg"); >>>>>>>>>>>>> + bool found = true; >>>>>>>>>>>>> + while (found) { >>>>>>>>>>>>> + found = false; >>>>>>>>>>>>> + for (pending_rec = cb->dnd_list.head; >>>>>>>>>>>>> + pending_rec != >>>>>>>>>>>>> nullptr; pending_rec = pending_rec->next) { >>>>>>>>>>>>> + if (pending_rec->msg.type == >>>>>>>>>>>>> AVND_MSG_AVD) { >>>>>>>>>>>>> + // At this moment, only oper_state >>>>>>>>>>>>> msg needs to report to director >>>>>>>>>>>>> + if (pending_rec->msg.info.avd- >>>>>>>>>>>>>> msg_type == AVSV_N2D_INFO_SU_SI_ASSIGN_MSG && >>>>>>>>>>>>> + pending_rec->msg.info.avd- >>>>>>>>>>>>>> msg_info.n2d_su_si_assign.msg_id == 0) { >>>>>>>>>>>>> + m_AVND_DIQ_REC_POP(cb, >>>>>>>>>>>>> pending_rec); #if 0 >>>>>>>>>>>>> + // only resend if this SUSI >>>>>>>>>>>>> does exist >>>>>>>>>>>>> + AVND_SU *su = >>>>>>>>>>>>> m_AVND_SUDB_REC_GET(cb->sudb, >>>>>>>>>>>>> + pending_rec- >>>>>>>>>>>>>> msg.info.avd->msg_info.n2d_su_si_assign.su_name); >>>>>>>>>>>>> + if (su != nullptr && su- >>>>>>>>>>>>>> si_list.n_nodes > 0) { #endif >>>>>>>>>>>>> + pending_rec- >>>>>>>>>>>>>> msg.info.avd->msg_info.n2d_su_si_assign.msg_id = >>>>>>>>>>>>>> ++(cb->snd_msg_id); >>>>>>>>>>>>> + >>>>>>>>>>>>> m_AVND_DIQ_REC_PUSH(cb, pending_rec); >>>>>>>>>>>>> + LOG_NO("Found and >>>>>>>>>>>>> resend buffered su_si_assign msg for SU:'%s', " >>>>>>>>>>>>> + >>>>>>>>>>>>> "SI:'%s', ha_state:'%u', msg_act:'%u', single_csi:'%u', " >>>>>>>>>>>>> + >>>>>>>>>>>>> "error:'%u', msg_id:'%u'", >>>>>>>>>>>>> + >>>>>>>>>>>>> pending_rec->msg.info.avd- >>>>>>>>>>>>>> msg_info.n2d_su_si_assign.su_name.value, >>>>>>>>>>>>> + >>>>>>>>>>>>> pending_rec->msg.info.avd- >>>>>>>>>>>>>> msg_info.n2d_su_si_assign.si_name.value, >>>>>>>>>>>>> + >>>>>>>>>>>>> >>>>>>>>>>>>> pending_rec->msg.info.avd- >>>>>> msg_info.n2d_su_si_assign.ha_state, >>>>>>>>>>>>> + >>>>>>>>>>>>> >>>>>>>>>>>>> pending_rec->msg.info.avd- >>>>> msg_info.n2d_su_si_assign.msg_act, >>>>>>>>>>>>> + >>>>>>>>>>>>> >>>>>>>>>>>>> pending_rec->msg.info.avd- >>>>>> msg_info.n2d_su_si_assign.single_csi >>>>>>>>>>>>> , >>>>>>>>>>>>> + >>>>>>>>>>>>> >>>>>>>>>>>>> pending_rec->msg.info.avd- >>> msg_info.n2d_su_si_assign.error, >>>>>>>>>>>>> + >>>>>>>>>>>>> >>>>>>>>>>>>> pending_rec->msg.info.avd- >>>>> msg_info.n2d_su_si_assign.msg_id); >>>>>>>>>>>>> + >>>>>>>>>>>>> +#if 0 >>>>>>>>>>>>> + } else { >>>>>>>>>>>>> + >>>>>>>>>>>>> avnd_msg_content_free(cb, &pending_rec->msg); >>>>>>>>>>>>> + delete pending_rec; >>>>>>>>>>>>> + pending_rec = cb- >>>>>>>>>>>>>> dnd_list.head; >>>>>>>>>>>>> + } >>>>>>>>>>>>> +#endif >>>>>>>>>>>>> + found = true; >>>>>>>>>>>>> + } >>>>>>>>>>>>> + } >>>>>>>>>>>>> + } >>>>>>>>>>>>> + } >>>>>>>>>>>>> + TRACE("retransmit message to amfd"); >>>>>>>>>>>>> + for (pending_rec = cb->dnd_list.head; pending_rec >>>>>>>>>>>>> +!= nullptr; >>>>>>>>>>>>> pending_rec = pending_rec->next) { >>>>>>>>>>>>> + avnd_diq_rec_send(cb, pending_rec); >>>>>>>>>>>>> + } >>>>>>>>>>>>> + } >>>>>>>>>>>>> + TRACE_LEAVE(); >>>>>>>>>>>>> + return; >>>>>>>>>>>>> +} >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >> /************************************************************* >>>>>>>>>>>>> *************** >>>>>>>>>>>>> Name : avnd_diq_rec_send >>>>>>>>>>>>> diff --git a/osaf/services/saf/amf/amfnd/include/avnd_di.h >>>>>>>>>>>>> b/osaf/services/saf/amf/amfnd/include/avnd_di.h >>>>>>>>>>>>> --- a/osaf/services/saf/amf/amfnd/include/avnd_di.h >>>>>>>>>>>>> +++ b/osaf/services/saf/amf/amfnd/include/avnd_di.h >>>>>>>>>>>>> @@ -79,6 +79,7 @@ void avnd_di_msg_ack_process(struct >>>> avnd >>>>>>> void >>>>>>>>>>>>> avnd_diq_del(struct avnd_cb_tag *); AVND_DND_MSG_LIST >>>>>>>>>>>>> *avnd_diq_rec_add(struct avnd_cb_tag *cb, AVND_MSG >> *msg); >>>>> void >>>>>>>>>>>>> avnd_diq_rec_del(struct avnd_cb_tag *cb, >>>> AVND_DND_MSG_LIST >>>>>>>>> *rec); >>>>>>>>>>>>> +void avnd_diq_rec_send_buffered_msg(struct avnd_cb_tag >>>> *cb); >>>>>>>>>>>>> uint32_t avnd_diq_rec_send(struct avnd_cb_tag *cb, >>>>>>>>>>>>> AVND_DND_MSG_LIST *rec); uint32_t >>>>>>> avnd_di_reg_su_rsp_snd(struct >>>>>>>>>>>>> avnd_cb_tag *cb, SaNameT *su_name, uint32_t ret_code); >>>>>>>>>>>>> uint32_t avnd_di_ack_nack_msg_send(struct avnd_cb_tag >> *cb, >>>>>>>>>>>>> uint32_t rcv_id, uint32_t view_num); >>>>>>>>> ---------------------------------------------------------------- >>>>>>>>> - >>>>>>>>> -- >>>>>>>>> -- >>>>>>>>> --------- _______________________________________________ >>>>>>>>> Opensaf-devel mailing list >>>>>>>>> Opensaf-devel@lists.sourceforge.net >>>>>>>>> https://lists.sourceforge.net/lists/listinfo/opensaf-devel >>>> --------------------------------------------------------------------- >>>> --------- _______________________________________________ >>>> Opensaf-devel mailing list >>>> Opensaf-devel@lists.sourceforge.net >>>> https://lists.sourceforge.net/lists/listinfo/opensaf-devel ------------------------------------------------------------------------------ _______________________________________________ Opensaf-devel mailing list Opensaf-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/opensaf-devel