Hi Nagu,

Ticket #2162 has two patches. I think your ack is for [PATCH 2 of 2] AMFND: Fix SC failover during headless sync before standby AMFD comes up [#2162].
Does the other one ([PATCH 1 of 2] AMFD: Fix SC failover during headless sync at INIT_DONE state [#2162]) look OK?

Thanks,
Minh

On 14/02/17 20:40, Nagendra Kumar wrote:
> Ack.
> Tested the scenarios.
>
> Thanks
> -Nagu
>
>> -----Original Message-----
>> From: minh chau [mailto:minh.c...@dektech.com.au]
>> Sent: 23 January 2017 16:24
>> To: Nagendra Kumar; hans.nordeb...@ericsson.com; Praveen Malviya;
>> gary....@dektech.com.au
>> Cc: opensaf-devel@lists.sourceforge.net
>> Subject: Re: [PATCH 2 of 2] AMFND: Fix SC failover during headless sync
>> before standby AMFD comes up [#2162]
>>
>> Hi Nagu,
>>
>> I am checking the logs now.
>>
>> Thanks, Minh
>>
>> On 23/01/17 17:47, Nagendra Kumar wrote:
>>> The logs (Logs-tc.rar) are attached in the ticket.
>>>
>>> Thanks
>>> -Nagu
>>>
>>>> -----Original Message-----
>>>> From: minh chau [mailto:minh.c...@dektech.com.au]
>>>> Sent: 16 January 2017 05:47
>>>> To: Nagendra Kumar; hans.nordeb...@ericsson.com; Praveen Malviya;
>>>> gary....@dektech.com.au
>>>> Cc: opensaf-devel@lists.sourceforge.net
>>>> Subject: Re: [PATCH 2 of 2] AMFND: Fix SC failover during headless
>>>> sync before standby AMFD comes up [#2162]
>>>>
>>>> Hi Nagu,
>>>>
>>>> I misunderstood your point, and now I get it.
>>>> In my test I see it works as expected - SU2 becomes Act and no
>>>> assignment for SU1. I guess in your test somehow the cluster
>>>> initiation timer has not been started on SC2 (new active); there
>>>> could be a missing case in the patch.
>>>> Could you please share the trace?
>>>>
>>>> Thanks,
>>>> Minh
>>>>
>>>> On 13/01/17 21:48, Nagendra Kumar wrote:
>>>>> Hi Minh,
>>>>> Please check my response inlined with [Nagu].
>>>>>
>>>>> Thanks
>>>>> -Nagu
>>>>>> -----Original Message-----
>>>>>> From: minh chau [mailto:minh.c...@dektech.com.au]
>>>>>> Sent: 13 January 2017 03:53
>>>>>> To: Nagendra Kumar; hans.nordeb...@ericsson.com; Praveen Malviya;
>>>>>> gary....@dektech.com.au
>>>>>> Cc: opensaf-devel@lists.sourceforge.net
>>>>>> Subject: Re: [PATCH 2 of 2] AMFND: Fix SC failover during headless
>>>>>> sync before standby AMFD comes up [#2162]
>>>>>>
>>>>>> Hi Nagu,
>>>>>>
>>>>>> Thanks for reviewing, please see comments inline.
>>>>>>
>>>>>> Thanks,
>>>>>> Minh
>>>>>>
>>>>>> On 12/01/17 21:48, Nagendra Kumar wrote:
>>>>>>> Hi Minh,
>>>>>>> Though I am not able to simulate the problem, I tested as
>>>>>>> below:
>>>>>>> 1. Start SC1, SC2, PL-3 and PL-4. Configure SU1 on PL-3 as Act and SU2 on PL-4 as Standby.
>>>>>>> 2. Stop SC1 and SC2 and then stop PL-3.
>>>>>>> 3. Start SC-1 and SC-2. When SC-2 prints Cold sync complete, stop SC1. SC2 becomes Act.
>>>>>> [M]: As SU1 is on PL3, SU2 is on PL4, and If PL-3 is stopped, then only
>>>>>> SU2 has active assignment
>>>>> [Nagu]: PL-3 is stopped in step #2.
>>>>>>> In this case, SC-2 contains both SU1(Act) and SU2(Standby) assignments.
>>>>>>> Ideally, SU2 assignments should have been Act and there shouldn't be SU1 assignment.
>>>>>> [M]: This seems to be another test where SU1 and SU2 are hosted on
>>>>>> SC2, then both SU1 and SU2 should get assignment
>>>>> [Nagu]: I mean to say command 'amf-state siass' run on SC-1 displays both SU1 and SU2 assignments.
>>>>> SU1 and SU2 are hosted on PL-3 and PL-4 respectively.
>>>>> This is similar test case, which is mentioned in the ticket?
>>>>>>> safSISU=safSu=SU1\,safSg=AmfDemo_2N\,safApp=AmfDemo1,safSi=AmfDemo,safApp=AmfDemo1
>>>>>>> saAmfSISUHAState=ACTIVE(1)
>>>>>>> saAmfSISUHAReadinessState=READY_FOR_ASSIGNMENT(1)
>>>>>>>
>>>>>>> safSISU=safSu=SU2\,safSg=AmfDemo_2N\,safApp=AmfDemo1,safSi=AmfDemo,safApp=AmfDemo1
>>>>>>> saAmfSISUHAState=STANDBY(2)
>>>>>>> saAmfSISUHAReadinessState=READY_FOR_ASSIGNMENT(1)
>>>>>>>
>>>>>>> Please check.
>>>>>>>
>>>>>>> Thanks
>>>>>>> -Nagu
>>>>>>>
>>>>>>>> -----Original Message-----
>>>>>>>> From: Minh Hon Chau [mailto:minh.c...@dektech.com.au]
>>>>>>>> Sent: 08 November 2016 08:53
>>>>>>>> To: hans.nordeb...@ericsson.com; Nagendra Kumar; Praveen Malviya;
>>>>>>>> gary....@dektech.com.au; minh.c...@dektech.com.au
>>>>>>>> Cc: opensaf-devel@lists.sourceforge.net
>>>>>>>> Subject: [PATCH 2 of 2] AMFND: Fix SC failover during headless
>>>>>>>> sync before standby AMFD comes up [#2162]
>>>>>>>>
>>>>>>>> osaf/services/saf/amf/amfnd/di.cc | 7 +++++--
>>>>>>>> osaf/services/saf/amf/amfnd/susm.cc | 6 ++++++
>>>>>>>> 2 files changed, 11 insertions(+), 2 deletions(-)
>>>>>>>>
>>>>>>>>
>>>>>>>> This case of SC failover causes the new active AMFD to get stuck in
>>>>>>>> node_up messages.
>>>>>>>>
>>>>>>>> Say the first active controller is SC1, which goes down during headless sync.
>>>>>>>> Therefore, the amfnd on SC2 receives mds_down of AVD, then both
>>>>>>>> is_avd_down and amfd_sync_required are set to true. When SC2
>>>>>>>> takes over the active role, amfnd on SC2 receives mds_up, but only
>>>>>>>> is_avd_down is set to false and the variable amfd_sync_required remains true.
>>>>>>>> When amfnd-SC2 finishes initiating middleware SU, it needs to
>>>>>>>> send the su_oper message to AMFD, but it fails to send it out due to
>>>>>>>> amfd_sync_required.
>>>>>>>> In this scenario of SC failover, amfd_sync_required needs to be set
>>>>>>>> to false when amfnd on SC2 receives su_pres message on middleware SUs.
>>>>>>>> That means amfnd on the active controller does not need to wait for
>>>>>>>> the set_leds message, to be informed that cluster initiation is done,
>>>>>>>> so that amfnd can send su_oper messages to AMFD. This logic also
>>>>>>>> aligns with the normal headless scenario, where amfnd on the active
>>>>>>>> controller has amfd_sync_required initially marked as false
>>>>>>>> because no middleware SUs are initiated. When amfd_sync_required
>>>>>>>> is true, that means all middleware SUs are initiated and
>>>>>>>> assigned before headless, thus amfnd needs to wait for cluster
>>>>>>>> initiation after headless.
>>>>>>>>
>>>>>>>> diff --git a/osaf/services/saf/amf/amfnd/di.cc b/osaf/services/saf/amf/amfnd/di.cc
>>>>>>>> --- a/osaf/services/saf/amf/amfnd/di.cc
>>>>>>>> +++ b/osaf/services/saf/amf/amfnd/di.cc
>>>>>>>> @@ -748,7 +748,8 @@ uint32_t avnd_di_oper_send(AVND_CB *cb,
>>>>>>>>  		if (avnd_diq_rec_add(cb, &msg) == nullptr) {
>>>>>>>>  			rc = NCSCC_RC_FAILURE;
>>>>>>>>  		}
>>>>>>>> -		LOG_NO("avnd_di_oper_send() deferred as AMF director is offline");
>>>>>>>> +		LOG_NO("avnd_di_oper_send() deferred as AMF director is offline(%d),"
>>>>>>>> +			" or sync is required(%d)", cb->is_avd_down, cb->amfd_sync_required);
>>>>>>>>  	} else {
>>>>>>>>  		// We are in normal cluster, send msg to director
>>>>>>>>  		msg.info.avd->msg_info.n2d_opr_state.msg_id = ++(cb->snd_msg_id);
>>>>>>>> @@ -881,7 +882,9 @@ uint32_t avnd_di_susi_resp_send(AVND_CB
>>>>>>>>  			rc = NCSCC_RC_FAILURE;
>>>>>>>>  		}
>>>>>>>>  		m_AVND_SU_ALL_SI_RESET(su);
>>>>>>>> -		LOG_NO("avnd_di_susi_resp_send() deferred as AMF director is offline");
>>>>>>>> +		LOG_NO("avnd_di_susi_resp_send() deferred as AMF director is offline(%d),"
>>>>>>>> +			" or sync is required(%d)", cb->is_avd_down, cb->amfd_sync_required);
>>>>>>>> +
>>>>>>>>  	} else {
>>>>>>>>  		// We are in normal cluster, send msg to director
>>>>>>>>  		msg.info.avd->msg_info.n2d_su_si_assign.msg_id = ++(cb->snd_msg_id);
>>>>>>>> diff --git a/osaf/services/saf/amf/amfnd/susm.cc b/osaf/services/saf/amf/amfnd/susm.cc
>>>>>>>> --- a/osaf/services/saf/amf/amfnd/susm.cc
>>>>>>>> +++ b/osaf/services/saf/amf/amfnd/susm.cc
>>>>>>>> @@ -1345,6 +1345,12 @@ uint32_t avnd_evt_avd_su_pres_evh(AVND_C
>>>>>>>>  			goto done;
>>>>>>>>  		}
>>>>>>>>  	} else { /* => instantiate the su */
>>>>>>>> +		// Do not need to wait for headless sync if there is no application SUs
>>>>>>>> +		// initiated. This is known because here we are receiving su_pres message
>>>>>>>> +		// for NCS SUs
>>>>>>>> +		if (su->is_ncs == true)
>>>>>>>> +			cb->amfd_sync_required = false;
>>>>>>>> +
>>>>>>>>  		AVND_EVT *evt_ir = 0;
>>>>>>>>  		TRACE("Sending to Imm thread.");
>>>>>>>>  		evt_ir = avnd_evt_create(cb, AVND_EVT_IR, 0, nullptr, &info->su_name, 0, 0);
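
For readers following the flag handling in the patch description, here is a minimal standalone sketch of the sequence on SC2. It is not OpenSAF code. `AmfndModel`, its methods, and `simulate_sc_failover` are hypothetical names made up for illustration, and only the flag names `is_avd_down` and `amfd_sync_required` come from the patch. The send condition (defer when either flag is set) is an assumption inferred from the new log text, not copied from di.cc.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical model of the two amfnd flags named in the patch description.
// This is NOT the real AVND_CB; only the flag names are taken from the patch.
struct AmfndModel {
  bool is_avd_down = false;
  bool amfd_sync_required = false;
  std::vector<std::string> deferred;  // messages buffered instead of sent
  std::vector<std::string> sent;      // messages delivered to AMFD

  // Active AMFD goes away: both flags are raised.
  void on_mds_down() {
    is_avd_down = true;
    amfd_sync_required = true;
  }

  // An AMFD is reachable again: only is_avd_down is cleared.
  void on_mds_up() { is_avd_down = false; }

  // su_pres (instantiate) received for an SU. With the patch applied, a
  // middleware (NCS) SU clears the sync requirement.
  void on_su_pres(bool is_ncs, bool patch_applied) {
    if (patch_applied && is_ncs) amfd_sync_required = false;
  }

  // Assumed send condition: defer while either flag is set.
  void send_su_oper(const std::string& msg) {
    if (is_avd_down || amfd_sync_required) {
      deferred.push_back(msg);
    } else {
      sent.push_back(msg);
    }
  }
};

// Replays the failover sequence from the patch description on SC2.
AmfndModel simulate_sc_failover(bool patch_applied) {
  AmfndModel amfnd;
  amfnd.on_mds_down();                    // SC1 goes down during headless sync
  amfnd.on_mds_up();                      // SC2 takes over the active role
  amfnd.on_su_pres(true, patch_applied);  // su_pres for a middleware SU
  amfnd.send_su_oper("su_oper");          // middleware SU finished initiating
  return amfnd;
}
```

In this model, `simulate_sc_failover(false)` leaves the su_oper message in `deferred`, which corresponds to the stuck case described in the patch, while `simulate_sc_failover(true)` delivers it to `sent`.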