Hi Nagu,

I misunderstood your point earlier; I get it now. In my test it works as expected: SU2 becomes Act and SU1 has no assignment. I guess that in your test the cluster initiation timer was somehow not started on SC2 (the new active), so there could be a missing case in the patch. Could you please share the trace with me?

Thanks,
Minh

On 13/01/17 21:48, Nagendra Kumar wrote:
> Hi Minh,
> Please check my response inlined with [Nagu].
>
> Thanks
> -Nagu
>> -----Original Message-----
>> From: minh chau [mailto:minh.c...@dektech.com.au]
>> Sent: 13 January 2017 03:53
>> To: Nagendra Kumar; hans.nordeb...@ericsson.com; Praveen Malviya;
>> gary....@dektech.com.au
>> Cc: opensaf-devel@lists.sourceforge.net
>> Subject: Re: [PATCH 2 of 2] AMFND: Fix SC failover during headless sync
>> before standby AMFD comes up [#2162]
>>
>> Hi Nagu,
>>
>> Thanks for reviewing, please see comments inline.
>>
>> Thanks,
>> Minh
>>
>> On 12/01/17 21:48, Nagendra Kumar wrote:
>>> Hi Minh,
>>> Though I am not able to simulate the problem, I tested as below:
>>> 1. Start SC1, SC2, PL-3 and PL-4. Configure SU1 on PL-3 as Act and SU2 on PL-4 as Standby.
>>> 2. Stop SC1 and SC2 and then stop PL-3.
>>> 3. Start SC-1 and SC-2. When SC-2 prints Cold sync complete, stop SC1. SC2 becomes Act.
>> [M]: As SU1 is on PL3, SU2 is on PL4, and If PL-3 is stopped, then only
>> SU2 has active assignment
> [Nagu]: PL-3 is stopped in step #2.
>>> In this case, SC-2 contains both SU1(Act) and SU2(Standby) assignments.
>>> Ideally, SU2 assignments should have been Act and there shouldn't be SU1 assignment.
>> [M]: This seems to be another test where SU1 and SU2 are hosted on SC2,
>> then both SU1 and SU2 should get assignment
> [Nagu]: I mean to say command 'amf-state siass' run on SC-1 displays both SU1
> and SU2 assignments.
> SU1 and SU2 are hosted on PL-3 and PL-4 respectively.
> This is similar test case, which is mentioned in the ticket?
>>>
>>> safSISU=safSu=SU1\,safSg=AmfDemo_2N\,safApp=AmfDemo1,safSi=AmfDemo,safApp=AmfDemo1
>>> saAmfSISUHAState=ACTIVE(1)
>>> saAmfSISUHAReadinessState=READY_FOR_ASSIGNMENT(1)
>>>
>>> safSISU=safSu=SU2\,safSg=AmfDemo_2N\,safApp=AmfDemo1,safSi=AmfDemo,safApp=AmfDemo1
>>> saAmfSISUHAState=STANDBY(2)
>>> saAmfSISUHAReadinessState=READY_FOR_ASSIGNMENT(1)
>>>
>>> Please check.
>>>
>>> Thanks
>>> -Nagu
>>>
>>>> -----Original Message-----
>>>> From: Minh Hon Chau [mailto:minh.c...@dektech.com.au]
>>>> Sent: 08 November 2016 08:53
>>>> To: hans.nordeb...@ericsson.com; Nagendra Kumar; Praveen Malviya;
>>>> gary....@dektech.com.au; minh.c...@dektech.com.au
>>>> Cc: opensaf-devel@lists.sourceforge.net
>>>> Subject: [PATCH 2 of 2] AMFND: Fix SC failover during headless sync
>>>> before standby AMFD comes up [#2162]
>>>>
>>>> osaf/services/saf/amf/amfnd/di.cc | 7 +++++--
>>>> osaf/services/saf/amf/amfnd/susm.cc | 6 ++++++
>>>> 2 files changed, 11 insertions(+), 2 deletions(-)
>>>>
>>>>
>>>> This case of SC failover causes the new active AMFD to get stuck in
>>>> node_up messages.
>>>>
>>>> Say the first active controller is SC1, which goes down during headless sync.
>>>> The amfnd on SC2 then receives mds_down of AVD, and both
>>>> is_avd_down and amfd_sync_required are set to true. When SC2 takes
>>>> over the active role, amfnd on SC2 receives mds_up, but only is_avd_down
>>>> is set to false and the variable amfd_sync_required remains true.
>>>> When amfnd-SC2 finishes initiating a middleware SU, it needs to send
>>>> a su_oper message to AMFD, but it fails to send it because of
>>>> amfd_sync_required.
>>>> In this scenario of SC failover, amfd_sync_required needs to be set to
>>>> false when amfnd on SC2 receives a su_pres message on middleware SUs.
>>>> That means amfnd on the active controller does not need to wait for the
>>>> set_leds message, to be informed that cluster initiation is done, so
>>>> that amfnd can send su_oper messages to AMFD. This logic also aligns
>>>> with the normal headless scenario, where amfnd on the active controller has
>>>> amfd_sync_required initially marked as false because no middleware
>>>> SUs are initiated. When amfd_sync_required is true, that means
>>>> all middleware SUs were initiated and assigned before headless, thus
>>>> amfnd needs to wait for cluster initiation after headless.
>>>>
>>>> diff --git a/osaf/services/saf/amf/amfnd/di.cc b/osaf/services/saf/amf/amfnd/di.cc
>>>> --- a/osaf/services/saf/amf/amfnd/di.cc
>>>> +++ b/osaf/services/saf/amf/amfnd/di.cc
>>>> @@ -748,7 +748,8 @@ uint32_t avnd_di_oper_send(AVND_CB *cb,
>>>>      if (avnd_diq_rec_add(cb, &msg) == nullptr) {
>>>>        rc = NCSCC_RC_FAILURE;
>>>>      }
>>>> -    LOG_NO("avnd_di_oper_send() deferred as AMF director is offline");
>>>> +    LOG_NO("avnd_di_oper_send() deferred as AMF director is offline(%d),"
>>>> +           " or sync is required(%d)", cb->is_avd_down, cb->amfd_sync_required);
>>>>    } else {
>>>>      // We are in normal cluster, send msg to director
>>>>      msg.info.avd->msg_info.n2d_opr_state.msg_id = ++(cb->snd_msg_id);
>>>> @@ -881,7 +882,9 @@ uint32_t avnd_di_susi_resp_send(AVND_CB
>>>>        rc = NCSCC_RC_FAILURE;
>>>>      }
>>>>      m_AVND_SU_ALL_SI_RESET(su);
>>>> -    LOG_NO("avnd_di_susi_resp_send() deferred as AMF director is offline");
>>>> +    LOG_NO("avnd_di_susi_resp_send() deferred as AMF director is offline(%d),"
>>>> +           " or sync is required(%d)", cb->is_avd_down, cb->amfd_sync_required);
>>>> +
>>>>    } else {
>>>>      // We are in normal cluster, send msg to director
>>>>      msg.info.avd->msg_info.n2d_su_si_assign.msg_id = ++(cb->snd_msg_id);
>>>> diff --git a/osaf/services/saf/amf/amfnd/susm.cc b/osaf/services/saf/amf/amfnd/susm.cc
>>>> --- a/osaf/services/saf/amf/amfnd/susm.cc
>>>> +++ b/osaf/services/saf/amf/amfnd/susm.cc
>>>> @@ -1345,6 +1345,12 @@ uint32_t avnd_evt_avd_su_pres_evh(AVND_C
>>>>        goto done;
>>>>      }
>>>>    } else { /* => instantiate the su */
>>>> +    // Do not need to wait for headless sync if there is no application SUs
>>>> +    // initiated. This is known because here we are receiving su_pres message
>>>> +    // for NCS SUs
>>>> +    if (su->is_ncs == true)
>>>> +      cb->amfd_sync_required = false;
>>>> +
>>>>      AVND_EVT *evt_ir = 0;
>>>>      TRACE("Sending to Imm thread.");
>>>>      evt_ir = avnd_evt_create(cb, AVND_EVT_IR, 0, nullptr, &info->su_name, 0, 0);
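
For anyone following the thread, the flag handling described in the patch description can be modelled in a few lines of standalone C++. This is only a sketch under assumptions, not the amfnd implementation: NodeDirectorState, the on_* handlers, can_send_to_director() and the `patched` switch are hypothetical names made up for illustration. Only is_avd_down, amfd_sync_required and the is_ncs check come from the patch, and the real code paths (message queueing, set_leds handling) are left out.

```cpp
#include <cassert>

// Hypothetical, simplified model of the two AVND_CB flags named in the patch.
// It only mirrors the behaviour stated in the patch description.
struct NodeDirectorState {
  bool is_avd_down = false;
  bool amfd_sync_required = false;
};

// AMFD goes away (mds_down): the description says both flags become true.
void on_avd_mds_down(NodeDirectorState& cb) {
  cb.is_avd_down = true;
  cb.amfd_sync_required = true;
}

// New active AMFD appears (mds_up): only is_avd_down is cleared.
void on_avd_mds_up(NodeDirectorState& cb) { cb.is_avd_down = false; }

// su_pres (instantiate) received. With the patch, a middleware (NCS) SU
// clears amfd_sync_required. `patched` is an illustration aid only.
void on_su_pres_instantiate(NodeDirectorState& cb, bool su_is_ncs,
                            bool patched) {
  if (patched && su_is_ncs) cb.amfd_sync_required = false;
}

// Messages such as su_oper are deferred while either flag is set,
// matching the two values printed by the updated LOG_NO lines.
bool can_send_to_director(const NodeDirectorState& cb) {
  return !cb.is_avd_down && !cb.amfd_sync_required;
}

// Replays the failover sequence from the description for amfnd on SC2 and
// reports whether su_oper could be sent at the end.
bool su_oper_sendable_after_failover(bool patched) {
  NodeDirectorState cb;
  on_avd_mds_down(cb);  // SC1 goes down during headless sync
  assert(!can_send_to_director(cb));
  on_avd_mds_up(cb);    // SC2 takes over the active role
  on_su_pres_instantiate(cb, /*su_is_ncs=*/true, patched);
  return can_send_to_director(cb);
}
```

In this model, su_oper_sendable_after_failover(false) returns false, which corresponds to the stuck case in the ticket, and su_oper_sendable_after_failover(true) returns true because the su_pres for a middleware SU clears amfd_sync_required.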