Hi Nagu,

This patch only covers a corner case, where failover happens between AVD_INIT_DONE and AVD_APP_STATE. We still have to reboot the node if an out-of-cold-sync condition occurs, so I think we should keep that sentence in the Compliance Table.

Thanks,
Minh

On 16/02/17 17:24, Nagendra Kumar wrote:
> Hi Minh,
>
> Good catch!! Yes, please push, but as such we have documented in the
> Compliance Table that "Before the timer expiry, failover and
> switchover are not supported."
>
> Thanks
> -Nagu
>
> From: minh chau [mailto:minh.c...@dektech.com.au]
> Sent: 16 February 2017 07:36
> To: Nagendra Kumar; hans.nordeb...@ericsson.com; Praveen Malviya;
> gary....@dektech.com.au
> Cc: opensaf-devel@lists.sourceforge.net
> Subject: Re: [PATCH 2 of 2] AMFND: Fix SC failover during headless
> sync before standby AMFD comes up [#2162]
>
> Hi Nagu,
>
> Thanks for the reminder. There is one change in the patch that could
> affect upgrade too:
>
> +  // The cb->init_state must be AVD_INIT_DONE or AVD_APP_STATE
> +  // If AVD_INIT_DONE, there was a SC failover during cluster
> +  // instantiation phase in cluster (after all NCS SU is assigned)
> +  // If AVD_APP_STATE, this should be come from 2N-MW SI swap
> +  if (cb->init_state >= AVD_INIT_DONE) {
> +    if (cluster_su_instantiation_done(cb, nullptr) == true) {
> +      cluster_startup_expiry_event_generate(cb);
> +    } else {
> +      m_AVD_CLINIT_TMR_START(cb);
> +    }
> +  }
>
> So I would like to apply it for AVD_INIT_DONE only, so that it looks like:
>
> +  // The cb->init_state must be AVD_INIT_DONE or AVD_APP_STATE
> +  // If AVD_INIT_DONE, there was a SC failover during cluster
> +  // instantiation phase in cluster (after all NCS SU is assigned)
> +  if (cb->init_state == AVD_INIT_DONE) {
> +    if (cluster_su_instantiation_done(cb, nullptr) == true) {
> +      cluster_startup_expiry_event_generate(cb);
> +    } else {
> +      m_AVD_CLINIT_TMR_START(cb);
> +    }
> +  }
>
> If you agree, I can push the patches with the new change.
>
> Thanks,
> Minh
>
> On 15/02/17 15:13, Nagendra Kumar wrote:
> > Yes, ack for both the patches. I assume you would have tested upgrade
> > scenarios.
> >
> > Thanks
> > -Nagu
> >
> > -----Original Message-----
> > From: minh chau [mailto:minh.c...@dektech.com.au]
> > Sent: 15 February 2017 08:52
> > To: Nagendra Kumar; hans.nordeb...@ericsson.com; Praveen Malviya;
> > gary....@dektech.com.au
> > Cc: opensaf-devel@lists.sourceforge.net
> > Subject: Re: [PATCH 2 of 2] AMFND: Fix SC failover during headless
> > sync before standby AMFD comes up [#2162]
> >
> > Hi Nagu,
> >
> > The #2162 has two patches. I think your ack is for [PATCH 2 of 2]
> > AMFND: Fix SC failover during headless sync before standby AMFD
> > comes up [#2162].
> > Does the other one ([PATCH 1 of 2] AMFD: Fix SC failover during
> > headless sync at INIT_DONE state [#2162]) look ok?
> >
> > Thanks,
> > Minh
> >
> > On 14/02/17 20:40, Nagendra Kumar wrote:
> > Ack.
> > Tested the scenarios.
> >
> > Thanks
> > -Nagu
> >
> > -----Original Message-----
> > From: minh chau [mailto:minh.c...@dektech.com.au]
> > Sent: 23 January 2017 16:24
> > To: Nagendra Kumar; hans.nordeb...@ericsson.com; Praveen Malviya;
> > gary....@dektech.com.au
> > Cc: opensaf-devel@lists.sourceforge.net
> > Subject: Re: [PATCH 2 of 2] AMFND: Fix SC failover during headless
> > sync before standby AMFD comes up [#2162]
> >
> > Hi Nagu,
> >
> > I am checking the logs now.
> >
> > Thanks, Minh
> >
> > On 23/01/17 17:47, Nagendra Kumar wrote:
> > The logs (Logs-tc.rar) are attached to the ticket.
> >
> > Thanks
> > -Nagu
> >
> > -----Original Message-----
> > From: minh chau [mailto:minh.c...@dektech.com.au]
> > Sent: 16 January 2017 05:47
> > To: Nagendra Kumar; hans.nordeb...@ericsson.com; Praveen Malviya;
> > gary....@dektech.com.au
> > Cc: opensaf-devel@lists.sourceforge.net
> > Subject: Re: [PATCH 2 of 2] AMFND: Fix SC failover during headless
> > sync before standby AMFD comes up [#2162]
> >
> > Hi Nagu,
> >
> > I misunderstood your point, and now I get it.
> > In my test it works as expected: SU2 becomes Act and there is no
> > assignment for SU1. I guess that in your test the cluster
> > initiation timer somehow was not started on SC2 (new active), so
> > there could be a missing case in the patch.
> > Could you please share the trace with me?
> >
> > Thanks,
> > Minh
> >
> > On 13/01/17 21:48, Nagendra Kumar wrote:
> > Hi Minh,
> >
> > Please check my response inlined with [Nagu].
> >
> > Thanks
> > -Nagu
> >
> > -----Original Message-----
> > From: minh chau [mailto:minh.c...@dektech.com.au]
> > Sent: 13 January 2017 03:53
> > To: Nagendra Kumar; hans.nordeb...@ericsson.com; Praveen Malviya;
> > gary....@dektech.com.au
> > Cc: opensaf-devel@lists.sourceforge.net
> > Subject: Re: [PATCH 2 of 2] AMFND: Fix SC failover during headless
> > sync before standby AMFD comes up [#2162]
> >
> > Hi Nagu,
> >
> > Thanks for reviewing, please see comments inline.
> >
> > Thanks,
> > Minh
> >
> > On 12/01/17 21:48, Nagendra Kumar wrote:
> > Hi Minh,
> >
> > Though I am not able to simulate the problem, I tested as below:
> > 1. Start SC1, SC2, PL-3 and PL-4. Configure SU1 on PL-3 as Act and
> > SU2 on PL-4 as Standby.
> > 2. Stop SC1 and SC2 and then stop PL-3.
> > 3. Start SC-1 and SC-2. When SC-2 prints Cold sync complete,
> > stop SC1.
> > SC2 becomes Act.
> > [M]: As SU1 is on PL3, SU2 is on PL4, and if PL-3 is stopped,
> > then only SU2 has active assignment
> > [Nagu]: PL-3 is stopped in step #2.
> >
> > In this case, SC-2 contains both SU1(Act) and SU2(Standby)
> > assignments. Ideally, SU2 assignments should have been Act and
> > there shouldn't be SU1 assignment.
> > [M]: This seems to be another test where SU1 and SU2 are hosted
> > on SC2, then both SU1 and SU2 should get assignment
> > [Nagu]: I mean to say command 'amf-state siass' run on SC-1
> > displays both SU1 and SU2 assignments.
> > SU1 and SU2 are hosted on PL-3 and PL-4 respectively.
> > This is similar test case, which is mentioned in the ticket?
> >
> > safSISU=safSu=SU1\,safSg=AmfDemo_2N\,safApp=AmfDemo1,safSi=AmfDemo,safApp=AmfDemo1
> >         saAmfSISUHAState=ACTIVE(1)
> >         saAmfSISUHAReadinessState=READY_FOR_ASSIGNMENT(1)
> > safSISU=safSu=SU2\,safSg=AmfDemo_2N\,safApp=AmfDemo1,safSi=AmfDemo,safApp=AmfDemo1
> >         saAmfSISUHAState=STANDBY(2)
> >         saAmfSISUHAReadinessState=READY_FOR_ASSIGNMENT(1)
> >
> > Please check.
> >
> > Thanks
> > -Nagu
> >
> > -----Original Message-----
> > From: Minh Hon Chau [mailto:minh.c...@dektech.com.au]
> > Sent: 08 November 2016 08:53
> > To: hans.nordeb...@ericsson.com; Nagendra Kumar; Praveen Malviya;
> > gary....@dektech.com.au; minh.c...@dektech.com.au
> > Cc: opensaf-devel@lists.sourceforge.net
> > Subject: [PATCH 2 of 2] AMFND: Fix SC failover during headless
> > sync before standby AMFD comes up [#2162]
> >
> >  osaf/services/saf/amf/amfnd/di.cc   |  7 +++++--
> >  osaf/services/saf/amf/amfnd/susm.cc |  6 ++++++
> >  2 files changed, 11 insertions(+), 2 deletions(-)
> >
> > This case of SC failover causes the new active AMFD to get stuck
> > in node_up messages.
> >
> > Say the first active controller is SC1, which goes down during
> > headless sync. The amfnd on SC2 therefore receives mds_down of AVD,
> > and both is_avd_down and amfd_sync_required are set to true. When
> > SC2 takes over the active role, amfnd on SC2 receives mds_up, but
> > only is_avd_down is set to false; amfd_sync_required remains true.
> > When amfnd-SC2 finishes initiating the middleware SUs, it needs to
> > send a su_oper message to AMFD, but the message is not sent out
> > because amfd_sync_required is still true.
> >
> > In this SC failover scenario, amfd_sync_required needs to be set to
> > false when amfnd on SC2 receives the su_pres message for middleware
> > SUs. That means amfnd on the active controller does not need to wait
> > for the set_leds message to learn that cluster initiation is done
> > before it can send su_oper messages to AMFD. This logic also aligns
> > with the normal headless scenario, where amfnd on the active
> > controller has amfd_sync_required initially marked as false because
> > no middleware SUs are initiated.
> > When amfd_sync_required is true, it means all middleware SUs on
> > that amfnd were initiated and assigned before headless, so amfnd
> > needs to wait for cluster initiation after headless.
> >
> > diff --git a/osaf/services/saf/amf/amfnd/di.cc b/osaf/services/saf/amf/amfnd/di.cc
> > --- a/osaf/services/saf/amf/amfnd/di.cc
> > +++ b/osaf/services/saf/amf/amfnd/di.cc
> > @@ -748,7 +748,8 @@ uint32_t avnd_di_oper_send(AVND_CB *cb,
> >        if (avnd_diq_rec_add(cb, &msg) == nullptr) {
> >          rc = NCSCC_RC_FAILURE;
> >        }
> > -      LOG_NO("avnd_di_oper_send() deferred as AMF director is offline");
> > +      LOG_NO("avnd_di_oper_send() deferred as AMF director is offline(%d),"
> > +          " or sync is required(%d)", cb->is_avd_down, cb->amfd_sync_required);
> >      } else {
> >        // We are in normal cluster, send msg to director
> >        msg.info.avd->msg_info.n2d_opr_state.msg_id = ++(cb->snd_msg_id);
> > @@ -881,7 +882,9 @@ uint32_t avnd_di_susi_resp_send(AVND_CB
> >          rc = NCSCC_RC_FAILURE;
> >        }
> >        m_AVND_SU_ALL_SI_RESET(su);
> > -      LOG_NO("avnd_di_susi_resp_send() deferred as AMF director is offline");
> > +      LOG_NO("avnd_di_susi_resp_send() deferred as AMF director is offline(%d),"
> > +          " or sync is required(%d)", cb->is_avd_down, cb->amfd_sync_required);
> > +
> >      } else {
> >        // We are in normal cluster, send msg to director
> >        msg.info.avd->msg_info.n2d_su_si_assign.msg_id = ++(cb->snd_msg_id);
> > diff --git a/osaf/services/saf/amf/amfnd/susm.cc b/osaf/services/saf/amf/amfnd/susm.cc
> > --- a/osaf/services/saf/amf/amfnd/susm.cc
> > +++ b/osaf/services/saf/amf/amfnd/susm.cc
> > @@ -1345,6 +1345,12 @@ uint32_t avnd_evt_avd_su_pres_evh(AVND_C
> >          goto done;
> >        }
> >      } else { /* => instantiate the su */
> > +      // Do not need to wait for headless sync if there is no application SUs
> > +      // initiated. This is known because here we are receiving su_pres message
> > +      // for NCS SUs
> > +      if (su->is_ncs == true)
> > +        cb->amfd_sync_required = false;
> > +
> >        AVND_EVT *evt_ir = 0;
> >        TRACE("Sending to Imm thread.");
> >        evt_ir = avnd_evt_create(cb, AVND_EVT_IR, 0, nullptr, &info->su_name, 0, 0);
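
For readers following the thread, the flag handling described in the [PATCH 2 of 2] commit message can be sketched roughly as below. This is only an illustration under stated assumptions, not the amfnd code: `NodeDirectorFlags` and the `on_*` / `can_send_to_director` helpers are hypothetical names, and the gating condition (a message is deferred when either flag is set) is inferred from the new log text rather than taken from the source.

```cpp
#include <cassert>

// Hypothetical, simplified stand-in for the two AVND_CB flags discussed in
// the patch description; the real control block has many more fields.
struct NodeDirectorFlags {
  bool is_avd_down = false;
  bool amfd_sync_required = false;
};

// AMFD went away (mds_down): per the description, both flags become true.
void on_director_down(NodeDirectorFlags& cb) {
  cb.is_avd_down = true;
  cb.amfd_sync_required = true;
}

// AMFD is reachable again (mds_up): only is_avd_down is cleared.
void on_director_up(NodeDirectorFlags& cb) { cb.is_avd_down = false; }

// su_pres message received: the patch clears the sync flag for NCS SUs only.
void on_su_pres(NodeDirectorFlags& cb, bool su_is_ncs) {
  if (su_is_ncs) cb.amfd_sync_required = false;
}

// Assumed gating condition: a message goes out only when neither flag is
// set; otherwise it would be deferred.
bool can_send_to_director(const NodeDirectorFlags& cb) {
  return !cb.is_avd_down && !cb.amfd_sync_required;
}

// Walks through the failover sequence from the commit message.
void demo_failover_sequence() {
  NodeDirectorFlags cb;
  on_director_down(cb);                // SC1 goes down during headless sync
  on_director_up(cb);                  // SC2 takes over the active role
  assert(!can_send_to_director(cb));   // su_oper would still be deferred
  on_su_pres(cb, /*su_is_ncs=*/true);  // patched handler clears the flag
  assert(can_send_to_director(cb));    // su_oper can now be sent
}
```

Calling `demo_failover_sequence()` replays the scenario: without the `on_su_pres` step the model stays stuck in the "deferred" state, which matches the node_up hang described in the commit message.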