Hi Minh,

Good catch! Yes, please push. As it is, we have documented in the Compliance Table that "Before the timer expiry, failover and switchover are not supported."
Thanks
-Nagu

From: minh chau [mailto:minh.c...@dektech.com.au]
Sent: 16 February 2017 07:36
To: Nagendra Kumar; hans.nordeb...@ericsson.com; Praveen Malviya; gary....@dektech.com.au
Cc: opensaf-devel@lists.sourceforge.net
Subject: Re: [PATCH 2 of 2] AMFND: Fix SC failover during headless sync before standby AMFD comes up [#2162]

Hi Nagu,

Thanks for the reminder. There is one change in the patch that could affect upgrade too:

+  // The cb->init_state must be AVD_INIT_DONE or AVD_APP_STATE
+  // If AVD_INIT_DONE, there was a SC failover during cluster
+  // instantiation phase in cluster (after all NCS SU is assigned)
+  // If AVD_APP_STATE, this should be come from 2N-MW SI swap
+  if (cb->init_state >= AVD_INIT_DONE) {
+    if (cluster_su_instantiation_done(cb, nullptr) == true) {
+      cluster_startup_expiry_event_generate(cb);
+    } else {
+      m_AVD_CLINIT_TMR_START(cb);
+    }
+  }

So I would like to restrict it to AVD_INIT_DONE only, so that it looks like:

+  // The cb->init_state must be AVD_INIT_DONE or AVD_APP_STATE
+  // If AVD_INIT_DONE, there was a SC failover during cluster
+  // instantiation phase in cluster (after all NCS SU is assigned)
+  if (cb->init_state == AVD_INIT_DONE) {
+    if (cluster_su_instantiation_done(cb, nullptr) == true) {
+      cluster_startup_expiry_event_generate(cb);
+    } else {
+      m_AVD_CLINIT_TMR_START(cb);
+    }
+  }

If you agree, I can push the patches with this new change.

Thanks,
Minh

On 15/02/17 15:13, Nagendra Kumar wrote:

Yes, ack for both the patches. I assume you have tested the upgrade scenarios.
Thanks
-Nagu

-----Original Message-----
From: minh chau [mailto:minh.c...@dektech.com.au]
Sent: 15 February 2017 08:52
To: Nagendra Kumar; hans.nordeb...@ericsson.com; Praveen Malviya; gary....@dektech.com.au
Cc: opensaf-devel@lists.sourceforge.net
Subject: Re: [PATCH 2 of 2] AMFND: Fix SC failover during headless sync before standby AMFD comes up [#2162]

Hi Nagu,

The #2162 has two patches. I think your ack is for [PATCH 2 of 2] AMFND: Fix SC failover during headless sync before standby AMFD comes up [#2162]. Does the other one ([PATCH 1 of 2] AMFD: Fix SC failover during headless sync at INIT_DONE state [#2162]) look ok?

Thanks,
Minh

On 14/02/17 20:40, Nagendra Kumar wrote:

Ack. Tested the scenarios.

Thanks
-Nagu

-----Original Message-----
From: minh chau [mailto:minh.c...@dektech.com.au]
Sent: 23 January 2017 16:24
To: Nagendra Kumar; hans.nordeb...@ericsson.com; Praveen Malviya; gary....@dektech.com.au
Cc: opensaf-devel@lists.sourceforge.net
Subject: Re: [PATCH 2 of 2] AMFND: Fix SC failover during headless sync before standby AMFD comes up [#2162]

Hi Nagu,

I am checking the logs now.

Thanks,
Minh

On 23/01/17 17:47, Nagendra Kumar wrote:

The logs (Logs-tc.rar) attached in the ticket.
Thanks
-Nagu

-----Original Message-----
From: minh chau [mailto:minh.c...@dektech.com.au]
Sent: 16 January 2017 05:47
To: Nagendra Kumar; hans.nordeb...@ericsson.com; Praveen Malviya; gary....@dektech.com.au
Cc: opensaf-devel@lists.sourceforge.net
Subject: Re: [PATCH 2 of 2] AMFND: Fix SC failover during headless sync before standby AMFD comes up [#2162]

Hi Nagu,

I misunderstood your point, and now I get it. In my test it works as expected: SU2 becomes Act and there is no assignment for SU1. I guess that in your test the cluster initiation timer was somehow not started on SC2 (the new active), so there could be a missing case in the patch. Could you please share the trace with me?

Thanks,
Minh

On 13/01/17 21:48, Nagendra Kumar wrote:

Hi Minh,

Please check my response inlined with [Nagu].

Thanks
-Nagu

-----Original Message-----
From: minh chau [mailto:minh.c...@dektech.com.au]
Sent: 13 January 2017 03:53
To: Nagendra Kumar; hans.nordeb...@ericsson.com; Praveen Malviya; gary....@dektech.com.au
Cc: opensaf-devel@lists.sourceforge.net
Subject: Re: [PATCH 2 of 2] AMFND: Fix SC failover during headless sync before standby AMFD comes up [#2162]

Hi Nagu,

Thanks for reviewing, please see comments inline.

Thanks,
Minh

On 12/01/17 21:48, Nagendra Kumar wrote:

Hi Minh,

Though I am not able to simulate the problem, I tested as below:

1. Start SC1, SC2, PL-3 and PL-4. Configure SU1 on PL-3 as Act and SU2 on PL-4 as Standby.
2. Stop SC1 and SC2 and then stop PL-3.
3. Start SC-1 and SC-2. When SC-2 prints Cold sync complete, stop SC1. SC2 becomes Act.

[M]: As SU1 is on PL3 and SU2 is on PL4, if PL-3 is stopped then only SU2 has an active assignment.
[Nagu]: PL-3 is stopped in step #2.
In this case, SC-2 contains both SU1(Act) and SU2(Standby) assignments. Ideally, SU2 assignments should have been Act and there shouldn't be SU1 assignment.

[M]: This seems to be another test where SU1 and SU2 are hosted on SC2, then both SU1 and SU2 should get assignment
[Nagu]: I mean to say command 'amf-state siass' run on SC-1 displays both SU1 and SU2 assignments. SU1 and SU2 are hosted on PL-3 and PL-4 respectively. This is similar test case, which is mentioned in the ticket?

safSISU=safSu=SU1\,safSg=AmfDemo_2N\,safApp=AmfDemo1,safSi=AmfDemo,safApp=AmfDemo1
        saAmfSISUHAState=ACTIVE(1)
        saAmfSISUHAReadinessState=READY_FOR_ASSIGNMENT(1)
safSISU=safSu=SU2\,safSg=AmfDemo_2N\,safApp=AmfDemo1,safSi=AmfDemo,safApp=AmfDemo1
        saAmfSISUHAState=STANDBY(2)
        saAmfSISUHAReadinessState=READY_FOR_ASSIGNMENT(1)

Please check.

Thanks
-Nagu

-----Original Message-----
From: Minh Hon Chau [mailto:minh.c...@dektech.com.au]
Sent: 08 November 2016 08:53
To: hans.nordeb...@ericsson.com; Nagendra Kumar; Praveen Malviya; gary....@dektech.com.au; minh.c...@dektech.com.au
Cc: opensaf-devel@lists.sourceforge.net
Subject: [PATCH 2 of 2] AMFND: Fix SC failover during headless sync before standby AMFD comes up [#2162]

 osaf/services/saf/amf/amfnd/di.cc   |  7 +++++--
 osaf/services/saf/amf/amfnd/susm.cc |  6 ++++++
 2 files changed, 11 insertions(+), 2 deletions(-)

This case of SC failover causes new active AMFD getting stuck in node_up messages

Say first active controller is SC1, which goes down during headless sync. Therefore, the amfnd on SC2 receives mds_down of AVD, then both is_avd_down and amfd_sync_required are set to true. When SC2 takes over active role, amfnd on SC2 receives mds_up, but only is_avd_down is set to false and the variable amfd_sync_required remains true.
When amfnd-SC2 finishes initiating the middleware SU, it needs to send a su_oper message to AMFD, but the send fails because amfd_sync_required is still true.

In this scenario of SC failover, amfd_sync_required needs to be set to false when amfnd on SC2 receives a su_pres message for middleware SUs. That means amfnd on the active controller does not need to wait for the set_leds message (which informs it that cluster initiation is done) before it can send su_oper messages to AMFD. This logic also aligns with the normal headless scenario, where amfnd on the active controller has amfd_sync_required initially marked as false because no middleware SUs are initiated. When amfd_sync_required is true, it means all middleware SUs were initiated and assigned before headless, so amfnd needs to wait for cluster initiation after headless.

diff --git a/osaf/services/saf/amf/amfnd/di.cc b/osaf/services/saf/amf/amfnd/di.cc
--- a/osaf/services/saf/amf/amfnd/di.cc
+++ b/osaf/services/saf/amf/amfnd/di.cc
@@ -748,7 +748,8 @@ uint32_t avnd_di_oper_send(AVND_CB *cb,
     if (avnd_diq_rec_add(cb, &msg) == nullptr) {
       rc = NCSCC_RC_FAILURE;
     }
-    LOG_NO("avnd_di_oper_send() deferred as AMF director is offline");
+    LOG_NO("avnd_di_oper_send() deferred as AMF director is offline(%d),"
+        " or sync is required(%d)", cb->is_avd_down, cb->amfd_sync_required);
   } else {
     // We are in normal cluster, send msg to director
     msg.info.avd->msg_info.n2d_opr_state.msg_id = ++(cb->snd_msg_id);
@@ -881,7 +882,9 @@ uint32_t avnd_di_susi_resp_send(AVND_CB
       rc = NCSCC_RC_FAILURE;
     }
     m_AVND_SU_ALL_SI_RESET(su);
-    LOG_NO("avnd_di_susi_resp_send() deferred as AMF director is offline");
+    LOG_NO("avnd_di_susi_resp_send() deferred as AMF director is offline(%d),"
+        " or sync is required(%d)",
+        cb->is_avd_down, cb->amfd_sync_required);
   } else {
     // We are in normal cluster, send msg to director
     msg.info.avd->msg_info.n2d_su_si_assign.msg_id = ++(cb->snd_msg_id);
diff --git a/osaf/services/saf/amf/amfnd/susm.cc b/osaf/services/saf/amf/amfnd/susm.cc
--- a/osaf/services/saf/amf/amfnd/susm.cc
+++ b/osaf/services/saf/amf/amfnd/susm.cc
@@ -1345,6 +1345,12 @@ uint32_t avnd_evt_avd_su_pres_evh(AVND_C
         goto done;
       }
     } else { /* => instantiate the su */
+      // Do not need to wait for headless sync if there is no application SUs
+      // initiated. This is known because here we are receiving su_pres message
+      // for NCS SUs
+      if (su->is_ncs == true)
+        cb->amfd_sync_required = false;
+
       AVND_EVT *evt_ir = 0;
       TRACE("Sending to Imm thread.");
       evt_ir = avnd_evt_create(cb, AVND_EVT_IR, 0, nullptr, &info->su_name, 0, 0);

_______________________________________________
Opensaf-devel mailing list
Opensaf-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-devel
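
For readers following the thread, the flag handling described in the commit message can be modelled outside OpenSAF as a small standalone sketch. It is only an illustration under stated assumptions, not the amfnd code: the struct AmfndFlags, the helper names and the apply_fix switch are hypothetical, and the deferral condition (is_avd_down or amfd_sync_required) is inferred from the new log text in di.cc rather than copied from the source.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for the two AVND_CB flags named in the commit message.
struct AmfndFlags {
  bool is_avd_down = false;
  bool amfd_sync_required = false;
  std::vector<std::string> deferred;  // stand-in for the avnd_diq_rec_add() queue
  std::vector<std::string> sent;      // messages that reached the director
};

// Active AMFD disappears during headless sync: both flags are raised.
void on_mds_down(AmfndFlags& cb) {
  cb.is_avd_down = true;
  cb.amfd_sync_required = true;
}

// New active AMFD appears: only is_avd_down is cleared.
void on_mds_up(AmfndFlags& cb) { cb.is_avd_down = false; }

// su_pres handling. With the fix, a middleware (NCS) SU clears the sync flag.
void on_su_pres(AmfndFlags& cb, bool su_is_ncs, bool apply_fix) {
  if (apply_fix && su_is_ncs) cb.amfd_sync_required = false;
}

// Assumed deferral rule: queue the message while either flag is set.
// Returns true if the message was sent, false if it was deferred.
bool send_su_oper(AmfndFlags& cb, const std::string& su_name) {
  if (cb.is_avd_down || cb.amfd_sync_required) {
    cb.deferred.push_back(su_name);
    return false;
  }
  cb.sent.push_back(su_name);
  return true;
}

// Replays the failover sequence from the commit message on the node that
// becomes active (SC2) and reports whether su_oper reaches the director.
bool simulate_sc_failover(bool apply_fix) {
  AmfndFlags cb;
  on_mds_down(cb);                  // SC1 dies during headless sync
  on_mds_up(cb);                    // SC2 takes over the active role
  assert(!cb.is_avd_down);          // director is reachable again
  on_su_pres(cb, /*su_is_ncs=*/true, apply_fix);
  return send_su_oper(cb, "middleware-SU");
}
```

Calling simulate_sc_failover(false) models the behaviour before the patch, where su_oper stays queued because amfd_sync_required is never cleared; simulate_sc_failover(true) models the patched behaviour, where the su_pres for a middleware SU clears the flag and the message goes out.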