Hi Minh,
         Though I am not able to simulate the problem, I tested as below:
1. Start SC1, SC2, PL-3 and PL-4. Configure SU1 on PL-3 as Act and SU2 on PL-4 
as Standby.
2. Stop SC1 and SC2 and then stop PL-3.
3. Start SC-1 and SC-2. When SC-2 prints Cold sync complete, stop SC1. SC2 
becomes Act.

In this case, SC-2 contains both SU1(Act) and SU2(Standby) assignments. 
Ideally, SU2 assignments should have been Act and there shouldn't be SU1 
assignment.

safSISU=safSu=SU1\,safSg=AmfDemo_2N\,safApp=AmfDemo1,safSi=AmfDemo,safApp=AmfDemo1
        saAmfSISUHAState=ACTIVE(1)
        saAmfSISUHAReadinessState=READY_FOR_ASSIGNMENT(1)
safSISU=safSu=SU2\,safSg=AmfDemo_2N\,safApp=AmfDemo1,safSi=AmfDemo,safApp=AmfDemo1
        saAmfSISUHAState=STANDBY(2)
        saAmfSISUHAReadinessState=READY_FOR_ASSIGNMENT(1)

Please check.

Thanks
-Nagu

> -----Original Message-----
> From: Minh Hon Chau [mailto:minh.c...@dektech.com.au]
> Sent: 08 November 2016 08:53
> To: hans.nordeb...@ericsson.com; Nagendra Kumar; Praveen Malviya;
> gary....@dektech.com.au; minh.c...@dektech.com.au
> Cc: opensaf-devel@lists.sourceforge.net
> Subject: [PATCH 2 of 2] AMFND: Fix SC failover during headless sync before
> standby AMFD comes up [#2162]
> 
>  osaf/services/saf/amf/amfnd/di.cc   |  7 +++++--
>  osaf/services/saf/amf/amfnd/susm.cc |  6 ++++++
>  2 files changed, 11 insertions(+), 2 deletions(-)
> 
> 
> This case of SC failover causes new active AMFD getting stuck in node_up
> messages
> 
> Say first active controller is SC1, which goes down during headless sync.
> Therefore, the amfnd on SC2 receives mds_down of AVD, then both
> is_avd_down and amfd_sync_required are set to true. When SC2 takes over
> active role, amfnd on SC2 receives mds_up, but only is_avd_down is set to
> false and the variable amfd_sync_required remains true.
> When amfnd-SC2 finishes initiating middleware SU, it needs to send su_oper
> message to AMFD, but it is failed to send out due to amfd_sync_required.
> 
> In this scenario of SC failover, amfd_sync_required needs to set to false
> when amfnd on SC2 receives su_pres message on middleware SUs. That
> means amfnd on active controller does not need to wait for set_leds
> message, to be informed that cluster initiation is done, so that amfnd can
> sen su_oper messages to AMFD. This logic also aligns with normal headless
> scenario, where amfnd on active controller has amfd_sync_required initially
> marked as false because no middleware SUs are initiated. When
> amfd_sync_required is true that means amfnd all middleware SUs are
> initiated and assigned before headless, thus amfnd needs to wait for cluster
> initiation after headless.
> 
> diff --git a/osaf/services/saf/amf/amfnd/di.cc
> b/osaf/services/saf/amf/amfnd/di.cc
> --- a/osaf/services/saf/amf/amfnd/di.cc
> +++ b/osaf/services/saf/amf/amfnd/di.cc
> @@ -748,7 +748,8 @@ uint32_t avnd_di_oper_send(AVND_CB *cb,
>               if (avnd_diq_rec_add(cb, &msg) == nullptr) {
>                       rc = NCSCC_RC_FAILURE;
>               }
> -             LOG_NO("avnd_di_oper_send() deferred as AMF director is
> offline");
> +             LOG_NO("avnd_di_oper_send() deferred as AMF director is
> offline(%d),"
> +                     " or sync is required(%d)", cb->is_avd_down,
> +cb->amfd_sync_required);
>       } else {
>               // We are in normal cluster, send msg to director
>               msg.info.avd->msg_info.n2d_opr_state.msg_id = ++(cb-
> >snd_msg_id); @@ -881,7 +882,9 @@ uint32_t
> avnd_di_susi_resp_send(AVND_CB
>                       rc = NCSCC_RC_FAILURE;
>               }
>               m_AVND_SU_ALL_SI_RESET(su);
> -             LOG_NO("avnd_di_susi_resp_send() deferred as AMF director is
> offline");
> +                LOG_NO("avnd_di_susi_resp_send() deferred as AMF director is
> offline(%d),"
> +                        " or sync is required(%d)", cb->is_avd_down,
> + cb->amfd_sync_required);
> +
>          } else {
>               // We are in normal cluster, send msg to director
>               msg.info.avd->msg_info.n2d_su_si_assign.msg_id = ++(cb-
> >snd_msg_id); diff --git a/osaf/services/saf/amf/amfnd/susm.cc
> b/osaf/services/saf/amf/amfnd/susm.cc
> --- a/osaf/services/saf/amf/amfnd/susm.cc
> +++ b/osaf/services/saf/amf/amfnd/susm.cc
> @@ -1345,6 +1345,12 @@ uint32_t avnd_evt_avd_su_pres_evh(AVND_C
>                               goto done;
>               }
>       } else { /* => instantiate the su */
> +             // Do not need to wait for headless sync if there is no
> application SUs
> +             // initiated. This is known because here we are receiving
> su_pres message
> +             // for NCS SUs
> +             if (su->is_ncs == true)
> +                     cb->amfd_sync_required = false;
> +
>               AVND_EVT *evt_ir = 0;
>               TRACE("Sending to Imm thread.");
>               evt_ir = avnd_evt_create(cb, AVND_EVT_IR, 0, nullptr, &info-
> >su_name, 0, 0);

------------------------------------------------------------------------------
Developer Access Program for Intel Xeon Phi Processors
Access to Intel Xeon Phi processor-based developer platforms.
With one year of Intel Parallel Studio XE.
Training and support from Colfax.
Order your platform today. http://sdm.link/xeonphi
_______________________________________________
Opensaf-devel mailing list
Opensaf-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-devel

Reply via email to