Hi Nagu,

Thanks for reviewing. Please see my comments inline.

Thanks,
Minh

On 12/01/17 21:48, Nagendra Kumar wrote:
> Hi Minh,
>        Though I am not able to simulate the problem, I tested as below:
> 1. Start SC1, SC2, PL-3 and PL-4. Configure SU1 on PL-3 as Act and SU2 on 
> PL-4 as Standby.
> 2. Stop SC1 and SC2 and then stop PL-3.
> 3. Start SC-1 and SC-2. When SC-2 prints Cold sync complete, stop SC1. SC2 
> becomes Act.
[M]: Since SU1 is on PL-3 and SU2 is on PL-4, once PL-3 is stopped only 
SU2 can have the active assignment.
>
> In this case, SC-2 contains both SU1(Act) and SU2(Standby) assignments.
> Ideally, SU2 assignments should have been Act and there shouldn't be SU1 
> assignment.
[M]: This seems to be a different test, one where SU1 and SU2 are both 
hosted on SC2. In that case both SU1 and SU2 should get assignments.
>
> safSISU=safSu=SU1\,safSg=AmfDemo_2N\,safApp=AmfDemo1,safSi=AmfDemo,safApp=AmfDemo1
>          saAmfSISUHAState=ACTIVE(1)
>          saAmfSISUHAReadinessState=READY_FOR_ASSIGNMENT(1)
> safSISU=safSu=SU2\,safSg=AmfDemo_2N\,safApp=AmfDemo1,safSi=AmfDemo,safApp=AmfDemo1
>          saAmfSISUHAState=STANDBY(2)
>          saAmfSISUHAReadinessState=READY_FOR_ASSIGNMENT(1)
>
> Please check.
>
> Thanks
> -Nagu
>
>> -----Original Message-----
>> From: Minh Hon Chau [mailto:minh.c...@dektech.com.au]
>> Sent: 08 November 2016 08:53
>> To: hans.nordeb...@ericsson.com; Nagendra Kumar; Praveen Malviya;
>> gary....@dektech.com.au; minh.c...@dektech.com.au
>> Cc: opensaf-devel@lists.sourceforge.net
>> Subject: [PATCH 2 of 2] AMFND: Fix SC failover during headless sync before
>> standby AMFD comes up [#2162]
>>
>>   osaf/services/saf/amf/amfnd/di.cc   |  7 +++++--
>>   osaf/services/saf/amf/amfnd/susm.cc |  6 ++++++
>>   2 files changed, 11 insertions(+), 2 deletions(-)
>>
>>
>> This SC failover case causes the new active AMFD to get stuck in node_up
>> messages.
>>
>> Say the first active controller is SC1, which goes down during headless sync.
>> The amfnd on SC2 then receives mds_down of AVD, and both is_avd_down and
>> amfd_sync_required are set to true. When SC2 takes over the active role,
>> amfnd on SC2 receives mds_up, but only is_avd_down is set to false;
>> amfd_sync_required remains true.
>> When amfnd on SC2 finishes initiating the middleware SUs, it needs to send
>> an su_oper message to AMFD, but the message cannot be sent because
>> amfd_sync_required is still true.
>>
>> In this SC failover scenario, amfd_sync_required needs to be set to false
>> when amfnd on SC2 receives the su_pres message for middleware SUs. That
>> means amfnd on the active controller does not need to wait for the set_leds
>> message to learn that cluster initiation is done before it can send
>> su_oper messages to AMFD. This logic also aligns with the normal headless
>> scenario, where amfnd on the active controller has amfd_sync_required
>> initially marked as false because no middleware SUs are initiated. When
>> amfd_sync_required is true, it means all middleware SUs on that amfnd were
>> initiated and assigned before headless, so amfnd needs to wait for cluster
>> initiation after headless.
>>
>> diff --git a/osaf/services/saf/amf/amfnd/di.cc b/osaf/services/saf/amf/amfnd/di.cc
>> --- a/osaf/services/saf/amf/amfnd/di.cc
>> +++ b/osaf/services/saf/amf/amfnd/di.cc
>> @@ -748,7 +748,8 @@ uint32_t avnd_di_oper_send(AVND_CB *cb,
>>              if (avnd_diq_rec_add(cb, &msg) == nullptr) {
>>                      rc = NCSCC_RC_FAILURE;
>>              }
>> -            LOG_NO("avnd_di_oper_send() deferred as AMF director is offline");
>> +            LOG_NO("avnd_di_oper_send() deferred as AMF director is offline(%d),"
>> +                    " or sync is required(%d)", cb->is_avd_down, cb->amfd_sync_required);
>>      } else {
>>              // We are in normal cluster, send msg to director
>>              msg.info.avd->msg_info.n2d_opr_state.msg_id = ++(cb->snd_msg_id);
>> @@ -881,7 +882,9 @@ uint32_t avnd_di_susi_resp_send(AVND_CB
>>                      rc = NCSCC_RC_FAILURE;
>>              }
>>              m_AVND_SU_ALL_SI_RESET(su);
>> -            LOG_NO("avnd_di_susi_resp_send() deferred as AMF director is offline");
>> +                LOG_NO("avnd_di_susi_resp_send() deferred as AMF director is offline(%d),"
>> +                        " or sync is required(%d)", cb->is_avd_down, cb->amfd_sync_required);
>> +
>>           } else {
>>              // We are in normal cluster, send msg to director
>>              msg.info.avd->msg_info.n2d_su_si_assign.msg_id = ++(cb->snd_msg_id);
>> diff --git a/osaf/services/saf/amf/amfnd/susm.cc b/osaf/services/saf/amf/amfnd/susm.cc
>> --- a/osaf/services/saf/amf/amfnd/susm.cc
>> +++ b/osaf/services/saf/amf/amfnd/susm.cc
>> @@ -1345,6 +1345,12 @@ uint32_t avnd_evt_avd_su_pres_evh(AVND_C
>>                              goto done;
>>              }
>>      } else { /* => instantiate the su */
>> +            // Do not need to wait for headless sync if there is no application SUs
>> +            // initiated. This is known because here we are receiving su_pres message
>> +            // for NCS SUs
>> +            if (su->is_ncs == true)
>> +                    cb->amfd_sync_required = false;
>> +
>>              AVND_EVT *evt_ir = 0;
>>              TRACE("Sending to Imm thread.");
>>              evt_ir = avnd_evt_create(cb, AVND_EVT_IR, 0, nullptr, &info->su_name, 0, 0);
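
For readers following the flag handling described in the commit message, here is a minimal standalone sketch of the two flags. It is not the real amfnd code: NodeState, the handler names, and failover_can_send() are hypothetical, and the rule "a message is deferred if either flag is set" is an assumption inferred from the new log text in the patch.

```cpp
// Simplified model of is_avd_down / amfd_sync_required (hypothetical names,
// not the actual AVND_CB structure or amfnd event handlers).
struct NodeState {
  bool is_avd_down = false;
  bool amfd_sync_required = false;
};

// amfnd loses the active AMFD (mds_down): both flags are raised.
void on_avd_down(NodeState& cb) {
  cb.is_avd_down = true;
  cb.amfd_sync_required = true;
}

// A new active AMFD appears (mds_up): only is_avd_down is cleared.
void on_avd_up(NodeState& cb) { cb.is_avd_down = false; }

// su_pres (instantiate) arrives for an SU. With the patch modelled, a
// middleware (NCS) SU clears the sync flag; without it, nothing changes.
void on_su_pres_instantiate(NodeState& cb, bool su_is_ncs, bool with_patch) {
  if (with_patch && su_is_ncs) cb.amfd_sync_required = false;
}

// Assumption: su_oper can go out only when neither flag is set.
bool can_send_su_oper(const NodeState& cb) {
  return !cb.is_avd_down && !cb.amfd_sync_required;
}

// Replays the failover from the commit message: the active AMFD goes down
// during headless sync, the local controller takes over, and then su_pres
// arrives for a middleware SU.
bool failover_can_send(bool with_patch) {
  NodeState cb;
  on_avd_down(cb);
  on_avd_up(cb);
  on_su_pres_instantiate(cb, /*su_is_ncs=*/true, with_patch);
  return can_send_su_oper(cb);
}
```

In this model, failover_can_send(false) returns false because amfd_sync_required is never cleared, which corresponds to the stuck case; failover_can_send(true) returns true.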

