Hi Nagu,

Ticket #2162 has two patches. I think your ack is for [PATCH 2 of 2] AMFND: 
Fix SC failover during headless sync before standby AMFD comes up [#2162].
Does the other one ([PATCH 1 of 2] AMFD: Fix SC failover during headless 
sync at INIT_DONE state [#2162]) look ok?

Thanks,
Minh

On 14/02/17 20:40, Nagendra Kumar wrote:
> Ack.
> Tested the scenarios.
>
> Thanks
> -Nagu
>
>> -----Original Message-----
>> From: minh chau [mailto:minh.c...@dektech.com.au]
>> Sent: 23 January 2017 16:24
>> To: Nagendra Kumar; hans.nordeb...@ericsson.com; Praveen Malviya;
>> gary....@dektech.com.au
>> Cc: opensaf-devel@lists.sourceforge.net
>> Subject: Re: [PATCH 2 of 2] AMFND: Fix SC failover during headless sync
>> before standby AMFD comes up [#2162]
>>
>> Hi Nagu,
>>
>> I am checking the logs now.
>>
>> Thanks, Minh
>>
>> On 23/01/17 17:47, Nagendra Kumar wrote:
>>> The logs (Logs-tc.rar) attached in the ticket.
>>>
>>> Thanks
>>> -Nagu
>>>
>>>> -----Original Message-----
>>>> From: minh chau [mailto:minh.c...@dektech.com.au]
>>>> Sent: 16 January 2017 05:47
>>>> To: Nagendra Kumar; hans.nordeb...@ericsson.com; Praveen Malviya;
>>>> gary....@dektech.com.au
>>>> Cc: opensaf-devel@lists.sourceforge.net
>>>> Subject: Re: [PATCH 2 of 2] AMFND: Fix SC failover during headless
>>>> sync before standby AMFD comes up [#2162]
>>>>
>>>> Hi Nagu,
>>>>
>>>> I misunderstood your point, and now I get it.
>>>> In my test it works as expected: SU2 becomes Act and there is no
>>>> assignment for SU1. I guess that in your test the cluster initiation
>>>> timer was somehow not started on SC2 (the new active), so there could
>>>> be a missing case in the patch.
>>>> Could you please share the trace?
>>>>
>>>> Thanks,
>>>> Minh
>>>>
>>>> On 13/01/17 21:48, Nagendra Kumar wrote:
>>>>> Hi Minh,
>>>>>   Please check my response inlined with [Nagu].
>>>>>
>>>>> Thanks
>>>>> -Nagu
>>>>>> -----Original Message-----
>>>>>> From: minh chau [mailto:minh.c...@dektech.com.au]
>>>>>> Sent: 13 January 2017 03:53
>>>>>> To: Nagendra Kumar; hans.nordeb...@ericsson.com; Praveen Malviya;
>>>>>> gary....@dektech.com.au
>>>>>> Cc: opensaf-devel@lists.sourceforge.net
>>>>>> Subject: Re: [PATCH 2 of 2] AMFND: Fix SC failover during headless
>>>>>> sync before standby AMFD comes up [#2162]
>>>>>>
>>>>>> Hi Nagu,
>>>>>>
>>>>>> Thanks for reviewing, please see comments inline.
>>>>>>
>>>>>> Thanks,
>>>>>> Minh
>>>>>>
>>>>>> On 12/01/17 21:48, Nagendra Kumar wrote:
>>>>>>> Hi Minh,
>>>>>>>          Though I am not able to simulate the problem, I tested as below:
>>>>>>> 1. Start SC1, SC2, PL-3 and PL-4. Configure SU1 on PL-3 as Act and
>>>>>>>    SU2 on PL-4 as Standby.
>>>>>>> 2. Stop SC1 and SC2 and then stop PL-3.
>>>>>>> 3. Start SC-1 and SC-2. When SC-2 prints Cold sync complete, stop
>>>>>>>    SC1. SC2 becomes Act.
>>>>>> [M]: As SU1 is on PL3 and SU2 is on PL4, if PL-3 is stopped then
>>>>>> only SU2 has an active assignment.
>>>>> [Nagu]: PL-3 is stopped in step #2.
>>>>>>> In this case, SC-2 contains both SU1(Act) and SU2(Standby) assignments.
>>>>>>> Ideally, SU2 assignments should have been Act and there shouldn't
>>>>>>> be an SU1 assignment.
>>>>>> [M]: This seems to be another test where SU1 and SU2 are hosted on
>>>>>> SC2, then both SU1 and SU2 should get assignment
>>>>> [Nagu]: I mean that the command 'amf-state siass' run on SC-1 displays
>>>>> both SU1 and SU2 assignments.
>>>>>                    SU1 and SU2 are hosted on PL-3 and PL-4 respectively.
>>>>> This is a similar test case to the one mentioned in the ticket?
>>>>>>> safSISU=safSu=SU1\,safSg=AmfDemo_2N\,safApp=AmfDemo1,safSi=AmfDemo,safApp=AmfDemo1
>>>>>>>             saAmfSISUHAState=ACTIVE(1)
>>>>>>>             saAmfSISUHAReadinessState=READY_FOR_ASSIGNMENT(1)
>>>>>>>
>>>>>>> safSISU=safSu=SU2\,safSg=AmfDemo_2N\,safApp=AmfDemo1,safSi=AmfDemo,safApp=AmfDemo1
>>>>>>>             saAmfSISUHAState=STANDBY(2)
>>>>>>>             saAmfSISUHAReadinessState=READY_FOR_ASSIGNMENT(1)
>>>>>>>
>>>>>>> Please check.
>>>>>>>
>>>>>>> Thanks
>>>>>>> -Nagu
>>>>>>>
>>>>>>>> -----Original Message-----
>>>>>>>> From: Minh Hon Chau [mailto:minh.c...@dektech.com.au]
>>>>>>>> Sent: 08 November 2016 08:53
>>>>>>>> To: hans.nordeb...@ericsson.com; Nagendra Kumar; Praveen
>> Malviya;
>>>>>>>> gary....@dektech.com.au; minh.c...@dektech.com.au
>>>>>>>> Cc: opensaf-devel@lists.sourceforge.net
>>>>>>>> Subject: [PATCH 2 of 2] AMFND: Fix SC failover during headless
>>>>>>>> sync before standby AMFD comes up [#2162]
>>>>>>>>
>>>>>>>>      osaf/services/saf/amf/amfnd/di.cc   |  7 +++++--
>>>>>>>>      osaf/services/saf/amf/amfnd/susm.cc |  6 ++++++
>>>>>>>>      2 files changed, 11 insertions(+), 2 deletions(-)
>>>>>>>>
>>>>>>>>
>>>>>>>> This case of SC failover causes the new active AMFD to get stuck in
>>>>>>>> node_up messages.
>>>>>>>>
>>>>>>>> Say the first active controller is SC1, which goes down during
>>>>>>>> headless sync. The amfnd on SC2 therefore receives mds_down of AVD,
>>>>>>>> and both is_avd_down and amfd_sync_required are set to true. When SC2
>>>>>>>> takes over the active role, amfnd on SC2 receives mds_up, but only
>>>>>>>> is_avd_down is set to false and the variable amfd_sync_required
>>>>>>>> remains true.
>>>>>>>> When amfnd-SC2 finishes initiating the middleware SUs, it needs to
>>>>>>>> send the su_oper message to AMFD, but the message is deferred instead
>>>>>>>> of being sent because amfd_sync_required is still true.
>>>>>>>> In this SC failover scenario, amfd_sync_required needs to be set
>>>>>>>> to false when amfnd on SC2 receives the su_pres message for
>>>>>>>> middleware SUs.
>>>>>>>> That means amfnd on the active controller does not need to wait for
>>>>>>>> the set_leds message, which informs it that cluster initiation is
>>>>>>>> done, before it can send su_oper messages to AMFD. This logic also
>>>>>>>> aligns with the normal headless scenario, where amfnd on the active
>>>>>>>> controller has amfd_sync_required initially marked as false
>>>>>>>> because no middleware SUs are initiated. When amfd_sync_required
>>>>>>>> is true, all middleware SUs on that amfnd were initiated and
>>>>>>>> assigned before headless, thus amfnd needs to wait for cluster
>>>>>>>> initiation after headless.
>>>>>>>>
>>>>>>>> diff --git a/osaf/services/saf/amf/amfnd/di.cc b/osaf/services/saf/amf/amfnd/di.cc
>>>>>>>> --- a/osaf/services/saf/amf/amfnd/di.cc
>>>>>>>> +++ b/osaf/services/saf/amf/amfnd/di.cc
>>>>>>>> @@ -748,7 +748,8 @@ uint32_t avnd_di_oper_send(AVND_CB *cb,
>>>>>>>>                if (avnd_diq_rec_add(cb, &msg) == nullptr) {
>>>>>>>>                        rc = NCSCC_RC_FAILURE;
>>>>>>>>                }
>>>>>>>> -              LOG_NO("avnd_di_oper_send() deferred as AMF director is offline");
>>>>>>>> +              LOG_NO("avnd_di_oper_send() deferred as AMF director is offline(%d),"
>>>>>>>> +                      " or sync is required(%d)", cb->is_avd_down, cb->amfd_sync_required);
>>>>>>>>        } else {
>>>>>>>>                // We are in normal cluster, send msg to director
>>>>>>>>                msg.info.avd->msg_info.n2d_opr_state.msg_id = ++(cb->snd_msg_id);
>>>>>>>> @@ -881,7 +882,9 @@ uint32_t avnd_di_susi_resp_send(AVND_CB
>>>>>>>>                        rc = NCSCC_RC_FAILURE;
>>>>>>>>                }
>>>>>>>>                m_AVND_SU_ALL_SI_RESET(su);
>>>>>>>> -              LOG_NO("avnd_di_susi_resp_send() deferred as AMF director is offline");
>>>>>>>> +                LOG_NO("avnd_di_susi_resp_send() deferred as AMF director is offline(%d),"
>>>>>>>> +                        " or sync is required(%d)", cb->is_avd_down, cb->amfd_sync_required);
>>>>>>>> +
>>>>>>>>              } else {
>>>>>>>>                // We are in normal cluster, send msg to director
>>>>>>>>                msg.info.avd->msg_info.n2d_su_si_assign.msg_id = ++(cb->snd_msg_id);
>>>>>>>> diff --git a/osaf/services/saf/amf/amfnd/susm.cc b/osaf/services/saf/amf/amfnd/susm.cc
>>>>>>>> --- a/osaf/services/saf/amf/amfnd/susm.cc
>>>>>>>> +++ b/osaf/services/saf/amf/amfnd/susm.cc
>>>>>>>> @@ -1345,6 +1345,12 @@ uint32_t avnd_evt_avd_su_pres_evh(AVND_C
>>>>>>>>                                goto done;
>>>>>>>>                }
>>>>>>>>        } else { /* => instantiate the su */
>>>>>>>> +              // Do not need to wait for headless sync if there is no application SUs
>>>>>>>> +              // initiated. This is known because here we are receiving su_pres message
>>>>>>>> +              // for NCS SUs
>>>>>>>> +              if (su->is_ncs == true)
>>>>>>>> +                      cb->amfd_sync_required = false;
>>>>>>>> +
>>>>>>>>                AVND_EVT *evt_ir = 0;
>>>>>>>>                TRACE("Sending to Imm thread.");
>>>>>>>>                evt_ir = avnd_evt_create(cb, AVND_EVT_IR, 0, nullptr, &info->su_name, 0, 0);
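
For readers following the patch description, the flag handling it describes can be sketched as a small stand-alone model. This is only an illustration, not the amfnd code: NodeDirectorFlags and the helper functions are hypothetical names, only the two flags is_avd_down and amfd_sync_required come from the patch, and the send guard (defer while either flag is set) is an assumption inferred from the new log message.

```cpp
#include <cassert>

// Hypothetical, reduced stand-in for AVND_CB: only the two flags named in
// the patch description are modelled here.
struct NodeDirectorFlags {
  bool is_avd_down = false;
  bool amfd_sync_required = false;
};

// SC1 (old active) goes down during headless sync: both flags are raised.
void on_avd_mds_down(NodeDirectorFlags& cb) {
  cb.is_avd_down = true;
  cb.amfd_sync_required = true;
}

// SC2 takes over the active role: only is_avd_down is cleared.
void on_avd_mds_up(NodeDirectorFlags& cb) { cb.is_avd_down = false; }

// su_pres received from the director. With the patch, a su_pres for a
// middleware (NCS) SU also clears amfd_sync_required.
void on_su_pres(NodeDirectorFlags& cb, bool su_is_ncs, bool patched) {
  if (patched && su_is_ncs) cb.amfd_sync_required = false;
}

// Assumed guard: su_oper is deferred while either flag is still set.
bool can_send_su_oper(const NodeDirectorFlags& cb) {
  return !cb.is_avd_down && !cb.amfd_sync_required;
}

// Replays the failover sequence from the patch description and reports
// whether su_oper could be sent to the director at the end of it.
bool su_oper_sent_after_failover(bool patched) {
  NodeDirectorFlags cb;
  on_avd_mds_down(cb);
  assert(cb.is_avd_down && cb.amfd_sync_required);
  on_avd_mds_up(cb);
  on_su_pres(cb, /*su_is_ncs=*/true, patched);
  return can_send_su_oper(cb);
}
```

In this model, su_oper_sent_after_failover(false) returns false, which corresponds to the new active AMFD waiting in node_up, while su_oper_sent_after_failover(true) returns true because the su_pres for the NCS SU clears the sync flag.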

