Hi Minh Chau,
On 4/26/2017 5:43 PM, minh chau wrote:
>
> - Stop both SCs, amfnd receives 2 NCSMDS_DOWN, one is Adest, one is Vdest
I don't see unexpected events from MDS, as amfnd might have subscribed for
them.
Currently the transport (MDS) does not deliver events differently for
headless and non-headless states; the headless state is completely
invisible to MDS.
I will go through this AMF case and will get back to you.
-AVM
On 4/26/2017 5:43 PM, minh chau wrote:
> Hi Mahesh,
>
> The sequence is going like this:
>
> - Stop both SCs, amfnd receives 2 NCSMDS_DOWN, one is Adest, one is
> Vdest. I guess at this point MDS is reporting that both the standby and
> the active amfd are down?
> 2017-04-26 21:13:52 PL-4 osafamfnd[413]: WA AMF director unexpectedly crashed
>
> - Leave the cluster headless for about 3 minutes; amfnd receives another
> NCSMDS_DOWN for the Vdest, so is MDS reporting again that there is no
> active amfd?
> syslog:
> 2017-04-26 21:16:52 PL-4 osafamfnd[413]: WA AMF director unexpectedly crashed
>
> mds log:
> <143>1 2017-04-26T21:16:52.873168+10:00 PL-4 osafamfnd 413 mds.log [meta sequenceId="9881"] >> mds_mcm_await_active_tmr_expiry
> <142>1 2017-04-26T21:16:52.873183+10:00 PL-4 osafamfnd 413 mds.log [meta sequenceId="9882"] MCM:API: await_active_tmr expired for svc_id = AVND(13) Subscribed to svc_id = AVD(12) on VDEST id = 1
> <143>1 2017-04-26T21:16:52.9453+10:00 PL-4 osafclmna 405 mds.log [meta sequenceId="938"] >> mds_mcm_await_active_tmr_expiry
> <142>1 2017-04-26T21:16:52.945309+10:00 PL-4 osafclmna 405 mds.log [meta sequenceId="939"] MCM:API: await_active_tmr expired for svc_id = CLMNA(36) Subscribed to svc_id = CLMS(34) on VDEST id = 16
> <142>1 2017-04-26T21:16:52.945452+10:00 PL-4 osafsmfnd 454 mds.log [meta sequenceId="620"] MCM:API: svc_down : await_active_tmr_expiry : svc_id = SMFND(31) on DEST id = 65535 got DOWN for svc_id = SMFD(30) on VDEST id = 15
> <143>1 2017-04-26T21:16:52.945462+10:00 PL-4 osafsmfnd 454 mds.log [meta sequenceId="621"] << mds_mcm_await_active_tmr_expiry
> <143>1 2017-04-26T21:16:52.945938+10:00 PL-4 osafckptnd 432 mds.log [meta sequenceId="1547"] >> mds_mcm_await_active_tmr_expiry
> <142>1 2017-04-26T21:16:52.945947+10:00 PL-4 osafckptnd 432 mds.log [meta sequenceId="1548"] MCM:API: await_active_tmr expired for svc_id = CPND(17) Subscribed to svc_id = CPD(16) on VDEST id = 9
> <142>1 2017-04-26T21:16:52.946064+10:00 PL-4 osafckptnd 432 mds.log [meta sequenceId="1558"] MCM:API: svc_down : await_active_tmr_expiry : svc_id = CPND(17) on DEST id = 65535 got DOWN for svc_id = CPD(16) on VDEST id = 9
> <143>1 2017-04-26T21:16:52.946074+10:00 PL-4 osafckptnd 432 mds.log [meta sequenceId="1559"] << mds_mcm_await_active_tmr_expiry
> <143>1 2017-04-26T21:16:52.94611+10:00 PL-4 osafckptnd 432 mds.log [meta sequenceId="1562"] >> mds_mcm_await_active_tmr_expiry
> <142>1 2017-04-26T21:16:52.946118+10:00 PL-4 osafckptnd 432 mds.log [meta sequenceId="1563"] MCM:API: await_active_tmr expired for svc_id = CLMA(35) Subscribed to svc_id = CLMS(34) on VDEST id = 16
> <143>1 2017-04-26T21:16:52.955692+10:00 PL-4 osafimmnd 395 mds.log [meta sequenceId="30048"] >> mds_mcm_await_active_tmr_expiry
> <142>1 2017-04-26T21:16:52.955698+10:00 PL-4 osafimmnd 395 mds.log [meta sequenceId="30049"] MCM:API: await_active_tmr expired for svc_id = CLMA(35) Subscribed to svc_id = CLMS(34) on VDEST id = 16
> <142>1 2017-04-26T21:16:52.955765+10:00 PL-4 osafimmnd 395 mds.log [meta sequenceId="30059"] MCM:API: svc_down : await_active_tmr_expiry : svc_id = CLMA(35) on DEST id = 65535 got DOWN for svc_id = CLMS(34) on VDEST id = 16
> <143>1 2017-04-26T21:16:52.955775+10:00 PL-4 osafimmnd 395 mds.log [meta sequenceId="30060"] << mds_mcm_await_active_tmr_expiry
>
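The mds.log excerpt above fits a simple model: when the last active behind a VDEST disappears, MDS arms the await-active timer, and if no new active appears before it expires it delivers one more DOWN, with no notion of whether the cluster is headless. The following C++ sketch is a hypothetical model inferred only from this thread, not OpenSAF's MDS code; VdestSubscriptionModel, its methods, the event strings, and the 180000 ms timeout (the 3 minutes observed here) are illustrative assumptions.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical model of one subscriber's view of a service on a VDEST
// (e.g. amfnd subscribed to AVD), inferred from the mds.log excerpt.
// This is NOT OpenSAF MDS code; all names here are illustrative.
class VdestSubscriptionModel {
 public:
  explicit VdestSubscriptionModel(std::uint64_t await_active_ms)
      : await_active_ms_(await_active_ms) {
    assert(await_active_ms > 0);
  }

  // Both controllers stop: report the Adest and Vdest DOWNs described in
  // the thread, then arm the await-active timer.
  void LastActiveLost(std::uint64_t now_ms) {
    events_.push_back("NCSMDS_DOWN(Adest)");
    events_.push_back("NCSMDS_DOWN(Vdest)");
    deadline_ms_ = now_ms + await_active_ms_;
    timer_running_ = true;
  }

  // A controller becomes active before expiry: cancel the timer.
  void NewActive() {
    timer_running_ = false;
    events_.push_back("NCSMDS_UP(Vdest)");
  }

  // Advance the clock. Nothing here knows about "headless": if no active
  // has appeared by the deadline, the timer fires one more Vdest DOWN.
  void Tick(std::uint64_t now_ms) {
    if (timer_running_ && now_ms >= deadline_ms_) {
      timer_running_ = false;
      events_.push_back("NCSMDS_DOWN(Vdest)");
    }
  }

  const std::vector<std::string>& events() const { return events_; }

 private:
  std::uint64_t await_active_ms_;
  std::uint64_t deadline_ms_ = 0;
  bool timer_running_ = false;
  std::vector<std::string> events_;
};

// Replays the scenario from this thread: both SCs stopped at t = 0 and the
// cluster left headless past the await-active timeout.
std::vector<std::string> ReplayHeadlessScenario(std::uint64_t await_active_ms) {
  VdestSubscriptionModel sub(await_active_ms);
  sub.LastActiveLost(0);
  sub.Tick(await_active_ms - 1);  // still headless, timer pending
  sub.Tick(await_active_ms);      // timer expires
  return sub.events();
}
```

In the headless replay nothing calls NewActive(), so the timer fires and the subscriber sees a second Vdest DOWN, matching the 21:16:52 log entries; calling NewActive() before the deadline (the normal failover path) cancels it.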
> I guess the other node-director services also receive the second
> NCSMDS_DOWN (Vdest), but they are not affected because of their own
> logic (ckptnd, for example, likely checks cb->is_cpd_up == true), so I
> thought it was an AMF problem until I saw Suryanarayana's points. So is
> await_active_tmr working as expected?
>
> thanks,
> Minh
>
> On 26/04/17 17:11, A V Mahesh wrote:
>> Hi Minh Chau,
>>
>> On 4/26/2017 12:05 PM, minh chau wrote:
>>> amfnd will receive another NCSMDS_DOWN
>>
>> Do you mean amfnd is receiving NCSMDS_DOWN for the same amfd twice,
>> or that amfnd is receiving NCSMDS_DOWN for both the active and the
>> standby amfd?
>>
>> -AVM
>>
>> On 4/26/2017 12:05 PM, minh chau wrote:
>>>
>>> @Suryanarayana: I think this fix makes AMFND a bit more defensive, but
>>> let's see Mahesh's comments.
>>> @Mahesh: Once NCSMDS_DOWN has been received there is no active to wait
>>> for, so shouldn't MDS stop this timer?
>>>
>>>
>>> On 26/04/17 15:45, Suryanarayana.Garlapati wrote:
>>>> I guess this fix may need to be done at the MDS level rather than in
>>>> AMFND, taking into consideration that the cluster has only two
>>>> controllers.
>>>>
>>>> The timer that MDS starts should not be started (or, if already
>>>> started, should be stopped) once DOWN has been received for both
>>>> amfds.
>>>>
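Suryanarayana's proposal can be sketched as a policy on the MDS side: track which ADESTs still host the service behind the VDEST and never leave the await-active timer running once none of them remain. This is a hypothetical C++ sketch, not OpenSAF code; AwaitActivePolicy, the ADEST ids, and the event strings are illustrative assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <set>
#include <string>
#include <vector>

// Hypothetical sketch of the MDS-side change proposed above: remember which
// ADESTs still host the service behind the VDEST, and never leave the
// await-active timer running once none of them remain. Not OpenSAF code.
class AwaitActivePolicy {
 public:
  explicit AwaitActivePolicy(std::uint64_t await_active_ms)
      : await_active_ms_(await_active_ms) {
    assert(await_active_ms > 0);
  }

  void InstanceUp(std::uint64_t adest) { instances_.insert(adest); }

  // The active instance goes DOWN. Only wait for a new active if another
  // instance (e.g. the standby) is still alive and could take over.
  void ActiveDown(std::uint64_t adest, std::uint64_t now_ms) {
    instances_.erase(adest);
    events_.push_back("NCSMDS_DOWN(Vdest)");
    if (!instances_.empty()) {
      deadline_ms_ = now_ms + await_active_ms_;
      timer_running_ = true;
    }
  }

  // A non-active instance goes DOWN; if it was the last one, stop the timer.
  void StandbyDown(std::uint64_t adest) {
    instances_.erase(adest);
    if (instances_.empty()) timer_running_ = false;
  }

  void Tick(std::uint64_t now_ms) {
    if (timer_running_ && now_ms >= deadline_ms_) {
      timer_running_ = false;
      events_.push_back("NCSMDS_DOWN(Vdest)");
    }
  }

  const std::vector<std::string>& events() const { return events_; }

 private:
  std::uint64_t await_active_ms_;
  std::uint64_t deadline_ms_ = 0;
  bool timer_running_ = false;
  std::set<std::uint64_t> instances_;
  std::vector<std::string> events_;
};

// Both controllers stop, in either order, and the cluster stays headless
// past the timeout. Returns how many Vdest DOWNs the subscriber saw.
std::size_t VdestDownsWhenBothStop(bool active_first,
                                   std::uint64_t await_active_ms) {
  const std::uint64_t kActive = 1;   // hypothetical ADEST ids
  const std::uint64_t kStandby = 2;
  AwaitActivePolicy policy(await_active_ms);
  policy.InstanceUp(kActive);
  policy.InstanceUp(kStandby);
  if (active_first) {
    policy.ActiveDown(kActive, 0);
    policy.StandbyDown(kStandby);
  } else {
    policy.StandbyDown(kStandby);
    policy.ActiveDown(kActive, 0);
  }
  policy.Tick(await_active_ms);
  return policy.events().size();
}
```

The two stop orders correspond to the two halves of the proposal: if the standby goes first the timer is never started, and if the active goes first the timer is started and then stopped. Either way the subscriber sees a single Vdest DOWN.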
>>>>
>>>>
>>>> On Wednesday 26 April 2017 10:53 AM, Minh Chau wrote:
>>>>> If the cluster goes headless and stays that way for the
>>>>> MDS_AWAIT_ACTIVE_TMR_VAL timeout (currently 3 minutes), amfnd
>>>>> receives another NCSMDS_DOWN and then deletes all buffered
>>>>> messages. As a result, headless recovery is impossible because
>>>>> these buffered messages are deleted.
>>>>>
>>>>> This patch ignores the second NCSMDS_DOWN.
>>>>> ---
>>>>> src/amf/amfnd/di.cc | 7 +++++++
>>>>> 1 file changed, 7 insertions(+)
>>>>>
>>>>> diff --git a/src/amf/amfnd/di.cc b/src/amf/amfnd/di.cc
>>>>> index 627b31853..e06b9260d 100644
>>>>> --- a/src/amf/amfnd/di.cc
>>>>> +++ b/src/amf/amfnd/di.cc
>>>>> @@ -638,6 +638,13 @@ uint32_t avnd_evt_mds_avd_dn_evh(AVND_CB *cb, AVND_EVT *evt) {
>>>>> }
>>>>> }
>>>>> + // Ignore the second NCSMDS_DOWN which comes from timeout of
>>>>> + // MDS_AWAIT_ACTIVE_TMR_VAL
>>>>> + if (cb->is_avd_down == true) {
>>>>> + TRACE_LEAVE();
>>>>> + return rc;
>>>>> + }
>>>>> +
>>>>> m_AVND_CB_AVD_UP_RESET(cb);
>>>>> cb->active_avd_adest = 0;
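The effect described in the commit message, and the guard the patch adds, can be modelled in a few lines. This is a hypothetical, self-contained sketch: NodeDirectorModel and HandleAvdDown are illustrative stand-ins for AVND_CB and avnd_evt_mds_avd_dn_evh, and the unpatched branch only reproduces the outcome the commit message reports (buffered messages are deleted), not the actual code path in di.cc.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <string>

// Hypothetical stand-in for the amfnd control block: only the two pieces
// this discussion touches are modelled, the "director is down" flag and
// the messages buffered while the cluster is headless. Not the real AVND_CB.
struct NodeDirectorModel {
  bool is_avd_down = false;
  std::deque<std::string> buffered_msgs;  // needed for headless recovery
};

// Simplified AVD DOWN handler. `patched` selects between the behaviour the
// commit message describes (a repeated DOWN discards the buffer) and the
// patch (a repeated DOWN is ignored).
void HandleAvdDown(NodeDirectorModel* cb, bool patched) {
  assert(cb != nullptr);
  if (patched && cb->is_avd_down) {
    return;  // second NCSMDS_DOWN after await-active expiry: ignore it
  }
  if (cb->is_avd_down) {
    cb->buffered_msgs.clear();  // unpatched: buffered messages are lost
    return;
  }
  cb->is_avd_down = true;  // first DOWN: go headless, keep buffering
}

// Stop both SCs, buffer one message while headless, then let the
// await-active timer deliver the second DOWN. Returns messages left.
std::size_t BufferedAfterSecondDown(bool patched) {
  NodeDirectorModel cb;
  HandleAvdDown(&cb, patched);                 // controllers stopped
  cb.buffered_msgs.push_back("buffered-msg");  // hypothetical payload
  HandleAvdDown(&cb, patched);                 // await-active timer expired
  return cb.buffered_msgs.size();
}
```

In this model the unpatched handler leaves no buffered messages after the second DOWN while the patched one keeps them; whether the event should be ignored in amfnd or suppressed in MDS is the open question in this thread.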
>>>>
>>>>
>>>
>>
>>
>
_______________________________________________
Opensaf-devel mailing list
https://lists.sourceforge.net/lists/listinfo/opensaf-devel