@Suryanarayana: I think this fix makes AMFND a bit defensive, but let's
see Mahesh's comments.
@Mahesh: When NCSMDS_DOWN is received for both controllers, there is no
active left to wait for, so shouldn't MDS stop this timer?
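To make the MDS-level alternative concrete, here is a minimal C++ sketch under stated assumptions. It is a toy model, not MDS code: the class AwaitActiveModel, its methods, and the integer node IDs are all hypothetical. It only counts how many DOWN events reach amfnd when both controllers go down, with and without stopping the await-active timer once no amfd instance is left.

```cpp
#include <cassert>
#include <cstdint>
#include <set>

// Hypothetical toy model of the MDS-level alternative; none of these names
// are real MDS APIs, and time is a plain seconds counter, not a real timer.
class AwaitActiveModel {
 public:
  AwaitActiveModel(std::uint32_t timeout_s, bool stop_when_all_down)
      : timeout_s_(timeout_s), stop_when_all_down_(stop_when_all_down) {}

  // An amfd instance (identified by a hypothetical node ID) came up.
  void Up(int node) { up_.insert(node); }

  // An amfd instance went down at time `now`. Returns how many DOWN events
  // the model delivers to the subscriber (amfnd): 1 once no instance is left.
  int Down(int node, bool was_active, std::uint32_t now) {
    assert(up_.count(node) == 1);  // only instances seen coming up go down
    up_.erase(node);
    if (was_active) {
      // Losing the active starts the wait for a new one
      // (MDS_AWAIT_ACTIVE_TMR_VAL in this thread).
      timer_running_ = true;
      deadline_ = now + timeout_s_;
    }
    if (!up_.empty()) return 0;  // a standby remains, nothing reported here
    // Both controllers are gone: the cluster is headless.
    if (stop_when_all_down_) timer_running_ = false;  // proposed MDS change
    return 1;
  }

  // Advances time to `now`. Returns 1 if the await-active timer expired,
  // which the thread says reaches amfnd as another NCSMDS_DOWN, else 0.
  int Tick(std::uint32_t now) {
    if (timer_running_ && now >= deadline_) {
      timer_running_ = false;
      return 1;
    }
    return 0;
  }

 private:
  std::uint32_t timeout_s_;
  bool stop_when_all_down_;
  std::set<int> up_;
  bool timer_running_ = false;
  std::uint32_t deadline_ = 0;
};
```

With two controllers and the 3-minute (180 s) timeout mentioned in the commit message, the current behaviour in this model hands amfnd two DOWNs; stopping the timer when the last instance goes down leaves only one.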
On 26/04/17 15:45, Suryanarayana.Garlapati wrote:
> Perhaps this fix should be done at the MDS level rather than in
> AMFND, taking into consideration that the cluster has only two
> controllers.
>
> The timer that MDS starts should not be started (or, if already
> started, should be stopped) when DOWN has been received for both
> amfds.
>
> On Wednesday 26 April 2017 10:53 AM, Minh Chau wrote:
>> If the cluster goes headless and stays that way for 3 minutes,
>> which is the current MDS_AWAIT_ACTIVE_TMR_VAL timeout, amfnd
>> receives a second NCSMDS_DOWN and then deletes all buffered
>> messages. As a result, headless recovery is impossible because
>> these buffered messages are lost.
>>
>> This patch ignores the second NCSMDS_DOWN.
>> ---
>> src/amf/amfnd/di.cc | 7 +++++++
>> 1 file changed, 7 insertions(+)
>>
>> diff --git a/src/amf/amfnd/di.cc b/src/amf/amfnd/di.cc
>> index 627b31853..e06b9260d 100644
>> --- a/src/amf/amfnd/di.cc
>> +++ b/src/amf/amfnd/di.cc
>> @@ -638,6 +638,13 @@ uint32_t avnd_evt_mds_avd_dn_evh(AVND_CB *cb, AVND_EVT *evt) {
>> }
>> }
>> + // Ignore the second NCSMDS_DOWN, which comes from the timeout of
>> + // MDS_AWAIT_ACTIVE_TMR_VAL.
>> + if (cb->is_avd_down == true) {
>> + TRACE_LEAVE();
>> + return rc;
>> + }
>> +
>> m_AVND_CB_AVD_UP_RESET(cb);
>> cb->active_avd_adest = 0;
>
>
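For comparison, a minimal C++ sketch of what the patch changes on the amfnd side, assuming a toy model. AmfndModel, Buffer, OnAvdDown and ignore_repeated_down are hypothetical; only is_avd_down and active_avd_adest are taken from the diff, and the unpatched branch simply reproduces the behaviour described in the commit message rather than the real avnd_evt_mds_avd_dn_evh code.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical, simplified model of the amfnd behaviour described in the
// commit message; it is NOT the real AVND_CB. Only is_avd_down and
// active_avd_adest come from the patch. ignore_repeated_down toggles the
// patched behaviour.
struct AmfndModel {
  bool ignore_repeated_down = false;
  bool is_avd_down = false;
  std::uint64_t active_avd_adest = 1;      // nonzero: an active amfd is known
  std::vector<std::string> buffered_msgs;  // kept for headless recovery

  // In this model, messages are only buffered while no amfd is reachable.
  void Buffer(const std::string& msg) {
    assert(is_avd_down);
    buffered_msgs.push_back(msg);
  }

  // Models the NCSMDS_DOWN handling; returns true if the event was processed.
  bool OnAvdDown() {
    if (is_avd_down) {
      // Patched: a repeated DOWN (e.g. await-active timer expiry) is a no-op.
      if (ignore_repeated_down) return false;
      // Unpatched, per the commit message: the buffer is discarded, so
      // headless recovery has nothing left to replay.
      buffered_msgs.clear();
    }
    is_avd_down = true;
    active_avd_adest = 0;
    return true;
  }
};
```

In this model the guard keys off is_avd_down alone, so every repeated DOWN is skipped, not only the one triggered by MDS_AWAIT_ACTIVE_TMR_VAL.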
_______________________________________________
Opensaf-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/opensaf-devel