@Suryanarayana: I think this fix makes AMFND a bit defensive, but let's 
see Mahesh's comments.
@Mahesh: If NCSMDS_DOWN has already been received, there is no active AMFD 
left to wait for, so shouldn't MDS stop this timer?
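
To make the intent of the guard concrete, here is a minimal standalone 
sketch. It is not the real AMFND code: NodeState, on_avd_down() and the 
guard_enabled switch are hypothetical names used only for illustration, 
and only is_avd_down mirrors a field from the patch. It assumes, as the 
patch description says, that a repeated NCSMDS_DOWN is what discards the 
buffered messages.

```cpp
#include <cassert>
#include <deque>
#include <string>

// Hypothetical, simplified stand-in for AVND_CB. Only is_avd_down
// corresponds to a field that appears in the patch.
struct NodeState {
  bool is_avd_down = false;
  std::deque<std::string> buffered;  // messages kept for headless recovery
};

enum class DownResult { kProcessed, kIgnored };

// Models the handling of a "director down" event.
// guard_enabled == true  : behaviour with the patch (repeat is ignored).
// guard_enabled == false : behaviour described in the bug report.
DownResult on_avd_down(NodeState& st, bool guard_enabled) {
  if (guard_enabled && st.is_avd_down) {
    // Second DOWN (e.g. after the await-active timeout): do nothing,
    // so the buffered messages survive.
    return DownResult::kIgnored;
  }
  if (st.is_avd_down) {
    // Assumption taken from the patch description: an unguarded
    // repeated DOWN ends up deleting the buffered messages.
    st.buffered.clear();
  }
  st.is_avd_down = true;
  return DownResult::kProcessed;
}
```

With the guard enabled, calling on_avd_down() twice leaves the buffer 
untouched; with it disabled, the second call empties the buffer, which is 
the failure the patch is meant to avoid.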


On 26/04/17 15:45, Suryanarayana.Garlapati wrote:
> I guess this fix may need to be done at the MDS level rather than in 
> AMFND, taking into consideration that the cluster has only two 
> Controllers.
>
> The timer started by MDS should not be started (or, if already started, 
> should be stopped) when a down event has been received for both of the 
> amfd's.
>
>
>
> On Wednesday 26 April 2017 10:53 AM, Minh Chau wrote:
>> If the cluster goes into the headless stage and waits up to 3 minutes,
>> which is currently the timeout of MDS_AWAIT_ACTIVE_TMR_VAL,
>> amfnd will receive another NCSMDS_DOWN and then delete
>> all buffered messages. As a result, headless recovery
>> is impossible because these buffered messages have been deleted.
>>
>> This patch ignores the second NCSMDS_DOWN.
>> ---
>>   src/amf/amfnd/di.cc | 7 +++++++
>>   1 file changed, 7 insertions(+)
>>
>> diff --git a/src/amf/amfnd/di.cc b/src/amf/amfnd/di.cc
>> index 627b31853..e06b9260d 100644
>> --- a/src/amf/amfnd/di.cc
>> +++ b/src/amf/amfnd/di.cc
>> @@ -638,6 +638,13 @@ uint32_t avnd_evt_mds_avd_dn_evh(AVND_CB *cb, AVND_EVT *evt) {
>>       }
>>     }
>>
>> +  // Ignore the second NCSMDS_DOWN which comes from timeout of
>> +  // MDS_AWAIT_ACTIVE_TMR_VAL
>> +  if (cb->is_avd_down == true) {
>> +    TRACE_LEAVE();
>> +    return rc;
>> +  }
>> +
>>     m_AVND_CB_AVD_UP_RESET(cb);
>>     cb->active_avd_adest = 0;
>
>


_______________________________________________
Opensaf-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/opensaf-devel
