I think this fix may need to be done at the MDS level rather than in AMFND,
considering that the cluster has only two controllers.

The timer that MDS starts should not be started (or, if already started,
should be stopped) once the down event has been received for both amfd's.
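
To make the suggestion concrete, here is a rough sketch of the idea in C++. It is only a toy model under my own assumptions: AwaitActiveTimer, ControllerTracker and their methods are hypothetical names, not existing MDS code, and the two-controller limit is hard-coded.

```cpp
#include <cassert>

// Hypothetical stand-in for the MDS await-active timer
// (the one bounded by MDS_AWAIT_ACTIVE_TMR_VAL). Not real MDS code.
struct AwaitActiveTimer {
  bool running = false;
  void Start() { running = true; }
  void Stop() { running = false; }
};

// Hypothetical tracker for the amfd instances on the two controllers.
struct ControllerTracker {
  static constexpr int kControllers = 2;  // assumption: only two controllers
  bool up[kControllers] = {true, true};
  AwaitActiveTimer timer;

  bool AllDown() const { return !up[0] && !up[1]; }

  // Called when the amfd on controller `idx` is reported down.
  void OnDirectorDown(int idx) {
    assert(idx >= 0 && idx < kControllers);
    up[idx] = false;
    if (AllDown()) {
      // Headless: no director is left to become active, so the timer
      // is stopped if running and is never (re)started.
      timer.Stop();
    } else {
      // One director is still up, so waiting for a new active makes sense.
      timer.Start();
    }
  }

  // Called when an amfd comes back; waiting is over.
  void OnDirectorUp(int idx) {
    assert(idx >= 0 && idx < kControllers);
    up[idx] = true;
    timer.Stop();
  }
};
```

With this shape the timer can never expire while the cluster is headless, so no second down event would be generated for amfnd in the first place.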



On Wednesday 26 April 2017 10:53 AM, Minh Chau wrote:
> If the cluster goes into the headless stage and waits up to 3 minutes,
> which is currently the timeout of MDS_AWAIT_ACTIVE_TMR_VAL,
> amfnd will receive another NCSMDS_DOWN and then delete
> all buffered messages. As a result, headless recovery
> is impossible because these buffered messages are deleted.
>
> Patch ignores the second NCSMDS_DOWN.
> ---
>   src/amf/amfnd/di.cc | 7 +++++++
>   1 file changed, 7 insertions(+)
>
> diff --git a/src/amf/amfnd/di.cc b/src/amf/amfnd/di.cc
> index 627b31853..e06b9260d 100644
> --- a/src/amf/amfnd/di.cc
> +++ b/src/amf/amfnd/di.cc
> @@ -638,6 +638,13 @@ uint32_t avnd_evt_mds_avd_dn_evh(AVND_CB *cb, AVND_EVT *evt) {
>       }
>     }
>   
> +  // Ignore the second NCSMDS_DOWN which comes from timeout of
> +  // MDS_AWAIT_ACTIVE_TMR_VAL
> +  if (cb->is_avd_down == true) {
> +    TRACE_LEAVE();
> +    return rc;
> +  }
> +
>     m_AVND_CB_AVD_UP_RESET(cb);
>     cb->active_avd_adest = 0;
>   
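
For comparison, the effect of the guard in the patch can be modelled in isolation roughly as follows. This is a simplified sketch, not the amfnd code: NodeDirector, Buffer and OnAvdDown are hypothetical names, only the is_avd_down flag mirrors the patch, and the model assumes the down handler clears the queue whenever it runs.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for the amfnd control block.
struct NodeDirector {
  bool is_avd_down = false;           // mirrors cb->is_avd_down in the patch
  std::vector<std::string> buffered;  // messages kept for headless recovery

  void Buffer(const std::string& msg) { buffered.push_back(msg); }

  // Returns true if the down event was handled, false if it was
  // ignored as a repeat (the case the patch adds).
  bool OnAvdDown() {
    if (is_avd_down) {
      return false;  // second NCSMDS_DOWN: leave the buffer untouched
    }
    buffered.clear();  // assumption: the handler drops pending messages
    is_avd_down = true;
    return true;
  }
};

// Usage: the second down event no longer wipes the buffered message.
inline void RunDemo() {
  NodeDirector nd;
  assert(nd.OnAvdDown());        // first down is processed
  nd.Buffer("buffered message");  // queued while headless
  assert(!nd.OnAvdDown());       // timer-driven repeat is ignored
  assert(nd.buffered.size() == 1);
}
```

The guard makes the handler idempotent, which works, but it treats the symptom in amfnd rather than stopping the repeated event at its source.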


_______________________________________________
Opensaf-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/opensaf-devel
