Hi Minh, Nagu,

Ok, I send out new version V5 as your suggestion.

Best Regards,
Thuan

-----Original Message-----
From: Minh Hon Chau <minh.c...@dektech.com.au> 
Sent: Thursday, July 12, 2018 1:36 PM
To: thuan.tran <thuan.t...@dektech.com.au>; nagen...@hasolutions.in; 
hans.nordeb...@ericsson.com; gary....@dektech.com.au
Cc: opensaf-devel@lists.sourceforge.net
Subject: Re: [PATCH 1/1] amf: change the way amfd handle amfnd down [#2891]

Hi Thuan,

I think what Nagu suggested is sufficiently good to fix the issue in this 
ticket.

Regarding the @synced_headless you add, what I see in your patch for now, if 
active amfd has not created synced assignments, and mds down comes, amfd does 
not call avd_node_failover. That could cause a problem, because the 
send/receive counters could be changed during the sync window, and those need 
to be reset. Generally, the avd_node_failover() should be handling and checking 
the states of several node/su/si/... and most of the cases the node coming 
down, this function should be called.

This may (or may not) be a problem that you have found, I think you could 
create another ticket if you think it's a problem, since it looks quite 
separated and the scenario is significant.

Thanks

Minh


On 11/07/18 17:37, thuan.tran wrote:
> There is a case that after AMFD send reboot order due to “out of sync window”.
> AMFD receive CLM track callback but node is not AMF member yet and delete 
> node.
> Later AMFND MDS down will do nothing since it cannot find the node.
> When node reboot up, AMFD continue use old msg_id counter send to 
> AMFND cause messasge ID mismatch in AMFND then AMFND order reboot itself node.
>
> Also, if AMFND already synced info after headless to active AMFD, then 
> node failover actions need consider for this AMFND down.
>
> Use a flag synced_headless for node, turn it true if susi recreate, 
> then in AMFND down handler, searching the node_id in node_name_db.
> If found, check if need do node failover base on synced_headless flag.
> ---
>   src/amf/amfd/ndfsm.cc | 21 ++++++++++++++++++++-
>   src/amf/amfd/node.cc  |  1 +
>   src/amf/amfd/node.h   |  1 +
>   src/amf/amfd/siass.cc |  1 +
>   4 files changed, 23 insertions(+), 1 deletion(-)
>
> diff --git a/src/amf/amfd/ndfsm.cc b/src/amf/amfd/ndfsm.cc index 
> 9d54df13d..6323d3a73 100644
> --- a/src/amf/amfd/ndfsm.cc
> +++ b/src/amf/amfd/ndfsm.cc
> @@ -767,6 +767,7 @@ void avd_mds_avnd_up_evh(AVD_CL_CB *cb, AVD_EVT *evt) {
>    
> **********************************************************************
> ****/
>   
>   void avd_mds_avnd_down_evh(AVD_CL_CB *cb, AVD_EVT *evt) {
> +  bool node_failover = true;
>     AVD_AVND *node = avd_node_find_nodeid(evt->info.node_id);
>   
>     TRACE_ENTER2("%x, %p", evt->info.node_id, node); @@ -775,6 +776,20 
> @@ void avd_mds_avnd_down_evh(AVD_CL_CB *cb, AVD_EVT *evt) {
>     nds_mds_ver_db.erase(evt->info.node_id);
>     amfnd_svc_db->erase(evt->info.node_id);
>   
> +  if (node == nullptr) {
> +    for (const auto &value : *node_name_db) {
> +      AVD_AVND *avnd = value.second;
> +      if (avnd->node_info.nodeId == evt->info.node_id) {
> +        node_failover = false;
> +        node = avnd;
> +        if (node->synced_headless) {
> +          node_failover = true;
> +        }
> +        break;
> +      }
> +    }
> +  }
> +
>     if (node != nullptr) {
>       // Do nothing if the local node goes down. Most likely due to system
>       // shutdown. If node director goes down due to a bug, the AMF 
> watchdog will @@ -784,7 +799,9 @@ void avd_mds_avnd_down_evh(AVD_CL_CB *cb, 
> AVD_EVT *evt) {
>       }
>   
>       if (avd_cb->avail_state_avd == SA_AMF_HA_ACTIVE) {
> -      avd_node_failover(node);
> +      if (node_failover) {
> +        avd_node_failover(node);
> +      }
>         // Update standby out of sync if standby sc goes down
>         if (avd_cb->node_id_avd_other == node->node_info.nodeId) {
>           cb->stby_sync_state = AVD_STBY_OUT_OF_SYNC; @@ -802,6 +819,7 
> @@ void avd_mds_avnd_down_evh(AVD_CL_CB *cb, AVD_EVT *evt) {
>         node->recvr_fail_sw = false;
>         node->node_info.initialViewNumber = 0;
>         node->node_info.member = SA_FALSE;
> +      node->synced_headless = false;
>       }
>     }
>   
> @@ -1122,6 +1140,7 @@ void avd_node_mark_absent(AVD_AVND *node) {
>   
>     node->node_info.initialViewNumber = 0;
>     node->node_info.member = SA_FALSE;
> +  node->synced_headless = false;
>   
>     /* Increment node failfast counter */
>     avd_cb->nodes_exit_cnt++;
> diff --git a/src/amf/amfd/node.cc b/src/amf/amfd/node.cc index 
> 0ffcfb782..f421e68de 100644
> --- a/src/amf/amfd/node.cc
> +++ b/src/amf/amfd/node.cc
> @@ -94,6 +94,7 @@ void AVD_AVND::initialize() {
>     node_name = {};
>     node_info = {};
>     node_info.member = SA_FALSE;
> +  synced_headless = false;
>     adest = {};
>     saAmfNodeClmNode = {};
>     saAmfNodeCapacity = {};
> diff --git a/src/amf/amfd/node.h b/src/amf/amfd/node.h index 
> e64bf8c93..02b15bca8 100644
> --- a/src/amf/amfd/node.h
> +++ b/src/amf/amfd/node.h
> @@ -145,6 +145,7 @@ class AVD_AVND {
>     uint16_t node_up_msg_count;    /* to count of node_up msg that director 
> had
>                                       received from this node */
>     bool reboot;
> +  bool synced_headless;
>     bool is_campaign_set_for_all_sus() const;
>     // Member functions.
>     void node_sus_termstate_set(bool term_state) const; diff --git 
> a/src/amf/amfd/siass.cc b/src/amf/amfd/siass.cc index 
> 267c55c07..f23c5510e 100644
> --- a/src/amf/amfd/siass.cc
> +++ b/src/amf/amfd/siass.cc
> @@ -1136,6 +1136,7 @@ SaAisErrorT 
> avd_susi_recreate(AVSV_N2D_ND_SISU_STATE_MSG_INFO *info) {
>       return SA_AIS_ERR_NOT_EXIST;
>     }
>   
> +  node->synced_headless = true;
>     for (su_state = info->su_list; su_state != nullptr;
>          su_state = su_state->next) {
>       AVD_SU *su = su_db->find(Amf::to_string(&su_state->safSU));



------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
_______________________________________________
Opensaf-devel mailing list
Opensaf-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-devel

Reply via email to