When the PBE hangs, amfd can process events in the following
order if a node is started, stopped, and then started again:
- clm_track_cb for the node down event
- clm_track_cb for the second node up event
- avd_mds_avnd_down_evh is called to process the amfnd down event

As a result, the node cannot join the cluster. Ignore the late amfnd
down event when the node is absent and is already a CLM member.
---
 src/amf/amfd/ndfsm.cc | 9 +++++++++
 1 file changed, 9 insertions(+)
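
Note for reviewers: the ordering problem described above can be modelled
outside amfd. The sketch below is only an illustration under stated
assumptions: Node, NodeState, and the on_* functions are hypothetical
stand-ins, not the OpenSAF types or handlers. It replays the three events in
the problematic order and shows the guard skipping the late amfnd down event.

```cpp
#include <cassert>
#include <string>

// Hypothetical, simplified stand-ins for amfd state; not the OpenSAF types.
enum class NodeState { Absent, Present };

struct Node {
  std::string name;
  NodeState state = NodeState::Absent;
  bool clm_member = false;  // plays the role of node_info.member
};

// CLM track callback: the node left the cluster.
void on_clm_node_left(Node& n) {
  n.clm_member = false;
  n.state = NodeState::Absent;
}

// CLM track callback: the node joined the cluster (again).
void on_clm_node_joined(Node& n) { n.clm_member = true; }

// amfnd down handler with the guard from this patch.
// Returns true if the event was processed, false if it was ignored.
bool on_amfnd_down(Node& n) {
  if (n.state == NodeState::Absent && n.clm_member) {
    // Late event: CLM already reported the node as joined again.
    return false;
  }
  n.state = NodeState::Absent;  // assumed stand-in for the real down handling
  return true;
}

// Replays: node down (CLM), node up (CLM), then the delayed amfnd down.
bool late_down_is_ignored() {
  Node n;
  n.name = "node-1";
  n.state = NodeState::Present;
  n.clm_member = true;

  on_clm_node_left(n);
  on_clm_node_joined(n);
  bool processed = on_amfnd_down(n);

  return !processed && n.clm_member && n.state == NodeState::Absent;
}
```

In this model, an amfnd down event for a node that is still present is
processed as usual; only the combination of absent state and CLM membership
is treated as a stale event.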

diff --git a/src/amf/amfd/ndfsm.cc b/src/amf/amfd/ndfsm.cc
index ee47582de..23c516262 100644
--- a/src/amf/amfd/ndfsm.cc
+++ b/src/amf/amfd/ndfsm.cc
@@ -800,6 +800,15 @@ void avd_mds_avnd_down_evh(AVD_CL_CB *cb, AVD_EVT *evt) {
       daemon_exit();
     }
 
+    if ((node->node_state == AVD_AVND_STATE_ABSENT) &&
+        (node->node_info.member == SA_TRUE)) {
+      // Ignore amfnd down event handled late, after clm cb node joined
+      TRACE("Ignore '%s' amfnd down event since node state absent",
+            node->node_name.c_str());
+      TRACE_LEAVE();
+      return;
+    }
+
     if (cb->failover_list.find(evt->info.node_id) != cb->failover_list.end()) {
       std::shared_ptr<NodeStateMachine> failed_node =
         cb->failover_list.at(evt->info.node_id);
-- 
2.17.1


