There is a case where, after AMFD sends a reboot order due to “out of sync window”,
AMFD receives the CLM track callback while the node is not yet a member, and
deletes the node. The AMFND MDS down event that arrives later does not reset the
msg_id counters, because it can no longer find the node. When the node reboots
and comes back up, AMFD keeps using the stale msg_id counters when sending to
AMFND. This causes a message ID mismatch in AMFND, which then orders a reboot of
its own node.
---
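The race above can be illustrated with a small, self-contained C++ model. It is a
sketch under stated assumptions, not the AMFD/AMFND implementation: apart from the
snd_msg_id and rcv_msg_id field names, every type and function below (AvdNode,
Amfnd, Amfd, replay) is hypothetical, AMFND is assumed to accept a message only if
its msg_id is exactly one more than the last one it accepted, and the model ignores
node_info.member.

```cpp
#include <cassert>
#include <cstdint>
#include <map>

// Hypothetical, simplified model of the race; NOT OpenSAF code.
// Only snd_msg_id / rcv_msg_id mirror field names from the patch.

// AMFD's per-node record. The object outlives its node_id mapping, as in the
// patch, where the counters are reset right after avd_node_delete_nodeid().
struct AvdNode {
  uint32_t snd_msg_id = 0;  // last msg_id AMFD sent to this node
  uint32_t rcv_msg_id = 0;  // last msg_id AMFD received from this node
};

// AMFND after a (re)boot: assumed to accept only last_id + 1, starting at 1.
struct Amfnd {
  uint32_t expected = 1;
  bool deliver(uint32_t msg_id) {  // false means "message ID mismatch"
    if (msg_id != expected) return false;
    ++expected;
    return true;
  }
};

// AMFD with a node_id -> node lookup table.
struct Amfd {
  std::map<uint32_t, AvdNode*> by_node_id;

  uint32_t send(AvdNode& n) { return ++n.snd_msg_id; }

  // CLM track callback for an ABSENT node: drop the node_id mapping and,
  // with the fix, reset the counters immediately.
  void clm_node_absent(uint32_t node_id, AvdNode& n, bool with_fix) {
    by_node_id.erase(node_id);
    if (with_fix) {
      n.snd_msg_id = 0;
      n.rcv_msg_id = 0;
    }
  }

  // AMFND MDS down: the counters can only be reset if the node is still found.
  void mds_down(uint32_t node_id) {
    auto it = by_node_id.find(node_id);
    if (it == by_node_id.end()) return;  // already deleted: nothing is reset
    it->second->snd_msg_id = 0;
    it->second->rcv_msg_id = 0;
    by_node_id.erase(it);
  }
};

// Replays the sequence; returns true if the rebooted AMFND accepts the
// first message AMFD sends to it.
bool replay(bool with_fix) {
  const uint32_t node_id = 0x2020f;  // arbitrary example node ID
  AvdNode node;
  Amfd amfd;
  amfd.by_node_id[node_id] = &node;

  Amfnd before_reboot;
  for (int i = 0; i < 3; ++i) before_reboot.deliver(amfd.send(node));
  assert(before_reboot.expected == 4);  // all three messages were in sequence

  amfd.clm_node_absent(node_id, node, with_fix);  // CLM callback comes first
  amfd.mds_down(node_id);                          // MDS down comes later

  amfd.by_node_id[node_id] = &node;  // node rejoins after the reboot
  Amfnd after_reboot;                // fresh AMFND expects msg_id 1
  return after_reboot.deliver(amfd.send(node));
}
```

In this model, replay(false) returns false: AMFD sends msg_id 4 to an AMFND that
expects 1. replay(true) returns true, because the counters were already reset in
the CLM callback and the late MDS down no longer matters.
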
 src/amf/amfd/clm.cc | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/src/amf/amfd/clm.cc b/src/amf/amfd/clm.cc
index e113a65f9..4a15d5ad7 100644
--- a/src/amf/amfd/clm.cc
+++ b/src/amf/amfd/clm.cc
@@ -316,9 +316,14 @@ static void clm_track_cb(
                    __FUNCTION__, node_name.c_str());
             goto done;
           } else if (node->node_state == AVD_AVND_STATE_ABSENT) {
-            LOG_IN("%s: CLM node '%s' is not an AMF cluster member; MDS down received",
+            LOG_IN("%s: CLM node '%s' is ABSENT; MDS down received",
                    __FUNCTION__, node_name.c_str());
             avd_node_delete_nodeid(node);
+            /* Reset msg_id here: AVND MDS down may arrive later and will not
+              find the node to reset them, causing a message ID mismatch. */
+            node->rcv_msg_id = 0;
+            node->snd_msg_id = 0;
+            node->node_info.member = SA_FALSE;
             goto done;
           }
           TRACE(" Node Left: rootCauseEntity %s for node %u",
-- 
2.18.0


_______________________________________________
Opensaf-devel mailing list
Opensaf-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-devel
