The connection between the standby SC and a PL was dropped and then
re-established (disconnect, then reconnect), while that PL stayed
connected to the active SC. The standby SC marked the PL as absent and
kept that state even though the connection was re-established afterwards.
During failover, the standby SC reports every node it has recorded as
absent as having left the cluster. The PL is therefore treated as out of
the cluster from the AMF point of view, although it is still connected
to the active SC.

This scenario is a form of split-brain, and amfd should order the PL to
reboot in order to recover from it.
---
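Note: the check added by this patch can be sketched in isolation as
below. This is only an illustrative sketch, not amfd code: Node,
NodeState and nodes_to_reboot() are hypothetical stand-ins, and the two
sets are assumed to play the roles of cb->failover_list and
amfnd_svc_db.

```cpp
#include <cstdint>
#include <set>
#include <vector>

// Hypothetical, simplified stand-ins for amfd state (not the real types).
enum class NodeState { Present, Absent };

struct Node {
  uint32_t node_id;
  NodeState state;
};

// Returns the ids of nodes that would be ordered to reboot: recorded as
// absent, not yet in the failover list, but still present in the amfnd
// service database (i.e. they reconnected before failover).
std::vector<uint32_t> nodes_to_reboot(
    const std::vector<Node> &nodes,
    const std::set<uint32_t> &failover_list,
    const std::set<uint32_t> &amfnd_svc_db) {
  std::vector<uint32_t> result;
  for (const Node &n : nodes) {
    if (n.state != NodeState::Absent) continue;          // only absent nodes
    if (failover_list.count(n.node_id) != 0) continue;   // already handled
    if (amfnd_svc_db.count(n.node_id) != 0) {
      result.push_back(n.node_id);                       // split-brain case
    }
  }
  return result;
}
```

A node that is absent and also missing from the service database is a
genuine departure and is left to the existing failover handling.
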
 src/amf/amfd/main.cc | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/src/amf/amfd/main.cc b/src/amf/amfd/main.cc
index 6487a6b54..59e9bf723 100644
--- a/src/amf/amfd/main.cc
+++ b/src/amf/amfd/main.cc
@@ -436,6 +436,13 @@ static void handle_event_in_failover_state(AVD_EVT *evt) {
       if (AVD_AVND_STATE_ABSENT == node->node_state &&
           cb->failover_list.find(node->node_info.nodeId) == cb->failover_list.end()) {
         bool fover_done = false;
+        if (amfnd_svc_db->find(node->node_info.nodeId) !=
+            amfnd_svc_db->end()) {
+          LOG_WA("Node %x reconnected before failover, "
+                "ordering node reboot", node->node_info.nodeId);
+          LOG_WA("Sending node reboot order");
+          avd_d2n_reboot_snd(node);
+        }
         /* Check whether this node failover has been
            performed or not. */
         for (const auto &i_su : node->list_of_ncs_su) {
-- 
2.25.1



_______________________________________________
Opensaf-devel mailing list
Opensaf-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-devel
