Hi Gary,

A question: this patch builds on logic introduced in ticket #2151 to handle the case where one controller has not been completely stopped when the other controller is started. Perhaps the if statement

if ((fm_cb->role == PCS_RDA_ACTIVE) && (fm_cb->csi_assigned == false))

(and possibly more of the code related to #2151) could be removed, and the etcd lock released as the last operation in opensafd stop instead?
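
To make it easier to discuss which part could go, here is a minimal sketch of the decision the patched check ends up making, written as a standalone function. The names Role, FmState and ShouldRebootNewActive are hypothetical stand-ins and not the real fmd types; consensus_enabled stands in for the result of Consensus::IsEnabled() in the patch.

```cpp
#include <cassert>

// Hypothetical stand-ins for illustration only; not the real fmd types.
enum class Role { kActive, kStandby };

struct FmState {
  Role role;          // stands in for fm_cb->role
  bool csi_assigned;  // stands in for fm_cb->csi_assigned
};

// Returns true if the 'new active' should reboot itself when it receives
// svc up from the peer. consensus_enabled is an assumed stand-in for the
// value returned by Consensus::IsEnabled() in the patch.
inline bool ShouldRebootNewActive(const FmState& s, bool consensus_enabled) {
  // Condition carried over from #2151: this node is active but has not yet
  // been assigned its CSI, so the old active may not be fully down.
  const bool two_actives_seen = (s.role == Role::kActive) && !s.csi_assigned;
  // With split-brain prevention enabled, the old active is fenced or
  // reboots itself, so the new active must not reboot as well.
  return two_actives_seen && !consensus_enabled;
}

// Small self-check showing the two cases the patch distinguishes.
inline void DemoChecks() {
  assert(ShouldRebootNewActive({Role::kActive, false}, false));
  assert(!ShouldRebootNewActive({Role::kActive, false}, true));
}
```

If the lock were released only at the very end of opensafd stop, the first half of that condition might no longer be needed, which is what I am asking about.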

/HansN

On 03/09/2018 06:57 AM, Gary Lee wrote:
If we have a 'tied election' and split-brain prevention is enabled,
then the 'old active' is fenced, or the 'old active' will self-reboot
when it is notified a new node is active.

We need to disable this redundant check in fmd. Otherwise, the 'new active'
will also reboot, along with the 'old active'.
---
  src/fm/fmd/fm_main.cc | 19 ++++++++++++-------
  1 file changed, 12 insertions(+), 7 deletions(-)

diff --git a/src/fm/fmd/fm_main.cc b/src/fm/fmd/fm_main.cc
index 1244c2347..73c9b9ccd 100644
--- a/src/fm/fmd/fm_main.cc
+++ b/src/fm/fmd/fm_main.cc
@@ -600,13 +600,18 @@ static void fm_mbx_msg_handler(FM_CB *fm_cb, FM_EVT *fm_mbx_evt) {
         * progress of shutdown (i.e., amfd/immd is still alive).
         */
        if ((fm_cb->role == PCS_RDA_ACTIVE) && (fm_cb->csi_assigned == false)) {
-        LOG_WA(
-            "Two active controllers observed in a cluster, newActive: %x and "
-            "old-Active: %x",
-            unsigned(fm_cb->node_id), unsigned(fm_cb->peer_node_id));
-        opensaf_reboot(0, NULL,
-                       "Received svc up from peer node (old-active is not "
-                       "fully DOWN), hence rebooting the new Active");
+        Consensus consensus_service;
+        if (consensus_service.IsEnabled() == false) {
+          // If split-brain prevention is enabled, then the 'old active' has
+          // already initiated a self-reboot, or it is fenced.
+          LOG_WA(
+              "Two active controllers observed in a cluster, newActive: %x and "
+              "old-Active: %x",
+              unsigned(fm_cb->node_id), unsigned(fm_cb->peer_node_id));
+          opensaf_reboot(0, NULL,
+                         "Received svc up from peer node (old-active is not "
+                         "fully DOWN), hence rebooting the new Active");
+        }
        }
        /* Peer fm came up so sending ee_id of this node */

