lock-in [#2835]

Hans Nordeback Fri, 20 Apr 2018 01:01:19 -0700

Hi Alex,

please see below for some comments/questions.


/Regards HansN


On 04/18/2018 03:41 PM, Alex Jones wrote:

When using PLM an AMF node mapped to a CLM node mapped to a PLM EE, can get
stuck in locked state when rebooting, or going through a PLM EE lock/unlock.

When amfd receives a START step from CLM tracking it attempts to gracefully
shutdown the AMF node using AMF admin operations lock/lock-in. When PLM is
involved this doesn't always work correctly because PLM is also shutting down
the node by calling "opensafd stop". There is a race condition between PLM
using "opensafd stop", and amfd using the admin operations to bring down the
node, so that sometimes the AMF node gets stuck in locked state.

If the rootCauseEntity in the CLM tracking is a PLM entity then don't do
anything, as "opensafd stop" is already being called.
---
  src/amf/amfd/clm.cc | 25 ++++++++++++++++++++++++-
  1 file changed, 24 insertions(+), 1 deletion(-)

diff --git a/src/amf/amfd/clm.cc b/src/amf/amfd/clm.cc
index 2bcea2db0..7f675d8e9 100644
--- a/src/amf/amfd/clm.cc
+++ b/src/amf/amfd/clm.cc
@@ -274,6 +274,27 @@ static void clm_track_cb(
              TRACE_3("Already got callback for start of this change.");
              continue;
            }
+
+          if (strncmp(osaf_extended_name_borrow(rootCauseEntity),
+                      "safEE=",
+                      sizeof("safEE=") - 1) == 0 ||
+              strncmp(osaf_extended_name_borrow(rootCauseEntity),
+                      "safHE=",
+                      sizeof("safHE=") - 1) == 0) {
+            // PLM will take care of calling opensafd stop
+            TRACE("rootCause: %s from PLM operation so skipping %u",
+                  osaf_extended_name_borrow(rootCauseEntity),
+                  notifItem->clusterNode.nodeId);
+
+            SaAisErrorT rc(saClmResponse_4(avd_cb->clmHandle,
+                                           invocation,
+                                           SA_CLM_CALLBACK_RESPONSE_OK));

[HansN] perhaps use:

SaAisErrorT rc = saClmResponse_4 or SaAisErrorT rc{saClmResponse_4 instead?

+            if (rc != SA_AIS_OK)
+              LOG_ER("saClmResponse_4 failed: %i", rc);
+

[HansN] I think the amf operational state has to be checked and set todisabled? And should

break be used instead of continue?

+            continue;
+          }
+
            /* invocation to be used by pending clm response */
            node->clm_pend_inv = invocation;
            clm_node_exit_start(node, notifItem->clusterChange);
@@ -304,7 +325,9 @@ static void clm_track_cb(
                  osaf_extended_name_borrow(rootCauseEntity),
                  notifItem->clusterNode.nodeId);
            if (strncmp(osaf_extended_name_borrow(rootCauseEntity),
-                      "safEE=", 6) == 0) {
+                      "safEE=", 6) == 0 ||
+              strncmp(osaf_extended_name_borrow(rootCauseEntity),
+                      "safHE=", 6) == 0) {

[HansN] sizeof("safHE=") as above

              /* This callback is because of operation on PLM, so we need to 
mark
                 the node absent, because PLCD will anyway call opensafd stop.*/
              AVD_AVND *node =


------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
_______________________________________________
Opensaf-devel mailing list
Opensaf-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-devel

Re: [devel] [PATCH 1/1] amfd: if rootCauseEntity is PLM entity don't engage lock/lock-in [#2835]

Reply via email to