Hi Thang,

ACK from me.

Best Regards,
Thien

-----Original Message-----
From: Thang Duc Nguyen <thang.d.ngu...@dektech.com.au> 
Sent: Tuesday, June 25, 2024 8:56 AM
To: Thien Minh Huynh <thien.m.hu...@dektech.com.au>; Dat Tran Quoc Phan 
<dat.tq.p...@dektech.com.au>
Cc: opensaf-devel@lists.sourceforge.net; Thang Duc Nguyen 
<thang.d.ngu...@dektech.com.au>
Subject: [PATCH 1/1] smf: fix one step upgrade failed [#3354]

In a large cluster, or on a system under high load, SMF orders AMF to lock a 
node group (NG) during a one-step upgrade. The lock generates many requests to 
IMM to update attributes, which causes responses from IMM to AMF to time out. 
SMF receives the timeout and retries the lock again and again while the first 
lock is still ongoing. Once the first lock succeeds, the repeated lock request 
from SMF receives a NO_OP error from AMF.

In this case, NO_OP should be considered as a success.
---
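Note (not part of the commit message): the retry handling described above can
be sketched in isolation as follows. This is a minimal illustration, not the
SMF code. The Rc enum, the InvokeResult struct, the LockWithRetry helper and
the max_attempts bound are all hypothetical names introduced here; only the
branch order mirrors the hunks below.

```cpp
#include <functional>

// Hypothetical stand-ins for the SA_AIS_* return codes named in this patch.
enum class Rc { kOk, kTryAgain, kTimeout, kNoOp, kOther };

// Outcome of one admin-operation call: the IMM return code and the code
// returned by the object implementer (AMF in this scenario).
struct InvokeResult {
  Rc imm_rc;
  Rc oi_rc;
};

// Hypothetical helper, not an OpenSAF API. Calls `invoke` up to
// `max_attempts` times (an assumed bound) and reports whether the lock
// can be counted as done.
bool LockWithRetry(const std::function<InvokeResult()>& invoke,
                   int max_attempts) {
  for (int attempt = 0; attempt < max_attempts; ++attempt) {
    const InvokeResult r = invoke();
    if (r.imm_rc == Rc::kTryAgain ||
        (r.imm_rc == Rc::kOk && r.oi_rc == Rc::kTryAgain)) {
      continue;  // The patched function sleeps 2000 ms here; omitted.
    }
    if (r.imm_rc == Rc::kTimeout) {
      continue;  // The earlier request may still be in progress.
    }
    if (r.imm_rc == Rc::kNoOp || r.oi_rc == Rc::kNoOp) {
      return true;  // An earlier attempt already did the work: success.
    }
    return r.imm_rc == Rc::kOk && r.oi_rc == Rc::kOk;
  }
  return false;  // Attempts exhausted.
}
```

With a first call that times out and a second call that returns OK from IMM
but NO_OP from AMF, the helper reports success after two calls, which is the
case this patch addresses.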
 src/smf/smfd/SmfAdminState.cc | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/src/smf/smfd/SmfAdminState.cc b/src/smf/smfd/SmfAdminState.cc 
index 958b7ae82..c20df8d74 100755
--- a/src/smf/smfd/SmfAdminState.cc
+++ b/src/smf/smfd/SmfAdminState.cc
@@ -926,6 +926,9 @@ bool SmfAdminStateHandler::nodeGroupAdminOperation(
           saImmOmAdminOperationInvoke_2(ownerHandle_, &nodeGroupName, 0,
                                         adminOp, params, &oi_rc,
                                         smfd_cb->adminOpTimeout);
+      if ((imm_rc != SA_AIS_OK) || (oi_rc != SA_AIS_OK))
+        LOG_WA("%s: imm_rc: %s, oi_rc: %s", __FUNCTION__,
+            saf_error(imm_rc), saf_error(oi_rc));
       if ((imm_rc == SA_AIS_ERR_TRY_AGAIN) ||
           (imm_rc == SA_AIS_OK && oi_rc == SA_AIS_ERR_TRY_AGAIN)) {
         base::Sleep(base::MillisToTimespec(2000));
@@ -933,7 +936,8 @@ bool SmfAdminStateHandler::nodeGroupAdminOperation(
       } else if (imm_rc == SA_AIS_ERR_TIMEOUT) {
         // Retry
         continue;
-      } else if (imm_rc == SA_AIS_ERR_NO_OP) {
+      } else if ((imm_rc == SA_AIS_ERR_NO_OP) ||
+                (oi_rc == SA_AIS_ERR_NO_OP)) {
         // If an admin operation is already performed SA_AIS_ERR_NO_OP
         // is returned. Treat this as OK, just log it and return
         // operation success
--
2.25.1



_______________________________________________
Opensaf-devel mailing list
Opensaf-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-devel
