Re: [devel] Review request imm: update PR documentation regarding unresponsive file system [#3024]

2019-05-08 Thread Vu Minh Nguyen
Hi all,

 

Any comments on this document update? I will push it tomorrow if there are
no comments. Thanks.

 

Regards, Vu

 

From: Vu Minh Nguyen  
Sent: Monday, April 22, 2019 10:46 AM
To: 'hans.nordeb...@ericsson.com' ;
'lennart.l...@ericsson.com' ;
'gary@dektech.com.au' 
Cc: 'opensaf-devel@lists.sourceforge.net'

Subject: [devel] Review request imm: update PR documentation regarding
unresponsive file system [#3024]

 

Hi all,

 

This is an update to the IMM PR documentation covering the enhancement in
ticket #3019 - imm: return try-again on write requests if FS is unresponsive.

 

To view all changes, open the doc with LibreOffice Writer, then click
Edit/Track Changes/Manage.

 

Please review it. Thanks.

 

Regards, Vu

 


___
Opensaf-devel mailing list
Opensaf-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-devel


Re: [devel] [PATCH 1/1] amfnd: don't attempt su failover if active controller is rebooting [#3035]

2019-05-08 Thread Nagendra Kumar
Hi Alex,

The patch looks good to me. Ack.

 

Thanks

-Nagendra, +91-9866424860

High Availability Solutions

(OpenSAF Support and Services)

  www.hasolutions.in

  cont...@hasolutions.in

Delaware, USA: +1 508-422-7725|Hyderabad, India: +91 798-992-5293 

 

 

From: Jones, Alex [mailto:ajo...@rbbn.com] 
Sent: 08 May 2019 01:17
To: hans.nordeb...@ericsson.com; nagen...@hasolutions.in; 
gary@dektech.com.au
Cc: opensaf-devel@lists.sourceforge.net; Jones, Alex
Subject: [PATCH 1/1] amfnd: don't attempt su failover if active controller is 
rebooting [#3035]

 

In the N+M model, CSI-remove responses can get lost if the active controller
reboots. In this case the SG will be stuck in an unstable state, and the
standby will never get assignments.

Scenario: we are the active controller and active for N+M, SU failover is set,
and failfast on termination failure is set for the nodes. If a component in
the SU crashes and another component fails during cleanup, the node does
failfast. It currently attempts SU failover in this case, but the CSI-remove
responses from the payload can get lost because we are rebooting. They
eventually show up on the new active, but we get message-id errors.

Set a flag when the active controller is about to reboot. If the flag is set,
then don't do SU failover. Let the new active take care of the failover.
---
src/amf/amfd/node.cc | 1 +
src/amf/amfd/node.h | 1 +
src/amf/amfd/sgproc.cc | 7 +++
src/amf/amfd/util.cc | 3 +++
4 files changed, 12 insertions(+)

diff --git a/src/amf/amfd/node.cc b/src/amf/amfd/node.cc
index 7fc764f22..b8d8a7d77 100644
--- a/src/amf/amfd/node.cc
+++ b/src/amf/amfd/node.cc
@@ -121,6 +121,7 @@ void AVD_AVND::initialize() {
clm_pend_inv = {};
clm_change_start_preceded = {};
recvr_fail_sw = {};
+ actv_ctrl_reboot_in_progress = {};
admin_ng = {};
}

diff --git a/src/amf/amfd/node.h b/src/amf/amfd/node.h
index ecee5c591..dbe48dc43 100644
--- a/src/amf/amfd/node.h
+++ b/src/amf/amfd/node.h
@@ -140,6 +140,7 @@ class AVD_AVND {
CLM completed cb. */
bool recvr_fail_sw; /* to indicate there was node reboot because of node
failover/switchover.*/
+ bool actv_ctrl_reboot_in_progress;
AVD_AMF_NG *admin_ng; /* points to the nodegroup on which admin operation is
going on.*/
uint16_t node_up_msg_count; /* to count of node_up msg that director had
diff --git a/src/amf/amfd/sgproc.cc b/src/amf/amfd/sgproc.cc
index 1537acac3..7c8d9a558 100644
--- a/src/amf/amfd/sgproc.cc
+++ b/src/amf/amfd/sgproc.cc
@@ -478,6 +478,13 @@ static uint32_t sg_su_failover_func(AVD_SU *su) {
goto done;
}

+ if (su->su_on_node->actv_ctrl_reboot_in_progress) {
+ TRACE("'%s' is already going down, so not doing SU failover",
+ su->name.c_str());
+ rc = NCSCC_RC_SUCCESS;
+ goto done;
+ }
+
su->set_oper_state(SA_AMF_OPERATIONAL_DISABLED);
su->set_readiness_state(SA_AMF_READINESS_OUT_OF_SERVICE);
if (su->saAmfSUAdminState == SA_AMF_ADMIN_LOCKED)
diff --git a/src/amf/amfd/util.cc b/src/amf/amfd/util.cc
index 14a4e0485..0dc3e99e3 100644
--- a/src/amf/amfd/util.cc
+++ b/src/amf/amfd/util.cc
@@ -1802,6 +1802,9 @@ void avd_d2n_reboot_snd(AVD_AVND *node) {
if (avd_d2n_msg_snd(avd_cb, node, d2n_msg) != NCSCC_RC_SUCCESS) {
LOG_ER("%s: snd to %x failed", __FUNCTION__, node->node_info.nodeId);
d2n_msg_free(d2n_msg);
+ } else if (node->node_info.nodeId == avd_cb->node_id_avd) {
+ TRACE("rebooting active amf director which is ourself");
+ node->actv_ctrl_reboot_in_progress = true;
}
}

-- 
2.17.2
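
For readers following the thread, here is a minimal standalone sketch of the
guard the commit message describes. It is not the OpenSAF code: Node, Su,
on_reboot_sent() and su_failover() are hypothetical stand-ins for AVD_AVND,
AVD_SU, avd_d2n_reboot_snd() and sg_su_failover_func(), and the bool return
value is an assumption made for illustration (the patch itself returns
NCSCC_RC_SUCCESS when it skips the failover).

```cpp
#include <cstdint>
#include <string>

// Hypothetical stand-in for AVD_AVND; only the fields the sketch needs.
struct Node {
  uint32_t node_id = 0;
  bool actv_ctrl_reboot_in_progress = false;
};

// Hypothetical stand-in for AVD_SU.
struct Su {
  std::string name;
  Node *su_on_node = nullptr;
  bool failover_done = false;
};

// Mirrors the branch added to avd_d2n_reboot_snd(): set the flag only when
// the reboot message was sent successfully and the target is the local
// (active) director.
inline void on_reboot_sent(Node &node, uint32_t local_node_id, bool send_ok) {
  if (send_ok && node.node_id == local_node_id) {
    node.actv_ctrl_reboot_in_progress = true;
  }
}

// Mirrors the guard added to sg_su_failover_func(): skip SU failover while
// the hosting node is the active controller and is already rebooting.
// Returns true if the failover was performed, false if it was skipped.
inline bool su_failover(Su &su) {
  if (su.su_on_node != nullptr &&
      su.su_on_node->actv_ctrl_reboot_in_progress) {
    return false;  // leave the failover to the new active controller
  }
  su.failover_done = true;
  return true;
}
```

In this sketch a reboot sent to any other node leaves the flag clear, so SU
failover for SUs on those nodes proceeds as before.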



