Here is the latest version of the AMF patch for ticket [#79]. I have now made the policy for the minimum number of system controller nodes configurable (in amfd.conf; the IMM API is a bit too heavy-weight for me to use this late in the review process, and I leave its implementation as an exercise for anyone interested).

Another big change is that the patch is now re-based so that it can be applied on the default branch after ticket [#1620] was pushed. Apart from just solving merge conflicts, the patch also contains a few other modifications needed for ticket [#79] to work properly together with ticket [#1620].

Finally, I have put back the code for setting the sync state to "out of sync" after losing contact with the standby. After some digging in the archives, it appears the reason for this change was to make sure an si-swap cannot be performed until the (new) standby is in sync.
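
To illustrate the intent of the sync-state change, here is a minimal C++ sketch. The names (`DirectorState`, `on_node_down`, `on_standby_synced`, `si_swap_allowed`) are hypothetical stand-ins and not the actual amfd code. It only shows the idea that losing the standby marks the state "out of sync", and that a swap stays blocked until a (new) standby reports in sync.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, simplified stand-ins for the amfd state; not the real types.
enum class StandbySync { kInSync, kOutOfSync };

struct DirectorState {
  uint32_t standby_node_id = 0;  // 0 means "no standby known"
  StandbySync standby_sync = StandbySync::kOutOfSync;
};

// Called on the active director when contact with a node is lost.
void on_node_down(DirectorState& state, uint32_t node_id) {
  if (state.standby_node_id == node_id) {
    // The standby is gone, so it cannot be considered in sync any longer.
    state.standby_sync = StandbySync::kOutOfSync;
    state.standby_node_id = 0;
  }
}

// Called when a (new) standby has finished synchronizing with the active.
void on_standby_synced(DirectorState& state, uint32_t node_id) {
  assert(node_id != 0);  // 0 is reserved for "no standby" in this sketch
  state.standby_node_id = node_id;
  state.standby_sync = StandbySync::kInSync;
}

// An si-swap is only meaningful when there is an in-sync standby to swap with.
bool si_swap_allowed(const DirectorState& state) {
  return state.standby_node_id != 0 &&
         state.standby_sync == StandbySync::kInSync;
}
```

In this sketch, the failure of a node other than the standby leaves the sync state untouched, so only the loss of the standby itself blocks the swap.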

regards,
Anders Widell

On 04/01/2016 06:13 PM, Mathivanan Naickan Palanivelu wrote:
I think controlling this behavior at run time should be okay. (I didn't intend
to say that introducing a build option is the only way, and the aim was not to
save memory either :-))

So, to start with, let's introduce a global runtime environment variable(?)
or IMM attribute that would control whether these standalone configuration
changes are allowed to succeed.

We could think of adding restrictions to cluster-size later on perhaps.

Cheers,
Mathi.


-----Original Message-----
From: Anders Widell [mailto:[email protected]]
Sent: Friday, April 01, 2016 7:28 PM
To: Mathivanan Naickan Palanivelu; Praveen Malviya;
[email protected]; [email protected]; Nagendra Kumar
Cc: [email protected]
Subject: Re: [devel] [PATCH 1 of 1] amf: Support AMF configurations
containing more than two OpenSAF 2N SUs [#79]

Now you are talking about the possibility to save some memory on a 1-node
system by e.g. introducing #ifdefs in the code to disable code paths that deal
with redundancy. I can agree that in some very resource-constrained
embedded system this could make sense, but on the other hand, this is not
good from a testing perspective. You would have two separate builds that
need to be tested. From a developer's perspective, it is not good either.
#ifdefs will mess up the code, and the developers also need to build and test
the code twice. And what happens when you get the second feature, which is
also enabled and disabled using #ifdefs? You will get four possible builds. With
the third feature, eight possible builds.

On the other hand, a redundant system must work also on a single node.
Otherwise it wouldn't be redundant - i.e. if you /require/ two nodes for the
system to operate then you have a 2-node system which is less available than
just one single node. If any of the two nodes fails then the whole system fails.
So we must in any case test that we can run on a single node without
problems. Unless we have a very strong use-case for saving those extra bytes
of code I would not like to introduce a build-time option for disabling
redundancy.

We could introduce a run-time option for disallowing single-node
configurations. In fact, I would argue in such a case that it would be an option
to set the minimum allowed number of nodes in the system - i.e. the
minimum size you can scale down to. It doesn't necessarily have to be two;
maybe someone wishes to disallow any configuration with fewer than five
nodes.
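
As a rough C++ sketch of what such a run-time option could look like: the environment variable name `EXAMPLE_AMF_MIN_SC_NODES` and both function names below are made up for illustration and are not existing OpenSAF names. The idea is simply to parse a configured minimum, fall back to a default when the value is missing or invalid, and reject a node removal that would take the configuration below that minimum.

```cpp
#include <cassert>
#include <cctype>
#include <cerrno>
#include <cstdint>
#include <cstdlib>

// Hypothetical variable name, used only for this illustration.
constexpr const char* kMinNodesEnvVar = "EXAMPLE_AMF_MIN_SC_NODES";

// Parses the configured minimum number of system controller nodes.
// Returns default_min when the value is unset, malformed, zero or too large.
uint32_t min_sc_nodes(const char* value, uint32_t default_min) {
  if (value == nullptr ||
      !std::isdigit(static_cast<unsigned char>(*value))) {
    return default_min;
  }
  errno = 0;
  char* end = nullptr;
  unsigned long parsed = std::strtoul(value, &end, 10);
  if (errno != 0 || *end != '\0' || parsed == 0 || parsed > UINT32_MAX) {
    return default_min;
  }
  return static_cast<uint32_t>(parsed);
}

// A removal is allowed only if the configuration stays at or above the
// minimum after one node has been removed.
bool node_removal_allowed(uint32_t configured_sc_nodes, uint32_t minimum) {
  assert(minimum >= 1);  // a cluster with zero nodes makes no sense
  return configured_sc_nodes > minimum;
}
```

A caller could obtain the limit with `min_sc_nodes(std::getenv(kMinNodesEnvVar), 2)`, where the default of 2 is only an example value. With a minimum of 1, scaling a 2-node system back to a single node is accepted; with a minimum of 2 it is rejected.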

regards,
Anders Widell

On 04/01/2016 03:05 PM, Mathivanan Naickan Palanivelu wrote:
We could tie the symmetry to the 'deploy-standalone' option chosen.
i.e. if the 'deploy-standalone' feature is enabled, don't expect standbys
anywhere; that includes not expecting things like a shared file system to be
set up, and all code flows that expect a second/alternate node to take over, etc.!
I think this has to be thought through and not just considered as a
matter of 'allowing a configuration change' for a tool to work.
It is another matter that we should have discussed this when the asymmetry in
configuration was created!
If it's about reflecting the 'state', then there are sufficient states
to convey the status of that application.


-----Original Message-----
From: Anders Widell [mailto:[email protected]]
Sent: Friday, April 01, 2016 5:49 PM
To: Mathivanan Naickan Palanivelu; Praveen Malviya;
[email protected]; [email protected]; Nagendra
Kumar
Cc: [email protected]
Subject: Re: [devel] [PATCH 1 of 1] amf: Support AMF configurations
containing more than two OpenSAF 2N SUs [#79]

Today it is perfectly possible to configure a 1-node system without
using any build-time option. The problem today is that if you expand
a 1-node system to a 2-node system, then you cannot later shrink it
back into a 1-node system again. This asymmetry that you cannot go
back to where you came from makes little sense to me. Suppose that
the second node has been physically removed, so that you really have
only one single node. What harm would it do to update the OpenSAF
configuration to reflect this fact of reality?
regards,
Anders Widell

On 04/01/2016 12:54 PM, Mathivanan Naickan Palanivelu wrote:
Of course there could be applications (in different problem domains) that
could be configured to run in standalone mode or in HA mode.
Such applications could still be configured (in AMF) to run on just 1
OpenSAF node; nothing in OpenSAF stops standalone applications today!
But I don't think it is necessary to trickle down an application's standalone
configuration into OpenSAF's configuration too!
Somehow this topic has come up again, and I don't understand the
requirement behind why a user's tool expects HA middleware also to be
configured in standalone mode!
If we still want to satisfy such (ad hoc?) tools, then we could do
that by bringing in a 'standalone' deployment option in OpenSAF, i.e.
./configure --deploy-standalone. When this option is enabled, the
immxml-tools, AMF and any other OpenSAF service would not expect a mandatory
STANDBY to exist, thereby also allowing deletion or any other operation
of interest!
Cheers,
Mathi.

-----Original Message-----
From: praveen malviya
Sent: Friday, April 01, 2016 3:52 PM
To: Anders Widell; [email protected];
[email protected]; Nagendra Kumar
Cc: [email protected]
Subject: Re: [devel] [PATCH 1 of 1] amf: Support AMF configurations
containing more than two OpenSAF 2N SUs [#79]

Please see my comments inline below:

On 31-Mar-16 6:50 PM, Anders Widell wrote:
Yes I added the check in the admin op as you suggested. But I
don't fully agree that the same check should be done when removing
system controller nodes. With the introduction of this feature, we
are starting to move away from the concept of different node types
(controller / payload). Indeed, I removed the "type" member
variable from the node class in the latest patch.
[Praveen]
Yes, all nodes can be controller nodes, or rather all nodes are the
same except for their roles. But that does not change the architecture
of OpenSAF w.r.t. the redundancy model, i.e. the configuration is
still the 2N redundancy model.
i.e. The smallest OpenSAF cluster (without payloads) would
still be configured with either of the two options below:
(a) immxml-clustersize -s 2 -p0
During scale out, for spare addition at least one standby has to
exist before proceeding to configure the rest of the cluster nodes
as standbys
(OR)
(b) immxml-clustersize -s 3 -p0
Obviously the 3rd node and all other nodes added after the first two
nodes would act as spares.

To preserve backwards compatibility, I can agree to have this
check on systems that are configured with both system controller
nodes and payload nodes (as in your example with
immxml-clustersize). This would mean that AMF will reject removal
of any of the last two system controller nodes - IF the cluster
has payload nodes. What do you think
[Praveen]
That is the normal case anyhow today. Only difference we have to
make now is to allow deletion of spare nodes whether there are
payloads or not.
Thanks,
Praveen.

about this approach? I am not sure how easy this would be to
implement, but I can give it a try.

regards,
Anders Widell

On 03/31/2016 03:05 PM, praveen malviya wrote:
Hi,

In the diff patch, I have seen that admin operation on MW 2N SU
is not allowed when more than 2 SUs are configured which
translates to the fact that system is running with spare controllers.

A similar type of check is needed for deletion of controller
configuration from AMF. The check would not allow deletion of a
controller if only two of them remain. This is in line with the
current OpenSAF implementation, where we do not allow
any payload to be configured when the user tries to generate imm.xml
with a single controller and a given number of payloads, because such a
configuration does not provide redundancy to payloads.

Note: ./immxml-clustersize -s 1 -p 1
error: Two SC's is required for clusters with payloads. Exiting!


Thanks,
Praveen

On 31-Mar-16 2:05 AM, Anders Widell wrote:
Here is a patch that addresses the review comments from Hans and
Praveen. It should be applied on top of the AMF patch that was
sent out for review.

thanks,
Anders Widell

On 03/30/2016 04:35 PM, Anders Widell wrote:
Hi!

See my replies inline, marked [AndersW].

regards,
Anders Widell

On 03/17/2016 11:32 AM, praveen malviya wrote:
Hi Anders,

Please find some comments and queries inline with [Praveen]

Thanks,
Praveen


On 29-Feb-16 8:44 PM, Anders Widell wrote:
osaf/services/saf/amf/amfd/clm.cc | 21 +++++-
     osaf/services/saf/amf/amfd/include/amfd.h |   2 +
     osaf/services/saf/amf/amfd/include/cb.h   |   1 +
     osaf/services/saf/amf/amfd/include/role.h |   2 +
     osaf/services/saf/amf/amfd/main.cc        |  78
++++++---------------------
     osaf/services/saf/amf/amfd/ndfsm.cc       |   9 ++-
     osaf/services/saf/amf/amfd/node.cc        |   7 +--
     osaf/services/saf/amf/amfd/role.cc        |  86
++++++++++++++++++++++++++++++-
     osaf/services/saf/amf/amfd/sgproc.cc      |  11 +++-
     osaf/services/saf/amf/amfnd/clm.cc        |  27 ++++++--
     10 files changed, 160 insertions(+), 84 deletions(-)


Add support for configuring the system with more than two OpenSAF 2N SUs. In
particular, this means that all OpenSAF directors must
support starting up and running without (initially) getting
any assignment from AMF. Locking of
an OpenSAF 2N SU is currently not supported on a system
configured with more than two OpenSAF 2N SUs.
[Praveen] This patch does not contain any change for the
restriction on locking of an OpenSAF 2N SU as mentioned above.
[AndersW] No. This restriction will be documented. Do you think
we need to add checks for this case in the admin op as well?
diff --git a/osaf/services/saf/amf/amfd/clm.cc
b/osaf/services/saf/amf/amfd/clm.cc
--- a/osaf/services/saf/amf/amfd/clm.cc
+++ b/osaf/services/saf/amf/amfd/clm.cc
@@ -21,8 +21,7 @@
     #include <amfd.h>
     #include <clm.h>
     #include <node.h>
-
-static SaVersionT clmVersion = { 'B', 4, 1 };
+#include "osaf_time.h"

     static void clm_node_join_complete(AVD_AVND *node)
     {
@@ -392,9 +391,21 @@ SaAisErrorT avd_clm_init(void)
             SaAisErrorT error = SA_AIS_OK;

         TRACE_ENTER();
-    error = saClmInitialize_4(&avd_cb->clmHandle,
&clm_callbacks,
&clmVersion);
-    if (SA_AIS_OK != error) {
-        LOG_ER("Failed to initialize with CLM %u", error);
+    for (;;) {
+        SaVersionT Version = { 'B', 4, 1 };
+        error = saClmInitialize_4(&avd_cb->clmHandle,
&clm_callbacks, &Version);
+        if (error == SA_AIS_ERR_TRY_AGAIN ||
+            error == SA_AIS_ERR_TIMEOUT ||
+                    error == SA_AIS_ERR_UNAVAILABLE) {
+            if (error != SA_AIS_ERR_TRY_AGAIN) {
+                LOG_WA("saClmInitialize_4 returned %u",
+                       (unsigned) error);
+            }
+            osaf_nanosleep(&kHundredMilliseconds);
+            continue;
+        }
+        if (error == SA_AIS_OK) break;
+        LOG_ER("Failed to Initialize with CLM: %u", error);
             goto done;
         }
         error = saClmSelectionObjectGet(avd_cb->clmHandle,
&avd_cb->clm_sel_obj);
diff --git a/osaf/services/saf/amf/amfd/include/amfd.h
b/osaf/services/saf/amf/amfd/include/amfd.h
--- a/osaf/services/saf/amf/amfd/include/amfd.h
+++ b/osaf/services/saf/amf/amfd/include/amfd.h
@@ -33,6 +33,7 @@
     #ifndef AVD_H
     #define AVD_H

+#include <stdint.h>
     #include "logtrace.h"

     #include "amf.h"
@@ -65,5 +66,6 @@
     #include "ckpt_msg.h"
     #include "ckpt_edu.h"
     #include "ckpt_updt.h"
+#include "saAmf.h"

     #endif
diff --git a/osaf/services/saf/amf/amfd/include/cb.h
b/osaf/services/saf/amf/amfd/include/cb.h
--- a/osaf/services/saf/amf/amfd/include/cb.h
+++ b/osaf/services/saf/amf/amfd/include/cb.h
@@ -207,6 +207,7 @@ typedef struct cl_cb_tag {
         SaClmHandleT clmHandle;
         SaSelectionObjectT clm_sel_obj;

+    bool fully_initialized;
         bool swap_switch; /* true - In middle of role switch.
*/

         /** true when active services (IMM, LOG, NTF, etc.)
exist diff --git a/osaf/services/saf/amf/amfd/include/role.h
b/osaf/services/saf/amf/amfd/include/role.h
--- a/osaf/services/saf/amf/amfd/include/role.h
+++ b/osaf/services/saf/amf/amfd/include/role.h
@@ -34,6 +34,8 @@ extern uint32_t
amfd_switch_qsd_stdby(AV
     extern uint32_t amfd_switch_stdby_actv(AVD_CL_CB *cb);
     extern uint32_t amfd_switch_qsd_actv(AVD_CL_CB *cb);
     extern uint32_t amfd_switch_actv_qsd(AVD_CL_CB *cb);
+extern uint32_t initialize_for_assignment(cl_cb_tag* cb,
+                                          SaAmfHAStateT
+ha_state);

     #endif /* ROLE_H */

diff --git a/osaf/services/saf/amf/amfd/main.cc
b/osaf/services/saf/amf/amfd/main.cc
--- a/osaf/services/saf/amf/amfd/main.cc
+++ b/osaf/services/saf/amf/amfd/main.cc
@@ -56,6 +56,7 @@
     #include <sutcomptype.h>
     #include <sutype.h>
     #include <su.h>
+#include "osaf_utility.h"

     static const char* internal_version_id_  __attribute__
((used)) =
"@(#) $Id: " INTERNAL_VERSION_ID " $";

@@ -445,7 +446,8 @@ static void rda_cb(uint32_t notused, PCS

         if (((avd_cb->avail_state_avd == SA_AMF_HA_STANDBY) ||
              (avd_cb->avail_state_avd == SA_AMF_HA_QUIESCED))
&&
-        (cb_info->info.io_role == PCS_RDA_ACTIVE)) {
+        (cb_info->info.io_role == PCS_RDA_ACTIVE ||
+        cb_info->info.io_role == PCS_RDA_STANDBY)) {

             uint32_t rc;
             AVD_EVT *evt;
@@ -474,7 +476,6 @@ static uint32_t initialize(void)
     {
         AVD_CL_CB *cb = avd_cb;
         int rc = NCSCC_RC_FAILURE;
-    SaVersionT ntfVersion = { 'A', 0x01, 0x01 };
         SaAmfHAStateT role;
         char *val;

@@ -524,8 +525,13 @@ static uint32_t initialize(void)
         }

         cb->init_state = AVD_INIT_BGN;
+    cb->mbcsv_sel_obj = -1;
+    cb->imm_sel_obj = -1;
+    cb->clm_sel_obj = -1;
+    cb->fully_initialized = false;
         cb->swap_switch = false;
         cb->active_services_exist = true;
+    cb->mbcsv_sel_obj = -1;
[Praveen] Duplicate initialization, already done above.
[Anders W] Will remove.
         cb->stby_sync_state = AVD_STBY_IN_SYNC;
         cb->sync_required = true;

@@ -544,67 +550,20 @@ static uint32_t initialize(void)
         /* get the node id of the node on which the AVD is running.
*/
         cb->node_id_avd = m_NCS_GET_NODE_ID;

-    if (avd_mds_init(cb) != NCSCC_RC_SUCCESS) {
-        LOG_ER("avd_mds_init FAILED");
-        goto done;
-    }
-
-    if (NCSCC_RC_FAILURE == avsv_mbcsv_register(cb)) {
-        LOG_ER("avsv_mbcsv_register FAILED");
-        goto done;
-    }
-
-    if (avd_clm_init() != SA_AIS_OK) {
-        LOG_EM("avd_clm_init FAILED");
-        goto done;
-    }
-
-    if (avd_imm_init(cb) != SA_AIS_OK) {
-        LOG_ER("avd_imm_init FAILED");
-        goto done;
-    }
-
-    if ((rc = saNtfInitialize(&cb->ntfHandle, nullptr,
&ntfVersion)) != SA_AIS_OK) {
-        LOG_ER("saNtfInitialize Failed (%u)", rc);
-        rc = NCSCC_RC_FAILURE;
-        goto done;
-    }
-
         if ((rc = rda_get_role(&role)) != NCSCC_RC_SUCCESS) {
             LOG_ER("rda_get_role FAILED");
             goto done;
         }

-    cb->avail_state_avd = role;
-
-    if (NCSCC_RC_SUCCESS != avd_mds_set_vdest_role(cb,
role)) {
-        LOG_ER("avd_mds_set_vdest_role FAILED");
-        goto done;
-    }
-
-    if (NCSCC_RC_SUCCESS != avsv_set_ckpt_role(cb, role)) {
-        LOG_ER("avsv_set_ckpt_role FAILED");
-        goto done;
-    }
-
         if ((rc = rda_register_callback(0, rda_cb)) !=
NCSCC_RC_SUCCESS) {
             LOG_ER("rda_register_callback FAILED %u", rc);
             goto done;
         }

-    if (role == SA_AMF_HA_ACTIVE) {
-        rc = avd_active_role_initialization(cb, role);
-        if (rc != NCSCC_RC_SUCCESS) {
-            LOG_ER("avd_active_role_initialization FAILED");
-            goto done;
-        }
-    }
-    else {
-        rc = avd_standby_role_initialization(cb);
-        if (rc != NCSCC_RC_SUCCESS) {
-            LOG_ER("avd_standby_role_initialization FAILED");
-            goto done;
-        }
+    if ((rc = initialize_for_assignment(cb, role))
+        != NCSCC_RC_SUCCESS) {
+        LOG_ER("initialize_for_assignment FAILED %u",
+ (unsigned)
rc);
+        goto done;
         }

         rc = NCSCC_RC_SUCCESS; @@ -647,14 +606,13 @@ static
void main_loop(void)
         fds[FD_TERM].events = POLLIN;
         fds[FD_MBX].fd = mbx_fd.rmv_obj;
         fds[FD_MBX].events = POLLIN;
-    fds[FD_MBCSV].fd = cb->mbcsv_sel_obj;
-    fds[FD_MBCSV].events = POLLIN;
-    fds[FD_CLM].fd = cb->clm_sel_obj;
-    fds[FD_CLM].events = POLLIN;
-    fds[FD_IMM].fd = cb->imm_sel_obj; // IMM fd must be last
in
array
-    fds[FD_IMM].events = POLLIN;
-
         while (1) {
+        fds[FD_MBCSV].fd = cb->mbcsv_sel_obj;
+        fds[FD_MBCSV].events = POLLIN;
+        fds[FD_CLM].fd = cb->clm_sel_obj;
+        fds[FD_CLM].events = POLLIN;
+        fds[FD_IMM].fd = cb->imm_sel_obj; // IMM fd must be
+ last in
array
+        fds[FD_IMM].events = POLLIN;

             if (cb->immOiHandle != 0) {
                 fds[FD_IMM].fd = cb->imm_sel_obj; diff --git
a/osaf/services/saf/amf/amfd/ndfsm.cc
b/osaf/services/saf/amf/amfd/ndfsm.cc
--- a/osaf/services/saf/amf/amfd/ndfsm.cc
+++ b/osaf/services/saf/amf/amfd/ndfsm.cc
@@ -190,8 +190,9 @@ void
avd_nd_ncs_su_assigned(AVD_CL_CB
*c
         TRACE_ENTER();

         for (const auto& ncs_su : avnd->list_of_ncs_su) {
-        if ((ncs_su->list_of_susi == AVD_SU_SI_REL_NULL) ||
-            (ncs_su->list_of_susi->fsm !=
AVD_SU_SI_STATE_ASGND))
{
+        if ((ncs_su->sg_of_su->curr_assigned_sus() < 2) &&
+            ((ncs_su->list_of_susi == AVD_SU_SI_REL_NULL) ||
+            (ncs_su->list_of_susi->fsm !=
+ AVD_SU_SI_STATE_ASGND))) {
                 TRACE_LEAVE();
                 /* this is an unassigned SU so no need to
scan further return here. */
                 return;
@@ -328,6 +329,10 @@ void
avd_mds_avnd_down_evh(AVD_CL_CB
*cb
             if (avd_cb->avail_state_avd == SA_AMF_HA_ACTIVE) {
                 avd_node_failover(node);
+            // Update standby out of sync if standby sc goes down
+            if (avd_cb->node_id_avd_oth ==
node->node_info.nodeId) {
+                cb->stby_sync_state = AVD_STBY_OUT_OF_SYNC;
+            }
[Praveen] Deep down in a function call in avd_node_failover(),
AMF marks standby status IN_SYNC in some special case. AMF marks
out of sync status whenever it gets a sync request in chkop.cc,
so it is not needed here.
[AndersW] I don't remember why this was added. Will check and remove
it if it doesn't cause any problems. Isn't it a bit strange
though, to say that the standby is in sync when in fact it is down?
             } else {
                 /* Remove dynamic info for node but keep in
nodeid tree.
                  * Possibly used at the end of controller
failover to diff --git a/osaf/services/saf/amf/amfd/node.cc
b/osaf/services/saf/amf/amfd/node.cc
--- a/osaf/services/saf/amf/amfd/node.cc
+++ b/osaf/services/saf/amf/amfd/node.cc
@@ -120,7 +120,7 @@ void AVD_AVND::initialize() {
       pg_csi_list = {};
       pg_csi_list.order = NCS_DBLIST_ANY_ORDER;
       pg_csi_list.cmp_cookie = avsv_dblist_uns32_cmp;
-  type = AVSV_AVND_CARD_PAYLOAD;
+  type = AVSV_AVND_CARD_SYS_CON;
[Praveen] The type is initially kept as PAYLOAD; AMFD later changes the
node type of controller AMFNDs to SYS_CON after evaluating some
parameters.
Above, the default is set to SYS_CON, and there is no change in this
patch which sets a payload as payload. In this way the node type of
a payload will also become SYS_CON.
[AndersW] I think actually the "type" member variable ought to
be removed, since it is not needed. I can update the patch to
remove this variable completely.
       rcv_msg_id = {};
       snd_msg_id = {};
       cluster_list_node_next = {}; @@ -486,11 +486,6 @@
static SaAisErrorT node_ccb_completed_de
             return SA_AIS_ERR_BAD_OPERATION;
         }

-    if (node->type == AVSV_AVND_CARD_SYS_CON) {
-        report_ccb_validation_error(opdata, "Cannot remove
controller node");
-        return SA_AIS_ERR_BAD_OPERATION;
-    }
-
[Praveen] Based on some earlier discussion, I remember that
deletion of a system controller is restricted. So the check should be
modified to reject the operation if there are only two system
controllers in the system.
[AndersW] Now you can create a cluster with many system
controller nodes (all nodes could be controllers), so we can't
have this restriction any longer. I don't see any reason to
treat the two-node case in a special way. You can create a
cluster consisting of only one single node, and then scale it
out to a two-node cluster. Why shouldn't it be possible to
scale it back to one single node again?
         /* Check to see that the node is in admin locked
state before delete */
         if (node->saAmfNodeAdminState !=
SA_AMF_ADMIN_LOCKED_INSTANTIATION) {
             report_ccb_validation_error(opdata, "Node '%s' is
not locked instantiation", opdata->objectName.value); diff
--git a/osaf/services/saf/amf/amfd/role.cc
b/osaf/services/saf/amf/amfd/role.cc
--- a/osaf/services/saf/amf/amfd/role.cc
+++ b/osaf/services/saf/amf/amfd/role.cc
@@ -46,6 +46,7 @@
     #include <si_dep.h>
     #include "osaf_utility.h"
     #include "role.h"
+#include "nid_api.h"

     extern pthread_mutex_t imm_reinit_mutex;

@@ -73,7 +74,15 @@ void avd_role_change_evh(AVD_CL_CB
*cb,
         AVD_ROLE_CHG_CAUSE_T cause =
msg->msg_info.d2d_chg_role_req.cause;
         SaAmfHAStateT role =
msg->msg_info.d2d_chg_role_req.role;

-    TRACE_ENTER2("cause=%u, role=%u", cause, role);
+    TRACE_ENTER2("cause=%u, role=%u, current_role=%u",
+ cause,
role,
+        cb->avail_state_avd);
+
+    if ((status = initialize_for_assignment(cb, role))
+        != NCSCC_RC_SUCCESS) {
+        LOG_ER("initialize_for_assignment FAILED %u",
+            (unsigned) status);
+        _exit(EXIT_FAILURE);
+    }

         if (cb->avail_state_avd == role) {
             goto done;
@@ -128,6 +137,13 @@ void avd_role_change_evh(AVD_CL_CB
*cb,
         }

         if ((cause == AVD_FAIL_OVER) &&
+        (cb->avail_state_avd == SA_AMF_HA_QUIESCED) && (role
+ ==
SA_AMF_HA_STANDBY)) {
+        /* Fail-over Quiesced to Active */
+        status = NCSCC_RC_SUCCESS;
+        goto done;
+    }
[Praveen] A quiesced controller normally goes from quiesced to
standby only in switchover and not in failover.
So the comment must be /*Fail-over Quiesced to standby (spare
controller role change)*/
[AndersW] Will update the comment.
+
+    if ((cause == AVD_FAIL_OVER) &&
             (cb->avail_state_avd == SA_AMF_HA_QUIESCED) &&
(role ==
SA_AMF_HA_ACTIVE)) {
             /* Fail-over Quiesced to Active */
             status = avd_role_failover_qsd_actv(cb, role); @@
-155,7 +171,73 @@ void avd_role_change_evh(AVD_CL_CB *cb,
         return;
     }

-
/**********************************************************
******************\
+uint32_t initialize_for_assignment(cl_cb_tag* cb,
+SaAmfHAStateT
ha_state)
+{
+    TRACE_ENTER2("ha_state = %d", static_cast<int>(ha_state));
+    SaVersionT ntfVersion = {'A', 0x01, 0x01};
+    uint32_t rc = NCSCC_RC_SUCCESS;
+    SaAisErrorT error;
+    if (cb->fully_initialized) goto done;
+    cb->avail_state_avd = ha_state;
+    if (ha_state == SA_AMF_HA_QUIESCED) {
+        if ((rc = nid_notify(const_cast<char*>("AMFD"),
+                     NCSCC_RC_SUCCESS, nullptr)) !=
NCSCC_RC_SUCCESS) {
+            LOG_ER("nid_notify failed");
+        }
+        goto done;
+    }
+    if ((rc = avd_mds_init(cb)) != NCSCC_RC_SUCCESS) {
+        LOG_ER("avd_mds_init FAILED");
+        goto done;
+    }
+    if ((rc = avsv_mbcsv_register(cb)) != NCSCC_RC_SUCCESS) {
+        LOG_ER("avsv_mbcsv_register FAILED");
+        goto done;
+    }
+    if (avd_clm_init() != SA_AIS_OK) {
+        LOG_EM("avd_clm_init FAILED");
+        rc = NCSCC_RC_FAILURE;
+        goto done;
+    }
+    if (avd_imm_init(cb) != SA_AIS_OK) {
+        LOG_ER("avd_imm_init FAILED");
+        rc = NCSCC_RC_FAILURE;
+        goto done;
+    }
+    if ((error = saNtfInitialize(&cb->ntfHandle, nullptr,
&ntfVersion)) !=
+        SA_AIS_OK) {
+        LOG_ER("saNtfInitialize Failed (%u)", error);
+        rc = NCSCC_RC_FAILURE;
+        goto done;
+    }
+    if ((rc = avd_mds_set_vdest_role(cb, ha_state)) !=
NCSCC_RC_SUCCESS) {
+        LOG_ER("avd_mds_set_vdest_role FAILED");
+        goto done;
+    }
+    if ((rc = avsv_set_ckpt_role(cb, ha_state)) !=
NCSCC_RC_SUCCESS) {
+        LOG_ER("avsv_set_ckpt_role FAILED");
+        goto done;
+    }
+    if (ha_state == SA_AMF_HA_ACTIVE) {
+        rc = avd_active_role_initialization(cb, ha_state);
+        if (rc != NCSCC_RC_SUCCESS) {
+            LOG_ER("avd_active_role_initialization FAILED");
+            goto done;
+        }
+    } else if (ha_state == SA_AMF_HA_STANDBY) {
+        rc = avd_standby_role_initialization(cb);
+        if (rc != NCSCC_RC_SUCCESS) {
+            LOG_ER("avd_standby_role_initialization FAILED");
+            goto done;
+        }
+    }
+    cb->fully_initialized = true;
+done:
+    TRACE_LEAVE2("rc = %u", rc);
+     return rc;
+}
+

+/*********************************************************
*******************
\
      * Function: avd_init_role_set
      *
      * Purpose:  AVSV function to handle AVD's initial role setting.
diff --git a/osaf/services/saf/amf/amfd/sgproc.cc
b/osaf/services/saf/amf/amfd/sgproc.cc
--- a/osaf/services/saf/amf/amfd/sgproc.cc
+++ b/osaf/services/saf/amf/amfd/sgproc.cc
@@ -1390,7 +1390,16 @@ void
avd_su_si_assign_evh(AVD_CL_CB
*cb,
                         /* Since a NCS SU has been assigned
trigger the node FSM. */
                         /* For (ncs_spec == SA_TRUE), su will
not be external, so su
                                will have node attached. */
-                    avd_nd_ncs_su_assigned(cb,
susi->su->su_on_node);
+                    for (AmfDb<uint32_t,
+ AVD_AVND>::const_iterator
it = node_id_db->begin();
+                        it != node_id_db->end(); it++) {
+                        AVD_AVND *node =
const_cast<AVD_AVND*>((*it).second);
+
+                        if (node->node_state ==
AVD_AVND_STATE_NCS_INIT && node->adest != 0) {
+                            avd_nd_ncs_su_assigned(cb, node);
+                        } else {
+                            TRACE("Node_state: %u adest: %"
+ PRIx64
" node not ready for assignments", node->node_state, node-
adest);
+                        }
+                    }
[Praveen] Could not understand this change? Since spare
controllers are also payloads, in which case adest can be 0.
Is this for the headless case, where comp and su assignment
information comes before the node up message?
[AndersW] This loop is needed to ensure set_leds is performed.
It is sufficient that we have an active and a standby
assignment for the OpenSAF 2N SU, so once we have that we need
to loop over all nodes and perform set_leds if possible.

I think the check that adest is non-zero was added by Hans, to
fix some problem that might be related to headless if I
remember correctly. @Hans, do you remember why this check was
needed?
                     }
                 }
             } else {
diff --git a/osaf/services/saf/amf/amfnd/clm.cc
b/osaf/services/saf/amf/amfnd/clm.cc
--- a/osaf/services/saf/amf/amfnd/clm.cc
+++ b/osaf/services/saf/amf/amfnd/clm.cc
@@ -37,6 +37,7 @@
     #include "avnd.h"
     #include "mds_pvt.h"
     #include "nid_api.h"
+#include "osaf_time.h"

     static void clm_node_left(SaClmNodeIdT node_id)
     {
@@ -166,7 +167,6 @@ uint32_t
avnd_evt_avd_node_up_evh(AVND_C
         info = &evt->info.avd->msg_info.d2n_node_up;

         /*** update this node with the supplied parameters ***/
-    cb->type = info->node_type;
         cb->su_failover_max = info->su_failover_max;
         cb->su_failover_prob = info->su_failover_prob;

@@ -249,8 +249,6 @@ done:
         return;
     }

-static SaVersionT Version = { 'B', 4, 1 };
-
     static const SaClmCallbacksT_4 callbacks = {
             0,
             /*.saClmClusterTrackCallback =*/ clm_track_cb @@
-263,11 +261,24 @@ SaAisErrorT avnd_clm_init(void)

         TRACE_ENTER();
         avnd_cb->first_time_up = true;
[Praveen] Did not get the reason for its removal?
Not being updated any where else in the patch.
[AndersW] Nothing is removed here. The CLM initialization loop
has been enhanced to handle more error codes (TIMEOUT &
UNAVAILABLE).
-    error = saClmInitialize_4(&avnd_cb->clmHandle, &callbacks,
&Version);
-        if (SA_AIS_OK != error) {
-                LOG_ER("Failed to Initialize with CLM: %u", error);
-                goto done;
-        }
+    for (;;) {
+        SaVersionT Version = { 'B', 4, 1 };
+        error = saClmInitialize_4(&avnd_cb->clmHandle,
&callbacks,
+                      &Version);
+        if (error == SA_AIS_ERR_TRY_AGAIN ||
+            error == SA_AIS_ERR_TIMEOUT ||
+                    error == SA_AIS_ERR_UNAVAILABLE) {
+            if (error != SA_AIS_ERR_TRY_AGAIN) {
+                LOG_WA("saClmInitialize_4 returned %u",
+                       (unsigned) error);
+            }
+            osaf_nanosleep(&kHundredMilliseconds);
+            continue;
+        }
+        if (error == SA_AIS_OK) break;
+        LOG_ER("Failed to Initialize with CLM: %u", error);
+        goto done;
+    }
         error = saClmSelectionObjectGet(avnd_cb->clmHandle,
&avnd_cb->clm_sel_obj);
             if (SA_AIS_OK != error) {
                     LOG_ER("Failed to get CLM selectionObject:
%u", error);

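The CLM initialization loop in the amfd and amfnd hunks above retries on TRY_AGAIN, TIMEOUT and UNAVAILABLE and gives up on any other error. A stand-alone C++ sketch of that pattern is shown here. `InitResult` and `initialize_with_retry` are hypothetical names, the callable stands in for the real `saClmInitialize_4()` call, and the enum values are not the SA Forum error codes.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <thread>

// Local stand-in for the relevant outcomes; not the SaAisErrorT values.
enum class InitResult { kOk, kTryAgain, kTimeout, kUnavailable, kOtherError };

// Calls try_init until it returns something other than a transient error.
// The caller decides how to handle kOk versus kOtherError.
InitResult initialize_with_retry(const std::function<InitResult()>& try_init,
                                 std::chrono::milliseconds delay) {
  assert(static_cast<bool>(try_init));  // an empty callable cannot be retried
  for (;;) {
    InitResult result = try_init();
    if (result == InitResult::kTryAgain ||
        result == InitResult::kTimeout ||
        result == InitResult::kUnavailable) {
      // Transient condition: wait a little and try again.
      std::this_thread::sleep_for(delay);
      continue;
    }
    return result;
  }
}
```

Like the loop in the patch, this sketch retries without an upper bound, so it only returns once the service answers with success or with a non-transient error.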

# HG changeset patch
# Parent edf2dcd3884ed14760f42316c51850f52661a936
# Parent  d931c52adcfe6ec9d297f84a3dcc969189a64ec6
amf: Support AMF configurations containing more than two OpenSAF 2N SUs [#79]

Add support for configuring the system with more than two OpenSAF 2N SUs. In
particular, this means that all OpenSAF directors must support starting up and
running without (initially) getting any assignment from AMF. The directors on a
spare system controller (i.e. a controller that initially doesn't get an active
or standby assignment) will start up and register with AMF, but other than that
they will not do anything and just wait for a role change. When a role change
happens, they will continue with the rest of the start-up sequence and initialize
MDS so that they become visible on the network.

Locking of an OpenSAF 2N SU is currently not supported on a system configured
with more than two OpenSAF 2N SUs. This means that a controller node can start
up as a spare, but once it has received an active or standby assignment it
cannot go back to the spare role without a node reboot.
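
The deferred start-up described above can be sketched in C++ roughly as follows. This is a simplified model and not the code in role.cc: the `Director` struct, its fields and the `HaState` enum are hypothetical, and a single flag stands in for MDS initialization and the other service registrations. It follows the same shape as initialize_for_assignment() in the patch: do nothing if already initialized, record the role, return early for a spare (quiesced), and otherwise run the rest of the start-up sequence exactly once.

```cpp
#include <cassert>

// Hypothetical stand-in for SaAmfHAStateT; a spare starts out as quiesced.
enum class HaState { kActive, kStandby, kQuiesced };

struct Director {
  bool fully_initialized = false;
  bool visible_on_network = false;  // stands in for MDS initialization
  HaState role = HaState::kQuiesced;
  int init_count = 0;  // how many times the full start-up sequence ran
};

// Returns true on success. Safe to call on every role change.
bool initialize_for_assignment(Director& d, HaState role) {
  if (d.fully_initialized) return true;  // start-up already completed
  d.role = role;
  if (role == HaState::kQuiesced) {
    // Spare controller: registered, but otherwise idle until a role change.
    return true;
  }
  assert(!d.visible_on_network);  // must not have been initialized before
  d.visible_on_network = true;    // the rest of the start-up sequence
  ++d.init_count;
  d.fully_initialized = true;
  return true;
}
```

Calling it first with the quiesced state and later with standby or active runs the full sequence only on the second call, which matches the spare behavior described in the commit message.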

diff --git a/osaf/services/saf/amf/amfd/ckpt_dec.cc b/osaf/services/saf/amf/amfd/ckpt_dec.cc
--- a/osaf/services/saf/amf/amfd/ckpt_dec.cc
+++ b/osaf/services/saf/amf/amfd/ckpt_dec.cc
@@ -273,7 +273,8 @@ void decode_node_config(NCS_UBAID *ub,
 	osaf_decode_uint32(ub, reinterpret_cast<uint32_t*>(&avnd->saAmfNodeAdminState));
 	osaf_decode_uint32(ub, reinterpret_cast<uint32_t*>(&avnd->saAmfNodeOperState));
 	osaf_decode_uint32(ub, reinterpret_cast<uint32_t*>(&avnd->node_state));
-	osaf_decode_uint32(ub, reinterpret_cast<uint32_t*>(&avnd->type));
+	uint32_t node_type;  // still read from the checkpoint stream; value is ignored
+	osaf_decode_uint32(ub, &node_type);
 	osaf_decode_uint32(ub, &avnd->rcv_msg_id);
 	osaf_decode_uint32(ub, &avnd->snd_msg_id);	
 }
diff --git a/osaf/services/saf/amf/amfd/ckpt_enc.cc b/osaf/services/saf/amf/amfd/ckpt_enc.cc
--- a/osaf/services/saf/amf/amfd/ckpt_enc.cc
+++ b/osaf/services/saf/amf/amfd/ckpt_enc.cc
@@ -289,7 +289,7 @@ void encode_node_config(NCS_UBAID *ub, c
 	osaf_encode_uint32(ub, avnd->saAmfNodeAdminState);
 	osaf_encode_uint32(ub, avnd->saAmfNodeOperState);
 	osaf_encode_uint32(ub, avnd->node_state);
-	osaf_encode_uint32(ub, avnd->type);
+	osaf_encode_uint32(ub, AVSV_AVND_CARD_SYS_CON);
 	osaf_encode_uint32(ub, avnd->rcv_msg_id);
 	osaf_encode_uint32(ub, avnd->snd_msg_id);
 }
diff --git a/osaf/services/saf/amf/amfd/ckpt_updt.cc b/osaf/services/saf/amf/amfd/ckpt_updt.cc
--- a/osaf/services/saf/amf/amfd/ckpt_updt.cc
+++ b/osaf/services/saf/amf/amfd/ckpt_updt.cc
@@ -64,7 +64,6 @@ uint32_t avd_ckpt_node(AVD_CL_CB *cb, AV
 	node->node_state = ckpt_node->node_state;
 	node->rcv_msg_id = ckpt_node->rcv_msg_id;
 	node->snd_msg_id = ckpt_node->snd_msg_id;
-	node->type = ckpt_node->type;
 	node->node_info.member = ckpt_node->node_info.member;
 	node->node_info.bootTimestamp = ckpt_node->node_info.bootTimestamp;
 	node->node_info.initialViewNumber = ckpt_node->node_info.initialViewNumber;
diff --git a/osaf/services/saf/amf/amfd/clm.cc b/osaf/services/saf/amf/amfd/clm.cc
--- a/osaf/services/saf/amf/amfd/clm.cc
+++ b/osaf/services/saf/amf/amfd/clm.cc
@@ -21,8 +21,7 @@
 #include <amfd.h>
 #include <clm.h>
 #include <node.h>
-
-static SaVersionT clmVersion = { 'B', 4, 1 };
+#include "osaf_time.h"
 
 static void clm_node_join_complete(AVD_AVND *node)
 {
@@ -392,9 +391,21 @@ SaAisErrorT avd_clm_init(void)
         SaAisErrorT error = SA_AIS_OK;
 
 	TRACE_ENTER();
-	error = saClmInitialize_4(&avd_cb->clmHandle, &clm_callbacks, &clmVersion);
-	if (SA_AIS_OK != error) {
-		LOG_ER("Failed to initialize with CLM %u", error);
+	for (;;) {
+		SaVersionT Version = { 'B', 4, 1 };
+		error = saClmInitialize_4(&avd_cb->clmHandle, &clm_callbacks, &Version);
+		if (error == SA_AIS_ERR_TRY_AGAIN ||
+		    error == SA_AIS_ERR_TIMEOUT ||
+                    error == SA_AIS_ERR_UNAVAILABLE) {
+			if (error != SA_AIS_ERR_TRY_AGAIN) {
+				LOG_WA("saClmInitialize_4 returned %u",
+				       (unsigned) error);
+			}
+			osaf_nanosleep(&kHundredMilliseconds);
+			continue;
+		}
+		if (error == SA_AIS_OK) break;
+		LOG_ER("Failed to Initialize with CLM: %u", error);
 		goto done;
 	}
 	error = saClmSelectionObjectGet(avd_cb->clmHandle, &avd_cb->clm_sel_obj);
diff --git a/osaf/services/saf/amf/amfd/include/amfd.h b/osaf/services/saf/amf/amfd/include/amfd.h
--- a/osaf/services/saf/amf/amfd/include/amfd.h
+++ b/osaf/services/saf/amf/amfd/include/amfd.h
@@ -33,6 +33,7 @@
 #ifndef AVD_H
 #define AVD_H
 
+#include <cstdint>
 #include "logtrace.h"
 
 #include "amf.h"
@@ -65,5 +66,6 @@
 #include "ckpt_msg.h"
 #include "ckpt_edu.h"
 #include "ckpt_updt.h"
+#include "saAmf.h"
 
 #endif
diff --git a/osaf/services/saf/amf/amfd/include/cb.h b/osaf/services/saf/amf/amfd/include/cb.h
--- a/osaf/services/saf/amf/amfd/include/cb.h
+++ b/osaf/services/saf/amf/amfd/include/cb.h
@@ -33,6 +33,7 @@
 #ifndef AVD_CB_H
 #define AVD_CB_H
 
+#include <cstdint>
 #include <saImmOi.h>
 #include <saClm.h>
 
@@ -185,6 +186,7 @@ typedef struct cl_cb_tag {
 	AVD_TMR node_sync_tmr;	/* The timer for reception of all node_up from all PLs. */
 	AVD_TMR heartbeat_tmr;	/* The timer for sending heart beats to nd. */
 	SaTimeT heartbeat_tmr_period;
+	uint32_t minimum_cluster_size;
 
 	uint32_t nodes_exit_cnt;	/* The counter to identifies the number
 				   of nodes that have exited the membership
@@ -208,6 +210,7 @@ typedef struct cl_cb_tag {
 	SaClmHandleT clmHandle;
 	SaSelectionObjectT clm_sel_obj;
 
+	bool fully_initialized;
 	bool swap_switch; /* true - In middle of role switch. */
 
 	/** true when active services (IMM, LOG, NTF, etc.) exist
diff --git a/osaf/services/saf/amf/amfd/include/node.h b/osaf/services/saf/amf/amfd/include/node.h
--- a/osaf/services/saf/amf/amfd/include/node.h
+++ b/osaf/services/saf/amf/amfd/include/node.h
@@ -125,11 +125,6 @@ class AVD_AVND {
   NCS_DB_LINK_LIST pg_csi_list;	/* list of csis for which pg is tracked 
 					 * from this node */
 
-  AVSV_AVND_CARD type;	/* field that describes if this node is sytem
-                         * controller or not.
-                         * Checkpointing - Sent as a one time update.
-                         */
-
   uint32_t rcv_msg_id;	/* The receive message id counter 
                          * Checkpointing - Sent independent update 
                          */
diff --git a/osaf/services/saf/amf/amfd/include/role.h b/osaf/services/saf/amf/amfd/include/role.h
--- a/osaf/services/saf/amf/amfd/include/role.h
+++ b/osaf/services/saf/amf/amfd/include/role.h
@@ -34,6 +34,8 @@ extern uint32_t amfd_switch_qsd_stdby(AV
 extern uint32_t amfd_switch_stdby_actv(AVD_CL_CB *cb);
 extern uint32_t amfd_switch_qsd_actv(AVD_CL_CB *cb);
 extern uint32_t amfd_switch_actv_qsd(AVD_CL_CB *cb);
+extern uint32_t initialize_for_assignment(cl_cb_tag* cb,
+                                          SaAmfHAStateT ha_state);
 
 #endif /* ROLE_H */
 
diff --git a/osaf/services/saf/amf/amfd/main.cc b/osaf/services/saf/amf/amfd/main.cc
--- a/osaf/services/saf/amf/amfd/main.cc
+++ b/osaf/services/saf/amf/amfd/main.cc
@@ -56,6 +56,8 @@
 #include <sutcomptype.h>
 #include <sutype.h>
 #include <su.h>
+#include "osaf_utility.h"
+#include "base/getenv.h"
 
 static const char* internal_version_id_  __attribute__ ((used)) = "@(#) $Id: " INTERNAL_VERSION_ID " $";
 
@@ -455,7 +457,8 @@ static void rda_cb(uint32_t notused, PCS
 
 	if (((avd_cb->avail_state_avd == SA_AMF_HA_STANDBY) ||
 	     (avd_cb->avail_state_avd == SA_AMF_HA_QUIESCED)) &&
-	    (cb_info->info.io_role == PCS_RDA_ACTIVE)) {
+	    (cb_info->info.io_role == PCS_RDA_ACTIVE ||
+		cb_info->info.io_role == PCS_RDA_STANDBY)) {
 
 		uint32_t rc;
 		AVD_EVT *evt;
@@ -484,7 +487,6 @@ static uint32_t initialize(void)
 {
 	AVD_CL_CB *cb = avd_cb;
 	int rc = NCSCC_RC_FAILURE;
-	SaVersionT ntfVersion = { 'A', 0x01, 0x01 };
 	SaAmfHAStateT role;
 	char *val;
 
@@ -534,6 +536,10 @@ static uint32_t initialize(void)
 	}
 
 	cb->init_state = AVD_INIT_BGN;
+	cb->mbcsv_sel_obj = -1;
+	cb->imm_sel_obj = -1;
+	cb->clm_sel_obj = -1;
+	cb->fully_initialized = false;
 	cb->swap_switch = false;
 	cb->active_services_exist = true;
 	cb->stby_sync_state = AVD_STBY_IN_SYNC;
@@ -553,78 +559,27 @@ static uint32_t initialize(void)
 			cb->heartbeat_tmr_period = AVSV_DEF_HB_PERIOD;
 		}
 	}
-	node_list_db = new AmfDb<uint32_t, AVD_FAIL_OVER_NODE>;
+	cb->minimum_cluster_size =
+		base::GetEnv("AVSV_MINIMUM_CLUSTER_SIZE", uint32_t{2});
+
+	node_list_db = new AmfDb<uint32_t, AVD_FAIL_OVER_NODE>;
 	/* get the node id of the node on which the AVD is running. */
 	cb->node_id_avd = m_NCS_GET_NODE_ID;
 
-	if (avd_mds_init(cb) != NCSCC_RC_SUCCESS) {
-		LOG_ER("avd_mds_init FAILED");
-		goto done;
-	}
-
-	if (NCSCC_RC_FAILURE == avsv_mbcsv_register(cb)) {
-		LOG_ER("avsv_mbcsv_register FAILED");
-		goto done;
-	}
-
-	if (avd_clm_init() != SA_AIS_OK) {
-		LOG_EM("avd_clm_init FAILED");
-		goto done;
-	}
-
-	if (avd_imm_init(cb) != SA_AIS_OK) {
-		LOG_ER("avd_imm_init FAILED");
-		goto done;
-	}
-
-	if ((rc = saNtfInitialize(&cb->ntfHandle, nullptr, &ntfVersion)) != SA_AIS_OK) {
-		LOG_ER("saNtfInitialize Failed (%u)", rc);
-		rc = NCSCC_RC_FAILURE;
-		goto done;
-	}
-
 	if ((rc = rda_get_role(&role)) != NCSCC_RC_SUCCESS) {
 		LOG_ER("rda_get_role FAILED");
 		goto done;
 	}
 
-	cb->avail_state_avd = role;
-
-	if (NCSCC_RC_SUCCESS != avd_mds_set_vdest_role(cb, role)) {
-		LOG_ER("avd_mds_set_vdest_role FAILED");
-		goto done;
-	}
-
-	if (NCSCC_RC_SUCCESS != avsv_set_ckpt_role(cb, role)) {
-		LOG_ER("avsv_set_ckpt_role FAILED");
-		goto done;
-	}
-
 	if ((rc = rda_register_callback(0, rda_cb)) != NCSCC_RC_SUCCESS) {
 		LOG_ER("rda_register_callback FAILED %u", rc);
 		goto done;
 	}
 
-	if (role == SA_AMF_HA_ACTIVE) {
-		rc = avd_active_role_initialization(cb, role);
-		if (rc != NCSCC_RC_SUCCESS) {
-			LOG_ER("avd_active_role_initialization FAILED");
-			goto done;
-		}
-
-		/* in a normal cluster start there will be no assignments object found so
-		 * nothing happens. Used to cleanup cached RTAs after SCs recover after
-		 * being headless.
-		 */
-		avd_susi_cleanup();
-		avd_compcsi_cleanup();
-	}
-	else {
-		rc = avd_standby_role_initialization(cb);
-		if (rc != NCSCC_RC_SUCCESS) {
-			LOG_ER("avd_standby_role_initialization FAILED");
-			goto done;
-		}
+	if ((rc = initialize_for_assignment(cb, role))
+		!= NCSCC_RC_SUCCESS) {
+		LOG_ER("initialize_for_assignment FAILED %u", (unsigned) rc);
+		goto done;
 	}
 
 	rc = NCSCC_RC_SUCCESS;
@@ -667,14 +622,13 @@ static void main_loop(void)
 	fds[FD_TERM].events = POLLIN;
 	fds[FD_MBX].fd = mbx_fd.rmv_obj;
 	fds[FD_MBX].events = POLLIN;
-	fds[FD_MBCSV].fd = cb->mbcsv_sel_obj;
-	fds[FD_MBCSV].events = POLLIN;
-	fds[FD_CLM].fd = cb->clm_sel_obj;
-	fds[FD_CLM].events = POLLIN;
-	fds[FD_IMM].fd = cb->imm_sel_obj; // IMM fd must be last in array
-	fds[FD_IMM].events = POLLIN;
-
 	while (1) {
+		fds[FD_MBCSV].fd = cb->mbcsv_sel_obj;
+		fds[FD_MBCSV].events = POLLIN;
+		fds[FD_CLM].fd = cb->clm_sel_obj;
+		fds[FD_CLM].events = POLLIN;
+		fds[FD_IMM].fd = cb->imm_sel_obj; // IMM fd must be last in array
+		fds[FD_IMM].events = POLLIN;
 		
 		if (cb->immOiHandle != 0) {
 			fds[FD_IMM].fd = cb->imm_sel_obj;
diff --git a/osaf/services/saf/amf/amfd/ndfsm.cc b/osaf/services/saf/amf/amfd/ndfsm.cc
--- a/osaf/services/saf/amf/amfd/ndfsm.cc
+++ b/osaf/services/saf/amf/amfd/ndfsm.cc
@@ -54,7 +54,7 @@ void avd_process_state_info_queue(AVD_CL
 
 	TRACE_ENTER();
 
-	TRACE("queue_size before processing: %lu", queue_size);
+	TRACE("queue_size before processing: %lu", (unsigned long) queue_size);
 
 	// recover assignments from state info
 	for(i=0 ; i<queue_size ; i++) {
@@ -115,7 +115,7 @@ void avd_process_state_info_queue(AVD_CL
 			}
 		}
 	}
-	TRACE("queue_size after processing: %lu", cb->evt_queue.size());
+	TRACE("queue_size after processing: %lu", (unsigned long) cb->evt_queue.size());
 	TRACE_LEAVE();
 }
 /*****************************************************************************
@@ -288,7 +288,7 @@ void avd_node_up_evh(AVD_CL_CB *cb, AVD_
 		uint32_t rc_node_up;
 		avnd->node_up_msg_count++;
 		rc_node_up = avd_count_node_up(cb);
-		if (rc_node_up == sync_nd_size) {
+		if (rc_node_up == sync_nd_size-1) {
 			if (cb->node_sync_tmr.is_active) {
 				avd_stop_tmr(cb, &cb->node_sync_tmr);
 				TRACE("stop NodeSync timer");
@@ -349,11 +349,6 @@ void avd_node_up_evh(AVD_CL_CB *cb, AVD_
 		goto done;
 	}
 
-	/* Identify if this AVND is running on the same node as AVD */
-	if ((avnd->node_info.nodeId == cb->node_id_avd) || (avnd->node_info.nodeId == cb->node_id_avd_other)) {
-		avnd->type = AVSV_AVND_CARD_SYS_CON;
-	}
-
 	/* send the node up message to the node. */
 	if (avd_snd_node_up_msg(cb, avnd, avnd->rcv_msg_id) != NCSCC_RC_SUCCESS) {
 		/* log error that the director is not able to send the message */
@@ -401,12 +396,21 @@ void avd_node_up_evh(AVD_CL_CB *cb, AVD_
 			// this node is already up
 			avd_node_state_set(avnd, AVD_AVND_STATE_PRESENT);
 			avd_node_oper_state_set(avnd, SA_AMF_OPERATIONAL_ENABLED);
-
+			
 			// Update readiness state of all SUs which are waiting for node
 			// oper state
-			for (const auto& su :avnd->list_of_ncs_su) {
+			for (const auto& su : avnd->list_of_ncs_su) {
 				su->set_readiness_state(SA_AMF_READINESS_IN_SERVICE);
+				if (su->sg_of_su->sg_redundancy_model == SA_AMF_2N_REDUNDANCY_MODEL) {
+					if (su->sg_of_su->su_insvc(cb, su) == NCSCC_RC_FAILURE) {
+						LOG_ER("%s:%d %s", __FUNCTION__, __LINE__, su->name.value);
+						su->set_readiness_state(SA_AMF_READINESS_OUT_OF_SERVICE);
+						goto done;
+					}
+					avd_node_state_set(avnd, AVD_AVND_STATE_NCS_INIT);
+				}
 			}
+			
 			for (const auto& su :avnd->list_of_su) {
 				if (su->is_in_service())
 					su->set_readiness_state(SA_AMF_READINESS_IN_SERVICE);
@@ -467,10 +471,11 @@ done:
  * Function: avd_nd_ncs_su_assigned
  *
  * Purpose:  This function is the handler for node director event when a
- *           NCS SU is assigned with a SI. It verifies that all the
- *           NCS SUs are assigned and calls the SG module instantiation
- *           function for each of the SUs on the node. It will also change the 
- *           node FSM state to present and call the AvD state machine.
+ *           NCS SU is assigned with a SI or when a spare NCS 2N SU is
+ *           instantiated. It verifies that all the NCS SUs are assigned
+ *           and all the spare SUs are instantiated, then calls the SG module
+ *           instantiation function for each of the SUs on the node. It will also
+ *           change the node FSM state to present and call the AvD state machine.
  *
  * Input: cb - the AVD control block
  *        avnd - The AvND which has sent the ack for all the component additions.
@@ -487,11 +492,22 @@ void avd_nd_ncs_su_assigned(AVD_CL_CB *c
 	TRACE_ENTER();
 
 	for (const auto& ncs_su : avnd->list_of_ncs_su) {
-		if ((ncs_su->list_of_susi == AVD_SU_SI_REL_NULL) ||
-		    (ncs_su->list_of_susi->fsm != AVD_SU_SI_STATE_ASGND)) {
-			TRACE_LEAVE();
-			/* this is an unassigned SU so no need to scan further return here. */
-			return;
+		if (ncs_su->list_of_susi == AVD_SU_SI_REL_NULL ||
+			ncs_su->list_of_susi->fsm != AVD_SU_SI_STATE_ASGND) {
+			if (ncs_su->sg_of_su->sg_redundancy_model == SA_AMF_NO_REDUNDANCY_MODEL) {
+				/* This is an unassigned nored ncs SU so no need to scan further, return here. */
+				TRACE_LEAVE();
+				return;
+			}
+			else if (ncs_su->sg_of_su->sg_redundancy_model == SA_AMF_2N_REDUNDANCY_MODEL &&
+					(ncs_su->sg_of_su->curr_assigned_sus() < 2 ||
+					ncs_su->saAmfSUPresenceState != SA_AMF_PRESENCE_INSTANTIATED)) {
+				/* This is an unassigned ncs 2N SU or not yet instantiated ncs spare 2N SU
+				 * so no need to scan further, return here.
+				 */
+				TRACE_LEAVE();
+				return;
+			}
 		}
 	}
 
@@ -625,6 +641,10 @@ void avd_mds_avnd_down_evh(AVD_CL_CB *cb
 
 		if (avd_cb->avail_state_avd == SA_AMF_HA_ACTIVE) {
 			avd_node_failover(node);
+			// Update standby out of sync if standby sc goes down
+			if (avd_cb->node_id_avd_other == node->node_info.nodeId) {
+				cb->stby_sync_state = AVD_STBY_OUT_OF_SYNC;
+			}
 		} else {
 			/* Remove dynamic info for node but keep in nodeid tree.
 			 * Possibly used at the end of controller failover to
diff --git a/osaf/services/saf/amf/amfd/ndproc.cc b/osaf/services/saf/amf/amfd/ndproc.cc
--- a/osaf/services/saf/amf/amfd/ndproc.cc
+++ b/osaf/services/saf/amf/amfd/ndproc.cc
@@ -995,6 +995,17 @@ void avd_data_update_req_evh(AVD_CL_CB *
 
 					su->set_pres_state(static_cast<SaAmfPresenceStateT>(l_val));
 
+					/* In the Quiesced node, ncs 2N SU is the spare SU so it's not assigned.
+					 * After its instantiation, avd_nd_ncs_su_assigned should be called
+					 * to see if all the ncs SUs on this node are ready.
+					 */
+					if (node->node_state == AVD_AVND_STATE_NCS_INIT && node->adest != 0 &&
+							l_val == SA_AMF_PRESENCE_INSTANTIATED &&
+							su->sg_of_su->sg_ncs_spec == true &&
+							su->sg_of_su->sg_redundancy_model == SA_AMF_2N_REDUNDANCY_MODEL) {
+						avd_nd_ncs_su_assigned(avd_cb, node);
+					}
+
 					if (su->su_on_node->admin_ng != nullptr)
 						process_su_si_response_for_ng(su, SA_AIS_OK);
 
diff --git a/osaf/services/saf/amf/amfd/node.cc b/osaf/services/saf/amf/amfd/node.cc
--- a/osaf/services/saf/amf/amfd/node.cc
+++ b/osaf/services/saf/amf/amfd/node.cc
@@ -121,7 +121,6 @@ void AVD_AVND::initialize() {
   pg_csi_list = {};
   pg_csi_list.order = NCS_DBLIST_ANY_ORDER;
   pg_csi_list.cmp_cookie = avsv_dblist_uns32_cmp;
-  type = AVSV_AVND_CARD_PAYLOAD;
   rcv_msg_id = {};
   snd_msg_id = {};
   cluster_list_node_next = {};
@@ -543,9 +542,21 @@ static SaAisErrorT node_ccb_completed_de
 		return SA_AIS_ERR_BAD_OPERATION;
 	}
 
-	if (node->type == AVSV_AVND_CARD_SYS_CON) {
-		report_ccb_validation_error(opdata, "Cannot remove controller node");
-		return SA_AIS_ERR_BAD_OPERATION;
+ 	for (const auto& ncs_su : node->list_of_ncs_su) {
+		if (ncs_su->sg_of_su->sg_redundancy_model ==
+		    SA_AMF_2N_REDUNDANCY_MODEL) {
+			if (ncs_su->sg_of_su->list_of_su.size() <=
+			    avd_cb->minimum_cluster_size) {
+				report_ccb_validation_error(
+					opdata, "Configured minimum cluster "
+					"size is %u (see "
+					"AVSV_MINIMUM_CLUSTER_SIZE in "
+					"amfd.conf)",
+					avd_cb->minimum_cluster_size);
+				return SA_AIS_ERR_BAD_OPERATION;
+			}
+			break;
+		}
 	}
 
 	/* Check to see that the node is in admin locked state before delete */
diff --git a/osaf/services/saf/amf/amfd/role.cc b/osaf/services/saf/amf/amfd/role.cc
--- a/osaf/services/saf/amf/amfd/role.cc
+++ b/osaf/services/saf/amf/amfd/role.cc
@@ -46,6 +46,7 @@
 #include <si_dep.h>
 #include "osaf_utility.h"
 #include "role.h"
+#include "nid_api.h"
 
 extern pthread_mutex_t imm_reinit_mutex;
 
@@ -73,7 +74,15 @@ void avd_role_change_evh(AVD_CL_CB *cb, 
 	AVD_ROLE_CHG_CAUSE_T cause = msg->msg_info.d2d_chg_role_req.cause;
 	SaAmfHAStateT role = msg->msg_info.d2d_chg_role_req.role;
 
-	TRACE_ENTER2("cause=%u, role=%u", cause, role);
+	TRACE_ENTER2("cause=%u, role=%u, current_role=%u", cause, role,
+		cb->avail_state_avd);
+
+	if ((status = initialize_for_assignment(cb, role))
+		!= NCSCC_RC_SUCCESS) {
+		LOG_ER("initialize_for_assignment FAILED %u",
+			(unsigned) status);
+		_exit(EXIT_FAILURE);
+	}
 
 	if (cb->avail_state_avd == role) {
 		goto done;
@@ -128,6 +137,13 @@ void avd_role_change_evh(AVD_CL_CB *cb, 
 	}
 
 	if ((cause == AVD_FAIL_OVER) &&
+	    (cb->avail_state_avd == SA_AMF_HA_QUIESCED) && (role == SA_AMF_HA_STANDBY)) {
+		/* Fail-over Quiesced to standby (spare controller role change) */
+		status = NCSCC_RC_SUCCESS;
+		goto done;
+	}
+
+	if ((cause == AVD_FAIL_OVER) &&
 	    (cb->avail_state_avd == SA_AMF_HA_QUIESCED) && (role == SA_AMF_HA_ACTIVE)) {
 		/* Fail-over Quiesced to Active */
 		status = avd_role_failover_qsd_actv(cb, role);
@@ -155,7 +171,80 @@ void avd_role_change_evh(AVD_CL_CB *cb, 
 	return;
 }
 
-/****************************************************************************\
+uint32_t initialize_for_assignment(cl_cb_tag* cb, SaAmfHAStateT ha_state)
+{
+	TRACE_ENTER2("ha_state = %d", static_cast<int>(ha_state));
+	SaVersionT ntfVersion = {'A', 0x01, 0x01};
+	uint32_t rc = NCSCC_RC_SUCCESS;
+	SaAisErrorT error;
+	if (cb->fully_initialized) goto done;
+	cb->avail_state_avd = ha_state;
+	if (ha_state == SA_AMF_HA_QUIESCED) {
+		if ((rc = nid_notify(const_cast<char*>("AMFD"),
+				     NCSCC_RC_SUCCESS, nullptr)) != NCSCC_RC_SUCCESS) {
+			LOG_ER("nid_notify failed");
+		}
+		goto done;
+	}
+	if ((rc = avd_mds_init(cb)) != NCSCC_RC_SUCCESS) {
+		LOG_ER("avd_mds_init FAILED");
+		goto done;
+	}
+	if ((rc = avsv_mbcsv_register(cb)) != NCSCC_RC_SUCCESS) {
+		LOG_ER("avsv_mbcsv_register FAILED");
+		goto done;
+	}
+	if (avd_clm_init() != SA_AIS_OK) {
+		LOG_EM("avd_clm_init FAILED");
+		rc = NCSCC_RC_FAILURE;
+		goto done;
+	}
+	if (avd_imm_init(cb) != SA_AIS_OK) {
+		LOG_ER("avd_imm_init FAILED");
+		rc = NCSCC_RC_FAILURE;
+		goto done;
+	}
+	if ((error = saNtfInitialize(&cb->ntfHandle, nullptr, &ntfVersion)) !=
+	    SA_AIS_OK) {
+		LOG_ER("saNtfInitialize Failed (%u)", error);
+		rc = NCSCC_RC_FAILURE;
+		goto done;
+	}
+	if ((rc = avd_mds_set_vdest_role(cb, ha_state)) != NCSCC_RC_SUCCESS) {
+		LOG_ER("avd_mds_set_vdest_role FAILED");
+		goto done;
+	}
+	if ((rc = avsv_set_ckpt_role(cb, ha_state)) != NCSCC_RC_SUCCESS) {
+		LOG_ER("avsv_set_ckpt_role FAILED");
+		goto done;
+	}
+	if (ha_state == SA_AMF_HA_ACTIVE) {
+		rc = avd_active_role_initialization(cb, ha_state);
+		if (rc != NCSCC_RC_SUCCESS) {
+			LOG_ER("avd_active_role_initialization FAILED");
+			goto done;
+		}
+
+		/* in a normal cluster start there will be no assignments object found so
+		 * nothing happens. Used to cleanup cached RTAs after SCs recover after
+		 * being headless.
+		 */
+		avd_susi_cleanup();
+		avd_compcsi_cleanup();
+	} else if (ha_state == SA_AMF_HA_STANDBY) {
+		rc = avd_standby_role_initialization(cb);
+		if (rc != NCSCC_RC_SUCCESS) {
+			LOG_ER("avd_standby_role_initialization FAILED");
+			goto done;
+		}
+	}
+	cb->fully_initialized = true;
+done:
+	TRACE_LEAVE2("rc = %u", rc);
+ 	return rc;
+}
+
+/**************************************************************************** \
  * Function: avd_init_role_set
  *
  * Purpose:  AVSV function to handle AVD's initial role setting. 
diff --git a/osaf/services/saf/amf/amfd/sg_2n_fsm.cc b/osaf/services/saf/amf/amfd/sg_2n_fsm.cc
--- a/osaf/services/saf/amf/amfd/sg_2n_fsm.cc
+++ b/osaf/services/saf/amf/amfd/sg_2n_fsm.cc
@@ -1696,7 +1696,7 @@ uint32_t SG_2N::susi_success_sg_realign(
 				}
 			}
 
-			if ((state == SA_AMF_HA_ACTIVE) && (su->su_on_node->type == AVSV_AVND_CARD_SYS_CON) &&
+			if ((state == SA_AMF_HA_ACTIVE) &&
 			    (cb->node_id_avd == su->su_on_node->node_info.nodeId)) {
 				/* This is as a result of failover, start CLM tracking*/
 				(void) avd_clm_track_start();
diff --git a/osaf/services/saf/amf/amfd/sgproc.cc b/osaf/services/saf/amf/amfd/sgproc.cc
--- a/osaf/services/saf/amf/amfd/sgproc.cc
+++ b/osaf/services/saf/amf/amfd/sgproc.cc
@@ -698,7 +698,7 @@ void avd_su_oper_state_evh(AVD_CL_CB *cb
 	    (n2d_msg->msg_info.n2d_opr_state.rec_rcvr.saf_amf == SA_AMF_NODE_FAILFAST)) {
 		/* as of now do the same opearation as ncs su failure */
 		su->set_oper_state(SA_AMF_OPERATIONAL_DISABLED);
-		if ((node->type == AVSV_AVND_CARD_SYS_CON) && (node->node_info.nodeId == cb->node_id_avd)) {
+		if (node->node_info.nodeId == cb->node_id_avd) {
 			TRACE("Component in %s requested FAILFAST", su->name.value);
 		}
 
@@ -1416,7 +1416,16 @@ void avd_su_si_assign_evh(AVD_CL_CB *cb,
 					/* Since a NCS SU has been assigned trigger the node FSM. */
 					/* For (ncs_spec == SA_TRUE), su will not be external, so su
 						   will have node attached. */
-					avd_nd_ncs_su_assigned(cb, susi->su->su_on_node);
+					for (AmfDb<uint32_t, AVD_AVND>::const_iterator it = node_id_db->begin();
+						it != node_id_db->end(); it++) {
+						AVD_AVND *node = (*it).second;
+
+						if (node->node_state == AVD_AVND_STATE_NCS_INIT && node->adest != 0) {
+							avd_nd_ncs_su_assigned(cb, node);
+						} else {
+							TRACE("Node_state: %u adest: %" PRIx64 " node not ready for assignments", node->node_state, node->adest);
+						}
+					}
 				}
 			}
 		} else {
diff --git a/osaf/services/saf/amf/amfd/su.cc b/osaf/services/saf/amf/amfd/su.cc
--- a/osaf/services/saf/amf/amfd/su.cc
+++ b/osaf/services/saf/amf/amfd/su.cc
@@ -882,7 +882,19 @@ void AVD_SU::lock(SaImmOiHandleT immoi_h
 	bool is_oper_successful = true;
 
 	TRACE_ENTER2("'%s'", name.value);
-	/* Change the admin state to lock and return as cluster timer haven't expired.*/
+
+	if (sg_of_su->sg_ncs_spec == true &&
+	    sg_of_su->sg_redundancy_model == SA_AMF_2N_REDUNDANCY_MODEL &&
+	    sg_of_su->list_of_su.size() > 2) {
+		report_admin_op_error(immoi_handle, invocation,
+				      SA_AIS_ERR_NOT_SUPPORTED, nullptr,
+				      "Locking OpenSAF 2N SU is currently not "
+				      "supported when more than two SUs are "
+				      "configured");
+		goto done;
+	}
+
+	/* Change the admin state to lock and return as the cluster timer hasn't expired. */
 	if (avd_cb->init_state == AVD_INIT_DONE) {
 		set_readiness_state(SA_AMF_READINESS_OUT_OF_SERVICE);
 		set_admin_state(SA_AMF_ADMIN_LOCKED);
diff --git a/osaf/services/saf/amf/amfd/tests/test_ckpt_enc_dec.cc b/osaf/services/saf/amf/amfd/tests/test_ckpt_enc_dec.cc
--- a/osaf/services/saf/amf/amfd/tests/test_ckpt_enc_dec.cc
+++ b/osaf/services/saf/amf/amfd/tests/test_ckpt_enc_dec.cc
@@ -338,7 +338,6 @@ TEST_F(CkptEncDecTest, testEncDecAvdNode
   avnd.saAmfNodeAdminState = SA_AMF_ADMIN_UNLOCKED;
   avnd.saAmfNodeOperState = SA_AMF_OPERATIONAL_ENABLED;
   avnd.node_state = AVD_AVND_STATE_NCS_INIT;
-  avnd.type = AVSV_AVND_CARD_SYS_CON;
   avnd.rcv_msg_id = 0xA;
   avnd.snd_msg_id = 0xB;
 
@@ -374,7 +373,6 @@ TEST_F(CkptEncDecTest, testEncDecAvdNode
   ASSERT_EQ(avnd.saAmfNodeAdminState, SA_AMF_ADMIN_UNLOCKED);
   ASSERT_EQ(avnd.saAmfNodeOperState, SA_AMF_OPERATIONAL_ENABLED);
   ASSERT_EQ(avnd.node_state, AVD_AVND_STATE_NCS_INIT);
-  ASSERT_EQ(avnd.type, AVSV_AVND_CARD_SYS_CON);
   ASSERT_EQ(avnd.rcv_msg_id, 0xA);
   ASSERT_EQ(avnd.snd_msg_id, 0xB);
 }
diff --git a/osaf/services/saf/amf/amfd/util.cc b/osaf/services/saf/amf/amfd/util.cc
--- a/osaf/services/saf/amf/amfd/util.cc
+++ b/osaf/services/saf/amf/amfd/util.cc
@@ -262,7 +262,7 @@ uint32_t avd_snd_node_up_msg(AVD_CL_CB *
 	/* prepare the message */
 	d2n_msg->msg_type = AVSV_D2N_NODE_UP_MSG;
 	d2n_msg->msg_info.d2n_node_up.node_id = avnd->node_info.nodeId;
-	d2n_msg->msg_info.d2n_node_up.node_type = avnd->type;
+	d2n_msg->msg_info.d2n_node_up.node_type = AVSV_AVND_CARD_SYS_CON;
 	d2n_msg->msg_info.d2n_node_up.su_failover_max = avnd->saAmfNodeSuFailoverMax;
 	d2n_msg->msg_info.d2n_node_up.su_failover_prob = avnd->saAmfNodeSuFailOverProb;
 
@@ -1180,13 +1180,6 @@ uint32_t avd_snd_set_leds_msg(AVD_CL_CB 
 		return NCSCC_RC_FAILURE;
 	}
 
-	/* If we have wrongly identified the card type of avnd, its time to correct it */
-	if ((cb->node_id_avd_other == avnd->node_info.nodeId) && (avnd->type != AVSV_AVND_CARD_SYS_CON)) {
-		avnd->type = AVSV_AVND_CARD_SYS_CON;
-		/* checkpoint this information */
-		m_AVSV_SEND_CKPT_UPDT_ASYNC_UPDT(cb, avnd, AVSV_CKPT_AVD_NODE_CONFIG);
-	}
-
 	/* prepare the message. */
 	d2n_msg = new AVSV_DND_MSG();
 
@@ -1431,7 +1424,6 @@ int amfd_file_dump(const char *filename)
 		fprintf(f, "    saAmfNodeOperState: %s\n",
 				avd_oper_state_name[node->saAmfNodeOperState]);
 		fprintf(f, "    node_state: %u\n", node->node_state);
-		fprintf(f, "    type: %u\n", node->type);
 		fprintf(f, "    adest:%" PRIx64 "\n", node->adest);
 		fprintf(f, "    rcv_msg_id: %u\n", node->rcv_msg_id);
 		fprintf(f, "    snd_msg_id: %u\n", node->snd_msg_id);
diff --git a/osaf/services/saf/amf/amfnd/clm.cc b/osaf/services/saf/amf/amfnd/clm.cc
--- a/osaf/services/saf/amf/amfnd/clm.cc
+++ b/osaf/services/saf/amf/amfnd/clm.cc
@@ -38,6 +38,8 @@
 #include "mds_pvt.h"
 #include "nid_api.h"
 #include "amf_si_assign.h"
+#include "osaf_time.h"
+
 
 static void clm_node_left(SaClmNodeIdT node_id)
 {
@@ -173,7 +175,6 @@ uint32_t avnd_evt_avd_node_up_evh(AVND_C
 	info = &evt->info.avd->msg_info.d2n_node_up;
 
 	/*** update this node with the supplied parameters ***/
-	cb->type = info->node_type;
 	cb->su_failover_max = info->su_failover_max;
 	cb->su_failover_prob = info->su_failover_prob;
 
@@ -258,37 +259,78 @@ done:
 	return;
 }
 
-static SaVersionT Version = { 'B', 4, 1 };
-
 static const SaClmCallbacksT_4 callbacks = {
         0,
         /*.saClmClusterTrackCallback =*/ clm_track_cb
 };
 
-SaAisErrorT avnd_clm_init(void)
+SaAisErrorT avnd_clm_init(AVND_CB* cb)
 {
         SaAisErrorT error = SA_AIS_OK;
-        SaUint8T trackFlags = SA_TRACK_CURRENT|SA_TRACK_CHANGES_ONLY;
+        SaUint8T trackFlags = SA_TRACK_CURRENT | SA_TRACK_CHANGES_ONLY;
 
 	TRACE_ENTER();
-	avnd_cb->first_time_up = true;
-	error = saClmInitialize_4(&avnd_cb->clmHandle, &callbacks, &Version);
-        if (SA_AIS_OK != error) {
-                LOG_ER("Failed to Initialize with CLM: %u", error);
-                goto done;
-        }
-	error = saClmSelectionObjectGet(avnd_cb->clmHandle, &avnd_cb->clm_sel_obj);
-        if (SA_AIS_OK != error) {
-                LOG_ER("Failed to get CLM selectionObject: %u", error);
-                goto done;
-        }
-	error = saClmClusterTrack_4(avnd_cb->clmHandle, trackFlags, nullptr);
-        if (SA_AIS_OK != error) {
-                LOG_ER("Failed to start cluster tracking: %u", error);
-                goto done;
-        }
+
+	cb->first_time_up = true;
+	cb->clmHandle = 0;
+	for (;;) {
+		SaVersionT Version = { 'B', 4, 1 };
+		error = saClmInitialize_4(&cb->clmHandle, &callbacks, &Version);
+		if (error == SA_AIS_ERR_TRY_AGAIN ||
+		    error == SA_AIS_ERR_TIMEOUT ||
+                    error == SA_AIS_ERR_UNAVAILABLE) {
+			if (error != SA_AIS_ERR_TRY_AGAIN) {
+				LOG_WA("saClmInitialize_4 returned %u",
+				       (unsigned) error);
+			}
+			osaf_nanosleep(&kHundredMilliseconds);
+			continue;
+		}
+		if (error == SA_AIS_OK) break;
+		LOG_ER("Failed to Initialize with CLM: %u", error);
+		goto done;
+	}
+	error = saClmSelectionObjectGet(cb->clmHandle, &cb->clm_sel_obj);
+	if (SA_AIS_OK != error) {
+		LOG_ER("Failed to get CLM selectionObject: %u", error);
+		goto done;
+	}
+	error = saClmClusterTrack_4(cb->clmHandle, trackFlags, nullptr);
+	if (SA_AIS_OK != error) {
+		LOG_ER("Failed to start cluster tracking: %u", error);
+		goto done;
+	}
 
 done:
 	TRACE_LEAVE();
         return error;
 }
+
+static void* avnd_clm_init_thread(void* arg)
+{
+	TRACE_ENTER();
+	AVND_CB* cb = static_cast<AVND_CB*>(arg);
+
+	avnd_clm_init(cb);
+
+	TRACE_LEAVE();
+	return nullptr;
+}
+
+SaAisErrorT avnd_start_clm_init_bg()
+{
+	pthread_t thread;
+	pthread_attr_t attr;
+	pthread_attr_init(&attr);
+	pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_DETACHED);
+
+	int rc = pthread_create(&thread, &attr, avnd_clm_init_thread, avnd_cb);
+	if (rc != 0) {
+		LOG_ER("pthread_create FAILED: %s", strerror(rc));
+		exit(EXIT_FAILURE);
+	}
+
+	pthread_attr_destroy(&attr);
+
+	return SA_AIS_OK;
+}
diff --git a/osaf/services/saf/amf/amfnd/di.cc b/osaf/services/saf/amf/amfnd/di.cc
--- a/osaf/services/saf/amf/amfnd/di.cc
+++ b/osaf/services/saf/amf/amfnd/di.cc
@@ -515,7 +515,7 @@ uint32_t avnd_evt_mds_avd_up_evh(AVND_CB
 		NCS_NODE_ID node_id, my_node_id = ncs_get_node_id();
 		node_id = m_NCS_NODE_ID_FROM_MDS_DEST(evt->info.mds.mds_dest);
 
-		if ((node_id == my_node_id) && (cb->type == AVSV_AVND_CARD_SYS_CON)) {
+		if (node_id == my_node_id) {
 			TRACE("Starting hb supervision of local avd");
 			avnd_stop_tmr(cb, &cb->hb_duration_tmr);
 			avnd_start_tmr(cb, &cb->hb_duration_tmr,
diff --git a/osaf/services/saf/amf/amfnd/include/avnd_cb.h b/osaf/services/saf/amf/amfnd/include/avnd_cb.h
--- a/osaf/services/saf/amf/amfnd/include/avnd_cb.h
+++ b/osaf/services/saf/amf/amfnd/include/avnd_cb.h
@@ -71,7 +71,6 @@ typedef struct avnd_cb_tag {
 	uint32_t su_failover_max;	/* max SU failovers (config) */
 	uint32_t su_failover_cnt;	/* su failover cnt within a probation period */
 	AVND_TMR node_err_esc_tmr;	/* node err esc tmr */
-        AVSV_AVND_CARD type;    /* node type (scxb or payload) */
 	SaNameT amf_nodeName;
         SaClmClusterNodeT_4 node_info;    /* this node's info */
 
diff --git a/osaf/services/saf/amf/amfnd/include/avnd_clm.h b/osaf/services/saf/amf/amfnd/include/avnd_clm.h
--- a/osaf/services/saf/amf/amfnd/include/avnd_clm.h
+++ b/osaf/services/saf/amf/amfnd/include/avnd_clm.h
@@ -33,6 +33,7 @@
 
 struct avnd_cb_tag;
 
-extern SaAisErrorT avnd_clm_init(void);
+extern SaAisErrorT avnd_clm_init(avnd_cb_tag* cb);
+extern SaAisErrorT avnd_start_clm_init_bg();
 
 #endif
diff --git a/osaf/services/saf/amf/amfnd/main.cc b/osaf/services/saf/amf/amfnd/main.cc
--- a/osaf/services/saf/amf/amfnd/main.cc
+++ b/osaf/services/saf/amf/amfnd/main.cc
@@ -197,43 +197,6 @@ done:
 	exit(1);
 }
 
-static int get_node_type(void)
-{
-        size_t no_of_matches;
-        int type;
-        char buf[32];
-        FILE *f = fopen(PKGSYSCONFDIR "/node_type", "r");
-
-        if (!f) {
-                LOG_ER("Could not open file %s - %s", PKGSYSCONFDIR "/node_type", strerror(errno));
-                return AVSV_AVND_CARD_PAYLOAD;
-        }
-
-	// Give length of buf -1 as argument to fscanf to avoid buffer overflows
-        if ((no_of_matches = fscanf(f, "%31s", buf)) > 0) {
-                if (strncmp(buf, "controller", sizeof(buf)) == 0) {
-			TRACE("Node type: controller");
-                        type = AVSV_AVND_CARD_SYS_CON;
-		}
-                else if (strncmp(buf, "payload", sizeof(buf)) == 0) {
-			TRACE("Node type: payload");
-                        type = AVSV_AVND_CARD_PAYLOAD;
-		}
-                else {
-                        LOG_ER("Unknown node type %s", buf);
-                        type = AVSV_AVND_CARD_PAYLOAD;
-                }
-        } else {
-		LOG_ER("fscanf FAILED for %s - %s", PKGSYSCONFDIR "/node_type", strerror(errno));
-                type = AVSV_AVND_CARD_PAYLOAD;
-        }
-
-        (void)fclose(f);
-
-	return type;
-}
-
-
 /****************************************************************************
   Name          : avnd_create
  
@@ -278,7 +241,7 @@ uint32_t avnd_create(void)
 	}
 
 	/* initialize external interfaces */
-	rc = avnd_clm_init();
+	rc = avnd_clm_init(cb);
 	if (SA_AIS_OK != rc) {
 		rc = NCSCC_RC_FAILURE;
 		goto done;
@@ -363,8 +326,6 @@ AVND_CB *avnd_cb_create()
 	/* initialize healthcheck db */
 	avnd_hcdb_init(cb);
 
-	avnd_cb->type = static_cast<AVSV_AVND_CARD>(get_node_type());
-
 	/* initialize pg db */
 	if (NCSCC_RC_SUCCESS != avnd_pgdb_init(cb))
 		goto err;
@@ -598,18 +559,19 @@ void avnd_main_process(void)
 			break;
 		}
 
-		if (fds[FD_CLM].revents & POLLIN) {
-			TRACE("CLM event recieved");
+		if (avnd_cb->clmHandle && (fds[FD_CLM].revents & POLLIN)) {
+			//LOG_NO("DEBUG-> CLM event fd: %d sel_obj: %llu, clm handle: %llu", fds[FD_CLM].fd, avnd_cb->clm_sel_obj, avnd_cb->clmHandle);
 			result = saClmDispatch(avnd_cb->clmHandle, SA_DISPATCH_ALL);
 			switch (result) {
 			case SA_AIS_OK:
 				break;
 			case SA_AIS_ERR_BAD_HANDLE:
 				usleep(100000);
-				rc = avnd_clm_init();
+				LOG_NO("saClmDispatch BAD_HANDLE");
+				rc = avnd_start_clm_init_bg();
 				osafassert(rc == SA_AIS_OK);
 				break;
 			default:
 				goto done;
 			}
 		}
diff --git a/osaf/services/saf/amf/config/amfd.conf b/osaf/services/saf/amf/config/amfd.conf
--- a/osaf/services/saf/amf/config/amfd.conf
+++ b/osaf/services/saf/amf/config/amfd.conf
@@ -9,6 +9,11 @@
 # A value lower than 100ms will be changed to 100ms
 export AVSV_HB_PERIOD=10000000000
 
+# Minimum number of nodes with system controller capability in the system. AMF
+# will reject attempts to delete a node from the IMM configuration if the total
+# number of such nodes would fall below this configured limit.
+#export AVSV_MINIMUM_CLUSTER_SIZE=2
+
 # Uncomment the next line to enable trace
 #args="--tracemask=0xffffffff"
 
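As an illustration of the amfd.conf setting above: since amfd.conf exports plain environment variables, the limit could be read roughly as sketched below. This is only a sketch and not the code in the patch. The helper names minimum_cluster_size() and node_deletion_allowed() are hypothetical, and the fallback value is passed in by the caller because the built-in default is not stated in this hunk.

```cpp
#include <cerrno>
#include <cstdlib>

// Hypothetical helper (not from the patch): read AVSV_MINIMUM_CLUSTER_SIZE
// from the environment. Returns `fallback` when the variable is unset,
// empty, or not a plain non-negative decimal number.
static unsigned long minimum_cluster_size(unsigned long fallback) {
	const char* value = std::getenv("AVSV_MINIMUM_CLUSTER_SIZE");
	if (value == nullptr || value[0] < '0' || value[0] > '9') {
		return fallback;
	}
	errno = 0;
	char* end = nullptr;
	unsigned long parsed = std::strtoul(value, &end, 10);
	if (errno != 0 || *end != '\0') {
		return fallback;
	}
	return parsed;
}

// Hypothetical check (not from the patch): deleting one node with system
// controller capability is allowed only if the remaining number of such
// nodes does not fall below the configured minimum.
static bool node_deletion_allowed(unsigned long current_controllers,
				  unsigned long minimum) {
	return current_controllers > minimum;
}
```

With the limit set to 2, node_deletion_allowed(3, 2) is true and node_deletion_allowed(2, 2) is false, which matches the description in the amfd.conf comment.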