See comments marked [AndersW].

regards,
Anders Widell

On 04/04/2016 01:51 PM, praveen malviya wrote:
> Ack from me with minor comments:
>
> -It would be better to rename the variable to 
> "OSAF_AMF_MIN_CLUSTER_SIZE" to avoid any confusion with CLM cluster.
[AndersW] Ok, will change the name of the variable.
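For reference, a minimal sketch of how such a setting could be read from the environment. This is not the code in the patch: the helper name read_min_cluster_size and the fallback behaviour are assumptions made for illustration; only the variable name OSAF_AMF_MIN_CLUSTER_SIZE comes from the comment above.

```cpp
#include <cctype>
#include <cerrno>
#include <cstdint>
#include <cstdlib>

// Hypothetical helper (not part of the patch): read the minimum cluster size
// from the environment variable OSAF_AMF_MIN_CLUSTER_SIZE. Falls back to
// default_value when the variable is unset, empty, not a plain positive
// decimal number, or out of range.
uint32_t read_min_cluster_size(uint32_t default_value) {
  const char* val = std::getenv("OSAF_AMF_MIN_CLUSTER_SIZE");
  if (val == nullptr || *val == '\0') return default_value;
  // Reject signs and leading whitespace, which strtoul would accept.
  if (!std::isdigit(static_cast<unsigned char>(val[0]))) return default_value;
  errno = 0;
  char* end = nullptr;
  unsigned long parsed = std::strtoul(val, &end, 10);
  if (errno != 0 || *end != '\0' || parsed == 0 || parsed > UINT32_MAX) {
    return default_value;
  }
  return static_cast<uint32_t>(parsed);
}
```

Falling back to a default on malformed input (rather than failing start-up) is just one possible choice here.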
> -Also, a check for the minimum cluster size must be added for deletion of
> MW 2N and NoRed SUs, as a user may delete the SU and the node in different
> CCBs.
[AndersW] This check was not present in the base code. Can we add it 
later (after FC)?
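If we add it later, the check could look roughly like the sketch below. All names here (CcbResult, validate_mw_su_delete and its parameters) are hypothetical; the real AMFD CCB validation returns SaAisErrorT values and works on its own SU and SG objects, and the exact counting rule is an assumption.

```cpp
#include <cstdint>

// Hypothetical outcome of a CCB validation step; the real AMFD code would
// return SaAisErrorT values such as SA_AIS_ERR_BAD_OPERATION instead.
enum class CcbResult { kOk, kBadOperation };

// Hypothetical check: reject a CCB that would leave fewer middleware SUs
// configured than the minimum cluster size allows.
//   configured_sus   - middleware SUs currently configured
//   pending_deletes  - middleware SUs deleted by the CCB being validated
//   min_cluster_size - configured lower limit
CcbResult validate_mw_su_delete(uint32_t configured_sus,
                                uint32_t pending_deletes,
                                uint32_t min_cluster_size) {
  // Guard against unsigned underflow in the subtraction below.
  if (pending_deletes > configured_sus) return CcbResult::kBadOperation;
  uint32_t remaining = configured_sus - pending_deletes;
  return remaining >= min_cluster_size ? CcbResult::kOk
                                       : CcbResult::kBadOperation;
}
```

Counting the deletes pending in the CCB, instead of checking one object at a time, is meant to cover the case where several SUs are removed in the same CCB.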
> -Since it is an environment variable, a user can still delete an SC in
> a running cluster by configuring a spare controller with a smaller value
> and making that spare controller active. Such a problem can only be
> fixed by making the value unique cluster-wide through IMM (without
> allowing dynamic modification) and checkpointing it in AMFD.
[AndersW] Well, I am assuming the setting is the same on all nodes in 
the cluster. This is always a problem with settings in the *.conf file, 
and not unique to this particular variable. I am not really sure what 
you are worried about, though. I can always create a single-node cluster 
by configuring an extra dummy node that doesn't actually exist.
>
> I hope backward compatibility has been tested (e.g. an old AMFD
> communicating with a new payload, and so on, since the node type is
> removed).
[AndersW] We have been running this feature as a prototype for half a 
year, so the feature has received a lot of testing. Due to review 
comments, the patch has changed a bit during the last few days though. 
If anything shows up as a result of these changes I am sure we can fix 
them before GA.
>
> Also noticed that on spare controllers, directors become standby
> in the CSI set callback and at the same time register with MDS and
> initialize with other services. I did not observe any timeout or
> try-again. Although the timeout is handled, the total time should in any
> case not be greater than the configured csisetcallbacktimeout limit.
> Thanks,
> Praveen
>
>
> On 04-Apr-16 2:06 AM, Anders Widell wrote:
>> Here is the latest version of the AMF patch for ticket [#79]. I have now
>> made the policy for the minimum number of system controller nodes
>> configurable (in amfd.conf; the IMM API is a bit too heavy-weight for me
>> to use this late in the review process, and I leave its implementation
>> as an exercise for anyone interested).
>>
>> Another big change is that the patch is now re-based so that it can be
>> applied on the default branch after ticket [#1620] was pushed. Apart
>> from just solving merge conflicts, the patch also contains a few other
>> modifications needed for ticket [#79] to work properly together with
>> ticket [#1620].
>>
>> Finally, I have put back the code for setting the sync state to "out of
>> sync" after losing contact with the standby. After some digging in the
>> archives, it appears the reason for this change was to make sure an
>> si-swap cannot be performed until the (new) standby is in sync.
>>
>> regards,
>> Anders Widell
>>
>> On 04/01/2016 06:13 PM, Mathivanan Naickan Palanivelu wrote:
>>> I think controlling this behavior at run time should be okay. (I
>>> didn't intend to say that introducing a build option is the only way,
>>> And the aim was not for saving memory either :-))
>>>
>>> So, to start with, let's introduce a global runtime environment
>>> variable(?) or IMM attribute that would control whether these
>>> standalone configuration changes are allowed to succeed.
>>>
>>> We could think of adding restrictions to cluster-size later on perhaps.
>>>
>>> Cheers,
>>> Mathi.
>>>
>>>
>>>> -----Original Message-----
>>>> From: Anders Widell [mailto:[email protected]]
>>>> Sent: Friday, April 01, 2016 7:28 PM
>>>> To: Mathivanan Naickan Palanivelu; Praveen Malviya;
>>>> [email protected]; [email protected]; Nagendra Kumar
>>>> Cc: [email protected]
>>>> Subject: Re: [devel] [PATCH 1 of 1] amf: Support AMF configurations
>>>> containing more than two OpenSAF 2N SUs [#79]
>>>>
>>>> Now you are talking about the possibility to save some memory on a
>>>> 1-node system by e.g. introducing #ifdefs in the code to disable code
>>>> paths that deal with redundancy. I can agree that in some very
>>>> resource-constrained embedded system this could make sense, but on the
>>>> other hand, this is not good from a testing perspective. You would have
>>>> two separate builds that need to be tested. From a developer's
>>>> perspective, it is not good either. #ifdefs will mess up the code, and
>>>> the developers also need to build and test the code twice. And what
>>>> happens when you get the second feature, which is also enabled and
>>>> disabled using #ifdefs? You will get four possible builds. With the
>>>> third feature, eight possible builds.
>>>>
>>>> On the other hand, a redundant system must work also on a single node.
>>>> Otherwise it wouldn't be redundant - i.e. if you /require/ two nodes
>>>> for the system to operate then you have a 2-node system which is less
>>>> available than just one single node. If any of the two nodes fails then
>>>> the whole system fails. So we must in any case test that we can run on
>>>> a single node without problems. Unless we have a very strong use-case
>>>> for saving those extra bytes of code I would not like to introduce a
>>>> build-time option for disabling redundancy.
>>>>
>>>> We could introduce a run-time option for disallowing single-node
>>>> configurations. In fact, I would argue in such a case that it would be
>>>> an option to set the minimum allowed number of nodes in the system -
>>>> i.e. the minimum size you can scale down to. It doesn't necessarily
>>>> have to be two; maybe someone wishes to disallow any configuration
>>>> with fewer than five nodes.
>>>>
>>>> regards,
>>>> Anders Widell
>>>>
>>>> On 04/01/2016 03:05 PM, Mathivanan Naickan Palanivelu wrote:
>>>>> We could tie the symmetry to the 'deploy-standalone' option chosen,
>>>>> i.e. if the 'deploy-standalone' feature is enabled, don't expect
>>>>> standbys anywhere. That includes not expecting things like setting up
>>>>> a shared file system, and all code flows that expect a
>>>>> second/alternate to take over, etc!
>>>>> I think this has to be thought through and not just considered as a
>>>>> matter of 'allowing a configuration change' for a tool to work.
>>>>> It is another thing that we should have discussed this when the
>>>>> asymmetry in configuration got created!
>>>>> If it's about reflecting the 'state' then there are sufficient states
>>>>> to convey the status of that application.
>>>>>
>>>>>
>>>>>> -----Original Message-----
>>>>>> From: Anders Widell [mailto:[email protected]]
>>>>>> Sent: Friday, April 01, 2016 5:49 PM
>>>>>> To: Mathivanan Naickan Palanivelu; Praveen Malviya;
>>>>>> [email protected]; [email protected]; Nagendra
>>>> Kumar
>>>>>> Cc: [email protected]
>>>>>> Subject: Re: [devel] [PATCH 1 of 1] amf: Support AMF configurations
>>>>>> containing more than two OpenSAF 2N SUs [#79]
>>>>>>
>>>>>> Today it is perfectly possible to configure a 1-node system without
>>>>>> using any build-time option. The problem today is that if you expand
>>>>>> a 1-node system to a 2-node system, then you cannot later shrink it
>>>>>> back into a 1-node system again. This asymmetry that you cannot go
>>>>>> back to where you came from makes little sense to me. Suppose that
>>>>>> the second node has been physically removed, so that you really have
>>>>>> only one single node. What harm would it do to update the OpenSAF
>>>>>> configuration to reflect this fact of reality?
>>>>>> regards,
>>>>>> Anders Widell
>>>>>>
>>>>>> On 04/01/2016 12:54 PM, Mathivanan Naickan Palanivelu wrote:
>>>>>>> Of course there could be applications (in different problem domains)
>>>>>>> that could be configured to run in standalone mode or in HA mode.
>>>>>>> Such applications could still be configured (in AMF) to run on just
>>>>>>> one OpenSAF node; nothing in OpenSAF stops standalone applications
>>>>>>> today!
>>>>>>> But I don't think it is necessary to trickle down an application's
>>>>>>> standalone configuration into OpenSAF's configuration too!
>>>>>>> Somehow this topic has come up again, and I don't understand the
>>>>>>> requirement behind why a user's tool expects the HA middleware also
>>>>>>> to be configured in standalone mode!
>>>>>>> If we still want to satisfy such (ad hoc?) tools, then we could do
>>>>>>> that by bringing in a 'standalone' deployment option in OpenSAF, i.e.
>>>>>>> ./configure --deploy-standalone. When this option is enabled,
>>>>>>> immxml-tools, AMF and any other OpenSAF service would not expect a
>>>>>>> mandatory STANDBY to exist, and would thereby also allow deletion or
>>>>>>> any other operation of interest!
>>>>>>> Cheers,
>>>>>>> Mathi.
>>>>>>>
>>>>>>>> -----Original Message-----
>>>>>>>> From: praveen malviya
>>>>>>>> Sent: Friday, April 01, 2016 3:52 PM
>>>>>>>> To: Anders Widell; [email protected];
>>>>>>>> [email protected]; Nagendra Kumar
>>>>>>>> Cc: [email protected]
>>>>>>>> Subject: Re: [devel] [PATCH 1 of 1] amf: Support AMF 
>>>>>>>> configurations
>>>>>>>> containing more than two OpenSAF 2N SUs [#79]
>>>>>>>>
>>>>>>>> Please see my comments inline below:
>>>>>>>>
>>>>>>>> On 31-Mar-16 6:50 PM, Anders Widell wrote:
>>>>>>>>> Yes I added the check in the admin op as you suggested. But I
>>>>>>>>> don't fully agree that the same check should be done when 
>>>>>>>>> removing
>>>>>>>>> system controller nodes. With the introduction of this 
>>>>>>>>> feature, we
>>>>>>>>> are starting to move away from the concept of different node 
>>>>>>>>> types
>>>>>>>>> (controller / payload). Indeed, I removed the "type" member
>>>>>>>>> variable from the node class in the latest patch.
>>>>>>>> [Praveen]
>>>>>>>> Yes, all nodes can be controller nodes, or rather all nodes are the
>>>>>>>> same except for their roles. But that does not change the
>>>>>>>> architecture of OpenSAF w.r.t. the redundancy model, i.e. the
>>>>>>>> configuration is still the 2N redundancy model.
>>>>>>>> I.e. the smallest OpenSAF cluster (without payloads) would still be
>>>>>>>> configured with either of the two options below:
>>>>>>>> (a) immxml-clustersize -s 2 -p0
>>>>>>>> During scale-out, for spare addition at least one standby has to
>>>>>>>> exist before proceeding to configure the rest of the cluster nodes
>>>>>>>> as standbys
>>>>>>>> (OR)
>>>>>>>> (b) immxml-clustersize -s 3 -p0
>>>>>>>> Obviously the 3rd node and all other nodes added after the two
>>>>>>>> nodes would act as spares.
>>>>>>>>
>>>>>>>>> To preserve backwards compatibility, I can agree to have this
>>>>>>>>> check on systems that are configured with both system controller
>>>>>>>>> nodes and payload nodes (as in your example with
>>>>>>>>> immxml-clustersize). This would mean that AMF will reject removal
>>>>>>>>> of any of the last two system controller nodes - IF the cluster
>>>>>>>>> has payload nodes. What do you think
>>>>>>>> [Praveen]
>>>>>>>> That is the normal case anyhow today. The only difference we have to
>>>>>>>> make now is to allow deletion of spare nodes whether there are
>>>>>>>> payloads or not.
>>>>>>>> Thanks,
>>>>>>>> Praveen.
>>>>>>>>
>>>>>>>>> about this approach? I am not sure how easy this would be to
>>>>>>>>> implement, but I can give it a try.
>>>>>>>>>
>>>>>>>>> regards,
>>>>>>>>> Anders Widell
>>>>>>>>>
>>>>>>>>> On 03/31/2016 03:05 PM, praveen malviya wrote:
>>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>> In the diff patch, I have seen that admin operation on MW 2N SU
>>>>>>>>>> is not allowed when more than 2 SUs are configured which
>>>>>>>>>> translates to the fact that system is running with spare
>>>>>>>>>> controllers.
>>>>>>>>>>
>>>>>>>>>> A similar check is needed for deletion of the controller
>>>>>>>>>> configuration from AMF. The check would not allow deletion of a
>>>>>>>>>> controller if only two of them remain. This is in line with the
>>>>>>>>>> current OpenSAF implementation: we do not allow any payload to be
>>>>>>>>>> configured when a user tries to generate imm.xml with a single
>>>>>>>>>> controller and a given number of payloads, because such a
>>>>>>>>>> configuration does not provide redundancy to payloads.
>>>>>>>>>>
>>>>>>>>>> Note: ./immxml-clustersize -s 1 -p 1
>>>>>>>>>> error: Two SC's is required for clusters with payloads. Exiting!
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> Praveen
>>>>>>>>>>
>>>>>>>>>> On 31-Mar-16 2:05 AM, Anders Widell wrote:
>>>>>>>>>>> Here is a patch that addresses the review comments from Hans 
>>>>>>>>>>> and
>>>>>>>>>>> Praveen. It should be applied on top of the AMF patch that was
>>>>>>>>>>> sent out for review.
>>>>>>>>>>>
>>>>>>>>>>> thanks,
>>>>>>>>>>> Anders Widell
>>>>>>>>>>>
>>>>>>>>>>> On 03/30/2016 04:35 PM, Anders Widell wrote:
>>>>>>>>>>>> Hi!
>>>>>>>>>>>>
>>>>>>>>>>>> See my replies inline, marked [AndersW].
>>>>>>>>>>>>
>>>>>>>>>>>> regards,
>>>>>>>>>>>> Anders Widell
>>>>>>>>>>>>
>>>>>>>>>>>> On 03/17/2016 11:32 AM, praveen malviya wrote:
>>>>>>>>>>>>> Hi Anders,
>>>>>>>>>>>>>
>>>>>>>>>>>>> Please find some comments and queries inline with [Praveen]
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>> Praveen
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On 29-Feb-16 8:44 PM, Anders Widell wrote:
>>>>>>>>>>>>>> osaf/services/saf/amf/amfd/clm.cc | 21 +++++-
>>>>>>>>>>>>>> osaf/services/saf/amf/amfd/include/amfd.h |   2 +
>>>>>>>>>>>>>> osaf/services/saf/amf/amfd/include/cb.h |   1 +
>>>>>>>>>>>>>> osaf/services/saf/amf/amfd/include/role.h |   2 +
>>>>>>>>>>>>>> osaf/services/saf/amf/amfd/main.cc |  78
>>>>>>>>>>>>>> ++++++---------------------
>>>>>>>>>>>>>> osaf/services/saf/amf/amfd/ndfsm.cc |   9 ++-
>>>>>>>>>>>>>> osaf/services/saf/amf/amfd/node.cc |   7 +--
>>>>>>>>>>>>>> osaf/services/saf/amf/amfd/role.cc |  86
>>>>>>>>>>>>>> ++++++++++++++++++++++++++++++-
>>>>>>>>>>>>>> osaf/services/saf/amf/amfd/sgproc.cc |  11 +++-
>>>>>>>>>>>>>> osaf/services/saf/amf/amfnd/clm.cc |  27 ++++++--
>>>>>>>>>>>>>>      10 files changed, 160 insertions(+), 84 deletions(-)
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Add support for configuring the system with more than two
>>>>>>>>>>>>>> OpenSAF 2N SUs. In particular, this means that all OpenSAF
>>>>>>>>>>>>>> directors must support starting up and running without
>>>>>>>>>>>>>> (initially) getting any assignment from AMF. Locking of an
>>>>>>>>>>>>>> OpenSAF 2N SU is currently not supported on a system
>>>>>>>>>>>>>> configured with more than two OpenSAF 2N SUs.
>>>>>>>>>>>>> [Praveen] This patch does not contain any change for any
>>>>>>>>>>>>> restriction on locking of OpenSAF 2N SU as mentioned above.
>>>>>>>>>>>> [AndersW] No. This restriction will be documented. Do you 
>>>>>>>>>>>> think
>>>>>>>>>>>> we need to add checks for this case in the admin op as well?
>>>>>>>>>>>>>> diff --git a/osaf/services/saf/amf/amfd/clm.cc
>>>>>>>>>>>>>> b/osaf/services/saf/amf/amfd/clm.cc
>>>>>>>>>>>>>> --- a/osaf/services/saf/amf/amfd/clm.cc
>>>>>>>>>>>>>> +++ b/osaf/services/saf/amf/amfd/clm.cc
>>>>>>>>>>>>>> @@ -21,8 +21,7 @@
>>>>>>>>>>>>>>      #include <amfd.h>
>>>>>>>>>>>>>>      #include <clm.h>
>>>>>>>>>>>>>>      #include <node.h>
>>>>>>>>>>>>>> -
>>>>>>>>>>>>>> -static SaVersionT clmVersion = { 'B', 4, 1 };
>>>>>>>>>>>>>> +#include "osaf_time.h"
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>      static void clm_node_join_complete(AVD_AVND *node)
>>>>>>>>>>>>>>      {
>>>>>>>>>>>>>> @@ -392,9 +391,21 @@ SaAisErrorT avd_clm_init(void)
>>>>>>>>>>>>>>              SaAisErrorT error = SA_AIS_OK;
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>          TRACE_ENTER();
>>>>>>>>>>>>>> -    error = saClmInitialize_4(&avd_cb->clmHandle, &clm_callbacks, &clmVersion);
>>>>>>>>>>>>>> -    if (SA_AIS_OK != error) {
>>>>>>>>>>>>>> -        LOG_ER("Failed to initialize with CLM %u", error);
>>>>>>>>>>>>>> +    for (;;) {
>>>>>>>>>>>>>> +        SaVersionT Version = { 'B', 4, 1 };
>>>>>>>>>>>>>> +        error = saClmInitialize_4(&avd_cb->clmHandle, &clm_callbacks, &Version);
>>>>>>>>>>>>>> +        if (error == SA_AIS_ERR_TRY_AGAIN ||
>>>>>>>>>>>>>> +            error == SA_AIS_ERR_TIMEOUT ||
>>>>>>>>>>>>>> +            error == SA_AIS_ERR_UNAVAILABLE) {
>>>>>>>>>>>>>> +            if (error != SA_AIS_ERR_TRY_AGAIN) {
>>>>>>>>>>>>>> +                LOG_WA("saClmInitialize_4 returned %u",
>>>>>>>>>>>>>> +                       (unsigned) error);
>>>>>>>>>>>>>> +            }
>>>>>>>>>>>>>> +            osaf_nanosleep(&kHundredMilliseconds);
>>>>>>>>>>>>>> +            continue;
>>>>>>>>>>>>>> +        }
>>>>>>>>>>>>>> +        if (error == SA_AIS_OK) break;
>>>>>>>>>>>>>> +        LOG_ER("Failed to Initialize with CLM: %u", error);
>>>>>>>>>>>>>>              goto done;
>>>>>>>>>>>>>>          }
>>>>>>>>>>>>>>          error = saClmSelectionObjectGet(avd_cb->clmHandle,
>>>>>>>>>>>>>> &avd_cb->clm_sel_obj);
>>>>>>>>>>>>>> diff --git a/osaf/services/saf/amf/amfd/include/amfd.h
>>>>>>>>>>>>>> b/osaf/services/saf/amf/amfd/include/amfd.h
>>>>>>>>>>>>>> --- a/osaf/services/saf/amf/amfd/include/amfd.h
>>>>>>>>>>>>>> +++ b/osaf/services/saf/amf/amfd/include/amfd.h
>>>>>>>>>>>>>> @@ -33,6 +33,7 @@
>>>>>>>>>>>>>>      #ifndef AVD_H
>>>>>>>>>>>>>>      #define AVD_H
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> +#include <stdint.h>
>>>>>>>>>>>>>>      #include "logtrace.h"
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>      #include "amf.h"
>>>>>>>>>>>>>> @@ -65,5 +66,6 @@
>>>>>>>>>>>>>>      #include "ckpt_msg.h"
>>>>>>>>>>>>>>      #include "ckpt_edu.h"
>>>>>>>>>>>>>>      #include "ckpt_updt.h"
>>>>>>>>>>>>>> +#include "saAmf.h"
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>      #endif
>>>>>>>>>>>>>> diff --git a/osaf/services/saf/amf/amfd/include/cb.h
>>>>>>>>>>>>>> b/osaf/services/saf/amf/amfd/include/cb.h
>>>>>>>>>>>>>> --- a/osaf/services/saf/amf/amfd/include/cb.h
>>>>>>>>>>>>>> +++ b/osaf/services/saf/amf/amfd/include/cb.h
>>>>>>>>>>>>>> @@ -207,6 +207,7 @@ typedef struct cl_cb_tag {
>>>>>>>>>>>>>>          SaClmHandleT clmHandle;
>>>>>>>>>>>>>>          SaSelectionObjectT clm_sel_obj;
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> +    bool fully_initialized;
>>>>>>>>>>>>>>          bool swap_switch; /* true - In middle of role switch. */
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>          /** true when active services (IMM, LOG, NTF, etc.) exist
>>>>>>>>>>>>>> diff --git a/osaf/services/saf/amf/amfd/include/role.h
>>>>>>>>>>>>>> b/osaf/services/saf/amf/amfd/include/role.h
>>>>>>>>>>>>>> --- a/osaf/services/saf/amf/amfd/include/role.h
>>>>>>>>>>>>>> +++ b/osaf/services/saf/amf/amfd/include/role.h
>>>>>>>>>>>>>> @@ -34,6 +34,8 @@ extern uint32_t amfd_switch_qsd_stdby(AV
>>>>>>>>>>>>>>      extern uint32_t amfd_switch_stdby_actv(AVD_CL_CB *cb);
>>>>>>>>>>>>>>      extern uint32_t amfd_switch_qsd_actv(AVD_CL_CB *cb);
>>>>>>>>>>>>>>      extern uint32_t amfd_switch_actv_qsd(AVD_CL_CB *cb);
>>>>>>>>>>>>>> +extern uint32_t initialize_for_assignment(cl_cb_tag* cb,
>>>>>>>>>>>>>> +                                          SaAmfHAStateT ha_state);
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>      #endif /* ROLE_H */
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> diff --git a/osaf/services/saf/amf/amfd/main.cc
>>>>>>>>>>>>>> b/osaf/services/saf/amf/amfd/main.cc
>>>>>>>>>>>>>> --- a/osaf/services/saf/amf/amfd/main.cc
>>>>>>>>>>>>>> +++ b/osaf/services/saf/amf/amfd/main.cc
>>>>>>>>>>>>>> @@ -56,6 +56,7 @@
>>>>>>>>>>>>>>      #include <sutcomptype.h>
>>>>>>>>>>>>>>      #include <sutype.h>
>>>>>>>>>>>>>>      #include <su.h>
>>>>>>>>>>>>>> +#include "osaf_utility.h"
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>      static const char* internal_version_id_  __attribute__
>>>>>>>>>>>>>> ((used)) =
>>>>>>>>>>>>>> "@(#) $Id: " INTERNAL_VERSION_ID " $";
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> @@ -445,7 +446,8 @@ static void rda_cb(uint32_t notused, PCS
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>          if (((avd_cb->avail_state_avd == SA_AMF_HA_STANDBY) ||
>>>>>>>>>>>>>>              (avd_cb->avail_state_avd == SA_AMF_HA_QUIESCED)) &&
>>>>>>>>>>>>>> -        (cb_info->info.io_role == PCS_RDA_ACTIVE)) {
>>>>>>>>>>>>>> +        (cb_info->info.io_role == PCS_RDA_ACTIVE ||
>>>>>>>>>>>>>> +        cb_info->info.io_role == PCS_RDA_STANDBY)) {
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>              uint32_t rc;
>>>>>>>>>>>>>>              AVD_EVT *evt;
>>>>>>>>>>>>>> @@ -474,7 +476,6 @@ static uint32_t initialize(void)
>>>>>>>>>>>>>>      {
>>>>>>>>>>>>>>          AVD_CL_CB *cb = avd_cb;
>>>>>>>>>>>>>>          int rc = NCSCC_RC_FAILURE;
>>>>>>>>>>>>>> -    SaVersionT ntfVersion = { 'A', 0x01, 0x01 };
>>>>>>>>>>>>>>          SaAmfHAStateT role;
>>>>>>>>>>>>>>          char *val;
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> @@ -524,8 +525,13 @@ static uint32_t initialize(void)
>>>>>>>>>>>>>>          }
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>          cb->init_state = AVD_INIT_BGN;
>>>>>>>>>>>>>> +    cb->mbcsv_sel_obj = -1;
>>>>>>>>>>>>>> +    cb->imm_sel_obj = -1;
>>>>>>>>>>>>>> +    cb->clm_sel_obj = -1;
>>>>>>>>>>>>>> +    cb->fully_initialized = false;
>>>>>>>>>>>>>>          cb->swap_switch = false;
>>>>>>>>>>>>>>          cb->active_services_exist = true;
>>>>>>>>>>>>>> +    cb->mbcsv_sel_obj = -1;
>>>>>>>>>>>>> [Praveen] Duplicate initialization, already done above.
>>>>>>>>>>>> [Anders W] Will remove.
>>>>>>>>>>>>>> cb->stby_sync_state = AVD_STBY_IN_SYNC;
>>>>>>>>>>>>>>          cb->sync_required = true;
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> @@ -544,67 +550,20 @@ static uint32_t initialize(void)
>>>>>>>>>>>>>>          /* get the node id of the node on which the AVD is
>>>>>>>>>>>>>> running.
>>>> */
>>>>>>>>>>>>>> cb->node_id_avd = m_NCS_GET_NODE_ID;
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> -    if (avd_mds_init(cb) != NCSCC_RC_SUCCESS) {
>>>>>>>>>>>>>> -        LOG_ER("avd_mds_init FAILED");
>>>>>>>>>>>>>> -        goto done;
>>>>>>>>>>>>>> -    }
>>>>>>>>>>>>>> -
>>>>>>>>>>>>>> -    if (NCSCC_RC_FAILURE == avsv_mbcsv_register(cb)) {
>>>>>>>>>>>>>> -        LOG_ER("avsv_mbcsv_register FAILED");
>>>>>>>>>>>>>> -        goto done;
>>>>>>>>>>>>>> -    }
>>>>>>>>>>>>>> -
>>>>>>>>>>>>>> -    if (avd_clm_init() != SA_AIS_OK) {
>>>>>>>>>>>>>> -        LOG_EM("avd_clm_init FAILED");
>>>>>>>>>>>>>> -        goto done;
>>>>>>>>>>>>>> -    }
>>>>>>>>>>>>>> -
>>>>>>>>>>>>>> -    if (avd_imm_init(cb) != SA_AIS_OK) {
>>>>>>>>>>>>>> -        LOG_ER("avd_imm_init FAILED");
>>>>>>>>>>>>>> -        goto done;
>>>>>>>>>>>>>> -    }
>>>>>>>>>>>>>> -
>>>>>>>>>>>>>> -    if ((rc = saNtfInitialize(&cb->ntfHandle, nullptr,
>>>>>>>>>>>>>> &ntfVersion)) != SA_AIS_OK) {
>>>>>>>>>>>>>> -        LOG_ER("saNtfInitialize Failed (%u)", rc);
>>>>>>>>>>>>>> -        rc = NCSCC_RC_FAILURE;
>>>>>>>>>>>>>> -        goto done;
>>>>>>>>>>>>>> -    }
>>>>>>>>>>>>>> -
>>>>>>>>>>>>>>          if ((rc = rda_get_role(&role)) != 
>>>>>>>>>>>>>> NCSCC_RC_SUCCESS) {
>>>>>>>>>>>>>>              LOG_ER("rda_get_role FAILED");
>>>>>>>>>>>>>>              goto done;
>>>>>>>>>>>>>>          }
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> -    cb->avail_state_avd = role;
>>>>>>>>>>>>>> -
>>>>>>>>>>>>>> -    if (NCSCC_RC_SUCCESS != avd_mds_set_vdest_role(cb,
>>>> role)) {
>>>>>>>>>>>>>> - LOG_ER("avd_mds_set_vdest_role FAILED");
>>>>>>>>>>>>>> -        goto done;
>>>>>>>>>>>>>> -    }
>>>>>>>>>>>>>> -
>>>>>>>>>>>>>> -    if (NCSCC_RC_SUCCESS != avsv_set_ckpt_role(cb, role)) {
>>>>>>>>>>>>>> -        LOG_ER("avsv_set_ckpt_role FAILED");
>>>>>>>>>>>>>> -        goto done;
>>>>>>>>>>>>>> -    }
>>>>>>>>>>>>>> -
>>>>>>>>>>>>>>          if ((rc = rda_register_callback(0, rda_cb)) !=
>>>>>>>>>>>>>> NCSCC_RC_SUCCESS) {
>>>>>>>>>>>>>> LOG_ER("rda_register_callback FAILED %u", rc);
>>>>>>>>>>>>>>              goto done;
>>>>>>>>>>>>>>          }
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> -    if (role == SA_AMF_HA_ACTIVE) {
>>>>>>>>>>>>>> -        rc = avd_active_role_initialization(cb, role);
>>>>>>>>>>>>>> -        if (rc != NCSCC_RC_SUCCESS) {
>>>>>>>>>>>>>> - LOG_ER("avd_active_role_initialization FAILED");
>>>>>>>>>>>>>> -            goto done;
>>>>>>>>>>>>>> -        }
>>>>>>>>>>>>>> -    }
>>>>>>>>>>>>>> -    else {
>>>>>>>>>>>>>> -        rc = avd_standby_role_initialization(cb);
>>>>>>>>>>>>>> -        if (rc != NCSCC_RC_SUCCESS) {
>>>>>>>>>>>>>> - LOG_ER("avd_standby_role_initialization FAILED");
>>>>>>>>>>>>>> -            goto done;
>>>>>>>>>>>>>> -        }
>>>>>>>>>>>>>> +    if ((rc = initialize_for_assignment(cb, role))
>>>>>>>>>>>>>> +        != NCSCC_RC_SUCCESS) {
>>>>>>>>>>>>>> +        LOG_ER("initialize_for_assignment FAILED %u", (unsigned) rc);
>>>>>>>>>>>>>> +        goto done;
>>>>>>>>>>>>>>          }
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>          rc = NCSCC_RC_SUCCESS;
>>>>>>>>>>>>>> @@ -647,14 +606,13 @@ static void main_loop(void)
>>>>>>>>>>>>>>          fds[FD_TERM].events = POLLIN;
>>>>>>>>>>>>>>          fds[FD_MBX].fd = mbx_fd.rmv_obj;
>>>>>>>>>>>>>>          fds[FD_MBX].events = POLLIN;
>>>>>>>>>>>>>> -    fds[FD_MBCSV].fd = cb->mbcsv_sel_obj;
>>>>>>>>>>>>>> -    fds[FD_MBCSV].events = POLLIN;
>>>>>>>>>>>>>> -    fds[FD_CLM].fd = cb->clm_sel_obj;
>>>>>>>>>>>>>> -    fds[FD_CLM].events = POLLIN;
>>>>>>>>>>>>>> -    fds[FD_IMM].fd = cb->imm_sel_obj; // IMM fd must be 
>>>>>>>>>>>>>> last
>>>> in
>>>>>>>>>>>>>> array
>>>>>>>>>>>>>> -    fds[FD_IMM].events = POLLIN;
>>>>>>>>>>>>>> -
>>>>>>>>>>>>>>          while (1) {
>>>>>>>>>>>>>> +        fds[FD_MBCSV].fd = cb->mbcsv_sel_obj;
>>>>>>>>>>>>>> +        fds[FD_MBCSV].events = POLLIN;
>>>>>>>>>>>>>> +        fds[FD_CLM].fd = cb->clm_sel_obj;
>>>>>>>>>>>>>> +        fds[FD_CLM].events = POLLIN;
>>>>>>>>>>>>>> +        fds[FD_IMM].fd = cb->imm_sel_obj; // IMM fd must be
>>>>>>>>>>>>>> + last in
>>>>>>>>>>>>>> array
>>>>>>>>>>>>>> +        fds[FD_IMM].events = POLLIN;
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>              if (cb->immOiHandle != 0) {
>>>>>>>>>>>>>>                  fds[FD_IMM].fd = cb->imm_sel_obj; diff 
>>>>>>>>>>>>>> --git
>>>>>>>>>>>>>> a/osaf/services/saf/amf/amfd/ndfsm.cc
>>>>>>>>>>>>>> b/osaf/services/saf/amf/amfd/ndfsm.cc
>>>>>>>>>>>>>> --- a/osaf/services/saf/amf/amfd/ndfsm.cc
>>>>>>>>>>>>>> +++ b/osaf/services/saf/amf/amfd/ndfsm.cc
>>>>>>>>>>>>>> @@ -190,8 +190,9 @@ void
>>>> avd_nd_ncs_su_assigned(AVD_CL_CB
>>>>>> *c
>>>>>>>>>>>>>> TRACE_ENTER();
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>          for (const auto& ncs_su : avnd->list_of_ncs_su) {
>>>>>>>>>>>>>> -        if ((ncs_su->list_of_susi == AVD_SU_SI_REL_NULL) ||
>>>>>>>>>>>>>> - (ncs_su->list_of_susi->fsm !=
>>>> AVD_SU_SI_STATE_ASGND))
>>>>>> {
>>>>>>>>>>>>>> +        if ((ncs_su->sg_of_su->curr_assigned_sus() < 2) &&
>>>>>>>>>>>>>> +            ((ncs_su->list_of_susi == 
>>>>>>>>>>>>>> AVD_SU_SI_REL_NULL) ||
>>>>>>>>>>>>>> + (ncs_su->list_of_susi->fsm !=
>>>>>>>>>>>>>> + AVD_SU_SI_STATE_ASGND))) {
>>>>>>>>>>>>>>                  TRACE_LEAVE();
>>>>>>>>>>>>>>                  /* this is an unassigned SU so no need to
>>>>>>>>>>>>>> scan further return here. */
>>>>>>>>>>>>>>                  return;
>>>>>>>>>>>>>> @@ -328,6 +329,10 @@ void
>>>>>> avd_mds_avnd_down_evh(AVD_CL_CB
>>>>>>>> *cb
>>>>>>>>>>>>>>              if (avd_cb->avail_state_avd ==
>>>>>>>>>>>>>> SA_AMF_HA_ACTIVE) {
>>>>>>>>>>>>>> avd_node_failover(node);
>>>>>>>>>>>>>> +            // Update standby out of sync if standby sc goes down
>>>>>>>>>>>>>> +            if (avd_cb->node_id_avd_oth == node->node_info.nodeId) {
>>>>>>>>>>>>>> +                cb->stby_sync_state = AVD_STBY_OUT_OF_SYNC;
>>>>>>>>>>>>>> +            }
>>>>>>>>>>>>> [Praveen] Deep down in a function call in avd_node_failover(),
>>>>>>>>>>>>> AMF marks standby status IN_SYNC in some special case. AMF marks
>>>>>>>>>>>>> out of sync status whenever it gets sync request in chkop.cc
>>>>>>>>>>>>> so it is not needed here.
>>>>>>>>>>>> [AndersW] I don't remember why this was added. Will check and
>>>>>>>>>>>> remove if it doesn't cause any problems. Isn't it a bit strange
>>>>>>>>>>>> though, to say that the standby is in sync when in fact it is
>>>>>>>>>>>> down?
>>>>>>>>>>>>>>              } else {
>>>>>>>>>>>>>>                  /* Remove dynamic info for node but keep in
>>>>>>>>>>>>>> nodeid tree.
>>>>>>>>>>>>>>                   * Possibly used at the end of controller
>>>>>>>>>>>>>> failover to diff --git a/osaf/services/saf/amf/amfd/node.cc
>>>>>>>>>>>>>> b/osaf/services/saf/amf/amfd/node.cc
>>>>>>>>>>>>>> --- a/osaf/services/saf/amf/amfd/node.cc
>>>>>>>>>>>>>> +++ b/osaf/services/saf/amf/amfd/node.cc
>>>>>>>>>>>>>> @@ -120,7 +120,7 @@ void AVD_AVND::initialize() {
>>>>>>>>>>>>>>        pg_csi_list = {};
>>>>>>>>>>>>>>        pg_csi_list.order = NCS_DBLIST_ANY_ORDER;
>>>>>>>>>>>>>>        pg_csi_list.cmp_cookie = avsv_dblist_uns32_cmp;
>>>>>>>>>>>>>> -  type = AVSV_AVND_CARD_PAYLOAD;
>>>>>>>>>>>>>> +  type = AVSV_AVND_CARD_SYS_CON;
>>>>>>>>>>>>> [Praveen] Initially keeping as PAYLOAD, AMFD later changes
>>>>>>>>>>>>> node type of controller AMFNDs to SYS_CON after evaluating
>>>>>>>>>>>>> some parameters.
>>>>>>>>>>>>> Above, the default is set to SYS_CON, and there is no change in
>>>>>>>>>>>>> this patch which sets a payload as payload. In this way the node
>>>>>>>>>>>>> type of a payload will also become SYS_CON.
>>>>>>>>>>>> [AndersW] I think actually the "type" member variable ought to
>>>>>>>>>>>> be removed, since it is not needed. I can update the patch to
>>>>>>>>>>>> remove this variable completely.
>>>>>>>>>>>>>>        rcv_msg_id = {};
>>>>>>>>>>>>>>        snd_msg_id = {};
>>>>>>>>>>>>>>        cluster_list_node_next = {}; @@ -486,11 +486,6 @@
>>>>>>>>>>>>>> static SaAisErrorT node_ccb_completed_de
>>>>>>>>>>>>>>              return SA_AIS_ERR_BAD_OPERATION;
>>>>>>>>>>>>>>          }
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> -    if (node->type == AVSV_AVND_CARD_SYS_CON) {
>>>>>>>>>>>>>> - report_ccb_validation_error(opdata, "Cannot remove
>>>>>>>>>>>>>> controller node");
>>>>>>>>>>>>>> -        return SA_AIS_ERR_BAD_OPERATION;
>>>>>>>>>>>>>> -    }
>>>>>>>>>>>>>> -
>>>>>>>>>>>>> [Praveen] Based on some earlier discussion, I remember that
>>>>>>>>>>>>> deletion of a system controller is restricted. So the check
>>>>>>>>>>>>> should be modified to reject the operation if there are only
>>>>>>>>>>>>> two system controllers in the system.
>>>>>>>>>>>> [AndersW] Now you can create a cluster with many system
>>>>>>>>>>>> controller nodes (all nodes could be controllers), so we can't
>>>>>>>>>>>> have this restriction any longer. I don't see any reason to
>>>>>>>>>>>> treat the two-node case in a special way. You can create a
>>>>>>>>>>>> cluster consisting of only one single node, and then scale it
>>>>>>>>>>>> out to a two-node cluster. Why shouldn't it be possible to
>>>>>>>>>>>> scale it back to one single node again?
>>>>>>>>>>>>>>          /* Check to see that the node is in admin locked
>>>>>>>>>>>>>> state before delete */
>>>>>>>>>>>>>>          if (node->saAmfNodeAdminState !=
>>>>>>>>>>>>>> SA_AMF_ADMIN_LOCKED_INSTANTIATION) {
>>>>>>>>>>>>>> report_ccb_validation_error(opdata, "Node '%s' is
>>>>>>>>>>>>>> not locked instantiation", opdata->objectName.value); diff
>>>>>>>>>>>>>> --git a/osaf/services/saf/amf/amfd/role.cc
>>>>>>>>>>>>>> b/osaf/services/saf/amf/amfd/role.cc
>>>>>>>>>>>>>> --- a/osaf/services/saf/amf/amfd/role.cc
>>>>>>>>>>>>>> +++ b/osaf/services/saf/amf/amfd/role.cc
>>>>>>>>>>>>>> @@ -46,6 +46,7 @@
>>>>>>>>>>>>>>      #include <si_dep.h>
>>>>>>>>>>>>>>      #include "osaf_utility.h"
>>>>>>>>>>>>>>      #include "role.h"
>>>>>>>>>>>>>> +#include "nid_api.h"
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>      extern pthread_mutex_t imm_reinit_mutex;
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> @@ -73,7 +74,15 @@ void avd_role_change_evh(AVD_CL_CB
>>>> *cb,
>>>>>>>>>>>>>> AVD_ROLE_CHG_CAUSE_T cause =
>>>>>>>>>>>>>> msg->msg_info.d2d_chg_role_req.cause;
>>>>>>>>>>>>>>          SaAmfHAStateT role =
>>>>>>>>>>>>>> msg->msg_info.d2d_chg_role_req.role;
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> -    TRACE_ENTER2("cause=%u, role=%u", cause, role);
>>>>>>>>>>>>>> +    TRACE_ENTER2("cause=%u, role=%u, current_role=%u", cause, role,
>>>>>>>>>>>>>> +                 cb->avail_state_avd);
>>>>>>>>>>>>>> +
>>>>>>>>>>>>>> +    if ((status = initialize_for_assignment(cb, role))
>>>>>>>>>>>>>> +        != NCSCC_RC_SUCCESS) {
>>>>>>>>>>>>>> + LOG_ER("initialize_for_assignment FAILED %u",
>>>>>>>>>>>>>> +            (unsigned) status);
>>>>>>>>>>>>>> +        _exit(EXIT_FAILURE);
>>>>>>>>>>>>>> +    }
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>          if (cb->avail_state_avd == role) {
>>>>>>>>>>>>>>              goto done;
>>>>>>>>>>>>>> @@ -128,6 +137,13 @@ void avd_role_change_evh(AVD_CL_CB *cb,
>>>>>>>>>>>>>>          }
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>          if ((cause == AVD_FAIL_OVER) &&
>>>>>>>>>>>>>> +        (cb->avail_state_avd == SA_AMF_HA_QUIESCED) && (role == SA_AMF_HA_STANDBY)) {
>>>>>>>>>>>>>> +        /* Fail-over Quiesced to Active */
>>>>>>>>>>>>>> +        status = NCSCC_RC_SUCCESS;
>>>>>>>>>>>>>> +        goto done;
>>>>>>>>>>>>>> +    }
>>>>>>>>>>>>> [Praveen] A controller goes from quiesced to standby only in
>>>>>>>>>>>>> switchover, never in failover.
>>>>>>>>>>>>> So the comment must be /* Fail-over Quiesced to Standby (spare
>>>>>>>>>>>>> controller role change) */
>>>>>>>>>>>> [AndersW] Will update the comment.
>>>>>>>>>>>>>> +
>>>>>>>>>>>>>> +    if ((cause == AVD_FAIL_OVER) &&
>>>>>>>>>>>>>>              (cb->avail_state_avd == SA_AMF_HA_QUIESCED) &&
>>>>>>>>>>>>>> (role ==
>>>>>>>>>>>>>> SA_AMF_HA_ACTIVE)) {
>>>>>>>>>>>>>>              /* Fail-over Quiesced to Active */
>>>>>>>>>>>>>>              status = avd_role_failover_qsd_actv(cb, role);
>>>>>>>>>>>>>> @@ -155,7 +171,73 @@ void avd_role_change_evh(AVD_CL_CB *cb,
>>>>>>>>>>>>>>          return;
>>>>>>>>>>>>>>      }
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> -/****************************************************************************\
>>>>>>>>>>>>>> +uint32_t initialize_for_assignment(cl_cb_tag* cb, SaAmfHAStateT ha_state)
>>>>>>>>>>>>>> +{
>>>>>>>>>>>>>> +    TRACE_ENTER2("ha_state = %d",
>>>>>>>>>>>>>> static_cast<int>(ha_state));
>>>>>>>>>>>>>> +    SaVersionT ntfVersion = {'A', 0x01, 0x01};
>>>>>>>>>>>>>> +    uint32_t rc = NCSCC_RC_SUCCESS;
>>>>>>>>>>>>>> +    SaAisErrorT error;
>>>>>>>>>>>>>> +    if (cb->fully_initialized) goto done;
>>>>>>>>>>>>>> +    cb->avail_state_avd = ha_state;
>>>>>>>>>>>>>> +    if (ha_state == SA_AMF_HA_QUIESCED) {
>>>>>>>>>>>>>> +        if ((rc = nid_notify(const_cast<char*>("AMFD"),
>>>>>>>>>>>>>> +                     NCSCC_RC_SUCCESS, nullptr)) !=
>>>>>>>>>>>>>> NCSCC_RC_SUCCESS) {
>>>>>>>>>>>>>> +            LOG_ER("nid_notify failed");
>>>>>>>>>>>>>> +        }
>>>>>>>>>>>>>> +        goto done;
>>>>>>>>>>>>>> +    }
>>>>>>>>>>>>>> +    if ((rc = avd_mds_init(cb)) != NCSCC_RC_SUCCESS) {
>>>>>>>>>>>>>> +        LOG_ER("avd_mds_init FAILED");
>>>>>>>>>>>>>> +        goto done;
>>>>>>>>>>>>>> +    }
>>>>>>>>>>>>>> +    if ((rc = avsv_mbcsv_register(cb)) != 
>>>>>>>>>>>>>> NCSCC_RC_SUCCESS) {
>>>>>>>>>>>>>> +        LOG_ER("avsv_mbcsv_register FAILED");
>>>>>>>>>>>>>> +        goto done;
>>>>>>>>>>>>>> +    }
>>>>>>>>>>>>>> +    if (avd_clm_init() != SA_AIS_OK) {
>>>>>>>>>>>>>> +        LOG_EM("avd_clm_init FAILED");
>>>>>>>>>>>>>> +        rc = NCSCC_RC_FAILURE;
>>>>>>>>>>>>>> +        goto done;
>>>>>>>>>>>>>> +    }
>>>>>>>>>>>>>> +    if (avd_imm_init(cb) != SA_AIS_OK) {
>>>>>>>>>>>>>> +        LOG_ER("avd_imm_init FAILED");
>>>>>>>>>>>>>> +        rc = NCSCC_RC_FAILURE;
>>>>>>>>>>>>>> +        goto done;
>>>>>>>>>>>>>> +    }
>>>>>>>>>>>>>> +    if ((error = saNtfInitialize(&cb->ntfHandle, nullptr,
>>>>>>>>>>>>>> &ntfVersion)) !=
>>>>>>>>>>>>>> +        SA_AIS_OK) {
>>>>>>>>>>>>>> +        LOG_ER("saNtfInitialize Failed (%u)", error);
>>>>>>>>>>>>>> +        rc = NCSCC_RC_FAILURE;
>>>>>>>>>>>>>> +        goto done;
>>>>>>>>>>>>>> +    }
>>>>>>>>>>>>>> +    if ((rc = avd_mds_set_vdest_role(cb, ha_state)) !=
>>>>>>>>>>>>>> NCSCC_RC_SUCCESS) {
>>>>>>>>>>>>>> +        LOG_ER("avd_mds_set_vdest_role FAILED");
>>>>>>>>>>>>>> +        goto done;
>>>>>>>>>>>>>> +    }
>>>>>>>>>>>>>> +    if ((rc = avsv_set_ckpt_role(cb, ha_state)) !=
>>>>>>>>>>>>>> NCSCC_RC_SUCCESS) {
>>>>>>>>>>>>>> +        LOG_ER("avsv_set_ckpt_role FAILED");
>>>>>>>>>>>>>> +        goto done;
>>>>>>>>>>>>>> +    }
>>>>>>>>>>>>>> +    if (ha_state == SA_AMF_HA_ACTIVE) {
>>>>>>>>>>>>>> +        rc = avd_active_role_initialization(cb, ha_state);
>>>>>>>>>>>>>> +        if (rc != NCSCC_RC_SUCCESS) {
>>>>>>>>>>>>>> + LOG_ER("avd_active_role_initialization FAILED");
>>>>>>>>>>>>>> +            goto done;
>>>>>>>>>>>>>> +        }
>>>>>>>>>>>>>> +    } else if (ha_state == SA_AMF_HA_STANDBY) {
>>>>>>>>>>>>>> +        rc = avd_standby_role_initialization(cb);
>>>>>>>>>>>>>> +        if (rc != NCSCC_RC_SUCCESS) {
>>>>>>>>>>>>>> + LOG_ER("avd_standby_role_initialization FAILED");
>>>>>>>>>>>>>> +            goto done;
>>>>>>>>>>>>>> +        }
>>>>>>>>>>>>>> +    }
>>>>>>>>>>>>>> +    cb->fully_initialized = true;
>>>>>>>>>>>>>> +done:
>>>>>>>>>>>>>> +    TRACE_LEAVE2("rc = %u", rc);
>>>>>>>>>>>>>> +     return rc;
>>>>>>>>>>>>>> +}
>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> +/****************************************************************************\
>>>>>>>>>>>>>>       * Function: avd_init_role_set
>>>>>>>>>>>>>>       *
>>>>>>>>>>>>>>       * Purpose:  AVSV function to handle AVD's initial
>>>>>>>>>>>>>> role setting.
>>>>>>>>>>>>>> diff --git a/osaf/services/saf/amf/amfd/sgproc.cc
>>>>>>>>>>>>>> b/osaf/services/saf/amf/amfd/sgproc.cc
>>>>>>>>>>>>>> --- a/osaf/services/saf/amf/amfd/sgproc.cc
>>>>>>>>>>>>>> +++ b/osaf/services/saf/amf/amfd/sgproc.cc
>>>>>>>>>>>>>> @@ -1390,7 +1390,16 @@ void avd_su_si_assign_evh(AVD_CL_CB *cb,
>>>>>>>>>>>>>> /* Since a NCS SU has been assigned
>>>>>>>>>>>>>> trigger the node FSM. */
>>>>>>>>>>>>>>                          /* For (ncs_spec == SA_TRUE), su 
>>>>>>>>>>>>>> will
>>>>>>>>>>>>>> not be external, so su
>>>>>>>>>>>>>>                                 will have node attached. */
>>>>>>>>>>>>>> - avd_nd_ncs_su_assigned(cb,
>>>>>>>>>>>>>> susi->su->su_on_node);
>>>>>>>>>>>>>> +                    for (AmfDb<uint32_t, AVD_AVND>::const_iterator it = node_id_db->begin();
>>>>>>>>>>>>>> +                        it != node_id_db->end(); it++) {
>>>>>>>>>>>>>> +                        AVD_AVND *node = const_cast<AVD_AVND*>((*it).second);
>>>>>>>>>>>>>> +
>>>>>>>>>>>>>> +                        if (node->node_state == AVD_AVND_STATE_NCS_INIT && node->adest != 0) {
>>>>>>>>>>>>>> +                            avd_nd_ncs_su_assigned(cb, node);
>>>>>>>>>>>>>> +                        } else {
>>>>>>>>>>>>>> +                            TRACE("Node_state: %u adest: %" PRIx64 " node not ready for assignments", node->node_state, node->adest);
>>>>>>>>>>>>>> +                        }
>>>>>>>>>>>>>> +                    }
>>>>>>>>>>>>>> +                    }
>>>>>>>>>>>>> [Praveen] I could not understand this change. Spare
>>>>>>>>>>>>> controllers are also payloads, in which case adest can be 0.
>>>>>>>>>>>>> Is this for the headless case, where comp and SU assignment
>>>>>>>>>>>>> information comes before the node up message?
>>>>>>>>>>>> [AndersW] This loop is needed to ensure set_leds is performed.
>>>>>>>>>>>> It is sufficient that we have an active and a standby
>>>>>>>>>>>> assignment for the OpenSAF 2N SU, so once we have that we need
>>>>>>>>>>>> to loop over all nodes and perform set_leds if possible.
>>>>>>>>>>>>
>>>>>>>>>>>> I think the check that adest is non-zero was added by Hans, to
>>>>>>>>>>>> fix some problem that might be related to headless if I
>>>>>>>>>>>> remember correctly. @Hans, do you remember why this check was
>>>>>>>>>>>> needed?
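As a rough illustration of the loop discussed above, here is a minimal
standalone sketch in C++. The Node struct, the NodeState enum and
nodes_ready_for_assignment() are hypothetical simplifications, not the amfd
types. The condition mirrors the one in the patch (NCS_INIT state and a
non-zero adest); why the adest check is needed is left open in this thread,
so the sketch does not assume a reason.

```cpp
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical, simplified node record; the real AVD_AVND has many more fields.
enum class NodeState { kAbsent, kNcsInit, kPresent };

struct Node {
  NodeState state;
  uint64_t adest;  // mirrors the patch: a value of 0 is treated as "not ready"
};

// Walks every known node and returns the IDs of those that satisfy the same
// condition as the patch: state is NCS_INIT and adest is non-zero.
std::vector<uint32_t> nodes_ready_for_assignment(
    const std::map<uint32_t, Node>& nodes) {
  std::vector<uint32_t> ready;
  for (const auto& entry : nodes) {
    const Node& node = entry.second;
    if (node.state == NodeState::kNcsInit && node.adest != 0) {
      ready.push_back(entry.first);
    }
  }
  return ready;
}
```

In the patch, the equivalent loop calls avd_nd_ncs_su_assigned() for each
ready node and only traces the others.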
>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>                  }
>>>>>>>>>>>>>>              } else {
>>>>>>>>>>>>>> diff --git a/osaf/services/saf/amf/amfnd/clm.cc
>>>>>>>>>>>>>> b/osaf/services/saf/amf/amfnd/clm.cc
>>>>>>>>>>>>>> --- a/osaf/services/saf/amf/amfnd/clm.cc
>>>>>>>>>>>>>> +++ b/osaf/services/saf/amf/amfnd/clm.cc
>>>>>>>>>>>>>> @@ -37,6 +37,7 @@
>>>>>>>>>>>>>>      #include "avnd.h"
>>>>>>>>>>>>>>      #include "mds_pvt.h"
>>>>>>>>>>>>>>      #include "nid_api.h"
>>>>>>>>>>>>>> +#include "osaf_time.h"
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>      static void clm_node_left(SaClmNodeIdT node_id)
>>>>>>>>>>>>>>      {
>>>>>>>>>>>>>> @@ -166,7 +167,6 @@ uint32_t avnd_evt_avd_node_up_evh(AVND_C
>>>>>>>>>>>>>>          info = &evt->info.avd->msg_info.d2n_node_up;
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>          /*** update this node with the supplied parameters
>>>>>>>>>>>>>> ***/
>>>>>>>>>>>>>> -    cb->type = info->node_type;
>>>>>>>>>>>>>>          cb->su_failover_max = info->su_failover_max;
>>>>>>>>>>>>>>          cb->su_failover_prob = info->su_failover_prob;
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> @@ -249,8 +249,6 @@ done:
>>>>>>>>>>>>>>          return;
>>>>>>>>>>>>>>      }
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> -static SaVersionT Version = { 'B', 4, 1 };
>>>>>>>>>>>>>> -
>>>>>>>>>>>>>>      static const SaClmCallbacksT_4 callbacks = {
>>>>>>>>>>>>>>              0,
>>>>>>>>>>>>>> /*.saClmClusterTrackCallback =*/ clm_track_cb
>>>>>>>>>>>>>> @@ -263,11 +261,24 @@ SaAisErrorT avnd_clm_init(void)
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>          TRACE_ENTER();
>>>>>>>>>>>>>>          avnd_cb->first_time_up = true;
>>>>>>>>>>>>> [Praveen] I did not get the reason for its removal.
>>>>>>>>>>>>> It is not being updated anywhere else in the patch.
>>>>>>>>>>>> [AndersW] Nothing is removed here. The CLM initialization loop
>>>>>>>>>>>> has been enhanced to handle more error codes (TIMEOUT &
>>>>>>>>>>>> UNAVAILABLE).
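The retry pattern described here can be sketched on its own as follows. This
is a minimal sketch and not the amfnd code: Status, is_transient() and
init_with_retry() are hypothetical names standing in for the SA_AIS_* codes
and the saClmInitialize_4() call, and the sleep interval is chosen by the
caller.

```cpp
#include <chrono>
#include <functional>
#include <thread>

// Hypothetical stand-ins for the SA_AIS_* return codes (not the SAF headers).
enum class Status { kOk, kTryAgain, kTimeout, kUnavailable, kInvalidParam };

// Errors that are treated as temporary and therefore retried.
inline bool is_transient(Status s) {
  return s == Status::kTryAgain || s == Status::kTimeout ||
         s == Status::kUnavailable;
}

// Calls init() repeatedly until it returns something other than a transient
// error, sleeping for `delay` between attempts. Returns the final status,
// which is either kOk or a non-retryable error.
Status init_with_retry(const std::function<Status()>& init,
                       std::chrono::milliseconds delay) {
  for (;;) {
    const Status s = init();
    if (!is_transient(s)) return s;
    std::this_thread::sleep_for(delay);
  }
}
```

Like the loop in the patch, this sketch retries without an upper bound, so
the caller must be prepared to block until the service answers.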
>>>>>>>>>>>>>> -    error = saClmInitialize_4(&avnd_cb->clmHandle,
>>>>>>>>>>>>>> &callbacks,
>>>>>>>>>>>>>> &Version);
>>>>>>>>>>>>>> -        if (SA_AIS_OK != error) {
>>>>>>>>>>>>>> -                LOG_ER("Failed to Initialize with CLM:
>>>>>>>>>>>>>> %u", error);
>>>>>>>>>>>>>> -                goto done;
>>>>>>>>>>>>>> -        }
>>>>>>>>>>>>>> +    for (;;) {
>>>>>>>>>>>>>> +        SaVersionT Version = { 'B', 4, 1 };
>>>>>>>>>>>>>> +        error = saClmInitialize_4(&avnd_cb->clmHandle, &callbacks,
>>>>>>>>>>>>>> +                                  &Version);
>>>>>>>>>>>>>> +        if (error == SA_AIS_ERR_TRY_AGAIN ||
>>>>>>>>>>>>>> +            error == SA_AIS_ERR_TIMEOUT ||
>>>>>>>>>>>>>> +                    error == SA_AIS_ERR_UNAVAILABLE) {
>>>>>>>>>>>>>> +            if (error != SA_AIS_ERR_TRY_AGAIN) {
>>>>>>>>>>>>>> + LOG_WA("saClmInitialize_4 returned %u",
>>>>>>>>>>>>>> +                       (unsigned) error);
>>>>>>>>>>>>>> +            }
>>>>>>>>>>>>>> + osaf_nanosleep(&kHundredMilliseconds);
>>>>>>>>>>>>>> +            continue;
>>>>>>>>>>>>>> +        }
>>>>>>>>>>>>>> +        if (error == SA_AIS_OK) break;
>>>>>>>>>>>>>> +        LOG_ER("Failed to Initialize with CLM: %u", error);
>>>>>>>>>>>>>> +        goto done;
>>>>>>>>>>>>>> +    }
>>>>>>>>>>>>>>          error = saClmSelectionObjectGet(avnd_cb->clmHandle,
>>>>>>>>>>>>>> &avnd_cb->clm_sel_obj);
>>>>>>>>>>>>>>              if (SA_AIS_OK != error) {
>>>>>>>>>>>>>>                      LOG_ER("Failed to get CLM
>>>>>>>>>>>>>> selectionObject:
>>>>>>>>>>>>>> %u", error);
>>>>>>>>>>>>>>
>>>>>>>> _______________________________________________
>>>>>>>> Opensaf-devel mailing list
>>>>>>>> [email protected]
>>>>>>>> https://lists.sourceforge.net/lists/listinfo/opensaf-devel
>>

