I think controlling this behavior at run time should be okay. (I didn't intend to say that introducing a build option is the only way, And the aim was not for saving memory either :-))
So, to start with let's then introduce a global runtime environment variable(?) or IMM attribute - that would control this standalone configuration changes to succeed. We could think of adding restrictions to cluster-size later on perhaps. Cheers, Mathi. >-----Original Message----- >From: Anders Widell [mailto:[email protected]] >Sent: Friday, April 01, 2016 7:28 PM >To: Mathivanan Naickan Palanivelu; Praveen Malviya; >[email protected]; [email protected]; Nagendra Kumar >Cc: [email protected] >Subject: Re: [devel] [PATCH 1 of 1] amf: Support AMF configurations >containing more than two OpenSAF 2N SUs [#79] > >Now you are talking about the possibility to save some memory on a 1-node >system by e.g. introducing #ifdefs in the code to disable code paths that deal >with redundancy. I can agree that in some very resource-constrained >embedded system this could make sense, but on the other hand, this is not >good from a testing perspective. You would have two separate builds that >needs to be tested. From a developer's perspective, it is not good either. >#ifdefs will mess up the code, and the developers also need to build and test >the code twice. And what happens when you get the second feature, which is >also enabled and disabled using #ifdefs. You will get four possible builds. >With >the third feature, eight possible builds. > >On the other hand, a redundant system must work also on a single node. >Otherwise it wouldn't be redundant - i.e. if you /require/ two nodes for the >system to operate then you have a 2-node system which is less available than >just one single node. If any of the two nodes fails then the whole system >fails. >So we must in any case test that we can run on a single node without >problems. Unless we have a very strong use-case for saving those extra bytes >of code I would not like to introduce a build-time option for disabling >redundancy. > >We could introduce a run-time option for disallowing single-node >configurations. In fact, I would argue in such a case that it would be an >option >to set the minimum allowed number of nodes in the system - i.e. the >minimum size you can scale down to. It doesn't necessarily have to be two; >maybe someone wishes to disallow any configuration with less than five >nodes. > >regards, >Anders Widell > >On 04/01/2016 03:05 PM, Mathivanan Naickan Palanivelu wrote: >> We could tie the symmetry to the 'deploy-standlone' option chosen. >> i.e. if 'deploy-standone' feature is enabled don't expect(wherever) >> standbys including to not expect things like setting up shared file system >> and >all code flows that expect a second/alternate to take over, etc! >> I think this has to be thought through and not just considered as a >> matter of 'allowing a configuration change' for a tool to work. >> It is another thing that we should discussed this when the asymmetry in >configuration got created! >> If it's about reflecting the 'state' then there is sufficient states >> to convey the status of that application. >> >> >>> -----Original Message----- >>> From: Anders Widell [mailto:[email protected]] >>> Sent: Friday, April 01, 2016 5:49 PM >>> To: Mathivanan Naickan Palanivelu; Praveen Malviya; >>> [email protected]; [email protected]; Nagendra >Kumar >>> Cc: [email protected] >>> Subject: Re: [devel] [PATCH 1 of 1] amf: Support AMF configurations >>> containing more than two OpenSAF 2N SUs [#79] >>> >>> Today it is perfectly possible to configure a 1-node system without >>> using any build-time option. The problem today is that if you expand >>> a 1-node system to a 2-node system, then you cannot later shrink it >>> back into a 1-node system again. This asymmetry that you cannot go >>> back to where you came from makes little sense to me. Suppose that >>> the second node has been physically removed, so that you really have >>> only one single node. What harm would it do to update the OpenSAF >configuration to reflect this fact of reality? >>> >>> regards, >>> Anders Widell >>> >>> On 04/01/2016 12:54 PM, Mathivanan Naickan Palanivelu wrote: >>>> Ofcourse there could be applications(in different problem domains) >>>> that >>> could be configured to run in standalone mode or in HA mode. >>>> Such applications could still be configured(in AMF) to be run on >>>> just 1 >>> OpenSAF node, nothing in OpenSAF stops standalone applications today! >>>> But, I don't think it is necessary to trickle down an application's >>>> standalone >>> configuration into OpenSAF's configuration too! >>>> Somehow this topic has come up again, and I don't understand the >>>> requirement behind why a user's tool expects HA middleware also to >>>> be >>> configured in standalone mode! >>>> If we still want to satisfy such(adhoc?) tools, then we could do >>>> that by bringing in a 'standalone' deployment option in OpenSAF, i.e. >>>> ./configure --deploy-standone. When this option is enable then >>>> immxml- >>> tools, AMF and any other OpenSAF service would not expect a mandatory >>> STANDBY to exist, thereby also allow deletion or any other operation >>> of its interest! >>>> Cheers, >>>> Mathi. >>>> >>>>> -----Original Message----- >>>>> From: praveen malviya >>>>> Sent: Friday, April 01, 2016 3:52 PM >>>>> To: Anders Widell; [email protected]; >>>>> [email protected]; Nagendra Kumar >>>>> Cc: [email protected] >>>>> Subject: Re: [devel] [PATCH 1 of 1] amf: Support AMF configurations >>>>> containing more than two OpenSAF 2N SUs [#79] >>>>> >>>>> Please see my comments inline below: >>>>> >>>>> On 31-Mar-16 6:50 PM, Anders Widell wrote: >>>>>> Yes I added the check in the admin op as you suggested. But I >>>>>> don't fully agree that the same check should be done when removing >>>>>> system controller nodes. With the introduction of this feature, we >>>>>> are starting to move away from the concept of different node types >>>>>> (controller / payload). Indeed, I removed the "type" member >>>>>> variable from the node class in the latest patch. >>>>> [Praveen] >>>>> Yes, all nodes can be controller nodes or rather all nodes are the >>>>> same except their roles. But that does not change the architecture >>>>> of OpenSAF w.r.t the redundancy model. i.e. The configuration is >>>>> still 2N >>> redundancy model. >>>>> i.e. The smallest opensaf sized cluster (without payloads) would >>>>> still be configured either of the two options as below: >>>>> (a) immxml-clustersize -s 2 -p0 >>>>> During scale out, for spare addition atleast one standy has to >>>>> exist before proceeding to configure the rest of the cluster nodes >>>>> as standbys >>>>> (OR) >>>>> (b) immxml-clustersize -s 3 -p0 >>>>> Obviously the 3rd node and all other nodes added after the two >>>>> nodes would act as spares. >>>>> >>>>>> To preserve backwards compatibility, I can agree to have this >>>>>> check on systems that are configured with both system controller >>>>>> nodes and payload nodes (as in your example with >>>>>> immxml-clustersize). This would mean that AMF will reject removal >>>>>> of any of the last two system controller nodes - IF the cluster >>>>>> has payload nodes. What do you think >>>>> [Praveen] >>>>> That is the normal case anyhow today. Only difference we have to >>>>> make now is to allow deletion of spare nodes whether there are >payloads or not. >>>>> >>>>> Thanks, >>>>> Praveen. >>>>> >>>>>> about this approach? I am not sure how easy this would be to >>>>>> implement, but I can give it a try. >>>>>> >>>>>> regards, >>>>>> Anders Widell >>>>>> >>>>>> On 03/31/2016 03:05 PM, praveen malviya wrote: >>>>>>> Hi, >>>>>>> >>>>>>> In the diff patch, I have seen that admin operation on MW 2N SU >>>>>>> is not allowed when more than 2 SUs are configured which >>>>>>> translates to the fact that system is running with spare controllers. >>>>>>> >>>>>>> A similar type of check is needed for deletion of controller >>>>>>> configuration from AMF. The check would not allow deletion of >>>>>>> controller if only two of them remained. It is inline with the >>>>>>> current OpenSAF implementation with the fact that we do not allow >>>>>>> any payload to get configured when user tries to generate imm.xml >>>>>>> with single controller and a given no. of payload because such a >>>>>>> configuration does not provide redundancy to payloads. >>>>>>> >>>>>>> Note: ./immxml-clustersize -s 1 -p 1 >>>>>>> error: Two SC's is required for clusters with payloads. Exiting! >>>>>>> >>>>>>> >>>>>>> Thanks, >>>>>>> Praveen >>>>>>> >>>>>>> On 31-Mar-16 2:05 AM, Anders Widell wrote: >>>>>>>> Here is a patch that addresses the review comments from Hans and >>>>>>>> Praveen. It should be applied on top of the AMF patch that was >>>>>>>> sent out for review. >>>>>>>> >>>>>>>> thanks, >>>>>>>> Anders Widell >>>>>>>> >>>>>>>> On 03/30/2016 04:35 PM, Anders Widell wrote: >>>>>>>>> Hi! >>>>>>>>> >>>>>>>>> See my replies inline, marked [AndersW]. >>>>>>>>> >>>>>>>>> regards, >>>>>>>>> Anders Widell >>>>>>>>> >>>>>>>>> On 03/17/2016 11:32 AM, praveen malviya wrote: >>>>>>>>>> Hi Anders, >>>>>>>>>> >>>>>>>>>> Please find some comments and queries inline with [Praveen] >>>>>>>>>> >>>>>>>>>> Thanks, >>>>>>>>>> Praveen >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> On 29-Feb-16 8:44 PM, Anders Widell wrote: >>>>>>>>>>> osaf/services/saf/amf/amfd/clm.cc | 21 +++++- >>>>>>>>>>> osaf/services/saf/amf/amfd/include/amfd.h | 2 + >>>>>>>>>>> osaf/services/saf/amf/amfd/include/cb.h | 1 + >>>>>>>>>>> osaf/services/saf/amf/amfd/include/role.h | 2 + >>>>>>>>>>> osaf/services/saf/amf/amfd/main.cc | 78 >>>>>>>>>>> ++++++--------------------- >>>>>>>>>>> osaf/services/saf/amf/amfd/ndfsm.cc | 9 ++- >>>>>>>>>>> osaf/services/saf/amf/amfd/node.cc | 7 +-- >>>>>>>>>>> osaf/services/saf/amf/amfd/role.cc | 86 >>>>>>>>>>> ++++++++++++++++++++++++++++++- >>>>>>>>>>> osaf/services/saf/amf/amfd/sgproc.cc | 11 +++- >>>>>>>>>>> osaf/services/saf/amf/amfnd/clm.cc | 27 ++++++-- >>>>>>>>>>> 10 files changed, 160 insertions(+), 84 deletions(-) >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> Add support for configuring the system with more than two >>>>>>>>>>> OpenSAF >>>>> 2N >>>>>>>>>>> SUs. In >>>>>>>>>>> particular, this means that all OpenSAF directors must >>>>>>>>>>> support starting up and running without (initially) getting >>>>>>>>>>> any assignment from AMF. >>>>>>>>>>> Locking of >>>>>>>>>>> an OpenSAF 2N SU is currently not supported on a system >>>>>>>>>>> configured with more than two OpenSAF 2N SUs. >>>>>>>>>> [Praveen] This patch does not contain any change for any >>>>>>>>>> restricton on locking of OpenSAF 2N SU as mentioned above. >>>>>>>>> [AndersW] No. This restriction will be documented. Do you think >>>>>>>>> we need to add checks for this case in the admin op as well? >>>>>>>>>>> diff --git a/osaf/services/saf/amf/amfd/clm.cc >>>>>>>>>>> b/osaf/services/saf/amf/amfd/clm.cc >>>>>>>>>>> --- a/osaf/services/saf/amf/amfd/clm.cc >>>>>>>>>>> +++ b/osaf/services/saf/amf/amfd/clm.cc >>>>>>>>>>> @@ -21,8 +21,7 @@ >>>>>>>>>>> #include <amfd.h> >>>>>>>>>>> #include <clm.h> >>>>>>>>>>> #include <node.h> >>>>>>>>>>> - >>>>>>>>>>> -static SaVersionT clmVersion = { 'B', 4, 1 }; >>>>>>>>>>> +#include "osaf_time.h" >>>>>>>>>>> >>>>>>>>>>> static void clm_node_join_complete(AVD_AVND *node) >>>>>>>>>>> { >>>>>>>>>>> @@ -392,9 +391,21 @@ SaAisErrorT avd_clm_init(void) >>>>>>>>>>> SaAisErrorT error = SA_AIS_OK; >>>>>>>>>>> >>>>>>>>>>> TRACE_ENTER(); >>>>>>>>>>> - error = saClmInitialize_4(&avd_cb->clmHandle, >&clm_callbacks, >>>>>>>>>>> &clmVersion); >>>>>>>>>>> - if (SA_AIS_OK != error) { >>>>>>>>>>> - LOG_ER("Failed to initialize with CLM %u", error); >>>>>>>>>>> + for (;;) { >>>>>>>>>>> + SaVersionT Version = { 'B', 4, 1 }; >>>>>>>>>>> + error = saClmInitialize_4(&avd_cb->clmHandle, >>>>>>>>>>> &clm_callbacks, &Version); >>>>>>>>>>> + if (error == SA_AIS_ERR_TRY_AGAIN || >>>>>>>>>>> + error == SA_AIS_ERR_TIMEOUT || >>>>>>>>>>> + error == SA_AIS_ERR_UNAVAILABLE) { >>>>>>>>>>> + if (error != SA_AIS_ERR_TRY_AGAIN) { >>>>>>>>>>> + LOG_WA("saClmInitialize_4 returned %u", >>>>>>>>>>> + (unsigned) error); >>>>>>>>>>> + } >>>>>>>>>>> + osaf_nanosleep(&kHundredMilliseconds); >>>>>>>>>>> + continue; >>>>>>>>>>> + } >>>>>>>>>>> + if (error == SA_AIS_OK) break; >>>>>>>>>>> + LOG_ER("Failed to Initialize with CLM: %u", error); >>>>>>>>>>> goto done; >>>>>>>>>>> } >>>>>>>>>>> error = saClmSelectionObjectGet(avd_cb->clmHandle, >>>>>>>>>>> &avd_cb->clm_sel_obj); >>>>>>>>>>> diff --git a/osaf/services/saf/amf/amfd/include/amfd.h >>>>>>>>>>> b/osaf/services/saf/amf/amfd/include/amfd.h >>>>>>>>>>> --- a/osaf/services/saf/amf/amfd/include/amfd.h >>>>>>>>>>> +++ b/osaf/services/saf/amf/amfd/include/amfd.h >>>>>>>>>>> @@ -33,6 +33,7 @@ >>>>>>>>>>> #ifndef AVD_H >>>>>>>>>>> #define AVD_H >>>>>>>>>>> >>>>>>>>>>> +#include <stdint.h> >>>>>>>>>>> #include "logtrace.h" >>>>>>>>>>> >>>>>>>>>>> #include "amf.h" >>>>>>>>>>> @@ -65,5 +66,6 @@ >>>>>>>>>>> #include "ckpt_msg.h" >>>>>>>>>>> #include "ckpt_edu.h" >>>>>>>>>>> #include "ckpt_updt.h" >>>>>>>>>>> +#include "saAmf.h" >>>>>>>>>>> >>>>>>>>>>> #endif >>>>>>>>>>> diff --git a/osaf/services/saf/amf/amfd/include/cb.h >>>>>>>>>>> b/osaf/services/saf/amf/amfd/include/cb.h >>>>>>>>>>> --- a/osaf/services/saf/amf/amfd/include/cb.h >>>>>>>>>>> +++ b/osaf/services/saf/amf/amfd/include/cb.h >>>>>>>>>>> @@ -207,6 +207,7 @@ typedef struct cl_cb_tag { >>>>>>>>>>> SaClmHandleT clmHandle; >>>>>>>>>>> SaSelectionObjectT clm_sel_obj; >>>>>>>>>>> >>>>>>>>>>> + bool fully_initialized; >>>>>>>>>>> bool swap_switch; /* true - In middle of role switch. >>>>>>>>>>> */ >>>>>>>>>>> >>>>>>>>>>> /** true when active services (IMM, LOG, NTF, etc.) >>>>>>>>>>> exist diff --git a/osaf/services/saf/amf/amfd/include/role.h >>>>>>>>>>> b/osaf/services/saf/amf/amfd/include/role.h >>>>>>>>>>> --- a/osaf/services/saf/amf/amfd/include/role.h >>>>>>>>>>> +++ b/osaf/services/saf/amf/amfd/include/role.h >>>>>>>>>>> @@ -34,6 +34,8 @@ extern uint32_t >amfd_switch_qsd_stdby(AV >>>>>>>>>>> extern uint32_t amfd_switch_stdby_actv(AVD_CL_CB *cb); >>>>>>>>>>> extern uint32_t amfd_switch_qsd_actv(AVD_CL_CB *cb); >>>>>>>>>>> extern uint32_t amfd_switch_actv_qsd(AVD_CL_CB *cb); >>>>>>>>>>> +extern uint32_t initialize_for_assignment(cl_cb_tag* cb, >>>>>>>>>>> + SaAmfHAStateT >>>>>>>>>>> +ha_state); >>>>>>>>>>> >>>>>>>>>>> #endif /* ROLE_H */ >>>>>>>>>>> >>>>>>>>>>> diff --git a/osaf/services/saf/amf/amfd/main.cc >>>>>>>>>>> b/osaf/services/saf/amf/amfd/main.cc >>>>>>>>>>> --- a/osaf/services/saf/amf/amfd/main.cc >>>>>>>>>>> +++ b/osaf/services/saf/amf/amfd/main.cc >>>>>>>>>>> @@ -56,6 +56,7 @@ >>>>>>>>>>> #include <sutcomptype.h> >>>>>>>>>>> #include <sutype.h> >>>>>>>>>>> #include <su.h> >>>>>>>>>>> +#include "osaf_utility.h" >>>>>>>>>>> >>>>>>>>>>> static const char* internal_version_id_ __attribute__ >>>>>>>>>>> ((used)) = >>>>>>>>>>> "@(#) $Id: " INTERNAL_VERSION_ID " $"; >>>>>>>>>>> >>>>>>>>>>> @@ -445,7 +446,8 @@ static void rda_cb(uint32_t notused, PCS >>>>>>>>>>> >>>>>>>>>>> if (((avd_cb->avail_state_avd == SA_AMF_HA_STANDBY) || >>>>>>>>>>> (avd_cb->avail_state_avd == SA_AMF_HA_QUIESCED)) >&& >>>>>>>>>>> - (cb_info->info.io_role == PCS_RDA_ACTIVE)) { >>>>>>>>>>> + (cb_info->info.io_role == PCS_RDA_ACTIVE || >>>>>>>>>>> + cb_info->info.io_role == PCS_RDA_STANDBY)) { >>>>>>>>>>> >>>>>>>>>>> uint32_t rc; >>>>>>>>>>> AVD_EVT *evt; >>>>>>>>>>> @@ -474,7 +476,6 @@ static uint32_t initialize(void) >>>>>>>>>>> { >>>>>>>>>>> AVD_CL_CB *cb = avd_cb; >>>>>>>>>>> int rc = NCSCC_RC_FAILURE; >>>>>>>>>>> - SaVersionT ntfVersion = { 'A', 0x01, 0x01 }; >>>>>>>>>>> SaAmfHAStateT role; >>>>>>>>>>> char *val; >>>>>>>>>>> >>>>>>>>>>> @@ -524,8 +525,13 @@ static uint32_t initialize(void) >>>>>>>>>>> } >>>>>>>>>>> >>>>>>>>>>> cb->init_state = AVD_INIT_BGN; >>>>>>>>>>> + cb->mbcsv_sel_obj = -1; >>>>>>>>>>> + cb->imm_sel_obj = -1; >>>>>>>>>>> + cb->clm_sel_obj = -1; >>>>>>>>>>> + cb->fully_initialized = false; >>>>>>>>>>> cb->swap_switch = false; >>>>>>>>>>> cb->active_services_exist = true; >>>>>>>>>>> + cb->mbcsv_sel_obj = -1; >>>>>>>>>> [Praveen] Duplicate initialization, already done above. >>>>>>>>> [Anders W] Will remove. >>>>>>>>>>> cb->stby_sync_state = AVD_STBY_IN_SYNC; >>>>>>>>>>> cb->sync_required = true; >>>>>>>>>>> >>>>>>>>>>> @@ -544,67 +550,20 @@ static uint32_t initialize(void) >>>>>>>>>>> /* get the node id of the node on which the AVD is running. >*/ >>>>>>>>>>> cb->node_id_avd = m_NCS_GET_NODE_ID; >>>>>>>>>>> >>>>>>>>>>> - if (avd_mds_init(cb) != NCSCC_RC_SUCCESS) { >>>>>>>>>>> - LOG_ER("avd_mds_init FAILED"); >>>>>>>>>>> - goto done; >>>>>>>>>>> - } >>>>>>>>>>> - >>>>>>>>>>> - if (NCSCC_RC_FAILURE == avsv_mbcsv_register(cb)) { >>>>>>>>>>> - LOG_ER("avsv_mbcsv_register FAILED"); >>>>>>>>>>> - goto done; >>>>>>>>>>> - } >>>>>>>>>>> - >>>>>>>>>>> - if (avd_clm_init() != SA_AIS_OK) { >>>>>>>>>>> - LOG_EM("avd_clm_init FAILED"); >>>>>>>>>>> - goto done; >>>>>>>>>>> - } >>>>>>>>>>> - >>>>>>>>>>> - if (avd_imm_init(cb) != SA_AIS_OK) { >>>>>>>>>>> - LOG_ER("avd_imm_init FAILED"); >>>>>>>>>>> - goto done; >>>>>>>>>>> - } >>>>>>>>>>> - >>>>>>>>>>> - if ((rc = saNtfInitialize(&cb->ntfHandle, nullptr, >>>>>>>>>>> &ntfVersion)) != SA_AIS_OK) { >>>>>>>>>>> - LOG_ER("saNtfInitialize Failed (%u)", rc); >>>>>>>>>>> - rc = NCSCC_RC_FAILURE; >>>>>>>>>>> - goto done; >>>>>>>>>>> - } >>>>>>>>>>> - >>>>>>>>>>> if ((rc = rda_get_role(&role)) != NCSCC_RC_SUCCESS) { >>>>>>>>>>> LOG_ER("rda_get_role FAILED"); >>>>>>>>>>> goto done; >>>>>>>>>>> } >>>>>>>>>>> >>>>>>>>>>> - cb->avail_state_avd = role; >>>>>>>>>>> - >>>>>>>>>>> - if (NCSCC_RC_SUCCESS != avd_mds_set_vdest_role(cb, >role)) { >>>>>>>>>>> - LOG_ER("avd_mds_set_vdest_role FAILED"); >>>>>>>>>>> - goto done; >>>>>>>>>>> - } >>>>>>>>>>> - >>>>>>>>>>> - if (NCSCC_RC_SUCCESS != avsv_set_ckpt_role(cb, role)) { >>>>>>>>>>> - LOG_ER("avsv_set_ckpt_role FAILED"); >>>>>>>>>>> - goto done; >>>>>>>>>>> - } >>>>>>>>>>> - >>>>>>>>>>> if ((rc = rda_register_callback(0, rda_cb)) != >>>>>>>>>>> NCSCC_RC_SUCCESS) { >>>>>>>>>>> LOG_ER("rda_register_callback FAILED %u", rc); >>>>>>>>>>> goto done; >>>>>>>>>>> } >>>>>>>>>>> >>>>>>>>>>> - if (role == SA_AMF_HA_ACTIVE) { >>>>>>>>>>> - rc = avd_active_role_initialization(cb, role); >>>>>>>>>>> - if (rc != NCSCC_RC_SUCCESS) { >>>>>>>>>>> - LOG_ER("avd_active_role_initialization FAILED"); >>>>>>>>>>> - goto done; >>>>>>>>>>> - } >>>>>>>>>>> - } >>>>>>>>>>> - else { >>>>>>>>>>> - rc = avd_standby_role_initialization(cb); >>>>>>>>>>> - if (rc != NCSCC_RC_SUCCESS) { >>>>>>>>>>> - LOG_ER("avd_standby_role_initialization FAILED"); >>>>>>>>>>> - goto done; >>>>>>>>>>> - } >>>>>>>>>>> + if ((rc = initialize_for_assignment(cb, role)) >>>>>>>>>>> + != NCSCC_RC_SUCCESS) { >>>>>>>>>>> + LOG_ER("initialize_for_assignment FAILED %u", >>>>>>>>>>> + (unsigned) >>>>>>>>>>> rc); >>>>>>>>>>> + goto done; >>>>>>>>>>> } >>>>>>>>>>> >>>>>>>>>>> rc = NCSCC_RC_SUCCESS; @@ -647,14 +606,13 @@ static >>>>>>>>>>> void main_loop(void) >>>>>>>>>>> fds[FD_TERM].events = POLLIN; >>>>>>>>>>> fds[FD_MBX].fd = mbx_fd.rmv_obj; >>>>>>>>>>> fds[FD_MBX].events = POLLIN; >>>>>>>>>>> - fds[FD_MBCSV].fd = cb->mbcsv_sel_obj; >>>>>>>>>>> - fds[FD_MBCSV].events = POLLIN; >>>>>>>>>>> - fds[FD_CLM].fd = cb->clm_sel_obj; >>>>>>>>>>> - fds[FD_CLM].events = POLLIN; >>>>>>>>>>> - fds[FD_IMM].fd = cb->imm_sel_obj; // IMM fd must be last >in >>>>>>>>>>> array >>>>>>>>>>> - fds[FD_IMM].events = POLLIN; >>>>>>>>>>> - >>>>>>>>>>> while (1) { >>>>>>>>>>> + fds[FD_MBCSV].fd = cb->mbcsv_sel_obj; >>>>>>>>>>> + fds[FD_MBCSV].events = POLLIN; >>>>>>>>>>> + fds[FD_CLM].fd = cb->clm_sel_obj; >>>>>>>>>>> + fds[FD_CLM].events = POLLIN; >>>>>>>>>>> + fds[FD_IMM].fd = cb->imm_sel_obj; // IMM fd must be >>>>>>>>>>> + last in >>>>>>>>>>> array >>>>>>>>>>> + fds[FD_IMM].events = POLLIN; >>>>>>>>>>> >>>>>>>>>>> if (cb->immOiHandle != 0) { >>>>>>>>>>> fds[FD_IMM].fd = cb->imm_sel_obj; diff --git >>>>>>>>>>> a/osaf/services/saf/amf/amfd/ndfsm.cc >>>>>>>>>>> b/osaf/services/saf/amf/amfd/ndfsm.cc >>>>>>>>>>> --- a/osaf/services/saf/amf/amfd/ndfsm.cc >>>>>>>>>>> +++ b/osaf/services/saf/amf/amfd/ndfsm.cc >>>>>>>>>>> @@ -190,8 +190,9 @@ void >avd_nd_ncs_su_assigned(AVD_CL_CB >>> *c >>>>>>>>>>> TRACE_ENTER(); >>>>>>>>>>> >>>>>>>>>>> for (const auto& ncs_su : avnd->list_of_ncs_su) { >>>>>>>>>>> - if ((ncs_su->list_of_susi == AVD_SU_SI_REL_NULL) || >>>>>>>>>>> - (ncs_su->list_of_susi->fsm != >AVD_SU_SI_STATE_ASGND)) >>> { >>>>>>>>>>> + if ((ncs_su->sg_of_su->curr_assigned_sus() < 2) && >>>>>>>>>>> + ((ncs_su->list_of_susi == AVD_SU_SI_REL_NULL) || >>>>>>>>>>> + (ncs_su->list_of_susi->fsm != >>>>>>>>>>> + AVD_SU_SI_STATE_ASGND))) { >>>>>>>>>>> TRACE_LEAVE(); >>>>>>>>>>> /* this is an unassigned SU so no need to >>>>>>>>>>> scan further return here. */ >>>>>>>>>>> return; >>>>>>>>>>> @@ -328,6 +329,10 @@ void >>> avd_mds_avnd_down_evh(AVD_CL_CB >>>>> *cb >>>>>>>>>>> if (avd_cb->avail_state_avd == SA_AMF_HA_ACTIVE) { >>>>>>>>>>> avd_node_failover(node); >>>>>>>>>>> + // Update standby out of sync if standby sc goes down >>>>>>>>>>> + if (eravd_cb->node_id_avd_oth == >>>>>>>>>>> node->node_info.nodeId) { >>>>>>>>>>> + cb->stby_sync_state = AVD_STBY_OUT_OF_SYNC; >>>>>>>>>>> + } >>>>>>>>>> [Praveen] Deep down in a function call in avd_node_failover(), >>>>>>>>>> AMF marks standby status IN_SYNC in some special case. AMF >>> marks >>>>>>>>>> out of sync status whenever it gets sync request in chkop.cc >>>>>>>>>> so it is not needed here. >>>>>>>>> [AndersW] I don't remember why this was added. Will check and >>>>> remove >>>>>>>>> if it doesn't cause any problems. Isn't it a bit strange >>>>>>>>> though, so say that the standby is in sync when in fact it is down? >>>>>>>>>>> } else { >>>>>>>>>>> /* Remove dynamic info for node but keep in >>>>>>>>>>> nodeid tree. >>>>>>>>>>> * Possibly used at the end of controller >>>>>>>>>>> failover to diff --git a/osaf/services/saf/amf/amfd/node.cc >>>>>>>>>>> b/osaf/services/saf/amf/amfd/node.cc >>>>>>>>>>> --- a/osaf/services/saf/amf/amfd/node.cc >>>>>>>>>>> +++ b/osaf/services/saf/amf/amfd/node.cc >>>>>>>>>>> @@ -120,7 +120,7 @@ void AVD_AVND::initialize() { >>>>>>>>>>> pg_csi_list = {}; >>>>>>>>>>> pg_csi_list.order = NCS_DBLIST_ANY_ORDER; >>>>>>>>>>> pg_csi_list.cmp_cookie = avsv_dblist_uns32_cmp; >>>>>>>>>>> - type = AVSV_AVND_CARD_PAYLOAD; >>>>>>>>>>> + type = AVSV_AVND_CARD_SYS_CON; >>>>>>>>>> [Praveen] Initially keeping as PAYLOAD, AMFD later changes >>>>>>>>>> node type of controller AMFNDs to SYS_CON after evaluating >>>>>>>>>> some >>> parameters. >>>>>>>>>> Above default is set to SYS_CON, there is no change in this >>>>>>>>>> patch which sets payload as payload. In this way node type of >>>>>>>>>> a payload will also become SYS_CON. >>>>>>>>> [AndersW] I think actually the "type" member variable ought to >>>>>>>>> be removed, since it is not needed. I can update the patch to >>>>>>>>> remove this variable completely. >>>>>>>>>>> rcv_msg_id = {}; >>>>>>>>>>> snd_msg_id = {}; >>>>>>>>>>> cluster_list_node_next = {}; @@ -486,11 +486,6 @@ >>>>>>>>>>> static SaAisErrorT node_ccb_completed_de >>>>>>>>>>> return SA_AIS_ERR_BAD_OPERATION; >>>>>>>>>>> } >>>>>>>>>>> >>>>>>>>>>> - if (node->type == AVSV_AVND_CARD_SYS_CON) { >>>>>>>>>>> - report_ccb_validation_error(opdata, "Cannot remove >>>>>>>>>>> controller node"); >>>>>>>>>>> - return SA_AIS_ERR_BAD_OPERATION; >>>>>>>>>>> - } >>>>>>>>>>> - >>>>>>>>>> [Praveen] Based on some earliar discussion, I remember >>>>>>>>>> deletion of sys controller is restricted. So check should >>>>>>>>>> modified to reject the operation if there are only two system >>>>>>>>>> controlers in the >>> system. >>>>>>>>> [AndersW] Now you can create a cluster with many system >>>>>>>>> controller nodes (all nodes could be controllers), so we can't >>>>>>>>> have this restriction any longer. I don't see any reason to >>>>>>>>> treat the two-node case in a special way. You can create a >>>>>>>>> cluster consisting of only one single node, and then scale it >>>>>>>>> out to a two-node cluster. Why shouldn't it be possible to >>>>>>>>> scale it back to one >>> single node again? >>>>>>>>>>> /* Check to see that the node is in admin locked >>>>>>>>>>> state before delete */ >>>>>>>>>>> if (node->saAmfNodeAdminState != >>>>>>>>>>> SA_AMF_ADMIN_LOCKED_INSTANTIATION) { >>>>>>>>>>> report_ccb_validation_error(opdata, "Node '%s' is >>>>>>>>>>> not locked instantiation", opdata->objectName.value); diff >>>>>>>>>>> --git a/osaf/services/saf/amf/amfd/role.cc >>>>>>>>>>> b/osaf/services/saf/amf/amfd/role.cc >>>>>>>>>>> --- a/osaf/services/saf/amf/amfd/role.cc >>>>>>>>>>> +++ b/osaf/services/saf/amf/amfd/role.cc >>>>>>>>>>> @@ -46,6 +46,7 @@ >>>>>>>>>>> #include <si_dep.h> >>>>>>>>>>> #include "osaf_utility.h" >>>>>>>>>>> #include "role.h" >>>>>>>>>>> +#include "nid_api.h" >>>>>>>>>>> >>>>>>>>>>> extern pthread_mutex_t imm_reinit_mutex; >>>>>>>>>>> >>>>>>>>>>> @@ -73,7 +74,15 @@ void avd_role_change_evh(AVD_CL_CB >*cb, >>>>>>>>>>> AVD_ROLE_CHG_CAUSE_T cause = >>>>>>>>>>> msg->msg_info.d2d_chg_role_req.cause; >>>>>>>>>>> SaAmfHAStateT role = >>>>>>>>>>> msg->msg_info.d2d_chg_role_req.role; >>>>>>>>>>> >>>>>>>>>>> - TRACE_ENTER2("cause=%u, role=%u", cause, role); >>>>>>>>>>> + TRACE_ENTER2("cause=%u, role=%u, current_role=%u", >>>>>>>>>>> + cause, >>>>> role, >>>>>>>>>>> + cb->avail_state_avd); >>>>>>>>>>> + >>>>>>>>>>> + if ((status = initialize_for_assignment(cb, role)) >>>>>>>>>>> + != NCSCC_RC_SUCCESS) { >>>>>>>>>>> + LOG_ER("initialize_for_assignment FAILED %u", >>>>>>>>>>> + (unsigned) status); >>>>>>>>>>> + _exit(EXIT_FAILURE); >>>>>>>>>>> + } >>>>>>>>>>> >>>>>>>>>>> if (cb->avail_state_avd == role) { >>>>>>>>>>> goto done; >>>>>>>>>>> @@ -128,6 +137,13 @@ void avd_role_change_evh(AVD_CL_CB >>> *cb, >>>>>>>>>>> } >>>>>>>>>>> >>>>>>>>>>> if ((cause == AVD_FAIL_OVER) && >>>>>>>>>>> + (cb->avail_state_avd == SA_AMF_HA_QUIESCED) && (role >>>>>>>>>>> + == >>>>>>>>>>> SA_AMF_HA_STANDBY)) { >>>>>>>>>>> + /* Fail-over Quiesced to Active */ >>>>>>>>>>> + status = NCSCC_RC_SUCCESS; >>>>>>>>>>> + goto done; >>>>>>>>>>> + } >>>>>>>>>> [Praveen] A quiesced controller never goes from quiesced to >>>>>>>>>> stanby only in swichover and not in failover. >>>>>>>>>> So comment must be /*Fail-over Quiesced to standby (spare >>>>>>>>>> controller role change)*/ >>>>>>>>> [AndersW] Will update the comment. >>>>>>>>>>> + >>>>>>>>>>> + if ((cause == AVD_FAIL_OVER) && >>>>>>>>>>> (cb->avail_state_avd == SA_AMF_HA_QUIESCED) && >>>>>>>>>>> (role == >>>>>>>>>>> SA_AMF_HA_ACTIVE)) { >>>>>>>>>>> /* Fail-over Quiesced to Active */ >>>>>>>>>>> status = avd_role_failover_qsd_actv(cb, role); @@ >>>>>>>>>>> -155,7 +171,73 @@ void avd_role_change_evh(AVD_CL_CB *cb, >>>>>>>>>>> return; >>>>>>>>>>> } >>>>>>>>>>> >>>>>>>>>>> - >>> >/********************************************************** >>>>> ******************\ >>>>>>>>>>> +uint32_t initialize_for_assignment(cl_cb_tag* cb, >>>>>>>>>>> +SaAmfHAStateT >>>>>>>>>>> ha_state) >>>>>>>>>>> +{ >>>>>>>>>>> + TRACE_ENTER2("ha_state = %d", static_cast<int>(ha_state)); >>>>>>>>>>> + SaVersionT ntfVersion = {'A', 0x01, 0x01}; >>>>>>>>>>> + uint32_t rc = NCSCC_RC_SUCCESS; >>>>>>>>>>> + SaAisErrorT error; >>>>>>>>>>> + if (cb->fully_initialized) goto done; >>>>>>>>>>> + cb->avail_state_avd = ha_state; >>>>>>>>>>> + if (ha_state == SA_AMF_HA_QUIESCED) { >>>>>>>>>>> + if ((rc = nid_notify(const_cast<char*>("AMFD"), >>>>>>>>>>> + NCSCC_RC_SUCCESS, nullptr)) != >>>>>>>>>>> NCSCC_RC_SUCCESS) { >>>>>>>>>>> + LOG_ER("nid_notify failed"); >>>>>>>>>>> + } >>>>>>>>>>> + goto done; >>>>>>>>>>> + } >>>>>>>>>>> + if ((rc = avd_mds_init(cb)) != NCSCC_RC_SUCCESS) { >>>>>>>>>>> + LOG_ER("avd_mds_init FAILED"); >>>>>>>>>>> + goto done; >>>>>>>>>>> + } >>>>>>>>>>> + if ((rc = avsv_mbcsv_register(cb)) != NCSCC_RC_SUCCESS) { >>>>>>>>>>> + LOG_ER("avsv_mbcsv_register FAILED"); >>>>>>>>>>> + goto done; >>>>>>>>>>> + } >>>>>>>>>>> + if (avd_clm_init() != SA_AIS_OK) { >>>>>>>>>>> + LOG_EM("avd_clm_init FAILED"); >>>>>>>>>>> + rc = NCSCC_RC_FAILURE; >>>>>>>>>>> + goto done; >>>>>>>>>>> + } >>>>>>>>>>> + if (avd_imm_init(cb) != SA_AIS_OK) { >>>>>>>>>>> + LOG_ER("avd_imm_init FAILED"); >>>>>>>>>>> + rc = NCSCC_RC_FAILURE; >>>>>>>>>>> + goto done; >>>>>>>>>>> + } >>>>>>>>>>> + if ((error = saNtfInitialize(&cb->ntfHandle, nullptr, >>>>>>>>>>> &ntfVersion)) != >>>>>>>>>>> + SA_AIS_OK) { >>>>>>>>>>> + LOG_ER("saNtfInitialize Failed (%u)", error); >>>>>>>>>>> + rc = NCSCC_RC_FAILURE; >>>>>>>>>>> + goto done; >>>>>>>>>>> + } >>>>>>>>>>> + if ((rc = avd_mds_set_vdest_role(cb, ha_state)) != >>>>>>>>>>> NCSCC_RC_SUCCESS) { >>>>>>>>>>> + LOG_ER("avd_mds_set_vdest_role FAILED"); >>>>>>>>>>> + goto done; >>>>>>>>>>> + } >>>>>>>>>>> + if ((rc = avsv_set_ckpt_role(cb, ha_state)) != >>>>>>>>>>> NCSCC_RC_SUCCESS) { >>>>>>>>>>> + LOG_ER("avsv_set_ckpt_role FAILED"); >>>>>>>>>>> + goto done; >>>>>>>>>>> + } >>>>>>>>>>> + if (ha_state == SA_AMF_HA_ACTIVE) { >>>>>>>>>>> + rc = avd_active_role_initialization(cb, ha_state); >>>>>>>>>>> + if (rc != NCSCC_RC_SUCCESS) { >>>>>>>>>>> + LOG_ER("avd_active_role_initialization FAILED"); >>>>>>>>>>> + goto done; >>>>>>>>>>> + } >>>>>>>>>>> + } else if (ha_state == SA_AMF_HA_STANDBY) { >>>>>>>>>>> + rc = avd_standby_role_initialization(cb); >>>>>>>>>>> + if (rc != NCSCC_RC_SUCCESS) { >>>>>>>>>>> + LOG_ER("avd_standby_role_initialization FAILED"); >>>>>>>>>>> + goto done; >>>>>>>>>>> + } >>>>>>>>>>> + } >>>>>>>>>>> + cb->fully_initialized = true; >>>>>>>>>>> +done: >>>>>>>>>>> + TRACE_LEAVE2("rc = %u", rc); >>>>>>>>>>> + return rc; >>>>>>>>>>> +} >>>>>>>>>>> + >>>>>>>>>>> >>> >+/********************************************************* >>>>> ******************* >>>>>>>>>>> \ >>>>>>>>>>> * Function: avd_init_role_set >>>>>>>>>>> * >>>>>>>>>>> * Purpose: AVSV function to handle AVD's initial role setting. >>>>>>>>>>> diff --git a/osaf/services/saf/amf/amfd/sgproc.cc >>>>>>>>>>> b/osaf/services/saf/amf/amfd/sgproc.cc >>>>>>>>>>> --- a/osaf/services/saf/amf/amfd/sgproc.cc >>>>>>>>>>> +++ b/osaf/services/saf/amf/amfd/sgproc.cc >>>>>>>>>>> @@ -1390,7 +1390,16 @@ void >avd_su_si_assign_evh(AVD_CL_CB >>>>> *cb, >>>>>>>>>>> /* Since a NCS SU has been assigned >>>>>>>>>>> trigger the node FSM. */ >>>>>>>>>>> /* For (ncs_spec == SA_TRUE), su will >>>>>>>>>>> not be external, so su >>>>>>>>>>> will have node attached. */ >>>>>>>>>>> - avd_nd_ncs_su_assigned(cb, >>>>>>>>>>> susi->su->su_on_node); >>>>>>>>>>> + for (AmfDb<uint32_t, >>>>>>>>>>> + AVD_AVND>::const_iterator >>>>>>>>>>> it = node_id_db->begin(); >>>>>>>>>>> + it != node_id_db->end(); it++) { >>>>>>>>>>> + AVD_AVND *node = >>>>>>>>>>> const_cast<AVD_AVND*>((*it).second); >>>>>>>>>>> + >>>>>>>>>>> + if (node->node_state == >>>>>>>>>>> AVD_AVND_STATE_NCS_INIT && node->adest != 0) { >>>>>>>>>>> + avd_nd_ncs_su_assigned(cb, node); >>>>>>>>>>> + } else { >>>>>>>>>>> + TRACE("Node_state: %u adest: %" >>>>>>>>>>> + PRIx64 >>>>>>>>>>> " node not ready for assignments", node->node_state, node- >>>>>> adest); >>>>>>>>>>> + } >>>>>>>>>>> + } >>>>>>>>>> [Praveen] Could not understand this change? Since spare >>>>>>>>>> controllers are also payloads, in which case adest can be 0. >>>>>>>>>> Is this for headless case, there comp and su assignment >>>>>>>>>> information comes before node >>>>> up >>>>>>>>>> message. >>>>>>>>> [AndersW] This loop is needed to ensure set_leds is performed. >>>>>>>>> It is sufficient that we have an active and a standby >>>>>>>>> assignment for the OpenSAF 2N SU, so once we have that we need >>>>>>>>> to loop over all nodes >>>>> and >>>>>>>>> perform set_leds if possible. >>>>>>>>> >>>>>>>>> I think the check that adest is non-zero was added by Hans, to >>>>>>>>> fix some problem that might be related to headless if I >>>>>>>>> remember correctly. @Hans, do you remember why this check was >needed? >>>>>>>>>>> } >>>>>>>>>>> } >>>>>>>>>>> } else { >>>>>>>>>>> diff --git a/osaf/services/saf/amf/amfnd/clm.cc >>>>>>>>>>> b/osaf/services/saf/amf/amfnd/clm.cc >>>>>>>>>>> --- a/osaf/services/saf/amf/amfnd/clm.cc >>>>>>>>>>> +++ b/osaf/services/saf/amf/amfnd/clm.cc >>>>>>>>>>> @@ -37,6 +37,7 @@ >>>>>>>>>>> #include "avnd.h" >>>>>>>>>>> #include "mds_pvt.h" >>>>>>>>>>> #include "nid_api.h" >>>>>>>>>>> +#include "osaf_time.h" >>>>>>>>>>> >>>>>>>>>>> static void clm_node_left(SaClmNodeIdT node_id) >>>>>>>>>>> { >>>>>>>>>>> @@ -166,7 +167,6 @@ uint32_t >>>>> avnd_evt_avd_node_up_evh(AVND_C >>>>>>>>>>> info = &evt->info.avd->msg_info.d2n_node_up; >>>>>>>>>>> >>>>>>>>>>> /*** update this node with the supplied parameters ***/ >>>>>>>>>>> - cb->type = info->node_type; >>>>>>>>>>> cb->su_failover_max = info->su_failover_max; >>>>>>>>>>> cb->su_failover_prob = info->su_failover_prob; >>>>>>>>>>> >>>>>>>>>>> @@ -249,8 +249,6 @@ done: >>>>>>>>>>> return; >>>>>>>>>>> } >>>>>>>>>>> >>>>>>>>>>> -static SaVersionT Version = { 'B', 4, 1 }; >>>>>>>>>>> - >>>>>>>>>>> static const SaClmCallbacksT_4 callbacks = { >>>>>>>>>>> 0, >>>>>>>>>>> /*.saClmClusterTrackCallback =*/ clm_track_cb @@ >>>>>>>>>>> -263,11 +261,24 @@ SaAisErrorT avnd_clm_init(void) >>>>>>>>>>> >>>>>>>>>>> TRACE_ENTER(); >>>>>>>>>>> avnd_cb->first_time_up = true; >>>>>>>>>> [Praveen] Did not get the reason for its removal? >>>>>>>>>> Not being updated any where else in the patch. >>>>>>>>> [AndersW] Nothing is removed here. The CLM initialization loop >>>>>>>>> has been enhanced to handle more error codes (TIMEOUT & >>>>> UNAVAILABLE). >>>>>>>>>>> - error = saClmInitialize_4(&avnd_cb->clmHandle, &callbacks, >>>>>>>>>>> &Version); >>>>>>>>>>> - if (SA_AIS_OK != error) { >>>>>>>>>>> - LOG_ER("Failed to Initialize with CLM: %u", error); >>>>>>>>>>> - goto done; >>>>>>>>>>> - } >>>>>>>>>>> + for (;;) { >>>>>>>>>>> + SaVersionT Version = { 'B', 4, 1 }; >>>>>>>>>>> + error = saClmInitialize_4(&avnd_cb->clmHandle, >&callbacks, >>>>>>>>>>> + &Version); >>>>>>>>>>> + if (error == SA_AIS_ERR_TRY_AGAIN || >>>>>>>>>>> + error == SA_AIS_ERR_TIMEOUT || >>>>>>>>>>> + error == SA_AIS_ERR_UNAVAILABLE) { >>>>>>>>>>> + if (error != SA_AIS_ERR_TRY_AGAIN) { >>>>>>>>>>> + LOG_WA("saClmInitialize_4 returned %u", >>>>>>>>>>> + (unsigned) error); >>>>>>>>>>> + } >>>>>>>>>>> + osaf_nanosleep(&kHundredMilliseconds); >>>>>>>>>>> + continue; >>>>>>>>>>> + } >>>>>>>>>>> + if (error == SA_AIS_OK) break; >>>>>>>>>>> + LOG_ER("Failed to Initialize with CLM: %u", error); >>>>>>>>>>> + goto done; >>>>>>>>>>> + } >>>>>>>>>>> error = saClmSelectionObjectGet(avnd_cb->clmHandle, >>>>>>>>>>> &avnd_cb->clm_sel_obj); >>>>>>>>>>> if (SA_AIS_OK != error) { >>>>>>>>>>> LOG_ER("Failed to get CLM selectionObject: >>>>>>>>>>> %u", error); >>>>>>>>>>> >>>>> ------------------------------------------------------------------- >>>>> -- >>>>> --------- >>>>> Transform Data into Opportunity. >>>>> Accelerate data analysis in your applications with Intel Data >>>>> Analytics Acceleration Library. >>>>> Click to learn more. >>>>> http://pubads.g.doubleclick.net/gampad/clk?id=278785471&iu=/4140 >>>>> _______________________________________________ >>>>> Opensaf-devel mailing list >>>>> [email protected] >>>>> https://lists.sourceforge.net/lists/listinfo/opensaf-devel > ------------------------------------------------------------------------------ Transform Data into Opportunity. Accelerate data analysis in your applications with Intel Data Analytics Acceleration Library. Click to learn more. http://pubads.g.doubleclick.net/gampad/clk?id=278785471&iu=/4140 _______________________________________________ Opensaf-devel mailing list [email protected] https://lists.sourceforge.net/lists/listinfo/opensaf-devel
