Hi Minh, Following content will go in Readme:
High Level Implementation Notes for LOCK and SHUTDOWN operation on NG: =========================================================== In 2N model, there are broadly two cases for LOCK and SHUTDOWN operation on NG: A) Whole 2N SG is mapped in NG which means all assigned SU are hosted on the nodes of NG. Or B) SG is partially mapped in NG i.e either active SU or standby SU is hosted on one of the nodes of NG. Currently 2N model supports SI dep with in SU. So in LOCK and SHUTDOWN operations quiesced and quiescing HA states, should be given honoring SI deps respectively. In case A), AVD_SG_FSM_SG_ADMIN is used in case A) as it honors si dep while giving quiesced or quiescing assignments. Case B) becomes the case of either lock of standby Node/SU or active Node/Su which still needs to be handled honoring SI dep. Here AVD_SG_FSM_SU_OPER fsm state is used via internal function su_admin_down() as it handles SI dep. So in 2N model, AMFD always uses internal FSM functions by calling them inside the wrapper function ng_admin(). Other red models do not support SI deps within SU as of now. So in these models, AMFD uses AVD_SG_FSM_SG_REALIGN by keeping multiple SUs in operation list while performing LOCK and SHUTDOWN operation on NG. Once SI dep is fully supported in these modles, AMFD can use internal SG FSM state like 2N. However there is one case, where AMFD can still use AVD_SG_FSM_SG_ADMIN state for these red models when whole SG is mapped in NG (all assigned SUs are hosted on on the nodes of NG). But possibility of such a case is more in 2N model where only two SUs can be assigned anytime. In other red models, there can be many assigned SUs so possibility of whole SG is mapped in NG is very less. So, as of now, in other models AVD_SG_FSM_SG_REALIGN states is used by keeping multiple SUs in oper list for this case also. But when SI deps, is completely supported with in SU in these models, then AMFD cannot use realign state and it will have to use internal FSM code as it will be enhanced for that. Thanks, Praveemn On 29-Aug-16 12:44 PM, minh chau wrote: > Hi Praveen, > > Yes it's for #1454 and #1608 > > Thanks, > Minh > > On 29/08/16 17:08, praveen malviya wrote: >> Hi Minh, >> >> Thanks for reviewing and testing. >> I guess the ack is for both #1454 and #1608. >> I will add a note. >> >> >> Thanks, >> Praveen >> >> On 29-Aug-16 12:21 PM, minh chau wrote: >>> Hi Praveen, >>> >>> Ack (normal test only), the abnormal failover is supposed to work in >>> REALIGN in NpM and Nway I think >>> Maybe you can add a note in README to revisit the consistency of using >>> SG FSM for nodegroup in all SGs? >>> >>> Thanks, >>> Minh >>> >>> On 24/08/16 19:02, praveen malviya wrote: >>>> Hi Minh, >>>> >>>> Please responses inline with [Praveen] >>>> >>>> Thanks, >>>> Praveen >>>> >>>> On 24-Aug-16 2:01 PM, minh chau wrote: >>>>> Hi Praveen, >>>>> >>>>> I have tested the patches in case all SUs of NpM/Nway belong to one >>>>> nodegroup, it works for me. Will try the case that SUs are partially >>>>> in/out of nodegroup tomorrow. >>>>> But please, if you have time, can you try to make NpM's ng_admin under >>>>> FSM_SG_ADMIN. Addition to my previous comment, and also as you >>>>> knew, in >>>>> #1725 it's hard to deduce the states (sg fsm state is one of those) >>>>> because different SG is using FSM differently in some cases. This >>>>> inconsistency causes a difficult, instead of applying a generic >>>>> solution >>>>> for all SG, then each SG has to be treated different way. I think we >>>>> could also see this kind of problem in future. >>>> [Praveen] In 5.1, #1725 is targeting only 2N model.In future when >>>> #1725 will support other red models (upcoming releases), I would do >>>> any changes that is required in other red models for NG operations. >>>> >>>> >>>>> Thanks, >>>>> Minh >>>>> >>>>> On 23/08/16 16:58, praveen malviya wrote: >>>>>> Hi Minh, >>>>>> >>>>>> I have attached patches for #1454 and #1608 in the ticket #1454. >>>>>> Please apply them in order. >>>>>> >>>>>> Thanks, >>>>>> Praveen >>>>>> >>>>>> On 23-Aug-16 11:56 AM, minh chau wrote: >>>>>>> Hi Praveen, >>>>>>> >>>>>>> Since AMF longDn has been pushed, can you please attach a longDn >>>>>>> rebased >>>>>>> version to ticket (both #1454 + #1608) so we can do some test? >>>>>>> >>>>>>> Thanks, >>>>>>> Minh >>>>>>> >>>>>>> On 23/08/16 15:56, praveen malviya wrote: >>>>>>>> Hi Minh, >>>>>>>> >>>>>>>> Thanks for reviewing the patch. >>>>>>>> Please see inline with [Praveen] >>>>>>>> >>>>>>>> Thanks, >>>>>>>> Praveen >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> On 23-Aug-16 5:53 AM, minh chau wrote: >>>>>>>>> Hi Praveen, >>>>>>>>> >>>>>>>>> One comment in line with [Minh] >>>>>>>>> >>>>>>>>> Thanks >>>>>>>>> Minh >>>>>>>>> >>>>>>>>> On 20/07/16 18:57, praveen.malv...@oracle.com wrote: >>>>>>>>>> osaf/services/saf/amf/amfd/include/sg.h | 1 + >>>>>>>>>> osaf/services/saf/amf/amfd/nodegroup.cc | 4 +- >>>>>>>>>> osaf/services/saf/amf/amfd/sg_npm_fsm.cc | 62 >>>>>>>>>> ++++++++++++++++++++++++++++++- >>>>>>>>>> 3 files changed, 62 insertions(+), 5 deletions(-) >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> Currently 2N, N-Way Active and NoRed models are supported for >>>>>>>>>> lock, >>>>>>>>>> shutdown, >>>>>>>>>> lock-in and unlock-in admin operations on NGs. >>>>>>>>>> >>>>>>>>>> This patch supports NplusM model also. >>>>>>>>>> >>>>>>>>>> diff --git a/osaf/services/saf/amf/amfd/include/sg.h >>>>>>>>>> b/osaf/services/saf/amf/amfd/include/sg.h >>>>>>>>>> --- a/osaf/services/saf/amf/amfd/include/sg.h >>>>>>>>>> +++ b/osaf/services/saf/amf/amfd/include/sg.h >>>>>>>>>> @@ -507,6 +507,7 @@ public: >>>>>>>>>> uint32_t susi_failed(AVD_CL_CB *cb, AVD_SU *su, >>>>>>>>>> struct avd_su_si_rel_tag *susi, AVSV_SUSI_ACT act, >>>>>>>>>> SaAmfHAStateT state); >>>>>>>>>> void node_fail_si_oper(AVD_CL_CB *cb, AVD_SU *su); >>>>>>>>>> + void ng_admin(AVD_SU *su, AVD_AMF_NG *ng); >>>>>>>>>> private: >>>>>>>>>> uint32_t su_fault_su_oper(AVD_CL_CB *cb, AVD_SU *su); >>>>>>>>>> diff --git a/osaf/services/saf/amf/amfd/nodegroup.cc >>>>>>>>>> b/osaf/services/saf/amf/amfd/nodegroup.cc >>>>>>>>>> --- a/osaf/services/saf/amf/amfd/nodegroup.cc >>>>>>>>>> +++ b/osaf/services/saf/amf/amfd/nodegroup.cc >>>>>>>>>> @@ -687,6 +687,7 @@ void avd_ng_admin_state_set(AVD_AMF_NG* >>>>>>>>>> avd_send_admin_state_chg_ntf(&ng->name, >>>>>>>>>> (SaAmfNotificationMinorIdT)SA_AMF_NTFID_NG_ADMIN_STATE, >>>>>>>>>> old_state, ng->saAmfNGAdminState); >>>>>>>>>> + TRACE_LEAVE(); >>>>>>>>>> } >>>>>>>>>> /** >>>>>>>>>> * @brief Verify if Node is stable for admin operation on >>>>>>>>>> Nodegroup >>>>>>>>>> etc. >>>>>>>>>> @@ -749,8 +750,7 @@ static SaAisErrorT check_red_model_servi >>>>>>>>>> LOG_NO("service outage for '%s' because of >>>>>>>>>> shutdown/lock " >>>>>>>>>> "on '%s'",sg->name.value,ng->name.value); >>>>>>>>>> - if ((sg->sg_redundancy_model == >>>>>>>>>> SA_AMF_N_WAY_REDUNDANCY_MODEL) || >>>>>>>>>> - (sg->sg_redundancy_model == >>>>>>>>>> SA_AMF_NPM_REDUNDANCY_MODEL)) { >>>>>>>>>> + if (sg->sg_redundancy_model == >>>>>>>>>> SA_AMF_N_WAY_REDUNDANCY_MODEL) { >>>>>>>>>> LOG_NO("Admin op on '%s' hosting SUs of '%s' with >>>>>>>>>> redundancy '%u' " >>>>>>>>>> "is not supported",ng->name.value, >>>>>>>>>> sg->name.value, >>>>>>>>>> sg->sg_redundancy_model); >>>>>>>>>> diff --git a/osaf/services/saf/amf/amfd/sg_npm_fsm.cc >>>>>>>>>> b/osaf/services/saf/amf/amfd/sg_npm_fsm.cc >>>>>>>>>> --- a/osaf/services/saf/amf/amfd/sg_npm_fsm.cc >>>>>>>>>> +++ b/osaf/services/saf/amf/amfd/sg_npm_fsm.cc >>>>>>>>>> @@ -120,16 +120,16 @@ static AVD_SU_SI_REL *avd_sg_npm_su_othr >>>>>>>>>> if (i_susi->si->list_of_sisu != i_susi) { >>>>>>>>>> o_susi = i_susi->si->list_of_sisu; >>>>>>>>>> if (o_susi->fsm != AVD_SU_SI_STATE_UNASGN) >>>>>>>>>> - return o_susi; >>>>>>>>>> + break; >>>>>>>>>> } else if (i_susi->si->list_of_sisu->si_next != >>>>>>>>>> AVD_SU_SI_REL_NULL) { >>>>>>>>>> o_susi = i_susi->si->list_of_sisu->si_next; >>>>>>>>>> if (o_susi->fsm != AVD_SU_SI_STATE_UNASGN) >>>>>>>>>> - return o_susi; >>>>>>>>>> + break; >>>>>>>>>> } >>>>>>>>>> i_susi = i_susi->su_next; >>>>>>>>>> } >>>>>>>>>> - >>>>>>>>>> + TRACE_LEAVE2("o_susi:'%p'",o_susi); >>>>>>>>>> return o_susi; >>>>>>>>>> } >>>>>>>>>> @@ -4452,6 +4452,62 @@ uint32_t SG_NPM::sg_admin_down(AVD_CL_CB >>>>>>>>>> return NCSCC_RC_SUCCESS; >>>>>>>>>> } >>>>>>>>>> +/* >>>>>>>>>> + * @brief Handles modification of assignments in SU of >>>>>>>>>> NpM SG >>>>>>>>>> + * because of lock or shutdown operation on Node >>>>>>>>>> group. >>>>>>>>>> + * If SU does not have any SIs assigned to it, AMF >>>>>>>>>> will >>>>>>>>>> try >>>>>>>>>> + * to instantiate new SUs in the SG. If SU has >>>>>>>>>> assignments, >>>>>>>>>> + * then depending upon lock or shutdown operation, >>>>>>>>>> quiesced >>>>>>>>>> + * or quiescing state will be sent for active SIs >>>>>>>>>> in SU. >>>>>>>>>> + * If SU has only standby assignments then remove the >>>>>>>>>> assignments. >>>>>>>>>> + * >>>>>>>>>> + * @param[in] ptr to SU >>>>>>>>>> + * @param[in] ptr to nodegroup AVD_AMF_NG. >>>>>>>>>> + */ >>>>>>>>>> +void SG_NPM::ng_admin(AVD_SU *su, AVD_AMF_NG *ng) >>>>>>>>>> +{ >>>>>>>>>> + SaAmfHAStateT ha_state; >>>>>>>>>> + >>>>>>>>>> + TRACE_ENTER2("'%s', sg_fsm_state:%u",su->name.value, >>>>>>>>>> + su->sg_of_su->sg_fsm_state); >>>>>>>>>> + >>>>>>>>>> + if (su->list_of_susi == nullptr) { >>>>>>>>>> + avd_sg_app_su_inst_func(avd_cb, su->sg_of_su); >>>>>>>>>> + return; >>>>>>>>>> + } >>>>>>>>>> + >>>>>>>>>> + if (ng->saAmfNGAdminState == SA_AMF_ADMIN_SHUTTING_DOWN) >>>>>>>>>> + ha_state = SA_AMF_HA_QUIESCING; >>>>>>>>>> + else >>>>>>>>>> + ha_state = SA_AMF_HA_QUIESCED; >>>>>>>>>> + >>>>>>>>>> + if (su->list_of_susi->state == SA_AMF_HA_ACTIVE) { >>>>>>>>>> + if (avd_sg_su_si_mod_snd(avd_cb, su, ha_state) == >>>>>>>>>> NCSCC_RC_FAILURE) { >>>>>>>>>> + LOG_ER("quiescing/quiesced state transtion failed for >>>>>>>>>> '%s'",su->name.value); >>>>>>>>>> + goto done; >>>>>>>>>> + } >>>>>>>>>> + } else { >>>>>>>>>> + if (avd_sg_su_si_del_snd(avd_cb, su) == NCSCC_RC_FAILURE) { >>>>>>>>>> + LOG_ER("removal of standby assignment failed for >>>>>>>>>> '%s'",su->name.value); >>>>>>>>>> + goto done; >>>>>>>>>> + } >>>>>>>>>> + } >>>>>>>>>> + >>>>>>>>>> + avd_sg_su_oper_list_add(avd_cb, su, false); >>>>>>>>>> + su->sg_of_su->set_fsm_state(AVD_SG_FSM_SG_REALIGN); >>>>>>>>> [Minh]: For ng_admin operation, 2N is using fsm >>>>>>>>> AVD_SG_FSM_SG_ADMIN, the >>>>>>>>> NoRed, NwayActive are using AVD_SG_FSM_SG_REALIGN. Now the >>>>>>>>> support for >>>>>>>>> NpM and Nway are using AVD_SG_FSM_SG_REALIGN also. >>>>>>>>> Quickly browsing through ng_admin() of NoRed, NwayActive, NpM, >>>>>>>>> Nway, >>>>>>>>> they are very similar to the existing of those sg_admin_down(), >>>>>>>>> respectively. I think the NoRed, Nway, NwayACtive, NpM should be >>>>>>>>> using >>>>>>>>> AVD_SG_FSM_SG_ADMIN in the same way as 2N. The reason is the same >>>>>>>>> admin >>>>>>>>> operation should be treated in the same SG FSM state/code >>>>>>>>> between SG >>>>>>>>> types, and if sg_admin_down() works, we should reuse them. >>>>>>>> >>>>>>>> In 2N model, there are only two cases: >>>>>>>> A) Whole SG is mapped in NG or >>>>>>>> B) SG is partially mapped in NG. One of the SU is having active or >>>>>>>> standby assignments in one of the nodes of NG. >>>>>>>> >>>>>>>> So in 2N model, AVD_SG_FSM_SG_ADMIN is used in case A) because 2N >>>>>>>> model supports SI dep within SU. Because of this quiesced/quiescing >>>>>>>> assignments must be given honoring si dep. AVD_SG_FSM_SG_ADMIN >>>>>>>> honors >>>>>>>> that while giving quiesced/quiescing assignments. >>>>>>>> Case B) becomes the case of either lock of standby Node/su or >>>>>>>> active >>>>>>>> node/su which still needs to be handled honoring SI dep in >>>>>>>> su_admin_down(). >>>>>>>> >>>>>>>> Other red models do not support, SI deps within SU. Once it is >>>>>>>> supported we will have to use internal SG FSM state. But without SI >>>>>>>> deps, other red models can still use AVD_SG_FSM_SG_ADMIN when >>>>>>>> whole SG >>>>>>>> is mapped in NG. But possibility of such a case is more in 2N model >>>>>>>> where only two SUs can be assigned anytime. In other red models, >>>>>>>> there >>>>>>>> can be many assigned SUs so possibility of whole SG is mapped in >>>>>>>> NG is >>>>>>>> very less. So in other models AVD_SG_FSM_SG_REALIGN states is >>>>>>>> used by >>>>>>>> keeping multiple SUs in oper list because these red models are >>>>>>>> handling switchover situation in REALIGN state (removal of >>>>>>>> standby is >>>>>>>> always handled in realign state). But when SI deps, is completely >>>>>>>> supported within SU in these models, then we cannot use realign >>>>>>>> state >>>>>>>> and we will have to use internal FSM code because internal FSM code >>>>>>>> will be enhanced for that. >>>>>>>> In 2N model, switchover is being done honoring SI dep but that >>>>>>>> is in >>>>>>>> some bug fix cases. >>>>>>>> >>>>>>>> Thanks, >>>>>>>> Praveen >>>>>>>>>> + >>>>>>>>>> + //Increment node counter for tracking status of ng operation. >>>>>>>>>> + if ((su->any_susi_fsm_in_modify() == true) || >>>>>>>>>> + (su->any_susi_fsm_in_unasgn() == true)) { >>>>>>>>>> + su->su_on_node->su_cnt_admin_oper++; >>>>>>>>>> + TRACE("node:%s, su_cnt_admin_oper:%u", >>>>>>>>>> su->su_on_node->name.value, >>>>>>>>>> + su->su_on_node->su_cnt_admin_oper); >>>>>>>>>> + } >>>>>>>>>> +done: >>>>>>>>>> + TRACE_LEAVE(); >>>>>>>>>> + return; >>>>>>>>>> +} >>>>>>>>>> + >>>>>>>>>> SG_NPM::~SG_NPM() { >>>>>>>>>> } >>>>>>>>>> >>>>>>>>> >>>>>>>> >>>>>>> >>>>>> >>>>> >>>> >>> >> > ------------------------------------------------------------------------------ _______________________________________________ Opensaf-devel mailing list Opensaf-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/opensaf-devel