> -----Original Message-----
> From: praveen malviya [mailto:[email protected]]
> Sent: 26 June 2013 09:35
> To: Hans Feldt
> Cc: Hans Feldt; Mathivanan Naickan Palanivelu; [email protected];
> [email protected]
> Subject: Re: [devel] [PATCH 5 of 6] amf: support sufailover at amfnd [#98]
>
> Please see response inline.
> Thanks
> Praveen
>
> On 24-Jun-13 7:30 PM, Hans Feldt wrote:
> > Hi,
> >
> > I think it should be safe to do SU failover for all redundancy models.
> > It is a much easier operation than comp failover. It would simplify
> > the patches, especially the AMF node director parts in 5/6 patches.
> >
> > Please explain (with consequences) why it is needed to know the SG
> > redundancy model in the AMF node director.
> These patches implement su-failover only for the 2N and NoRed models. This is
> the only reason the red model is needed at amfnd.
> Once the implementation is done for all red models, there will not be any
> need to maintain the red model at amfnd.

Let's say amfnd does not care about the SG redundancy model (which it should not!).
When handling SU failover, it terminates all components of the SU and then sends a
message to amfd. If amfd now interprets this as a component failover (as it does
today) and sends a SUSI-MODIFY(QUIESCED) request to amfnd, amfnd can simply treat
that request as a no-op.

Can you try this approach? It should even work without any changes in amfd. Sounds
much cleaner to me.

> Regarding the implementation of su-failover in all models, this again needs a
> full assessment of how errors are handled in all SG FSM states in the other red
> models, and the unit testing effort will be too huge to implement in one go.
>
> Component fail-over is different in all red models. In future, too, the red model
> attribute needs to be maintained at amfnd in case we implement component
> failover for a specific red model and not for all red models in one go.

No, this should not be needed. Please try the approach suggested above. I strongly
think it is architecturally wrong that the AMF ND cares about the SG redundancy
model.

Thanks,
Hans

> Thanks,
> Praveen
> > Thanks,
> > Hans
> >
> >
> > On 7 June 2013 08:39, <[email protected]> wrote:
> >>  osaf/services/saf/avsv/avnd/avnd_err.c |  150 ++++++++++++++++++++++----------
> >>  1 files changed, 103 insertions(+), 47 deletions(-)
> >>
> >>
> >> This patch handles compfailover and sufailover in conformance with the
> >> AMF-B.04.01 spec at amfnd. Currently only the 2N and NoRed models are
> >> supported. For other models, saAmfSUFailover will be ignored and
> >> compFailover will be performed. During suFailover the SU will be disabled
> >> and all comps will be abruptly terminated. Also handles the case when
> >> saAmfSUFailover is true and NodeSwitchover gets escalated.
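
As a rough illustration of the recovery selection described in the patch summary above (this is not code from the patch), the escalation rules can be sketched in plain C. The names `recovery_t`, `red_model_t`, `su_info_t`, `escalate_recovery` and `effective_recovery` are hypothetical stand-ins for the real amfnd types and functions; only the decision logic is meant to mirror the quoted diff.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the real AMF recovery and redundancy model values. */
typedef enum {
    RCVR_COMPONENT_RESTART,
    RCVR_COMPONENT_FAILOVER,
    RCVR_SU_FAILOVER
} recovery_t;

typedef enum {
    RED_MODEL_2N,
    RED_MODEL_NPLUSM,
    RED_MODEL_NWAY,
    RED_MODEL_NWAY_ACTIVE,
    RED_MODEL_NO_REDUNDANCY
} red_model_t;

/* Hypothetical, reduced view of an SU: only the fields the decision needs. */
typedef struct {
    bool sufailover;        /* mirrors saAmfSUFailover */
    bool is_ncs;            /* middleware SU, excluded from both escalations */
    red_model_t red_model;  /* the attribute the thread argues amfnd should not need */
} su_info_t;

/* Escalation as in the first hunk: a disabled restart becomes a component
 * failover, and a component failover becomes an SU failover when the SU
 * has saAmfSUFailover set. */
recovery_t escalate_recovery(recovery_t requested, bool comp_restart_disabled,
                             const su_info_t *su)
{
    assert(su != NULL);
    recovery_t rcvr = requested;

    if (rcvr == RCVR_COMPONENT_RESTART && comp_restart_disabled && !su->is_ncs)
        rcvr = RCVR_COMPONENT_FAILOVER;

    if (rcvr == RCVR_COMPONENT_FAILOVER && su->sufailover && !su->is_ncs)
        rcvr = RCVR_SU_FAILOVER;

    return rcvr;
}

/* Fallback as in the SU failover handler: outside 2N and NoRed the patch
 * performs a component failover instead. */
recovery_t effective_recovery(recovery_t rcvr, const su_info_t *su)
{
    assert(su != NULL);
    if (rcvr == RCVR_SU_FAILOVER &&
        su->red_model != RED_MODEL_2N &&
        su->red_model != RED_MODEL_NO_REDUNDANCY)
        return RCVR_COMPONENT_FAILOVER;
    return rcvr;
}
```

In this sketch, an SU with saAmfSUFailover set in an N+M SG is first escalated to SU failover and then falls back to component failover. That second step is the only place the redundancy model is consulted, which is the dependency questioned in the reply above.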
> >>
> >> diff --git a/osaf/services/saf/avsv/avnd/avnd_err.c b/osaf/services/saf/avsv/avnd/avnd_err.c
> >> --- a/osaf/services/saf/avsv/avnd/avnd_err.c
> >> +++ b/osaf/services/saf/avsv/avnd/avnd_err.c
> >> @@ -401,8 +401,15 @@ uint32_t avnd_err_escalate(AVND_CB *cb,
> >>      *io_esc_rcvr = comp->err_info.def_rec;
> >>
> >>      /* disallow comp-restart if it's disabled */
> >> -    if ((SA_AMF_COMPONENT_RESTART == *io_esc_rcvr) && m_AVND_COMP_IS_RESTART_DIS(comp))
> >> +    if ((SA_AMF_COMPONENT_RESTART == *io_esc_rcvr) && m_AVND_COMP_IS_RESTART_DIS(comp) && (!su->is_ncs)) {
> >> +        LOG_NO("saAmfCompDisableRestart is true for '%s'",comp->name.value);
> >> +        *io_esc_rcvr = SA_AMF_COMPONENT_FAILOVER;
> >> +    }
> >> +
> >> +    if ((SA_AMF_COMPONENT_FAILOVER== *io_esc_rcvr) && (su->sufailover) && (!su->is_ncs)) {
> >> +        LOG_NO("saAmfSUFailover is true for '%s'",comp->su->name.value);
> >>          *io_esc_rcvr = AVSV_ERR_RCVR_SU_FAILOVER;
> >> +    }
> >>
> >>      switch (*io_esc_rcvr) {
> >>      case SA_AMF_COMPONENT_FAILOVER:    /* treat it as su failover */
> >> @@ -519,7 +526,6 @@ uint32_t avnd_err_recover(AVND_CB *cb, A
> >>          break;
> >>
> >>      case SA_AMF_COMPONENT_FAILOVER:
> >> -        /* not supported */
> >>          rc = avnd_err_rcvr_comp_failover(cb, comp);
> >>          break;
> >>
> >> @@ -671,45 +677,21 @@ uint32_t avnd_err_rcvr_su_restart(AVND_C
> >>      return rc;
> >>  }
> >>
> >> -/****************************************************************************
> >> -  Name          : avnd_err_rcvr_comp_failover
> >> -
> >> -  Description   : This routine executes component failover recovery.
> >> -
> >> -  Arguments     : cb   - ptr to the AvND control block
> >> -                  comp - ptr to the comp
> >> -
> >> -  Return Values : NCSCC_RC_SUCCESS/NCSCC_RC_FAILURE.
> >> -
> >> -  Notes         : None.
> >> -****************************************************************************/
> >> -uint32_t avnd_err_rcvr_comp_failover(AVND_CB *cb, AVND_COMP *comp)
> >> +/**
> >> + * This function performs component failover recovery action.
> >> + *
> >> + * @param cb: ptr to AvND contol block.
> >> + * @param comp: ptr to failed component.
> >> + *
> >> + * @return NCSCC_RC_SUCCESS/NCSCC_RC_FAILURE.
> >> + */
> >> +uint32_t avnd_err_rcvr_comp_failover(AVND_CB *cb, AVND_COMP *failed_comp)
> >>  {
> >>      uint32_t rc = NCSCC_RC_SUCCESS;
> >> -    LOG_NO("%s, Unsupported",__FUNCTION__);
> >> +    AVND_SU *su;
> >>
> >> -    return rc;
> >> -}
> >> -
> >> -/****************************************************************************
> >> -  Name          : avnd_err_rcvr_su_failover
> >> -
> >> -  Description   : This routine executes SU failover recovery.
> >> -
> >> -  Arguments     : cb          - ptr to the AvND control block
> >> -                  su          - ptr to the SU to which the comp belongs
> >> -                  failed_comp - ptr to the failed comp that triggered this
> >> -                                recovery
> >> -
> >> -  Return Values : NCSCC_RC_SUCCESS/NCSCC_RC_FAILURE.
> >> -
> >> -  Notes         : None.
> >> -****************************************************************************/
> >> -uint32_t avnd_err_rcvr_su_failover(AVND_CB *cb, AVND_SU *su, AVND_COMP *failed_comp)
> >> -{
> >> -    uint32_t rc = NCSCC_RC_SUCCESS;
> >> -    TRACE_ENTER();
> >> -
> >> +    TRACE_ENTER2("'%s'", failed_comp->name.value);
> >> +    su = failed_comp->su;
> >>      /* mark the comp failed */
> >>      m_AVND_COMP_FAILED_SET(failed_comp);
> >>      m_AVND_SEND_CKPT_UPDT_ASYNC_UPDT(cb, failed_comp, AVND_CKPT_COMP_FLAG_CHANGE);
> >> @@ -732,7 +714,7 @@ uint32_t avnd_err_rcvr_su_failover(AVND_
> >>      m_AVND_SEND_CKPT_UPDT_ASYNC_UPDT(cb, su, AVND_CKPT_SU_OPER_STATE);
> >>
> >>      /* inform AvD */
> >> -    rc = avnd_di_oper_send(cb, su, AVSV_ERR_RCVR_SU_FAILOVER);
> >> +    rc = avnd_di_oper_send(cb, su, SA_AMF_COMPONENT_FAILOVER);
> >>
> >>      /*
> >>       * su-sis may be in assigning/removing state. signal csi
> >> @@ -763,6 +745,52 @@ uint32_t avnd_err_rcvr_su_failover(AVND_
> >>      return rc;
> >>  }
> >>
> >> +/**
> >> + * This function performs SU failover recovery action.
> >> + *
> >> + * @param cb: ptr to AvND contol block.
> >> + * @param su: ptr to the SU which contains the failed component.
> >> + * @param comp: ptr to failed component.
> >> + *
> >> + * @return NCSCC_RC_SUCCESS/NCSCC_RC_FAILURE.
> >> + */
> >> +uint32_t avnd_err_rcvr_su_failover(AVND_CB *cb, AVND_SU *su, AVND_COMP *failed_comp)
> >> +{
> >> +    AVND_COMP *comp;
> >> +    uint32_t rc = NCSCC_RC_SUCCESS;
> >> +
> >> +
> >> +    TRACE_ENTER2("'%s' '%s'", su->name.value, failed_comp->name.value);
> >> +    if ((su->sg_redundancy_model != SA_AMF_2N_REDUNDANCY_MODEL) &&
> >> +            (su->sg_redundancy_model != SA_AMF_NO_REDUNDANCY_MODEL)) {
> >> +        rc = avnd_err_rcvr_comp_failover(cb, failed_comp);
> >> +        goto done;
> >> +    }
> >> +    m_AVND_COMP_FAILED_SET(failed_comp);
> >> +    m_AVND_COMP_OPER_STATE_SET(failed_comp, SA_AMF_OPERATIONAL_DISABLED);
> >> +    m_AVND_SU_FAILED_SET(su);
> >> +    m_AVND_SU_OPER_STATE_SET(su, SA_AMF_OPERATIONAL_DISABLED);
> >> +
> >> +    LOG_NO("Terminating components of '%s'(abruptly & unordered)",su->name.value);
> >> +    /* Unordered cleanup of components of failed SU */
> >> +    for (comp = m_AVND_COMP_FROM_SU_DLL_NODE_GET(m_NCS_DBLIST_FIND_FIRST(&su->comp_list));
> >> +            comp;
> >> +            comp = m_AVND_COMP_FROM_SU_DLL_NODE_GET(m_NCS_DBLIST_FIND_NEXT(&comp->su_dll_node))) {
> >> +        if (comp->su->su_is_external)
> >> +            continue;
> >> +
> >> +        rc = avnd_comp_clc_fsm_run(cb, comp, AVND_COMP_CLC_PRES_FSM_EV_CLEANUP);
> >> +        if (NCSCC_RC_SUCCESS != rc) {
> >> +            LOG_ER("'%s' termination failed", comp->name.value);
> >> +            goto done;
> >> +        }
> >> +    }
> >> +done:
> >> +
> >> +    TRACE_LEAVE2("%u", rc);
> >> +    return rc;
> >> +}
> >> +
> >>  /****************************************************************************
> >>    Name          : avnd_err_rcvr_node_switchover
> >>
> >> @@ -781,7 +809,7 @@ uint32_t avnd_err_rcvr_node_switchover(A
> >>  {
> >>      uint32_t rc = NCSCC_RC_SUCCESS;
> >>      TRACE_ENTER();
> >> -
> >> +    AVND_COMP *comp;
> >>      /* increase log level to info */
> >>      setlogmask(LOG_UPTO(LOG_INFO));
> >>
> >> @@ -836,11 +864,33 @@ uint32_t avnd_err_rcvr_node_switchover(A
> >>      if (NCSCC_RC_SUCCESS != rc)
> >>          goto done;
> >>
> >> -    /* terminate the failed comp */
> >> -    if (m_AVND_SU_IS_PREINSTANTIABLE(failed_su)) {
> >> -        rc = avnd_comp_clc_fsm_run(cb, failed_comp, AVND_COMP_CLC_PRES_FSM_EV_CLEANUP);
> >> -        if (NCSCC_RC_SUCCESS != rc)
> >> -            goto done;
> >> +    if (m_AVND_SU_IS_FAILED(failed_comp->su) && (failed_comp->su->sufailover) &&
> >> +            ((failed_comp->su->sg_redundancy_model == SA_AMF_NO_REDUNDANCY_MODEL) ||
> >> +             (failed_comp->su->sg_redundancy_model == SA_AMF_2N_REDUNDANCY_MODEL)))
> >> +    {
> >> +        LOG_NO("Terminating components of '%s'(abruptly & unordered)",failed_su->name.value);
> >> +        /* Unordered cleanup of components of failed SU */
> >> +        for (comp = m_AVND_COMP_FROM_SU_DLL_NODE_GET(m_NCS_DBLIST_FIND_FIRST(&failed_su->comp_list));
> >> +                comp;
> >> +                comp = m_AVND_COMP_FROM_SU_DLL_NODE_GET(m_NCS_DBLIST_FIND_NEXT(&comp->su_dll_node))) {
> >> +            if (comp->su->su_is_external)
> >> +                continue;
> >> +
> >> +            rc = avnd_comp_clc_fsm_run(cb, comp, AVND_COMP_CLC_PRES_FSM_EV_CLEANUP);
> >> +            if (NCSCC_RC_SUCCESS != rc) {
> >> +                LOG_ER("'%s' termination failed", comp->name.value);
> >> +                goto done;
> >> +            }
> >> +        }
> >> +        avnd_su_si_del(cb, &failed_comp->su->name);
> >> +    }
> >> +    else {
> >> +        /* terminate the failed comp */
> >> +        if (m_AVND_SU_IS_PREINSTANTIABLE(failed_su)) {
> >> +            rc = avnd_comp_clc_fsm_run(cb, failed_comp, AVND_COMP_CLC_PRES_FSM_EV_CLEANUP);
> >> +            if (NCSCC_RC_SUCCESS != rc)
> >> +                goto done;
> >> +        }
> >>      }
> >>
> >>  done:
> >> @@ -1216,7 +1266,10 @@ uint32_t avnd_err_restart_esc_level_2(AV
> >>      TRACE_ENTER();
> >>
> >>      /* first time in this level */
> >> -    *esc_rcvr = AVSV_ERR_RCVR_SU_FAILOVER;
> >> +    if (su->sufailover)
> >> +        *esc_rcvr = AVSV_ERR_RCVR_SU_FAILOVER;
> >> +    else
> >> +        *esc_rcvr = SA_AMF_COMPONENT_FAILOVER;
> >>
> >>      /* External components are not supposed to escalate SU Failover of cluster components.
> >>         For Ext component, SU Failover will be limited to
> >> @@ -1278,7 +1331,10 @@ AVSV_ERR_RCVR avnd_err_esc_su_failover(A
> >>      TRACE_ENTER();
> >>
> >>      /* initalize */
> >> -    *esc_rcvr = AVSV_ERR_RCVR_SU_FAILOVER;
> >> +    if (su->sufailover)
> >> +        *esc_rcvr = AVSV_ERR_RCVR_SU_FAILOVER;
> >> +    else
> >> +        *esc_rcvr = SA_AMF_COMPONENT_FAILOVER;
> >>
> >>      if (true == su->su_is_external) {
> >>          /* External component should not contribute to NODE FAILOVER of cluster
> >>
> >> _______________________________________________
> >> Opensaf-devel mailing list
> >> [email protected]
> >> https://lists.sourceforge.net/lists/listinfo/opensaf-devel
