>>Can you try this approach?

I do not follow this approach. We should not need to send QUIESCED for SU
failover. For now, the approach I have taken looks simpler to me.
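
As far as I can read the suggestion, it would amount to something like the sketch below: amfnd cleans up all components and marks the SU failed, and a later SUSI-MODIFY(QUIESCED) from amfd is then ignored. This is only a standalone illustration; the types and function names (su_t, su_failover, susi_modify) are hypothetical and are not the real amfnd structures or APIs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins; NOT the real OpenSAF definitions. */
typedef enum { HA_ACTIVE = 1, HA_STANDBY, HA_QUIESCED, HA_QUIESCING } ha_state_t;
typedef enum { SUSI_APPLIED, SUSI_NOP } susi_result_t;

typedef struct {
	const char *name;
	bool failed;        /* set once SU failover recovery has run */
	int comps_running;  /* components not yet cleaned up */
} su_t;

/* SU failover at the node director: clean up every component and mark
 * the SU failed. The real code would then report the SU oper state to
 * amfd; that message is left out of this sketch. */
static void su_failover(su_t *su)
{
	assert(su != NULL);
	su->comps_running = 0;
	su->failed = true;
}

/* Handle a SUSI modify request coming from the director. */
static susi_result_t susi_modify(su_t *su, ha_state_t requested)
{
	assert(su != NULL);
	if (su->failed && requested == HA_QUIESCED) {
		/* Components are already terminated, nothing to quiesce. */
		return SUSI_NOP;
	}
	return SUSI_APPLIED;
}
```

With this, the node director never looks at the SG redundancy model; whether amfd sends QUIESCED or not only matters on the amfd side.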

>>When handling SU failover, it terminates all components of the SU and then
>>sends a message to amfd.
As I said before, I am not doing this for all redundancy models for now. As
noted in the ticket, we are only doing it for the NoRed and 2N redundancy models.
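
To make the current restriction concrete, here is a minimal standalone sketch of the recovery selection as this patch does it: escalation to SU failover when saAmfSUFailover is set, and a fallback to component failover for models other than 2N and NoRed. The enum and struct names are simplified and hypothetical; the real code uses AVND_SU, the SA_AMF_* constants and AVSV_ERR_RCVR_SU_FAILOVER.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the real amfnd types. */
typedef enum { RED_2N = 1, RED_NPM, RED_NWAY, RED_NWAY_ACTIVE, RED_NORED } red_model_t;
typedef enum { RCVR_COMP_RESTART, RCVR_COMP_FAILOVER, RCVR_SU_FAILOVER } rcvr_t;

typedef struct {
	bool sufailover;       /* saAmfSUFailover */
	bool is_ncs;           /* middleware SU, excluded from the new logic */
	red_model_t red_model; /* SG redundancy model */
} su_cfg_t;

/* Escalation step: restart -> comp failover when restart is disabled,
 * comp failover -> SU failover when saAmfSUFailover is true. */
static rcvr_t escalate(rcvr_t rcvr, bool comp_restart_disabled, const su_cfg_t *su)
{
	assert(su != NULL);
	if (rcvr == RCVR_COMP_RESTART && comp_restart_disabled && !su->is_ncs)
		rcvr = RCVR_COMP_FAILOVER;
	if (rcvr == RCVR_COMP_FAILOVER && su->sufailover && !su->is_ncs)
		rcvr = RCVR_SU_FAILOVER;
	return rcvr;
}

/* Gate used for now: SU failover only for 2N and NoRed; for every other
 * model it falls back to component failover. */
static rcvr_t effective_recovery(rcvr_t rcvr, const su_cfg_t *su)
{
	assert(su != NULL);
	if (rcvr == RCVR_SU_FAILOVER &&
	    su->red_model != RED_2N && su->red_model != RED_NORED)
		return RCVR_COMP_FAILOVER;
	return rcvr;
}
```

The second function is the only place that depends on the redundancy model, so it is also the only part that would go away once all models are supported.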

>>No, this should not be needed. Please try the suggested approach. I
>>strongly think it is architecturally wrong that the AMF ND cares about the SG
>>redundancy model.

I agree with that in principle, but as I said, I am not ready to implement
this for all redundancy models, so the check may be needed for now. Once we
support all redundancy models, we can probably remove it from amfnd.


Thanks
-Praveen

On 26-Jun-13 2:33 PM, Hans Feldt wrote:
>> -----Original Message-----
>> From: praveen malviya [mailto:[email protected]]
>> Sent: den 26 juni 2013 09:35
>> To: Hans Feldt
>> Cc: Hans Feldt; Mathivanan Naickan Palanivelu; [email protected];
>> [email protected]
>> Subject: Re: [devel] [PATCH 5 of 6] amf: support sufailover at amfnd [#98]
>>
>> Please see response inline.
>> Thanks
>> Praveen
>>
>> On 24-Jun-13 7:30 PM, Hans Feldt wrote:
>>> Hi,
>>>
>>> I think it should be safe to do SU failover for all redundancy models.
>>> It is a much easier operation than comp failover. It would simplify
>>> the patches, specially the AMF node director parts in 5/6 patches.
>>>
>>> Please explain (with consequences) why it is needed to know the SG
>>> redundancy model in the AMF node director.
>> These patches implement su-failover only for the 2N and NoRed models. This is
>> the only reason the red model is needed at amfnd.
>> Once the implementation is done for all red models, there will be no
>> need to maintain the red model at amfnd.
> Let's say amfnd does not care about the SG redundancy model (which it should
> not!). When handling SU failover, it terminates all components of the SU and
> then sends a message to amfd. If amfd now interprets this as component failover
> (as it does today) and sends a SUSI-MODIFY(QUIESCED) request to amfnd, that
> can simply be treated as a no-op in amfnd.
>
> Can you try this approach?
>
> Should even work without any changes in amfd.
>
> Sounds much cleaner to me.
>
>> Regarding the implementation of su-failover in all models, this again needs
>> a full assessment of how errors are handled in all SG FSM states in the other
>> red models, and the unit testing effort would be too large to do in one go.
>>
>> Component failover is different in each red model. In the future, the red
>> model attribute will also need to be maintained at amfnd if we implement
>> component failover for a specific red model rather than for all red models
>> in one go.
> No, this should not be needed. Please try the suggested approach. I
> strongly think it is architecturally wrong that the AMF ND cares about the SG
> redundancy model.
>
> Thanks,
> Hans
>
>> Thanks,
>> Praveen
>>> Thanks,
>>> Hans
>>>
>>>
>>> On 7 June 2013 08:39,  <[email protected]> wrote:
>>>>    osaf/services/saf/avsv/avnd/avnd_err.c |  150
>> ++++++++++++++++++++++----------
>>>>    1 files changed, 103 insertions(+), 47 deletions(-)
>>>>
>>>>
>>>>    This patch handles compFailover and suFailover at amfnd in conformance
>>>>    with the AMF-B.04.01 spec. Currently only the 2N and NoRed models are
>>>>    supported. For other models, saAmfSUFailover will be ignored and
>>>>    compFailover will be performed. During suFailover the SU will be disabled
>>>>    and all comps will be abruptly terminated. It also handles the case when
>>>>    saAmfSUFailover is true and node switchover gets escalated.
>>>> diff --git a/osaf/services/saf/avsv/avnd/avnd_err.c
>>>> b/osaf/services/saf/avsv/avnd/avnd_err.c
>>>> --- a/osaf/services/saf/avsv/avnd/avnd_err.c
>>>> +++ b/osaf/services/saf/avsv/avnd/avnd_err.c
>>>> @@ -401,8 +401,15 @@ uint32_t avnd_err_escalate(AVND_CB *cb,
>>>>                   *io_esc_rcvr = comp->err_info.def_rec;
>>>>
>>>>           /* disallow comp-restart if it's disabled */
>>>> -       if ((SA_AMF_COMPONENT_RESTART == *io_esc_rcvr) &&
>> m_AVND_COMP_IS_RESTART_DIS(comp))
>>>> +       if ((SA_AMF_COMPONENT_RESTART == *io_esc_rcvr) &&
>> m_AVND_COMP_IS_RESTART_DIS(comp) && (!su->is_ncs)) {
>>>> +               LOG_NO("saAmfCompDisableRestart is true for '%s'",comp-
>>> name.value);
>>>> +               *io_esc_rcvr = SA_AMF_COMPONENT_FAILOVER;
>>>> +       }
>>>> +
>>>> +       if ((SA_AMF_COMPONENT_FAILOVER== *io_esc_rcvr) && (su-
>>> sufailover) && (!su->is_ncs)) {
>>>> +               LOG_NO("saAmfSUFailover is true for
>>>> + '%s'",comp->su->name.value);
>>>>                   *io_esc_rcvr = AVSV_ERR_RCVR_SU_FAILOVER;
>>>> +       }
>>>>
>>>>           switch (*io_esc_rcvr) {
>>>>           case SA_AMF_COMPONENT_FAILOVER: /* treat it as su failover
>>>> */ @@ -519,7 +526,6 @@ uint32_t avnd_err_recover(AVND_CB *cb, A
>>>>                   break;
>>>>
>>>>           case SA_AMF_COMPONENT_FAILOVER:
>>>> -               /* not supported */
>>>>                   rc = avnd_err_rcvr_comp_failover(cb, comp);
>>>>                   break;
>>>>
>>>> @@ -671,45 +677,21 @@ uint32_t avnd_err_rcvr_su_restart(AVND_C
>>>>           return rc;
>>>>    }
>>>>
>>>> -
>> /**********************************************************
>> ******************
>>>> -  Name          : avnd_err_rcvr_comp_failover
>>>> -
>>>> -  Description   : This routine executes component failover recovery.
>>>> -
>>>> -  Arguments     : cb   - ptr to the AvND control block
>>>> -                  comp - ptr to the comp
>>>> -
>>>> -  Return Values : NCSCC_RC_SUCCESS/NCSCC_RC_FAILURE.
>>>> -
>>>> -  Notes         : None.
>>>> -
>> **********************************************************
>> **********
>>>> **********/ -uint32_t avnd_err_rcvr_comp_failover(AVND_CB *cb,
>>>> AVND_COMP *comp)
>>>> +/**
>>>> + * This function performs component failover recovery action.
>>>> + *
>>>> + * @param cb: ptr to AvND contol block.
>>>> + * @param comp: ptr to failed component.
>>>> + *
>>>> + * @return NCSCC_RC_SUCCESS/NCSCC_RC_FAILURE.
>>>> + */
>>>> +uint32_t avnd_err_rcvr_comp_failover(AVND_CB *cb, AVND_COMP
>>>> +*failed_comp)
>>>>    {
>>>>           uint32_t rc = NCSCC_RC_SUCCESS;
>>>> -       LOG_NO("%s, Unsupported",__FUNCTION__);
>>>> +       AVND_SU *su;
>>>>
>>>> -       return rc;
>>>> -}
>>>> -
>>>> -
>> /**********************************************************
>> ******************
>>>> -  Name          : avnd_err_rcvr_su_failover
>>>> -
>>>> -  Description   : This routine executes SU failover recovery.
>>>> -
>>>> -  Arguments     : cb          - ptr to the AvND control block
>>>> -                  su          - ptr to the SU to which the comp belongs
>>>> -                  failed_comp - ptr to the failed comp that triggered this
>>>> -                                recovery
>>>> -
>>>> -  Return Values : NCSCC_RC_SUCCESS/NCSCC_RC_FAILURE.
>>>> -
>>>> -  Notes         : None.
>>>> -
>> **********************************************************
>> **********
>>>> **********/ -uint32_t avnd_err_rcvr_su_failover(AVND_CB *cb,
>> AVND_SU
>>>> *su, AVND_COMP *failed_comp) -{
>>>> -       uint32_t rc = NCSCC_RC_SUCCESS;
>>>> -       TRACE_ENTER();
>>>> -
>>>> +       TRACE_ENTER2("'%s'", failed_comp->name.value);
>>>> +       su = failed_comp->su;
>>>>           /* mark the comp failed */
>>>>           m_AVND_COMP_FAILED_SET(failed_comp);
>>>>           m_AVND_SEND_CKPT_UPDT_ASYNC_UPDT(cb, failed_comp,
>>>> AVND_CKPT_COMP_FLAG_CHANGE); @@ -732,7 +714,7 @@ uint32_t
>> avnd_err_rcvr_su_failover(AVND_
>>>>           m_AVND_SEND_CKPT_UPDT_ASYNC_UPDT(cb, su,
>>>> AVND_CKPT_SU_OPER_STATE);
>>>>
>>>>           /* inform AvD */
>>>> -       rc = avnd_di_oper_send(cb, su, AVSV_ERR_RCVR_SU_FAILOVER);
>>>> +       rc = avnd_di_oper_send(cb, su, SA_AMF_COMPONENT_FAILOVER);
>>>>
>>>>           /*
>>>>            *  su-sis may be in assigning/removing state. signal csi @@
>>>> -763,6 +745,52 @@ uint32_t avnd_err_rcvr_su_failover(AVND_
>>>>           return rc;
>>>>    }
>>>>
>>>> +/**
>>>> + * This function performs SU failover recovery action.
>>>> + *
>>>> + * @param cb: ptr to AvND contol block.
>>>> + * @param su: ptr to the SU which contains the failed component.
>>>> + * @param comp: ptr to failed component.
>>>> + *
>>>> + * @return NCSCC_RC_SUCCESS/NCSCC_RC_FAILURE.
>>>> + */
>>>> +uint32_t avnd_err_rcvr_su_failover(AVND_CB *cb, AVND_SU *su,
>>>> +AVND_COMP *failed_comp) {
>>>> +       AVND_COMP *comp;
>>>> +       uint32_t rc = NCSCC_RC_SUCCESS;
>>>> +
>>>> +
>>>> +       TRACE_ENTER2("'%s' '%s'", su->name.value, failed_comp-
>>> name.value);
>>>> +       if ((su->sg_redundancy_model !=
>> SA_AMF_2N_REDUNDANCY_MODEL) &&
>>>> +                       (su->sg_redundancy_model !=
>> SA_AMF_NO_REDUNDANCY_MODEL)) {
>>>> +               rc = avnd_err_rcvr_comp_failover(cb, failed_comp);
>>>> +               goto done;
>>>> +       }
>>>> +       m_AVND_COMP_FAILED_SET(failed_comp);
>>>> +       m_AVND_COMP_OPER_STATE_SET(failed_comp,
>> SA_AMF_OPERATIONAL_DISABLED);
>>>> +       m_AVND_SU_FAILED_SET(su);
>>>> +       m_AVND_SU_OPER_STATE_SET(su,
>> SA_AMF_OPERATIONAL_DISABLED);
>>>> +
>>>> +       LOG_NO("Terminating components of '%s'(abruptly &
>> unordered)",su->name.value);
>>>> +       /* Unordered cleanup of components of failed SU */
>>>> +       for (comp =
>> m_AVND_COMP_FROM_SU_DLL_NODE_GET(m_NCS_DBLIST_FIND_FIRST(
>> &su->comp_list));
>>>> +                       comp;
>>>> +                       comp =
>> m_AVND_COMP_FROM_SU_DLL_NODE_GET(m_NCS_DBLIST_FIND_NEXT(
>> &comp->su_dll_node))) {
>>>> +               if (comp->su->su_is_external)
>>>> +                       continue;
>>>> +
>>>> +               rc = avnd_comp_clc_fsm_run(cb, comp,
>> AVND_COMP_CLC_PRES_FSM_EV_CLEANUP);
>>>> +               if (NCSCC_RC_SUCCESS != rc) {
>>>> +                       LOG_ER("'%s' termination failed", 
>>>> comp->name.value);
>>>> +                       goto done;
>>>> +               }
>>>> +       }
>>>> +done:
>>>> +
>>>> +       TRACE_LEAVE2("%u", rc);
>>>> +       return rc;
>>>> +}
>>>> +
>>>>
>> /**********************************************************
>> ******************
>>>>      Name          : avnd_err_rcvr_node_switchover
>>>>
>>>> @@ -781,7 +809,7 @@ uint32_t avnd_err_rcvr_node_switchover(A
>>>>    {
>>>>           uint32_t rc = NCSCC_RC_SUCCESS;
>>>>           TRACE_ENTER();
>>>> -
>>>> +       AVND_COMP *comp;
>>>>           /* increase log level to info */
>>>>           setlogmask(LOG_UPTO(LOG_INFO));
>>>>
>>>> @@ -836,11 +864,33 @@ uint32_t avnd_err_rcvr_node_switchover(A
>>>>           if (NCSCC_RC_SUCCESS != rc)
>>>>                   goto done;
>>>>
>>>> -       /* terminate the failed comp */
>>>> -       if (m_AVND_SU_IS_PREINSTANTIABLE(failed_su)) {
>>>> -               rc = avnd_comp_clc_fsm_run(cb, failed_comp,
>> AVND_COMP_CLC_PRES_FSM_EV_CLEANUP);
>>>> -               if (NCSCC_RC_SUCCESS != rc)
>>>> -                       goto done;
>>>> +       if (m_AVND_SU_IS_FAILED(failed_comp->su) && (failed_comp->su-
>>> sufailover) &&
>>>> +                       ((failed_comp->su->sg_redundancy_model ==
>> SA_AMF_NO_REDUNDANCY_MODEL) ||
>>>> +                        (failed_comp->su->sg_redundancy_model ==
>> SA_AMF_2N_REDUNDANCY_MODEL)))
>>>> +       {
>>>> +               LOG_NO("Terminating components of '%s'(abruptly &
>> unordered)",failed_su->name.value);
>>>> +               /* Unordered cleanup of components of failed SU */
>>>> +               for (comp =
>> m_AVND_COMP_FROM_SU_DLL_NODE_GET(m_NCS_DBLIST_FIND_FIRST(
>> &failed_su->comp_list));
>>>> +                               comp;
>>>> +                               comp =
>> m_AVND_COMP_FROM_SU_DLL_NODE_GET(m_NCS_DBLIST_FIND_NEXT(
>> &comp->su_dll_node))) {
>>>> +                       if (comp->su->su_is_external)
>>>> +                               continue;
>>>> +
>>>> +                       rc = avnd_comp_clc_fsm_run(cb, comp,
>> AVND_COMP_CLC_PRES_FSM_EV_CLEANUP);
>>>> +                       if (NCSCC_RC_SUCCESS != rc) {
>>>> +                               LOG_ER("'%s' termination failed", 
>>>> comp->name.value);
>>>> +                               goto done;
>>>> +                       }
>>>> +               }
>>>> +               avnd_su_si_del(cb, &failed_comp->su->name);
>>>> +       }
>>>> +       else {
>>>> +               /* terminate the failed comp */
>>>> +               if (m_AVND_SU_IS_PREINSTANTIABLE(failed_su)) {
>>>> +                       rc = avnd_comp_clc_fsm_run(cb, failed_comp,
>> AVND_COMP_CLC_PRES_FSM_EV_CLEANUP);
>>>> +                       if (NCSCC_RC_SUCCESS != rc)
>>>> +                               goto done;
>>>> +               }
>>>>           }
>>>>
>>>>     done:
>>>> @@ -1216,7 +1266,10 @@ uint32_t avnd_err_restart_esc_level_2(AV
>>>>           TRACE_ENTER();
>>>>
>>>>           /* first time in this level */
>>>> -       *esc_rcvr = AVSV_ERR_RCVR_SU_FAILOVER;
>>>> +       if (su->sufailover)
>>>> +               *esc_rcvr = AVSV_ERR_RCVR_SU_FAILOVER;
>>>> +       else
>>>> +               *esc_rcvr = SA_AMF_COMPONENT_FAILOVER;
>>>>
>>>>           /* External components are not supposed to escalate SU Failover 
>>>> of
>>>>              cluster components. For Ext component, SU Failover will
>>>> be limited to @@ -1278,7 +1331,10 @@ AVSV_ERR_RCVR
>> avnd_err_esc_su_failover(A
>>>>           TRACE_ENTER();
>>>>
>>>>           /* initalize */
>>>> -       *esc_rcvr = AVSV_ERR_RCVR_SU_FAILOVER;
>>>> +       if (su->sufailover)
>>>> +               *esc_rcvr = AVSV_ERR_RCVR_SU_FAILOVER;
>>>> +       else
>>>> +               *esc_rcvr = SA_AMF_COMPONENT_FAILOVER;
>>>>
>>>>           if (true == su->su_is_external) {
>>>>                   /* External component should not contribute to NODE
>>>> FAILOVER of cluster
>>>>
>>>> _______________________________________________
>>>> Opensaf-devel mailing list
>>>> [email protected]
>>>> https://lists.sourceforge.net/lists/listinfo/opensaf-devel

