Can we conclude on the rest of this defect?

It seems the behaviour when autorepair is false is not correct. Here is a
cut and paste of your previous comments:

Praveen:

"Some initial comments/observations when repair related attributes are false:
1) If one of the components enters the TERM_FAILED state, it leads to
   termination of all other healthy components as well. This is old
   behavior, but it also causes a service outage for the workloads
   assigned to these healthy components.

2) In one of the cases, such as lock on SU, when a component is getting
   quiesced assignments and its fault leads to termination failure, AMF
   still removes the assignments and performs failover.
   Reason: AMFND sends assignment responses to AMFD even when faults
   occur. AMFD processes the assignment message and sends removal of
   assignments to AMFND, and so it goes on.
   In the patches, su_oper_event for a disabled SU is blocked in the
   TERM_FAILED state, but the same should be done for the assignment
   responses. Another way is to drop such assignment responses at AMFD
   if it sees the SU in the TERM_FAILED state, like:"

and:

"The problem of SG becoming unstable also occurs during si-swap admin
operation and faults during its execution.
During si-swap, if the component which is receiving active assignments
faults and its fault leads to the TERM_FAILED state, then the SG remains
unstable. In this case one SU remains in the Quiesced state while another
SU is disabled but still has active assignments. Since the SG is unstable,
admin repair does not work."

Nags:

"I have tested with saAmfNodeAutoRepair and 
saAmfNodeFailfastOnTerminationFailure both set to false.
When I locked the active SU, with a CSI timeout and the cleanup script
returning an error, the admin command just hung.
Any further admin command returns the error "Admin operation is already going on"."
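
To make the expected behaviour concrete, here is a minimal standalone sketch
of the rule the patch below is aiming for in the node director. It is not
OpenSAF code: the types and the function on_presence_change() are
hypothetical and only model the decision, namely that any transition to
TERMINATION_FAILED disables the SU and reports the oper state without
requesting failover, while RESTARTING -> INSTANTIATION_FAILED still requests
a failover. It deliberately merges the comp level check in clc.cc and the SU
level check in susm.cc into one function.

```cpp
#include <cassert>

// Hypothetical, simplified model; none of these names exist in OpenSAF.
enum class Presence {
  Uninstantiated, Instantiating, Instantiated, Terminating,
  Restarting, InstantiationFailed, TerminationFailed
};
enum class Oper { Enabled, Disabled };
enum class Action { None, RequestFailover, ReportOperStateOnly };

struct Su {
  Oper oper = Oper::Enabled;
};

// Decide what to tell the director after a presence state change.
Action on_presence_change(Su &su, Presence prev, Presence final_st) {
  // xxx -> term-failed: control over the component is lost, so only
  // disable the SU and report it; failover is not requested.
  if (final_st == Presence::TerminationFailed) {
    su.oper = Oper::Disabled;
    return Action::ReportOperStateOnly;
  }
  // restarting -> inst-failed: failover is still requested.
  if (prev == Presence::Restarting &&
      final_st == Presence::InstantiationFailed) {
    return Action::RequestFailover;
  }
  return Action::None;
}

// Small usage example for the model above.
inline void self_check() {
  Su su;
  Action a = on_presence_change(su, Presence::Terminating,
                                Presence::TerminationFailed);
  assert(a == Action::ReportOperStateOnly);
  assert(su.oper == Oper::Disabled);
}
```

What the director should then do with the disabled SU when both
saAmfNodeAutoRepair and saAmfNodeFailfastOnTerminationFailure are false is
exactly the open question in the comments above; the sketch does not try to
answer it.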

Thanks,
Hans

On 02/28/2014 08:54 AM, Hans Feldt wrote:
>   osaf/services/saf/amf/amfnd/clc.cc  |   3 +-
>   osaf/services/saf/amf/amfnd/su.cc   |   1 -
>   osaf/services/saf/amf/amfnd/susm.cc |  45 +++++-------------------------------
>   3 files changed, 8 insertions(+), 41 deletions(-)
>
>
> Problem: possible split brain on application level and spec violation.
>
> Analysis: The AMF node director requests a comp/SU failover from the AMF
> director despite that a comp is in TERM-FAILED presence state.
>
> Change: Correct this behavior and just disable the SU and let the AMF director
> handle possible node reboot or manual repair.
>
> diff --git a/osaf/services/saf/amf/amfnd/clc.cc b/osaf/services/saf/amf/amfnd/clc.cc
> --- a/osaf/services/saf/amf/amfnd/clc.cc
> +++ b/osaf/services/saf/amf/amfnd/clc.cc
> @@ -927,8 +927,7 @@ uint32_t avnd_comp_clc_st_chng_prc(AVND_
>       }
>
>       if ((SA_AMF_PRESENCE_RESTARTING == prv_st) &&
> -             ((SA_AMF_PRESENCE_INSTANTIATION_FAILED == final_st) ||
> -              (SA_AMF_PRESENCE_TERMINATION_FAILED == final_st))) {
> +                     (SA_AMF_PRESENCE_INSTANTIATION_FAILED == final_st)) {
>               avnd_instfail_su_failover(cb, comp->su, comp);
>       }
>
> diff --git a/osaf/services/saf/amf/amfnd/su.cc b/osaf/services/saf/amf/amfnd/su.cc
> --- a/osaf/services/saf/amf/amfnd/su.cc
> +++ b/osaf/services/saf/amf/amfnd/su.cc
> @@ -519,7 +519,6 @@ uint32_t avnd_evt_su_admin_op_req(AVND_C
>
>               m_AVND_SU_STATE_RESET(su);
>               m_AVND_SU_OPER_STATE_SET(su, SA_AMF_OPERATIONAL_ENABLED);
> -             avnd_di_uns32_upd_send(AVSV_SA_AMF_SU, saAmfSUOperState_ID, &su->name, su->oper);
>               avnd_su_pres_state_set(su, SA_AMF_PRESENCE_UNINSTANTIATED);
>               rc = avnd_di_oper_send(cb, su, 0);
>
> diff --git a/osaf/services/saf/amf/amfnd/susm.cc b/osaf/services/saf/amf/amfnd/susm.cc
> --- a/osaf/services/saf/amf/amfnd/susm.cc
> +++ b/osaf/services/saf/amf/amfnd/susm.cc
> @@ -1529,9 +1529,7 @@ uint32_t avnd_su_pres_st_chng_prc(AVND_C
>                               goto done;
>               }
>
> -             /* instantiating -> term-failed */
> -             if ((SA_AMF_PRESENCE_INSTANTIATING == prv_st) &&
> -                             (SA_AMF_PRESENCE_TERMINATION_FAILED == final_st)) {
> +             if (final_st == SA_AMF_PRESENCE_TERMINATION_FAILED) {
>                       TRACE("SU Instantiating -> Termination Failed");
>                       m_AVND_SU_OPER_STATE_SET(su, SA_AMF_OPERATIONAL_DISABLED);
>                       m_AVND_SEND_CKPT_UPDT_ASYNC_UPDT(cb, su, AVND_CKPT_SU_OPER_STATE);
> @@ -1558,27 +1556,6 @@ uint32_t avnd_su_pres_st_chng_prc(AVND_C
>                       else
>                               TRACE("SU oper state is disabled");
>               }
> -
> -             /* terminating -> term-failed */
> -             if (((prv_st == SA_AMF_PRESENCE_RESTARTING) || (SA_AMF_PRESENCE_TERMINATING == prv_st))
> -                             && (SA_AMF_PRESENCE_TERMINATION_FAILED == final_st)) {
> -                     TRACE("Terminating -> Termination Failed");
> -                     if (sufailover_in_progress(su)) {
> -                             /*Do not reset any flag, this will be done as a part of repair.*/
> -                             rc = avnd_di_oper_send(cb, su, AVSV_ERR_RCVR_SU_FAILOVER);
> -                             osafassert(NCSCC_RC_SUCCESS == rc);
> -                             avnd_su_si_del(avnd_cb, &su->name);
> -                             goto done;
> -                     }
> -                     m_AVND_SU_OPER_STATE_SET(su, SA_AMF_OPERATIONAL_DISABLED);
> -                     m_AVND_SEND_CKPT_UPDT_ASYNC_UPDT(cb, su, AVND_CKPT_SU_OPER_STATE);
> -                     /* inform AvD about oper state change */
> -                     rc = avnd_di_oper_send(cb, su, SA_AMF_COMPONENT_FAILOVER);
> -                     if (NCSCC_RC_SUCCESS != rc)
> -                             goto done;
> -
> -             }
> -
>       }
>
>       /* npi su */
> @@ -1650,22 +1627,14 @@ uint32_t avnd_su_pres_st_chng_prc(AVND_C
>                       }
>               }
>
> -             /* terminating/instantiated/restarting -> term-failed */
> -             if (((SA_AMF_PRESENCE_TERMINATING == prv_st) ||
> -                  (SA_AMF_PRESENCE_INSTANTIATED == prv_st) ||
> -                  (SA_AMF_PRESENCE_RESTARTING == prv_st)) && (SA_AMF_PRESENCE_TERMINATION_FAILED == final_st)) {
> -                     TRACE("Terminating/Instantiated/Restarting -> Termination Failed");
> -                      if (sufailover_in_progress(su)) {
> -                                /*Do not reset any flag, this will be done as a part of repair.*/
> -                                rc = avnd_di_oper_send(cb, su, AVSV_ERR_RCVR_SU_FAILOVER);
> -                                osafassert(NCSCC_RC_SUCCESS == rc);
> -                                avnd_su_si_del(avnd_cb, &su->name);
> -                                goto done;
> -                        }
> +             /* xxx -> term-failed */
> +             if (final_st == SA_AMF_PRESENCE_TERMINATION_FAILED) {
>                       m_AVND_SU_OPER_STATE_SET(su, SA_AMF_OPERATIONAL_DISABLED);
>                       m_AVND_SEND_CKPT_UPDT_ASYNC_UPDT(cb, su, AVND_CKPT_SU_OPER_STATE);
> -                     /* inform AvD about oper state change */
> -                     rc = avnd_di_oper_send(cb, su, SA_AMF_COMPONENT_FAILOVER);
> +                     /* Don't send su-oper state msg, just update su oper state
> +                      * AMF has lost control over this component and the operator needs
> +                      * to repair this node. Failover is not possible in this state. */
> +                     avnd_di_uns32_upd_send(AVSV_SA_AMF_SU, saAmfSUOperState_ID, &su->name, su->oper);
>
>                       /* si assignment/removal failed.. inform AvD */
>                       rc = avnd_di_susi_resp_send(cb, su, m_AVND_SU_IS_ALL_SI(su) ? 0 : si);
>
> _______________________________________________
> Opensaf-devel mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/opensaf-devel
>
