Hi Praveen,

This patch also fixes the coredumps in the other failing tests from the
test report of #1725 part 1, namely tests 14, 64, 68, 84, 124 and 128.
Those test cases still log "ER avd_sg_su_oper_list_del: su not found".
Can we change ER to WA?
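
To illustrate what I mean, here is a minimal standalone sketch, not the
actual AMFD code. The OperList type, the oper_list_del function and the
log_line helper are hypothetical stand-ins I made up for this mail; the only
point is that a missing SU is reported at warning (WA) severity and the
delete is treated as a harmless no-op.

```cpp
#include <algorithm>
#include <cstdio>
#include <list>
#include <string>

// Hypothetical stand-in for the real logging macros: prints a severity
// tag, the calling function and a message to stderr.
static void log_line(const char *severity, const char *func, const char *msg) {
  std::fprintf(stderr, "%s %s: %s\n", severity, func, msg);
}

// Hypothetical simplification of an SG operation list: just SU names.
using OperList = std::list<std::string>;

// Removes su_name from the list and returns true if it was present.
// A missing SU can legitimately happen when the delete is requested twice,
// so it is logged as a warning ("WA") instead of an error ("ER").
bool oper_list_del(OperList &oper_list, const std::string &su_name) {
  auto it = std::find(oper_list.begin(), oper_list.end(), su_name);
  if (it == oper_list.end()) {
    log_line("WA", __func__, "su not found");
    return false;
  }
  oper_list.erase(it);
  return true;
}
```

With this shape, a second delete of the same SU returns false and leaves the
list untouched, which is the situation the tests above seem to hit.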
Ack from me with this minor comment.

Thanks,
Minh

On 11/05/16 02:26, praveen.malv...@oracle.com wrote:
>   osaf/services/saf/amf/amfd/sg_2n_fsm.cc |  24 +++++++++++++++++++-----
>   1 files changed, 19 insertions(+), 5 deletions(-)
>
>
> In the reported problem, AMFND asserted when SU was unlocked.
>
> For the complete analysis, please refer to the ticket. In short, while AMFND
> was removing the assignments, it received a duplicate removal of assignment
> for the same SU because the node hosting the active SU rebooted. This
> duplicate message was buffered and picked up once the ongoing removal
> completed. AMFND then processed the buffered removal and set the
> assignment-related flags. Since the SUSIs had already been deleted by the
> previous removal, no callback processing took place and no response was sent
> to AMFD for it. AMFND resets all assignment-related flags when it responds to
> AMFD, so for the buffered removal the flags remained set.
> Later the SU was unlocked and fresh assignments were given to it. After the
> callbacks complete, AMFND tries to respond to AMFD; it expects a valid SI
> pointer for the fresh assignment and checks this with an assert statement.
> AMFND asserts here as a side effect of the stale assignment-related flags.
>
> The patch fixes the problem by not sending a duplicate removal of
> assignments to AMFND.
>
> diff --git a/osaf/services/saf/amf/amfd/sg_2n_fsm.cc b/osaf/services/saf/amf/amfd/sg_2n_fsm.cc
> --- a/osaf/services/saf/amf/amfd/sg_2n_fsm.cc
> +++ b/osaf/services/saf/amf/amfd/sg_2n_fsm.cc
> @@ -3339,9 +3339,7 @@ void SG_2N::node_fail(AVD_CL_CB *cb, AVD
>   
>               if ((avd_su_state_determine(su) != SA_AMF_HA_STANDBY) &&
>                   !((avd_su_state_determine(su) == SA_AMF_HA_QUIESCED) &&
> -                   (avd_su_fsm_state_determine(su) == AVD_SU_SI_STATE_UNASGN)
> -                 )
> -                 ) {
> +                   (avd_su_fsm_state_determine(su) == AVD_SU_SI_STATE_UNASGN))) {
>                       /* SU is not standby */
>                       a_susi = avd_sg_2n_act_susi(cb, su->sg_of_su, &s_susi);
>   
> @@ -3388,11 +3386,27 @@ void SG_2N::node_fail(AVD_CL_CB *cb, AVD
>                               } else {
>                                       /* the other SU has quiesced or standby assigned and is in the
>                                        * operation list and is out of service.
> -                                      * Send a D2N-INFO_SU_SI_ASSIGN with remove all to that SU.
> +                                      * Send a D2N-INFO_SU_SI_ASSIGN with remove all to that SU
> +                                      * if not sent already.
>                                        * Remove this SU from operation list. Free the
>                                        * SU SI relationships of this SU.
>                                        */
> -                                     avd_sg_su_si_del_snd(cb, o_su);
> +
> +
> +                                     /*
> +                                        As mentioned above other su (o_su) is OOS for quiesced or
> +                                        standby state, it means some admin operation is going on it or
> +                                        it has faulted (su level) which led to OOS.
> +                                        In this function, we are processing node_fail of active/quiesced
> +                                        su. These active/quiesced assignments will be deleted because of
> +                                        node fault and also other su cannot be made active as it is OOS.
> +                                        So AMF will have to remove assignments of other su (o_su) also.
> +                                        Since o_su is OOS, there is a possibility that AMF would have
> +                                        sent deletion of assignment to it because of admin op or fault.
> +                                        If not sent then send it now.
> +                                      */
> +                                     if (all_unassigned(o_su) == false)
> +                                             avd_sg_su_si_del_snd(cb, o_su);
>                                       su->delete_all_susis();
>                                       avd_sg_su_oper_list_del(cb, su, false);
>                                       m_AVD_CHK_OPLIST(o_su, flag);
>

