Hi Minh,

Thanks for reviewing. I will change Error to Warning before pushing.

Thanks,
Praveen

On 09-Aug-16 4:48 AM, minh chau wrote:
> Hi Praveen,
>
> This patch has also fixed the coredump in the other tests that are
> failing in the test report of #1725 part 1, which are 14, 64, 68, 84,
> 124, 128.
> In the above test cases, we still get "ER avd_sg_su_oper_list_del: su
> not found".
> Can we change ER to WA?
> Ack from me with this minor comment.
>
> Thanks,
> Minh
>
> On 11/05/16 02:26, praveen.malv...@oracle.com wrote:
>> osaf/services/saf/amf/amfd/sg_2n_fsm.cc | 24 +++++++++++++++++++-----
>> 1 files changed, 19 insertions(+), 5 deletions(-)
>>
>>
>> In the reported problem, AMFND asserted when the SU was unlocked.
>>
>> For the complete analysis, please refer to the ticket. In short, while
>> AMFND was removing the assignments, it received a duplicate removal of
>> assignment for the same SU because of the reboot of the node hosting
>> the active SU. This duplicate message was buffered and picked up when
>> the ongoing removal completed. After the ongoing removal completed,
>> AMFND picked up the buffered assignment and set the assignment-related
>> flags. Since the SUSIs had been deleted during the previous removal, no
>> callback processing and no response to AMFD was done for it. AMFND
>> resets all assignment-related flags when it responds to AMFD, so the
>> flags stayed set for the buffered assignment.
>> Later, the SU was unlocked and fresh assignments were given to it.
>> After the callback completed, when AMFND tried to respond to AMFD, it
>> expected a valid SI pointer for the fresh assignment and checked it
>> through an assert statement. AMFND asserted here as a side effect of
>> the assignment-related flags still being set.
>>
>> The patch fixes the problem by avoiding sending a duplicate removal of
>> assignments to AMFND.
>>
>> diff --git a/osaf/services/saf/amf/amfd/sg_2n_fsm.cc b/osaf/services/saf/amf/amfd/sg_2n_fsm.cc
>> --- a/osaf/services/saf/amf/amfd/sg_2n_fsm.cc
>> +++ b/osaf/services/saf/amf/amfd/sg_2n_fsm.cc
>> @@ -3339,9 +3339,7 @@ void SG_2N::node_fail(AVD_CL_CB *cb, AVD
>>          if ((avd_su_state_determine(su) != SA_AMF_HA_STANDBY) &&
>>              !((avd_su_state_determine(su) == SA_AMF_HA_QUIESCED) &&
>> -              (avd_su_fsm_state_determine(su) == AVD_SU_SI_STATE_UNASGN)
>> -             )
>> -            ) {
>> +              (avd_su_fsm_state_determine(su) == AVD_SU_SI_STATE_UNASGN))) {
>>              /* SU is not standby */
>>              a_susi = avd_sg_2n_act_susi(cb, su->sg_of_su, &s_susi);
>> @@ -3388,11 +3386,27 @@ void SG_2N::node_fail(AVD_CL_CB *cb, AVD
>>                  } else {
>>                      /* the other SU has quiesced or standby assigned and is in the
>>                       * operation list and is out of service.
>> -                     * Send a D2N-INFO_SU_SI_ASSIGN with remove all to that SU.
>> +                     * Send a D2N-INFO_SU_SI_ASSIGN with remove all to that SU
>> +                     * if not sent already.
>>                       * Remove this SU from operation list. Free the
>>                       * SU SI relationships of this SU.
>>                       */
>> -                    avd_sg_su_si_del_snd(cb, o_su);
>> +
>> +
>> +                    /*
>> +                       As mentioned above other su (o_su) is OOS for quiesced or
>> +                       standby state, it means some admin operation is going on it or
>> +                       it has faulted (su level) which led to OOS.
>> +                       In this function, we are processing node_fail of active/quiesced
>> +                       su. These active/quiesced assignments will be deleted because of
>> +                       node fault and also other su cannot be made active as it is OOS.
>> +                       So AMF will have to remove assignments of other su (o_su) also.
>> +                       Since o_su is OOS, there is a possibility that AMF would have
>> +                       sent deletion of assignment to it because of admin op or fault.
>> +                       If not sent then send it now.
>> +                     */
>> +                    if (all_unassigned(o_su) == false)
>> +                        avd_sg_su_si_del_snd(cb, o_su);
>>                      su->delete_all_susis();
>>                      avd_sg_su_oper_list_del(cb, su, false);
>>                      m_AVD_CHK_OPLIST(o_su, flag);
>>
> ------------------------------------------------------------------------------
_______________________________________________
Opensaf-devel mailing list
Opensaf-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-devel
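
For readers following the thread without the AMFD sources at hand, the guard added in the second hunk ("send the removal if not sent already") can be illustrated in isolation. The following is only a minimal sketch and not the OpenSAF code: the types Su, Susi and SusiFsm and the helpers all_unassigning, send_remove_all and remove_all_once are hypothetical stand-ins. It also assumes that all_unassigned(o_su) in the patch means "a removal has already been issued for every assignment of the SU", which the patch's comment suggests but this thread does not spell out.

```cpp
#include <string>
#include <vector>

// Hypothetical, simplified stand-ins for the AMFD data model.
// They are NOT the real OpenSAF types.
enum class SusiFsm { Assigned, Unassigning };

struct Susi {
    std::string si_name;
    SusiFsm fsm;
};

struct Su {
    std::string name;
    std::vector<Susi> susis;
    int removals_sent;  // how many "remove all" messages went to the node
};

// Assumed meaning of the check used in the patch: true when every
// assignment of the SU already has a removal in progress.
bool all_unassigning(const Su& su) {
    for (const Susi& s : su.susis) {
        if (s.fsm != SusiFsm::Unassigning) return false;
    }
    return true;
}

// Stand-in for sending the "remove all" message: count the send and
// mark every assignment as being removed.
void send_remove_all(Su& su) {
    ++su.removals_sent;
    for (Susi& s : su.susis) s.fsm = SusiFsm::Unassigning;
}

// Guarded variant: send the removal only if it was not sent already.
// Returns true when a message was actually sent.
bool remove_all_once(Su& su) {
    if (all_unassigning(su)) return false;  // duplicate suppressed
    send_remove_all(su);
    return true;
}
```

With this shape, calling remove_all_once twice on the same SU sends a single message, so the receiver never has to buffer a second removal while the first one is still being processed.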