Please see inline.

Thanks,
Praveen
On 21-Feb-14 2:38 AM, William R Elliott wrote:
> We are Using opensaf 4.4.RC1
>
> We have a SU with 4 components (no redundancy SG): 1 "AMQMRate" component and 
> 3 "Rater" components.  The AMQMRate component has a recovery mode of 3 and 
> the Rater components have a recovery mode of 2.  Both component types have a 
> component category of 9. The AMQMRate component has a instantiation level of 
> 1.  The Rater component type has a instantiation level of 2.  The same script 
> is defined for the terminate and cleanup script.
>
> When we perform a lock for instantiate on the SU,  the behavior is 
> unpredictable.
> e.g. 2 raters will receive the terminate and then have cleanup called
>           PFSaAmfMgr.cc : 579The amfTerminateCallback() was received
>           ========== amfRaterComp1.5.2 CLEANUP STARTED at Thu Feb 20 15:09:13 
> EST 2014 ============
In a LOCK_IN (lock-instantiation) operation, components are terminated
in reverse order of instantiation level. So the Raters are terminated
first and the AMQMRate component last.
A component has to call saAmfResponse() from its terminate callback.
Since the cleanup script was called after the terminate callback, the
Rater either crashed or exited before invoking saAmfResponse(), or it
did not respond within the callback timeout. In either situation AMF
treats the component as faulty and cleans it up by calling the cleanup
script.
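
A minimal sketch of the terminate handling described above, in case it
helps. It is not taken from your PFSaAmfMgr.cc. The names invocation_t,
respond_fn, fake_respond and on_terminate are hypothetical stand-ins so
that the snippet compiles without the AMF headers; in a real component
the respond step would be saAmfResponse(amfHandle, invocation, SA_AIS_OK).

```c
#include <stdint.h>

typedef uint64_t invocation_t;            /* stand-in for SaInvocationT */
typedef int (*respond_fn)(invocation_t);  /* stand-in for a wrapper around saAmfResponse() */

/* Hypothetical application state, only for illustration. */
struct rater_state {
    int work_stopped;  /* set once the component stops taking new work */
    int responded;     /* set once the response has been sent to AMF */
};

static struct rater_state g_state;

/* Test double for the respond step; returns 1, the value of SA_AIS_OK. */
static int fake_respond(invocation_t inv)
{
    (void)inv;
    g_state.responded = 1;
    return 1;
}

/* Terminate handler: do only short, bounded shutdown work, then respond.
 * Anything slow or blocking here risks exceeding the callback timeout,
 * after which AMF runs the cleanup script. */
int on_terminate(invocation_t inv, respond_fn respond)
{
    g_state.work_stopped = 1;  /* stop accepting work; must not block */
    return respond(inv);       /* respond before the process exits */
}
```

The point is only the ordering: respond first, exit afterwards, and keep
the work before the response short.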
> 1 rater and the amqm, do not have their cleanup called :
>            PFSaAmfMgr.cc : 579The amfTerminateCallback() was received
>            ERROR! The saAmfHealthcheckStop() API call received a returncode 
> of 12
Here the return code from saAmfHealthcheckStop() is 12, which is
SA_AIS_ERR_NOT_EXIST. As per the AMF spec, this code is returned when:
" Either one or both of the cases that follow apply:
• The Availability Management Framework is not aware of a component
designated by the name to which compName points.
• No healthcheck has been started for the component designated by the
name to which compName points and for the key to which healthcheckKey
refers."
I think the second case may apply here, i.e. the healthcheck being
stopped had not been started for that component and key.
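
One way to avoid this error is to track whether the healthcheck was
started and to skip the stop call otherwise. The following is only a
sketch under that assumption. hc_tracker, hc_stop_fn, stub_stop and
hc_stop_if_started are hypothetical names; in a real component the stop
function would wrap saAmfHealthcheckStop(). The two enum values are the
numeric values of SA_AIS_OK and SA_AIS_ERR_NOT_EXIST.

```c
#include <stdbool.h>

enum {
    AIS_OK = 1,             /* value of SA_AIS_OK */
    AIS_ERR_NOT_EXIST = 12  /* value of SA_AIS_ERR_NOT_EXIST */
};

/* Hypothetical per-healthcheck bookkeeping kept by the component. */
struct hc_tracker {
    bool started;  /* set to true after a successful healthcheck start */
};

typedef int (*hc_stop_fn)(void);  /* stand-in for a wrapper around saAmfHealthcheckStop() */

/* Test double that counts how often the stop call is made. */
static int stop_calls;
static int stub_stop(void)
{
    stop_calls++;
    return AIS_OK;
}

/* Stop the healthcheck only if it was started. If it was never started,
 * no call is made and AIS_OK is returned. */
int hc_stop_if_started(struct hc_tracker *t, hc_stop_fn stop)
{
    if (!t->started)
        return AIS_OK;

    int rc = stop();
    /* Either result means no healthcheck is running any more. */
    if (rc == AIS_OK || rc == AIS_ERR_NOT_EXIST)
        t->started = false;
    return rc;
}
```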
> We have also seen where 3 components come down cleanly and one does not.
>
>
> We are also confused by what we see in the node director trace.   We can see 
> that we are terminating rater 1.2.1 but then we switch to amqm rater in the 
> call to avnd_su_pres_terming_compuninst_hdler.  We did not think that we 
> would be changing components within that code.  We are confused.
>
> Feb 19 14:52:00.573921 osafamfnd [5039:clc.cc:0858] T1 
> 'safComp=amfRaterComp1.2.1,safSu=amfRaterSU1.2,safSg=amfRaterSG1,safApp=olcApp':FSM
>  Enter presence state: 'SA_AMF_PRESENCE_TERMINATING(4)':FSM Exit presence 
> state:SA_AMF_PRESENCE_UNINSTANTIATED(1)
> Feb 19 14:52:00.573933 osafamfnd [5039:clc.cc:0891] >> 
> avnd_comp_clc_st_chng_prc: Comp 
> 'safComp=amfRaterComp1.2.1,safSu=amfRaterSU1.2,safSg=amfRaterSG1,safApp=olcApp',
>  Prv_state '4', Final_state '1'
> Feb 19 14:52:00.573945 osafamfnd [5039:clc.cc:0954] TR SU and Comp Preinst. 
> comp->su->flag '4098', comp->flag '25'
> Feb 19 14:52:00.573956 osafamfnd [5039:susm.cc:1356] >> avnd_su_pres_fsm_run: 
> 'safSu=amfRaterSU1.2,safSg=amfRaterSG1,safApp=olcApp'
> ...
> Feb 19 14:52:00.573968 osafamfnd [5039:susm.cc:1361] T1 Entering SU presence 
> state FSM: current state: 4, event: 8, su 
> name:safSu=amfRaterSU1.2,safSg=amfRaterSG1,safApp=olcApp
> Feb 19 14:52:00.573979 osafamfnd [5039:susm.cc:2383] >> 
> avnd_su_pres_terming_compuninst_hdler: Component Uninstantiated event in the 
> Terminating state:'safSu=amfRaterSU1.2,safSg=amfRaterSG1,safApp=olcApp' : 
> 'safComp=amfRaterComp1.2.1,safSu=amfRaterSU1.2,safSg=amfRaterSG1,safApp=olcApp'
> Feb 19 14:52:00.573991 osafamfnd [5039:susm.cc:2409] TR PI SU
> Feb 19 14:52:00.574002 osafamfnd [5039:susm.cc:2419] TR Running the component 
> clc FSM
> Feb 19 14:52:00.574012 osafamfnd [5039:clc.cc:0763] >> avnd_comp_clc_fsm_run: 
> Comp 
> 'safComp=amfAMQMRaterComp1.2.1,safSu=amfRaterSU1.2,safSg=amfRaterSG1,safApp=olcApp',
>  Ev '4'
> Feb 19 14:52:00.574024 osafamfnd [5039:clc.cc:0817] TR stopping all 
> monitoring for this component
> Feb 19 14:52:00.574035 osafamfnd [5039:cpm.cc:0634] >> avnd_comp_pm_finalize: 
> Comp 
> 'safComp=amfAMQMRaterComp1.2.1,safSu=amfRaterSU1.2,safSg=amfRaterSG1,safApp=olcApp'
> Feb 19 14:52:00.574046 osafamfnd [5039:cpm.cc:0650] << avnd_comp_pm_finalize
> Feb 19 14:52:00.574057 osafamfnd [5039:clc.cc:0835] T1 
> 'safComp=amfAMQMRaterComp1.2.1,safSu=amfRaterSU1.2,safSg=amfRaterSG1,safApp=olcApp':Entering
>  CLC FSM: presence state:'SA_AMF_PRESENCE_INSTANTIATED(3)', 
> Event:'AVND_COMP_CLC_PRES_FSM_EV_TERM'
The traces are correct. Once amfRaterComp1.2.1 has been terminated or
cleaned up successfully, AMF picks safComp=amfAMQMRaterComp1.2.1 as the
next component to terminate. As noted above, AMF terminates components
one by one, in the reverse order of their instantiation level.
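
To illustrate the ordering only (this is not the amfnd code), here is a
small sketch that sorts components by descending instantiation level.
struct comp, by_level_desc and termination_order are hypothetical names.
The order among components of the same level is left unspecified here.

```c
#include <stdlib.h>

/* Hypothetical, simplified view of a component. */
struct comp {
    const char *name;
    int inst_level;  /* instantiation level from the configuration */
};

/* Comparator: higher instantiation level sorts first. */
static int by_level_desc(const void *a, const void *b)
{
    const struct comp *x = a;
    const struct comp *y = b;
    return (y->inst_level > x->inst_level) - (y->inst_level < x->inst_level);
}

/* Reorder comps[] into termination order: reverse of instantiation level. */
void termination_order(struct comp *comps, size_t n)
{
    qsort(comps, n, sizeof comps[0], by_level_desc);
}
```

With your configuration (AMQMRate at level 1, the three Raters at level
2) the Raters come first and AMQMRate last, which matches the switch from
the Rater to the AMQM component that you see in the trace.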

The following things can be checked:
1) The terminate callback calls saAmfResponse() within the configured
time limit.
2) The healthcheck is stopped only if it was started.

> We would appreciate any help! Thanks
>
>
> William Elliott | Sr. Analyst, Forward R&D | Office: +1 (800) 519-8360 Ext. 
> 3859 | Mobile: +1 (407) 619-0337 |--www.NetCracker.com--
> Proven Partner to Communications Service Providers
>
>
>
>
> ________________________________
> The information transmitted herein is intended only for the person or entity 
> to which it is addressed and may contain confidential, proprietary and/or 
> privileged material. Any review, retransmission, dissemination or other use 
> of, or taking of any action in reliance upon, this information by persons or 
> entities other than the intended recipient is prohibited. If you received 
> this in error, please contact the sender and delete the material from any 
> computer.
> ------------------------------------------------------------------------------
> Managing the Performance of Cloud-Based Applications
> Take advantage of what the Cloud has to offer - Avoid Common Pitfalls.
> Read the Whitepaper.
> http://pubads.g.doubleclick.net/gampad/clk?id=121054471&iu=/4140/ostg.clktrk
> _______________________________________________
> Opensaf-users mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/opensaf-users