A couple more questions

thanks

William Elliott | Sr. Analyst, Forward R&D | Office: +1 (407) 708-5059 Ext. 
3859 | Mobile: +1 (407) 619-0337 |--www.NetCracker.com--
Proven Partner to Communications Service Providers

-----Original Message-----
From: Lisa Ann Lentz-Liddell
Sent: Monday, February 24, 2014 6:48 AM
To: William R Elliott
Subject: Re: Help with lock instantiate of SUs

Could you please forward this morning?  Added 2 questions.



Additional questions inline.

>> Date: Fri, 21 Feb 2014 10:47:36 +0530
>> From: praveen malviya <[email protected]>
>> Subject: Re: [users] Help with lock instantiate of SUs
>> To: William R Elliott <[email protected]>
>> Cc: "[email protected]"
>>      <[email protected]>
>> Message-ID: <[email protected]>
>> Content-Type: text/plain; charset=windows-1252; format=flowed
>>
>> Please see inline.
>>
>> Thanks,
>> Praveen
>> On 21-Feb-14 2:38 AM, William R Elliott wrote:
>> > We are using OpenSAF 4.4.RC1.
>> >
>> > We have an SU with 4 components (no-redundancy SG): 1 "AMQMRate" component 
>> > and 3 "Rater" components.  The AMQMRate component has a recovery mode of 
>> > 3 and the Rater components have a recovery mode of 2.  Both component 
>> > types have a component category of 9.  The AMQMRate component has an 
>> > instantiation level of 1.  The Rater component type has an instantiation 
>> > level of 2.  The same script is configured as both the terminate and the 
>> > cleanup script.
>> >
>> > When we perform a lock for instantiate on the SU,  the behavior is 
>> > unpredictable.
>> > e.g. 2 raters will receive the terminate and then have cleanup called
>> >           PFSaAmfMgr.cc : 579The amfTerminateCallback() was received
>> >           ========== amfRaterComp1.5.2 CLEANUP STARTED at Thu Feb 20 15:09:13 EST 2014 ============
>> In a LOCK_IN operation, components are terminated in reverse order of 
>> instantiation level, so the Raters are terminated first and then the 
>> AMQMRate component.
>> When the terminate callback was called for a Rater, it either crashed before 
>> responding to AMF or did not respond within the callback timeout.
>> A component has to call saAmfResponse() in its terminate callback. Since the 
>> cleanup script was called after the terminate callback, the callback either 
>> timed out, or the component crashed or exited before invoking 
>> saAmfResponse(). In these situations, AMF treats the component as faulty 
>> and cleans it up by calling the cleanup script.
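
The ordering rule described above can be illustrated with a minimal C sketch. It is not OpenSAF code: struct comp and termination_order() are hypothetical names invented for this example, and the sketch only shows that sorting by descending instantiation level puts the Raters (level 2) ahead of AMQMRate (level 1).

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a component entry; not an OpenSAF type. */
struct comp {
    const char *name;
    unsigned inst_level;   /* instantiation level from the configuration */
};

/* qsort comparator: a higher instantiation level sorts first. */
static int by_level_desc(const void *a, const void *b)
{
    const struct comp *x = a;
    const struct comp *y = b;
    return (x->inst_level < y->inst_level) - (x->inst_level > y->inst_level);
}

/* Reorder comps in place into reverse instantiation-level order,
 * which is the order of termination described in the reply above. */
void termination_order(struct comp *comps, size_t n)
{
    assert(comps != NULL || n == 0);
    qsort(comps, n, sizeof *comps, by_level_desc);
}
```

With the SU from this thread (one AMQMRate at level 1, three Raters at level 2), AMQMRate ends up last. The relative order of components that share a level is not defined by this sketch, since qsort() is not a stable sort.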
>> > 1 rater and the amqm, do not have their cleanup called :
>> >            PFSaAmfMgr.cc : 579The amfTerminateCallback() was received
>> >            ERROR! The saAmfHealthcheckStop() API call received a returncode of 12
>> Here the healthcheck stop return code is 12, which is SA_AIS_ERR_NOT_EXIST. 
>> As per the AMF spec, this code is returned when:
>> "Either one or both of the cases that follow apply:
>> - The Availability Management Framework is not aware of a component 
>>   designated by the name to which compName points.
>> - No healthcheck has been started for the component designated by the 
>>   name to which compName points and for the key to which healthcheckKey 
>>   refers."
>> I think the second case may apply here.
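
When an application logs only the numeric return code, a small lookup helper makes the log easier to read. The sketch below is an illustration, not part of OpenSAF: ais_err_name() is a hypothetical helper, and apart from 12 (confirmed in the reply above) the numeric values are this sketch's understanding of SaAisErrorT and should be checked against your own saAis.h.

```c
#include <stddef.h>

/* Subset of SaAisErrorT numeric values. Only 12 is confirmed by this
 * thread; treat the others as assumptions and verify them in saAis.h. */
struct ais_err {
    int code;
    const char *name;
};

static const struct ais_err ais_errs[] = {
    { 1,  "SA_AIS_OK" },
    { 5,  "SA_AIS_ERR_TIMEOUT" },
    { 6,  "SA_AIS_ERR_TRY_AGAIN" },
    { 9,  "SA_AIS_ERR_BAD_HANDLE" },
    { 12, "SA_AIS_ERR_NOT_EXIST" },
};

/* Return the symbolic name for a numeric code, or "UNKNOWN" when the
 * code is not in the table above. */
const char *ais_err_name(int code)
{
    for (size_t i = 0; i < sizeof ais_errs / sizeof ais_errs[0]; i++) {
        if (ais_errs[i].code == code)
            return ais_errs[i].name;
    }
    return "UNKNOWN";
}
```

In real code it would be preferable to switch on the SaAisErrorT enumerators from the header rather than on hard-coded integers.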

We are calling the healthcheck stop after we have responded to the terminate.  
We think that is why we are getting a return code of 12.  Is there a need to 
call saAmfHealthcheckStop() or saAmfFinalize() after we respond to a terminate?

>> > We have also seen where 3 components come down cleanly and one does not.
>> >
>> >
>> > We are also confused by what we see in the node director trace.   We can 
>> > see that we are terminating rater 1.2.1 but then we switch to amqm rater 
>> > in the call to avnd_su_pres_terming_compuninst_hdler.  We did not think 
>> > that we would be changing components within that code.  We are confused.
>> >
>> > Feb 19 14:52:00.573921 osafamfnd [5039:clc.cc:0858] T1
>> > 'safComp=amfRaterComp1.2.1,safSu=amfRaterSU1.2,safSg=amfRaterSG1,safApp=olcApp':FSM Enter presence state:
>> > 'SA_AMF_PRESENCE_TERMINATING(4)':FSM Exit presence
>> > state:SA_AMF_PRESENCE_UNINSTANTIATED(1)
>> > Feb 19 14:52:00.573933 osafamfnd [5039:clc.cc:0891] >> 
>> > avnd_comp_clc_st_chng_prc: Comp 
>> > 'safComp=amfRaterComp1.2.1,safSu=amfRaterSU1.2,safSg=amfRaterSG1,safApp=olcApp',
>> >  Prv_state '4', Final_state '1'
>> > Feb 19 14:52:00.573945 osafamfnd [5039:clc.cc:0954] TR SU and Comp 
>> > Preinst. comp->su->flag '4098', comp->flag '25'
>> > Feb 19 14:52:00.573956 osafamfnd [5039:susm.cc:1356] >> 
>> > avnd_su_pres_fsm_run: 'safSu=amfRaterSU1.2,safSg=amfRaterSG1,safApp=olcApp'
>> > ...
>> > Feb 19 14:52:00.573968 osafamfnd [5039:susm.cc:1361] T1 Entering SU
>> > presence state FSM: current state: 4, event: 8, su
>> > name:safSu=amfRaterSU1.2,safSg=amfRaterSG1,safApp=olcApp
>> > Feb 19 14:52:00.573979 osafamfnd [5039:susm.cc:2383] >> 
>> > avnd_su_pres_terming_compuninst_hdler: Component Uninstantiated event in 
>> > the Terminating 
>> > state:'safSu=amfRaterSU1.2,safSg=amfRaterSG1,safApp=olcApp' : 
>> > 'safComp=amfRaterComp1.2.1,safSu=amfRaterSU1.2,safSg=amfRaterSG1,safApp=olcApp'
>> > Feb 19 14:52:00.573991 osafamfnd [5039:susm.cc:2409] TR PI SU
>> > Feb 19 14:52:00.574002 osafamfnd [5039:susm.cc:2419] TR Running the
>> > component clc FSM
>> > Feb 19 14:52:00.574012 osafamfnd [5039:clc.cc:0763] >> 
>> > avnd_comp_clc_fsm_run: Comp 
>> > 'safComp=amfAMQMRaterComp1.2.1,safSu=amfRaterSU1.2,safSg=amfRaterSG1,safApp=olcApp',
>> >  Ev '4'
>> > Feb 19 14:52:00.574024 osafamfnd [5039:clc.cc:0817] TR stopping all
>> > monitoring for this component
>> > Feb 19 14:52:00.574035 osafamfnd [5039:cpm.cc:0634] >> 
>> > avnd_comp_pm_finalize: Comp 
>> > 'safComp=amfAMQMRaterComp1.2.1,safSu=amfRaterSU1.2,safSg=amfRaterSG1,safApp=olcApp'
>> > Feb 19 14:52:00.574046 osafamfnd [5039:cpm.cc:0650] << avnd_comp_pm_finalize
>> > Feb 19 14:52:00.574057 osafamfnd [5039:clc.cc:0835] T1 
>> > 'safComp=amfAMQMRaterComp1.2.1,safSu=amfRaterSU1.2,safSg=amfRaterSG1,safApp=olcApp':Entering
>> >  CLC FSM: presence state:'SA_AMF_PRESENCE_INSTANTIATED(3)', 
>> > Event:'AVND_COMP_CLC_PRES_FSM_EV_TERM'
>> The traces are correct. Since amfRaterComp1.2.1 was terminated or cleaned up 
>> successfully, AMF picks safComp=amfAMQMRaterComp1.2.1 to terminate next. 
>> As noted above, AMF terminates components one by one in the reverse order 
>> of their instantiation level.
>>
>> The following can be checked:
>> 1) The terminate callback calls saAmfResponse() within the configured time limit.
>> 2) The healthcheck is stopped only if it was started.
>>
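
The second check above amounts to remembering whether a healthcheck was ever started and skipping the stop call otherwise. A minimal C sketch of that guard follows. It does not call the AMF API: struct hc_state, fake_healthcheck_stop() and stop_healthcheck_if_started() are hypothetical names for this example, and the fake stop function stands in for whatever wrapper the application has around saAmfHealthcheckStop().

```c
#include <stdbool.h>

/* Hypothetical per-component bookkeeping; not an OpenSAF type. */
struct hc_state {
    bool started;     /* true after a healthcheck start succeeded */
    int  stop_calls;  /* how many times the stop call was issued */
};

/* Stand-in for the real stop call, for example an application wrapper
 * around saAmfHealthcheckStop(); here it only counts invocations. */
static void fake_healthcheck_stop(struct hc_state *st)
{
    st->stop_calls++;
}

/* Issue the stop only when a healthcheck is known to be running.
 * Returns true if the stop was issued, false if it was skipped. */
bool stop_healthcheck_if_started(struct hc_state *st)
{
    if (!st->started)
        return false;
    fake_healthcheck_stop(st);
    st->started = false;
    return true;
}
```

Calling stop_healthcheck_if_started() twice issues the stop once; calling it on a component whose healthcheck never started issues no stop at all. In a multithreaded process the flag would also need to be protected against concurrent access, which this sketch leaves out.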

Our processes are multithreaded and one thread is responsible for the 
interaction with AMF.  That thread responds to the terminate callback with 
saAmfResponse() within the configured time limit.  The process's main 
thread continues to run so that it can stop cleanly (it completes the current 
unit of work).  Does OpenSAF expect the process to be stopped after 
responding to the terminate?



>> > We would appreciate any help! Thanks
>> >
>> >
>> > William Elliott | Sr. Analyst, Forward R&D | Office: +1 (800)
>> > 519-8360 Ext. 3859 | Mobile: +1 (407) 619-0337
>> > |--www.NetCracker.com-- Proven Partner to Communications Service
>> > Providers




_______________________________________________
Opensaf-users mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/opensaf-users
