Hi,

OPENSAF_TERMTIMEOUT is not used by AMF for any component-related activity. It applies only when a user stops OpenSAF.

I think the callback timeouts in your application are not properly tuned.
From the log, I see that AMF instantiates the component by invoking the CLC-CLI script for instantiation. Component registration also seems to succeed, because AMF sends the component a health check callback. However, the component is still completing non-AMF-related instantiation activities and does not respond to the health check callback within the configured timeout. After the timeout, AMF generates an error report on the component with the reason "healthCheckcallbackTimeout". Since the component is declared faulty, AMF cleans it up by calling the cleanup script, which also failed, and the component moved to the TERM_FAILED state. You need to analyze why the component does not respond to the health check callback in time. Based on that analysis, decide whether the health check callback timeout needs to be increased.
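
For reference, health check timing is part of the AMF configuration rather than nid.conf. Below is a minimal sketch of an imm.xml fragment. It assumes the component type from this thread and a hypothetical healthcheck key named HMS_HC (the real key is whatever the component passes to saAmfHealthcheckStart()). The values are illustrative only and are given in nanoseconds:

```xml
<!-- Sketch only: "HMS_HC" is a hypothetical healthcheck key; values are illustrative. -->
<object class="SaAmfHealthcheckType">
        <dn>safHealthcheckKey=HMS_HC,safVersion=4.0.0,safCompType=hmsCompType_n11s4</dn>
        <attr>
                <!-- Interval between health check invocations: 10 s -->
                <name>saAmfHctDefPeriod</name>
                <value>10000000000</value>
        </attr>
        <attr>
                <!-- Maximum time allowed for the health check response: 8 s -->
                <name>saAmfHctDefMaxDuration</name>
                <value>8000000000</value>
        </attr>
</object>
```

Raising the maximum duration only helps if the component is merely slow to answer. If it never responds, the same fault would presumably reappear after the longer wait.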


Thanks
Praveen


On 30-Aug-17 7:52 PM, Dheeroj Ram wrote:
Hi Praveen,

Thank you so much for the info.

I have another identical setup running OpenSAF 4.2.2. Following your input, I modified imm.xml as shown below:

         <object class="SaAmfComp">
                 <dn>safComp=HMSComp_n11s4,safSu=SU-n11s4,safSg=HenbGw-SG,safApp=HenbGwApp_PL_n11s4</dn>
                 <attr>
                         <name>saAmfCompType</name>
                         <value>safVersion=4.0.0,safCompType=hmsCompType_n11s4</value>
                 </attr>
                 <attr>
                         <name>saAmfCompInstantiationLevel</name>
                         <value>1</value>
                 </attr>
                 <attr>
                         <name>saAmfCompCleanupTimeout </name>
                         <value>10000000000</value>
                 </attr>
                 <attr>
                         <name>saAmfCompCmdEnv</name>
                         <value>AMF_DEMO_VAR2=COMP1_OVERLOAD_VALUE2</value>
                         <value>AMF_DEMO_VAR3=COMP1_VALUE3</value>
                         <value>AMF_DEMO_VAR4=COMP1_VALUE4</value>
                 </attr>
         </object>

But this didn't work; the issue still exists.

Do I need to test this on version 4.4.2 only?

However, I observed that setting "OPENSAF_TERMTIMEOUT=1000" in nid.conf gives me the expected result, but only once or twice.

Below are the logs captured while it was working:

1st attempt:

=  =  =  =  =

Aug 30 17:22:14 localhost kernel: grsec: From 172.16.11.2: signal 11 sent to /hegw/gsw/bin/hms[hms:31056] uid/euid:0/0 gid/egid:0/0, parent /sbin/init[init:1] uid/euid:0/0 gid/egid:0/0 by /bin/bash[bash:29706] uid/euid:0/0 gid/egid:0/0, parent /bin/login[login:29705] uid/euid:0/0 gid/egid:0/0

Aug 30 17:22:28 localhost osafamfnd[30931]: 'safComp=HMSComp_n11s4,safSu=SU-n11s4,safSg=HenbGw-SG,safApp=HenbGwApp_PL_n11s4' faulted due to 'healthCheckcallbackTimeout' : Recovery is 'componentRestart'

Aug 30 17:22:29 localhost AMF_DEMO: CMD=cleanup

Aug 30 17:22:29 localhost AMF_DEMO_VAR: AMF_DEMO_VAR4=COMP1_VALUE4

Aug 30 17:22:29 localhost AMF_DEMO_VAR: AMF_DEMO_VAR1=CT_VALUE1

Aug 30 17:22:29 localhost AMF_DEMO_VAR: AMF_DEMO_VAR2=COMP1_OVERLOAD_VALUE2

Aug 30 17:22:29 localhost AMF_DEMO_VAR: AMF_DEMO_VAR3=COMP1_VALUE3

Aug 30 17:22:29 localhost AMF_DEMO_VAR: AMF_DEMO_VAR2=COMP1_OVERLOAD_VALUE2

Aug 30 17:22:29 localhost AMF_DEMO_VAR: AMF_DEMO_VAR3=COMP1_VALUE3

Aug 30 17:22:32 localhost AMF_DEMO: CMD=instantiate

Aug 30 17:22:32 localhost AMF_DEMO_VAR: AMF_DEMO_VAR4=COMP1_VALUE4

Aug 30 17:22:32 localhost AMF_DEMO_VAR: AMF_DEMO_VAR1=CT_VALUE1

Aug 30 17:22:32 localhost AMF_DEMO_VAR: AMF_DEMO_VAR2=COMP1_OVERLOAD_VALUE2

Aug 30 17:22:32 localhost AMF_DEMO_VAR: AMF_DEMO_VAR3=COMP1_VALUE3

Aug 30 17:22:32 localhost AMF_DEMO_VAR: AMF_DEMO_VAR3=COMP1_VALUE3

Aug 30 17:22:39 localhost osafamfnd[30931]: 'safComp=HMSComp_n11s4,safSu=SU-n11s4,safSg=HenbGw-SG,safApp=HenbGwApp_PL_n11s4' faulted due to 'healthCheckcallbackTimeout' : Recovery is 'componentRestart'

Aug 30 17:22:39 localhost AMF_DEMO: CMD=cleanup

Aug 30 17:22:39 localhost AMF_DEMO_VAR: AMF_DEMO_VAR4=COMP1_VALUE4

Aug 30 17:22:39 localhost AMF_DEMO_VAR: AMF_DEMO_VAR1=CT_VALUE1

Aug 30 17:22:39 localhost AMF_DEMO_VAR: AMF_DEMO_VAR2=COMP1_OVERLOAD_VALUE2

Aug 30 17:22:39 localhost AMF_DEMO_VAR: AMF_DEMO_VAR3=COMP1_VALUE3

Aug 30 17:22:39 localhost AMF_DEMO: CMD=instantiate

Aug 30 17:22:39 localhost AMF_DEMO_VAR: AMF_DEMO_VAR4=COMP1_VALUE4

Aug 30 17:22:39 localhost AMF_DEMO_VAR: AMF_DEMO_VAR1=CT_VALUE1

Aug 30 17:22:39 localhost AMF_DEMO_VAR: AMF_DEMO_VAR2=COMP1_OVERLOAD_VALUE2

Aug 30 17:22:39 localhost AMF_DEMO_VAR: AMF_DEMO_VAR3=COMP1_VALUE3

localhost AMF_DEMO_VAR: AMF_DEMO_VAR3=COMP1_VALUE3

Aug 30 17:22:46 localhost osafamfnd[30931]: 'safComp=HMSComp_n11s4,safSu=SU-n11s4,safSg=HenbGw-SG,safApp=HenbGwApp_PL_n11s4' faulted due to 'healthCheckcallbackTimeout' : Recovery is 'componentRestart'

Aug 30 17:22:46 localhost AMF_DEMO: CMD=cleanup

Aug 30 17:22:46 localhost AMF_DEMO_VAR: AMF_DEMO_VAR4=COMP1_VALUE4

Aug 30 17:22:46 localhost AMF_DEMO_VAR: AMF_DEMO_VAR1=CT_VALUE1

Aug 30 17:22:46 localhost AMF_DEMO_VAR: AMF_DEMO_VAR1=CT_VALUE1

Aug 30 17:22:46 localhost AMF_DEMO_VAR: AMF_DEMO_VAR2=COMP1_OVERLOAD_VALUE2

Aug 30 17:22:46 localhost AMF_DEMO_VAR: AMF_DEMO_VAR3=COMP1_VALUE3

Aug 30 17:22:49 localhost AMF_DEMO: CMD=instantiate

Aug 30 17:22:49 localhost AMF_DEMO_VAR: AMF_DEMO_VAR4=COMP1_VALUE4

Aug 30 17:22:49 localhost AMF_DEMO_VAR: AMF_DEMO_VAR1=CT_VALUE1

Aug 30 17:22:49 localhost AMF_DEMO_VAR: AMF_DEMO_VAR2=COMP1_OVERLOAD_VALUE2

Aug 30 17:22:49 localhost AMF_DEMO_VAR: AMF_DEMO_VAR3=COMP1_VALUE3

2nd attempt:

=  = = = = =

Aug 30 17:32:16 localhost osafamfnd[1726]: 'safComp=HMSComp_n11s4,safSu=SU-n11s4,safSg=HenbGw-SG,safApp=HenbGwApp_PL_n11s4' faulted due to 'healthCheckcallbackTimeout' : Recovery is 'componentRestart'

Aug 30 17:32:16 localhost AMF_DEMO: CMD=cleanup

Aug 30 17:32:16 localhost AMF_DEMO_VAR: AMF_DEMO_VAR4=COMP1_VALUE4

Aug 30 17:32:16 localhost AMF_DEMO_VAR: AMF_DEMO_VAR1=CT_VALUE1

Aug 30 17:32:16 localhost AMF_DEMO_VAR: AMF_DEMO_VAR2=COMP1_OVERLOAD_VALUE2

Aug 30 17:32:16 localhost AMF_DEMO_VAR: AMF_DEMO_VAR3=COMP1_VALUE3

Aug 30 17:32:16 localhost AMF_DEMO_VAR: AMF_DEMO_VAR3=COMP1_VALUE3

Aug 30 17:32:26 localhost osafamfnd[1726]: Cleanup of 'safComp=HMSComp_n11s4,safSu=SU-n11s4,safSg=HenbGw-SG,safApp=HenbGwApp_PL_n11s4' failed

Aug 30 17:32:26 localhost osafamfnd[1726]: Reason:'Script did not exit within time'

Aug 30 17:32:26 localhost osafamfnd[1726]: SU Failover trigerred for 'safSu=SU-n11s4,safSg=HenbGw-SG,safApp=HenbGwApp_PL_n11s4': Failed component: 'safComp=HMSComp_n11s4,safSu=SU-n11s4,safSg=HenbGw-SG,safApp=HenbGwApp_PL_n11s4'

Aug 30 17:32:26 localhost osafamfnd[1726]: 'safSu=SU-n11s4,safSg=HenbGw-SG,safApp=HenbGwApp_PL_n11s4' Presence State INSTANTIATED => TERMINATION_FAILED

Aug 30 17:32:26 localhost osafamfnd[1726]: Assigning 'safSi=HenbGw,safApp=HenbGwApp_PL_n11s4' QUIESCED to 'safSu=SU-n11s4,safSg=HenbGw-SG,safApp=HenbGwApp_PL_n11s4'

Aug 30 17:32:26 localhost osafamfnd[1726]: Assigned 'safSi=HenbGw,safApp=HenbGwApp_PL_n11s4' QUIESCED to 'safSu=SU-n11s4,safSg=HenbGw-SG,safApp=HenbGwApp_PL_n11s4'

Aug 30 17:32:26 localhost osafamfnd[1726]: Removing 'safSi=HenbGw,safApp=HenbGwApp_PL_n11s4' from 'safSu=SU-n11s4,safSg=HenbGw-SG,safApp=HenbGwApp_PL_n11s4'

Aug 30 17:32:26 localhost osafamfnd[1726]: Removed 'safSi=HenbGw,safApp=HenbGwApp_PL_n11s4' from 'safSu=SU-n11s4,safSg=HenbGw-SG,safApp=HenbGwApp_PL_n11s4'

Aug 30 17:32:26 localhost osafamfnd[1726]: Removed 'safSi=HenbGw,safApp=HenbGwApp_PL_n11s4' from 'safSu=SU-n11s4,safSg=HenbGw-SG,safApp=HenbGwApp_PL_n11s4'

Aug 30 17:36:37 localhost xinetd[1321]: START: shell pid=2307 from=172.16.11.2

Aug 30 17:36:37 localhost rshd[2308]: root@n11s2 as root: cmd='/hegw/hgsm/sbin/computeOtherModuleUsage 4'

Aug 30 17:36:42 localhost xinetd[1321]: EXIT: shell status=0 pid=2307 duration=5(sec)

Aug 30 17:36:49 localhost xinetd[1321]: START: shell pid=2341 from=172.16.11.2

Aug 30 17:36:49 localhost rshd[2342]: root@n11s2 as root: cmd='/hegw/hgsm/sbin/computeOtherProcsUsage 4'

Aug 30 17:36:49 localhost xinetd[1321]: EXIT: shell status=0 pid=2341 duration=0(sec)

My question is:

Why does changing the value of OPENSAF_TERMTIMEOUT not work every time?

My expectation is that OpenSAF will attempt 'componentRestart' several times, as in the 1st attempt log.

I need your help.

Thank you.

Regards

Dheeraj

 >>

[Praveen] I guess the intention is to increase the timeout for the cleanup script.

This can be done by changing saAmfCompCleanupTimeout in the component (an object of class "SaAmfComp") or by changing saAmfCtDefClcCliTimeout in the component type (class "SaAmfCompType") of the component. If changed in the comptype, it applies to every component of that comptype, provided the component does not override it by configuring saAmfCompCleanupTimeout.
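
The component-type variant might look like the sketch below. It assumes the comptype DN quoted elsewhere in this thread. The 30-second value is illustrative, not a recommendation, and the other attributes a real SaAmfCompType object carries are omitted. Time values in imm.xml are in nanoseconds.

```xml
<!-- Sketch only: other SaAmfCompType attributes are omitted; the value is illustrative. -->
<object class="SaAmfCompType">
        <dn>safVersion=4.0.0,safCompType=hmsCompType_n11s4</dn>
        <attr>
                <!-- Default CLC-CLI script timeout for components of this type: 30 s -->
                <name>saAmfCtDefClcCliTimeout</name>
                <value>30000000000</value>
        </attr>
</object>
```

A component of this type that sets its own saAmfCompCleanupTimeout would still use that value for its cleanup script instead of the comptype default.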

Thanks

Praveen

 >>

_______________________________________________
Opensaf-users mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/opensaf-users
