Hi,
OPENSAF_TERMTIMEOUT is not used by AMF for any component-related
activity. It is used only when a user stops OpenSAF.
I think the callback timeouts in your application are not properly tuned.
From the log I see that AMF instantiates the component by invoking the
CLC-CLI instantiation script. Component registration also seems to be
successful, since AMF sends the component a health check callback. But
the component is still completing non-AMF instantiation activities and
does not respond to the health check callback within the configured
timeout. After the timeout, AMF generates an error report on the
component with the reason "healthCheckcallbackTimeout". Since the
component is declared faulty, AMF cleans it up by calling the cleanup
script, which also failed, and the component moved to the
TERMINATION_FAILED state.
You need to analyze why the component is not responding to the health
check callback in time. Based on your analysis, decide whether the
health check callback timeout needs to be increased.
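As a starting point for that analysis, here is a minimal Python sketch that measures how long after each CMD=instantiate the 'healthCheckcallbackTimeout' fault is logged. The sample lines are abbreviated from the 1st attempt log quoted below (the component DN is omitted), the function name seconds_to_fault is hypothetical, the year is taken from the mail date, and English month names are assumed.

```python
from datetime import datetime

# Abbreviated lines from the 1st attempt log (component DN omitted).
SAMPLE = [
    "Aug 30 17:22:32 localhost AMF_DEMO: CMD=instantiate",
    "Aug 30 17:22:39 localhost osafamfnd[30931]: faulted due to 'healthCheckcallbackTimeout'",
    "Aug 30 17:22:39 localhost AMF_DEMO: CMD=instantiate",
    "Aug 30 17:22:46 localhost osafamfnd[30931]: faulted due to 'healthCheckcallbackTimeout'",
]


def seconds_to_fault(lines, year="2017"):
    """Return the gap in seconds between each instantiate and the next fault."""
    gaps = []
    started = None
    for line in lines:
        # Syslog timestamps carry no year, so one is prepended (assumption).
        stamp = datetime.strptime(year + " " + line[:15], "%Y %b %d %H:%M:%S")
        if "CMD=instantiate" in line:
            started = stamp
        elif "healthCheckcallbackTimeout" in line and started is not None:
            gaps.append(int((stamp - started).total_seconds()))
            started = None
    return gaps


print(seconds_to_fault(SAMPLE))
```

For the sample this prints [7, 7]. The gap also covers registration and the start of health checks, so it is only an upper bound on the time the component had to answer the callback.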
Thanks
Praveen
On 30-Aug-17 7:52 PM, Dheeraj Ram wrote:
Hi Praveen,
Thank you so much for the info.
I have another identical setup where OpenSAF 4.2.2 is running. As per
your input I have modified imm.xml as below (saAmfCompCleanupTimeout added):
<object class="SaAmfComp">
  <dn>safComp=HMSComp_n11s4,safSu=SU-n11s4,safSg=HenbGw-SG,safApp=HenbGwApp_PL_n11s4</dn>
  <attr>
    <name>saAmfCompType</name>
    <value>safVersion=4.0.0,safCompType=hmsCompType_n11s4</value>
  </attr>
  <attr>
    <name>saAmfCompInstantiationLevel</name>
    <value>1</value>
  </attr>
  <attr>
    <name>saAmfCompCleanupTimeout</name>
    <value>10000000000</value>
  </attr>
  <attr>
    <name>saAmfCompCmdEnv</name>
    <value>AMF_DEMO_VAR2=COMP1_OVERLOAD_VALUE2</value>
    <value>AMF_DEMO_VAR3=COMP1_VALUE3</value>
    <value>AMF_DEMO_VAR4=COMP1_VALUE4</value>
  </attr>
</object>
But this didn't work; the issue still exists.
Do I need to test this on version 4.4.2 only?
However, I observed that setting "OPENSAF_TERMTIMEOUT=1000" in nid.conf
gives me the expected result, but only once or twice.
Below are the logs captured while it was working:
1st attempt:
============
Aug 30 17:22:14 localhost kernel: grsec: From 172.16.11.2: signal 11
sent to /hegw/gsw/bin/hms[hms:31056] uid/euid:0/0 gid/egid:0/0, parent
/sbin/init[init:1] uid/euid:0/0 gid/egid:0/0 by /bin/bash[bash:29706]
uid/euid:0/0 gid/egid:0/0, parent /bin/login[login:29705] uid/euid:0/0
gid/egid:0/0
Aug 30 17:22:28 localhost osafamfnd[30931]:
'safComp=HMSComp_n11s4,safSu=SU-n11s4,safSg=HenbGw-SG,safApp=HenbGwApp_PL_n11s4'
faulted due to 'healthCheckcallbackTimeout' : Recovery is 'componentRestart'
Aug 30 17:22:29 localhost AMF_DEMO: CMD=cleanup
Aug 30 17:22:29 localhost AMF_DEMO_VAR: AMF_DEMO_VAR4=COMP1_VALUE4
Aug 30 17:22:29 localhost AMF_DEMO_VAR: AMF_DEMO_VAR1=CT_VALUE1
Aug 30 17:22:29 localhost AMF_DEMO_VAR: AMF_DEMO_VAR2=COMP1_OVERLOAD_VALUE2
Aug 30 17:22:29 localhost AMF_DEMO_VAR: AMF_DEMO_VAR3=COMP1_VALUE3
Aug 30 17:22:29 localhost AMF_DEMO_VAR: AMF_DEMO_VAR2=COMP1_OVERLOAD_VALUE2
Aug 30 17:22:29 localhost AMF_DEMO_VAR: AMF_DEMO_VAR3=COMP1_VALUE3
Aug 30 17:22:32 localhost AMF_DEMO: CMD=instantiate
Aug 30 17:22:32 localhost AMF_DEMO_VAR: AMF_DEMO_VAR4=COMP1_VALUE4
Aug 30 17:22:32 localhost AMF_DEMO_VAR: AMF_DEMO_VAR1=CT_VALUE1
Aug 30 17:22:32 localhost AMF_DEMO_VAR: AMF_DEMO_VAR2=COMP1_OVERLOAD_VALUE2
Aug 30 17:22:32 localhost AMF_DEMO_VAR: AMF_DEMO_VAR3=COMP1_VALUE3
Aug 30 17:22:32 localhost AMF_DEMO_VAR: AMF_DEMO_VAR3=COMP1_VALUE3
Aug 30 17:22:39 localhost osafamfnd[30931]:
'safComp=HMSComp_n11s4,safSu=SU-n11s4,safSg=HenbGw-SG,safApp=HenbGwApp_PL_n11s4'
faulted due to 'healthCheckcallbackTimeout' : Recovery is 'componentRestart'
Aug 30 17:22:39 localhost AMF_DEMO: CMD=cleanup
Aug 30 17:22:39 localhost AMF_DEMO_VAR: AMF_DEMO_VAR4=COMP1_VALUE4
Aug 30 17:22:39 localhost AMF_DEMO_VAR: AMF_DEMO_VAR1=CT_VALUE1
Aug 30 17:22:39 localhost AMF_DEMO_VAR: AMF_DEMO_VAR2=COMP1_OVERLOAD_VALUE2
Aug 30 17:22:39 localhost AMF_DEMO_VAR: AMF_DEMO_VAR3=COMP1_VALUE3
Aug 30 17:22:39 localhost AMF_DEMO: CMD=instantiate
Aug 30 17:22:39 localhost AMF_DEMO_VAR: AMF_DEMO_VAR4=COMP1_VALUE4
Aug 30 17:22:39 localhost AMF_DEMO_VAR: AMF_DEMO_VAR1=CT_VALUE1
Aug 30 17:22:39 localhost AMF_DEMO_VAR: AMF_DEMO_VAR2=COMP1_OVERLOAD_VALUE2
Aug 30 17:22:39 localhost AMF_DEMO_VAR: AMF_DEMO_VAR3=COMP1_VALUE3
localhost AMF_DEMO_VAR: AMF_DEMO_VAR3=COMP1_VALUE3
Aug 30 17:22:46 localhost osafamfnd[30931]:
'safComp=HMSComp_n11s4,safSu=SU-n11s4,safSg=HenbGw-SG,safApp=HenbGwApp_PL_n11s4'
faulted due to 'healthCheckcallbackTimeout' : Recovery is 'componentRestart'
Aug 30 17:22:46 localhost AMF_DEMO: CMD=cleanup
Aug 30 17:22:46 localhost AMF_DEMO_VAR: AMF_DEMO_VAR4=COMP1_VALUE4
Aug 30 17:22:46 localhost AMF_DEMO_VAR: AMF_DEMO_VAR1=CT_VALUE1
Aug 30 17:22:46 localhost AMF_DEMO_VAR: AMF_DEMO_VAR1=CT_VALUE1
Aug 30 17:22:46 localhost AMF_DEMO_VAR: AMF_DEMO_VAR2=COMP1_OVERLOAD_VALUE2
Aug 30 17:22:46 localhost AMF_DEMO_VAR: AMF_DEMO_VAR3=COMP1_VALUE3
Aug 30 17:22:49 localhost AMF_DEMO: CMD=instantiate
Aug 30 17:22:49 localhost AMF_DEMO_VAR: AMF_DEMO_VAR4=COMP1_VALUE4
Aug 30 17:22:49 localhost AMF_DEMO_VAR: AMF_DEMO_VAR1=CT_VALUE1
Aug 30 17:22:49 localhost AMF_DEMO_VAR: AMF_DEMO_VAR2=COMP1_OVERLOAD_VALUE2
Aug 30 17:22:49 localhost AMF_DEMO_VAR: AMF_DEMO_VAR3=COMP1_VALUE3
2nd attempt:
============
Aug 30 17:32:16 localhost osafamfnd[1726]:
'safComp=HMSComp_n11s4,safSu=SU-n11s4,safSg=HenbGw-SG,safApp=HenbGwApp_PL_n11s4'
faulted due to 'healthCheckcallbackTimeout' : Recovery is 'componentRestart'
Aug 30 17:32:16 localhost AMF_DEMO: CMD=cleanup
Aug 30 17:32:16 localhost AMF_DEMO_VAR: AMF_DEMO_VAR4=COMP1_VALUE4
Aug 30 17:32:16 localhost AMF_DEMO_VAR: AMF_DEMO_VAR1=CT_VALUE1
Aug 30 17:32:16 localhost AMF_DEMO_VAR: AMF_DEMO_VAR2=COMP1_OVERLOAD_VALUE2
Aug 30 17:32:16 localhost AMF_DEMO_VAR: AMF_DEMO_VAR3=COMP1_VALUE3
Aug 30 17:32:16 localhost AMF_DEMO_VAR: AMF_DEMO_VAR3=COMP1_VALUE3
Aug 30 17:32:26 localhost osafamfnd[1726]: Cleanup of
'safComp=HMSComp_n11s4,safSu=SU-n11s4,safSg=HenbGw-SG,safApp=HenbGwApp_PL_n11s4'
failed
Aug 30 17:32:26 localhost osafamfnd[1726]: Reason:'Script did not exit
within time'
Aug 30 17:32:26 localhost osafamfnd[1726]: SU Failover trigerred for
'safSu=SU-n11s4,safSg=HenbGw-SG,safApp=HenbGwApp_PL_n11s4': Failed
component:
'safComp=HMSComp_n11s4,safSu=SU-n11s4,safSg=HenbGw-SG,safApp=HenbGwApp_PL_n11s4'
Aug 30 17:32:26 localhost osafamfnd[1726]:
'safSu=SU-n11s4,safSg=HenbGw-SG,safApp=HenbGwApp_PL_n11s4' Presence
State INSTANTIATED => TERMINATION_FAILED
Aug 30 17:32:26 localhost osafamfnd[1726]: Assigning
'safSi=HenbGw,safApp=HenbGwApp_PL_n11s4' QUIESCED to
'safSu=SU-n11s4,safSg=HenbGw-SG,safApp=HenbGwApp_PL_n11s4'
Aug 30 17:32:26 localhost osafamfnd[1726]: Assigned
'safSi=HenbGw,safApp=HenbGwApp_PL_n11s4' QUIESCED to
'safSu=SU-n11s4,safSg=HenbGw-SG,safApp=HenbGwApp_PL_n11s4'
Aug 30 17:32:26 localhost osafamfnd[1726]: Removing
'safSi=HenbGw,safApp=HenbGwApp_PL_n11s4' from
'safSu=SU-n11s4,safSg=HenbGw-SG,safApp=HenbGwApp_PL_n11s4'
Aug 30 17:32:26 localhost osafamfnd[1726]: Removed
'safSi=HenbGw,safApp=HenbGwApp_PL_n11s4' from
'safSu=SU-n11s4,safSg=HenbGw-SG,safApp=HenbGwApp_PL_n11s4'
Aug 30 17:32:26 localhost osafamfnd[1726]: Removed
'safSi=HenbGw,safApp=HenbGwApp_PL_n11s4' from
'safSu=SU-n11s4,safSg=HenbGw-SG,safApp=HenbGwApp_PL_n11s4'
Aug 30 17:36:37 localhost xinetd[1321]: START: shell pid=2307
from=172.16.11.2
Aug 30 17:36:37 localhost rshd[2308]: root@n11s2 as root:
cmd='/hegw/hgsm/sbin/computeOtherModuleUsage 4'
Aug 30 17:36:42 localhost xinetd[1321]: EXIT: shell status=0 pid=2307
duration=5(sec)
Aug 30 17:36:49 localhost xinetd[1321]: START: shell pid=2341
from=172.16.11.2
Aug 30 17:36:49 localhost rshd[2342]: root@n11s2 as root:
cmd='/hegw/hgsm/sbin/computeOtherProcsUsage 4'
Aug 30 17:36:49 localhost xinetd[1321]: EXIT: shell status=0 pid=2341
duration=0(sec)
My question is:
Why does changing the value of OPENSAF_TERMTIMEOUT not work every time?
My expectation is that OpenSAF will try 'componentRestart' several times,
as in the 1st attempt log.
I need your help.
Thank you.
Regards
Dheeraj
>>
[Praveen] I guess the intention is to increase the timeout for the
cleanup script. This can be done by changing saAmfCompCleanupTimeout in
the component (object of class "SaAmfComp") or by changing
saAmfCtDefClcCliTimeout in the component's comptype (class
"SaAmfCompType"). If changed in the comptype, it applies to every
component of that comptype, provided the component does not override it
by configuring saAmfCompCleanupTimeout.
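For reference, AMF timeout attributes such as saAmfCompCleanupTimeout are SaTimeT values, which are expressed in nanoseconds. A small Python sketch (the helper name satime_to_seconds is hypothetical) converts the value 10000000000 from the imm.xml fragment quoted above into seconds:

```python
NANOS_PER_SECOND = 1_000_000_000


def satime_to_seconds(value):
    """Convert an SaTimeT value (nanoseconds, as text or int) to seconds."""
    return int(value) / NANOS_PER_SECOND


# Value taken from the saAmfCompCleanupTimeout attribute in the quoted imm.xml.
print(satime_to_seconds("10000000000"))
```

This prints 10.0. That matches the 10 seconds between CMD=cleanup (17:32:16) and "Script did not exit within time" (17:32:26) in the 2nd attempt log, so a noticeably larger value may be needed if the cleanup script legitimately takes longer than that.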
Thanks
Praveen
>>
_______________________________________________
Opensaf-users mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/opensaf-users