Hi All,

I am new to OpenSAF and need your help.

My OpenSAF setup is as follows.

I am using OpenSAF version 4.4.2. Below is the output of the opensafd status command:

atcafs-n10s2:~# /etc/init.d/opensafd status
safSISU=safSu=n10s2\,safSg=NoRed\,safApp=OpenSAF,safSi=NoRed10,safApp=OpenSAF
        saAmfSISUHAState=ACTIVE(1)
safSISU=safSu=n10s2\,safSg=2N\,safApp=OpenSAF,safSi=SC-2N,safApp=OpenSAF
        saAmfSISUHAState=ACTIVE(1)
safSISU=safSu=SU-n10s2\,safSg=HenbGw-SG\,safApp=HenbGwApp,safSi=HenbGw,safApp=HenbGwApp
        saAmfSISUHAState=ACTIVE(1)
safSISU=safSu=n10s1\,safSg=NoRed\,safApp=OpenSAF,safSi=NoRed2,safApp=OpenSAF
        saAmfSISUHAState=ACTIVE(1)
safSISU=safSu=n10s1\,safSg=2N\,safApp=OpenSAF,safSi=SC-2N,safApp=OpenSAF
        saAmfSISUHAState=STANDBY(2)
safSISU=safSu=SU-n10s1\,safSg=HenbGw-SG\,safApp=HenbGwApp,safSi=HenbGw,safApp=HenbGwApp
        saAmfSISUHAState=STANDBY(2)
safSISU=safSu=SU-n10s5\,safSg=HenbGw-SG\,safApp=HenbGwApp_PL_n10s5,safSi=HenbGw,safApp=HenbGwApp_PL_n10s5
        saAmfSISUHAState=ACTIVE(1)
safSISU=safSu=SU-n10s4\,safSg=HenbGw-SG\,safApp=HenbGwApp_PL_n10s4,safSi=HenbGw,safApp=HenbGwApp_PL_n10s4
        saAmfSISUHAState=ACTIVE(1)
safSISU=safSu=n10s5\,safSg=NoRed\,safApp=OpenSAF,safSi=NoRed4,safApp=OpenSAF
        saAmfSISUHAState=ACTIVE(1)
safSISU=safSu=n10s4\,safSg=NoRed\,safApp=OpenSAF,safSi=NoRed1,safApp=OpenSAF
        saAmfSISUHAState=ACTIVE(1)
atcafs-n10s2:~#

Here n10s1 and n10s2 are my controllers, and n10s4 and n10s5 are payloads.

The following applications are running on the payloads:

atcafs-n10s4:~# ps -aef | grep ins
root      3379     1 21 11:34 ?        00:21:36 /hegw/gsw/bin/hms instantiate
root      3396     1 11 11:34 ?        00:11:49 /hegw/gsw/bin/mms instantiate
root      3410     1  2 11:34 ?        00:02:05 /hegw/gsw/bin/dra instantiate
root      3424     1  2 11:34 ?        00:02:15 /hegw/gsw/bin/bcm instantiate

Problem Detail:

When I kill the application (hms) with signal 11 ("kill -11 3379"), it
generates a core file of about 7 GB. OpenSAF tries to restart the process in
60 s, but at that point the process is still busy writing the core file, so its
PID remains active.
As a result, OpenSAF fails with the following errors:

Aug 29 13:26:12 localhost kernel: grsec: From 172.16.10.1: signal 11 sent to 
/hegw/gsw/bin/hms[hms:11902] uid/euid:0/0 gid/egid:0/0, parent 
/sbin/init[init:1] uid/euid:0/0 gid/egid:0/0 by /bin/bash[bash:10442] 
uid/euid:0/0 gid/egid:0/0, parent /bin/login[login:10441] uid/euid:0/0 
gid/egid:0/0
Aug 29 13:26:27 localhost osafamfnd[11779]: 
'safComp=HMSComp_n10s4,safSu=SU-n10s4,safSg=HenbGw-SG,safApp=HenbGwApp_PL_n10s4'
 faulted due to 'healthCheckcallbackTimeout' : Recovery is 'componentRestart'
Aug 29 13:26:27 localhost AMF_DEMO: CMD=cleanup
Aug 29 13:26:27 localhost AMF_DEMO_VAR: AMF_DEMO_VAR4=COMP1_VALUE4
Aug 29 13:26:27 localhost AMF_DEMO_VAR: AMF_DEMO_VAR1=CT_VALUE1
Aug 29 13:26:27 localhost AMF_DEMO_VAR: AMF_DEMO_VAR2=COMP1_OVERLOAD_VALUE2
Aug 29 13:26:27 localhost AMF_DEMO_VAR: AMF_DEMO_VAR3=COMP1_VALUE3
Aug 29 13:26:37 localhost osafamfnd[11779]: Cleanup of 
'safComp=HMSComp_n10s4,safSu=SU-n10s4,safSg=HenbGw-SG,safApp=HenbGwApp_PL_n10s4'
 failed
Aug 29 13:26:37 localhost osafamfnd[11779]: Reason:'Script did not exit within 
time'
Aug 29 13:26:37 localhost osafamfnd[11779]: SU Failover trigerred for 
'safSu=SU-n10s4,safSg=HenbGw-SG,safApp=HenbGwApp_PL_n10s4': Failed component: 
'safComp=HMSComp_n10s4,safSu=SU-n10s4,safSg=HenbGw-SG,safApp=HenbGwApp_PL_n10s4'
Aug 29 13:26:37 localhost osafamfnd[11779]: 
'safSu=SU-n10s4,safSg=HenbGw-SG,safApp=HenbGwApp_PL_n10s4' Presence State 
INSTANTIATED => TERMINATION_FAILED
Aug 29 13:26:37 localhost osafamfnd[11779]: Assigning 
'safSi=HenbGw,safApp=HenbGwApp_PL_n10s4' QUIESCED to 
'safSu=SU-n10s4,safSg=HenbGw-SG,safApp=HenbGwApp_PL_n10s4'
Aug 29 13:26:37 localhost osafamfnd[11779]: Assigned 
'safSi=HenbGw,safApp=HenbGwApp_PL_n10s4' QUIESCED to 
'safSu=SU-n10s4,safSg=HenbGw-SG,safApp=HenbGwApp_PL_n10s4'
Aug 29 13:26:37 localhost osafamfnd[11779]: Removing 
'safSi=HenbGw,safApp=HenbGwApp_PL_n10s4' from 
'safSu=SU-n10s4,safSg=HenbGw-SG,safApp=HenbGwApp_PL_n10s4'
Aug 29 13:26:37 localhost osafamfnd[11779]: Removed 
'safSi=HenbGw,safApp=HenbGwApp_PL_n10s4' from 
'safSu=SU-n10s4,safSg=HenbGw-SG,safApp=HenbGwApp_PL_n10s4'


I tried setting "OPENSAF_TERMTIMEOUT=1000" in the nid.conf file, but it did not
help; the issue still exists.
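
A minimal sketch of how the lingering PID could be observed while the core
file is being written. The helper name wait_for_exit and its arguments are
hypothetical and not part of OpenSAF; it only polls with "kill -0" until the
PID disappears or a timeout (in seconds) is reached:

```shell
#!/bin/sh
# Hypothetical helper (not an OpenSAF tool): poll until the given PID is
# gone or until the timeout in seconds has elapsed.
# Returns 0 if the process has exited, 1 if it is still present.
wait_for_exit() {
    pid=$1
    timeout=$2
    elapsed=0
    # "kill -0" sends no signal; it only checks that the PID still exists.
    while kill -0 "$pid" 2>/dev/null; do
        if [ "$elapsed" -ge "$timeout" ]; then
            return 1
        fi
        sleep 1
        elapsed=$((elapsed + 1))
    done
    return 0
}

# Example use (PID taken from the ps output above; adjust as needed):
#   kill -11 3379
#   wait_for_exit 3379 120 && echo "PID gone after about ${elapsed}s"
```

Comparing the measured time against the time at which the cleanup is reported
as failed in the log above would show whether the core dump alone accounts for
the "Script did not exit within time" error.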

Please let me know if you need any more detail.

Thanks
Dheeraj






