- **status**: accepted --> review


---

**[tickets:#574] amfnd abort causing node reboot**

**Status:** review
**Created:** Tue Sep 24, 2013 08:33 AM UTC by Hans Feldt
**Last Updated:** Tue Sep 24, 2013 08:33 AM UTC
**Owner:** Hans Feldt

amfnd fails to read the component capability from IMM for an unknown reason
(saImmOmInitialize returns rc = 5, SA_AIS_ERR_TIMEOUT). The failure triggers an
abort in immutils and a core dump, which in turn causes the AMF watchdog to
reboot the node.
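
To illustrate the failure mode, here is a minimal sketch of a retry wrapper in
the style suggested by frames #2 and #3 of the backtrace below. It is not the
actual immutil code: `rc_t`, `init_fn`, `init_with_retry`, `errors_are_fatal`
and the two stub functions are hypothetical names, and the sleep between
retries is omitted. Only the numeric values 1, 5 and 6 follow the SA_AIS_OK,
SA_AIS_ERR_TIMEOUT and SA_AIS_ERR_TRY_AGAIN codes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical return codes; values follow SA_AIS_OK / ERR_TIMEOUT / ERR_TRY_AGAIN. */
typedef enum { RC_OK = 1, RC_TIMEOUT = 5, RC_TRY_AGAIN = 6 } rc_t;

/* Hypothetical stand-in for the wrapped IMM call. */
typedef rc_t (*init_fn)(void *ctx);

/*
 * Call fn, repeating only while it answers RC_TRY_AGAIN, at most max_tries
 * calls in total. Any other failure (such as RC_TIMEOUT) ends the loop at
 * once. With errors_are_fatal set, a failure aborts the whole process, which
 * is the behaviour that takes amfnd down in this ticket.
 */
static rc_t init_with_retry(init_fn fn, void *ctx, unsigned max_tries,
                            bool errors_are_fatal)
{
    assert(fn != NULL);
    rc_t rc = fn(ctx);
    for (unsigned n = 1; rc == RC_TRY_AGAIN && n < max_tries; n++)
        rc = fn(ctx);
    if (rc != RC_OK && errors_are_fatal)
        abort(); /* core dump; the watchdog then reboots the node */
    return rc;
}

/* Stub: always times out, like the failing call in the log. */
static rc_t always_timeout(void *ctx)
{
    (void)ctx;
    return RC_TIMEOUT;
}

/* Stub: answers RC_TRY_AGAIN twice, then succeeds; ctx counts the calls. */
static rc_t ok_on_third_try(void *ctx)
{
    int *calls = ctx;
    return ++*calls >= 3 ? RC_OK : RC_TRY_AGAIN;
}
```

With `errors_are_fatal` set to false, `init_with_retry(always_timeout, NULL, 5,
false)` hands RC_TIMEOUT back to the caller, which can then fail the single
operation instead of the whole process.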

This particular IMM read is in the critical switch-over logic, executed when
the application is already added and up providing service. The read of comp
capability can easily be avoided by including a little more information in an
amfd-amfnd message.
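
A minimal sketch of that idea, assuming the capability is carried as an extra
field in the per-CSI parameters that amfd sends with the assignment. The type
and function names (`comp_capability_t`, `csi_param_t`,
`cap_is_x_active_or_one_active`) are hypothetical and do not reflect the real
message layout; the enum is meant to mirror the AMF component capability model.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical mirror of the AMF component capability model. */
typedef enum {
    CAP_X_ACTIVE_AND_Y_STANDBY = 1,
    CAP_X_ACTIVE_OR_Y_STANDBY,
    CAP_ONE_ACTIVE_OR_Y_STANDBY,
    CAP_ONE_ACTIVE_OR_ONE_STANDBY,
    CAP_X_ACTIVE,
    CAP_ONE_ACTIVE,
    CAP_NON_PRE_INSTANTIABLE
} comp_capability_t;

/* Hypothetical per-CSI parameters in the amfd -> amfnd assignment message. */
typedef struct {
    const char *csi_name;
    comp_capability_t capability; /* assumed new field, filled in by amfd */
} csi_param_t;

/*
 * Same question the IMM read answers today (is the capability x_active or
 * 1_active?), but decided from the message alone, with no IMM access in the
 * switch-over path.
 */
static bool cap_is_x_active_or_one_active(const csi_param_t *param)
{
    assert(param != NULL);
    return param->capability == CAP_X_ACTIVE ||
           param->capability == CAP_ONE_ACTIVE;
}
```

Since amfd already has the configuration loaded, the node director would only
need to consult the received field during the assignment.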

==================================================================================

2013-09-09 11:49:52  osafamfnd SC-2-1 notice osafamfnd[5336]: NO Assigning 'all (37) SIs' STANDBY to 'safSu=1,safSg=2N,safApp=SomeApp'
2013-09-09 11:49:52  osafamfnd SC-2-1 notice osafamfnd[5336]: NO Assigning 'safSi=CS,safApp=SomeApp' STANDBY to 'safSu=1,safSg=2N,safApp=SomeApp'
2013-09-09 11:50:02  osafamfnd SC-2-1 err osafamfnd[5336]: saImmOmInitialize FAILED, rc = 5
2013-09-09 11:50:04  osafrded SC-2-1 alert osafrded[5113]: AL AMF Node Director is down, terminate this process
2013-09-09 11:50:04  osaffmd SC-2-1 alert osaffmd[5122]: AL AMF Node Director is down, terminate this process
2013-09-09 11:50:04  osafimmnd SC-2-1 alert osafimmnd[5142]: AL AMF Node Director is down, terminate this process
2013-09-09 11:50:06  osafpmnd SC-2-1 alert osafpmnd[5405]: AL AMF Node Director is down, terminate this process
2013-09-09 11:50:04  osafpmd SC-2-1 alert osafpmd[5421]: AL AMF Node Director is down, terminate this process
2013-09-09 11:50:04  osafamfwd SC-2-1 crit osafamfwd[5463]: Rebooting OpenSAF NodeId = 0 EE Name = No EE Mapped, Reason: AMF unexpectedly crashed, OwnNodeId = 131343, SupervisionTime = 60
2013-09-09 11:50:04  osafckptd SC-2-1 alert osafckptd[5520]: AL AMF Node Director is down, terminate this process

(gdb) bt full
#0 0x00007fab46742b35 in raise () from /lib64/libc.so.6
No symbol table info available.
#1 0x00007fab46744111 in abort () from /lib64/libc.so.6
No symbol table info available.
#2 0x00000000004051f8 in defaultImmutilError (fmt=0x43fef0 "rc = %d")
at ../../../../../osaf/tools/safimm/src/immutil.c:72
ap = {{gp_offset = 16, fp_offset = 48, overflow_arg_area = 0x7fffa48dc300, 
reg_save_area = 0x7fffa48dc230}}
ap2 = {{gp_offset = 16, fp_offset = 48, overflow_arg_area = 0x7fffa48dc300,
reg_save_area = 0x7fffa48dc230}}
#3 0x00000000004065f4 in immutil_saImmOmInitialize (immHandle=0x7fffa48dc490, 
immCallbacks=0x0,
version=0x7fffa48dc4b0) at ../../../../../osaf/tools/safimm/src/immutil.c:1127
localVer = {releaseCode = 65 'A', majorVersion = 2 '\002', minorVersion = 12 
'\f'}
rc = SA_AIS_ERR_TIMEOUT
nTries = 6886170
#4 0x000000000041df97 in avnd_comp_cap_x_act_or_1_act_check 
(comp_type=0x69131a, csi_type=0x6f0142)
at avnd_comp.c:911
rc = <optimized out>
error = <optimized out>
dn = {length = 97,
value = 
"safSupportedCsType=safVersion=1.0.0\\,safCSType=X,safVersion=R2B,safCompType=X",
 '\000' <repeats 158 times>}
accessorHandle = 0
attributes = <optimized out>
comp_cap = <optimized out>
attributeNames = {0x443552 "ULL", 0x0}
immOmHandle = 0
immVersion = {releaseCode = 65 'A', majorVersion = 2 '\002', minorVersion = 1 
'\001'}
__FUNCTION__ = "avnd_comp_cap_x_act_or_1_act_check"
#5 0x000000000041e43b in avnd_comp_csi_assign (cb=0x6578c0, comp=0x6911e0, 
csi=0x0) at avnd_comp.c:1017
npi_prv_inst = <optimized out>
npi_curr_inst = <optimized out>
curr_csi = 0x6f0010
comp_ev = <optimized out>
rc = <optimized out>
csiname = 0x4434f1 "%u"
__FUNCTION__ = "avnd_comp_csi_assign"
#6 0x0000000000436d9c in assign_si_to_su (si=0x69ccc0, su=0x66f770, 
single_csi=0) at avnd_susm.c:561
npi_prv_inst = <optimized out>
npi_curr_inst = 6
su_ev = 4294967295
rc = 6933746
curr_csi = 0x6f0010
__FUNCTION__ = "assign_si_to_su"
#7 0x0000000000437219 in avnd_su_si_assign (cb=<optimized out>, su=0x66f770, 
si=0x69ccc0) at avnd_susm.c:606
rc = <optimized out>
rank = <optimized out>
curr_si = <optimized out>
curr_csi = <optimized out>
__FUNCTION__ = "avnd_su_si_assign"
#8 0x0000000000434b9d in avnd_su_si_msg_prc (cb=0x6578c0, su=0x66f770, 
info=<optimized out>) at avnd_susm.c:349
csi_param = 0x6f8df8
si = <optimized out>
rc = 1
csi = <optimized out>
__FUNCTION__ = "avnd_su_si_msg_prc"
#9 0x000000000043216e in avnd_evt_avd_info_su_si_assign_evh (cb=0x6578c0, 
evt=<optimized out>) at avnd_su.c:258
info = <optimized out>
siq = <optimized out>
su = 0x66f770
rc = <optimized out>
__FUNCTION__ = "avnd_evt_avd_info_su_si_assign_evh"
#10 0x0000000000430190 in avnd_main_process () at avnd_proc.c:218
ret = 0
mbx_fd = <optimized out>
fds = {{fd = 11, events = 1, revents = 1}, {fd = 15, events = 1, revents = 0}, 
{fd = 13, events = 1,
revents = 0}, {fd = 0, events = 0, revents = 0}}
evt = 0x6c5190
__FUNCTION__ = "avnd_main_process"
#11 0x0000000000408815 in main (argc=1, argv=0x7fffa48dc7a8) at amfnd_main.c:61
error = 32767
ret = <optimized out>



