changeset:   5134:22fcb906e04e
branch:      opensaf-4.3.x
parent:      5130:4419501105e0
user:        Hans Feldt <[email protected]>
date:        Thu Apr 10 13:22:41 2014 +0200
summary:     avsv: include and use sirank in SUSI msg [#574]

changeset:   5135:1457cc1bc8eb
branch:      opensaf-4.3.x
user:        Hans Feldt <[email protected]>
date:        Thu Apr 10 13:32:04 2014 +0200
summary:     avsv: include and use comp capability in SUSI msg [#574]

changeset:   5136:46ea86024d6a
branch:      opensaf-4.3.x
user:        Hans Feldt <[email protected]>
date:        Thu Apr 10 13:33:13 2014 +0200
summary:     avsv: include su_failover in REG_SU msg [#574]

changeset:   5137:70fee6ec323f
branch:      opensaf-4.4.x
parent:      5131:6a2171548ea4
user:        Hans Feldt <[email protected]>
date:        Thu Apr 10 13:36:39 2014 +0200
summary:     amf: include and use sirank in SUSI msg [#574]

changeset:   5138:8eb969f21971
branch:      opensaf-4.4.x
user:        Hans Feldt <[email protected]>
date:        Thu Apr 10 13:36:39 2014 +0200
summary:     amf: include and use comp capability in SUSI msg [#574]

changeset:   5139:b7a8df1a58d6
branch:      opensaf-4.4.x
user:        Hans Feldt <[email protected]>
date:        Thu Apr 10 13:36:40 2014 +0200
summary:     amf: include and use su_failover in REG_SU msg [#574]

changeset:   5140:f90ee633a89f
parent:      5133:a2360a43963f
user:        Hans Feldt <[email protected]>
date:        Thu Apr 10 13:36:54 2014 +0200
summary:     amf: include and use sirank in SUSI msg [#574]

changeset:   5141:d6e7d85efb8b
user:        Hans Feldt <[email protected]>
date:        Thu Apr 10 13:36:54 2014 +0200
summary:     amf: include and use comp capability in SUSI msg [#574]

changeset:   5142:8dcb7f6a6762
tag:         tip
user:        Hans Feldt <[email protected]>
date:        Thu Apr 10 13:36:55 2014 +0200
summary:     amf: include and use su_failover in REG_SU msg [#574]



---

**[tickets:#574] amfnd abort causing node reboot**

**Status:** fixed
**Milestone:** 4.3.3
**Created:** Tue Sep 24, 2013 08:33 AM UTC by Hans Feldt
**Last Updated:** Tue Apr 08, 2014 05:42 PM UTC
**Owner:** nobody

amfnd fails to read from IMM (comp capability) for an unknown reason. This 
causes an abort in immutil and a core dump, which in turn causes the AMF 
watchdog to reboot the node.

This particular IMM read is in the critical switch-over logic, which runs when 
the application is already added and up, providing service. The read of comp 
capability can easily be avoided by including some more information in an 
amfd-amfnd message.
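
As a rough illustration of that idea (not the actual patch), the sketch below 
shows a check that relies only on data carried in the assignment message 
instead of an IMM lookup. The enum, the struct and the function name are all 
hypothetical; the check itself is guessed from the name of 
avnd_comp_cap_x_act_or_1_act_check in the backtrace.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the AMF component capability models. */
typedef enum {
    CAP_X_ACTIVE_AND_Y_STANDBY = 1,
    CAP_X_ACTIVE_OR_Y_STANDBY,
    CAP_1_ACTIVE_OR_Y_STANDBY,
    CAP_1_ACTIVE_OR_1_STANDBY,
    CAP_X_ACTIVE,
    CAP_1_ACTIVE,
    CAP_NON_PRE_INSTANTIABLE
} comp_cap_t;

/* Hypothetical per-CSI payload of an amfd-amfnd assignment message. */
typedef struct {
    const char *csi_name;
    comp_cap_t capability; /* assumed to be filled in by amfd before sending */
} csi_assign_param_t;

/*
 * Returns true when the capability is X_ACTIVE or 1_ACTIVE.
 * Only the message content is consulted, so no IMM handle is needed
 * and no IMM timeout can occur on this path.
 */
static bool cap_is_x_act_or_1_act(const csi_assign_param_t *param)
{
    assert(param != NULL);
    return param->capability == CAP_X_ACTIVE ||
           param->capability == CAP_1_ACTIVE;
}
```

With something along these lines, the node director would not need to call 
saImmOmInitialize during a CSI assignment, so the failure seen in the log 
below could not trigger the abort in this code path.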

==================================================================================

2013-09-09 11:49:52  osafamfnd SC-2-1 notice osafamfnd[5336]: NO Assigning 'all (37) SIs' STANDBY to 'safSu=1,safSg=2N,safApp=SomeApp'
2013-09-09 11:49:52  osafamfnd SC-2-1 notice osafamfnd[5336]: NO Assigning 'safSi=CS,safApp=SomeApp' STANDBY to 'safSu=1,safSg=2N,safApp=SomeApp'
2013-09-09 11:50:02  osafamfnd SC-2-1 err osafamfnd[5336]: saImmOmInitialize FAILED, rc = 5
2013-09-09 11:50:04  osafrded SC-2-1 alert osafrded[5113]: AL AMF Node Director is down, terminate this process
2013-09-09 11:50:04  osaffmd SC-2-1 alert osaffmd[5122]: AL AMF Node Director is down, terminate this process
2013-09-09 11:50:04  osafimmnd SC-2-1 alert osafimmnd[5142]: AL AMF Node Director is down, terminate this process
2013-09-09 11:50:06  osafpmnd SC-2-1 alert osafpmnd[5405]: AL AMF Node Director is down, terminate this process
2013-09-09 11:50:04  osafpmd SC-2-1 alert osafpmd[5421]: AL AMF Node Director is down, terminate this process
2013-09-09 11:50:04  osafamfwd SC-2-1 crit osafamfwd[5463]: Rebooting OpenSAF NodeId = 0 EE Name = No EE Mapped, Reason: AMF unexpectedly crashed, OwnNodeId = 131343, SupervisionTime = 60
2013-09-09 11:50:04  osafckptd SC-2-1 alert osafckptd[5520]: AL AMF Node Director is down, terminate this process

(gdb) bt full
#0 0x00007fab46742b35 in raise () from /lib64/libc.so.6
No symbol table info available.
#1 0x00007fab46744111 in abort () from /lib64/libc.so.6
No symbol table info available.
#2 0x00000000004051f8 in defaultImmutilError (fmt=0x43fef0 "rc = %d")
at ../../../../../osaf/tools/safimm/src/immutil.c:72
ap = {{gp_offset = 16, fp_offset = 48, overflow_arg_area = 0x7fffa48dc300, 
reg_save_area = 0x7fffa48dc230}}
ap2 = {{gp_offset = 16, fp_offset = 48, overflow_arg_area = 0x7fffa48dc300,
reg_save_area = 0x7fffa48dc230}}
#3 0x00000000004065f4 in immutil_saImmOmInitialize (immHandle=0x7fffa48dc490, 
immCallbacks=0x0,
version=0x7fffa48dc4b0) at ../../../../../osaf/tools/safimm/src/immutil.c:1127
localVer = {releaseCode = 65 'A', majorVersion = 2 '\002', minorVersion = 12 
'\f'}
rc = SA_AIS_ERR_TIMEOUT
nTries = 6886170
#4 0x000000000041df97 in avnd_comp_cap_x_act_or_1_act_check 
(comp_type=0x69131a, csi_type=0x6f0142)
at avnd_comp.c:911
rc = <optimized out>
error = <optimized out>
dn = {length = 97,
value = 
"safSupportedCsType=safVersion=1.0.0\\,safCSType=X,safVersion=R2B,safCompType=X",
 '\000' <repeats 158 times>}
accessorHandle = 0
attributes = <optimized out>
comp_cap = <optimized out>
attributeNames = {0x443552 "ULL", 0x0}
immOmHandle = 0
immVersion = {releaseCode = 65 'A', majorVersion = 2 '\002', minorVersion = 1 
'\001'}
__FUNCTION__ = "avnd_comp_cap_x_act_or_1_act_check"
#5 0x000000000041e43b in avnd_comp_csi_assign (cb=0x6578c0, comp=0x6911e0, 
csi=0x0) at avnd_comp.c:1017
npi_prv_inst = <optimized out>
npi_curr_inst = <optimized out>
curr_csi = 0x6f0010
comp_ev = <optimized out>
rc = <optimized out>
csiname = 0x4434f1 "%u"
__FUNCTION__ = "avnd_comp_csi_assign"
#6 0x0000000000436d9c in assign_si_to_su (si=0x69ccc0, su=0x66f770, 
single_csi=0) at avnd_susm.c:561
npi_prv_inst = <optimized out>
npi_curr_inst = 6
su_ev = 4294967295
rc = 6933746
curr_csi = 0x6f0010
__FUNCTION__ = "assign_si_to_su"
#7 0x0000000000437219 in avnd_su_si_assign (cb=<optimized out>, su=0x66f770, 
si=0x69ccc0) at avnd_susm.c:606
rc = <optimized out>
rank = <optimized out>
curr_si = <optimized out>
curr_csi = <optimized out>
__FUNCTION__ = "avnd_su_si_assign"
#8 0x0000000000434b9d in avnd_su_si_msg_prc (cb=0x6578c0, su=0x66f770, 
info=<optimized out>) at avnd_susm.c:349
csi_param = 0x6f8df8
si = <optimized out>
rc = 1
csi = <optimized out>
__FUNCTION__ = "avnd_su_si_msg_prc"
#9 0x000000000043216e in avnd_evt_avd_info_su_si_assign_evh (cb=0x6578c0, 
evt=<optimized out>) at avnd_su.c:258
info = <optimized out>
siq = <optimized out>
su = 0x66f770
rc = <optimized out>
__FUNCTION__ = "avnd_evt_avd_info_su_si_assign_evh"
#10 0x0000000000430190 in avnd_main_process () at avnd_proc.c:218
ret = 0
mbx_fd = <optimized out>
fds = {{fd = 11, events = 1, revents = 1}, {fd = 15, events = 1, revents = 0}, 
{fd = 13, events = 1,
revents = 0}, {fd = 0, events = 0, revents = 0}}
evt = 0x6c5190
__FUNCTION__ = "avnd_main_process"
#11 0x0000000000408815 in main (argc=1, argv=0x7fffa48dc7a8) at amfnd_main.c:61
error = 32767
ret = <optimized out>



