I finally got hold of the traces! It turned out to be a simpler case than
the initial analysis suggested. Thanks, Surender, for re-running and sharing the traces.
It’s a simple case of
- Issue a CLM lock on a node (PL-5)
- Make the PL-5 node a non-member
- The lock callback times out, the node entry is not found (which is fine), and
the abort gets hit.
While the root cause is an incorrectly placed abort, the fix is to look up
the node by name rather than by id, because the node with that id has gone down
and is no longer relevant.
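
To illustrate the idea, here is a minimal, self-contained sketch in C. It is not the actual clmd code: the `struct node` layout, the `node_db` table, and the functions `node_get_by_id`, `node_get_by_name`, `node_leave` and `on_lock_timer_expiry` are all hypothetical names made up for this example, and the SC-1 DN is assumed by analogy with the PL-5 one. It only shows that an id lookup fails once the node has left, while a name lookup still works, and that a missing entry is reported instead of aborting.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical, simplified stand-in for a cluster node record. */
struct node {
    uint32_t node_id;   /* set to 0 once the node has left the cluster */
    const char *name;   /* configured DN; stays valid after the node leaves */
    bool lock_pending;  /* a lock operation is waiting for its callback */
};

/* Illustrative table; the ids are the ones seen in the trace and syslog. */
static struct node node_db[] = {
    { 131343, "safNode=SC-1,safCluster=myClmCluster", false },
    { 132367, "safNode=PL-5,safCluster=myClmCluster", true  },
};
#define NODE_COUNT (sizeof(node_db) / sizeof(node_db[0]))

/* Look up a current member by id; returns NULL for departed nodes. */
struct node *node_get_by_id(uint32_t id)
{
    if (id == 0)
        return NULL;
    for (size_t i = 0; i < NODE_COUNT; i++)
        if (node_db[i].node_id == id)
            return &node_db[i];
    return NULL;
}

/* Look up a configured node by name, member or not. */
struct node *node_get_by_name(const char *name)
{
    for (size_t i = 0; i < NODE_COUNT; i++)
        if (strcmp(node_db[i].name, name) == 0)
            return &node_db[i];
    return NULL;
}

/* The node goes down: its id is no longer valid, its name still is. */
void node_leave(uint32_t id)
{
    struct node *n = node_get_by_id(id);
    if (n != NULL)
        n->node_id = 0;
}

/* Timer expiry keyed on the node name. A missing entry is logged and
 * reported to the caller rather than treated as a fatal assertion.
 * Returns 0 if the pending lock was cleaned up, -1 if no node matched. */
int on_lock_timer_expiry(const char *name)
{
    struct node *n = node_get_by_name(name);
    if (n == NULL) {
        fprintf(stderr, "lock timer expired for unknown node '%s'\n", name);
        return -1;
    }
    n->lock_pending = false;
    return 0;
}
```

With this arrangement, calling `node_leave(132367)` followed by `on_lock_timer_expiry("safNode=PL-5,safCluster=myClmCluster")` still finds the PL-5 entry and clears its pending lock, whereas `node_get_by_id(132367)` at that point returns NULL, which is the situation the assertion ran into.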
Cheers,
Mathi.
---
**[tickets:#227] clmd asserts on active controller during node lock timeout**
**Status:** accepted
**Created:** Wed May 15, 2013 10:23 AM UTC by Mathi Naickan
**Last Updated:** Fri Jul 05, 2013 11:05 AM UTC
**Owner:** Mathi Naickan
I have asked for traces from the submitter.
changeset : 4007 with patch 2865
scenario:
========
Trying to do lock/lock-in of PL-5.
amf-adm lock safNode=PL-5,safCluster=myClmCluster
error - saImmOmAdminOperationInvoke_2 FAILED: SA_AIS_ERR_TIMEOUT (5)
error: failed to eval/store amf-adm lock safNode=PL-5,safCluster=myClmCluster
failed. Aborting script! exitCode: 1
#0 0x00007fb446240b55 in raise () from /lib64/libc.so.6
(gdb) bt
#0 0x00007fb446240b55 in raise () from /lib64/libc.so.6
#1 0x00007fb446242131 in abort () from /lib64/libc.so.6
#2 0x00007fb447881e44 in osafassert_fail (file=0x420380 "clms_evt.c", line=390,
func=0x420680 "proc_node_lock_tmr_exp_msg", assertion=0x42069b "op_node !=
NULL") at sysf_def.c:301
#3 0x000000000040954a in proc_node_lock_tmr_exp_msg (evt=0x655290) at
clms_evt.c:390
#4 0x000000000040bc42 in clms_process_mbx (mbx=0x6298a0) at clms_evt.c:1272
#5 0x0000000000412b3b in main (argc=1, argv=0x7fff3162cb28) at clms_main.c:455
(gdb) bt full
#0 0x00007fb446240b55 in raise () from /lib64/libc.so.6
No symbol table info available.
#1 0x00007fb446242131 in abort () from /lib64/libc.so.6
No symbol table info available.
#2 0x00007fb447881e44 in osafassert_fail (file=0x420380 "clms_evt.c", line=390,
func=0x420680 "proc_node_lock_tmr_exp_msg", assertion=0x42069b "op_node !=
NULL") at sysf_def.c:301
No locals.
#3 0x000000000040954a in proc_node_lock_tmr_exp_msg (evt=0x655290) at
clms_evt.c:390
rc = 1
node_id = 132367
op_node = 0x0
FUNCTION = "proc_node_lock_tmr_exp_msg"
#4 0x000000000040bc42 in clms_process_mbx (mbx=0x6298a0) at clms_evt.c:1272
msg = 0x655290
FUNCTION = "clms_process_mbx"
#5 0x0000000000412b3b in main (argc=1, argv=0x7fff3162cb28) at clms_main.c:455
ret = 1
mbx_fd = {raise_obj = 11, rmv_obj = 12}
error = SA_AIS_OK
rc = 1
FUNCTION = "main"
syslog on sc-1:
==============
Mar 13 12:27:23 SLES1 osafclmd[6575]: clms_evt.c:390:
proc_node_lock_tmr_exp_msg: Assertion 'op_node != NULL' failed.
Mar 13 12:27:23 SLES1 osafamfnd[6604]: NO
'safComp=CLM,safSu=SC-1,safSg=2N,safApp=OpenSAF' faulted due to 'avaDown' :
Recovery is 'nodeFailfast'
Mar 13 12:27:23 SLES1 osafamfnd[6604]: ER
safComp=CLM,safSu=SC-1,safSg=2N,safApp=OpenSAF Faulted due to:avaDown Recovery
is:nodeFailfast
Mar 13 12:27:23 SLES1 osafamfnd[6604]: Rebooting OpenSAF NodeId = 131343 EE
Name = , Reason: Component faulted: recovery is node failfast
Mar 13 12:27:23 SLES1 opensaf_reboot: Rebooting local node