This is a very strange ticket: the assert is hit when the node lock timeout
message does not find the entry for the node on which the lock was attempted.

The *only* way this can happen is the following, seemingly impossible, scenario:
- The lock timer was not stopped upon receiving an OK/ERROR message from the
CLM subscribers,
- A subsequent immcfg command by the user deletes the node,
- Eventually the lock timer expires and the timeout message is unable to find
the node entry, because the node was deleted in the previous step.

The other, routine, ways of running into such an issue are already handled well,
for example:

- A client fails to respond within a small timeout for the IMM admin command
(with the default sync API timeout value) and an immcfg node delete is attempted
- A client fails to respond within a small timeout for the IMM admin command
(with the default async API) and an immcfg node delete is attempted
- A node delete command is issued while a lock admin operation (with the
default sync API timeout value) with a big timeout is in progress
- A node delete command is issued while a lock admin operation (with the
default async API) with a big timeout is in progress


Hmm, still pondering how else this problem could get triggered. In the meantime,
some help from the ticket submitter in reproducing it would be useful.


---

**[tickets:#227] clmd asserts on active controller during node lock timeout**

**Status:** unassigned
**Created:** Wed May 15, 2013 10:23 AM UTC by Mathi Naickan
**Last Updated:** Wed May 15, 2013 11:09 AM UTC
**Owner:** Mathi Naickan

I have asked for traces from the submitter.

changeset : 4007 with patch 2865
scenario:
========
Trying to do lock/lock-in of PL-5.
amf-adm lock safNode=PL-5,safCluster=myClmCluster
error - saImmOmAdminOperationInvoke_2 FAILED: SA_AIS_ERR_TIMEOUT (5)
error: failed to eval/store amf-adm lock safNode=PL-5,safCluster=myClmCluster failed. Aborting script! exitCode: 1
#0 0x00007fb446240b55 in raise () from /lib64/libc.so.6
(gdb) bt
#0 0x00007fb446240b55 in raise () from /lib64/libc.so.6
#1 0x00007fb446242131 in abort () from /lib64/libc.so.6
#2 0x00007fb447881e44 in osafassert_fail (file=0x420380 "clms_evt.c", line=390, func=0x420680 "proc_node_lock_tmr_exp_msg", assertion=0x42069b "op_node != NULL") at sysf_def.c:301
#3 0x000000000040954a in proc_node_lock_tmr_exp_msg (evt=0x655290) at clms_evt.c:390
#4 0x000000000040bc42 in clms_process_mbx (mbx=0x6298a0) at clms_evt.c:1272
#5 0x0000000000412b3b in main (argc=1, argv=0x7fff3162cb28) at clms_main.c:455
(gdb) bt full
#0 0x00007fb446240b55 in raise () from /lib64/libc.so.6
No symbol table info available.
#1 0x00007fb446242131 in abort () from /lib64/libc.so.6
No symbol table info available.
#2 0x00007fb447881e44 in osafassert_fail (file=0x420380 "clms_evt.c", line=390, func=0x420680 "proc_node_lock_tmr_exp_msg", assertion=0x42069b "op_node != NULL") at sysf_def.c:301
No locals.
#3 0x000000000040954a in proc_node_lock_tmr_exp_msg (evt=0x655290) at clms_evt.c:390
rc = 1
node_id = 132367
op_node = 0x0
__FUNCTION__ = "proc_node_lock_tmr_exp_msg"
#4 0x000000000040bc42 in clms_process_mbx (mbx=0x6298a0) at clms_evt.c:1272
msg = 0x655290
__FUNCTION__ = "clms_process_mbx"
#5 0x0000000000412b3b in main (argc=1, argv=0x7fff3162cb28) at clms_main.c:455
ret = 1
mbx_fd = {raise_obj = 11, rmv_obj = 12}
error = SA_AIS_OK
rc = 1
__FUNCTION__ = "main"
syslog on sc-1:
==============
Mar 13 12:27:23 SLES1 osafclmd[6575]: clms_evt.c:390: proc_node_lock_tmr_exp_msg: Assertion 'op_node != NULL' failed.
Mar 13 12:27:23 SLES1 osafamfnd[6604]: NO 'safComp=CLM,safSu=SC-1,safSg=2N,safApp=OpenSAF' faulted due to 'avaDown' : Recovery is 'nodeFailfast'
Mar 13 12:27:23 SLES1 osafamfnd[6604]: ER safComp=CLM,safSu=SC-1,safSg=2N,safApp=OpenSAF Faulted due to:avaDown Recovery is:nodeFailfast
Mar 13 12:27:23 SLES1 osafamfnd[6604]: Rebooting OpenSAF NodeId = 131343 EE Name = , Reason: Component faulted: recovery is node failfast
Mar 13 12:27:23 SLES1 opensaf_reboot: Rebooting local node

