Finally I got hold of the traces! It turned out to be a simpler case than the initial analysis suggested. Thanks, Surender, for re-running the test and sharing the traces.
It's a simple case of:
- Issue a CLM lock of a node (PL-5).
- Make the PL-5 node a non-member.
- The lock callback times out, the node entry is not found (which is fine), and the assert is hit.

While the root cause is an incorrectly placed assert, the fix is to look up the node by name rather than by node ID, because the node with that ID has gone down and is no longer relevant.

Cheers,
Mathi.

From: surender khetavath [mailto:[email protected]]
Sent: Friday, July 05, 2013 4:36 PM
To: [opensaf:tickets]
Subject: [opensaf:tickets] #227 clmd asserts on active controller during node lock timeout

The issue is always reproducible.

Test: A campaign is modeled to include PL-5 and an SU on this node. For this, the script '/usr/share/opensaf/immxml/immxml-modify-config' is used. A clm crash is observed during rollback.

The campaign performs a lock/lock-in operation on PL-5 while the script immxml-modify-config simultaneously tries to perform an admin lock on the same node. If the lines below are commented out in immxml-modify-config, the rollback goes fine; if they are enabled, clm crashes.

PLMNODE=`cat $CURRENT_NODECFG | grep ".. $node " | awk '{ print $3 }'`
trace "PLMNODE: $PLMNODE"
cmd="amf-adm lock safNode=$PLMNODE,safCluster=myClmCluster"

The scripts and configuration are attached.
Attachment: scripts.tgz (4.9 kB; application/x-compressed-tar)

_____

[tickets:#227] clmd asserts on active controller during node lock timeout (http://sourceforge.net/p/opensaf/tickets/227/)

Status: unassigned
Created: Wed May 15, 2013 10:23 AM UTC by Mathi Naickan
Last Updated: Fri Jun 28, 2013 10:45 AM UTC
Owner: Mathi Naickan

I have asked for traces from the submitter.

changeset: 4007 with patch 2865

scenario:
========
Trying to do lock/lock-in of PL-5.

amf-adm lock safNode=PL-5,safCluster=myClmCluster
error - saImmOmAdminOperationInvoke_2 FAILED: SA_AIS_ERR_TIMEOUT (5)
error: failed to eval/store
amf-adm lock safNode=PL-5,safCluster=myClmCluster failed. Aborting script!
exitCode: 1

#0  0x00007fb446240b55 in raise () from /lib64/libc.so.6
(gdb) bt
#0  0x00007fb446240b55 in raise () from /lib64/libc.so.6
#1  0x00007fb446242131 in abort () from /lib64/libc.so.6
#2  0x00007fb447881e44 in osafassert_fail (file=0x420380 "clms_evt.c", line=390, func=0x420680 "proc_node_lock_tmr_exp_msg", assertion=0x42069b "op_node != NULL") at sysf_def.c:301
#3  0x000000000040954a in proc_node_lock_tmr_exp_msg (evt=0x655290) at clms_evt.c:390
#4  0x000000000040bc42 in clms_process_mbx (mbx=0x6298a0) at clms_evt.c:1272
#5  0x0000000000412b3b in main (argc=1, argv=0x7fff3162cb28) at clms_main.c:455
(gdb) bt full
#0  0x00007fb446240b55 in raise () from /lib64/libc.so.6
No symbol table info available.
#1  0x00007fb446242131 in abort () from /lib64/libc.so.6
No symbol table info available.
#2  0x00007fb447881e44 in osafassert_fail (file=0x420380 "clms_evt.c", line=390, func=0x420680 "proc_node_lock_tmr_exp_msg", assertion=0x42069b "op_node != NULL") at sysf_def.c:301
No locals.
#3  0x000000000040954a in proc_node_lock_tmr_exp_msg (evt=0x655290) at clms_evt.c:390
        rc = 1
        node_id = 132367
        op_node = 0x0
        __FUNCTION__ = "proc_node_lock_tmr_exp_msg"
#4  0x000000000040bc42 in clms_process_mbx (mbx=0x6298a0) at clms_evt.c:1272
        msg = 0x655290
        __FUNCTION__ = "clms_process_mbx"
#5  0x0000000000412b3b in main (argc=1, argv=0x7fff3162cb28) at clms_main.c:455
        ret = 1
        mbx_fd = {raise_obj = 11, rmv_obj = 12}
        error = SA_AIS_OK
        rc = 1
        __FUNCTION__ = "main"

syslog on sc-1:
==============
Mar 13 12:27:23 SLES1 osafclmd[6575]: clms_evt.c:390: proc_node_lock_tmr_exp_msg: Assertion 'op_node != NULL' failed.
Mar 13 12:27:23 SLES1 osafamfnd[6604]: NO 'safComp=CLM,safSu=SC-1,safSg=2N,safApp=OpenSAF' faulted due to 'avaDown' : Recovery is 'nodeFailfast'
Mar 13 12:27:23 SLES1 osafamfnd[6604]: ER safComp=CLM,safSu=SC-1,safSg=2N,safApp=OpenSAF Faulted due to:avaDown Recovery is:nodeFailfast
Mar 13 12:27:23 SLES1 osafamfnd[6604]: Rebooting OpenSAF NodeId = 131343 EE Name = , Reason: Component faulted: recovery is node failfast
Mar 13 12:27:23 SLES1 opensaf_reboot: Rebooting local node
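For illustration, here is a minimal C sketch of the fix proposed at the top of this mail: resolve the node by name rather than by node ID when the lock timer expires, and do not assert when the node is gone. It is NOT the actual clmd code. The node table, the event struct and every function in it (node_add, node_down, find_by_id, find_by_name, handle_lock_tmr_exp) are hypothetical stand-ins for proc_node_lock_tmr_exp_msg and its lookups. The sketch assumes, per the analysis above, that a node which has left membership can no longer be found by its node ID while its configured name still resolves. The assert is kept only for a genuine programming error (a NULL event).

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* HYPOTHETICAL, simplified model of the clmd node database.
 * The real OpenSAF structures and lookup functions differ. */
#define MAX_NODES 8
#define NAME_LEN 128

typedef struct {
    char name[NAME_LEN]; /* e.g. "safNode=PL-5,safCluster=myClmCluster" */
    uint32_t node_id;    /* 0 once the node has left membership (assumption) */
    int in_use;
} node_rec;

static node_rec nodes[MAX_NODES];

/* Add a configured node; returns NULL if the table is full. */
static node_rec *node_add(const char *name, uint32_t node_id) {
    for (int i = 0; i < MAX_NODES; i++) {
        if (!nodes[i].in_use) {
            snprintf(nodes[i].name, NAME_LEN, "%s", name);
            nodes[i].node_id = node_id;
            nodes[i].in_use = 1;
            return &nodes[i];
        }
    }
    return NULL;
}

/* Assumption: going down drops the ID mapping but keeps the named record. */
static void node_down(node_rec *n) { n->node_id = 0; }

/* ID-keyed lookup: misses once the node has gone down. */
static node_rec *find_by_id(uint32_t node_id) {
    for (int i = 0; i < MAX_NODES; i++)
        if (nodes[i].in_use && node_id != 0 && nodes[i].node_id == node_id)
            return &nodes[i];
    return NULL;
}

/* Name-keyed lookup: still resolves after the node has gone down. */
static node_rec *find_by_name(const char *name) {
    for (int i = 0; i < MAX_NODES; i++)
        if (nodes[i].in_use && strcmp(nodes[i].name, name) == 0)
            return &nodes[i];
    return NULL;
}

/* Hypothetical lock-timer-expiry event carrying the node name as well as the ID. */
typedef struct {
    uint32_t node_id;
    char node_name[NAME_LEN];
} lock_tmr_exp_evt;

/* Returns 0 if the pending lock was handled, -1 if the node is unknown
 * (logged and ignored instead of asserting). */
static int handle_lock_tmr_exp(const lock_tmr_exp_evt *evt) {
    assert(evt != NULL); /* genuine programming error only */
    node_rec *op_node = find_by_name(evt->node_name);
    if (op_node == NULL) {
        fprintf(stderr, "lock timer expired for unknown node '%s', ignoring\n",
                evt->node_name);
        return -1;
    }
    /* ... complete the pending lock operation on op_node here ... */
    return 0;
}
```

With PL-5 registered under node ID 132367 (the value seen in the backtrace) and then taken down with node_down(), find_by_id(132367) returns NULL, which is the situation that tripped the 'op_node != NULL' assert, while handle_lock_tmr_exp() still resolves the node by name and returns 0. An event for a node that is not configured at all returns -1 instead of taking down the active controller.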
