---
**[tickets:#727] clmna core dumped on payload when the cluster is going down**
**Status:** unassigned
**Created:** Fri Jan 17, 2014 07:47 AM UTC by Sirisha Alla
**Last Updated:** Fri Jan 17, 2014 07:47 AM UTC
**Owner:** nobody
The issue is seen on a 4-node SLES VM setup running changeset 4733 with the
patches corresponding to #220 applied.
There appears to have been a TIPC link flap (not confirmed), which led to a
reset of the cluster.
While the payload PL-3 was going down, a CLMNA core dump was observed.
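The suspected link flap can be checked by pulling the TIPC link messages out of each node's syslog. The following is a minimal sketch, assuming the log lines are wrapped the way they are in this ticket and that the kernel messages have the form "TIPC: Resetting/Lost/Established link <...>"; the function names `unwrap`, `link_events` and `flapped` are hypothetical helpers and not part of OpenSAF.

```python
import re
from collections import defaultdict

# Kernel TIPC link messages in the form seen in the syslogs of this ticket.
LINK_EVENT = re.compile(r"TIPC: (Resetting|Lost|Established) link <([^>]+)>")
# A syslog record starts with a timestamp such as "Jan 16 11:54:30 ".
TIMESTAMP = re.compile(r"^[A-Z][a-z]{2} +\d+ \d{2}:\d{2}:\d{2} ")


def unwrap(lines):
    """Join mail-wrapped continuation lines back onto their syslog record."""
    records = []
    for line in lines:
        line = line.rstrip("\n")
        if TIMESTAMP.match(line) or not records:
            records.append(line)
        else:
            records[-1] += " " + line
    return records


def link_events(lines):
    """Return {link: [(time, event), ...]} in log order."""
    events = defaultdict(list)
    for record in unwrap(lines):
        m = LINK_EVENT.search(record)
        if m:
            time = record.split()[2]  # third field of the syslog timestamp
            events[m.group(2)].append((time, m.group(1)))
    return dict(events)


def flapped(history):
    """A link flapped if it was lost and re-established afterwards."""
    names = [event for _, event in history]
    return "Lost" in names and "Established" in names[names.index("Lost"):]


# Excerpt from the SC-1 syslog in this ticket.
sample = [
    "Jan 16 11:54:30 SLES-64BIT-SLOT1 kernel: [ 63.163540] TIPC: Resetting link",
    "<1.1.1:eth0-1.1.4:eth0>, requested by peer",
    "Jan 16 11:54:30 SLES-64BIT-SLOT1 kernel: [ 63.163540] TIPC: Lost link",
    "<1.1.1:eth0-1.1.4:eth0> on network plane A",
    "Jan 16 11:54:30 SLES-64BIT-SLOT1 kernel: [ 63.163540] TIPC: Established link",
    "<1.1.1:eth0-1.1.4:eth0> on network plane A",
]

for link, history in link_events(sample).items():
    print(link, "flapped" if flapped(history) else "stable", history)
```

Running the same functions over the full syslog of each node would show which links were lost and re-established around 11:54, which is the pattern referred to above as a possible flap.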
Syslog of SC-1:
Jan 16 11:54:24 SLES-64BIT-SLOT1 osafimmd[2555]: NO 2PBE configured with
IMMSV_PEER_SC_MAX_WAIT: 30 seconds
Jan 16 11:54:28 SLES-64BIT-SLOT1 osafimmnd[2565]: Started
Jan 16 11:54:28 SLES-64BIT-SLOT1 osafimmnd[2565]: NO Persistent Back-End
capability configured, Pbe file:imm.db (suffix may get added)
Jan 16 11:54:28 SLES-64BIT-SLOT1 osafimmd[2555]: NO 2PBE wait. Passed time:3698
new timeout: 26302 msecs
Jan 16 11:54:28 SLES-64BIT-SLOT1 osafimmnd[2565]: NO SERVER STATE:
IMM_SERVER_ANONYMOUS --> IMM_SERVER_CLUSTER_WAITING
Jan 16 11:54:28 SLES-64BIT-SLOT1 osafimmd[2555]: NO 2PBE wait. Passed time:3803
new timeout: 26197 msecs
Jan 16 11:54:28 SLES-64BIT-SLOT1 osafimmnd[2565]: NO 2PBE configured,
IMMSV_PBE_FILE_SUFFIX:.2010f (sync)
Jan 16 11:54:28 SLES-64BIT-SLOT1 osafimmnd[2565]: NO SERVER STATE:
IMM_SERVER_CLUSTER_WAITING --> IMM_SERVER_LOADING_PENDING
Jan 16 11:54:28 SLES-64BIT-SLOT1 osafimmnd[2565]: NO SERVER STATE:
IMM_SERVER_LOADING_PENDING --> IMM_SERVER_SYNC_PENDING
Jan 16 11:54:28 SLES-64BIT-SLOT1 osafimmnd[2565]: NO NODE STATE->
IMM_NODE_ISOLATED
Jan 16 11:54:29 SLES-64BIT-SLOT1 osafimmd[2555]: NO 2PBE wait. Passed time:4110
new timeout: 25890 msecs
Jan 16 11:54:29 SLES-64BIT-SLOT1 osafimmd[2555]: NO 2PBE wait. Passed time:4112
new timeout: 25888 msecs
Jan 16 11:54:29 SLES-64BIT-SLOT1 osafimmnd[2565]: NO Sync client discarded
classimplementer set. Impl-id:1 Class:SaLogStreamConfig
Jan 16 11:54:29 SLES-64BIT-SLOT1 osafimmnd[2565]: NO Sync client discarded
classimplementer set. Impl-id:1 Class:OpenSafLogConfig
Jan 16 11:54:29 SLES-64BIT-SLOT1 osafimmd[2555]: NO SBY: Ruling epoch noted
as:59
Jan 16 11:54:29 SLES-64BIT-SLOT1 osafimmd[2555]: NO IMMND coord at 2020f
Jan 16 11:54:29 SLES-64BIT-SLOT1 osafimmd[2555]: NO SBY:
SaImmRepositoryInitModeT changed and noted as 'SA_IMM_KEEP_REPOSITORY'
Jan 16 11:54:29 SLES-64BIT-SLOT1 osafimmnd[2565]: NO NODE STATE->
IMM_NODE_W_AVAILABLE
Jan 16 11:54:29 SLES-64BIT-SLOT1 osafimmnd[2565]: NO Implementer connected: 3
(safClmService) <0, 2020f>
Jan 16 11:54:29 SLES-64BIT-SLOT1 osafimmnd[2565]: NO SERVER STATE:
IMM_SERVER_SYNC_PENDING --> IMM_SERVER_SYNC_CLIENT
Jan 16 11:54:30 SLES-64BIT-SLOT1 kernel: [ 63.163540] TIPC: Resetting link
<1.1.1:eth0-1.1.4:eth0>, requested by peer
Jan 16 11:54:30 SLES-64BIT-SLOT1 kernel: [ 63.163540] TIPC: Lost link
<1.1.1:eth0-1.1.4:eth0> on network plane A
Jan 16 11:54:30 SLES-64BIT-SLOT1 kernel: [ 63.163540] TIPC: Lost contact with
<1.1.4>
Jan 16 11:54:30 SLES-64BIT-SLOT1 kernel: [ 63.163540] TIPC: Resetting link
<1.1.1:eth0-1.1.3:eth0>, requested by peer
Jan 16 11:54:30 SLES-64BIT-SLOT1 kernel: [ 63.163540] TIPC: Lost link
<1.1.1:eth0-1.1.3:eth0> on network plane A
Jan 16 11:54:30 SLES-64BIT-SLOT1 kernel: [ 63.163540] TIPC: Lost contact with
<1.1.3>
Jan 16 11:54:30 SLES-64BIT-SLOT1 kernel: [ 63.163540] TIPC: Resetting link
<1.1.1:eth0-1.1.2:eth0>, requested by peer
Jan 16 11:54:30 SLES-64BIT-SLOT1 kernel: [ 63.163540] TIPC: Lost link
<1.1.1:eth0-1.1.2:eth0> on network plane A
Jan 16 11:54:30 SLES-64BIT-SLOT1 kernel: [ 63.163540] TIPC: Lost contact with
<1.1.2>
Jan 16 11:54:30 SLES-64BIT-SLOT1 kernel: [ 63.163540] TIPC: Established link
<1.1.1:eth0-1.1.4:eth0> on network plane A
Jan 16 11:54:30 SLES-64BIT-SLOT1 kernel: [ 63.163540] TIPC: Established link
<1.1.1:eth0-1.1.2:eth0> on network plane A
Jan 16 11:54:30 SLES-64BIT-SLOT1 osaffmd[2545]: NO Role: STANDBY, Node Down for
node id: 2020f
Jan 16 11:54:30 SLES-64BIT-SLOT1 osaffmd[2545]: Rebooting OpenSAF NodeId = 0 EE
Name = No EE Mapped, Reason: Failover occurred, but this node is not yet ready,
OwnNodeId = 131343, SupervisionTime = 60
Syslog of SC-2:
Jan 16 11:54:20 SLES-64BIT-SLOT2 osafimmd[2343]: NO New IMMND process is on
STANDBY Controller at 2010f
Jan 16 11:54:20 SLES-64BIT-SLOT2 osafimmd[2343]: NO Extended intro from node
2010f
Jan 16 11:54:20 SLES-64BIT-SLOT2 osafimmpbed: IN arg[0] ==
'/usr/lib64/opensaf/osafimmpbed'
Jan 16 11:54:20 SLES-64BIT-SLOT2 osafimmpbed: IN arg[1] == '--pbe2A'
Jan 16 11:54:20 SLES-64BIT-SLOT2 osafimmpbed: IN arg[2] ==
'/home/sirisha/immsv/immpbe/imm.db.2020f'
Jan 16 11:54:20 SLES-64BIT-SLOT2 osaflogd[2385]: Started
Jan 16 11:54:20 SLES-64BIT-SLOT2 osafimmpbed: IN Generating DB file from
current IMM state. DB file: /home/sirisha/immsv/immpbe/imm.db.2020f
Jan 16 11:54:20 SLES-64BIT-SLOT2 osafimmpbed: NO Successfully opened empty
local sqlite pbe file /tmp/imm.db.O23tIO
Jan 16 11:54:20 SLES-64BIT-SLOT2 osafimmd[2343]: WA IMMND on controller (not
currently coord) requests sync
Jan 16 11:54:20 SLES-64BIT-SLOT2 osafimmd[2343]: NO Node 2010f request sync
sync-pid:2565 epoch:0
Jan 16 11:54:20 SLES-64BIT-SLOT2 osaflogd[2385]: NO log root directory is:
/var/log/opensaf/saflog
Jan 16 11:54:20 SLES-64BIT-SLOT2 osafimmnd[2353]: NO Implementer connected: 1
(safLogService) <7, 2020f>
Jan 16 11:54:20 SLES-64BIT-SLOT2 osafimmnd[2353]: NO implementer for class
'SaLogStreamConfig' is safLogService => class extent is safe.
Jan 16 11:54:20 SLES-64BIT-SLOT2 osafimmnd[2353]: NO implementer for class
'OpenSafLogConfig' is safLogService => class extent is safe.
Jan 16 11:54:20 SLES-64BIT-SLOT2 osafntfd[2401]: Started
Jan 16 11:54:20 SLES-64BIT-SLOT2 osafimmnd[2353]: NO Implementer (applier)
connected: 2 (@OpenSafImmReplicatorA) <16, 2020f>
Jan 16 11:54:20 SLES-64BIT-SLOT2 osafntfimcnd[2408]: NO Started
Jan 16 11:54:20 SLES-64BIT-SLOT2 osafclmd[2415]: Started
Jan 16 11:54:20 SLES-64BIT-SLOT2 osafimmnd[2353]: NO Announce sync, epoch:59
Jan 16 11:54:20 SLES-64BIT-SLOT2 osafimmnd[2353]: NO SERVER STATE:
IMM_SERVER_READY --> IMM_SERVER_SYNC_SERVER
Jan 16 11:54:20 SLES-64BIT-SLOT2 osafimmnd[2353]: NO NODE STATE->
IMM_NODE_R_AVAILABLE
Jan 16 11:54:20 SLES-64BIT-SLOT2 osafimmd[2343]: NO Successfully announced
sync. New ruling epoch:59
Jan 16 11:54:20 SLES-64BIT-SLOT2 osafimmnd[2353]: NO Implementer connected: 3
(safClmService) <18, 2020f>
Jan 16 11:54:20 SLES-64BIT-SLOT2 osafimmloadd: NO Sync starting
Jan 16 11:54:21 SLES-64BIT-SLOT2 osafimmnd[2353]: WA Cannot allow official
dump/backup when imm-sync is in progress
Jan 16 11:54:22 SLES-64BIT-SLOT2 osafimmnd[2353]: WA Cannot allow official
dump/backup when imm-sync is in progress
Jan 16 11:54:23 SLES-64BIT-SLOT2 osafimmnd[2353]: WA Cannot allow official
dump/backup when imm-sync is in progress
Jan 16 11:54:23 SLES-64BIT-SLOT2 osaffmd[2333]: NO Role: ACTIVE, Node Down for
node id: 2010f
Jan 16 11:54:23 SLES-64BIT-SLOT2 osaffmd[2333]: Rebooting OpenSAF NodeId = 0 EE
Name = No EE Mapped, Reason: Failover occurred, but this node is not yet ready,
OwnNodeId = 131599, SupervisionTime = 60
Jan 16 11:54:23 SLES-64BIT-SLOT2 kernel: [ 95.696195] TIPC: Resetting link
<1.1.2:eth0-1.1.1:eth0>, peer not responding
Syslog of PL-3:
Jan 16 11:54:33 SLES-64BIT-SLOT3 kernel: [ 432.560674] TIPC: Established link
<1.1.3:eth0-1.1.1:eth0> on network plane A
Jan 16 11:54:33 SLES-64BIT-SLOT3 osafimmnd[4300]: WA DISCARD DUPLICATE FEVS
message:24684
Jan 16 11:54:33 SLES-64BIT-SLOT3 osafimmnd[4300]: WA Error code 2 returned for
message type 57 - ignoring
Jan 16 11:54:33 SLES-64BIT-SLOT3 osafimmnd[4300]: WA DISCARD DUPLICATE FEVS
message:24763
Jan 16 11:54:33 SLES-64BIT-SLOT3 osafimmnd[4300]: WA Error code 2 returned for
message type 57 - ignoring
Jan 16 11:54:33 SLES-64BIT-SLOT3 osafimmnd[4300]: WA DISCARD DUPLICATE FEVS
message:24764
Jan 16 11:54:33 SLES-64BIT-SLOT3 osafimmnd[4300]: WA Error code 2 returned for
message type 57 - ignoring
Jan 16 11:54:33 SLES-64BIT-SLOT3 osafimmnd[4300]: NO Global discard node
received for nodeId:2020f pid:2353
Jan 16 11:54:33 SLES-64BIT-SLOT3 osafimmnd[4300]: NO Implementer disconnected 1
<0, 2020f(down)> (safLogService)
Jan 16 11:54:33 SLES-64BIT-SLOT3 osafimmnd[4300]: NO Implementer disconnected 3
<0, 2020f(down)> (safClmService)
Jan 16 11:54:33 SLES-64BIT-SLOT3 osafimmnd[4300]: NO Implementer disconnected 2
<0, 2020f(down)> (@OpenSafImmReplicatorA)
Jan 16 11:54:34 SLES-64BIT-SLOT3 osafimmnd[4300]: WA DISCARD DUPLICATE FEVS
message:24765
Jan 16 11:54:34 SLES-64BIT-SLOT3 osafimmnd[4300]: WA Error code 2 returned for
message type 57 - ignoring
Jan 16 11:54:34 SLES-64BIT-SLOT3 osafimmnd[4300]: WA DISCARD DUPLICATE FEVS
message:24778
Jan 16 11:54:34 SLES-64BIT-SLOT3 osafimmnd[4300]: WA Error code 2 returned for
message type 57 - ignoring
Jan 16 11:54:34 SLES-64BIT-SLOT3 osafimmnd[4300]: WA DISCARD DUPLICATE FEVS
message:24779
Jan 16 11:54:34 SLES-64BIT-SLOT3 osafimmnd[4300]: WA Error code 2 returned for
message type 57 - ignoring
Jan 16 11:54:34 SLES-64BIT-SLOT3 osafimmnd[4300]: NO Global discard node
received for nodeId:2020f pid:0
Jan 16 11:54:35 SLES-64BIT-SLOT3 osafimmnd[4300]: ER IMMND forced to restart on
order from IMMD, exiting
Jan 16 11:54:39 SLES-64BIT-SLOT3 osafclmna[4325]: ER Exiting
Jan 16 11:54:39 SLES-64BIT-SLOT3 opensafd[4230]: ER Failed DESC:CLMNA
Jan 16 11:54:39 SLES-64BIT-SLOT3 opensafd[4230]: ER Going for recovery
Jan 16 11:54:39 SLES-64BIT-SLOT3 opensafd[4230]: ER Trying To RESPAWN
/usr/lib64/opensaf/clc-cli/osaf-clmna attempt #1
Jan 16 11:54:39 SLES-64BIT-SLOT3 opensafd[4230]: ER Sending SIGKILL to CLMNA,
pid=4320
Jan 16 11:54:40 SLES-64BIT-SLOT3 kernel: [ 439.296429] TIPC: Resetting link
<1.1.3:eth0-1.1.2:eth0>, peer not responding
Jan 16 11:54:40 SLES-64BIT-SLOT3 kernel: [ 439.296460] TIPC: Lost link
<1.1.3:eth0-1.1.2:eth0> on network plane A
Jan 16 11:54:40 SLES-64BIT-SLOT3 kernel: [ 439.296469] TIPC: Lost contact with
<1.1.2>
Jan 16 11:54:40 SLES-64BIT-SLOT3 kernel: [ 439.548194] TIPC: Resetting link
<1.1.3:eth0-1.1.1:eth0>, peer not responding
Jan 16 11:54:40 SLES-64BIT-SLOT3 kernel: [ 439.548208] TIPC: Lost link
<1.1.3:eth0-1.1.1:eth0> on network plane A
Jan 16 11:54:40 SLES-64BIT-SLOT3 kernel: [ 439.548221] TIPC: Lost contact with
<1.1.1>
Jan 16 11:54:54 SLES-64BIT-SLOT3 osafclmna[4348]: Started
Jan 16 11:55:23 SLES-64BIT-SLOT3 kernel: [ 482.130080] TIPC: Established link
<1.1.3:eth0-1.1.2:eth0> on network plane A
Jan 16 11:55:26 SLES-64BIT-SLOT3 kernel: [ 485.847290] TIPC: Established link
<1.1.3:eth0-1.1.1:eth0> on network plane A
Jan 16 11:55:34 SLES-64BIT-SLOT3 opensafd[4230]: ER Timed-out for response from
CLMNA
Jan 16 11:55:34 SLES-64BIT-SLOT3 opensafd[4230]: ER Could Not RESPAWN CLMNA
Jan 16 11:55:34 SLES-64BIT-SLOT3 opensafd[4230]: ER
Jan 16 11:55:34 SLES-64BIT-SLOT3 opensafd[4230]: ER Trying To RESPAWN
/usr/lib64/opensaf/clc-cli/osaf-clmna attempt #2
Jan 16 11:55:34 SLES-64BIT-SLOT3 opensafd[4230]: ER Sending SIGKILL to CLMNA,
pid=4343
Jan 16 11:55:34 SLES-64BIT-SLOT3 osafclmna[4348]: exiting on signal 15
Jan 16 11:55:49 SLES-64BIT-SLOT3 osafclmna[4375]: Started
Jan 16 11:56:27 SLES-64BIT-SLOT3 osafclmna[4375]: NO
safNode=PL-3,safCluster=myClmCluster Joined cluster, nodeid=2030f
Jan 16 11:56:27 SLES-64BIT-SLOT3 osafamfnd[4396]: Started
Jan 16 12:06:34 SLES-64BIT-SLOT3 kernel: [ 1153.316104] TIPC: Resetting link
<1.1.3:eth0-1.1.4:eth0>, peer not responding
Jan 16 12:06:34 SLES-64BIT-SLOT3 kernel: [ 1153.316111] TIPC: Lost link
<1.1.3:eth0-1.1.4:eth0> on network plane A
Jan 16 12:06:34 SLES-64BIT-SLOT3 kernel: [ 1153.316116] TIPC: Lost contact with
<1.1.4>
Jan 16 12:09:07 SLES-64BIT-SLOT3 osafamfnd[4396]: saImmOmInitialize FAILED, rc
= 6
Jan 16 12:12:57 SLES-64BIT-SLOT3 opensafd[4230]: ER Timed-out for response from
AMFND
Jan 16 12:12:57 SLES-64BIT-SLOT3 opensafd[4230]: ER
Jan 16 12:12:57 SLES-64BIT-SLOT3 opensafd[4230]: ER Going for recovery
Jan 16 12:12:57 SLES-64BIT-SLOT3 osafclmna[4375]: exiting on signal 15
A crash is observed on PL-3 at 11:54. The backtrace of the clmna crash follows:
Core was generated by `/usr/lib64/opensaf/osafclmna --tracemask=0xffffffff'.
Program terminated with signal 6, Aborted.
#0 0x00007f6a3a0ccb55 in raise () from /lib64/libc.so.6
(gdb) bt
#0 0x00007f6a3a0ccb55 in raise () from /lib64/libc.so.6
#1 0x00007f6a3a0ce131 in abort () from /lib64/libc.so.6
#2 0x00007f6a3a109c2f in __libc_message () from /lib64/libc.so.6
#3 0x00007f6a3a10f358 in malloc_printerr () from /lib64/libc.so.6
#4 0x00007f6a3a1142fc in free () from /lib64/libc.so.6
#5 0x000000000040251b in clmna_process_mbx (mbx=<optimized out>) at main.c:515
#6 0x0000000000402c12 in main (argc=<optimized out>, argv=<optimized out>) at
main.c:634
(gdb) thread apply all bt
Thread 3 (Thread 0x7f6a3b0afb00 (LWP 4328)):
#0 0x00007f6a3a1684f6 in poll () from /lib64/libc.so.6
#1 0x00007f6a3aca49ae in mdtm_process_recv_events () at mds_dt_tipc.c:580
#2 0x00007f6a3a4157b6 in start_thread () from /lib64/libpthread.so.0
#3 0x00007f6a3a1719cd in clone () from /lib64/libc.so.6
#4 0x0000000000000000 in ?? ()
Thread 2 (Thread 0x7f6a3b0deb00 (LWP 4327)):
#0 0x00007f6a3a1684f6 in poll () from /lib64/libc.so.6
#1 0x00007f6a3ac695ba in osaf_poll_no_timeout (io_fds=0x7f6a3b0de290,
i_nfds=1) at osaf_poll.c:31
#2 0x00007f6a3ac697b5 in osaf_ppoll (io_fds=0x7f6a3b0de290, i_nfds=1,
i_timeout_ts=0xffffffffffffffff, i_sigmask=0xffffffffffffffff) at osaf_poll.c:78
#3 0x00007f6a3ac6fe2f in ncs_tmr_wait () at sysf_tmr.c:411
#4 0x00007f6a3a4157b6 in start_thread () from /lib64/libpthread.so.0
#5 0x00007f6a3a1719cd in clone () from /lib64/libc.so.6
#6 0x0000000000000000 in ?? ()
Thread 1 (Thread 0x7f6a3b0b2700 (LWP 4325)):
#0 0x00007f6a3a0ccb55 in raise () from /lib64/libc.so.6
#1 0x00007f6a3a0ce131 in abort () from /lib64/libc.so.6
#2 0x00007f6a3a109c2f in __libc_message () from /lib64/libc.so.6
#3 0x00007f6a3a10f358 in malloc_printerr () from /lib64/libc.so.6
#4 0x00007f6a3a1142fc in free () from /lib64/libc.so.6
#5 0x000000000040251b in clmna_process_mbx (mbx=<optimized out>) at main.c:515
#6 0x0000000000402c12 in main (argc=<optimized out>, argv=<optimized out>) at
main.c:634
(gdb) q
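Reading the backtrace: frames #2 to #4 (`__libc_message`, `malloc_printerr`, `free`) show that glibc aborted the process from inside `free()`, which it does when it detects an invalid or already freed pointer or other heap inconsistency. The innermost frame with source information is the caller of that `free()`. The sketch below picks that frame out of a wrapped gdb backtrace; `first_source_frame` is a hypothetical helper written for this illustration, and it assumes frames are formatted as in the session above.

```python
import re

# "#5  0x000000000040251b in clmna_process_mbx (" -> frame number and function.
FRAME = re.compile(r"^#(\d+)\s+(?:0x[0-9a-f]+ in )?(\S+) \(")
# Frames built with debug info end in "at file:line".
LOCATION = re.compile(r"\bat (\S+):(\d+)$")


def first_source_frame(backtrace):
    """Return (frame_no, function, file, line) of the innermost frame that
    carries source information, or None if no frame has any."""
    frames = []
    for line in backtrace.splitlines():
        line = line.strip()
        if not line:
            continue
        if FRAME.match(line):
            frames.append(line)
        elif frames:
            # Mail-wrapped continuation of the previous frame.
            frames[-1] += " " + line
    for text in frames:
        loc = LOCATION.search(text)
        if loc:
            m = FRAME.match(text)
            return int(m.group(1)), m.group(2), loc.group(1), int(loc.group(2))
    return None


# Backtrace of the crashing thread, copied from the gdb session above.
backtrace = """
#0 0x00007f6a3a0ccb55 in raise () from /lib64/libc.so.6
#1 0x00007f6a3a0ce131 in abort () from /lib64/libc.so.6
#2 0x00007f6a3a109c2f in __libc_message () from /lib64/libc.so.6
#3 0x00007f6a3a10f358 in malloc_printerr () from /lib64/libc.so.6
#4 0x00007f6a3a1142fc in free () from /lib64/libc.so.6
#5 0x000000000040251b in clmna_process_mbx (mbx=<optimized out>) at main.c:515
#6 0x0000000000402c12 in main (argc=<optimized out>, argv=<optimized out>) at
main.c:634
"""

print(first_source_frame(backtrace))
```

For this core the frame found is `clmna_process_mbx` at main.c:515, so the `free()` call at that line is the place to start looking. The backtrace alone does not say whether the pointer was freed twice or was corrupted earlier.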
The CLMNA traces are attached. This issue may not be reproducible.
---