- **summary**: 2pbe: immnd crashed on all nodes and led to cluster reset -->
mds: immnd crashes and massive fevs duplicate messages seen
- **status**: assigned --> unassigned
- **assigned_to**: Anders Bjornerstedt --> nobody
- **Component**: imm --> mds
- **Priority**: major --> critical
- **Milestone**: 4.3.3 --> 4.5.0
- **Comment**:
Our testers have reported problems very similar to the symptoms reported in
this ticket: IMMND coredumps and massive amounts of duplicate,
triplicate and quadruplicate fevs messages reported by IMMNDs to syslog.
The problems disappeared when the following changeset was backed out:
5577:fd9b07b46fe5 mds: use TIPC multicast for MDS broadcast [#851]
So I have to assume that we have a problem with that changeset.
---
** [tickets:#1112] mds: immnd crashes and massive fevs duplicate messages seen**
**Status:** unassigned
**Milestone:** 4.5.0
**Created:** Thu Sep 18, 2014 11:07 AM UTC by surender khetavath
**Last Updated:** Thu Sep 18, 2014 03:01 PM UTC
**Owner:** nobody
changeset : 5697
The crash was observed during failovers.
gdb on SC-1:
(gdb) dir /home/staging/osaf/services/saf/immsv/immnd
Source directories searched:
/home/staging/osaf/services/saf/immsv/immnd:$cdir:$cwd
(gdb) bt
#0 0x00007f91649a0b55 in raise () from /lib64/libc.so.6
#1 0x00007f91649a2131 in abort () from /lib64/libc.so.6
#2 0x0000000000426a43 in ImmModel::prepareForSync(bool) () at ImmModel.cc:2184
#3 0x0000000000425d69 in immModel_prepareForSync () at ImmModel.cc:1805
#4 0x0000000000418686 in immnd_process_evt () at immnd_evt.c:8152
#5 0x000000000040b83b in main () at immnd_main.c:343
(gdb) fr 2
#2 0x0000000000426a43 in ImmModel::prepareForSync(bool) () at ImmModel.cc:2184
2184 abort();
(gdb) fr 3
#3 0x0000000000425d69 in immModel_prepareForSync () at ImmModel.cc:1805
1805 ImmModel::instance(&cb->immModel)->prepareForSync(isJoining);
gdb on SC-2, PL-3 and PL-4:
#0 0x00007f753fae3b55 in raise () from /lib64/libc.so.6
(gdb) bt
#0 0x00007f753fae3b55 in raise () from /lib64/libc.so.6
#1 0x00007f753fae5131 in abort () from /lib64/libc.so.6
#2 0x0000000000418e40 in immnd_process_evt () at immnd_evt.c:8167
#3 0x000000000040b83b in main () at immnd_main.c:343
syslog on SC-1:
Sep 18 13:54:53 SC-1 osafimmnd[2298]: ER Node is in a state that cannot accept start of sync, will terminate
Sep 18 13:54:53 SC-1 osafimmd[2288]: WA IMMND DOWN on active controller f2 detected at standby immd!! f1. Possible failover
Sep 18 13:54:53 SC-1 osafimmd[2288]: ER Standby IMMD recieved reset message. All IMMNDs will restart.
Sep 18 13:54:53 SC-1 osafimmd[2288]: ER IMM RELOAD => ensure cluster restart by IMMD exit at both SCs, exiting
Sep 18 13:54:59 SC-1 kernel: [ 54.360115] eth3: no IPv6 routers present
Sep 18 13:54:59 SC-1 osaffmd[2278]: NO Node Down event for node id 2020f:
Sep 18 13:54:59 SC-1 osaffmd[2278]: NO Current role: STANDBY
Sep 18 13:54:59 SC-1 osaffmd[2278]: Rebooting OpenSAF NodeId = 0 EE Name = No EE Mapped, Reason: Failover occurred, but this node is not yet ready, OwnNodeId = 131343, SupervisionTime = 60
Sep 18 13:54:59 SC-1 kernel: [ 54.680115] TIPC: Resetting link <1.1.1:eth3-1.1.2:eth2>, peer not responding
Sep 18 13:54:59 SC-1 kernel: [ 54.680128] TIPC: Lost link <1.1.1:eth3-1.1.2:eth2> on network plane A
Sep 18 13:54:59 SC-1 kernel: [ 54.680137] TIPC: Lost contact with <1.1.2>