The Sourceforge ticket system is currently down, so I am sending this email
instead of updating the ticket.
I have a problem with the syslogs provided for #855.

Only the syslog provided for PL4 covers the time period of the crash of the
immnd, which also occurs on PL4.
The crash of PL4 is right after the commit of CCB 260 at

    Apr 15 15:12:01 PL-4 osafimmnd[7650]: ER Sync-verify: ........

The syslogs for the two SCs and PL3 all START about eight minutes later
in apparent time and with ccb-id 335:

    Apr 15 15:20:00 SC-1 osafimmnd[2763]: NO Ccb 335 COMMITTED ..

    Apr 15 15:20:15 SC-2 osafimmnd[3130]: NO Ccb 335 COMMITTED....

    Apr 15 15:20:14 PL-3 osafimmnd[3097]: NO Ccb 335 COMMITTED


Since the problem has to do with the sync protocol, which is driven by
the immnd at one of the SCs and received by all nodes, I need to have
syslogs from at least the SCs and PL4 (where the crash occurs) that cover
the time of the crash.
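
A quick way to check whether a syslog starts early enough before attaching it
could look something like the sketch below. It is only an illustration: the
function names are made up, it assumes the classic "Mon DD HH:MM:SS" syslog
prefix, and it assumes all lines are from the same day, so the date is ignored.
It checks only the start of the log, not the end.

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper: seconds since midnight parsed from a syslog line
// such as "Apr 15 15:12:01 PL-4 ...". Returns -1 if the prefix does not match.
// Assumption: the month and day are parsed but not compared.
int syslogSecondsOfDay(const std::string& line) {
    char month[4] = {0};
    int day = 0, h = 0, m = 0, s = 0;
    if (std::sscanf(line.c_str(), "%3s %d %d:%d:%d",
                    month, &day, &h, &m, &s) != 5) {
        return -1;
    }
    return h * 3600 + m * 60 + s;
}

// Hypothetical helper: true if a log whose first line is `firstLine`
// starts at or before the event logged in `eventLine`.
bool logStartsBeforeEvent(const std::string& firstLine,
                          const std::string& eventLine) {
    const int first = syslogSecondsOfDay(firstLine);
    const int event = syslogSecondsOfDay(eventLine);
    return first >= 0 && event >= 0 && first <= event;
}
```

Called with the first line of the SC-1 syslog and the Sync-verify line from
PL-4, logStartsBeforeEvent() returns false, which is the problem described
above.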

The ticket description says the cluster went for reboot. But I interpret
that as "expected", i.e. part of some test and not due to the problem
this ticket is focused on.

/AndersBj



________________________________
From: surender khetavath [mailto:[email protected]]
Sent: den 15 april 2014 12:12
To: [email protected]
Subject: [tickets] [opensaf:tickets] #855 immnd crash on payload node

________________________________

[tickets:#855]<http://sourceforge.net/p/opensaf/tickets/855/> immnd crash on 
payload node

Status: unassigned
Milestone: future
Created: Tue Apr 15, 2014 10:11 AM UTC by surender khetavath
Last Updated: Tue Apr 15, 2014 10:11 AM UTC
Owner: nobody

case:
1) A component calls exit() when active_cbk is received.

Due to continuous faults, all the components on all the nodes received
active-cbk and called exit() within the comp, and the cluster went for reboot.
That is expected. But immnd crashed on PL-4 and PL-5.

/var/log/messages on PL-4 shows:

Apr 15 15:12:01 PL-4 osafimmnd[7650]: ER Sync-verify: Established node has 
different Implementer-id: 0 for name: @COMP2SU1TWONAPP, sync says 109.
Apr 15 15:12:01 PL-4 osafamfnd[7668]: NO 
'safComp=IMMND,safSu=PL-4,safSg=NoRed,safApp=OpenSAF' faulted due to 'avaDown' 
: Recovery is 'componentRestart'

(gdb) fr 2
#2  0x000000000043b7ec in ImmModel::finalizeSync (this=0x6baa00, req=0x7fffcb25d160, isCoord=false,
    isSyncClient=false) at ImmModel.cc:14482
14482           abort();
(gdb) l
14477
14478       if(!explained) {
14479           LOG_ER("Sync-verify: Established node has different "
14480                  "Implementer-id: %u for name: %s, sync says %u. ",
14481                  info->mId, implName.c_str(), ii->id);
14482           abort();
14483       }
14484
14485   } else if(info->mNodeId != ii->nodeId) {
14486       LOG_ER("Sync-verify: Missmatch on node-id "
(gdb) p explained
$1 = false
(gdb) q

logs attached and gdb output attached.
