- **Milestone**: future --> 4.3.3
- **Comment**:

I have a problem with the syslogs provided for #855.

Only the syslog provided for PL4 covers the time of the immnd crash,
which also occurs on PL4.
The crash on PL4 comes right after the commit of CCB 260, at

    Apr 15 15:12:01 PL-4 osafimmnd[7650]: ER Sync-verify: ........

The syslogs for the two SCs and PL3 all START about eight minutes later
in apparent time, and with ccb-id 335:

    Apr 15 15:20:00 SC-1 osafimmnd[2763]: NO Ccb 335 COMMITTED ..

    Apr 15 15:20:15 SC-2 osafimmnd[3130]: NO Ccb 335 COMMITTED....

    Apr 15 15:20:14 PL-3 osafimmnd[3097]: NO Ccb 335 COMMITTED
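
The gap between the PL-4 crash line and the first line of each other log can be checked mechanically. The following is only a minimal sketch, assuming all timestamps are from the same day and that the clocks on the nodes are roughly in step; the helper names `secondsOfDay` and `gapSeconds` are hypothetical and not part of OpenSAF.

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper: convert the "HH:MM:SS" field of a syslog line
// into seconds since midnight. Returns -1 if the field is malformed.
int secondsOfDay(const std::string& hms) {
    int h = 0, m = 0, s = 0;
    if (std::sscanf(hms.c_str(), "%d:%d:%d", &h, &m, &s) != 3) return -1;
    if (h < 0 || h > 23 || m < 0 || m > 59 || s < 0 || s > 60) return -1;
    return h * 3600 + m * 60 + s;
}

// Hypothetical helper: store (later - earlier) in seconds in `gap`.
// Assumes both timestamps fall on the same day.
// Returns false if either timestamp cannot be parsed.
bool gapSeconds(const std::string& earlier, const std::string& later, int& gap) {
    const int a = secondsOfDay(earlier);
    const int b = secondsOfDay(later);
    if (a < 0 || b < 0) return false;
    gap = b - a;
    return true;
}
```

Called with the PL-4 crash time `"15:12:01"` and the first SC-1 line `"15:20:00"`, `gapSeconds` yields 479 seconds, so the SC-1 log starts almost eight minutes after the crash it is supposed to explain.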


Since the problem has to do with the sync protocol, which is driven by
the immnd at one of the SCs and received by all nodes, I need
syslogs from at least the SCs and PL4 (where the crash occurs) that
cover the time of the crash.

The ticket description says the cluster went for reboot. But I interpret
that as "expected", i.e. part of some test and not due to the problem
this ticket is focused on.



---

** [tickets:#855] immnd crash on payload node**

**Status:** assigned
**Milestone:** 4.3.3
**Created:** Tue Apr 15, 2014 10:11 AM UTC by surender khetavath
**Last Updated:** Tue Apr 15, 2014 11:23 AM UTC
**Owner:** Anders Bjornerstedt

case:
1) A component calls exit() when active_cbk is received. 

Because of continuous faults, all the components on all the nodes received
active-cbk and called exit() within the component, and the cluster went for
reboot. That is expected. But immnd crashed on PL-4 and PL-5.

/var/log/messages on PL-4 show:

    Apr 15 15:12:01 PL-4 osafimmnd[7650]: ER Sync-verify: Established node has different Implementer-id: 0 for name: @COMP2SU1TWONAPP, sync says 109.
    Apr 15 15:12:01 PL-4 osafamfnd[7668]: NO 'safComp=IMMND,safSu=PL-4,safSg=NoRed,safApp=OpenSAF' faulted due to 'avaDown' : Recovery is 'componentRestart'


    (gdb) fr 2
    #2  0x000000000043b7ec in ImmModel::finalizeSync (this=0x6baa00, req=0x7fffcb25d160, isCoord=false, isSyncClient=false) at ImmModel.cc:14482
    14482                           abort();
    (gdb) l
    14477   
    14478                       if(!explained) {
    14479                           LOG_ER("Sync-verify: Established node has different "
    14480                                  "Implementer-id: %u for name: %s, sync says %u. ",
    14481                                  info->mId, implName.c_str(), ii->id);
    14482                           abort();
    14483                       }
    14484   
    14485                   } else if(info->mNodeId != ii->nodeId) {
    14486                       LOG_ER("Sync-verify: Missmatch on node-id "
    (gdb) p explained
    $1 = false
    (gdb) q

Logs and gdb output are attached.

