- **Milestone**: future --> never


---

**[tickets:#37] cluster reset when IMMND is killed on new active during switchover**

**Status:** invalid
**Milestone:** never
**Created:** Tue May 07, 2013 11:24 AM UTC by Sirisha Alla
**Last Updated:** Mon May 13, 2013 06:59 AM UTC
**Owner:** Neelakanta Reddy

The issue is seen on OEL 6.4 using TCP with IPv6 (link-local). OpenSAF is up and 
running with cs4241 (4.3 GA tag) and patches #2794 and #3117.

Steps performed:

1) At the beginning of the test, SC-1 (SLOT1) is standby and SC-2 (SLOT2) is 
active.
2) An SI swap of the controllers is initiated, and IMMND is killed on SC-1 
(the new active) during the switchover.

May  7 15:43:05 OEL-64BIT-SLOT2 osafamfd[18014]: NO safSi=SC-2N,safApp=OpenSAF 
Swap initiated
May  7 15:43:05 OEL-64BIT-SLOT2 osafamfd[18014]: NO Controller switch over 
initiated
May  7 15:43:06 OEL-64BIT-SLOT2 osafamfnd[18031]: NO Assigning 
'safSi=SC-2N,safApp=OpenSAF' QUIESCED to 'safSu=SC-2,safSg=2N,safApp=OpenSAF'

May  7 15:43:08 OEL-64BIT-SLOT1 osafntfimcnd[18589]: ER saImmOiDispatch() Fail 
SA_AIS_ERR_BAD_HANDLE (9)
May  7 15:43:08 OEL-64BIT-SLOT1 osafamfd[13669]: NO Re-initializing with IMM
May  7 15:43:08 OEL-64BIT-SLOT1 osafamfnd[13686]: NO 
'safComp=IMMND,safSu=SC-1,safSg=NoRed,safApp=OpenSAF' faulted due to 'avaDown' 
: Recovery is 'componentRestart'
May  7 15:43:08 OEL-64BIT-SLOT1 osafimmd[13398]: WA Error returned from 
processing message err:0 msg-type:11
May  7 15:43:08 OEL-64BIT-SLOT1 osafimmnd[18616]: Started

SC-1 then rebooted, citing:

May  7 15:43:08 OEL-64BIT-SLOT1 osafimmd[13398]: NO New IMMND process is on 
ACTIVE Controller at 2010f
May  7 15:43:08 OEL-64BIT-SLOT1 osafimmnd[18616]: NO SERVER STATE: 
IMM_SERVER_ANONYMOUS --> IMM_SERVER_CLUSTER_WAITING
May  7 15:43:09 OEL-64BIT-SLOT1 osafimmnd[18616]: NO SERVER STATE: 
IMM_SERVER_CLUSTER_WAITING --> IMM_SERVER_LOADING_PENDING
May  7 15:43:09 OEL-64BIT-SLOT1 osafimmnd[18616]: NO SERVER STATE: 
IMM_SERVER_LOADING_PENDING --> IMM_SERVER_SYNC_PENDING
May  7 15:43:09 OEL-64BIT-SLOT1 osafimmd[13398]: WA IMMND on controller (not 
currently coord) requests sync
May  7 15:43:09 OEL-64BIT-SLOT1 osafimmnd[18616]: NO NODE STATE-> 
IMM_NODE_ISOLATED
May  7 15:43:09 OEL-64BIT-SLOT1 osafimmd[13398]: NO Node 2010f request sync 
sync-pid:18616 epoch:0
May  7 15:43:09 OEL-64BIT-SLOT1 osafimmnd[18616]: NO NODE STATE-> 
IMM_NODE_W_AVAILABLE
May  7 15:43:09 OEL-64BIT-SLOT1 osafimmd[13398]: NO Successfully announced 
sync. New ruling epoch:33
May  7 15:43:09 OEL-64BIT-SLOT1 osafimmnd[18616]: NO SERVER STATE: 
IMM_SERVER_SYNC_PENDING --> IMM_SERVER_SYNC_CLIENT
May  7 15:43:18 OEL-64BIT-SLOT1 osaflogd[13604]: ER saImmOiClassImplementerSet 
(safLogService) failed: 9
May  7 15:43:18 OEL-64BIT-SLOT1 osafclmd[13638]: ER saImmOiClassImplementerSet 
failed for class SaClmNode rc:9, exiting
May  7 15:43:18 OEL-64BIT-SLOT1 osafamfnd[13686]: NO 
'safComp=CLM,safSu=SC-1,safSg=2N,safApp=OpenSAF' faulted due to 'avaDown' : 
Recovery is 'nodeFailfast'
May  7 15:43:18 OEL-64BIT-SLOT1 osafamfnd[13686]: ER 
safComp=CLM,safSu=SC-1,safSg=2N,safApp=OpenSAF Faulted due to:avaDown Recovery 
is:nodeFailfast
May  7 15:43:18 OEL-64BIT-SLOT1 osafamfnd[13686]: Rebooting OpenSAF NodeId = 
131343 EE Name = , Reason: Component faulted: recovery is node failfast
May  7 15:43:18 OEL-64BIT-SLOT1 opensaf_reboot: Rebooting local node
May  7 15:43:18 OEL-64BIT-SLOT1 osafsmfd[13745]: ER amf_active_state_handler oi 
activate FAILED

IMMD on SC-2 then dumped core, as follows:

May  7 15:43:55 OEL-64BIT-SLOT2 osafimmnd[17768]: NO Announce sync, epoch:34
May  7 15:43:55 OEL-64BIT-SLOT2 osafimmnd[17768]: NO SERVER STATE: 
IMM_SERVER_READY --> IMM_SERVER_SYNC_SERVER
May  7 15:43:55 OEL-64BIT-SLOT2 osafimmd[17753]: WA Wrong Epoch 32 != 33
May  7 15:43:55 OEL-64BIT-SLOT2 osafimmd[17753]: WA Error returned from 
processing message err:2 msg-type:6
May  7 15:44:16 OEL-64BIT-SLOT2 osafimmnd[17768]: NO Still waiting for existing 
Ccbs to terminate after 21 seconds. Aborting this sync attempt
May  7 15:44:16 OEL-64BIT-SLOT2 osafimmnd[17768]: WA Abort sync received while 
being fully available, should not happen.
May  7 15:44:16 OEL-64BIT-SLOT2 osafimmnd[17768]: NO Epoch set to 34 in ImmModel
May  7 15:44:16 OEL-64BIT-SLOT2 osafimmd[17753]: NO ACT: New Epoch for IMMND 
process at node 2020f old epoch: 32  new epoch:34
May  7 15:44:16 OEL-64BIT-SLOT2 osafimmd[17753]: ER immd_evt_proc_immnd_intro: 
syncStarted true for node with strange epoch node_info->epoch(34) != 
cb->mRulingEpoc(33)
May  7 15:44:16 OEL-64BIT-SLOT2 osafimmnd[17768]: NO Coord broadcasting 
ABORT_SYNC, epoch:34
May  7 15:44:16 OEL-64BIT-SLOT2 rpc.idmapd[1143]: nss_getpwnam: name 'sirisha' 
not found in domain 'localdomain'
May  7 15:44:16 OEL-64BIT-SLOT2 osafimmnd[17768]: NO SERVER STATE: 
IMM_SERVER_SYNC_SERVER --> IMM SERVER READY
May  7 15:44:16 OEL-64BIT-SLOT2 osafimmnd[17768]: WA DISCARD DUPLICATE FEVS 
message:43301
May  7 15:44:16 OEL-64BIT-SLOT2 osafimmnd[17768]: WA Error code 2 returned for 
message type 57 - ignoring
May  7 15:44:16 OEL-64BIT-SLOT2 osafamfnd[18031]: NO 
'safComp=IMMD,safSu=SC-2,safSg=2N,safApp=OpenSAF' faulted due to 'avaDown' : 
Recovery is 'nodeFailfast'
May  7 15:44:16 OEL-64BIT-SLOT2 osafamfnd[18031]: ER 
safComp=IMMD,safSu=SC-2,safSg=2N,safApp=OpenSAF Faulted due to:avaDown Recovery 
is:nodeFailfast
May  7 15:44:16 OEL-64BIT-SLOT2 osafamfnd[18031]: Rebooting OpenSAF NodeId = 
131599 EE Name = , Reason: Component faulted: recovery is node failfast
May  7 15:44:16 OEL-64BIT-SLOT2 opensaf_reboot: Rebooting local node
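
When reading the logs above, note that the same node appears in two notations: IMMD prints node ids in hex (`2010f`, `2020f`), while the AMF reboot messages print them in decimal (`131343`, `131599`). A small sketch of the conversion may help when correlating lines; the helper name `node_id_from_hex` is made up for illustration and is not part of OpenSAF.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: convert a node id printed in hex by IMMD
 * (e.g. "2010f") to the decimal form printed by AMF (e.g. 131343).
 * Returns 0 if the string is not a valid hex number or is too large. */
uint32_t node_id_from_hex(const char *hex)
{
    char *end = NULL;
    unsigned long v = strtoul(hex, &end, 16);

    if (end == hex || *end != '\0' || v > UINT32_MAX)
        return 0;
    return (uint32_t)v;
}
```

With the values from this ticket, `2010f` corresponds to 131343 (SC-1) and `2020f` to 131599 (SC-2).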


The backtrace from the core file is as follows:

   (gdb) bt
   #0  0x0000003c0be328a5 in raise () from /lib64/libc.so.6
   #1  0x0000003c0be34085 in abort () from /lib64/libc.so.6
   #2  0x00000000004066db in immd_evt_proc_immnd_intro (cb=0x62a940, 
evt=0x7fc0d4001e80, sinfo=0x7fc0d4001fc0) at immd_evt.c:1167
   #3  0x000000000040366f in immd_process_evt () at immd_evt.c:105
   #4  0x0000000000409762 in main (argc=2, argv=0x7fff7f526308) at 
immd_main.c:242
   (gdb) fr 2
    #2  0x00000000004066db in immd_evt_proc_immnd_intro (cb=0x62a940, 
evt=0x7fc0d4001e80, sinfo=0x7fc0d4001fc0) at immd_evt.c:1167
1167                                            abort();
    (gdb) bt full
   #0  0x0000003c0be328a5 in raise () from /lib64/libc.so.6
    No symbol table info available.
   #1  0x0000003c0be34085 in abort () from /lib64/libc.so.6
   No symbol table info available.
   #2  0x00000000004066db in immd_evt_proc_immnd_intro (cb=0x62a940, 
evt=0x7fc0d4001e80, sinfo=0x7fc0d4001fc0) at immd_evt.c:1167
      proc_rc = 1
      node_info = 0x17ca6a0
      oldPid = 17768
      newPid = 17768
      oldEpoch = 32
      newEpoch = 34
      __FUNCTION__ = "immd_evt_proc_immnd_intro"
   #3  0x000000000040366f in immd_process_evt () at immd_evt.c:105
      cb = 0x62a940
      rc = 1
      evt = 0x7fc0d4001e70
      __FUNCTION__ = "immd_process_evt"
   #4  0x0000000000409762 in main (argc=2, argv=0x7fff7f526308) at 
immd_main.c:242
    ret = 1
    error = SA_AIS_OK
    mbx_fd = {raise_obj = 11, rmv_obj = 12}
    fds = {{fd = 16, events = 1, revents = 0}, {fd = 14, events = 1, revents = 
0}, {fd = 12, events = 1, revents = 1}}
        __FUNCTION__ = "main"
   (gdb) p *cb
   $1 = {mbx = 4291821569, comp_name = {length = 47, value = 
"safComp=IMMD,safSu=SC-2,safSg=2N,safApp=OpenSAF", '\000' <repeats 208 times>}, 
  mds_handle = 65549, immd_anc = 0, mds_role = V_DEST_RL_ACTIVE, immd_dest_id = 
13, mbcsv_handle = 4293918753, mbcsv_sel_obj = 14, o_ckpt_hdl = 4292870177, 
immd_sync_cnt = 12134, immd_self_id = 242, immd_remote_id = 241, immd_remote_up 
= true, node_id = 131599, is_loc_immnd_up = true, is_rem_immnd_up = true, 
is_quiesced_set = false, loc_immnd_dest = 565213401204072, rem_immnd_dest = 
564113889559952, immnd_tree = {root_node = {bit = -1, left = 0x17ca6a0, right = 
0x62aaa8, key_info = 0x17a9030 ""}, params = {key_size = 4, info_size = 0, 
actual_key_size = 0, node_size = 0}, n_nodes = 4}, is_immnd_tree_up = true,  
amf_hdl = 4289724418, clm_hdl = 0, ha_state = SA_AMF_HA_ACTIVE, edu_hdl = 
{is_inited = true, tree = {root_node = {bit = -1, left = 0x62ab08, right = 
0x62ab08, key_info = 0x17a9010 ""}, params = {key_size = 8, info_size = 32704,  
actual_key_size = 193073544, node_size = 60}, n_nodes = 0}, to_version = 0}, 
amf_invocation = 4234149894, admo_id_count = 18, ccb_id_count = 1, impl_count = 56, 
fevsSendCount = 43301, mRulingEpoch = 33, mExpectedNodes = 0 '\000', 
WaitSecs = 0 '\000', immnd_coord = 131599, usr1_sel_obj = {raise_obj = 15, 
rmv_obj = 16}, amf_sel_obj = 16, saved_msgs = 0x0, mRim = 
SA_IMM_KEEP_REPOSITORY}
  (gdb) p *evt
   $2 = {type = IMMD_EVT_ND2D_INTRO, info = {ctrl_msg = {ndExecPid = 17768, 
epoch = 34, refresh = 1 '\001', pbeEnabled = 1 '\001'}, admown_init = 
{client_hdl = 146028905832, i = {adminOwnerName = {length = 257, value = '\000' 
<repeats 255 times>}, releaseOwnershipOnFinalize = SA_FALSE}}, ccb_init = 
{adminOwnerId = 17768, ccbFlags = 257, client_hdl = 0}, impl_set = {r = 
{client_hdl = 146028905832, impl_name = {size = 257, buf = 0x0}, impl_id = 0, 
scope = 0}, reply_dest = 0}, objModify = {ccbId = 17768, adminOwnerId = 34, 
objectName = {size = 257, buf = 0x0}, attrMods = 0x0, immHandle = 0}, ccbId = 
17768, admoId = 17768, fevsReq = {sender_count = 146028905832, reply_dest = 
257, client_hdl = 0, msg = {size = 0, buf = 0x0}, isObjSync = 0 '\000'}, 
tmr_info = {type = 17768, info = {immnd_dest = 257}}, mds_info = {change = 
17768, dest = 257, svc_id = 0,  node_id = 0, role = 0}, rda_info = {io_role = 
17768}, syncFevsBase = {fevsBase = 146028905832, client_hdl = 257}}}
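
For readers without the source at hand, the condition reported just before the abort ("syncStarted true for node with strange epoch node_info->epoch(34) != cb->mRulingEpoc(33)") can be sketched roughly as below. This is only an illustration inferred from the log message and the dump; the struct and function names are hypothetical and this is not the actual code at immd_evt.c:1167.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for the per-node state kept by IMMD.
 * The real OpenSAF structure has more fields and different names. */
struct node_view {
    uint32_t epoch;        /* epoch reported by the IMMND intro message */
    bool     sync_started; /* a sync is marked as started for this node */
};

/* Returns true if the node's epoch is consistent with the ruling epoch.
 * Assumption drawn from the log text: a node with a sync in progress is
 * expected to carry exactly the ruling epoch. */
bool intro_epoch_consistent(const struct node_view *n, uint32_t ruling_epoch)
{
    if (n->sync_started && n->epoch != ruling_epoch)
        return false; /* the case that was logged as ER before the abort */
    return true;
}
```

With the values from the dump (node epoch 34, `mRulingEpoch` 33, sync started), the check fails, which matches the ER line logged by osafimmd on SC-2.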


IMMD and IMMND traces are available but very large; they can be shared on request.

