- **Milestone**: future --> never
---
**[tickets:#37] cluster reset when IMMND is killed on new active during switchover**
**Status:** invalid
**Milestone:** never
**Created:** Tue May 07, 2013 11:24 AM UTC by Sirisha Alla
**Last Updated:** Mon May 13, 2013 06:59 AM UTC
**Owner:** Neelakanta Reddy
The issue is seen on OEL 6.4 using TCP over IPv6 (link-local). OpenSAF is up
and running with cs4241 (4.3 GA tag) plus patches #2794 and #3117.
Steps performed:
1) At the beginning of the test, SC-1 (SLOT1) is standby and SC-2 (SLOT2) is
active.
2) An SI swap of the controllers is initiated, and IMMND is killed on SC-1 (the
controller becoming active):
May 7 15:43:05 OEL-64BIT-SLOT2 osafamfd[18014]: NO safSi=SC-2N,safApp=OpenSAF
Swap initiated
May 7 15:43:05 OEL-64BIT-SLOT2 osafamfd[18014]: NO Controller switch over
initiated
May 7 15:43:06 OEL-64BIT-SLOT2 osafamfnd[18031]: NO Assigning
'safSi=SC-2N,safApp=OpenSAF' QUIESCED to 'safSu=SC-2,safSg=2N,safApp=OpenSAF'
May 7 15:43:08 OEL-64BIT-SLOT1 osafntfimcnd[18589]: ER saImmOiDispatch() Fail
SA_AIS_ERR_BAD_HANDLE (9)
May 7 15:43:08 OEL-64BIT-SLOT1 osafamfd[13669]: NO Re-initializing with IMM
May 7 15:43:08 OEL-64BIT-SLOT1 osafamfnd[13686]: NO
'safComp=IMMND,safSu=SC-1,safSg=NoRed,safApp=OpenSAF' faulted due to 'avaDown'
: Recovery is 'componentRestart'
May 7 15:43:08 OEL-64BIT-SLOT1 osafimmd[13398]: WA Error returned from
processing message err:0 msg-type:11
May 7 15:43:08 OEL-64BIT-SLOT1 osafimmnd[18616]: Started
SC-1 then rebooted, logging:
May 7 15:43:08 OEL-64BIT-SLOT1 osafimmd[13398]: NO New IMMND process is on
ACTIVE Controller at 2010f
May 7 15:43:08 OEL-64BIT-SLOT1 osafimmnd[18616]: NO SERVER STATE:
IMM_SERVER_ANONYMOUS --> IMM_SERVER_CLUSTER_WAITING
May 7 15:43:09 OEL-64BIT-SLOT1 osafimmnd[18616]: NO SERVER STATE:
IMM_SERVER_CLUSTER_WAITING --> IMM_SERVER_LOADING_PENDING
May 7 15:43:09 OEL-64BIT-SLOT1 osafimmnd[18616]: NO SERVER STATE:
IMM_SERVER_LOADING_PENDING --> IMM_SERVER_SYNC_PENDING
May 7 15:43:09 OEL-64BIT-SLOT1 osafimmd[13398]: WA IMMND on controller (not
currently coord) requests sync
May 7 15:43:09 OEL-64BIT-SLOT1 osafimmnd[18616]: NO NODE STATE->
IMM_NODE_ISOLATED
May 7 15:43:09 OEL-64BIT-SLOT1 osafimmd[13398]: NO Node 2010f request sync
sync-pid:18616 epoch:0
May 7 15:43:09 OEL-64BIT-SLOT1 osafimmnd[18616]: NO NODE STATE->
IMM_NODE_W_AVAILABLE
May 7 15:43:09 OEL-64BIT-SLOT1 osafimmd[13398]: NO Successfully announced
sync. New ruling epoch:33
May 7 15:43:09 OEL-64BIT-SLOT1 osafimmnd[18616]: NO SERVER STATE:
IMM_SERVER_SYNC_PENDING --> IMM_SERVER_SYNC_CLIENT
May 7 15:43:18 OEL-64BIT-SLOT1 osaflogd[13604]: ER saImmOiClassImplementerSet
(safLogService) failed: 9
May 7 15:43:18 OEL-64BIT-SLOT1 osafclmd[13638]: ER saImmOiClassImplementerSet
failed for class SaClmNode rc:9, exiting
May 7 15:43:18 OEL-64BIT-SLOT1 osafamfnd[13686]: NO
'safComp=CLM,safSu=SC-1,safSg=2N,safApp=OpenSAF' faulted due to 'avaDown' :
Recovery is 'nodeFailfast'
May 7 15:43:18 OEL-64BIT-SLOT1 osafamfnd[13686]: ER
safComp=CLM,safSu=SC-1,safSg=2N,safApp=OpenSAF Faulted due to:avaDown Recovery
is:nodeFailfast
May 7 15:43:18 OEL-64BIT-SLOT1 osafamfnd[13686]: Rebooting OpenSAF NodeId =
131343 EE Name = , Reason: Component faulted: recovery is node failfast
May 7 15:43:18 OEL-64BIT-SLOT1 opensaf_reboot: Rebooting local node
May 7 15:43:18 OEL-64BIT-SLOT1 osafsmfd[13745]: ER amf_active_state_handler oi
activate FAILED
IMMD on SC-2 core dumped as follows:
May 7 15:43:55 OEL-64BIT-SLOT2 osafimmnd[17768]: NO Announce sync, epoch:34
May 7 15:43:55 OEL-64BIT-SLOT2 osafimmnd[17768]: NO SERVER STATE:
IMM_SERVER_READY --> IMM_SERVER_SYNC_SERVER
May 7 15:43:55 OEL-64BIT-SLOT2 osafimmd[17753]: WA Wrong Epoch 32 != 33
May 7 15:43:55 OEL-64BIT-SLOT2 osafimmd[17753]: WA Error returned from
processing message err:2 msg-type:6
May 7 15:44:16 OEL-64BIT-SLOT2 osafimmnd[17768]: NO Still waiting for existing
Ccbs to terminate after 21 seconds. Aborting this sync attempt
May 7 15:44:16 OEL-64BIT-SLOT2 osafimmnd[17768]: WA Abort sync received while
being fully available, should not happen.
May 7 15:44:16 OEL-64BIT-SLOT2 osafimmnd[17768]: NO Epoch set to 34 in ImmModel
May 7 15:44:16 OEL-64BIT-SLOT2 osafimmd[17753]: NO ACT: New Epoch for IMMND
process at node 2020f old epoch: 32 new epoch:34
May 7 15:44:16 OEL-64BIT-SLOT2 osafimmd[17753]: ER immd_evt_proc_immnd_intro:
syncStarted true for node with strange epoch node_info->epoch(34) !=
cb->mRulingEpoc(33)
May 7 15:44:16 OEL-64BIT-SLOT2 osafimmnd[17768]: NO Coord broadcasting
ABORT_SYNC, epoch:34
May 7 15:44:16 OEL-64BIT-SLOT2 rpc.idmapd[1143]: nss_getpwnam: name 'sirisha'
not found in domain 'localdomain'
May 7 15:44:16 OEL-64BIT-SLOT2 osafimmnd[17768]: NO SERVER STATE:
IMM_SERVER_SYNC_SERVER --> IMM SERVER READY
May 7 15:44:16 OEL-64BIT-SLOT2 osafimmnd[17768]: WA DISCARD DUPLICATE FEVS
message:43301
May 7 15:44:16 OEL-64BIT-SLOT2 osafimmnd[17768]: WA Error code 2 returned for
message type 57 - ignoring
May 7 15:44:16 OEL-64BIT-SLOT2 osafamfnd[18031]: NO
'safComp=IMMD,safSu=SC-2,safSg=2N,safApp=OpenSAF' faulted due to 'avaDown' :
Recovery is 'nodeFailfast'
May 7 15:44:16 OEL-64BIT-SLOT2 osafamfnd[18031]: ER
safComp=IMMD,safSu=SC-2,safSg=2N,safApp=OpenSAF Faulted due to:avaDown Recovery
is:nodeFailfast
May 7 15:44:16 OEL-64BIT-SLOT2 osafamfnd[18031]: Rebooting OpenSAF NodeId =
131599 EE Name = , Reason: Component faulted: recovery is node failfast
May 7 15:44:16 OEL-64BIT-SLOT2 opensaf_reboot: Rebooting local node
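When correlating the two log excerpts, note that osafamfnd prints the node id in decimal (131343, 131599) while osafimmd prints it in hex (2010f, 2020f); these are the same values. A minimal sketch of the conversion follows. The helper name `node_id_to_hex` is made up for illustration and is not an OpenSAF function.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical helper (not part of OpenSAF): format a node id the way it
 * appears in the osafimmd lines above, i.e. lowercase hex without a 0x
 * prefix. Returns the number of characters snprintf would have written.
 */
int node_id_to_hex(uint32_t node_id, char *buf, size_t len)
{
    return snprintf(buf, len, "%" PRIx32, node_id);
}
```

Calling it with 131343 yields the string used for SC-1 in the IMMD messages, and 131599 yields the one used for SC-2.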
The backtrace of the corefile is as follows:
(gdb) bt
#0 0x0000003c0be328a5 in raise () from /lib64/libc.so.6
#1 0x0000003c0be34085 in abort () from /lib64/libc.so.6
#2 0x00000000004066db in immd_evt_proc_immnd_intro (cb=0x62a940,
evt=0x7fc0d4001e80, sinfo=0x7fc0d4001fc0) at immd_evt.c:1167
#3 0x000000000040366f in immd_process_evt () at immd_evt.c:105
#4 0x0000000000409762 in main (argc=2, argv=0x7fff7f526308) at
immd_main.c:242
(gdb) fr 2
#2 0x00000000004066db in immd_evt_proc_immnd_intro (cb=0x62a940,
evt=0x7fc0d4001e80, sinfo=0x7fc0d4001fc0) at immd_evt.c:1167
1167 abort();
(gdb) bt full
#0 0x0000003c0be328a5 in raise () from /lib64/libc.so.6
No symbol table info available.
#1 0x0000003c0be34085 in abort () from /lib64/libc.so.6
No symbol table info available.
#2 0x00000000004066db in immd_evt_proc_immnd_intro (cb=0x62a940,
evt=0x7fc0d4001e80, sinfo=0x7fc0d4001fc0) at immd_evt.c:1167
proc_rc = 1
node_info = 0x17ca6a0
oldPid = 17768
newPid = 17768
oldEpoch = 32
newEpoch = 34
__FUNCTION__ = "immd_evt_proc_immnd_intro"
#3 0x000000000040366f in immd_process_evt () at immd_evt.c:105
cb = 0x62a940
rc = 1
evt = 0x7fc0d4001e70
__FUNCTION__ = "immd_process_evt"
#4 0x0000000000409762 in main (argc=2, argv=0x7fff7f526308) at
immd_main.c:242
ret = 1
error = SA_AIS_OK
mbx_fd = {raise_obj = 11, rmv_obj = 12}
fds = {{fd = 16, events = 1, revents = 0}, {fd = 14, events = 1, revents =
0}, {fd = 12, events = 1, revents = 1}}
__FUNCTION__ = "main"
(gdb) p *cb
$1 = {mbx = 4291821569, comp_name = {length = 47, value =
"safComp=IMMD,safSu=SC-2,safSg=2N,safApp=OpenSAF", '\000' <repeats 208 times>},
mds_handle = 65549, immd_anc = 0, mds_role = V_DEST_RL_ACTIVE, immd_dest_id =
13, mbcsv_handle = 4293918753, mbcsv_sel_obj = 14, o_ckpt_hdl = 4292870177,
immd_sync_cnt = 12134, immd_self_id = 242, immd_remote_id = 241, immd_remote_up
= true, node_id = 131599, is_loc_immnd_up = true, is_rem_immnd_up = true,
is_quiesced_set = false, loc_immnd_dest = 565213401204072, rem_immnd_dest =
564113889559952, immnd_tree = {root_node = {bit = -1, left = 0x17ca6a0, right =
0x62aaa8, key_info = 0x17a9030 ""}, params = {key_size = 4, info_size = 0,
actual_key_size = 0, node_size = 0}, n_nodes = 4}, is_immnd_tree_up = true,
amf_hdl = 4289724418, clm_hdl = 0, ha_state = SA_AMF_HA_ACTIVE, edu_hdl =
{is_inited = true, tree = {root_node = {bit = -1, left = 0x62ab08, right =
0x62ab08, key_info = 0x17a9010 ""}, params = {key_size = 8, info_size = 32704,
actual_key_size = 193073544, node_size = 60}, n_nodes = 0}, to_version = 0},
amf_invocation = 4234149894, admo_id_count = 18, ccb_id_count = 1,
impl_count = 56, fevsSendCount = 43301, mRulingEpoch = 33, mExpectedNodes = 0 '\000',
WaitSecs = 0 '\000', immnd_coord = 131599, usr1_sel_obj = {raise_obj = 15,
rmv_obj = 16}, amf_sel_obj = 16, saved_msgs = 0x0, mRim =
SA_IMM_KEEP_REPOSITORY}
(gdb) p *evt
$2 = {type = IMMD_EVT_ND2D_INTRO, info = {ctrl_msg = {ndExecPid = 17768,
epoch = 34, refresh = 1 '\001', pbeEnabled = 1 '\001'}, admown_init =
{client_hdl = 146028905832, i = {adminOwnerName = {length = 257, value = '\000'
<repeats 255 times>}, releaseOwnershipOnFinalize = SA_FALSE}}, ccb_init =
{adminOwnerId = 17768, ccbFlags = 257, client_hdl = 0}, impl_set = {r =
{client_hdl = 146028905832, impl_name = {size = 257, buf = 0x0}, impl_id = 0,
scope = 0}, reply_dest = 0}, objModify = {ccbId = 17768, adminOwnerId = 34,
objectName = {size = 257, buf = 0x0}, attrMods = 0x0, immHandle = 0}, ccbId =
17768, admoId = 17768, fevsReq = {sender_count = 146028905832, reply_dest =
257, client_hdl = 0, msg = {size = 0, buf = 0x0}, isObjSync = 0 '\000'},
tmr_info = {type = 17768, info = {immnd_dest = 257}}, mds_info = {change =
17768, dest = 257, svc_id = 0, node_id = 0, role = 0}, rda_info = {io_role =
17768}, syncFevsBase = {fevsBase = 146028905832, client_hdl = 257}}}
IMMD and IMMND traces are available but very large; they can be shared on
request.
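For readers following the epoch numbers: the ER line from osafimmd and the gdb locals (oldEpoch = 32, newEpoch = 34, mRulingEpoch = 33) suggest that the abort is triggered by a consistency check of roughly the following shape. This is only a sketch inferred from the log text, not the code at immd_evt.c:1167; the struct and function names are invented for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative stand-in for the per-node state IMMD keeps (assumed fields). */
struct node_view {
    uint32_t epoch;        /* epoch reported by the IMMND in its intro message */
    bool     sync_started; /* true while a sync is marked as started for the node */
};

/*
 * Returns true when the node's epoch is acceptable. Per the ER message above,
 * a node with sync started whose epoch differs from the ruling epoch is
 * treated as an inconsistent state.
 */
bool intro_epoch_consistent(const struct node_view *node, uint32_t ruling_epoch)
{
    if (node->sync_started && node->epoch != ruling_epoch)
        return false;
    return true;
}
```

With the values from this core (node epoch 34, ruling epoch 33, sync started) the check fails, which matches the abort() seen in frame 2.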