- **status**: accepted --> review


---

** [tickets:#405] ImplementerClear returns BAD HANDLE for the Applier OI when 
IMMND restarts during switchover**

**Status:** review
**Milestone:** 4.4.2
**Created:** Fri May 31, 2013 05:39 AM UTC by Nagendra Kumar
**Last Updated:** Thu Feb 05, 2015 11:40 AM UTC
**Owner:** Nagendra Kumar

Migrated from http://devel.opensaf.org/ticket/2656

When IMMND is killed while a switchover is in progress, the AMFD applier OI 
(i.e. the STANDBY that is becoming ACTIVE) gets BAD_HANDLE from 
saImmOiImplementerClear(). This causes AMFD to assert and the switchover to 
fail.

The log and the assert information are attached.


Snippet from /var/log/messages:


May 7 12:22:00 TMP-SLOT osafimmnd[7062]: SERVER STATE: IMM_SERVER_SYNC_CLIENT --> IMM SERVER READY
May 7 12:22:00 TMP-SLOT osafamfd[5910]: Switching StandBy --> Active State
May 7 12:22:00 TMP-SLOT osafamfd[5910]: Switch Standby --> Active FAILED, ImplementerClear failed 9
May 7 12:22:00 TMP-SLOT osafimmd[5852]: Received IMMD service event
May 7 12:22:00 TMP-SLOT osafimmd[5852]: Received IMMD service event
May 7 12:22:00 TMP-SLOT osafamfd[5910]: avd_msg_sanity_chk: invalid msg id 80, from 20200 should be 85
May 7 12:22:00 TMP-SLOT osafimmnd[7062]: Implementer disconnected 22 <0, 20100> (@safAmfService20100)
May 7 12:22:00 TMP-SLOT osafamfd[5910]: avd_msg_sanity_chk: invalid msg id 81, from 20200 should be 85
May 7 12:22:00 TMP-SLOT osafamfd[5910]: avd_msg_sanity_chk: invalid msg id 82, from 20200 should be 85
May 7 12:22:00 TMP-SLOT osafamfd[5910]: avd_msg_sanity_chk: invalid msg id 83, from 20200 should be 85
May 7 12:22:00 TMP-SLOT osafamfd[5910]: avd_msg_sanity_chk: invalid msg id 84, from 20200 should be 85
May 7 12:22:01 TMP-SLOT osafamfd[5910]: FAILOVER Active --> Quiesced FAILED, ImplementerClear failed 9
May 7 12:22:01 TMP-SLOT osafamfd[5910]: avd_role.c:585: avd_mds_qsd_role_evh: Assertion '0' failed.
May 7 12:22:01 TMP-SLOT osafamfnd[5920]: AMF director unexpectedly crashed
May 7 12:22:01 TMP-SLOT osafamfnd[5920]: Rebooting OpenSAF NodeId = 131584 EE Name = , Reason: local AVD down(Adest) or both AVD down(Vdest) received
May 7 12:22:01 TMP-SLOT osafimmnd[7062]: Director Service in NOACTIVE state - fevs replies pending:1 fevs highest processed:2407


Steps to reproduce
==================


1) Invoke a switchover.
2) Kill IMMND on the controller that is becoming active.


Backtrace of the amfd corefile:


#0 0x00007fe66dc14645 in raise () from /lib64/libc.so.6
(gdb) bt
#0 0x00007fe66dc14645 in raise () from /lib64/libc.so.6
#1 0x00007fe66dc15c33 in abort () from /lib64/libc.so.6
#2 0x00007fe66f225e15 in osafassert_fail (file=0x4a406a "avd_role.c", line=585, func=0x4a4660 "avd_mds_qsd_role_evh",
    assertion=0x4a46b8 "0") at sysf_def.c:399
#3 0x000000000043c1be in avd_mds_qsd_role_evh (cb=0x6bcb80, evt=0x7fe6680015b0) at avd_role.c:585
#4 0x000000000043af86 in avd_process_event (cb_now=0x6bcb80, evt=0x7fe6680015b0) at avd_proc.c:589
#5 0x000000000043ad0d in avd_main_proc () at avd_proc.c:505
#6 0x0000000000409210 in main (argc=1, argv=0x7fff776adcf8) at amfd_main.c:47
(gdb) fr 2
#2 0x00007fe66f225e15 in osafassert_fail (file=0x4a406a "avd_role.c", line=585, func=0x4a4660 "avd_mds_qsd_role_evh",
    assertion=0x4a46b8 "0") at sysf_def.c:399
399 sysf_def.c: No such file or directory.
    in sysf_def.c
(gdb) fr 3
#3 0x000000000043c1be in avd_mds_qsd_role_evh (cb=0x6bcb80, evt=0x7fe6680015b0) at avd_role.c:585
585 avd_role.c: No such file or directory.
    in avd_role.c


(gdb) p *cb
$1 = {avd_mbx = 4291821569, avd_hb_mbx = 0, mds_handle = 0x0, init_state = AVD_APP_STATE, avd_fover_state = false,
  avail_state_avd = SA_AMF_HA_ACTIVE, vaddr_pwe_hdl = 65537, vaddr_hdl = 1, adest_hdl = 131071, vaddr = 1,
  other_avd_adest = 564051710926873, local_avnd_adest = 565151639322652, nd_msg_queue_list = {nd_msg_queue = 0x0, tail = 0x0},
  evt_queue = {evt_msg_queue = 0x0, tail = 0x0}, mbcsv_hdl = 4293918753, ckpt_hdl = 4292870177, mbcsv_sel_obj = 13,
  stby_sync_state = AVD_STBY_IN_SYNC, synced_reo_type = 13, async_updt_cnt = {cb_updt = 13, node_updt = 699, app_updt = 12, sg_updt = 68,
  su_updt = 174, si_updt = 416, sg_su_oprlist_updt = 43, sg_admin_si_updt = 0, siass_updt = 80, comp_updt = 756, csi_updt = 0,
  compcstype_updt = 0, si_trans_updt = 0}, sync_required = true, async_updt_msgs = {async_updt_queue = 0x0, tail = 0x0}, edu_hdl = {
  is_inited = true, tree = {root_node = {bit = -1, left = 0x6d3c00, right = 0x6bcc48, key_info = 0x6d3be0 ""}, params = {key_size = 8,
  info_size = 16843009, actual_key_size = 1869266944, node_size = 32742}, n_nodes = 32}, to_version = 4}, cluster_init_time = 0,
  node_id_avd = 131584, node_id_avd_other = 131328, node_avd_failed = 0, node_list = {root_node = {bit = -1, left = 0x6bcca0,
  right = 0x6bcca0, key_info = 0x6d3bc0 ""}, params = {key_size = 4, info_size = 0, actual_key_size = 0, node_size = 0}, n_nodes = 0},
  amf_init_tmr = {tmr_id = 0x0, type = AVD_TMR_SND_HB, node_id = 0, spons_si_name = {length = 0, value = '\0' <repeats 255 times>},
  dep_si_name = {length = 0, value = '\0' <repeats 255 times>}, is_active = false}, heartbeat_tmr = {tmr_id = 0x6f0fb0,
  type = AVD_TMR_SND_HB, node_id = 0, spons_si_name = {length = 0, value = '\0' <repeats 255 times>}, dep_si_name = {length = 0,
  value = '\0' <repeats 255 times>}, is_active = true}, heartbeat_tmr_period = 10000000000, nodes_exit_cnt = 4,
  ntfHandle = 4279238657, ext_comp_info = {local_avnd_node = 0x0, ext_comp_hlt_check = 0x0}, peer_msg_fmt_ver = 4, avd_peer_ver = 4,
  immOiHandle = 0, immOmHandle = 34359869952, imm_sel_obj = 17, is_implementer = false, clmHandle = 4285530113, clm_sel_obj = 15,
  swap_switch = SA_FALSE}


(gdb) p *evt
$2 = {next = {next = 0x0}, rcv_evt = AVD_EVT_MDS_QSD_ACK, info = {avnd_msg = 0x0, avd_msg = 0x0, node_id = 0, tmr = {tmr_id = 0x0,
  type = AVD_TMR_SND_HB, node_id = 0, spons_si_name = {length = 0, value = '\0' <repeats 255 times>}, dep_si_name = {length = 0,
  value = '\0' <repeats 255 times>}, is_active = false}}}



Changed 13 months ago by neelakanta
  The standby AMFD registers as an applier. If IMMND restarts, the applier OI 
handle gets exposed and BAD_HANDLE is returned to AMFD.

Presently AMFD tries to reinitialize with IMMND in a separate thread when 
dispatch returns BAD_HANDLE. At the same time, saImmOiImplementerClear() is 
attempted in the main thread as part of switchover processing.

One possible solution:
If immOiHandle is zero and saImmOiImplementerClear() returns BAD_HANDLE, AMFD 
can ignore the error. In the current AMFD flow, it subsequently attempts 
saImmOiImplementerSet(); if the handle is still zero and 
saImmOiImplementerSet() returns BAD_HANDLE, AMFD can retry from the main 
thread until the reinit_bg thread completes.
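
The proposal above could be sketched roughly as follows. This is only an 
illustration under stated assumptions, not the actual AMFD change: 
`struct cb_sketch`, `clear_applier_tolerant()`, `set_implementer_retry()`, the 
function-pointer types and both `stub_*` functions are hypothetical names, and 
the stubs stand in for the real IMM OI calls so that the logic can be compiled 
without the OpenSAF libraries. Only the two `SaAisErrorT` values and the 
`immOiHandle` / `is_implementer` field names are taken from the AIS headers 
and the gdb dump in this ticket.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Subset of SaAisErrorT; 9 matches "ImplementerClear failed 9" in the log. */
typedef enum { SA_AIS_OK = 1, SA_AIS_ERR_BAD_HANDLE = 9 } SaAisErrorT;
typedef uint64_t SaImmOiHandleT;

/* Hypothetical stand-in for the two control block fields that matter here. */
struct cb_sketch {
    SaImmOiHandleT immOiHandle; /* 0 while the background re-init is running */
    bool is_implementer;
};

/* Hypothetical hooks so the logic can run against stubs. */
typedef SaAisErrorT (*oi_clear_fn)(SaImmOiHandleT);
typedef SaAisErrorT (*oi_set_fn)(struct cb_sketch *);

/* Step 1: ignore BAD_HANDLE from the clear call only if the handle is zero. */
static SaAisErrorT clear_applier_tolerant(struct cb_sketch *cb, oi_clear_fn clear_fn)
{
    SaAisErrorT rc = clear_fn(cb->immOiHandle);
    if (rc == SA_AIS_ERR_BAD_HANDLE && cb->immOiHandle == 0)
        return SA_AIS_OK; /* re-init in progress, nothing to clear */
    return rc;
}

/* Step 2: retry the set call while the handle is still zero, with a bound. */
static SaAisErrorT set_implementer_retry(struct cb_sketch *cb, oi_set_fn set_fn,
                                         unsigned max_attempts)
{
    SaAisErrorT rc = SA_AIS_ERR_BAD_HANDLE;
    for (unsigned i = 0; i < max_attempts; i++) {
        rc = set_fn(cb);
        bool retryable = (rc == SA_AIS_ERR_BAD_HANDLE && cb->immOiHandle == 0);
        if (!retryable)
            break;
        /* A real daemon would sleep or poll here before the next attempt. */
    }
    if (rc == SA_AIS_OK)
        cb->is_implementer = true;
    return rc;
}

/* Stub: the clear call fails with BAD_HANDLE when given a zero handle. */
static SaAisErrorT stub_clear(SaImmOiHandleT h)
{
    return h == 0 ? SA_AIS_ERR_BAD_HANDLE : SA_AIS_OK;
}

/* Stub: pretends the re-init thread installs a handle by the third call. */
static unsigned stub_set_calls;
static SaAisErrorT stub_set(struct cb_sketch *cb)
{
    stub_set_calls++;
    if (cb->immOiHandle == 0) {
        if (stub_set_calls < 3)
            return SA_AIS_ERR_BAD_HANDLE;
        cb->immOiHandle = 42; /* arbitrary non-zero handle */
    }
    return SA_AIS_OK;
}
```

With `immOiHandle == 0`, `clear_applier_tolerant(&cb, stub_clear)` reports 
success instead of propagating BAD_HANDLE, and 
`set_implementer_retry(&cb, stub_set, 10)` keeps trying until the stubbed 
re-init has installed a handle. The attempt bound is an addition of this 
sketch; the comment above only says to retry until the reinit_bg thread 
completes.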


Changed 13 months ago by anders
  See also:

http://devel.opensaf.org/ticket/1933

Any solution to this failure case (coping with a local IMMND crash during 
switchover) should be just a special case of coping with a local IMMND crash 
during normal processing, plus, I assume, an IMMND crash during other use 
cases.


Changed 13 months ago by nagendra
  Can you please upload more syslog from both controllers, so that we can see 
when the controller rebooted and rejoined, when the TIPC link was down, etc.? 
It will also help us with ticket 2657.


Changed 13 months ago by praveenmalviya
  - owner changed from ravisekhar to praveenmalviya
  - status changed from new to accepted

Changed 13 months ago by praveenmalviya
  - patch_waiting changed from no to yes

Changed 13 months ago by hafe
  - patch_waiting changed from yes to no

http://list.opensaf.org/pipermail/devel/2012-May/026219.html




