- **status**: assigned --> wontfix


---

**[tickets:#1165] Out of sync messages cause amfnd crash and cluster reboot**

**Status:** wontfix
**Milestone:** never
**Created:** Thu Oct 09, 2014 11:11 AM UTC by surender khetavath
**Last Updated:** Tue Mar 24, 2015 10:49 AM UTC
**Owner:** Nagendra Kumar

Changeset : 6012
Setup : 2 controllers
Initially SC-1 active and SC-2 standby

Test:
1) On SC-1, run 'kill -STOP `pidof osafamfd`'.
2) On SC-2, run 'amf-adm si-swap safSi=SC-2N,safApp=OpenSAF'.
3) After a few seconds, on SC-1 run 'kill -CONT <pid of amfd>'.
At this point the switchover succeeds. Then:
4) Again on SC-1, run 'kill -STOP <pid of amfd>'.
5) On SC-2, run 'amf-adm si-swap safSi=SC-2N,safApp=OpenSAF'.
6) After a few seconds, on SC-1 run 'kill -CONT <pid of amfd>'.

Now SC-2 crashes and reboots.
Later SC-1 also reboots.
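
The six steps above can be sketched as a small script run on SC-1. This is only an illustration, not a script from the original report. The function names, the `SC2_HOST`, `PAUSE`, `SETTLE` and `DRY_RUN` variables, and the use of `ssh` to reach SC-2 are all assumptions. The swap is started in the background because `amf-adm` may block while the director is stopped. With the default `DRY_RUN=1` the script only prints the commands it would run.

```shell
#!/bin/sh
# Hypothetical reproduction helper for the steps above; run on SC-1.
# Assumptions (not from the ticket): ssh access to SC-2, and the
# variable and function names below.

SC2_HOST="${SC2_HOST:-SC-2}"   # assumed hostname of the peer controller
PAUSE="${PAUSE:-5}"            # assumed value for "a few seconds"
SETTLE="${SETTLE:-5}"          # assumed wait between the two rounds
DRY_RUN="${DRY_RUN:-1}"        # 1 = only print the commands
SWAP_CMD="amf-adm si-swap safSi=SC-2N,safApp=OpenSAF"

# Run a command in the foreground, or print it in dry-run mode.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Run a command in the background, or print it in dry-run mode.
run_bg() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $* &"
    else
        "$@" &
    fi
}

# One round: stop amfd locally, request the swap on SC-2, resume amfd.
swap_while_amfd_stopped() {
    if [ "$DRY_RUN" = "1" ]; then
        pid="<pid of amfd>"
    else
        pid=$(pidof osafamfd) || return 1
    fi
    run kill -STOP "$pid"
    run_bg ssh "$SC2_HOST" "$SWAP_CMD"
    run sleep "$PAUSE"
    run kill -CONT "$pid"
    wait    # collect the background swap request, if any
}

reproduce() {
    swap_while_amfd_stopped || return 1   # steps 1-3
    run sleep "$SETTLE"                   # let the first switchover finish
    swap_while_amfd_stopped               # steps 4-6
}
```

Calling `reproduce` after sourcing the script prints the command sequence. Setting `DRY_RUN=0` first would actually execute it, which is expected to reboot both controllers on an affected build.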

syslog start time on sc-1 at 1st swap:
Oct  9 15:25:33 SC-1 osafamfd[8666]: NO safSi=SC-2N,safApp=OpenSAF Swap initiated
syslog start time on sc-1 at 2nd swap:
Oct  9 15:25:35 SC-1 osafamfd[8666]: NO Controller switch over initiated

After second swap errors seen in sc-1 syslog:

Oct  9 15:26:28 SC-1 opensaf_reboot: Rebooting remote node in the absence of PLM is outside the scope of OpenSAF
Oct  9 15:26:28 SC-1 osaflckd[8778]: ER GLD mbcsv chgrole failed
Oct  9 15:26:28 SC-1 osafevtd[8797]: ER MBCSv state change failed
Oct  9 15:26:28 SC-1 osafckptd[8806]: NO ERR_INVALID_PARAM: Implementer safCheckPointService already set for this handle when trying to set safCheckPointService
Oct  9 15:26:28 SC-1 osafckptd[8806]: ER cpd immOiImplmenterSet failed with err = 7
Oct  9 15:26:28 SC-1 osafamfnd[8676]: NO 'safComp=CPD,safSu=SC-1,safSg=2N,safApp=OpenSAF' faulted due to 'avaDown' : Recovery is 'nodeFailfast'
Oct  9 15:26:28 SC-1 osafamfnd[8676]: ER safComp=CPD,safSu=SC-1,safSg=2N,safApp=OpenSAF Faulted due to:avaDown Recovery is:nodeFailfast
Oct  9 15:26:28 SC-1 osafamfnd[8676]: Rebooting OpenSAF NodeId = 131343 EE Name = , Reason: Component faulted: recovery is node failfast, OwnNodeId = 131343, SupervisionTime = 60
Oct  9 15:26:28 SC-1 opensaf_reboot: Rebooting local node; timeout=60
Oct  9 15:26:28 SC-1 osafimmnd[8603]: NO Implementer locally disconnected. Marking it as doomed 40 <514, 2010f> (MsgQueueService131599)
Oct  9 15:26:28 SC-1 osafmsgd[8726]: NO ERR_INVALID_PARAM: Implementer safMsgGrpService already set for this handle when trying to set safMsgGrpService
Oct  9 15:26:28 SC-1 osafmsgd[8726]: ER mqd_imm_declare_implementer failed: err = 7
Oct  9 15:26:28 SC-1 osafmsgd[8726]: ER MBCSV ChangeRole Failed



syslog start time on sc-2 at 1st swap:
Oct  9 15:25:55 SC-2 osafamfd[7552]: NO safSi=SC-2N,safApp=OpenSAF Swap initiated
syslog start time on sc-2 at 2nd swap:
Oct  9 15:26:00 SC-2 osafamfd[7552]: NO Controller switch over initiated

After second swap errors seen in sc-2 syslog:
Oct  9 15:26:16 SC-2 osafamfd[7552]: ER Out of sync detected in warm sync response, exiting
Oct  9 15:26:16 SC-2 osafamfd[7552]: ckpt_dec.cc:2766: avd_dec_warm_sync_rsp: Assertion '0' failed.
Oct  9 15:26:16 SC-2 osafamfnd[7562]: ER AMF director unexpectedly crashed
Oct  9 15:26:16 SC-2 osafamfnd[7562]: Rebooting OpenSAF NodeId = 131599 EE Name = , Reason: local AVD down(Adest) or both AVD down(Vdest) received, OwnNodeId = 131599, SupervisionTime = 60


(gdb) bt
#0  0x00007f394c52fb55 in raise () from /lib64/libc.so.6
#1  0x00007f394c531131 in abort () from /lib64/libc.so.6
#2  0x00007f394e2f3ffe in __osafassert_fail () from /usr/lib64/libopensaf_core.so.0
#3  0x00000000004154e6 in avd_dec_warm_sync_rsp(cl_cb_tag*, ncs_mbcsv_cb_dec*) () at ckpt_dec.cc:2766
#4  0x000000000040ca75 in avsv_mbcsv_process_dec_cb(cl_cb_tag*, ncs_mbcsv_cb_arg*) ()
#5  0x000000000040c2ec in avsv_mbcsv_cb(ncs_mbcsv_cb_arg*) ()
#6  0x00007f394e304036 in ncs_mbscv_rcv_decode () from /usr/lib64/libopensaf_core.so.0
#7  0x00007f394e3047ce in ncs_mbcsv_rcv_warm_sync_resp_cmplt () from /usr/lib64/libopensaf_core.so.0
#8  0x00007f394e30ac40 in mbcsv_process_events () from /usr/lib64/libopensaf_core.so.0
#9  0x00007f394e30adab in mbcsv_hdl_dispatch_all () from /usr/lib64/libopensaf_core.so.0
#10 0x00007f394e305782 in mbcsv_process_dispatch_request () at mbcsv_api.c:423
#11 0x000000000040d26a in avsv_mbcsv_dispatch(cl_cb_tag*, unsigned int) ()
#12 0x000000000044337e in main_loop() () at main.cc:698
#13 0x00000000004437de in main () at main.cc:830

