- **status**: assigned --> review


---

**[tickets:#1312] AMF: NodeFailover during SiSwap leaves SG UnStable**

**Status:** review
**Milestone:** 4.4.2
**Created:** Fri Apr 10, 2015 10:57 AM UTC by Minh Hon Chau
**Last Updated:** Thu Apr 16, 2015 04:00 AM UTC
**Owner:** Minh Hon Chau

* Configuration:

Two 2N SUs (SU1, SU2), hosted on the SCs
One sponsor SI (AGENT) and several dependent SIs (MTZ, ACA, CQH, AFD, HDF, NSF, 
SGS, CLH, DBO)
A single componentRestart escalates to nodeFailover

* Steps and analysis

All SIs are assigned ACTIVE to SU1, STANDBY to SU2

1) Swap SI safSi=AFD,safApp=TEST_APP
Apr 10 11:00:49 SC-1 osafamfd[491]: NO safSi=AFD,safApp=TEST_APP Swap initiated

2) Swapping a 2N SI leads to an SU switchover
Apr 10 11:00:49 SC-1 osafamfnd[500]: NO Assigning 'safSi=ACA,safApp=TEST_APP' 
QUIESCED to 'safSu=SU1,safSg=TEST_SG_2N,safApp=TEST_APP'
Apr 10 11:00:49 SC-1 osafamfnd[500]: NO Assigned 'safSi=ACA,safApp=TEST_APP' 
QUIESCED to 'safSu=SU1,safSg=TEST_SG_2N,safApp=TEST_APP'
...
Apr 10 11:00:49 SC-1 osafamfnd[500]: NO Assigning 'safSi=AGENT,safApp=TEST_APP' 
QUIESCED to 'safSu=SU1,safSg=TEST_SG_2N,safApp=TEST_APP'
Apr 10 11:00:49 SC-1 osafamfnd[500]: NO Assigned 'safSi=AGENT,safApp=TEST_APP' 
QUIESCED to 'safSu=SU1,safSg=TEST_SG_2N,safApp=TEST_APP'

3) Assign sponsor SI ACTIVE to SU2
Apr 10 11:00:49 SC-2 osafamfnd[488]: NO Assigning 'safSi=AGENT,safApp=TEST_APP' 
ACTIVE to 'safSu=SU2,safSg=TEST_SG_2N,safApp=TEST_APP'
(AGENT on SC-2 has not yet responded to AMFND at this point)

4) The CQH binary is corrupted after its QUIESCED response to AMF; recovery 
escalates to nodeFailover
Apr 10 11:00:50 SC-1 osafamfnd[500]: NO 
'safComp=CQH,safSu=SU1,safSg=TEST_SG_2N,safApp=TEST_APP' recovery action 
escalated from 'componentRestart' to 'nodeFailover'
Apr 10 11:00:50 SC-1 osafamfnd[500]: NO 
'safComp=CQH,safSu=SU1,safSg=TEST_SG_2N,safApp=TEST_APP' faulted due to 
'avaDown' : Recovery is 'nodeFailover'

5) SC-1 reboots; SC-2 becomes ACTIVE
Apr 10 11:00:50 SC-2 osafamfd[479]: NO FAILOVER StandBy --> Active

6) AMFD on SC-2 starts the node_failover procedure
Apr 10 11:00:50.731489 osafamfd [479:ndproc.cc:0923] >> avd_node_failover: 
'safAmfNode=SC-1,safAmfCluster=myAmfCluster'
...
Apr 10 11:00:50.737048 osafamfd [479:sg_nored_fsm.cc:0793] >> node_fail: 
safSu=SC-1,safSg=NoRed,safApp=OpenSAF, sg_fsm_state=0
Apr 10 11:00:50.745536 osafamfd [479:sg_2n_fsm.cc:3262] >> node_fail: 
'safSu=SC-1,safSg=2N,safApp=OpenSAF', 0
Apr 10 11:00:50.748579 osafamfd [479:sg_2n_fsm.cc:3262] >> node_fail: 
'safSu=SU1,safSg=TEST_SG_2N,safApp=TEST_APP', 2

7) While node_fail_su_oper runs for TEST_SG_2N (entered because of the swap), 
the SG state is set to STABLE
Apr 10 11:00:50.748584 osafamfd [479:sg_2n_fsm.cc:2865] >> node_fail_su_oper 
...
Apr 10 11:00:50.749197 osafamfd [479:sg.cc:1635] TR 
safSg=TEST_SG_2N,safApp=TEST_APP sg_fsm_state 2 => 0
...
Apr 10 11:00:50.749217 osafamfd [479:sg_2n_fsm.cc:3099] << node_fail_su_oper 

8) On SC-2, AGENT now responds to AMFND for the ACTIVE csiSetCallback, and AMFD 
receives the resulting su_si event from AMFND.
But the SG is already STABLE, so no operation is pending for this su_si modify (act:5)
Apr 10 11:00:59.280465 osafamfnd [488:susm.cc:0954] NO Assigned 
'safSi=AGENT,safApp=TEST_APP' ACTIVE to 
'safSu=SU2,safSg=TEST_SG_2N,safApp=TEST_APP'
Apr 10 11:00:59.280681 osafamfd [479:sgproc.cc:0889] >> avd_su_si_assign_evh: 
id:120, node:2020f, act:5, 'safSu=SU2,safSg=TEST_SG_2N,safApp=TEST_APP', 
'safSi=AGENT,safApp=TEST_APP', ha:1, err:1, single:0
...
Apr 10 11:00:59.280737 osafamfd [479:sg_2n_fsm.cc:2361] >> susi_success: 
'safSu=SU2,safSg=TEST_SG_2N,safApp=TEST_APP' act=5, hastate=1, sg_fsm_state=0
Apr 10 11:00:59.280749 osafamfd [479:sg_2n_fsm.cc:2376] EM sg_2n_fsm.cc:2376: 
safSu=SU2,safSg=TEST_SG_2N,safApp=TEST_APP (42)
Apr 10 11:00:59.280752 osafamfd [479:sg_2n_fsm.cc:2562] << susi_success: rc:1
Apr 10 11:00:59.280755 osafamfd [479:sgproc.cc:1405] << avd_su_si_assign_evh 

9) SC-1 comes back up; all SIs are assigned STANDBY to SU1
Apr 10 11:01:21 SC-1 opensafd: Starting OpenSAF Services (Using TCP)
...
Apr 10 11:01:24 SC-1 osafamfnd[490]: NO Assigning 'safSi=DBO,safApp=TEST_APP' 
STANDBY to 'safSu=SU1,safSg=TEST_SG_2N,safApp=TEST_APP'
Apr 10 11:01:24 SC-1 osafamfnd[490]: NO Assigned 'safSi=DBO,safApp=TEST_APP' 
STANDBY to 'safSu=SU1,safSg=TEST_SG_2N,safApp=TEST_APP'
...
Apr 10 11:01:24 SC-1 osafamfnd[490]: NO Assigning 'safSi=AGENT,safApp=TEST_APP' 
STANDBY to 'safSu=SU1,safSg=TEST_SG_2N,safApp=TEST_APP'
Apr 10 11:01:24 SC-1 osafamfnd[490]: NO Assigned 'safSi=AGENT,safApp=TEST_APP' 
STANDBY to 'safSu=SU1,safSg=TEST_SG_2N,safApp=TEST_APP'

10) AMFD on SC-2 is informed of SU1's STANDBY assignment
After susi_success(), the SG state is still REALIGN
Apr 10 11:01:24.345208 osafamfd [479:sgproc.cc:0889] >> avd_su_si_assign_evh: 
id:115, node:2010f, act:2, 'safSu=SU1,safSg=TEST_SG_2N,safApp=TEST_APP', 
'safSi=AGENT,safApp=TEST_APP', ha:2, err:1, single:0
...
Apr 10 11:01:24.345666 osafamfd [479:sg_2n_fsm.cc:2361] >> susi_success: 
'safSu=SU1,safSg=TEST_SG_2N,safApp=TEST_APP' act=2, hastate=2, sg_fsm_state=1
Apr 10 11:01:24.345669 osafamfd [479:sg_2n_fsm.cc:1446] >> 
susi_success_sg_realign: 'safSu=SU1,safSg=TEST_SG_2N,safApp=TEST_APP' act=2, 
state=2
Apr 10 11:01:24.345672 osafamfd [479:sg_2n_fsm.cc:1865] << 
susi_success_sg_realign: rc:1
Apr 10 11:01:24.345674 osafamfd [479:sg_2n_fsm.cc:2562] << susi_success: rc:1
Apr 10 11:01:24.345678 osafamfd [479:sgproc.cc:1405] << avd_su_si_assign_evh 

11) Finally, a new swap attempt fails because the SG is not stable
Apr 10 11:03:23.304988 osafamfd [479:si.cc:0821] >> si_admin_op_cb: 
safSi=AFD,safApp=TEST_APP op=7
Apr 10 11:03:23.304997 osafamfd [479:sg_2n_fsm.cc:0757] >> si_swap: 
'safSi=AFD,safApp=TEST_APP' sg_fsm_state=1
Apr 10 11:03:23.305011 osafamfd [479:sg_2n_fsm.cc:0775] ER 
safSi=AFD,safApp=TEST_APP SWAP failed - SG not stable (1)
Apr 10 11:03:23.305013 osafamfd [479:sg_2n_fsm.cc:0857] << si_swap: 
sg_fsm_state=1
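
The ordering problem in steps 1 to 11 can be condensed into a toy model. This is a minimal sketch, not OpenSAF code: the `ToySg` class, its `pending` set, and the method bodies are hypothetical. The state numbers 0 (STABLE) and 1 (REALIGN) come from the trace above; the name `SU_OPER` for state 2 is an assumption based on `node_fail_su_oper` being invoked in that state. The model also assumes that the acknowledgement dropped in step 8 is what keeps the SG in REALIGN afterwards, which is an interpretation of the trace rather than something the logs state directly.

```python
from enum import IntEnum


class SgFsm(IntEnum):
    STABLE = 0       # value seen in the trace ("sg_fsm_state 2 => 0")
    SG_REALIGN = 1   # value seen in steps 10 and 11
    SU_OPER = 2      # name assumed; value seen when node_fail runs in step 6


class ToySg:
    """Hypothetical, heavily simplified stand-in for the 2N SG state machine."""

    def __init__(self):
        self.state = SgFsm.STABLE
        self.pending = set()  # (su, si) assignments sent but not yet acknowledged
        self.log = []

    def si_swap(self, si):
        if self.state != SgFsm.STABLE:
            self.log.append(f"{si} SWAP failed - SG not stable ({int(self.state)})")
            return False
        self.state = SgFsm.SU_OPER
        return True

    def send_assignment(self, su, si):
        self.pending.add((su, si))

    def assign_standby(self, su, si):
        self.send_assignment(su, si)
        self.state = SgFsm.SG_REALIGN

    def node_fail_su_oper(self, failed_su):
        # Step 7: only the failed SU's assignments are forgotten, yet the SG
        # is declared STABLE while the surviving SU still owes an ack.
        self.pending = {(su, si) for su, si in self.pending if su != failed_su}
        self.state = SgFsm.STABLE

    def susi_success(self, su, si):
        if self.state == SgFsm.STABLE:
            # Step 8: no operation is expected in STABLE, so the ack is dropped.
            self.log.append(f"unexpected ack from {su} for {si} in STABLE")
            return False
        self.pending.discard((su, si))
        if self.state == SgFsm.SG_REALIGN and not self.pending:
            self.state = SgFsm.STABLE
        return True


def replay_ticket_sequence():
    afd = "safSi=AFD,safApp=TEST_APP"
    sg = ToySg()
    first_swap = sg.si_swap(afd)           # step 1
    sg.send_assignment("SU2", "AGENT")     # step 3: ACTIVE sent, no ack yet
    sg.node_fail_su_oper("SU1")            # steps 6 and 7
    late_ack = sg.susi_success("SU2", "AGENT")  # step 8
    sg.assign_standby("SU1", "AGENT")      # step 9
    sg.susi_success("SU1", "AGENT")        # step 10
    second_swap = sg.si_swap(afd)          # step 11
    return sg, first_swap, late_ack, second_swap


if __name__ == "__main__":
    sg, first_swap, late_ack, second_swap = replay_ticket_sequence()
    print(sg.state.name, first_swap, late_ack, second_swap)
    print("\n".join(sg.log))
```

In this model the SG never leaves REALIGN because the `("SU2", "AGENT")` entry is still pending when SU1's STANDBY acknowledgement arrives. Keeping the SG out of STABLE in `node_fail_su_oper` while `pending` is non-empty would let the late acknowledgement be processed normally; whether that matches the actual fix is not stated in this ticket.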




