----- Original Message ----- 
From: [email protected] 
To: [email protected] 
Sent: Thursday, April 16, 2015 9:30:59 AM GMT +05:30 Chennai, Kolkata, Mumbai, New Delhi
Subject: [opensaf:tickets] #1312 AMF: NodeFailover during SiSwap leaves SG UnStable




I guess your suggestion would look like this:

/* the SI relationships to the SU is quiesced assigned and the
 * other SU is being modified to Active. If this
 * SU admin is shutdown change to LOCK. If this SU switch state
 * is true change to false. Remove the SU from operation list.
 * Add that SU to the operation list . Change state to
 * SG_realign state. Free all the SI assignments to this SU.
 */
if (all_assignments_done(a_susi->su)) {
    // Patch of #708
} else {
    avd_sg_su_oper_list_del(cb, su, false);
    su->delete_all_susis();
    su_node_ptr = su->get_node_ptr();
    /* the admin state of the SU is shutdown change it to lock. */
    if (su->saAmfSUAdminState == SA_AMF_ADMIN_SHUTTING_DOWN) {
        ...
    } else if (su_node_ptr->saAmfNodeAdminState == SA_AMF_ADMIN_SHUTTING_DOWN) {
        ...
    } else if (su->su_switch == AVSV_SI_TOGGLE_SWITCH) {
        /* During si-swap while standby assignment is going on, if
           Nodefailover or SU failover got escalated then toggle SU switch
           state and make SG stable. After SG becomes stable, spare SU will
           be instantiated, if available, or same SU will get standby
           assignment after repair. */
        if (all_assignments_done(a_susi->su)) { //#309
            su->set_su_switch(AVSV_SI_TOGGLE_STABLE);
            m_AVD_SET_SG_FSM(cb, (su->sg_of_su), AVD_SG_FSM_STABLE);
            complete_siswap(a_susi->su, SA_AIS_OK);
        } else if (a_susi->su->any_susi_fsm_in_modify() == true) {
            /* Do nothing let the response come from AMFND for a_susi->su.
               When such a response will be received SG FSM will take care
               of it as it is still in su oper state */
            LOG_NO("SG remain in unstable state");
        } else {
            avd_sg_su_oper_list_add(cb, a_susi->su, false);
            m_AVD_SET_SG_FSM(cb, (su->sg_of_su), AVD_SG_FSM_SG_REALIGN);
        }
    }
}


The #309 part will never run, because all_assignments_done(a_susi->su) has
already returned false: we are in the "else" branch of the #708 patch.

changeset: 5560:0e6771c09786 
user: Nagendra Kumar [email protected] 
date: Tue Aug 12 20:32:48 2014 +0530 
summary: amfd: start clm track and mark sg stable after node failover [#708] 

changeset: 5557:b1702cf32d6e 
user: Praveen [email protected] 
date: Tue Aug 12 20:17:16 2014 +0530 
summary: amfd : fix node failover while assigning standby state during si-swap [#309]

I think #309 and #708 can each fix, on their own, the issue of "NodeFailover
happens on node receiving standby assignment during si-swap". But with both
patches merged into the branch, the part of #309 for this particular issue will
never run, because the #708 check is evaluated first.

I think we can just back out the part of #309 related to this issue and add the
patch for #1312. Does that sound OK to you?




[Praveen] The patch below looks OK to me. Please float it officially.
I have only one minor comment, inline below.




diff --git a/osaf/services/saf/amf/amfd/sg_2n_fsm.cc b/osaf/services/saf/amf/amfd/sg_2n_fsm.cc
--- a/osaf/services/saf/amf/amfd/sg_2n_fsm.cc 
+++ b/osaf/services/saf/amf/amfd/sg_2n_fsm.cc 
@@ -2967,14 +2967,10 @@ void SG_2N::node_fail_su_oper(AVD_SU *su
avd_sg_su_oper_list_add(cb, a_susi->su, false); 
m_AVD_SET_SG_FSM(cb, (su->sg_of_su), AVD_SG_FSM_SG_REALIGN); 
} else if (su->su_switch == AVSV_SI_TOGGLE_SWITCH) { 
- /* During si-swap while standby assignment is going on, if Nodefailover
- or SU failover got escalated then toggle SU switch state and make SG 
- stable. After SG becomes stable, spare SU will be instantiated, 
- if available, or same SU will get standby assignment after repair. 
- */ 




[Praveen] The comment above can be moved into the if part (the #708/#309 part),
so that anyone reading this code in future knows the context.




- su->set_su_switch(AVSV_SI_TOGGLE_STABLE); 
- m_AVD_SET_SG_FSM(cb, (su->sg_of_su), AVD_SG_FSM_STABLE); 
- complete_siswap(a_susi->su, SA_AIS_OK); 
+ // Patch for #1312 
+ // if (a_susi->su->any_susi_fsm_in_modify() == true){ 
+ // .... 
+ // } 
} else { 
avd_sg_su_oper_list_add(cb, a_susi->su, false); 
m_AVD_SET_SG_FSM(cb, (su->sg_of_su), AVD_SG_FSM_SG_REALIGN); 


Thanks, 
Minh 


[tickets:#1312] AMF: NodeFailover during SiSwap leaves SG UnStable 

Status: assigned 
Milestone: 4.4.2 
Created: Fri Apr 10, 2015 10:57 AM UTC by Minh Hon Chau 
Last Updated: Tue Apr 14, 2015 11:20 AM UTC 
Owner: Minh Hon Chau 

    • Configuration: 


A 2N SG with 2 SUs: SU1 hosted on SC-1, SU2 hosted on SC-2
1 sponsor SI (AGENT) and several dependent SIs (MTZ, ACA, CQH, AFD, HDF, NSF,
SGS, CLH, DBO)
A single componentRestart escalates to nodeFailover

    • Steps and analysis 


All SIs are assigned ACTIVE to SU1, STANDBY to SU2 

1) Swap SI safSi=AFD,safApp=TEST_APP 
Apr 10 11:00:49 SC-1 osafamfd [491] : NO safSi=AFD,safApp=TEST_APP Swap initiated

2) Swapping a 2N SI leads to an SU switchover, so the SIs on SU1 are quiesced
Apr 10 11:00:49 SC-1 osafamfnd [500] : NO Assigning 'safSi=ACA,safApp=TEST_APP' QUIESCED to 'safSu=SU1,safSg=TEST_SG_2N,safApp=TEST_APP'
Apr 10 11:00:49 SC-1 osafamfnd [500] : NO Assigned 'safSi=ACA,safApp=TEST_APP' QUIESCED to 'safSu=SU1,safSg=TEST_SG_2N,safApp=TEST_APP'
...
Apr 10 11:00:49 SC-1 osafamfnd [500] : NO Assigning 'safSi=AGENT,safApp=TEST_APP' QUIESCED to 'safSu=SU1,safSg=TEST_SG_2N,safApp=TEST_APP'
Apr 10 11:00:49 SC-1 osafamfnd [500] : NO Assigned 'safSi=AGENT,safApp=TEST_APP' QUIESCED to 'safSu=SU1,safSg=TEST_SG_2N,safApp=TEST_APP'

3) Assign the sponsor SI ACTIVE to SU2
Apr 10 11:00:49 SC-2 osafamfnd [488] : NO Assigning 'safSi=AGENT,safApp=TEST_APP' ACTIVE to 'safSu=SU2,safSg=TEST_SG_2N,safApp=TEST_APP'
(AGENT on SC-2 has not yet responded to AMFND.)

4) The CQH binary is corrupted after CQH has sent its QUIESCED response to AMF;
the resulting fault escalates to nodeFailover
Apr 10 11:00:50 SC-1 osafamfnd [500] : NO 'safComp=CQH,safSu=SU1,safSg=TEST_SG_2N,safApp=TEST_APP' recovery action escalated from 'componentRestart' to 'nodeFailover'
Apr 10 11:00:50 SC-1 osafamfnd [500] : NO 'safComp=CQH,safSu=SU1,safSg=TEST_SG_2N,safApp=TEST_APP' faulted due to 'avaDown' : Recovery is 'nodeFailover'

5) SC-1 reboots and SC-2 becomes ACTIVE
Apr 10 11:00:50 SC-2 osafamfd [479] : NO FAILOVER StandBy --> Active 

6) AMFD on SC-2 starts the node failover procedure
Apr 10 11:00:50.731489 osafamfd [479:ndproc.cc:0923] >> avd_node_failover: 'safAmfNode=SC-1,safAmfCluster=myAmfCluster'
...
Apr 10 11:00:50.737048 osafamfd [479:sg_nored_fsm.cc:0793] >> node_fail: safSu=SC-1,safSg=NoRed,safApp=OpenSAF, sg_fsm_state=0
Apr 10 11:00:50.745536 osafamfd [479:sg_2n_fsm.cc:3262] >> node_fail: 'safSu=SC-1,safSg=2N,safApp=OpenSAF', 0
Apr 10 11:00:50.748579 osafamfd [479:sg_2n_fsm.cc:3262] >> node_fail: 'safSu=SU1,safSg=TEST_SG_2N,safApp=TEST_APP', 2

7) While running node_fail_su_oper for TEST_SG_2N (the SG is in SU operation
state because of the swap), AMFD sets the SG state to STABLE
Apr 10 11:00:50.748584 osafamfd [479:sg_2n_fsm.cc:2865] >> node_fail_su_oper
...
Apr 10 11:00:50.749197 osafamfd [479:sg.cc:1635] TR safSg=TEST_SG_2N,safApp=TEST_APP sg_fsm_state 2 => 0
...
Apr 10 11:00:50.749217 osafamfd [479:sg_2n_fsm.cc:3099] << node_fail_su_oper

8) Now, on SC-2, AGENT responds to AMFND for the ACTIVE csiSetCallback, and AMFD
receives the resulting su_si event from AMFND.
But the SG is already STABLE, so this su_si modify response (act:5) is not acted upon
Apr 10 11:00:59.280465 osafamfnd [488:susm.cc:0954] NO Assigned 'safSi=AGENT,safApp=TEST_APP' ACTIVE to 'safSu=SU2,safSg=TEST_SG_2N,safApp=TEST_APP'
Apr 10 11:00:59.280681 osafamfd [479:sgproc.cc:0889] >> avd_su_si_assign_evh: id:120, node:2020f, act:5, 'safSu=SU2,safSg=TEST_SG_2N,safApp=TEST_APP', 'safSi=AGENT,safApp=TEST_APP', ha:1, err:1, single:0
...
Apr 10 11:00:59.280737 osafamfd [479:sg_2n_fsm.cc:2361] >> susi_success: 'safSu=SU2,safSg=TEST_SG_2N,safApp=TEST_APP' act=5, hastate=1, sg_fsm_state=0
Apr 10 11:00:59.280749 osafamfd [479:sg_2n_fsm.cc:2376] EM sg_2n_fsm.cc:2376: safSu=SU2,safSg=TEST_SG_2N,safApp=TEST_APP (42)
Apr 10 11:00:59.280752 osafamfd [479:sg_2n_fsm.cc:2562] << susi_success: rc:1
Apr 10 11:00:59.280755 osafamfd [479:sgproc.cc:1405] << avd_su_si_assign_evh

9) SC-1 comes back up, and all SIs are assigned STANDBY to SU1
Apr 10 11:01:21 SC-1 opensafd: Starting OpenSAF Services (Using TCP)
...
Apr 10 11:01:24 SC-1 osafamfnd [490] : NO Assigning 'safSi=DBO,safApp=TEST_APP' STANDBY to 'safSu=SU1,safSg=TEST_SG_2N,safApp=TEST_APP'
Apr 10 11:01:24 SC-1 osafamfnd [490] : NO Assigned 'safSi=DBO,safApp=TEST_APP' STANDBY to 'safSu=SU1,safSg=TEST_SG_2N,safApp=TEST_APP'
...
Apr 10 11:01:24 SC-1 osafamfnd [490] : NO Assigning 'safSi=AGENT,safApp=TEST_APP' STANDBY to 'safSu=SU1,safSg=TEST_SG_2N,safApp=TEST_APP'
Apr 10 11:01:24 SC-1 osafamfnd [490] : NO Assigned 'safSi=AGENT,safApp=TEST_APP' STANDBY to 'safSu=SU1,safSg=TEST_SG_2N,safApp=TEST_APP'

10) AMFD on SC-2 is informed of SU1's STANDBY assignment.
After susi_success(), the SG state is still REALIGN (sg_fsm_state=1)
Apr 10 11:01:24.345208 osafamfd [479:sgproc.cc:0889] >> avd_su_si_assign_evh: id:115, node:2010f, act:2, 'safSu=SU1,safSg=TEST_SG_2N,safApp=TEST_APP', 'safSi=AGENT,safApp=TEST_APP', ha:2, err:1, single:0
...
Apr 10 11:01:24.345666 osafamfd [479:sg_2n_fsm.cc:2361] >> susi_success: 'safSu=SU1,safSg=TEST_SG_2N,safApp=TEST_APP' act=2, hastate=2, sg_fsm_state=1
Apr 10 11:01:24.345669 osafamfd [479:sg_2n_fsm.cc:1446] >> susi_success_sg_realign: 'safSu=SU1,safSg=TEST_SG_2N,safApp=TEST_APP' act=2, state=2
Apr 10 11:01:24.345672 osafamfd [479:sg_2n_fsm.cc:1865] << susi_success_sg_realign: rc:1
Apr 10 11:01:24.345674 osafamfd [479:sg_2n_fsm.cc:2562] << susi_success: rc:1
Apr 10 11:01:24.345678 osafamfd [479:sgproc.cc:1405] << avd_su_si_assign_evh

11) Finally, a second swap attempt fails because the SG is not stable
Apr 10 11:03:23.304988 osafamfd [479:si.cc:0821] >> si_admin_op_cb: safSi=AFD,safApp=TEST_APP op=7
Apr 10 11:03:23.304997 osafamfd [479:sg_2n_fsm.cc:0757] >> si_swap: 'safSi=AFD,safApp=TEST_APP' sg_fsm_state=1
Apr 10 11:03:23.305011 osafamfd [479:sg_2n_fsm.cc:0775] ER safSi=AFD,safApp=TEST_APP SWAP failed - SG not stable (1)
Apr 10 11:03:23.305013 osafamfd [479:sg_2n_fsm.cc:0857] << si_swap: sg_fsm_state=1


Sent from sourceforge.net because you indicated interest in 
https://sourceforge.net/p/opensaf/tickets/1312/ 

_______________________________________________
Opensaf-tickets mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/opensaf-tickets
