---
**[tickets:#503] SU failover was not happening during SI-SI dependency flow.**
**Status:** unassigned
**Created:** Fri Jul 12, 2013 09:56 AM UTC by Sirisha Alla
**Last Updated:** Fri Jul 12, 2013 09:56 AM UTC
**Owner:** nobody
Setup used:-
Oracle Linux Server release 6.4, TCP, PBE enabled, IPv6 address.
OpenSAF 4.3 latest with 002_amf_98.patch and 001_mds_tcp_232.patch
Problem description:-
=============
SU failover did not happen properly during the SI-SI dependency flow.
There were no standby assignments; the application had only active assignments.
1) Configure the 2N model at runtime with the configuration below.
* 3 SUs, each containing 3 PI components.
SU1 was spawned on PL-4, SU2 on PL-3 and SU3 on SC-1.
* saAmfCtDefRecoveryOnError=2
saAmfCtDefDisableRestart=1
saAmfSGAutoRepair=1
saAmfSutDefSUFailover=1
saAmfSUFailover=1 for each SU.
saAmfSGNumPrefInserviceSUs=4
* Two SIs, SI1 having saAmfSIRank=2 and SI2 having saAmfSIRank=1
2) Performed admin unlock-in, then unlock, of each SU.
3) Admin lock both the SIs, SI1 and SI2.
4) Create an SI-SI dependency with SI2 as sponsor and SI1 as its dependent.
immcfg -c SaAmfSIDependency safDepend="safSi=d_2n_2\,safApp=2nApp,safSi=d_2n_1,safApp=2nApp" -a saAmfToleranceTime=0
5) Admin unlock both the SIs, SI1 and SI2.
6) Kill COMP1 of SU1 using kill -9 command.
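The dependency DN created in step 4 packs both SIs into one string. The sketch
below splits it, assuming (as the ticket's naming suggests) that the escaped
safDepend value names the sponsor SI and the remaining parent DN names the
dependent SI. The function name split_si_dependency is hypothetical and is not
part of any OpenSAF tool.

```python
import re


def split_si_dependency(dep_dn):
    """Split an SaAmfSIDependency DN into (sponsor_dn, dependent_dn).

    Assumption: the safDepend RDN value is the sponsor SI DN with its commas
    escaped as '\\,', and the rest of the DN is the dependent SI.
    """
    prefix = "safDepend="
    if not dep_dn.startswith(prefix):
        raise ValueError("not a safDepend DN: %r" % dep_dn)
    rest = dep_dn[len(prefix):]
    # Split only on commas that are not preceded by a backslash.
    parts = re.split(r"(?<!\\),", rest)
    sponsor = parts[0].replace("\\,", ",")
    dependent = ",".join(parts[1:])
    return sponsor, dependent


# DN taken from step 4 of this ticket.
DEP_DN = r"safDepend=safSi=d_2n_2\,safApp=2nApp,safSi=d_2n_1,safApp=2nApp"
sponsor_dn, dependent_dn = split_si_dependency(DEP_DN)
print("sponsor:", sponsor_dn)
print("dependent:", dependent_dn)
```

With the DN from step 4 this reports d_2n_2 (SI2) as sponsor and d_2n_1 (SI1)
as dependent, which matches the description above.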
After step 6, SU failover was triggered, cleanup was called successfully for
all the components of SU1, and SU1 finally moved to the UNINSTANTIATED state.
SU2 became active, but the spare SU3 did not get any standby assignments.
Since saAmfSGAutoRepair=1 was set, the SU1 components were instantiated again
and the SU1 presence state moved to INSTANTIATED, but there were still no
standby assignments. The application had only the active assignments on SU2.
It was also observed that when SU2 took the active role, the dependent SI1 got
its active assignment before the sponsor SI2 did, which is the wrong order.
See the snippet below, taken after the COMP1 fault on SU1:-
===========================================
[root@OEL-64BIT-SLOT4 ~]# Jul 11 13:46:43 OEL-64BIT-SLOT4 osafamfnd[18511]: NO
saAmfCompDisableRestart is true for
'safComp=N1,safSu=d_2n_1,safSg=SG_d_2n,safApp=2nApp'
Jul 11 13:46:43 OEL-64BIT-SLOT4 osafamfnd[18511]: NO saAmfSUFailover is true
for 'safSu=d_2n_1,safSg=SG_d_2n,safApp=2nApp'
Jul 11 13:46:43 OEL-64BIT-SLOT4 osafamfnd[18511]: NO
'safComp=N1,safSu=d_2n_1,safSg=SG_d_2n,safApp=2nApp' faulted due to 'avaDown' :
Recovery is 'suFailover'
Jul 11 13:46:43 OEL-64BIT-SLOT4 osafamfnd[18511]: NO Terminating components of
'safSu=d_2n_1,safSg=SG_d_2n,safApp=2nApp'(abruptly & unordered)
Jul 11 13:46:43 OEL-64BIT-SLOT4 osafamfnd[18511]: NO
'safSu=d_2n_1,safSg=SG_d_2n,safApp=2nApp' Presence State INSTANTIATED =>
TERMINATING
Jul 11 13:46:43 OEL-64BIT-SLOT4 root: CLC-CLI cleanup has been spawned for
safComp=N3,safSu=d_2n_1,safSg=SG_d_2n,safApp=2nApp
Jul 11 13:46:43 OEL-64BIT-SLOT4 root: CLC-CLI cleanup has been spawned for
safComp=N2,safSu=d_2n_1,safSg=SG_d_2n,safApp=2nApp
Jul 11 13:46:43 OEL-64BIT-SLOT4 root: CLC-CLI cleanup has been spawned for
safComp=N1,safSu=d_2n_1,safSg=SG_d_2n,safApp=2nApp
Jul 11 13:46:43 OEL-64BIT-SLOT4 osafamfnd[18511]: NO
'safSu=d_2n_1,safSg=SG_d_2n,safApp=2nApp' Presence State TERMINATING =>
UNINSTANTIATED
Jul 11 13:46:43 OEL-64BIT-SLOT4 osafamfnd[18511]: NO Terminated all components
in 'safSu=d_2n_1,safSg=SG_d_2n,safApp=2nApp'
Jul 11 13:46:43 OEL-64BIT-SLOT4 osafamfnd[18511]: NO Informing director of
sufailover
Jul 11 13:46:43 OEL-64BIT-SLOT4 osafamfnd[18511]: NO
'safSu=d_2n_1,safSg=SG_d_2n,safApp=2nApp' Presence State UNINSTANTIATED =>
UNINSTANTIATED
Jul 11 13:46:43 OEL-64BIT-SLOT4 osafamfnd[18511]: ER cannot unlink failed state
file /var/run/opensaf/amf_failed_state: No such file or directory
Jul 11 13:46:43 OEL-64BIT-SLOT4 osafamfnd[18511]: NO
'safSu=d_2n_1,safSg=SG_d_2n,safApp=2nApp' Presence State UNINSTANTIATED =>
INSTANTIATING
Jul 11 13:46:43 OEL-64BIT-SLOT4 root: CLC-CLI instantiate has been spawned for
safComp=N3,safSu=d_2n_1,safSg=SG_d_2n,safApp=2nApp
Jul 11 13:46:43 OEL-64BIT-SLOT4 root: CLC-CLI instantiate has been spawned for
safComp=N2,safSu=d_2n_1,safSg=SG_d_2n,safApp=2nApp
Jul 11 13:46:44 OEL-64BIT-SLOT4 root: CLC-CLI instantiate has been spawned for
safComp=N1,safSu=d_2n_1,safSg=SG_d_2n,safApp=2nApp
Jul 11 13:46:44 OEL-64BIT-SLOT4 osafamfnd[18511]: NO
'safSu=d_2n_1,safSg=SG_d_2n,safApp=2nApp' Presence State INSTANTIATING =>
INSTANTIATED
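The presence state sequence in the syslog snippet above can be pulled out
mechanically. This is only a sketch: it assumes the osafamfnd message format
shown in this ticket ('<SU DN>' Presence State A => B), tolerates the mail
client's line wrapping, and uses a hypothetical helper name,
presence_transitions.

```python
import re

# \s+ also matches newlines, so transitions wrapped across lines still match.
TRANSITION_RE = re.compile(r"'(saf[^']+)'\s+Presence State\s+(\w+)\s+=>\s+(\w+)")


def presence_transitions(syslog_text):
    """Return (su_dn, old_state, new_state) tuples in log order."""
    return [m.groups() for m in TRANSITION_RE.finditer(syslog_text)]


# Two wrapped entries copied from the snippet in this ticket.
SYSLOG_SAMPLE = """\
Jul 11 13:46:43 OEL-64BIT-SLOT4 osafamfnd[18511]: NO
'safSu=d_2n_1,safSg=SG_d_2n,safApp=2nApp' Presence State INSTANTIATED =>
TERMINATING
Jul 11 13:46:43 OEL-64BIT-SLOT4 osafamfnd[18511]: NO
'safSu=d_2n_1,safSg=SG_d_2n,safApp=2nApp' Presence State UNINSTANTIATED =>
UNINSTANTIATED
"""

for su, old, new in presence_transitions(SYSLOG_SAMPLE):
    note = "  (no state change)" if old == new else ""
    print("%s: %s -> %s%s" % (su, old, new, note))
```

Flagging entries where the old and new state are equal makes the
UNINSTANTIATED => UNINSTANTIATED line in the snippet easy to spot.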
[root@OEL-64BIT-SLOT1 framework]# amf-state siass ha
safSISU=safSu=SC-1\,safSg=NoRed\,safApp=OpenSAF,safSi=NoRed2,safApp=OpenSAF
saAmfSISUHAState=ACTIVE(1)
safSISU=safSu=SC-1\,safSg=2N\,safApp=OpenSAF,safSi=SC-2N,safApp=OpenSAF
saAmfSISUHAState=ACTIVE(1)
safSISU=safSu=SC-2\,safSg=2N\,safApp=OpenSAF,safSi=SC-2N,safApp=OpenSAF
saAmfSISUHAState=STANDBY(2)
safSISU=safSu=SC-2\,safSg=NoRed\,safApp=OpenSAF,safSi=NoRed1,safApp=OpenSAF
saAmfSISUHAState=ACTIVE(1)
safSISU=safSu=d_2n_2\,safSg=SG_d_2n\,safApp=2nApp,safSi=d_2n_1,safApp=2nApp
saAmfSISUHAState=ACTIVE(1)
safSISU=safSu=PL-4\,safSg=NoRed\,safApp=OpenSAF,safSi=NoRed3,safApp=OpenSAF
saAmfSISUHAState=ACTIVE(1)
safSISU=safSu=d_2n_2\,safSg=SG_d_2n\,safApp=2nApp,safSi=d_2n_2,safApp=2nApp
saAmfSISUHAState=ACTIVE(1)
safSISU=safSu=PL-3\,safSg=NoRed\,safApp=OpenSAF,safSi=NoRed4,safApp=OpenSAF
saAmfSISUHAState=ACTIVE(1)
safSu=d_2n_3,safSg=SG_d_2n,safApp=2nApp
saAmfSUAdminState=UNLOCKED(1)
saAmfSUOperState=ENABLED(1)
saAmfSUPresenceState=INSTANTIATED(3)
saAmfSUReadinessState=IN-SERVICE(2)
safSu=d_2n_1,safSg=SG_d_2n,safApp=2nApp
saAmfSUAdminState=UNLOCKED(1)
saAmfSUOperState=ENABLED(1)
saAmfSUPresenceState=INSTANTIATED(3)
saAmfSUReadinessState=IN-SERVICE(2)
safSu=d_2n_2,safSg=SG_d_2n,safApp=2nApp
saAmfSUAdminState=UNLOCKED(1)
saAmfSUOperState=ENABLED(1)
saAmfSUPresenceState=INSTANTIATED(3)
saAmfSUReadinessState=IN-SERVICE(2)
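The missing standby assignments can be checked directly against the
amf-state siass ha output above. The following is a minimal sketch that only
parses text in the format shown in this ticket; the helper names
(ha_states_by_si, sis_without_standby) are hypothetical, and the expectation
that each SI of the 2N application has a STANDBY assignment is the one stated
in this report.

```python
import re


def ha_states_by_si(siass_output):
    """Map each SI DN to the HA states listed for it, in output order."""
    states = {}
    current_si = None
    for raw in siass_output.splitlines():
        line = raw.strip()
        if line.startswith("safSISU="):
            # The SU part has escaped commas; the SI DN starts after the
            # first unescaped comma.
            parts = re.split(r"(?<!\\),", line, maxsplit=1)
            current_si = parts[1] if len(parts) == 2 else None
        elif line.startswith("saAmfSISUHAState=") and current_si:
            state = line.split("=", 1)[1].split("(", 1)[0]
            states.setdefault(current_si, []).append(state)
    return states


def sis_without_standby(siass_output, app_suffix):
    """Return the SIs of one application that have no STANDBY assignment."""
    return sorted(
        si for si, st in ha_states_by_si(siass_output).items()
        if si.endswith(app_suffix) and "STANDBY" not in st
    )


# Subset of the output shown in this ticket.
SIASS_SAMPLE = r"""
safSISU=safSu=SC-1\,safSg=2N\,safApp=OpenSAF,safSi=SC-2N,safApp=OpenSAF
saAmfSISUHAState=ACTIVE(1)
safSISU=safSu=SC-2\,safSg=2N\,safApp=OpenSAF,safSi=SC-2N,safApp=OpenSAF
saAmfSISUHAState=STANDBY(2)
safSISU=safSu=d_2n_2\,safSg=SG_d_2n\,safApp=2nApp,safSi=d_2n_1,safApp=2nApp
saAmfSISUHAState=ACTIVE(1)
safSISU=safSu=d_2n_2\,safSg=SG_d_2n\,safApp=2nApp,safSi=d_2n_2,safApp=2nApp
saAmfSISUHAState=ACTIVE(1)
"""

for si in sis_without_standby(SIASS_SAMPLE, "safApp=2nApp"):
    print("no standby assignment:", si)
```

Both SIs of 2nApp are reported, while the OpenSAF SC-2N SI, which has an
ACTIVE and a STANDBY assignment, is not.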
SU2 COMPONENT LOG:-
===================
[root@OEL-64BIT-SLOT3 amftresult]# cat
amf_demo_safComp\=N1\,safSu\=d_2n_2\,safSg\=SG_d_2n\,safApp\=2nApp.log
############ CSI SET CALLBACK ####################
in amf csi set callback Invocation: 4265607178
compName: safComp=N1,safSu=d_2n_2,safSg=SG_d_2n,safApp=2nApp
haState: 1 : CSISET_ACTIVE_CALLBACK
csiName: safCsi=Norm1,safSi=d_2n_1,safApp=2nApp
[1373530601.292845964] SockClient: Sending Packet: ('data',
'safComp=N1,safSu=d_2n_2,safSg=SG_d_2n,safApp=2nApp', {'DATA':
'CSISET_ACTIVE_CALLBACK', 'csiName': 'safCsi=Norm1,safSi=d_2n_1,safApp=2nApp',
'prevCsiAssigned': ['safCsi=Norm1,safSi=d_2n_2,safApp=2nApp',
'safCsi=Norm1,safSi=d_2n_1,safApp=2nApp',
'safCsi=Norm1,safSi=d_2n_2,safApp=2nApp',
'safCsi=Norm1,safSi=d_2n_1,safApp=2nApp']})
cmdResponseDelay: 0
cmdResponse: 1
cmdCsiQiescingCompleteDelay: 0
cmdCsiQiescingCompleteResponse: 1
sleeping for saAmfResponse_4: 0 secs
sending response for saAmfResponse_4: 1
[1373530601.294462919] sending response
Invoking the function <built-in function saAmfResponse_4> with the arguments
(4289724417, 4265607178L, None, 1)
('Return Value of the function : <--', 1)
in amf csi set callback csiAttrList.number 0
{}
{'csiAttr': , 'csiStateDescriptor': {'standbyDescriptor':
<saAmf.SaAmfCSIStandbyDescriptorT; proxy of <Swig Object of type
'SaAmfCSIStandbyDescriptorT *' at 0x7f758644c1e8> >, 'activeDescriptor':
<saAmf.SaAmfCSIActiveDescriptorT; proxy of <Swig Object of type
'SaAmfCSIActiveDescriptorT *' at 0x7f758644c1e8> >}, 'csiFlags': 2, 'csiName':
safCsi=Norm1,safSi=d_2n_1,safApp=2nApp}
#################################################
############ CSI SET CALLBACK ####################
in amf csi set callback Invocation: 4267704334
compName: safComp=N1,safSu=d_2n_2,safSg=SG_d_2n,safApp=2nApp
haState: 1 : CSISET_ACTIVE_CALLBACK
csiName: safCsi=Norm1,safSi=d_2n_2,safApp=2nApp
[1373530601.304487944] SockClient: Sending Packet: ('data',
'safComp=N1,safSu=d_2n_2,safSg=SG_d_2n,safApp=2nApp', {'DATA':
'CSISET_ACTIVE_CALLBACK', 'csiName': 'safCsi=Norm1,safSi=d_2n_2,safApp=2nApp',
'prevCsiAssigned': ['safCsi=Norm1,safSi=d_2n_2,safApp=2nApp',
'safCsi=Norm1,safSi=d_2n_1,safApp=2nApp',
'safCsi=Norm1,safSi=d_2n_2,safApp=2nApp',
'safCsi=Norm1,safSi=d_2n_1,safApp=2nApp',
'safCsi=Norm1,safSi=d_2n_1,safApp=2nApp']})
cmdResponseDelay: 0
cmdResponse: 1
cmdCsiQiescingCompleteDelay: 0
cmdCsiQiescingCompleteResponse: 1
sleeping for saAmfResponse_4: 0 secs
sending response for saAmfResponse_4: 1
[1373530601.314575911] sending response
Invoking the function <built-in function saAmfResponse_4> with the arguments
(4289724417, 4267704334L, None, 1)
('Return Value of the function : <--', 1)
in amf csi set callback csiAttrList.number 0
{}
{'csiAttr': , 'csiStateDescriptor': {'standbyDescriptor':
<saAmf.SaAmfCSIStandbyDescriptorT; proxy of <Swig Object of type
'SaAmfCSIStandbyDescriptorT *' at 0x7f758644c1e8> >, 'activeDescriptor':
<saAmf.SaAmfCSIActiveDescriptorT; proxy of <Swig Object of type
'SaAmfCSIActiveDescriptorT *' at 0x7f758644c1e8> >}, 'csiFlags': 2, 'csiName':
safCsi=Norm1,safSi=d_2n_2,safApp=2nApp}
#################################################
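The wrong assignment order is visible in the two callbacks above: the ACTIVE
callback for the CSI of d_2n_1 (dependent) is logged before the one for d_2n_2
(sponsor). A rough sketch of that check follows. It assumes the test
component's log format shown here (a "haState:" line followed by a "csiName:"
line) and only reflects the order seen by this one component; the function
names are hypothetical.

```python
def active_assignment_order(log_text):
    """Return SI DNs in the order their ACTIVE CSI set callbacks were logged."""
    order = []
    active = False
    for raw in log_text.splitlines():
        line = raw.strip()
        if line.startswith("haState:"):
            active = "CSISET_ACTIVE_CALLBACK" in line
        elif line.startswith("csiName:") and active:
            csi_dn = line.split(":", 1)[1].strip()
            # Drop the leading safCsi=... RDN to get the SI DN.
            si_dn = csi_dn.split(",", 1)[1]
            if si_dn not in order:
                order.append(si_dn)
            active = False
    return order


def sponsor_assigned_first(order, sponsor, dependent):
    """True only if both SIs appear and the sponsor comes before the dependent."""
    return (sponsor in order and dependent in order
            and order.index(sponsor) < order.index(dependent))


# Reduced excerpt of the SU2 component log in this ticket.
COMP_LOG_SAMPLE = """\
in amf csi set callback Invocation: 4265607178
haState: 1 : CSISET_ACTIVE_CALLBACK
csiName: safCsi=Norm1,safSi=d_2n_1,safApp=2nApp
in amf csi set callback Invocation: 4267704334
haState: 1 : CSISET_ACTIVE_CALLBACK
csiName: safCsi=Norm1,safSi=d_2n_2,safApp=2nApp
"""

order = active_assignment_order(COMP_LOG_SAMPLE)
ok = sponsor_assigned_first(
    order, "safSi=d_2n_2,safApp=2nApp", "safSi=d_2n_1,safApp=2nApp")
print("order:", order)
print("sponsor assigned first:", ok)
```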
A subsequent lock of the sponsor SI2 returned with a timeout, and
/var/log/messages printed the messages below:-
[root@OEL-64BIT-SLOT1 framework]# amf-adm lock safSi=d_2n_2,safApp=2nApp
error - command timed out (alarm)
[root@OEL-64BIT-SLOT1 ~]# Jul 11 13:49:45 OEL-64BIT-SLOT1 osafamfd[27241]: WA
SI lock of safSi=d_2n_2,safApp=2nApp failed, SG not stable
Jul 11 13:49:45 OEL-64BIT-SLOT1 osafamfd[27241]: WA 'safSi=d_2n_2,safApp=2nApp'
other semantics...
Jul 11 13:49:46 OEL-64BIT-SLOT1 osafamfd[27241]: WA SI lock of
safSi=d_2n_2,safApp=2nApp failed, SG not stable
Jul 11 13:49:46 OEL-64BIT-SLOT1 osafamfd[27241]: WA 'safSi=d_2n_2,safApp=2nApp'
other semantics...
Jul 11 13:49:47 OEL-64BIT-SLOT1 osafamfd[27241]: WA SI lock of
safSi=d_2n_2,safApp=2nApp failed, SG not stable
Jul 11 13:49:47 OEL-64BIT-SLOT1 osafamfd[27241]: WA 'safSi=d_2n_2,safApp=2nApp'
other semantics...
Jul 11 13:49:49 OEL-64BIT-SLOT1 osafamfd[27241]: WA SI lock of
safSi=d_2n_2,safApp=2nApp failed, SG not stable
Jul 11 13:49:49 OEL-64BIT-SLOT1 osafamfd[27241]: WA 'safSi=d_2n_2,safApp=2nApp'
other semantics...
Jul 11 13:49:50 OEL-64BIT-SLOT1 osafamfd[27241]: WA SI lock of
safSi=d_2n_2,safApp=2nApp failed, SG not stable
Jul 11 13:49:50 OEL-64BIT-SLOT1 osafamfd[27241]: WA 'safSi=d_2n_2,safApp=2nApp'
other semantics...
Jul 11 13:49:51 OEL-64BIT-SLOT1 osafamfd[27241]: WA SI lock of
safSi=d_2n_2,safApp=2nApp failed, SG not stable
Jul 11 13:49:51 OEL-64BIT-SLOT1 osafamfd[27241]: WA 'safSi=d_2n_2,safApp=2nApp'
other semantics...
---