Hi,
I have run the same 9 test cases sent on Mar 2 (in the review response) with
patches #1-#4 along with the attached patches (#9-#13).
Summary of the results: all 9 test cases still fail, except that in TC #2
stopping PL-4 now works.
======================================
TC #1: Configuration (comp recovery is comp failover, saAmfSutDefSUFailover is
false) and logs (New TC 1) are attached in the ticket.
1. Start SC-1, PL-3 and PL-4. SU1 is Act on PL-3 and SU2 is Standby on PL-4.
2. Stop SC-1 and kill demo. It goes for comp failover as configured. Ideally,
the node should reboot.
3. Start SC-1. After the cluster timer expires, PL-4 logs the following error
messages:
Mar 4 10:10:15 PM_PL-4 osafamfnd[10290]: CR SU-SI record addition failed, SU=
safSu=SU2,safSg=AmfDemo,safApp=AmfDemo1 : SI=safSi=AmfDemo,safApp=AmfDemo1
Mar 4 10:10:15 PM_PL-4 osafamfnd[10290]: CR SU-SI record addition failed, SU=
safSu=SU2,safSg=AmfDemo,safApp=AmfDemo1 : SI=safSi=AmfDemo1,safApp=AmfDemo1
There is no assignment given for SU1. SU2 has Standby assignments:
safSISU=safSu=PL-4\,safSg=NoRed\,safApp=OpenSAF,safSi=NoRed3,safApp=OpenSAF
saAmfSISUHAState=ACTIVE(1)
saAmfSISUHAReadinessState=READY_FOR_ASSIGNMENT(1)
safSISU=safSu=SU2\,safSg=AmfDemo\,safApp=AmfDemo1,safSi=AmfDemo,safApp=AmfDemo1
saAmfSISUHAState=STANDBY(2)
saAmfSISUHAReadinessState=READY_FOR_ASSIGNMENT(1)
safSISU=safSu=PL-3\,safSg=NoRed\,safApp=OpenSAF,safSi=NoRed2,safApp=OpenSAF
saAmfSISUHAState=ACTIVE(1)
saAmfSISUHAReadinessState=READY_FOR_ASSIGNMENT(1)
safSISU=safSu=SC-1\,safSg=2N\,safApp=OpenSAF,safSi=SC-2N,safApp=OpenSAF
saAmfSISUHAState=ACTIVE(1)
saAmfSISUHAReadinessState=READY_FOR_ASSIGNMENT(1)
safSISU=safSu=SC-1\,safSg=NoRed\,safApp=OpenSAF,safSi=NoRed1,safApp=OpenSAF
saAmfSISUHAState=ACTIVE(1)
saAmfSISUHAReadinessState=READY_FOR_ASSIGNMENT(1)
safSISU=safSu=SU2\,safSg=AmfDemo\,safApp=AmfDemo1,safSi=AmfDemo1,safApp=AmfDemo1
saAmfSISUHAState=STANDBY(2)
saAmfSISUHAReadinessState=READY_FOR_ASSIGNMENT(1)
Other problems:
a.) Any further command to lock SU1/SU2 fails with an SG unstable error.
b.) immlist of SU2 gives the result below. It prints the Standby assignment
count as 4, which is wrong:
saAmfSUNumCurrStandbySIs SA_UINT32_T 4 (0x4)
saAmfSUNumCurrActiveSIs SA_UINT32_T 0 (0x0)
c.) Even if SC-2 joins and you do a failover/switchover of SC-1, the result is
still the same as above.
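As a cross-check for item b.), the SISU records printed by amf-state siass can be tallied per SU and compared with the counter reported by immlist. The following is only a sketch in Python, not part of the patches: the function name count_ha_states is made up for illustration, and it assumes the output has the same layout as the listing above.

```python
import re
from collections import Counter

# Sample copied from the TC #1 listing (SU2 records only).
SIASS_OUTPUT = r"""
safSISU=safSu=SU2\,safSg=AmfDemo\,safApp=AmfDemo1,safSi=AmfDemo,safApp=AmfDemo1
saAmfSISUHAState=STANDBY(2)
saAmfSISUHAReadinessState=READY_FOR_ASSIGNMENT(1)
safSISU=safSu=SU2\,safSg=AmfDemo\,safApp=AmfDemo1,safSi=AmfDemo1,safApp=AmfDemo1
saAmfSISUHAState=STANDBY(2)
saAmfSISUHAReadinessState=READY_FOR_ASSIGNMENT(1)
"""


def count_ha_states(text):
    """Return a Counter keyed by (SU name, HA state name)."""
    counts = Counter()
    current_su = None
    for raw in text.splitlines():
        line = raw.strip()
        # A record starts with its DN; the SU name ends at the escaped comma.
        m = re.match(r"safSISU=safSu=([^\\,]+)", line)
        if m:
            current_su = m.group(1)
            continue
        m = re.match(r"saAmfSISUHAState=([A-Z_]+)\(\d+\)", line)
        if m and current_su is not None:
            counts[(current_su, m.group(1))] += 1
    return counts


counts = count_ha_states(SIASS_OUTPUT)
reported_standby = 4  # saAmfSUNumCurrStandbySIs from immlist in this report
listed_standby = counts[("SU2", "STANDBY")]
if listed_standby != reported_standby:
    print(f"SU2: {listed_standby} STANDBY SISU records listed, "
          f"immlist reports {reported_standby}")
```

On the sample, SU2 has two STANDBY records listed, while immlist reports saAmfSUNumCurrStandbySIs as 4.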
TC #2: After execution of TC #1, stop PL-3. In the worst case, the SU2
assignment should change to Act, but that does not happen. SU2 still holds the
Standby assignments:
safSISU=safSu=SU2\,safSg=AmfDemo\,safApp=AmfDemo1,safSi=AmfDemo1,safApp=AmfDemo1
saAmfSISUHAState=STANDBY(2)
saAmfSISUHAReadinessState=READY_FOR_ASSIGNMENT(1)
safSISU=safSu=SU2\,safSg=AmfDemo\,safApp=AmfDemo1,safSi=AmfDemo,safApp=AmfDemo1
saAmfSISUHAState=STANDBY(2)
saAmfSISUHAReadinessState=READY_FOR_ASSIGNMENT(1)
The failure messages are the same as in TC #1:
Mar 4 10:40:18 PM_PL-4 osafamfnd[12749]: CR SU-SI record addition failed, SU=
safSu=SU2,safSg=AmfDemo,safApp=AmfDemo1 : SI=safSi=AmfDemo,safApp=AmfDemo1
Mar 4 10:40:18 PM_PL-4 osafamfnd[12749]: CR SU-SI record addition failed, SU=
safSu=SU2,safSg=AmfDemo,safApp=AmfDemo1 : SI=safSi=AmfDemo1,safApp=AmfDemo1
After stopping PL-4, however, the assignments are gone, which is good, and I am
able to lock/unlock SU1.
The configuration and logs (New TC 2) are attached.
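The "SU-SI record addition failed" messages seen in TC #1 and TC #2 can be collected from syslog to see which SU/SI pairs are affected. This is a rough Python sketch, assuming each log entry is on a single line (the entries in this mail are wrapped by the mail client); the function name failed_susi is hypothetical.

```python
import re

# Matches the osafamfnd critical message quoted in TC #1 and TC #2.
PATTERN = re.compile(
    r"osafamfnd\[\d+\]: CR SU-SI record addition failed, "
    r"SU=\s*(?P<su>\S+)\s*:\s*SI=(?P<si>\S+)"
)

# The two TC #1 entries, rejoined onto one line each.
SAMPLE = [
    "Mar 4 10:10:15 PM_PL-4 osafamfnd[10290]: CR SU-SI record addition failed, "
    "SU= safSu=SU2,safSg=AmfDemo,safApp=AmfDemo1 : SI=safSi=AmfDemo,safApp=AmfDemo1",
    "Mar 4 10:10:15 PM_PL-4 osafamfnd[10290]: CR SU-SI record addition failed, "
    "SU= safSu=SU2,safSg=AmfDemo,safApp=AmfDemo1 : SI=safSi=AmfDemo1,safApp=AmfDemo1",
]


def failed_susi(lines):
    """Return (SU DN, SI DN) pairs for every failed SU-SI record addition."""
    pairs = []
    for line in lines:
        m = PATTERN.search(line)
        if m:
            pairs.append((m.group("su"), m.group("si")))
    return pairs


for su, si in failed_susi(SAMPLE):
    print(su, "->", si)
```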
TC #3: After TC #2 (before stopping PL-4), start PL-3 and start SC-2.
SU1 is instantiated but gets no assignment, and the same problem as above
remains.
When PL-4 is stopped, SU1 gets Act assignments, and the following logs appear
on SC-2:
Mar 4 10:59:22 PM_SC-2 osafamfd[11449]: ER avd_ckpt_siass:
safSu=SU2,safSg=AmfDemo,safApp=AmfDemo1 safSi=AmfDemo,safApp=AmfDemo1 does not
exist
Mar 4 10:59:22 PM_SC-2 osafamfd[11449]: ER avd_ckpt_siass:
safSu=SU2,safSg=AmfDemo,safApp=AmfDemo1 safSi=AmfDemo1,safApp=AmfDemo1 does not
exist
After starting PL-4, SU2 gets Standby assignments and everything works fine
after that.
The configuration and logs (New TC 3) are attached in the ticket.
TC #4: Similar problems exist in the following test cases:
a.) Configuration same as TC #1, except saAmfSutDefSUFailover is true.
After killing demo, PL-3 rebooted.
But the problem is the same as shown in TC #1.
The configuration and logs (New TC 4.a) are attached in the ticket.
b.) Configuration same as TC #1, except saAmfCtDefRecoveryOnError is 2
and saAmfCtDefDisableRestart is 1.
But the problem is the same as shown in TC #1, TC #2 and TC #3.
The configuration and logs (New TC 4.b) are attached in the ticket.
c.) I didn't run this one, but I expect it to have the same problem as 4.a.
TC #5: Configuration same as TC #1, except saAmfCtDefRecoveryOnError is 2.
Configuration and logs (New TC 5) are attached in the ticket.
1. Start SC-1, PL-3 and PL-4. SU1 is Act on PL-3 and SU2 is Standby on PL-4.
2. Stop SC-1 and kill demo. It goes for comp restart as configured.
3. Start SC-1. After SC-1 comes up and before the cluster timer expires, stop
PL-3:
Even though PL-3 is stopped (see below, PL-3 is not available), SU1 still has
the Act assignment and SU2 still has the Standby assignment:
PM_SC-1:/home/nagu/views/staging # date
Fri Mar 4 11:26:21 IST 2016
PM_SC-1:/home/nagu/views/staging # amf-state siass
safSISU=safSu=SU1\,safSg=AmfDemo\,safApp=AmfDemo1,safSi=AmfDemo1,safApp=AmfDemo1
saAmfSISUHAState=ACTIVE(1)
saAmfSISUHAReadinessState=READY_FOR_ASSIGNMENT(1)
safSISU=safSu=PL-4\,safSg=NoRed\,safApp=OpenSAF,safSi=NoRed2,safApp=OpenSAF
saAmfSISUHAState=ACTIVE(1)
saAmfSISUHAReadinessState=READY_FOR_ASSIGNMENT(1)
safSISU=safSu=SU2\,safSg=AmfDemo\,safApp=AmfDemo1,safSi=AmfDemo,safApp=AmfDemo1
saAmfSISUHAState=STANDBY(2)
saAmfSISUHAReadinessState=READY_FOR_ASSIGNMENT(1)
safSISU=safSu=SU1\,safSg=AmfDemo\,safApp=AmfDemo1,safSi=AmfDemo,safApp=AmfDemo1
saAmfSISUHAState=ACTIVE(1)
saAmfSISUHAReadinessState=READY_FOR_ASSIGNMENT(1)
safSISU=safSu=SU2\,safSg=AmfDemo\,safApp=AmfDemo1,safSi=AmfDemo1,safApp=AmfDemo1
saAmfSISUHAState=STANDBY(2)
saAmfSISUHAReadinessState=READY_FOR_ASSIGNMENT(1)
safSISU=safSu=SC-1\,safSg=2N\,safApp=OpenSAF,safSi=SC-2N,safApp=OpenSAF
saAmfSISUHAState=ACTIVE(1)
saAmfSISUHAReadinessState=READY_FOR_ASSIGNMENT(1)
safSISU=safSu=SC-1\,safSg=NoRed\,safApp=OpenSAF,safSi=NoRed1,safApp=OpenSAF
saAmfSISUHAState=ACTIVE(1)
saAmfSISUHAReadinessState=READY_FOR_ASSIGNMENT(1)
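The stale state in TC #5 (assignments still listed for an SU whose node is stopped) can be flagged from the amf-state siass listing. The sketch below is in Python and rests on assumptions: the SU-to-node map is taken from step 1 of TC #5 and hard-coded, and the names SU_HOST and stale_assignments are made up for illustration.

```python
import re

# Assumption for illustration: SU hosting as described in step 1 of TC #5.
SU_HOST = {"SU1": "PL-3", "SU2": "PL-4"}

# Excerpt of the TC #5 listing.
SIASS_OUTPUT = r"""
safSISU=safSu=SU1\,safSg=AmfDemo\,safApp=AmfDemo1,safSi=AmfDemo1,safApp=AmfDemo1
saAmfSISUHAState=ACTIVE(1)
safSISU=safSu=PL-4\,safSg=NoRed\,safApp=OpenSAF,safSi=NoRed2,safApp=OpenSAF
saAmfSISUHAState=ACTIVE(1)
safSISU=safSu=SU2\,safSg=AmfDemo\,safApp=AmfDemo1,safSi=AmfDemo,safApp=AmfDemo1
saAmfSISUHAState=STANDBY(2)
"""


def stale_assignments(siass_text, stopped_nodes, su_host=SU_HOST):
    """Return (SU name, SI DN) pairs listed for SUs hosted on stopped nodes."""
    stale = []
    for raw in siass_text.splitlines():
        line = raw.strip()
        if not line.startswith("safSISU="):
            continue
        # Split at the first unescaped comma: SU part on the left, SI DN on the right.
        parts = re.split(r"(?<!\\),", line, maxsplit=1)
        if len(parts) != 2:
            continue
        m = re.match(r"safSISU=safSu=([^\\,]+)", parts[0])
        if m and su_host.get(m.group(1)) in stopped_nodes:
            stale.append((m.group(1), parts[1]))
    return stale


for su, si in stale_assignments(SIASS_OUTPUT, {"PL-3"}):
    print(f"{su} still has an assignment for {si} although its node is stopped")
```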
TC #6: After TC #5, start PL-3:
SU1 is not given any assignment (maybe because it already exists in the Amfd db):
Mar 4 11:30:00 PM_PL-3 osafamfnd[29869]: NO
'safSu=PL-3,safSg=NoRed,safApp=OpenSAF' Presence State INSTANTIATING =>
INSTANTIATED
Mar 4 11:30:00 PM_PL-3 osafamfnd[29869]: NO Assigning
'safSi=NoRed4,safApp=OpenSAF' ACTIVE to 'safSu=PL-3,safSg=NoRed,safApp=OpenSAF'
Mar 4 11:30:00 PM_PL-3 osafamfnd[29869]: NO Assigned
'safSi=NoRed4,safApp=OpenSAF' ACTIVE to 'safSu=PL-3,safSg=NoRed,safApp=OpenSAF'
Mar 4 11:30:00 PM_PL-3 osafamfnd[29869]: NO
'safSu=SU1,safSg=AmfDemo,safApp=AmfDemo1' Presence State UNINSTANTIATED =>
INSTANTIATING
Mar 4 11:30:00 PM_PL-3 opensafd: OpenSAF(5.0.M0 - 7282:4fbffe857512:) services
successfully started
Mar 4 11:30:00 PM_PL-3 amf_demo[29947]:
'safComp=AmfDemo,safSu=SU1,safSg=AmfDemo,safApp=AmfDemo1' started
Mar 4 11:30:00 PM_PL-3 osafamfnd[29869]: NO
'safSu=SU1,safSg=AmfDemo,safApp=AmfDemo1' Presence State INSTANTIATING =>
INSTANTIATED
Mar 4 11:30:00 PM_PL-3 amf_demo[29947]: HC started with AMF
Mar 4 11:30:00 PM_PL-3 amf_demo[29947]: Registered with AMF
Mar 4 11:30:00 PM_PL-3 amf_demo[29947]: Health check 1
TC #7: After TC #6:
Lock SU1: Amfnd on PL-3 throws an error:
Mar 4 11:53:50 PM_PL-3 osafamfnd[31064]: ER susi_assign_evh:
'safSu=SU1,safSg=AmfDemo,safApp=AmfDemo1' has no assignments
This is expected, because Amfnd doesn't have any assignment.
The SU1 admin state is locked, but the SUSI is still shown on SU1.
TC #8: After TC #7:
Lock SU1; it throws an error:
Mar 4 11:59:51.406386 osafamfd [8859:su.cc:1146] >> su_admin_op_cb:
60129542146, 'safSu=SU1,safSg=AmfDemo,safApp=AmfDemo1', 2
Mar 4 11:59:51.406401 osafamfd [8859:imm.cc:1998] >> report_admin_op_error:
inv:60129542146, res:6, Error String: 'SG state is not stable'
TC #9: Same as TC #6, except saAmfCtDefRecoveryOnError is configured as Node
Switchover/Failover/Failfast.
The problem reported in TC #4 exists here as well.
Thanks
-Nagu
> -----Original Message-----
> From: Nagendra Kumar
> Sent: 02 March 2016 20:42
> To: Minh Hon Chau; [email protected];
> [email protected]; Praveen Malviya
> Cc: [email protected]
> Subject: Re: [devel] [PATCH 01 of 15] amfd: Add support for cloud resilience
> at common libs [#1620]
>
> #1 I have applied patches #1 to #4 only. With this patches(not having patch
> #6), I thought to have passed most of the following tests, but they got
> failed(Listed below).
>
>
>
> I could not test other scenarios (including alarms and notifications), because
> I haven't applied patch #6. I think there should be a simple patch replacing
> patch #6, which handles transient state as 'reboot the node' if Amf finds SUSI
> in transient state on that node.
>
> I am attaching a concept patch(assignment_recovery.patch), which pass
> some of the scenarios and we are testing and enhancing it.
>
> As Praveen has suggested that we need to reboot the node which is
> undergoing in transient state to make it simple.
>
> This patch reduces complexity and maintainability.
>
>
>
> So, ACK for patch #1-#4 along with the attached patch.
>
> Please note that the attached patch has been created on patch #6 of yours,
> so please apply #1 to #4 and then #6 and then the attached patch.
>
> Currently the patch is for 2N red model. We are working to make for Nway
> Act and No red model (and possibly for Nway and NpM), we will publish it
> tomorrow.
>
>
>
> TC #1:
>
> Configuration(Comp recovery is comp failover, saAmfSutDefSUFailover as
> false) and logs attached(TC 1) in the ticket.
>
> 1. Start SC-1, PL-3 and PL-4. SU1 Act on PL-3 and SU2 Standby on SC-2.
>
> 2. Stop SC-1 and kill demo. It goes for comp failover as configured. Ideally,
> node should reboot.
>
> 3. Start SC-1. After cluster timer expires, PL-4 got the following error
> messages:
>
>
>
> Mar 2 08:01:15 PM_PL-4 osafamfnd[20050]: CR SU-SI record addition failed,
> SU= safSu=SU2,safSg=AmfDemo,safApp=AmfDemo1 :
> SI=safSi=AmfDemo,safApp=AmfDemo1
>
> Mar 2 08:01:15 PM_PL-4 osafamfnd[20050]: CR SU-SI record addition failed,
> SU= safSu=SU2,safSg=AmfDemo,safApp=AmfDemo1 :
> SI=safSi=AmfDemo1,safApp=AmfDemo1
>
>
>
> There is no assignment given for SU1. SU2 has Standby assignments:
>
> safSISU=safSu=SU2\,safSg=AmfDemo\,safApp=AmfDemo1,safSi=AmfDemo,safApp=AmfDemo1
>
> saAmfSISUHAState=STANDBY(2)
>
> saAmfSISUHAReadinessState=READY_FOR_ASSIGNMENT(1)
>
> safSISU=safSu=SU2\,safSg=AmfDemo\,safApp=AmfDemo1,safSi=AmfDemo1,safApp=AmfDemo1
>
> saAmfSISUHAState=STANDBY(2)
>
>
>
> Other problems: a.) Further command for locking SU1/SU2 fails in SG
> unstable error.
>
> b.) Immlist if SU2 gives the below result, Standby assignment it prints
> as 4, which is wrong:
>
> saAmfSUNumCurrStandbySIs SA_UINT32_T 4 (0x4)
>
> saAmfSUNumCurrActiveSIs SA_UINT32_T 0 (0x0)
>
> c.) Even if SC-2 joins, and you do failover/switchover of SC-1, still
> same as above.
>
>
>
> TC #2: After execution of TC #1, stop PL-3. In worst case, SU2 assignment
> should change to Act, which is not happening. After stopping of PL-4 also,
> the same problems as TC #1. logs attached(TC 2).
>
>
>
> TC #3: After TC #2, start PL-3 and start SC-2.
>
> SU1 is instantiated, but no assignment and the same problem as
> above.
>
> When stop PL-4, SU1 gets assignments, the following logs comes at SC-2:
>
>
>
> Mar 2 09:06:18 PM_SC-2 osafamfd[8518]: ER avd_ckpt_siass:
> safSu=SU2,safSg=AmfDemo,safApp=AmfDemo1
> safSi=AmfDemo,safApp=AmfDemo1 does not exist
>
> Mar 2 09:06:18 PM_SC-2 osafamfd[8518]: ER avd_ckpt_siass:
> safSu=SU2,safSg=AmfDemo,safApp=AmfDemo1
> safSi=AmfDemo1,safApp=AmfDemo1 does not exist
>
> Mar 2 09:06:21 PM_SC-2 kernel: [ 3290.784933] tipc: Resetting link
> <1.1.2:eth0-1.1.4:eth0>, peer not responding
>
> Mar 2 09:06:21 PM_SC-2 kernel: [ 3290.784947] tipc: Lost link
> <1.1.2:eth0-1.1.4:eth0> on network plane A
>
> Mar 2 09:06:21 PM_SC-2 kernel: [ 3290.784956] tipc: Lost contact with
> <1.1.4>
>
>
>
> Start PL-4, SU2 gets Standby assignments and everything works fine after
> that.
>
>
>
> TC #4: Similar problems exist in the following test cases:
>
> a.) Configuration same as TC #1 except saAmfSutDefSUFailover as true.
>
> After killing demo, PL-3 went for reboot.
>
> But the problem is the same as shown in TC #1, TC #2 and TC #3.
>
>
>
> b.) Configuration same as TC #1 except with saAmfCtDefRecoveryOnError
> as 2 and saAmfCtDefDisableRestart as 1.
>
> But the problem is the same as shown in TC #1, TC #2 and TC #3.
>
>
>
> c.) Configuration same as TC #1 except with saAmfCtDefRecoveryOnError
> as 2 and saAmfCtDefDisableRestart as 1 and saAmfSutDefSUFailover as 1.
>
> After killing demo, PL-3 went for reboot.
>
> But the problem is the same as shown in TC #1, TC #2 and TC #3.
>
>
>
> TC #5: Configuration same as TC #1 except with
> saAmfCtDefRecoveryOnError as 2. Configuration and logs(TC 5) attached in
> ticket.
>
> 1. Start SC-1, PL-3 and PL-4. SU1 Act on PL-3 and SU2 Standby on SC-2.
>
> 2. Stop SC-1 and kill demo. It goes for comp restart as configured.
>
> 3. Start SC-1. After SC-1 comes up and before cluster timer expires, stop
> PL-3:
>
>
>
> Even if PL-3 is stopped(see below PL-3 is not available), SU1 is still having
> Act assignment and SU2 is having Standby assignment:
>
>
>
> PM_SC-1:/home/nagu/views/staging # amf-state siass
>
> safSISU=safSu=SU2\,safSg=AmfDemo\,safApp=AmfDemo1,safSi=AmfDemo1,safApp=AmfDemo1
>
> saAmfSISUHAState=STANDBY(2)
>
> saAmfSISUHAReadinessState=READY_FOR_ASSIGNMENT(1)
>
> safSISU=safSu=SU2\,safSg=AmfDemo\,safApp=AmfDemo1,safSi=AmfDemo,safApp=AmfDemo1
>
> saAmfSISUHAState=STANDBY(2)
>
> saAmfSISUHAReadinessState=READY_FOR_ASSIGNMENT(1)
>
> safSISU=safSu=SC-1\,safSg=NoRed\,safApp=OpenSAF,safSi=NoRed1,safApp=OpenSAF
>
> saAmfSISUHAState=ACTIVE(1)
>
> saAmfSISUHAReadinessState=READY_FOR_ASSIGNMENT(1)
>
> safSISU=safSu=PL-4\,safSg=NoRed\,safApp=OpenSAF,safSi=NoRed3,safApp=OpenSAF
>
> saAmfSISUHAState=ACTIVE(1)
>
> saAmfSISUHAReadinessState=READY_FOR_ASSIGNMENT(1)
>
> safSISU=safSu=SU1\,safSg=AmfDemo\,safApp=AmfDemo1,safSi=AmfDemo,safApp=AmfDemo1
>
> saAmfSISUHAState=ACTIVE(1)
>
> saAmfSISUHAReadinessState=READY_FOR_ASSIGNMENT(1)
>
> safSISU=safSu=SC-1\,safSg=2N\,safApp=OpenSAF,safSi=SC-2N,safApp=OpenSAF
>
> saAmfSISUHAState=ACTIVE(1)
>
> saAmfSISUHAReadinessState=READY_FOR_ASSIGNMENT(1)
>
> safSISU=safSu=SU1\,safSg=AmfDemo\,safApp=AmfDemo1,safSi=AmfDemo1,safApp=AmfDemo1
>
> saAmfSISUHAState=ACTIVE(1)
>
> saAmfSISUHAReadinessState=READY_FOR_ASSIGNMENT(1)
>
>
>
> TC #6: After TC #5, start PL-3:
>
> SU1 is not given any assignment (may be because it exists in Amfd db):
>
> Mar 2 14:22:06 PM_PL-3 osafamfwd[8318]: Started
>
> Mar 2 14:22:06 PM_PL-3 osafamfnd[8259]: NO
> 'safSu=PL-3,safSg=NoRed,safApp=OpenSAF' Presence State INSTANTIATING =>
> INSTANTIATED
>
> Mar 2 14:22:06 PM_PL-3 osafamfnd[8259]: NO Assigning
> 'safSi=NoRed2,safApp=OpenSAF' ACTIVE to 'safSu=PL-3,safSg=NoRed,safApp=OpenSAF'
>
> Mar 2 14:22:06 PM_PL-3 osafamfnd[8259]: NO Assigned
> 'safSi=NoRed2,safApp=OpenSAF' ACTIVE to 'safSu=PL-3,safSg=NoRed,safApp=OpenSAF'
>
> Mar 2 14:22:06 PM_PL-3 osafamfnd[8259]: NO
> 'safSu=SU1,safSg=AmfDemo,safApp=AmfDemo1' Presence State
> UNINSTANTIATED => INSTANTIATING
>
> Mar 2 14:22:06 PM_PL-3 opensafd: OpenSAF(5.0.M0 - 7282:4fbffe857512:)
> services successfully started
>
> Mar 2 14:22:06 PM_PL-3 amf_demo[8337]:
> 'safComp=AmfDemo,safSu=SU1,safSg=AmfDemo,safApp=AmfDemo1'
> started
>
> Mar 2 14:22:06 PM_PL-3 osafamfnd[8259]: NO
> 'safSu=SU1,safSg=AmfDemo,safApp=AmfDemo1' Presence State
> INSTANTIATING => INSTANTIATED
>
> Mar 2 14:22:06 PM_PL-3 amf_demo[8337]: HC started with AMF
>
>
>
> TC #7: After TC #6:
>
> Lock SU1: Amfnd of PL-3 throws error:
>
> Mar 2 14:23:57 PM_PL-3 osafamfnd[8259]: ER susi_assign_evh:
> 'safSu=SU1,safSg=AmfDemo,safApp=AmfDemo1' has no assignments
>
>
>
> This is obvious because, Amfnd doesn't have any assignment.
>
> SU1 admin state is locked, but SUSI is being shown on SU1.
>
>
>
> TC #8: After TC #7:
>
> Lock SU1, it throws error:
>
> Admin operation is already going on
> (su'safSu=SU1,safSg=AmfDemo,safApp=AmfDemo1
>
>
>
> TC #9: Same as TC #6 except Configure saAmfCtDefRecoveryOnError as
> Node Switchover/Failover/Failfast.
>
> The problem reported in TC #4 exists.
>
>
>
> Thanks
>
> -Nagu
>
>
>
> > -----Original Message-----
>
> > From: Minh Hon Chau [mailto:[email protected]]
>
> > Sent: 25 February 2016 14:14
>
> > To: [email protected]; [email protected]; Nagendra
>
> > Kumar; Praveen Malviya; [email protected]
>
> > Cc: [email protected]
>
> > Subject: [PATCH 01 of 15] amfd: Add support for cloud resilience at common
>
> > libs [#1620]
bug_09_13.tgz
Description: Binary data
