Hi,

I have rerun the same 9 test cases sent on Mar 2 (in the review response) with patches #1-#4 along with the attached patches (#9-#13).
Summary of the results: all 9 test cases failed; the only part that worked was stopping PL-4 in TC #2.

======================================

TC #1: Configuration (comp recovery is comp failover, saAmfSutDefSUFailover is false) and logs are attached (New TC 1) in the ticket.

1. Start SC-1, PL-3 and PL-4. SU1 is Act on PL-3 and SU2 is Standby on PL-4.
2. Stop SC-1 and kill demo. It goes for comp failover as configured. Ideally, the node should reboot.
3. Start SC-1. After the cluster timer expires, PL-4 logs the following error messages:

Mar 4 10:10:15 PM_PL-4 osafamfnd[10290]: CR SU-SI record addition failed, SU= safSu=SU2,safSg=AmfDemo,safApp=AmfDemo1 : SI=safSi=AmfDemo,safApp=AmfDemo1
Mar 4 10:10:15 PM_PL-4 osafamfnd[10290]: CR SU-SI record addition failed, SU= safSu=SU2,safSg=AmfDemo,safApp=AmfDemo1 : SI=safSi=AmfDemo1,safApp=AmfDemo1

No assignment is given to SU1. SU2 has Standby assignments:

safSISU=safSu=PL-4\,safSg=NoRed\,safApp=OpenSAF,safSi=NoRed3,safApp=OpenSAF
        saAmfSISUHAState=ACTIVE(1)
        saAmfSISUHAReadinessState=READY_FOR_ASSIGNMENT(1)
safSISU=safSu=SU2\,safSg=AmfDemo\,safApp=AmfDemo1,safSi=AmfDemo,safApp=AmfDemo1
        saAmfSISUHAState=STANDBY(2)
        saAmfSISUHAReadinessState=READY_FOR_ASSIGNMENT(1)
safSISU=safSu=PL-3\,safSg=NoRed\,safApp=OpenSAF,safSi=NoRed2,safApp=OpenSAF
        saAmfSISUHAState=ACTIVE(1)
        saAmfSISUHAReadinessState=READY_FOR_ASSIGNMENT(1)
safSISU=safSu=SC-1\,safSg=2N\,safApp=OpenSAF,safSi=SC-2N,safApp=OpenSAF
        saAmfSISUHAState=ACTIVE(1)
        saAmfSISUHAReadinessState=READY_FOR_ASSIGNMENT(1)
safSISU=safSu=SC-1\,safSg=NoRed\,safApp=OpenSAF,safSi=NoRed1,safApp=OpenSAF
        saAmfSISUHAState=ACTIVE(1)
        saAmfSISUHAReadinessState=READY_FOR_ASSIGNMENT(1)
safSISU=safSu=SU2\,safSg=AmfDemo\,safApp=AmfDemo1,safSi=AmfDemo1,safApp=AmfDemo1
        saAmfSISUHAState=STANDBY(2)
        saAmfSISUHAReadinessState=READY_FOR_ASSIGNMENT(1)

Other problems:
a.) Further commands to lock SU1/SU2 fail with an SG unstable error.
b.) immlist of SU2 gives the result below; it prints the Standby assignment count as 4, which is wrong:
        saAmfSUNumCurrStandbySIs SA_UINT32_T 4 (0x4)
        saAmfSUNumCurrActiveSIs SA_UINT32_T 0 (0x0)
c.) Even if SC-2 joins and you do a failover/switchover of SC-1, the result is still the same as above.

TC #2: After execution of TC #1, stop PL-3. In the worst case, the SU2 assignment should change to Act, which does not happen. SU2 still holds the Standby assignments:

safSISU=safSu=SU2\,safSg=AmfDemo\,safApp=AmfDemo1,safSi=AmfDemo1,safApp=AmfDemo1
        saAmfSISUHAState=STANDBY(2)
        saAmfSISUHAReadinessState=READY_FOR_ASSIGNMENT(1)
safSISU=safSu=SU2\,safSg=AmfDemo\,safApp=AmfDemo1,safSi=AmfDemo,safApp=AmfDemo1
        saAmfSISUHAState=STANDBY(2)
        saAmfSISUHAReadinessState=READY_FOR_ASSIGNMENT(1)

The failure messages are the same as in TC #1:

Mar 4 10:40:18 PM_PL-4 osafamfnd[12749]: CR SU-SI record addition failed, SU= safSu=SU2,safSg=AmfDemo,safApp=AmfDemo1 : SI=safSi=AmfDemo,safApp=AmfDemo1
Mar 4 10:40:18 PM_PL-4 osafamfnd[12749]: CR SU-SI record addition failed, SU= safSu=SU2,safSg=AmfDemo,safApp=AmfDemo1 : SI=safSi=AmfDemo1,safApp=AmfDemo1

But after stopping PL-4, the assignments are gone, which is good. I am able to lock/unlock SU1. The configuration and logs are attached (New TC 2).

TC #3: After TC #2 (before stopping PL-4), start PL-3 and start SC-2. SU1 is instantiated but gets no assignment, and the same problem as above is seen. When PL-4 is stopped, SU1 gets Act assignments and the following logs appear at SC-2:

Mar 4 10:59:22 PM_SC-2 osafamfd[11449]: ER avd_ckpt_siass: safSu=SU2,safSg=AmfDemo,safApp=AmfDemo1 safSi=AmfDemo,safApp=AmfDemo1 does not exist
Mar 4 10:59:22 PM_SC-2 osafamfd[11449]: ER avd_ckpt_siass: safSu=SU2,safSg=AmfDemo,safApp=AmfDemo1 safSi=AmfDemo1,safApp=AmfDemo1 does not exist

Start PL-4: SU2 gets Standby assignments and everything works fine after that. The configuration and logs are attached (New TC 3) in the ticket.

TC #4: Similar problems exist in the following test cases:
a.) Configuration same as TC #1 except saAmfSutDefSUFailover as true.
After killing demo, PL-3 went for reboot. But the problem is the same as shown in TC #1. The configuration and logs are attached (New TC 4.a) in the ticket.
b.) Configuration same as TC #1 except with saAmfCtDefRecoveryOnError as 2 and saAmfCtDefDisableRestart as 1. But the problem is the same as shown in TC #1, TC #2 and TC #3. The configuration and logs are attached (New TC 4.b) in the ticket.
c.) I did not run this one, but I expect it to have the same problem as 4.a.

TC #5: Configuration same as TC #1 except with saAmfCtDefRecoveryOnError as 2. Configuration and logs (New TC 5) are attached in the ticket.

1. Start SC-1, PL-3 and PL-4. SU1 is Act on PL-3 and SU2 is Standby on PL-4.
2. Stop SC-1 and kill demo. It goes for comp restart as configured.
3. Start SC-1. After SC-1 comes up and before the cluster timer expires, stop PL-3.

Even though PL-3 is stopped (see below, PL-3 is not available), SU1 still has the Act assignments and SU2 still has the Standby assignments:

PM_SC-1:/home/nagu/views/staging # date
Fri Mar 4 11:26:21 IST 2016
PM_SC-1:/home/nagu/views/staging # amf-state siass
safSISU=safSu=SU1\,safSg=AmfDemo\,safApp=AmfDemo1,safSi=AmfDemo1,safApp=AmfDemo1
        saAmfSISUHAState=ACTIVE(1)
        saAmfSISUHAReadinessState=READY_FOR_ASSIGNMENT(1)
safSISU=safSu=PL-4\,safSg=NoRed\,safApp=OpenSAF,safSi=NoRed2,safApp=OpenSAF
        saAmfSISUHAState=ACTIVE(1)
        saAmfSISUHAReadinessState=READY_FOR_ASSIGNMENT(1)
safSISU=safSu=SU2\,safSg=AmfDemo\,safApp=AmfDemo1,safSi=AmfDemo,safApp=AmfDemo1
        saAmfSISUHAState=STANDBY(2)
        saAmfSISUHAReadinessState=READY_FOR_ASSIGNMENT(1)
safSISU=safSu=SU1\,safSg=AmfDemo\,safApp=AmfDemo1,safSi=AmfDemo,safApp=AmfDemo1
        saAmfSISUHAState=ACTIVE(1)
        saAmfSISUHAReadinessState=READY_FOR_ASSIGNMENT(1)
safSISU=safSu=SU2\,safSg=AmfDemo\,safApp=AmfDemo1,safSi=AmfDemo1,safApp=AmfDemo1
        saAmfSISUHAState=STANDBY(2)
        saAmfSISUHAReadinessState=READY_FOR_ASSIGNMENT(1)
safSISU=safSu=SC-1\,safSg=2N\,safApp=OpenSAF,safSi=SC-2N,safApp=OpenSAF
        saAmfSISUHAState=ACTIVE(1)
        saAmfSISUHAReadinessState=READY_FOR_ASSIGNMENT(1)
safSISU=safSu=SC-1\,safSg=NoRed\,safApp=OpenSAF,safSi=NoRed1,safApp=OpenSAF
        saAmfSISUHAState=ACTIVE(1)
        saAmfSISUHAReadinessState=READY_FOR_ASSIGNMENT(1)

TC #6: After TC #5, start PL-3. SU1 is not given any assignment (maybe because it exists in the Amfd db):

Mar 4 11:30:00 PM_PL-3 osafamfnd[29869]: NO 'safSu=PL-3,safSg=NoRed,safApp=OpenSAF' Presence State INSTANTIATING => INSTANTIATED
Mar 4 11:30:00 PM_PL-3 osafamfnd[29869]: NO Assigning 'safSi=NoRed4,safApp=OpenSAF' ACTIVE to 'safSu=PL-3,safSg=NoRed,safApp=OpenSAF'
Mar 4 11:30:00 PM_PL-3 osafamfnd[29869]: NO Assigned 'safSi=NoRed4,safApp=OpenSAF' ACTIVE to 'safSu=PL-3,safSg=NoRed,safApp=OpenSAF'
Mar 4 11:30:00 PM_PL-3 osafamfnd[29869]: NO 'safSu=SU1,safSg=AmfDemo,safApp=AmfDemo1' Presence State UNINSTANTIATED => INSTANTIATING
Mar 4 11:30:00 PM_PL-3 opensafd: OpenSAF(5.0.M0 - 7282:4fbffe857512:) services successfully started
Mar 4 11:30:00 PM_PL-3 amf_demo[29947]: 'safComp=AmfDemo,safSu=SU1,safSg=AmfDemo,safApp=AmfDemo1' started
Mar 4 11:30:00 PM_PL-3 osafamfnd[29869]: NO 'safSu=SU1,safSg=AmfDemo,safApp=AmfDemo1' Presence State INSTANTIATING => INSTANTIATED
Mar 4 11:30:00 PM_PL-3 amf_demo[29947]: HC started with AMF
Mar 4 11:30:00 PM_PL-3 amf_demo[29947]: Registered with AMF
Mar 4 11:30:00 PM_PL-3 amf_demo[29947]: Health check 1

TC #7: After TC #6, lock SU1. Amfnd on PL-3 throws an error:

Mar 4 11:53:50 PM_PL-3 osafamfnd[31064]: ER susi_assign_evh: 'safSu=SU1,safSg=AmfDemo,safApp=AmfDemo1' has no assignments

This is expected, because Amfnd does not have any assignment. The SU1 admin state is locked, but the SUSI is still shown on SU1.
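
The amf-state siass listings in the test cases above can be tallied per SU with a small script, which makes a mismatch like the one in TC #1 b.) (immlist showing 4 Standby SIs for SU2 while siass lists 2) easier to spot. The following is only a sketch in Python: the names parse_siass and count_by_su are hypothetical, and the parsing assumes the output layout exactly as pasted in this mail (a safSISU= line followed by its attribute lines).

```python
import re
from collections import Counter

# A safSISU DN: the SU part uses "\," for its own commas, so the first
# unescaped comma separates the SU DN from the SI DN.
SISU_RE = re.compile(r"^safSISU=((?:\\,|[^,])+),(.+)$")
HA_RE = re.compile(r"^saAmfSISUHAState=([A-Z_]+)\(\d+\)$")


def parse_siass(text):
    """Return a list of (su_dn, si_dn, ha_state) tuples from siass output."""
    records = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        m = SISU_RE.match(line)
        if m:
            current = (m.group(1).replace("\\,", ","), m.group(2))
            continue
        m = HA_RE.match(line)
        if m and current is not None:
            records.append((current[0], current[1], m.group(1)))
            current = None
    return records


def count_by_su(records):
    """Count assignments per (su_dn, ha_state)."""
    counts = Counter()
    for su_dn, _si_dn, ha_state in records:
        counts[(su_dn, ha_state)] += 1
    return counts


# Three entries copied from the TC #1 listing in this mail.
SAMPLE = r"""
safSISU=safSu=PL-4\,safSg=NoRed\,safApp=OpenSAF,safSi=NoRed3,safApp=OpenSAF
        saAmfSISUHAState=ACTIVE(1)
        saAmfSISUHAReadinessState=READY_FOR_ASSIGNMENT(1)
safSISU=safSu=SU2\,safSg=AmfDemo\,safApp=AmfDemo1,safSi=AmfDemo,safApp=AmfDemo1
        saAmfSISUHAState=STANDBY(2)
        saAmfSISUHAReadinessState=READY_FOR_ASSIGNMENT(1)
safSISU=safSu=SU2\,safSg=AmfDemo\,safApp=AmfDemo1,safSi=AmfDemo1,safApp=AmfDemo1
        saAmfSISUHAState=STANDBY(2)
        saAmfSISUHAReadinessState=READY_FOR_ASSIGNMENT(1)
"""

if __name__ == "__main__":
    for (su_dn, ha_state), n in sorted(count_by_su(parse_siass(SAMPLE)).items()):
        print(f"{su_dn}: {ha_state} x{n}")
```

The per-SU Standby count from this tally can then be compared by hand with saAmfSUNumCurrStandbySIs from immlist; the script does not query IMM itself.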

TC #8: After TC #7, lock SU1; it throws an error:

Mar 4 11:59:51.406386 osafamfd [8859:su.cc:1146] >> su_admin_op_cb: 60129542146, 'safSu=SU1,safSg=AmfDemo,safApp=AmfDemo1', 2
Mar 4 11:59:51.406401 osafamfd [8859:imm.cc:1998] >> report_admin_op_error: inv:60129542146, res:6, Error String: 'SG state is not stable'

TC #9: Same as TC #6, except saAmfCtDefRecoveryOnError is configured as Node Switchover/Failover/Failfast. The problem reported in TC #4 exists.

Thanks
-Nagu

> -----Original Message-----
> From: Nagendra Kumar
> Sent: 02 March 2016 20:42
> To: Minh Hon Chau; hans.nordeb...@ericsson.com; gary....@dektech.com.au; Praveen Malviya
> Cc: opensaf-devel@lists.sourceforge.net
> Subject: Re: [devel] [PATCH 01 of 15] amfd: Add support for cloud resilience at common libs [#1620]
>
> #1 I have applied patches #1 to #4 only. With this patches(not having patch #6), I thought to have passed most of the following tests, but they got failed(Listed below).
>
> I could not test other scenarios (including alarms and notifications), because I haven't applied patch #6. I think there should be a simple patch replacing patch #6, which handles transient state as 'reboot the node' if Amf finds SUSI in transient state on that node.
>
> I am attaching a concept patch(assignment_recovery.patch), which pass some of the scenarios and we are testing and enhancing it.
>
> As Praveen has suggested that we need to reboot the node which is undergoing in transient state to make it simple.
>
> This patch reduces complexity and maintainability.
>
> So, ACK for patch #1-#4 along with the attached patch.
>
> Please note that the attached patch has been created on patch #6 of yours, so please apply #1 to #4 and then #6 and then the attached patch.
>
> Currently the patch is for 2N red model. We are working to make for Nway Act and No red model (and possibly for Nway and NpM), we will publish it tomorrow.
>
> TC #1:
> Configuration(Comp recovery is comp failover, saAmfSutDefSUFailover as false) and logs attached(TC 1) in the ticket.
> 1. Start SC-1, PL-3 and PL-4. SU1 Act on PL-3 and SU2 Standby on SC-2.
> 2. Stop SC-1 and kill demo. It goes for comp failover as configured. Ideally, node should reboot.
> 3. Start SC-1. After cluster timer expires, PL-4 got the following error messages:
>
> Mar 2 08:01:15 PM_PL-4 osafamfnd[20050]: CR SU-SI record addition failed, SU= safSu=SU2,safSg=AmfDemo,safApp=AmfDemo1 : SI=safSi=AmfDemo,safApp=AmfDemo1
> Mar 2 08:01:15 PM_PL-4 osafamfnd[20050]: CR SU-SI record addition failed, SU= safSu=SU2,safSg=AmfDemo,safApp=AmfDemo1 : SI=safSi=AmfDemo1,safApp=AmfDemo1
>
> There is no assignment given for SU1. SU2 has Standby assignments:
> safSISU=safSu=SU2\,safSg=AmfDemo\,safApp=AmfDemo1,safSi=AmfDemo,safApp=AmfDemo1
>         saAmfSISUHAState=STANDBY(2)
>         saAmfSISUHAReadinessState=READY_FOR_ASSIGNMENT(1)
> safSISU=safSu=SU2\,safSg=AmfDemo\,safApp=AmfDemo1,safSi=AmfDemo1,safApp=AmfDemo1
>         saAmfSISUHAState=STANDBY(2)
>
> Other problems: a.) Further command for locking SU1/SU2 fails in SG unstable error.
> b.) Immlist if SU2 gives the below result, Standby assignment it prints as 4, which is wrong:
>         saAmfSUNumCurrStandbySIs SA_UINT32_T 4 (0x4)
>         saAmfSUNumCurrActiveSIs SA_UINT32_T 0 (0x0)
> c.) Even if SC-2 joins, and you do failover/switchover of SC-1, still same as above.
>
> TC #2: After execution of TC #1, stop PL-3. In worst case, SU2 assignment should change to Act, which is not happening. After stopping of PL-4 also, the same problems as TC #1. logs attached(TC 2).
>
> TC #3: After TC #2, start PL-3 and start SC-2.
> SU1 is instantiated, but no assignment and the same problem as above.
> When stop PL-4, SU1 gets assignments, the following logs comes at SC-2:
>
> Mar 2 09:06:18 PM_SC-2 osafamfd[8518]: ER avd_ckpt_siass: safSu=SU2,safSg=AmfDemo,safApp=AmfDemo1 safSi=AmfDemo,safApp=AmfDemo1 does not exist
> Mar 2 09:06:18 PM_SC-2 osafamfd[8518]: ER avd_ckpt_siass: safSu=SU2,safSg=AmfDemo,safApp=AmfDemo1 safSi=AmfDemo1,safApp=AmfDemo1 does not exist
> Mar 2 09:06:21 PM_SC-2 kernel: [ 3290.784933] tipc: Resetting link <1.1.2:eth0-1.1.4:eth0>, peer not responding
> Mar 2 09:06:21 PM_SC-2 kernel: [ 3290.784947] tipc: Lost link <1.1.2:eth0-1.1.4:eth0> on network plane A
> Mar 2 09:06:21 PM_SC-2 kernel: [ 3290.784956] tipc: Lost contact with <1.1.4>
>
> Start PL-4, SU2 gets Standby assignments and everything works fine after that.
>
> TC #4: Similar problems exist in the following test cases:
> a.) Configuration same as TC #1 except saAmfSutDefSUFailover as true.
> After killing demo, PL-3 went for reboot.
> But the problem is the same as shown in TC #1, TC #2 and TC #3.
>
> b.) Configuration same as TC #1 except with saAmfCtDefRecoveryOnError as 2 and saAmfCtDefDisableRestart as 1.
> But the problem is the same as shown in TC #1, TC #2 and TC #3.
>
> c.) Configuration same as TC #1 except with saAmfCtDefRecoveryOnError as 2 and saAmfCtDefDisableRestart as 1 and saAmfSutDefSUFailover as 1.
> After killing demo, PL-3 went for reboot.
> But the problem is the same as shown in TC #1, TC #2 and TC #3.
>
> TC #5: Configuration same as TC #1 except with saAmfCtDefRecoveryOnError as 2. Configuration and logs(TC 5) attached in ticket.
> 1. Start SC-1, PL-3 and PL-4. SU1 Act on PL-3 and SU2 Standby on SC-2.
> 2. Stop SC-1 and kill demo. It goes for comp restart as configured.
> 3. Start SC-1. After SC-1 comes up and before cluster timer expires, stop PL-3:
>
> Even if PL-3 is stopped(see below PL-3 is not available), SU1 is still having Act assignment and SU2 is having Standby assignment:
>
> PM_SC-1:/home/nagu/views/staging # amf-state siass
> safSISU=safSu=SU2\,safSg=AmfDemo\,safApp=AmfDemo1,safSi=AmfDemo1,safApp=AmfDemo1
>         saAmfSISUHAState=STANDBY(2)
>         saAmfSISUHAReadinessState=READY_FOR_ASSIGNMENT(1)
> safSISU=safSu=SU2\,safSg=AmfDemo\,safApp=AmfDemo1,safSi=AmfDemo,safApp=AmfDemo1
>         saAmfSISUHAState=STANDBY(2)
>         saAmfSISUHAReadinessState=READY_FOR_ASSIGNMENT(1)
> safSISU=safSu=SC-1\,safSg=NoRed\,safApp=OpenSAF,safSi=NoRed1,safApp=OpenSAF
>         saAmfSISUHAState=ACTIVE(1)
>         saAmfSISUHAReadinessState=READY_FOR_ASSIGNMENT(1)
> safSISU=safSu=PL-4\,safSg=NoRed\,safApp=OpenSAF,safSi=NoRed3,safApp=OpenSAF
>         saAmfSISUHAState=ACTIVE(1)
>         saAmfSISUHAReadinessState=READY_FOR_ASSIGNMENT(1)
> safSISU=safSu=SU1\,safSg=AmfDemo\,safApp=AmfDemo1,safSi=AmfDemo,safApp=AmfDemo1
>         saAmfSISUHAState=ACTIVE(1)
>         saAmfSISUHAReadinessState=READY_FOR_ASSIGNMENT(1)
> safSISU=safSu=SC-1\,safSg=2N\,safApp=OpenSAF,safSi=SC-2N,safApp=OpenSAF
>         saAmfSISUHAState=ACTIVE(1)
>         saAmfSISUHAReadinessState=READY_FOR_ASSIGNMENT(1)
> safSISU=safSu=SU1\,safSg=AmfDemo\,safApp=AmfDemo1,safSi=AmfDemo1,safApp=AmfDemo1
>         saAmfSISUHAState=ACTIVE(1)
>         saAmfSISUHAReadinessState=READY_FOR_ASSIGNMENT(1)
>
> TC #6: After TC #5, start PL-3:
> SU1 is not given any assignment (may be because it exists in Amfd db):
> Mar 2 14:22:06 PM_PL-3 osafamfwd[8318]: Started
> Mar 2 14:22:06 PM_PL-3 osafamfnd[8259]: NO 'safSu=PL-3,safSg=NoRed,safApp=OpenSAF' Presence State INSTANTIATING => INSTANTIATED
> Mar 2 14:22:06 PM_PL-3 osafamfnd[8259]: NO Assigning 'safSi=NoRed2,safApp=OpenSAF' ACTIVE to 'safSu=PL-3,safSg=NoRed,safApp=OpenSAF'
> Mar 2 14:22:06 PM_PL-3 osafamfnd[8259]: NO Assigned 'safSi=NoRed2,safApp=OpenSAF' ACTIVE to 'safSu=PL-3,safSg=NoRed,safApp=OpenSAF'
> Mar 2 14:22:06 PM_PL-3 osafamfnd[8259]: NO 'safSu=SU1,safSg=AmfDemo,safApp=AmfDemo1' Presence State UNINSTANTIATED => INSTANTIATING
> Mar 2 14:22:06 PM_PL-3 opensafd: OpenSAF(5.0.M0 - 7282:4fbffe857512:) services successfully started
> Mar 2 14:22:06 PM_PL-3 amf_demo[8337]: 'safComp=AmfDemo,safSu=SU1,safSg=AmfDemo,safApp=AmfDemo1' started
> Mar 2 14:22:06 PM_PL-3 osafamfnd[8259]: NO 'safSu=SU1,safSg=AmfDemo,safApp=AmfDemo1' Presence State INSTANTIATING => INSTANTIATED
> Mar 2 14:22:06 PM_PL-3 amf_demo[8337]: HC started with AMF
>
> TC #7: After TC #6:
> Lock SU1: Amfnd of PL-3 throws error:
> Mar 2 14:23:57 PM_PL-3 osafamfnd[8259]: ER susi_assign_evh: 'safSu=SU1,safSg=AmfDemo,safApp=AmfDemo1' has no assignments
>
> This is obvious because, Amfnd doesn't have any assignment.
> SU1 admin state is locked, but SUSI is being shown on SU1.
>
> TC #8: After TC #7:
> Lock SU1, it throws error:
> Admin operation is already going on (su'safSu=SU1,safSg=AmfDemo,safApp=AmfDemo1
>
> TC #9: Same as TC #6 except Configure saAmfCtDefRecoveryOnError as Node Switchover/Failover/Failfast.
> The problem reported in TC #4 exists.
>
> Thanks
> -Nagu
>
> > -----Original Message-----
> > From: Minh Hon Chau [mailto:minh.c...@dektech.com.au]
> > Sent: 25 February 2016 14:14
> > To: hans.nordeb...@ericsson.com; gary....@dektech.com.au; Nagendra Kumar; Praveen Malviya; minh.c...@dektech.com.au
> > Cc: opensaf-devel@lists.sourceforge.net
> > Subject: [PATCH 01 of 15] amfd: Add support for cloud resilience at common libs [#1620]
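
The CR and ER lines quoted in the test cases above can be pulled out of a syslog capture with a small filter, which may help when comparing the attached logs between runs. This is a minimal sketch in Python; the name find_errors is hypothetical, and the pattern assumes the line layout seen in this mail (timestamp, host, process[pid], a two-letter tag such as CR, ER or NO, then the message). Kernel lines and the osafamfd trace lines have a different layout and are skipped.

```python
import re

# Layout assumed from the log lines quoted in this mail:
#   Mar 4 10:10:15 PM_PL-4 osafamfnd[10290]: CR <message>
LOG_RE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s+[\d:]+)\s+(?P<host>\S+)\s+"
    r"(?P<proc>\w+)\[(?P<pid>\d+)\]:\s+(?P<tag>[A-Z]{2})\s+(?P<msg>.*)$"
)


def find_errors(lines, tags=("CR", "ER")):
    """Return (host, proc, tag, msg) for every line whose tag is in `tags`."""
    hits = []
    for line in lines:
        m = LOG_RE.match(line.strip())
        if m and m.group("tag") in tags:
            hits.append(
                (m.group("host"), m.group("proc"), m.group("tag"), m.group("msg"))
            )
    return hits


# Lines copied from TC #1, TC #3 and TC #6 of this mail.
SAMPLE_LINES = [
    "Mar 4 10:10:15 PM_PL-4 osafamfnd[10290]: CR SU-SI record addition failed, "
    "SU= safSu=SU2,safSg=AmfDemo,safApp=AmfDemo1 : SI=safSi=AmfDemo,safApp=AmfDemo1",
    "Mar 4 10:59:22 PM_SC-2 osafamfd[11449]: ER avd_ckpt_siass: "
    "safSu=SU2,safSg=AmfDemo,safApp=AmfDemo1 safSi=AmfDemo,safApp=AmfDemo1 does not exist",
    "Mar 4 11:30:00 PM_PL-3 osafamfnd[29869]: NO 'safSu=SU1,safSg=AmfDemo,safApp=AmfDemo1' "
    "Presence State INSTANTIATING => INSTANTIATED",
    "Mar 4 11:30:00 PM_PL-3 amf_demo[29947]: HC started with AMF",
]

if __name__ == "__main__":
    for host, proc, tag, msg in find_errors(SAMPLE_LINES):
        print(f"{host} {proc} {tag}: {msg}")
```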
bug_09_13.tgz
Description: Binary data
_______________________________________________ Opensaf-devel mailing list Opensaf-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/opensaf-devel