Hi Nagu, Praveen,

Since #1-#4 have been acked, can you please push them?

#5 and #11_2 allow comp/su failover during headless, so we may have to revisit them later. However, patches #9, #10, #11_1, #12 and #13 are bug fixes that do not relate to *delayed failover* and are needed for #1-#4. Can you please have a look?
Thanks,
Minh

On 03/03/16 02:12, Nagendra Kumar wrote:
>
> #1 I have applied patches #1 to #4 only. With this patches(not having patch #6), I thought to have passed most of the following tests, but they got failed(Listed below).
>
> I could not test other scenarios (including alarms and notifications), because I haven’t applied patch #6. I think there should be a simple patch replacing patch #6, which handles transient state as ‘reboot the node’ if Amf finds SUSI in transient state on that node.
>
> I am attaching a concept patch(assignment_recovery.patch), which pass some of the scenarios and we are testing and enhancing it.
>
> As Praveen has suggested that we need to reboot the node which is undergoing in transient state to make it simple.
>
> This patch reduces complexity and maintainability.
>
> So, ACK for patch #1-#4 along with the attached patch.
>
> Please note that the attached patch has been created on patch #6 of yours, so please apply #1 to #4 and then #6 and then the attached patch.
>
> Currently the patch is for 2N red model. We are working to make for Nway Act and No red model (and possibly for Nway and NpM), we will publish it tomorrow.
>
> TC #1:
>
> Configuration(Comp recovery is comp failover, saAmfSutDefSUFailover as false) and logs attached(TC 1) in the ticket.
>
> 1. Start SC-1, PL-3 and PL-4. SU1 Act on PL-3 and SU2 Standby on SC-2.
>
> 2. Stop SC-1 and kill demo. It goes for comp failover as configured. Ideally, node should reboot.
>
> 3. Start SC-1. After cluster timer expires, PL-4 got the following error messages:
>
> Mar 2 08:01:15 PM_PL-4 osafamfnd[20050]: CR SU-SI record addition failed, SU= safSu=SU2,safSg=AmfDemo,safApp=AmfDemo1 : SI=safSi=AmfDemo,safApp=AmfDemo1
>
> Mar 2 08:01:15 PM_PL-4 osafamfnd[20050]: CR SU-SI record addition failed, SU= safSu=SU2,safSg=AmfDemo,safApp=AmfDemo1 : SI=safSi=AmfDemo1,safApp=AmfDemo1
>
> There is no assignment given for SU1.
> SU2 has Standby assignments:
>
> safSISU=safSu=SU2\,safSg=AmfDemo\,safApp=AmfDemo1,safSi=AmfDemo,safApp=AmfDemo1
>     saAmfSISUHAState=STANDBY(2)
>     saAmfSISUHAReadinessState=READY_FOR_ASSIGNMENT(1)
> safSISU=safSu=SU2\,safSg=AmfDemo\,safApp=AmfDemo1,safSi=AmfDemo1,safApp=AmfDemo1
>     saAmfSISUHAState=STANDBY(2)
>
> Other problems:
>
> a.) Further command for locking SU1/SU2 fails in SG unstable error.
>
> b.) Immlist if SU2 gives the below result, Standby assignment it prints as 4, which is wrong:
>
> saAmfSUNumCurrStandbySIs SA_UINT32_T 4 (0x4)
> saAmfSUNumCurrActiveSIs SA_UINT32_T 0 (0x0)
>
> c.) Even if SC-2 joins, and you do failover/switchover of SC-1, still same as above.
>
> TC #2: After execution of TC #1, stop PL-3. In worst case, SU2 assignment should change to Act, which is not happening. After stopping of PL-4 also, the same problems as TC #1. logs attached(TC 2).
>
> TC #3: After TC #2, start PL-3 and start SC-2.
>
> SU1 is instantiated, but no assignment and the same problem as above.
>
> When stop PL-4, SU1 gets assignments, the following logs comes at SC-2:
>
> Mar 2 09:06:18 PM_SC-2 osafamfd[8518]: ER avd_ckpt_siass: safSu=SU2,safSg=AmfDemo,safApp=AmfDemo1 safSi=AmfDemo,safApp=AmfDemo1 does not exist
>
> Mar 2 09:06:18 PM_SC-2 osafamfd[8518]: ER avd_ckpt_siass: safSu=SU2,safSg=AmfDemo,safApp=AmfDemo1 safSi=AmfDemo1,safApp=AmfDemo1 does not exist
>
> Mar 2 09:06:21 PM_SC-2 kernel: [ 3290.784933] tipc: Resetting link <1.1.2:eth0-1.1.4:eth0>, peer not responding
>
> Mar 2 09:06:21 PM_SC-2 kernel: [ 3290.784947] tipc: Lost link <1.1.2:eth0-1.1.4:eth0> on network plane A
>
> Mar 2 09:06:21 PM_SC-2 kernel: [ 3290.784956] tipc: Lost contact with <1.1.4>
>
> Start PL-4, SU2 gets Standby assignments and everything works fine after that.
>
> TC #4: Similar problems exist in the following test cases:
>
> a.)Configuration same as TC #1 except saAmfSutDefSUFailover as true.
>
> After killing demo, PL-3 went for reboot.
>
> But the problem is the same as shown in TC #1, TC #2 and TC #3.
>
> b.) Configuration same as TC #1 except with saAmfCtDefRecoveryOnError as 2 and saAmfCtDefDisableRestart as 1.
>
> But the problem is the same as shown in TC #1, TC #2 and TC #3.
>
> c.)Configuration same as TC #1 except with saAmfCtDefRecoveryOnError as 2 and saAmfCtDefDisableRestart as 1 and saAmfSutDefSUFailover as 1.
>
> After killing demo, PL-3 went for reboot.
>
> But the problem is the same as shown in TC #1, TC #2 and TC #3.
>
> TC #5: Configuration same as TC #1 except with saAmfCtDefRecoveryOnError as 2. Configuration and logs(TC 5) attached in ticket.
>
> 1. Start SC-1, PL-3 and PL-4. SU1 Act on PL-3 and SU2 Standby on SC-2.
>
> 2. Stop SC-1 and kill demo. It goes for comp restart as configured.
>
> 3. Start SC-1. After SC-1 comes up and before cluster timer expires, stop PL-3:
>
> Even if PL-3 is stopped(see below PL-3 is not available), SU1 is still having Act assignment and SU2 is having Standby assignment:
>
> PM_SC-1:/home/nagu/views/staging # amf-state siass
>
> safSISU=safSu=SU2\,safSg=AmfDemo\,safApp=AmfDemo1,safSi=AmfDemo1,safApp=AmfDemo1
>     saAmfSISUHAState=STANDBY(2)
>     saAmfSISUHAReadinessState=READY_FOR_ASSIGNMENT(1)
> safSISU=safSu=SU2\,safSg=AmfDemo\,safApp=AmfDemo1,safSi=AmfDemo,safApp=AmfDemo1
>     saAmfSISUHAState=STANDBY(2)
>     saAmfSISUHAReadinessState=READY_FOR_ASSIGNMENT(1)
> safSISU=safSu=SC-1\,safSg=NoRed\,safApp=OpenSAF,safSi=NoRed1,safApp=OpenSAF
>     saAmfSISUHAState=ACTIVE(1)
>     saAmfSISUHAReadinessState=READY_FOR_ASSIGNMENT(1)
> safSISU=safSu=PL-4\,safSg=NoRed\,safApp=OpenSAF,safSi=NoRed3,safApp=OpenSAF
>     saAmfSISUHAState=ACTIVE(1)
>     saAmfSISUHAReadinessState=READY_FOR_ASSIGNMENT(1)
> safSISU=safSu=SU1\,safSg=AmfDemo\,safApp=AmfDemo1,safSi=AmfDemo,safApp=AmfDemo1
>     saAmfSISUHAState=ACTIVE(1)
>     saAmfSISUHAReadinessState=READY_FOR_ASSIGNMENT(1)
> safSISU=safSu=SC-1\,safSg=2N\,safApp=OpenSAF,safSi=SC-2N,safApp=OpenSAF
>     saAmfSISUHAState=ACTIVE(1)
>     saAmfSISUHAReadinessState=READY_FOR_ASSIGNMENT(1)
> safSISU=safSu=SU1\,safSg=AmfDemo\,safApp=AmfDemo1,safSi=AmfDemo1,safApp=AmfDemo1
>     saAmfSISUHAState=ACTIVE(1)
>     saAmfSISUHAReadinessState=READY_FOR_ASSIGNMENT(1)
>
> TC #6: After TC #5, start PL-3:
>
> SU1 is not given any assignment (may be because it exists in Amfd db):
>
> Mar 2 14:22:06 PM_PL-3 osafamfwd[8318]: Started
>
> Mar 2 14:22:06 PM_PL-3 osafamfnd[8259]: NO 'safSu=PL-3,safSg=NoRed,safApp=OpenSAF' Presence State INSTANTIATING => INSTANTIATED
>
> Mar 2 14:22:06 PM_PL-3 osafamfnd[8259]: NO Assigning 'safSi=NoRed2,safApp=OpenSAF' ACTIVE to 'safSu=PL-3,safSg=NoRed,safApp=OpenSAF'
>
> Mar 2 14:22:06 PM_PL-3 osafamfnd[8259]: NO Assigned 'safSi=NoRed2,safApp=OpenSAF' ACTIVE to 'safSu=PL-3,safSg=NoRed,safApp=OpenSAF'
>
> Mar 2 14:22:06 PM_PL-3 osafamfnd[8259]: NO 'safSu=SU1,safSg=AmfDemo,safApp=AmfDemo1' Presence State UNINSTANTIATED => INSTANTIATING
>
> Mar 2 14:22:06 PM_PL-3 opensafd: OpenSAF(5.0.M0 - 7282:4fbffe857512:) services successfully started
>
> Mar 2 14:22:06 PM_PL-3 amf_demo[8337]: 'safComp=AmfDemo,safSu=SU1,safSg=AmfDemo,safApp=AmfDemo1' started
>
> Mar 2 14:22:06 PM_PL-3 osafamfnd[8259]: NO 'safSu=SU1,safSg=AmfDemo,safApp=AmfDemo1' Presence State INSTANTIATING => INSTANTIATED
>
> Mar 2 14:22:06 PM_PL-3 amf_demo[8337]: HC started with AMF
>
> TC #7: After TC #6:
>
> Lock SU1: Amfnd of PL-3 throws error:
>
> Mar 2 14:23:57 PM_PL-3 osafamfnd[8259]: ER susi_assign_evh: 'safSu=SU1,safSg=AmfDemo,safApp=AmfDemo1' has no assignments
>
> This is obvious because, Amfnd doesn’t have any assignment.
>
> SU1 admin state is locked, but SUSI is being shown on SU1.
>
> TC #8: After TC #7:
>
> Lock SU1, it throws error:
>
> Admin operation is already going on (su'safSu=SU1,safSg=AmfDemo,safApp=AmfDemo1
>
> TC #9: Same as TC #6 except Configure saAmfCtDefRecoveryOnError as Node Switchover/Failover/Failfast.
>
> The problem reported in TC #4 exists.
>
> Thanks
>
> -Nagu
>
> > -----Original Message-----
> > From: Minh Hon Chau [mailto:minh.c...@dektech.com.au]
> > Sent: 25 February 2016 14:14
> > To: hans.nordeb...@ericsson.com; gary....@dektech.com.au; Nagendra Kumar; Praveen Malviya; minh.c...@dektech.com.au
> > Cc: opensaf-devel@lists.sourceforge.net
> > Subject: [PATCH 01 of 15] amfd: Add support for cloud resilience at common libs [#1620]