Hi Nagu, Praveen

Since #1-#4 have been acked, can you please push them?
#5 and #11_2 allow comp/su failover during headless, so we may have to
revisit them later.
However, patches #9 #10 #11_1 #12 #13 are bug fixes that do not
relate to *delayed failover* and are needed for #1-#4. Can you please have a
look?

Thanks,
Minh

On 03/03/16 02:12, Nagendra Kumar wrote:
>
> #1 I have applied patches #1 to #4 only. With these patches (without
> patch #6), I expected most of the following tests to pass, but
> they failed (listed below).
>
> I could not test other scenarios (including alarms and notifications),
> because I haven’t applied patch #6. I think there should be a simple
> patch replacing patch #6, which handles transient state as ‘reboot the
> node’ if Amf finds a SUSI in a transient state on that node.
>
> I am attaching a concept patch (assignment_recovery.patch), which passes
> some of the scenarios; we are testing and enhancing it.
>
> As Praveen suggested, to keep it simple we need to reboot any node that
> is in a transient state.
>
> This patch reduces complexity and improves maintainability.
>
> So, ACK for patch #1-#4 along with the attached patch.
>
> Please note that the attached patch was created on top of your patch
> #6, so please apply #1 to #4, then #6, and then the attached patch.
>
> Currently the patch covers the 2N redundancy model. We are working to
> extend it to the Nway Active and No-redundancy models (and possibly
> Nway and NpM); we will publish it tomorrow.
>
> TC #1:
>
> Configuration (comp recovery is comp failover, saAmfSutDefSUFailover is
> false) and logs (TC 1) are attached to the ticket.
>
> 1. Start SC-1, PL-3 and PL-4. SU1 Act on PL-3 and SU2 Standby on PL-4.
>
> 2. Stop SC-1 and kill demo. It goes for comp failover as configured.
> Ideally, the node should reboot.
>
> 3. Start SC-1. After the cluster timer expires, PL-4 logs the following
> error messages:
>
> Mar  2 08:01:15 PM_PL-4 osafamfnd[20050]: CR SU-SI record addition 
> failed, SU= safSu=SU2,safSg=AmfDemo,safApp=AmfDemo1 : 
> SI=safSi=AmfDemo,safApp=AmfDemo1
>
> Mar  2 08:01:15 PM_PL-4 osafamfnd[20050]: CR SU-SI record addition 
> failed, SU= safSu=SU2,safSg=AmfDemo,safApp=AmfDemo1 : 
> SI=safSi=AmfDemo1,safApp=AmfDemo1
>
> There is no assignment given for SU1. SU2 has Standby assignments:
>
> safSISU=safSu=SU2\,safSg=AmfDemo\,safApp=AmfDemo1,safSi=AmfDemo,safApp=AmfDemo1
>
> saAmfSISUHAState=STANDBY(2)
>
> saAmfSISUHAReadinessState=READY_FOR_ASSIGNMENT(1)
>
> safSISU=safSu=SU2\,safSg=AmfDemo\,safApp=AmfDemo1,safSi=AmfDemo1,safApp=AmfDemo1
>
> saAmfSISUHAState=STANDBY(2)
>
> Other problems:
>
> a.) Subsequent commands to lock SU1/SU2 fail with an SG unstable error.
>
> b.) immlist of SU2 gives the result below; it prints the Standby
> assignment count as 4, which is wrong:
>
> saAmfSUNumCurrStandbySIs SA_UINT32_T  4 (0x4)
>
> saAmfSUNumCurrActiveSIs SA_UINT32_T  0 (0x0)
>
> c.) Even if SC-2 joins and you do a failover/switchover of SC-1, the
> result is still the same as above.
>
> TC #2: After execution of TC #1, stop PL-3. In the worst case, SU2's
> assignment should change to Act, which does not happen. After
> stopping PL-4 as well, the same problems as in TC #1 occur. Logs
> attached (TC 2).
>
> TC #3: After TC #2, start PL-3 and start SC-2.
>
> SU1 is instantiated, but gets no assignment; the same problem as above.
>
> When PL-4 is stopped, SU1 gets assignments, and the following logs
> appear at SC-2:
>
> Mar  2 09:06:18 PM_SC-2 osafamfd[8518]: ER avd_ckpt_siass: 
> safSu=SU2,safSg=AmfDemo,safApp=AmfDemo1 safSi=AmfDemo,safApp=AmfDemo1 
> does not exist
>
> Mar  2 09:06:18 PM_SC-2 osafamfd[8518]: ER avd_ckpt_siass: 
> safSu=SU2,safSg=AmfDemo,safApp=AmfDemo1 safSi=AmfDemo1,safApp=AmfDemo1 
> does not exist
>
> Mar  2 09:06:21 PM_SC-2 kernel: [ 3290.784933] tipc: Resetting link 
> <1.1.2:eth0-1.1.4:eth0>, peer not responding
>
> Mar  2 09:06:21 PM_SC-2 kernel: [ 3290.784947] tipc: Lost link 
> <1.1.2:eth0-1.1.4:eth0> on network plane A
>
> Mar  2 09:06:21 PM_SC-2 kernel: [ 3290.784956] tipc: Lost contact with 
> <1.1.4>
>
> After starting PL-4, SU2 gets Standby assignments and everything works
> fine after that.
>
> TC #4: Similar problems exist in the following test cases:
>
> a.) Configuration same as TC #1 except saAmfSutDefSUFailover as true.
>
>     After killing demo, PL-3 went for reboot.
>
>     But the problem is the same as shown in TC #1, TC #2 and TC #3.
>
> b.) Configuration same as TC #1 except with saAmfCtDefRecoveryOnError
>     as 2 and saAmfCtDefDisableRestart as 1.
>
>     But the problem is the same as shown in TC #1, TC #2 and TC #3.
>
> c.) Configuration same as TC #1 except with saAmfCtDefRecoveryOnError
>     as 2, saAmfCtDefDisableRestart as 1 and saAmfSutDefSUFailover as 1.
>
>     After killing demo, PL-3 went for reboot.
>
>     But the problem is the same as shown in TC #1, TC #2 and TC #3.
>
> TC #5: Configuration same as TC #1 except with
> saAmfCtDefRecoveryOnError as 2. Configuration and logs (TC 5) are
> attached to the ticket.
>
> 1. Start SC-1, PL-3 and PL-4. SU1 Act on PL-3 and SU2 Standby on PL-4.
>
> 2. Stop SC-1 and kill demo. It goes for comp restart as configured.
>
> 3. Start SC-1. After SC-1 comes up and before the cluster timer expires,
> stop PL-3:
>
> Even though PL-3 is stopped (see below: PL-3 is not available), SU1 still
> has the Act assignment and SU2 still has the Standby assignment:
>
> PM_SC-1:/home/nagu/views/staging # amf-state siass
>
> safSISU=safSu=SU2\,safSg=AmfDemo\,safApp=AmfDemo1,safSi=AmfDemo1,safApp=AmfDemo1
>
>         saAmfSISUHAState=STANDBY(2)
>
>         saAmfSISUHAReadinessState=READY_FOR_ASSIGNMENT(1)
>
> safSISU=safSu=SU2\,safSg=AmfDemo\,safApp=AmfDemo1,safSi=AmfDemo,safApp=AmfDemo1
>
>         saAmfSISUHAState=STANDBY(2)
>
>         saAmfSISUHAReadinessState=READY_FOR_ASSIGNMENT(1)
>
> safSISU=safSu=SC-1\,safSg=NoRed\,safApp=OpenSAF,safSi=NoRed1,safApp=OpenSAF
>
>         saAmfSISUHAState=ACTIVE(1)
>
>         saAmfSISUHAReadinessState=READY_FOR_ASSIGNMENT(1)
>
> safSISU=safSu=PL-4\,safSg=NoRed\,safApp=OpenSAF,safSi=NoRed3,safApp=OpenSAF
>
>         saAmfSISUHAState=ACTIVE(1)
>
>         saAmfSISUHAReadinessState=READY_FOR_ASSIGNMENT(1)
>
> safSISU=safSu=SU1\,safSg=AmfDemo\,safApp=AmfDemo1,safSi=AmfDemo,safApp=AmfDemo1
>
>         saAmfSISUHAState=ACTIVE(1)
>
>         saAmfSISUHAReadinessState=READY_FOR_ASSIGNMENT(1)
>
> safSISU=safSu=SC-1\,safSg=2N\,safApp=OpenSAF,safSi=SC-2N,safApp=OpenSAF
>
>         saAmfSISUHAState=ACTIVE(1)
>
>         saAmfSISUHAReadinessState=READY_FOR_ASSIGNMENT(1)
>
> safSISU=safSu=SU1\,safSg=AmfDemo\,safApp=AmfDemo1,safSi=AmfDemo1,safApp=AmfDemo1
>
>         saAmfSISUHAState=ACTIVE(1)
>
>         saAmfSISUHAReadinessState=READY_FOR_ASSIGNMENT(1)
>
> TC #6:  After TC #5, start PL-3:
>
> SU1 is not given any assignment (maybe because it already exists in the Amfd db):
>
> Mar  2 14:22:06 PM_PL-3 osafamfwd[8318]: Started
>
> Mar  2 14:22:06 PM_PL-3 osafamfnd[8259]: NO 
> 'safSu=PL-3,safSg=NoRed,safApp=OpenSAF' Presence State INSTANTIATING 
> => INSTANTIATED
>
> Mar  2 14:22:06 PM_PL-3 osafamfnd[8259]: NO Assigning 
> 'safSi=NoRed2,safApp=OpenSAF' ACTIVE to 
> 'safSu=PL-3,safSg=NoRed,safApp=OpenSAF'
>
> Mar  2 14:22:06 PM_PL-3 osafamfnd[8259]: NO Assigned 
> 'safSi=NoRed2,safApp=OpenSAF' ACTIVE to 
> 'safSu=PL-3,safSg=NoRed,safApp=OpenSAF'
>
> Mar  2 14:22:06 PM_PL-3 osafamfnd[8259]: NO 
> 'safSu=SU1,safSg=AmfDemo,safApp=AmfDemo1' Presence State 
> UNINSTANTIATED => INSTANTIATING
>
> Mar  2 14:22:06 PM_PL-3 opensafd: OpenSAF(5.0.M0 - 7282:4fbffe857512:) 
> services successfully started
>
> Mar  2 14:22:06 PM_PL-3 amf_demo[8337]: 
> 'safComp=AmfDemo,safSu=SU1,safSg=AmfDemo,safApp=AmfDemo1' started
>
> Mar  2 14:22:06 PM_PL-3 osafamfnd[8259]: NO 
> 'safSu=SU1,safSg=AmfDemo,safApp=AmfDemo1' Presence State INSTANTIATING 
> => INSTANTIATED
>
> Mar  2 14:22:06 PM_PL-3 amf_demo[8337]: HC started with AMF
>
> TC #7:  After TC #6:
>
> Lock SU1: Amfnd on PL-3 throws an error:
>
> Mar  2 14:23:57 PM_PL-3 osafamfnd[8259]: ER susi_assign_evh: 
> 'safSu=SU1,safSg=AmfDemo,safApp=AmfDemo1' has no assignments
>
> This is expected, because Amfnd doesn’t have any assignment.
>
> SU1's admin state is locked, but a SUSI is still shown for SU1.
>
> TC #8:  After TC #7:
>
> Lock SU1; it throws an error:
>
> Admin operation is already going on 
> (su'safSu=SU1,safSg=AmfDemo,safApp=AmfDemo1
>
> TC #9: Same as TC #6, except configure saAmfCtDefRecoveryOnError as
> Node Switchover/Failover/Failfast.
>
> The problem reported in TC #4 exists.
>
> Thanks
>
> -Nagu
>
> > -----Original Message-----
>
> > From: Minh Hon Chau [mailto:minh.c...@dektech.com.au]
>
> > Sent: 25 February 2016 14:14
>
> > To: hans.nordeb...@ericsson.com; gary....@dektech.com.au; Nagendra
>
> > Kumar; Praveen Malviya; minh.c...@dektech.com.au
>
> > Cc: opensaf-devel@lists.sourceforge.net
>
> > Subject: [PATCH 01 of 15] amfd: Add support for cloud resilience at common
> > libs [#1620]
>

