15. Same configuration as Test Case #12, but with SI lock. Hold the components of both SUs in gdb at the CSI remove callback and keep the callback timeout at 100 sec. Lock the SI and stop the controller. Start the controller and allow CSI remove to time out. Two problems: SU2 has a Standby assignment (which is wrong), and SU1 has no assignment. Error at PL-4: SU-SI record addition failed.
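As a quick way to cross-check the two observations above against `amf-state siass` output, here is a minimal Python sketch. It assumes the output layout seen in this report (a `safSISU=` DN line followed by `attribute=value` lines) and assumes that a locked SI should end up with no assignments at all. The names `parse_siass` and `ha_states` are made up for illustration and are not part of OpenSAF.

```python
import re

# Two records copied from the amf-state siass output in this report.
SAMPLE = r"""
safSISU=safSu=PL-3\,safSg=NoRed\,safApp=OpenSAF,safSi=NoRed2,safApp=OpenSAF
        saAmfSISUHAState=ACTIVE(1)
        saAmfSISUHAReadinessState=READY_FOR_ASSIGNMENT(1)
safSISU=safSu=SU2\,safSg=AmfDemo\,safApp=AmfDemo1,safSi=AmfDemo,safApp=AmfDemo1
        saAmfSISUHAState=STANDBY(2)
        saAmfSISUHAReadinessState=READY_FOR_ASSIGNMENT(1)
"""


def parse_siass(text):
    """Parse siass output into {(su_dn, si_dn): {attribute: value}}."""
    assignments = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("safSISU="):
            body = line[len("safSISU="):]
            # The SU DN has its commas escaped as '\,'; the first
            # unescaped comma separates the SU DN from the SI DN.
            su_part, si_part = re.split(r"(?<!\\),", body, maxsplit=1)
            current = (su_part.replace("\\,", ","), si_part)
            assignments[current] = {}
        elif "=" in line and current is not None:
            key, value = line.split("=", 1)
            assignments[current][key] = value
    return assignments


def ha_states(assignments, si_dn):
    """Return {su_dn: HA state name} for every SU assigned to si_dn."""
    result = {}
    for (su, si), attrs in assignments.items():
        if si == si_dn:
            # 'STANDBY(2)' -> 'STANDBY'
            result[su] = attrs.get("saAmfSISUHAState", "").split("(")[0]
    return result


if __name__ == "__main__":
    states = ha_states(parse_siass(SAMPLE), "safSi=AmfDemo,safApp=AmfDemo1")
    # Assumption: after SI lock, any entry printed here is a stale assignment.
    for su, state in states.items():
        print(f"unexpected assignment: {su} is {state}")
```

For the sample, only SU2 is reported (with STANDBY) and SU1 does not appear, which matches the state described above.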

PM_SC-1:/home/nagu/views/staging # amf-state siass
safSISU=safSu=PL-4\,safSg=NoRed\,safApp=OpenSAF,safSi=NoRed3,safApp=OpenSAF
        saAmfSISUHAState=ACTIVE(1)
        saAmfSISUHAReadinessState=READY_FOR_ASSIGNMENT(1)
safSISU=safSu=PL-3\,safSg=NoRed\,safApp=OpenSAF,safSi=NoRed2,safApp=OpenSAF
        saAmfSISUHAState=ACTIVE(1)
        saAmfSISUHAReadinessState=READY_FOR_ASSIGNMENT(1)
safSISU=safSu=SU2\,safSg=AmfDemo\,safApp=AmfDemo1,safSi=AmfDemo,safApp=AmfDemo1
        saAmfSISUHAState=STANDBY(2)
        saAmfSISUHAReadinessState=READY_FOR_ASSIGNMENT(1)
safSISU=safSu=SC-1\,safSg=NoRed\,safApp=OpenSAF,safSi=NoRed1,safApp=OpenSAF
        saAmfSISUHAState=ACTIVE(1)
        saAmfSISUHAReadinessState=READY_FOR_ASSIGNMENT(1)
safSISU=safSu=SC-1\,safSg=2N\,safApp=OpenSAF,safSi=SC-2N,safApp=OpenSAF
        saAmfSISUHAState=ACTIVE(1)
        saAmfSISUHAReadinessState=READY_FOR_ASSIGNMENT(1)

Syslog of PL-4:
Feb 9 21:24:50 PM_PL-4 osafamfnd[7998]: NO 'safSu=SU2,safSg=AmfDemo,safApp=AmfDemo1' component restart probation timer started (timeout: 60000000000 ns)
Feb 9 21:24:50 PM_PL-4 osafamfnd[7998]: NO Restarting a component of 'safSu=SU2,safSg=AmfDemo,safApp=AmfDemo1' (comp restart count: 1)
Feb 9 21:24:50 PM_PL-4 osafamfnd[7998]: NO 'safComp=AmfDemo,safSu=SU2,safSg=AmfDemo,safApp=AmfDemo1' faulted due to 'csiRemovecallbackTimeout' : Recovery is 'componentRestart'
Feb 9 21:24:55 PM_PL-4 amf_demo_script: killproc /opt/amf_demo/amf_demo failed
Feb 9 21:24:55 PM_PL-4 amf_demo[8200]: 'safComp=AmfDemo,safSu=SU2,safSg=AmfDemo,safApp=AmfDemo1' started
Feb 9 21:24:55 PM_PL-4 osafamfnd[7998]: NO Removed 'safSi=AmfDemo1,safApp=AmfDemo1' from 'safSu=SU2,safSg=AmfDemo,safApp=AmfDemo1'
Feb 9 21:24:55 PM_PL-4 amf_demo[8200]: HC started with AMF
Feb 9 21:24:55 PM_PL-4 amf_demo[8200]: Registered with AMF
Feb 9 21:24:55 PM_PL-4 amf_demo[8200]: CSI Set - add 'safCsi=AmfDemo,safSi=AmfDemo,safApp=AmfDemo1' HAState Standby
Feb 9 21:24:55 PM_PL-4 amf_demo[8200]: name: abcdef, value: val1
Feb 9 21:24:55 PM_PL-4 amf_demo[8200]: name: abcdef, value: val2
Feb 9 21:24:55 PM_PL-4 osafamfnd[7998]: CR SU-SI record addition failed, SU= safSu=SU2,safSg=AmfDemo,safApp=AmfDemo1 : SI=safSi=AmfDemo,safApp=AmfDemo1
Feb 9 21:24:55 PM_PL-4 amf_demo[8200]: Health check 1
Feb 9 21:25:50 PM_PL-4 osafamfnd[7998]: NO 'safSu=SU2,safSg=AmfDemo,safApp=AmfDemo1' Component or SU restart probation timer expired

Thanks
-Nagu

> -----Original Message-----
> From: Nagendra Kumar
> Sent: 09 February 2016 20:44
> To: minh chau; hans.nordeb...@ericsson.com; gary....@dektech.com.au; Praveen Malviya
> Cc: opensaf-devel@lists.sourceforge.net
> Subject: Re: [devel] FW: [PATCH 0 of 5] Review Request for amf: Add support for cloud resilience [#1620] V2
>
> >> SI Swap again and the commands come out with success, but swap doesn't happen and syslog prints:
>
> Modification to #13: SU1 gets Active, but SU2's assignment is removed as an outcome of the SI swap.
>
> The next SI swap failed because there is only one assignment.
>
> > -----Original Message-----
> > From: Nagendra Kumar
> > Sent: 09 February 2016 20:41
> > To: minh chau; hans.nordeb...@ericsson.com; gary....@dektech.com.au; Praveen Malviya
> > Cc: opensaf-devel@lists.sourceforge.net
> > Subject: Re: [devel] FW: [PATCH 0 of 5] Review Request for amf: Add support for cloud resilience [#1620] V2
> >
> > 12. Issue shutdown on the SI with a sleep in the CSI set callback, stop the controller, and let the CSI set callback time out. Start SC-1 and immlist the SI; it is still in the shutting-down state:
> > saAmfSIAdminState SA_UINT32_T 4 (0x4)
> >
> > 13. Issue SI swap of the application SI (SU1 Active, SU2 Standby): hold gdb in the Quiesced CSI callback, allow it to time out, and stop the controller.
> > On one run: start the controller; SU1 gets Standby and SU2 gets Active.
> > Now issue SI swap again: the command returns success, but the swap doesn't happen and syslog prints:
> > Feb 9 20:33:51 PM_SC-1 osafamfd[9497]: NO safSi=AmfDemo,safApp=AmfDemo1 Swap initiated
> >
> > Please find the amfd trace attached.
> >
> > 14. Test Case #13, on another run: amfnd crashes. Backtrace and syslog are below; amfnd traces (osafamfnd-PL-3) are attached.
> >
> > Program terminated with signal 11, Segmentation fault.
> > #0 0x000000000041deaa in avnd_err_process(avnd_cb_tag*, avnd_comp_tag*, avnd_err_tag*) ()
> > (gdb) bt
> > #0 0x000000000041deaa in avnd_err_process(avnd_cb_tag*, avnd_comp_tag*, avnd_err_tag*) ()
> > #1 0x0000000000407559 in avnd_evt_tmr_cbk_resp_evh(avnd_cb_tag*, avnd_evt_tag*) ()
> > #2 0x000000000042133f in avnd_main_process() () at main.cc:667
> > #3 0x0000000000405517 in main () at main.cc:186
> > (gdb) thread apply bt all
> > (gdb) thread apply all bt
> >
> > Thread 4 (Thread 0x7fe84b5b3b00 (LWP 7892)):
> > #0 0x00007fe84a4d976d in read () from /lib64/libpthread.so.0
> > #1 0x00007fe84b19af17 in ncs_exec_mod_hdlr () from /usr/local/lib/libopensaf_core.so.0
> > #2 0x00007fe84a4d27b6 in start_thread () from /lib64/libpthread.so.0
> > #3 0x00007fe849a889cd in clone () from /lib64/libc.so.6
> > #4 0x0000000000000000 in ?? ()
> >
> > Thread 3 (Thread 0x7fe84b5d3b00 (LWP 7890)):
> > #0 0x00007fe849a7f4f6 in poll () from /lib64/libc.so.6
> > #1 0x00007fe84b1c5623 in mdtm_process_recv_events () from /usr/local/lib/libopensaf_core.so.0
> > #2 0x00007fe84a4d27b6 in start_thread () from /lib64/libpthread.so.0
> > #3 0x00007fe849a889cd in clone () from /lib64/libc.so.6
> > #4 0x0000000000000000 in ?? ()
> >
> > Thread 2 (Thread 0x7fe84b606b00 (LWP 7889)):
> > #0 0x00007fe849a7f4f6 in poll () from /lib64/libc.so.6
> > #1 0x00007fe84b18922f in osaf_ppoll () from /usr/local/lib/libopensaf_core.so.0
> > #2 0x00007fe84b190acf in ncs_tmr_wait () from /usr/local/lib/libopensaf_core.so.0
> > #3 0x00007fe84a4d27b6 in start_thread () from /lib64/libpthread.so.0
> > #4 0x00007fe849a889cd in clone () from /lib64/libc.so.6
> > #5 0x0000000000000000 in ?? ()
> > ---Type <return> to continue, or q <return> to quit---
> >
> > Thread 1 (Thread 0x7fe84b5d6720 (LWP 7888)):
> > #0 0x000000000041deaa in avnd_err_process(avnd_cb_tag*, avnd_comp_tag*, avnd_err_tag*) ()
> > #1 0x0000000000407559 in avnd_evt_tmr_cbk_resp_evh(avnd_cb_tag*, avnd_evt_tag*) ()
> > #2 0x000000000042133f in avnd_main_process() () at main.cc:667
> > #3 0x0000000000405517 in main () at main.cc:186
> >
> > Syslog:
> > Feb 9 20:05:44 PM_PL-3 osafimmnd[7869]: NO Re-introduce-me highestProcessed:1514 highestReceived:1514
> > Feb 9 20:05:46 PM_PL-3 kernel: [117927.208595] TIPC: Resetting link <1.1.3:eth0-1.1.1:eth0>, peer not responding
> > Feb 9 20:05:46 PM_PL-3 kernel: [117927.208604] TIPC: Lost link <1.1.3:eth0-1.1.1:eth0> on network plane A
> > Feb 9 20:05:46 PM_PL-3 kernel: [117927.208610] TIPC: Lost contact with <1.1.1>
> > Feb 9 20:05:49 PM_PL-3 osafimmnd[7869]: WA MDS Send Failed to service:IMMD rc:2
> > Feb 9 20:05:49 PM_PL-3 osafamfnd[7888]: NO component with QUIESCED/QUIESCING assignment failed
> > Feb 9 20:05:49 PM_PL-3 osafamfnd[7888]: NO recovery action 'comp restart' escalated to 'comp failover'
> > Feb 9 20:05:49 PM_PL-3 osafamfnd[7888]: NO SU failover probation timer started (timeout: 1200000000000 ns)
> > Feb 9 20:05:49 PM_PL-3 osafamfnd[7888]: NO Performing failover of 'safSu=SU1,safSg=AmfDemo,safApp=AmfDemo1' (SU failover count: 1)
> > Feb 9 20:05:49 PM_PL-3 osafamfnd[7888]: NO 'safComp=AmfDemo,safSu=SU1,safSg=AmfDemo,safApp=AmfDemo1' recovery action escalated from 'componentRestart' to 'componentFailover'
> > Feb 9 20:05:49 PM_PL-3 osafamfnd[7888]: NO 'safComp=AmfDemo,safSu=SU1,safSg=AmfDemo,safApp=AmfDemo1' faulted due to 'csiSetcallbackTimeout' : Recovery is 'componentFailover'
> > Feb 9 20:05:49 PM_PL-3 osafamfnd[7888]: NO 'safSu=SU1,safSg=AmfDemo,safApp=AmfDemo1' Presence State INSTANTIATED => TERMINATING
> > Feb 9 20:05:49 PM_PL-3 osafamfnd[7888]: NO Removed 'safSi=AmfDemo,safApp=AmfDemo1' from 'safSu=SU1,safSg=AmfDemo,safApp=AmfDemo1'
> > Feb 9 20:05:49 PM_PL-3 osafamfnd[7888]: NO Assigned 'safSi=AmfDemo1,safApp=AmfDemo1' QUIESCED to 'safSu=SU1,safSg=AmfDemo,safApp=AmfDemo1'
> > Feb 9 20:05:49 PM_PL-3 osafclmna[7879]: AL AMF Node Director is down, terminate this process
> > Feb 9 20:05:49 PM_PL-3 osafamfwd[7947]: Rebooting OpenSAF NodeId = 0 EE Name = No EE Mapped, Reason: AMF unexpectedly crashed, OwnNodeId = 131855, SupervisionTime = 60
> > Feb 9 20:05:49 PM_PL-3 osafckptnd[7937]: AL AMF Node Director is down, terminate this process
> > Feb 9 20:05:49 PM_PL-3 osaflcknd[7927]: AL AMF Node Director is down, terminate this process
> > Feb 9 20:05:49 PM_PL-3 osafimmnd[7869]: AL AMF Node Director is down, terminate this process
> > Feb 9 20:05:49 PM_PL-3 osafmsgnd[7908]: AL AMF Node Director is down, terminate this process
> > Feb 9 20:05:49 PM_PL-3 osafsmfnd[7898]: AL AMF Node Director is down, terminate this process
> > Feb 9 20:05:49 PM_PL-3 opensaf_reboot: Rebooting local node; timeout=60
> >
> >
> > > -----Original Message-----
> > > From: Nagendra Kumar
> > > Sent: 09 February 2016 19:40
> > > To: minh chau; hans.nordeb...@ericsson.com; gary....@dektech.com.au; Praveen Malviya
> > > Cc: opensaf-devel@lists.sourceforge.net
> > > Subject: Re: [devel] FW: [PATCH 0 of 5] Review Request for amf: Add support for cloud resilience [#1620] V2
> > >
> > > Testing continued....
> > >
> > > 11. Lock the SI, then unlock it with a sleep in the CSI set callback, and then reboot SC-1. Allow the CSI set callback to time out. When SC-1 comes up, amfd crashes. Complete amfd logs are attached; amfnd traces of SC-1 and PL-3 are coming in the next email.
> > >
> > > Thanks
> > > -Nagu
> > >
> > > > -----Original Message-----
> > > > From: Nagendra Kumar
> > > > Sent: 09 February 2016 15:57
> > > > To: minh chau; hans.nordeb...@ericsson.com; gary....@dektech.com.au; Praveen Malviya
> > > > Cc: opensaf-devel@lists.sourceforge.net
> > > > Subject: RE: [devel] FW: [PATCH 0 of 5] Review Request for amf: Add support for cloud resilience [#1620] V2
> > > >
> > > > Continued....
> > > >
> > > > > -----Original Message-----
> > > > > From: Nagendra Kumar [mailto:nagendr...@oracle.com]
> > > > > Sent: 09 February 2016 15:56
> > > > > To: 'minh chau'; 'hans.nordeb...@ericsson.com'; 'gary....@dektech.com.au'; Praveen Malviya
> > > > > Cc: 'opensaf-devel@lists.sourceforge.net'
> > > > > Subject: RE: [devel] FW: [PATCH 0 of 5] Review Request for amf: Add support for cloud resilience [#1620] V2
> > > > >
> > > > > Hi Hans N,
> > > > > Please find the amfd and amfnd traces of SC-1 and the amfnd traces of PL-3 attached in the 3 emails that follow (because of the devel list size limit, I am not able to send them in one go). It took a second reboot to reproduce it for TC #6, but it fails at the same location.
> > > > >
> > > > > Feb 9 15:32:28 PM_SC-1 osafamfd[3962]: NO Received node_up from 2010f: msg_id 1
> > > > > Feb 9 15:32:28 PM_SC-1 osafamfd[3962]: siass.cc:842: avd_susi_recreate: Assertion 'su' failed.
> > > > > Feb 9 15:32:28 PM_SC-1 osafamfnd[3972]: WA AMF director unexpectedly crashed
> > > > > Feb 9 15:32:28 PM_SC-1 osafamfnd[3972]: WA AMF director unexpectedly crashed
> > > > >
> > > > > Thanks
> > > > > -Nagu
>
> _______________________________________________
> Opensaf-devel mailing list
> Opensaf-devel@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/opensaf-devel