Hi Zoran,

In TC #16, #17 and #18 there are no amfd crashes. I didn't mean that TC #16 failed because of the one-payload IMM limitation.

Thanks,
Minh

On 11/02/16 19:18, Zoran Milinkovic wrote:
> Hi Minh,
>
> TC #16 says that the test was done with PL-3 and PL-4, so it is not a
> case with one payload.
> This looks more like an AMF problem than an IMM limitation.
>
> Thanks,
> Zoran
>
> -----Original Message-----
> From: minh chau [mailto:minh.c...@dektech.com.au]
> Sent: Thursday, February 11, 2016 5:47 AM
> To: Nagendra Kumar; Hans Nordebäck; Gary Lee; Praveen Malviya
> Cc: opensaf-devel@lists.sourceforge.net
> Subject: Re: [devel] FW: [PATCH 0 of 5] Review Request for amf: Add support
> for cloud resilience [#1620] V2 (delayed failover issue)
>
> Hi Nagu,
>
> There's a known limitation in IMM with the 2+1 or 1+1 configuration, which
> results in an IMM reload at the second controller restart. We're discussing
> how to avoid the crash. Meanwhile, I'm also looking at the delayed failover
> issues which you reported in TC #2, #12, #13 and #15. We have similar
> automated tests, but they all pass, so I guess your test has something special.
> Can you run those tests once and send me the syslog + amfd/amfnd traces?
>
> Thanks,
> Minh
> On 10/02/16 23:00, Nagendra Kumar wrote:
>> TC #16. Same configuration as #12: Run SI shutdown, sleep for 5 sec before
>> saAmfCSIQuiescingComplete and stop the controller; then, after the sleep,
>> reject saAmfCSIQuiescingComplete with SA_AIS_ERR_FAILED_OPERATION.
>> All the assignments from SU1 on PL-3 and SU2 on PL-4 are removed and the SI
>> admin state is 2 (locked):
>> saAmfSIAdminState SA_UINT32_T 2 (0x2)
>>
>> The SI going into the locked state differs from the behaviour when this test
>> case is run with the controller up and running. When the controller is
>> available, the SI stays unlocked and all the assignments are on SU2 as Active
>> and SU3 as Standby (on PL-4). This needs either a correction or documentation.
>>
>> TC #17.
>> Same configuration as #12: Run SG shutdown, sleep for 5 sec before
>> saAmfCSIQuiescingComplete and stop the controller; then, after the sleep,
>> reject saAmfCSIQuiescingComplete with SA_AIS_ERR_FAILED_OPERATION.
>> Amfnd crashes [Please note that this test case works with the controller up]:
>> Syslog and bt:
>> Feb 10 11:44:29 PM_PL-3 osafamfnd[15508]: NO component with QUIESCED/QUIESCING assignment failed
>> Feb 10 11:44:29 PM_PL-3 osafamfnd[15508]: NO recovery action 'comp restart' escalated to 'comp failover'
>> Feb 10 11:44:29 PM_PL-3 osafamfnd[15508]: NO SU failover probation timer started (timeout: 1200000000000 ns)
>> Feb 10 11:44:29 PM_PL-3 osafamfnd[15508]: NO Performing failover of 'safSu=SU1,safSg=AmfDemo,safApp=AmfDemo1' (SU failover count: 1)
>> Feb 10 11:44:29 PM_PL-3 osafamfnd[15508]: NO 'safComp=AmfDemo,safSu=SU1,safSg=AmfDemo,safApp=AmfDemo1' recovery action escalated from 'componentRestart' to 'componentFailover'
>> Feb 10 11:44:29 PM_PL-3 osafamfnd[15508]: NO 'safComp=AmfDemo,safSu=SU1,safSg=AmfDemo,safApp=AmfDemo1' faulted due to 'csiSetcallbackFailed' : Recovery is 'componentFailover'
>> Feb 10 11:44:29 PM_PL-3 osafamfnd[15508]: NO 'safSu=SU1,safSg=AmfDemo,safApp=AmfDemo1' Presence State INSTANTIATED => TERMINATING
>> Feb 10 11:44:29 PM_PL-3 osafamfnd[15508]: NO Removed 'safSi=AmfDemo,safApp=AmfDemo1' from 'safSu=SU1,safSg=AmfDemo,safApp=AmfDemo1'
>> Feb 10 11:44:29 PM_PL-3 amf_demo[15721]: saAmfHAStateGet FAILED - 7
>> Feb 10 11:44:29 PM_PL-3 amf_demo[15721]: exiting (caught term signal)
>> Feb 10 11:44:29 PM_PL-3 osafamfnd[15508]: NO avnd_di_oper_send() deferred as AMF director is offline
>> Feb 10 11:44:29 PM_PL-3 osafimmnd[15760]: AL AMF Node Director is down, terminate this process
>>
>> Program terminated with signal 11, Segmentation fault.
>> #0 0x0000000000412b50 in avnd_comp_cmplete_all_assignment(avnd_cb_tag*, avnd_comp_tag*) ()
>> (gdb) bt
>> #0 0x0000000000412b50 in avnd_comp_cmplete_all_assignment(avnd_cb_tag*, avnd_comp_tag*) ()
>> #1 0x000000000040a093 in avnd_comp_clc_terming_cleansucc_hdler(avnd_cb_tag*, avnd_comp_tag*) ()
>> #2 0x000000000040c7d4 in avnd_comp_clc_fsm_run(avnd_cb_tag*, avnd_comp_tag*, avnd_comp_clc_pres_fsm_ev) ()
>> #3 0x000000000040ce49 in avnd_evt_clc_resp_evh(avnd_cb_tag*, avnd_evt_tag*) ()
>> #4 0x000000000042133f in avnd_main_process() () at main.cc:667
>> #5 0x0000000000405517 in main () at main.cc:186
>> (gdb) thread apply all bt
>>
>> Thread 4 (Thread 0x7fdaf3c05b00 (LWP 15512)):
>> #0 0x00007fdaf2b2b415 in __lll_unlock_wake () from /lib64/libpthread.so.0
>> #1 0x00007fdaf2b27ac4 in _L_unlock_553 () from /lib64/libpthread.so.0
>> #2 0x00007fdaf2b279f7 in __pthread_mutex_unlock_usercnt () from /lib64/libpthread.so.0
>> #3 0x00007fdaf37edac3 in ncs_os_lock () from /usr/local/lib/libopensaf_core.so.0
>> #4 0x00007fdaf37e084d in ncs_ipc_send () from /usr/local/lib/libopensaf_core.so.0
>> #5 0x000000000041eea1 in avnd_evt_send(avnd_cb_tag*, avnd_evt_tag*) ()
>> #6 0x000000000040a2cb in comp_clc_resp_callback(NCS_OS_PROC_EXECUTE_TIMED_CB_INFO*) ()
>> #7 0x00007fdaf37ecdfb in give_exec_mod_cb () from /usr/local/lib/libopensaf_core.so.0
>> #8 0x00007fdaf37ecfde in ncs_exec_mod_hdlr () from /usr/local/lib/libopensaf_core.so.0
>> #9 0x00007fdaf2b247b6 in start_thread () from /lib64/libpthread.so.0
>> #10 0x00007fdaf20da9cd in clone () from /lib64/libc.so.6
>> #11 0x0000000000000000 in ?? ()
>>
>> Thread 3 (Thread 0x7fdaf3c25b00 (LWP 15510)):
>> #0 0x00007fdaf20d14f6 in poll () from /lib64/libc.so.6
>> #1 0x00007fdaf3817623 in mdtm_process_recv_events () from /usr/local/lib/libopensaf_core.so.0
>> #2 0x00007fdaf2b247b6 in start_thread () from /lib64/libpthread.so.0
>> #3 0x00007fdaf20da9cd in clone () from /lib64/libc.so.6
>> #4 0x0000000000000000 in ?? ()
>>
>> Thread 2 (Thread 0x7fdaf3c58b00 (LWP 15509)):
>> #0 0x00007fdaf20d14f6 in poll () from /lib64/libc.so.6
>> #1 0x00007fdaf37db22f in osaf_ppoll () from /usr/local/lib/libopensaf_core.so.0
>> #2 0x00007fdaf37e2acf in ncs_tmr_wait () from /usr/local/lib/libopensaf_core.so.0
>> #3 0x00007fdaf2b247b6 in start_thread () from /lib64/libpthread.so.0
>> #4 0x00007fdaf20da9cd in clone () from /lib64/libc.so.6
>> #5 0x0000000000000000 in ?? ()
>>
>> Thread 1 (Thread 0x7fdaf3c28720 (LWP 15508)):
>> #0 0x0000000000412b50 in avnd_comp_cmplete_all_assignment(avnd_cb_tag*, avnd_comp_tag*) ()
>> #1 0x000000000040a093 in avnd_comp_clc_terming_cleansucc_hdler(avnd_cb_tag*, avnd_comp_tag*) ()
>> #2 0x000000000040c7d4 in avnd_comp_clc_fsm_run(avnd_cb_tag*, avnd_comp_tag*, avnd_comp_clc_pres_fsm_ev) ()
>> #3 0x000000000040ce49 in avnd_evt_clc_resp_evh(avnd_cb_tag*, avnd_evt_tag*) ()
>> #4 0x000000000042133f in avnd_main_process() () at main.cc:667
>> #5 0x0000000000405517 in main () at main.cc:186
>>
>> TC #18. Same configuration as #12: Run SG lock, keep gdb stopped in
>> amf_csi_remove_callback and stop the controller; then start the controller
>> and bring it up. Now release Amfnd from gdb so that it can respond to the CSI
>> remove (Please note that the controller has rebooted and is available now).
>> Now issue SG unlock.
>> Amfnd crashes on PL-3 and PL-4 at the same location [Please note that this
>> test case works with the controller up]:
>> Syslog and bt:
>> Feb 10 16:35:51 PM_PL-3 amf_demo[26623]: CSI Remove for all CSIs
>> Feb 10 16:35:51 PM_PL-3 osafamfnd[26545]: NO Removed 'safSi=AmfDemo,safApp=AmfDemo1' from 'safSu=SU1,safSg=AmfDemo,safApp=AmfDemo1'
>> Feb 10 16:36:02 PM_PL-3 osafamfnd[26545]: NO Assigning 'safSi=AmfDemo,safApp=AmfDemo1' ACTIVE to 'safSu=SU1,safSg=AmfDemo,safApp=AmfDemo1'
>> Feb 10 16:36:02 PM_PL-3 amf_demo[26623]: CSI Set - add 'safCsi=AmfDemo,safSi=AmfDemo,safApp=AmfDemo1' HAState Active
>> Feb 10 16:36:02 PM_PL-3 amf_demo[26623]: name: abcdef, value: val1
>> Feb 10 16:36:02 PM_PL-3 amf_demo[26623]: name: abcdef, value: val2
>> Feb 10 16:36:02 PM_PL-3 osafamfnd[26545]: NO Assigned 'safSi=AmfDemo,safApp=AmfDemo1' ACTIVE to 'safSu=SU1,safSg=AmfDemo,safApp=AmfDemo1'
>> Feb 10 16:36:02 PM_PL-3 osafamfnd[26545]: di.cc:850: avnd_di_susi_resp_send: Assertion 'si' failed.
>> Feb 10 16:36:02 PM_PL-3 osafclmna[26536]: AL AMF Node Director is down, terminate this process
>>
>> Core was generated by `/usr/local/lib/opensaf/osafamfnd --tracemask=0xffffffff'.
>> Program terminated with signal 6, Aborted.
>> #0 0x00007f022ebd9b55 in raise () from /lib64/libc.so.6
>> (gdb) bt
>> #0 0x00007f022ebd9b55 in raise () from /lib64/libc.so.6
>> #1 0x00007f022ebdb131 in abort () from /lib64/libc.so.6
>> #2 0x00007f023038331b in __osafassert_fail () from /usr/local/lib/libopensaf_core.so.0
>> #3 0x000000000041b399 in avnd_di_susi_resp_send(avnd_cb_tag*, avnd_su_tag*, avnd_su_si_rec*) ()
>> #4 0x000000000042e9fa in avnd_su_si_oper_done(avnd_cb_tag*, avnd_su_tag*, avnd_su_si_rec*) ()
>> #5 0x0000000000411622 in avnd_comp_csi_assign_done(avnd_cb_tag*, avnd_comp_tag*, avnd_comp_csi_rec*) ()
>> #6 0x0000000000407397 in avnd_evt_ava_resp_evh(avnd_cb_tag*, avnd_evt_tag*) ()
>> #7 0x000000000042133f in avnd_main_process() () at main.cc:667
>> #8 0x0000000000405517 in main () at main.cc:186
>>
>> TC #19. Same configuration as #12: Run Node lock, sleep for 5 sec in
>> amf_csi_set_callback and stop the controller. Reject the quiesced
>> assignment in amf_csi_set_callback; Amfnd crashes. The syslog and gdb output
>> are the same as in TC #17.
>>
>> TC #20. Same configuration as #12: Issue Node shutdown, sleep for 5 sec in
>> amf_csi_set_callback before sending saAmfResponse(), and stop the controller.
>> Amfnd crashes:
>> Syslog:
>>
>> Feb 10 17:21:10 PM_PL-3 osafamfnd[29330]: NO component with QUIESCED/QUIESCING assignment failed
>> Feb 10 17:21:10 PM_PL-3 osafamfnd[29330]: NO recovery action 'comp restart' escalated to 'comp failover'
>> Feb 10 17:21:10 PM_PL-3 osafamfnd[29330]: NO SU failover probation timer started (timeout: 1200000000000 ns)
>> Feb 10 17:21:10 PM_PL-3 osafamfnd[29330]: NO Performing failover of 'safSu=SU1,safSg=AmfDemo,safApp=AmfDemo1' (SU failover count: 1)
>> Feb 10 17:21:10 PM_PL-3 osafamfnd[29330]: NO 'safComp=AmfDemo,safSu=SU1,safSg=AmfDemo,safApp=AmfDemo1' recovery action escalated from 'componentRestart' to 'componentFailover'
>> Feb 10 17:21:10 PM_PL-3 osafamfnd[29330]: NO 'safComp=AmfDemo,safSu=SU1,safSg=AmfDemo,safApp=AmfDemo1' faulted due to 'csiSetcallbackFailed' : Recovery is 'componentFailover'
>> Feb 10 17:21:10 PM_PL-3 osafamfnd[29330]: NO 'safSu=SU1,safSg=AmfDemo,safApp=AmfDemo1' Presence State INSTANTIATED => TERMINATING
>> Feb 10 17:21:10 PM_PL-3 osafamfnd[29330]: NO Removed 'safSi=AmfDemo,safApp=AmfDemo1' from 'safSu=SU1,safSg=AmfDemo,safApp=AmfDemo1'
>> Feb 10 17:21:10 PM_PL-3 amf_demo[29519]: saAmfHAStateGet FAILED - 7
>> Feb 10 17:21:10 PM_PL-3 osafimmnd[29561]: AL AMF Node Director is down, terminate this process
>> Feb 10 17:21:10 PM_PL-3 amf_demo[29519]: AL AMF Node Director is down, terminate this process
>> Feb 10 17:21:10 PM_PL-3 amf_demo[29519]: exiting (caught term signal)
>> Feb 10 17:21:10 PM_PL-3 osafclmna[29321]: AL AMF Node Director is down, terminate this process
>>
>> Bt:
>> Core was generated by `/usr/local/lib/opensaf/osafamfnd --tracemask=0xffffffff'.
>> Program terminated with signal 11, Segmentation fault.
>> #0 0x00000000004117c9 in avnd_comp_csi_assign_done(avnd_cb_tag*, avnd_comp_tag*, avnd_comp_csi_rec*) ()
>> (gdb) bt
>> #0 0x00000000004117c9 in avnd_comp_csi_assign_done(avnd_cb_tag*, avnd_comp_tag*, avnd_comp_csi_rec*) ()
>> #1 0x0000000000406a3b in avnd_evt_ava_csi_quiescing_compl_evh(avnd_cb_tag*, avnd_evt_tag*) ()
>> #2 0x000000000042133f in avnd_main_process() () at main.cc:667
>> #3 0x0000000000405517 in main () at main.cc:186
>>
>>
>> Thanks
>> -Nagu
>>
>>> -----Original Message-----
>>> From: Nagendra Kumar
>>> Sent: 09 February 2016 21:39
>>> To: minh chau; hans.nordeb...@ericsson.com; gary....@dektech.com.au;
>>> Praveen Malviya
>>> Cc: opensaf-devel@lists.sourceforge.net
>>> Subject: Re: [devel] FW: [PATCH 0 of 5] Review Request for amf: Add
>>> support for cloud resilience [#1620] V2
>>>
>>> 15. Same configuration as Test Case #12, SI lock. Keep gdb stopped in both
>>> SUs at CSI remove and set the timeout to 100 sec. Lock the SI and stop the
>>> controller.
>>> Start the controller and let the CSI remove time out.
>>> Two things:
>>> SU2 has a Standby assignment (which is wrong), and SU1 has no assignment.
>>> Error at PL-4 : SU-SI record addition failed
>>>
>>> PM_SC-1:/home/nagu/views/staging # amf-state siass
>>> safSISU=safSu=PL-4\,safSg=NoRed\,safApp=OpenSAF,safSi=NoRed3,safApp=OpenSAF
>>>         saAmfSISUHAState=ACTIVE(1)
>>>         saAmfSISUHAReadinessState=READY_FOR_ASSIGNMENT(1)
>>> safSISU=safSu=PL-3\,safSg=NoRed\,safApp=OpenSAF,safSi=NoRed2,safApp=OpenSAF
>>>         saAmfSISUHAState=ACTIVE(1)
>>>         saAmfSISUHAReadinessState=READY_FOR_ASSIGNMENT(1)
>>> safSISU=safSu=SU2\,safSg=AmfDemo\,safApp=AmfDemo1,safSi=AmfDemo,safApp=AmfDemo1
>>>         saAmfSISUHAState=STANDBY(2)
>>>         saAmfSISUHAReadinessState=READY_FOR_ASSIGNMENT(1)
>>> safSISU=safSu=SC-1\,safSg=NoRed\,safApp=OpenSAF,safSi=NoRed1,safApp=OpenSAF
>>>         saAmfSISUHAState=ACTIVE(1)
>>>         saAmfSISUHAReadinessState=READY_FOR_ASSIGNMENT(1)
>>> safSISU=safSu=SC-1\,safSg=2N\,safApp=OpenSAF,safSi=SC-2N,safApp=OpenSAF
>>>         saAmfSISUHAState=ACTIVE(1)
>>>         saAmfSISUHAReadinessState=READY_FOR_ASSIGNMENT(1)
>>>
>>> Syslog of PL-4:
>>>
>>> Feb 9 21:24:50 PM_PL-4 osafamfnd[7998]: NO 'safSu=SU2,safSg=AmfDemo,safApp=AmfDemo1' component restart probation timer started (timeout: 60000000000 ns)
>>> Feb 9 21:24:50 PM_PL-4 osafamfnd[7998]: NO Restarting a component of 'safSu=SU2,safSg=AmfDemo,safApp=AmfDemo1' (comp restart count: 1)
>>> Feb 9 21:24:50 PM_PL-4 osafamfnd[7998]: NO 'safComp=AmfDemo,safSu=SU2,safSg=AmfDemo,safApp=AmfDemo1' faulted due to 'csiRemovecallbackTimeout' : Recovery is 'componentRestart'
>>> Feb 9 21:24:55 PM_PL-4 amf_demo_script: killproc /opt/amf_demo/amf_demo failed
>>> Feb 9 21:24:55 PM_PL-4 amf_demo[8200]: 'safComp=AmfDemo,safSu=SU2,safSg=AmfDemo,safApp=AmfDemo1' started
>>> Feb 9 21:24:55 PM_PL-4 osafamfnd[7998]: NO Removed 'safSi=AmfDemo1,safApp=AmfDemo1' from 'safSu=SU2,safSg=AmfDemo,safApp=AmfDemo1'
>>> Feb 9 21:24:55 PM_PL-4 amf_demo[8200]: HC started with AMF
>>> Feb 9 21:24:55 PM_PL-4 amf_demo[8200]: Registered with AMF
>>> Feb 9 21:24:55 PM_PL-4 amf_demo[8200]: CSI Set - add 'safCsi=AmfDemo,safSi=AmfDemo,safApp=AmfDemo1' HAState Standby
>>> Feb 9 21:24:55 PM_PL-4 amf_demo[8200]: name: abcdef, value: val1
>>> Feb 9 21:24:55 PM_PL-4 amf_demo[8200]: name: abcdef, value: val2
>>> Feb 9 21:24:55 PM_PL-4 osafamfnd[7998]: CR SU-SI record addition failed, SU= safSu=SU2,safSg=AmfDemo,safApp=AmfDemo1 : SI=safSi=AmfDemo,safApp=AmfDemo1
>>> Feb 9 21:24:55 PM_PL-4 amf_demo[8200]: Health check 1
>>> Feb 9 21:25:50 PM_PL-4 osafamfnd[7998]: NO 'safSu=SU2,safSg=AmfDemo,safApp=AmfDemo1' Component or SU restart probation timer expired
>>>
>>> Thanks
>>> -Nagu
>>>
>>> ------------------------------------------------------------------------------
>>> Site24x7 APM Insight: Get Deep Visibility into Application
>>> Performance APM + Mobile APM + RUM: Monitor 3 App instances at just
>>> $35/Month Monitor end-to-end web transactions and take corrective
>>> actions now Troubleshoot faster and improve end-user experience. Signup Now!
>>> http://pubads.g.doubleclick.net/gampad/clk?id=272487151&iu=/4140
>>> _______________________________________________
>>> Opensaf-devel mailing list
>>> Opensaf-devel@lists.sourceforge.net
>>> https://lists.sourceforge.net/lists/listinfo/opensaf-devel