Hi Zoran,

In TC #16, #17 and #18 there is no amfd crash. I didn't mean that TC #16
failed because of the one-payload IMM limitation.

Thanks,
Minh
On 11/02/16 19:18, Zoran Milinkovic wrote:
> Hi Minh,
>
> In TC #16, it's written that the test was done with PL-3 and PL-4, so it
> is not a one-payload case.
> This looks more like an AMF problem than an IMM limitation.
>
> Thanks,
> Zoran
>
> -----Original Message-----
> From: minh chau [mailto:minh.c...@dektech.com.au]
> Sent: Thursday, February 11, 2016 5:47 AM
> To: Nagendra Kumar; Hans Nordebäck; Gary Lee; Praveen Malviya
> Cc: opensaf-devel@lists.sourceforge.net
> Subject: Re: [devel] FW: [PATCH 0 of 5] Review Request for amf: Add support 
> for cloud resilience [#1620] V2 (delayed failover issue)
>
> Hi Nagu,
>
> There's a known limitation in IMM with a 2+1 or 1+1 configuration, which
> results in an IMM reload at the second controller restart. We're discussing
> how to avoid the crash. Meanwhile, I'm also looking at the delayed failover
> issues which you reported in TC #2, #12, #13 and #15. We have similar
> automated tests but they all pass, so I guess your test has something special.
> Can you run those tests once and send me syslog + amfd/amfnd traces?
>
> Thanks,
> Minh
> On 10/02/16 23:00, Nagendra Kumar wrote:
>> TC #16. Same configuration as #12: Run SI shutdown, sleep for 5 sec before
>> saAmfCSIQuiescingComplete, and stop the controller; then, after the sleep,
>> reject saAmfCSIQuiescingComplete with SA_AIS_ERR_FAILED_OPERATION.
>> All the assignments from SU1 on PL-3 and SU2 on PL-4 are removed and the SI
>> admin state is 2 (locked):
>> saAmfSIAdminState                                  SA_UINT32_T  2 (0x2)
>>
>> "SI going into locked state" differs from the behaviour seen when this test
>> case is run with the controller up and running. When the controller is
>> available, the SI stays in unlocked state and all the assignments are on SU2
>> as Active and SU3 as Standby (on PL-4). This needs either correction or
>> documentation.
>>
>> TC #17. Same configuration as #12: Run SG shutdown, sleep for 5 sec before
>> saAmfCSIQuiescingComplete, and stop the controller; then, after the sleep,
>> reject saAmfCSIQuiescingComplete with SA_AIS_ERR_FAILED_OPERATION.
>> Amfnd crashes [please note that this test case works with the controller
>> up]:
>> Syslog and bt:
>> Feb 10 11:44:29 PM_PL-3 osafamfnd[15508]: NO component with QUIESCED/QUIESCING assignment failed
>> Feb 10 11:44:29 PM_PL-3 osafamfnd[15508]: NO recovery action 'comp restart' escalated to 'comp failover'
>> Feb 10 11:44:29 PM_PL-3 osafamfnd[15508]: NO SU failover probation timer started (timeout: 1200000000000 ns)
>> Feb 10 11:44:29 PM_PL-3 osafamfnd[15508]: NO Performing failover of 'safSu=SU1,safSg=AmfDemo,safApp=AmfDemo1' (SU failover count: 1)
>> Feb 10 11:44:29 PM_PL-3 osafamfnd[15508]: NO 'safComp=AmfDemo,safSu=SU1,safSg=AmfDemo,safApp=AmfDemo1' recovery action escalated from 'componentRestart' to 'componentFailover'
>> Feb 10 11:44:29 PM_PL-3 osafamfnd[15508]: NO 'safComp=AmfDemo,safSu=SU1,safSg=AmfDemo,safApp=AmfDemo1' faulted due to 'csiSetcallbackFailed' : Recovery is 'componentFailover'
>> Feb 10 11:44:29 PM_PL-3 osafamfnd[15508]: NO 'safSu=SU1,safSg=AmfDemo,safApp=AmfDemo1' Presence State INSTANTIATED => TERMINATING
>> Feb 10 11:44:29 PM_PL-3 osafamfnd[15508]: NO Removed 'safSi=AmfDemo,safApp=AmfDemo1' from 'safSu=SU1,safSg=AmfDemo,safApp=AmfDemo1'
>> Feb 10 11:44:29 PM_PL-3 amf_demo[15721]: saAmfHAStateGet FAILED - 7
>> Feb 10 11:44:29 PM_PL-3 amf_demo[15721]: exiting (caught term signal)
>> Feb 10 11:44:29 PM_PL-3 osafamfnd[15508]: NO avnd_di_oper_send() deferred as AMF director is offline
>> Feb 10 11:44:29 PM_PL-3 osafimmnd[15760]: AL AMF Node Director is down, terminate this process
>>
>> Program terminated with signal 11, Segmentation fault.
>> #0  0x0000000000412b50 in
>> avnd_comp_cmplete_all_assignment(avnd_cb_tag*, avnd_comp_tag*) ()
>> (gdb) bt
>> #0  0x0000000000412b50 in
>> avnd_comp_cmplete_all_assignment(avnd_cb_tag*, avnd_comp_tag*) ()
>> #1  0x000000000040a093 in
>> avnd_comp_clc_terming_cleansucc_hdler(avnd_cb_tag*, avnd_comp_tag*) ()
>> #2  0x000000000040c7d4 in avnd_comp_clc_fsm_run(avnd_cb_tag*,
>> avnd_comp_tag*, avnd_comp_clc_pres_fsm_ev) ()
>> #3  0x000000000040ce49 in avnd_evt_clc_resp_evh(avnd_cb_tag*,
>> avnd_evt_tag*) ()
>> #4  0x000000000042133f in avnd_main_process() () at main.cc:667
>> #5  0x0000000000405517 in main () at main.cc:186
>> (gdb) thread apply all bt
>>
>> Thread 4 (Thread 0x7fdaf3c05b00 (LWP 15512)):
>> #0  0x00007fdaf2b2b415 in __lll_unlock_wake () from
>> /lib64/libpthread.so.0
>> #1  0x00007fdaf2b27ac4 in _L_unlock_553 () from /lib64/libpthread.so.0
>> #2  0x00007fdaf2b279f7 in __pthread_mutex_unlock_usercnt () from
>> /lib64/libpthread.so.0
>> #3  0x00007fdaf37edac3 in ncs_os_lock () from
>> /usr/local/lib/libopensaf_core.so.0
>> #4  0x00007fdaf37e084d in ncs_ipc_send () from
>> /usr/local/lib/libopensaf_core.so.0
>> #5  0x000000000041eea1 in avnd_evt_send(avnd_cb_tag*, avnd_evt_tag*)
>> ()
>> #6  0x000000000040a2cb in
>> comp_clc_resp_callback(NCS_OS_PROC_EXECUTE_TIMED_CB_INFO*) ()
>> #7  0x00007fdaf37ecdfb in give_exec_mod_cb () from
>> /usr/local/lib/libopensaf_core.so.0
>> #8  0x00007fdaf37ecfde in ncs_exec_mod_hdlr () from
>> /usr/local/lib/libopensaf_core.so.0
>> #9  0x00007fdaf2b247b6 in start_thread () from /lib64/libpthread.so.0
>> #10 0x00007fdaf20da9cd in clone () from /lib64/libc.so.6
>> #11 0x0000000000000000 in ?? ()
>>
>> Thread 3 (Thread 0x7fdaf3c25b00 (LWP 15510)):
>> #0  0x00007fdaf20d14f6 in poll () from /lib64/libc.so.6
>> #1  0x00007fdaf3817623 in mdtm_process_recv_events () from
>> /usr/local/lib/libopensaf_core.so.0
>> #2  0x00007fdaf2b247b6 in start_thread () from /lib64/libpthread.so.0
>> #3  0x00007fdaf20da9cd in clone () from /lib64/libc.so.6
>> #4  0x0000000000000000 in ?? ()
>>
>> Thread 2 (Thread 0x7fdaf3c58b00 (LWP 15509)):
>> #0  0x00007fdaf20d14f6 in poll () from /lib64/libc.so.6
>> #1  0x00007fdaf37db22f in osaf_ppoll () from
>> /usr/local/lib/libopensaf_core.so.0
>> #2  0x00007fdaf37e2acf in ncs_tmr_wait () from
>> /usr/local/lib/libopensaf_core.so.0
>> #3  0x00007fdaf2b247b6 in start_thread () from /lib64/libpthread.so.0
>> #4  0x00007fdaf20da9cd in clone () from /lib64/libc.so.6
>> #5  0x0000000000000000 in ?? ()
>>
>> Thread 1 (Thread 0x7fdaf3c28720 (LWP 15508)):
>> #0  0x0000000000412b50 in
>> avnd_comp_cmplete_all_assignment(avnd_cb_tag*, avnd_comp_tag*) ()
>> #1  0x000000000040a093 in
>> avnd_comp_clc_terming_cleansucc_hdler(avnd_cb_tag*, avnd_comp_tag*) ()
>> #2  0x000000000040c7d4 in avnd_comp_clc_fsm_run(avnd_cb_tag*,
>> avnd_comp_tag*, avnd_comp_clc_pres_fsm_ev) ()
>> #3  0x000000000040ce49 in avnd_evt_clc_resp_evh(avnd_cb_tag*,
>> avnd_evt_tag*) ()
>> #4  0x000000000042133f in avnd_main_process() () at main.cc:667
>> #5  0x0000000000405517 in main () at main.cc:186
>>
>> TC #18. Same configuration as #12: Run SG lock, keep gdb in
>> amf_csi_remove_callback, stop the controller, then start the controller
>> and bring it up. Now release Amfnd from gdb so that it can respond to the
>> csi remove (please note that the controller has rebooted and is available
>> now). Now issue SG unlock. Amfnd crashes on PL-3 and PL-4 at the same
>> location [please note that this test case works with the controller up]:
>> Syslog and Bt:
>> Feb 10 16:35:51 PM_PL-3 amf_demo[26623]: CSI Remove for all CSIs
>> Feb 10 16:35:51 PM_PL-3 osafamfnd[26545]: NO Removed 'safSi=AmfDemo,safApp=AmfDemo1' from 'safSu=SU1,safSg=AmfDemo,safApp=AmfDemo1'
>> Feb 10 16:36:02 PM_PL-3 osafamfnd[26545]: NO Assigning 'safSi=AmfDemo,safApp=AmfDemo1' ACTIVE to 'safSu=SU1,safSg=AmfDemo,safApp=AmfDemo1'
>> Feb 10 16:36:02 PM_PL-3 amf_demo[26623]: CSI Set - add 'safCsi=AmfDemo,safSi=AmfDemo,safApp=AmfDemo1' HAState Active
>> Feb 10 16:36:02 PM_PL-3 amf_demo[26623]:        name: abcdef, value: val1
>> Feb 10 16:36:02 PM_PL-3 amf_demo[26623]:        name: abcdef, value: val2
>> Feb 10 16:36:02 PM_PL-3 osafamfnd[26545]: NO Assigned 'safSi=AmfDemo,safApp=AmfDemo1' ACTIVE to 'safSu=SU1,safSg=AmfDemo,safApp=AmfDemo1'
>> Feb 10 16:36:02 PM_PL-3 osafamfnd[26545]: di.cc:850: avnd_di_susi_resp_send: Assertion 'si' failed.
>> Feb 10 16:36:02 PM_PL-3 osafclmna[26536]: AL AMF Node Director is down, terminate this process
>>
>> Core was generated by `/usr/local/lib/opensaf/osafamfnd 
>> --tracemask=0xffffffff'.
>> Program terminated with signal 6, Aborted.
>> #0  0x00007f022ebd9b55 in raise () from /lib64/libc.so.6
>> (gdb) bt
>> #0  0x00007f022ebd9b55 in raise () from /lib64/libc.so.6
>> #1  0x00007f022ebdb131 in abort () from /lib64/libc.so.6
>> #2  0x00007f023038331b in __osafassert_fail () from
>> /usr/local/lib/libopensaf_core.so.0
>> #3  0x000000000041b399 in avnd_di_susi_resp_send(avnd_cb_tag*,
>> avnd_su_tag*, avnd_su_si_rec*) ()
>> #4  0x000000000042e9fa in avnd_su_si_oper_done(avnd_cb_tag*,
>> avnd_su_tag*, avnd_su_si_rec*) ()
>> #5  0x0000000000411622 in avnd_comp_csi_assign_done(avnd_cb_tag*,
>> avnd_comp_tag*, avnd_comp_csi_rec*) ()
>> #6  0x0000000000407397 in avnd_evt_ava_resp_evh(avnd_cb_tag*,
>> avnd_evt_tag*) ()
>> #7  0x000000000042133f in avnd_main_process() () at main.cc:667
>> #8  0x0000000000405517 in main () at main.cc:186
>>
>> TC #19. Same configuration as #12: Run Node lock, sleep for 5 sec in
>> amf_csi_set_callback, and stop the controller. Reject the quiesced
>> assignment in amf_csi_set_callback; Amfnd crashes. The syslog and gdb
>> output are the same as in TC #17.
>>
>> TC #20. Same configuration as #12: Issue Node shutdown, sleep for 5 sec in
>> amf_csi_set_callback before sending saAmfResponse(), and stop the
>> controller. Amfnd crashes:
>> Syslog:
>>
>> Feb 10 17:21:10 PM_PL-3 osafamfnd[29330]: NO component with QUIESCED/QUIESCING assignment failed
>> Feb 10 17:21:10 PM_PL-3 osafamfnd[29330]: NO recovery action 'comp restart' escalated to 'comp failover'
>> Feb 10 17:21:10 PM_PL-3 osafamfnd[29330]: NO SU failover probation timer started (timeout: 1200000000000 ns)
>> Feb 10 17:21:10 PM_PL-3 osafamfnd[29330]: NO Performing failover of 'safSu=SU1,safSg=AmfDemo,safApp=AmfDemo1' (SU failover count: 1)
>> Feb 10 17:21:10 PM_PL-3 osafamfnd[29330]: NO 'safComp=AmfDemo,safSu=SU1,safSg=AmfDemo,safApp=AmfDemo1' recovery action escalated from 'componentRestart' to 'componentFailover'
>> Feb 10 17:21:10 PM_PL-3 osafamfnd[29330]: NO 'safComp=AmfDemo,safSu=SU1,safSg=AmfDemo,safApp=AmfDemo1' faulted due to 'csiSetcallbackFailed' : Recovery is 'componentFailover'
>> Feb 10 17:21:10 PM_PL-3 osafamfnd[29330]: NO 'safSu=SU1,safSg=AmfDemo,safApp=AmfDemo1' Presence State INSTANTIATED => TERMINATING
>> Feb 10 17:21:10 PM_PL-3 osafamfnd[29330]: NO Removed 'safSi=AmfDemo,safApp=AmfDemo1' from 'safSu=SU1,safSg=AmfDemo,safApp=AmfDemo1'
>> Feb 10 17:21:10 PM_PL-3 amf_demo[29519]: saAmfHAStateGet FAILED - 7
>> Feb 10 17:21:10 PM_PL-3 osafimmnd[29561]: AL AMF Node Director is down, terminate this process
>> Feb 10 17:21:10 PM_PL-3 amf_demo[29519]: AL AMF Node Director is down, terminate this process
>> Feb 10 17:21:10 PM_PL-3 amf_demo[29519]: exiting (caught term signal)
>> Feb 10 17:21:10 PM_PL-3 osafclmna[29321]: AL AMF Node Director is down, terminate this process
>>
>> Bt:
>> Core was generated by `/usr/local/lib/opensaf/osafamfnd 
>> --tracemask=0xffffffff'.
>> Program terminated with signal 11, Segmentation fault.
>> #0  0x00000000004117c9 in avnd_comp_csi_assign_done(avnd_cb_tag*,
>> avnd_comp_tag*, avnd_comp_csi_rec*) ()
>> (gdb) bt
>> #0  0x00000000004117c9 in avnd_comp_csi_assign_done(avnd_cb_tag*,
>> avnd_comp_tag*, avnd_comp_csi_rec*) ()
>> #1  0x0000000000406a3b in
>> avnd_evt_ava_csi_quiescing_compl_evh(avnd_cb_tag*, avnd_evt_tag*) ()
>> #2  0x000000000042133f in avnd_main_process() () at main.cc:667
>> #3  0x0000000000405517 in main () at main.cc:186
>>
>>
>> Thanks
>> -Nagu
>>
>>> -----Original Message-----
>>> From: Nagendra Kumar
>>> Sent: 09 February 2016 21:39
>>> To: minh chau; hans.nordeb...@ericsson.com; gary....@dektech.com.au;
>>> Praveen Malviya
>>> Cc: opensaf-devel@lists.sourceforge.net
>>> Subject: Re: [devel] FW: [PATCH 0 of 5] Review Request for amf: Add
>>> support for cloud resilience [#1620] V2
>>>
>>> 15. Same configuration as Test Case #12, SI lock. Keep gdb in both
>>> the SUs for csi remove and keep the timeout as 100 sec. Lock the SI and
>>> stop the controller.
>>> Start the controller and allow csi remove to time out.
>>> Two things:
>>>     SU2 has a Standby assignment (which is wrong); SU1 has no assignment.
>>>     Error at PL-4: SU-SI record addition failed
>>>
>>> PM_SC-1:/home/nagu/views/staging # amf-state siass
>>> safSISU=safSu=PL-4\,safSg=NoRed\,safApp=OpenSAF,safSi=NoRed3,safApp=OpenSAF
>>>           saAmfSISUHAState=ACTIVE(1)
>>>           saAmfSISUHAReadinessState=READY_FOR_ASSIGNMENT(1)
>>> safSISU=safSu=PL-3\,safSg=NoRed\,safApp=OpenSAF,safSi=NoRed2,safApp=OpenSAF
>>>           saAmfSISUHAState=ACTIVE(1)
>>>           saAmfSISUHAReadinessState=READY_FOR_ASSIGNMENT(1)
>>> safSISU=safSu=SU2\,safSg=AmfDemo\,safApp=AmfDemo1,safSi=AmfDemo,safApp=AmfDemo1
>>>           saAmfSISUHAState=STANDBY(2)
>>>           saAmfSISUHAReadinessState=READY_FOR_ASSIGNMENT(1)
>>> safSISU=safSu=SC-1\,safSg=NoRed\,safApp=OpenSAF,safSi=NoRed1,safApp=OpenSAF
>>>           saAmfSISUHAState=ACTIVE(1)
>>>           saAmfSISUHAReadinessState=READY_FOR_ASSIGNMENT(1)
>>> safSISU=safSu=SC-1\,safSg=2N\,safApp=OpenSAF,safSi=SC-2N,safApp=OpenSAF
>>>           saAmfSISUHAState=ACTIVE(1)
>>>           saAmfSISUHAReadinessState=READY_FOR_ASSIGNMENT(1)
>>>
>>> Syslog of PL-4:
>>>
>>> Feb  9 21:24:50 PM_PL-4 osafamfnd[7998]: NO 'safSu=SU2,safSg=AmfDemo,safApp=AmfDemo1' component restart probation timer started (timeout: 60000000000 ns)
>>> Feb  9 21:24:50 PM_PL-4 osafamfnd[7998]: NO Restarting a component of 'safSu=SU2,safSg=AmfDemo,safApp=AmfDemo1' (comp restart count: 1)
>>> Feb  9 21:24:50 PM_PL-4 osafamfnd[7998]: NO 'safComp=AmfDemo,safSu=SU2,safSg=AmfDemo,safApp=AmfDemo1' faulted due to 'csiRemovecallbackTimeout' : Recovery is 'componentRestart'
>>> Feb  9 21:24:55 PM_PL-4 amf_demo_script: killproc /opt/amf_demo/amf_demo failed
>>> Feb  9 21:24:55 PM_PL-4 amf_demo[8200]: 'safComp=AmfDemo,safSu=SU2,safSg=AmfDemo,safApp=AmfDemo1' started
>>> Feb  9 21:24:55 PM_PL-4 osafamfnd[7998]: NO Removed 'safSi=AmfDemo1,safApp=AmfDemo1' from 'safSu=SU2,safSg=AmfDemo,safApp=AmfDemo1'
>>> Feb  9 21:24:55 PM_PL-4 amf_demo[8200]: HC started with AMF
>>> Feb  9 21:24:55 PM_PL-4 amf_demo[8200]: Registered with AMF
>>> Feb  9 21:24:55 PM_PL-4 amf_demo[8200]: CSI Set - add 'safCsi=AmfDemo,safSi=AmfDemo,safApp=AmfDemo1' HAState Standby
>>> Feb  9 21:24:55 PM_PL-4 amf_demo[8200]:         name: abcdef, value: val1
>>> Feb  9 21:24:55 PM_PL-4 amf_demo[8200]:         name: abcdef, value: val2
>>> Feb  9 21:24:55 PM_PL-4 osafamfnd[7998]: CR SU-SI record addition failed, SU= safSu=SU2,safSg=AmfDemo,safApp=AmfDemo1 : SI=safSi=AmfDemo,safApp=AmfDemo1
>>> Feb  9 21:24:55 PM_PL-4 amf_demo[8200]: Health check 1
>>> Feb  9 21:25:50 PM_PL-4 osafamfnd[7998]: NO 'safSu=SU2,safSg=AmfDemo,safApp=AmfDemo1' Component or SU restart probation timer expired
>>>
>>> Thanks
>>> -Nagu
>>>
>>> _______________________________________________
>>> Opensaf-devel mailing list
>>> Opensaf-devel@lists.sourceforge.net
>>> https://lists.sourceforge.net/lists/listinfo/opensaf-devel
>
>

