Hi Minh,
I thought the traces would be sufficient, so I didn't upload the configuration file. I will upload it going forward.
Thanks
-Nagu

> -----Original Message-----
> From: minh chau [mailto:minh.c...@dektech.com.au]
> Sent: 15 February 2016 14:27
> To: Nagendra Kumar; Gary Lee
> Cc: hans.nordeb...@ericsson.com; Praveen Malviya;
> opensaf-de...@lists.sourceforge.net
> Subject: Re: [devel] [PATCH 0 of 5] Review Request for amf: Add support for
> cloud resilience [#1620] V2
>
> Hi Nagu,
>
> One thing that would help us reproduce your problems: can you attach
> to the ticket the models you are using for the tests?
>
> Thanks,
> Minh
>
> On 15/02/16 19:32, Nagendra Kumar wrote:
> > Hi Gary,
> > I am using the patch tar sent by Minh (9 Feb on the devel list) and I am
> > applying these on the same changeset #7280 mentioned by Minh. So, please
> > contact him for any clarifications.
> >
> > Are you finding a mismatch between the traces attached to ticket #1620 (for
> > many test cases) and the source code of Amfd and Amfnd?
> >
> > BTW, I am attaching the tar sent by Minh and how I applied the patches on top
> > of #7280. Please note 010_log_1179.patch, which I have taken from my repo, as
> > the tar sent by Minh did not have the correct log patch for 1179. So, ideally,
> > the Amf patches should be the same; please check that. I enabled the cloud
> > feature (IMMSV_SC_ABSENCE_ALLOWED) manually.
> > ==================================================
> > patch -p1 < /tmp/sf_cloud_resilience_integration/777_osaftimer_2.diff
> > patch -p1 < ../OpensafHeadless/patches/010_log_1179.patch
> > patch -p1 < /tmp/sf_cloud_resilience_integration/1620_README_V2.diff
> > patch -p1 < /tmp/sf_cloud_resilience_integration/1620_amfd_V3.diff
> > patch -p1 < /tmp/sf_cloud_resilience_integration/1620_amfnd_V3.diff
> > patch -p1 < /tmp/sf_cloud_resilience_integration/1180_ntf_agent.diff
> > patch -p1 < /tmp/sf_cloud_resilience_integration/1180_ntf_libs_common.diff
> > patch -p1 < /tmp/sf_cloud_resilience_integration/1180_ntf_readme.diff
> > patch -p1 < /tmp/sf_cloud_resilience_integration/180_ntf_test.diff
> > patch -p1 < /tmp/sf_cloud_resilience_integration/1180_ntf_tools.diff
> > patch -p1 < /tmp/sf_cloud_resilience_integration/1620_common_libs_V2.diff
> > patch -p1 < /tmp/sf_cloud_resilience_integration/1620_config.diff
> > patch -p1 < /tmp/sf_cloud_resilience_integration/1621_ckpt.diff
> > patch -p1 < /tmp/sf_cloud_resilience_integration/1625_imm_1.diff
> > patch -p1 < /tmp/sf_cloud_resilience_integration/1625_imm_2.diff
> > patch -p1 < /tmp/sf_cloud_resilience_integration/1625_imm_3.diff
> > patch -p1 < /tmp/sf_cloud_resilience_integration/1625_imm_4.diff
> > patch -p1 < /tmp/sf_cloud_resilience_integration/1625_imm_5.diff
> > patch -p1 < /tmp/sf_cloud_resilience_integration/1625_imm_6.diff
> > patch -p1 < /tmp/sf_cloud_resilience_integration/1625_imm_7_compile_err.diff
> > patch -p1 < /tmp/sf_cloud_resilience_integration/1646_clm.patch
> > ==================================================
> >
> > I manually compared the installed Amfd and Amfnd binaries with the binaries
> > built in the source code repo. I compiled again and they are the same.
> > All the patches are applied.
> > So, please check from your side and let me know if I am making any
> > mistake.
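[Editor's sketch] The manual comparison of installed and freshly built binaries described above could be automated by hashing both files. This is only an illustration: the function names are made up for this note, and any build-tree path passed in is whatever your own checkout uses. The only path taken from this thread is the installed /usr/local/lib/opensaf/osafamfnd seen in the core dump lines.

```python
import hashlib
from pathlib import Path


def sha256_of(path):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_binary(installed, built):
    """True only when both files exist and have identical contents."""
    installed, built = Path(installed), Path(built)
    if not (installed.is_file() and built.is_file()):
        return False
    return sha256_of(installed) == sha256_of(built)


if __name__ == "__main__":
    import sys

    # Usage (second path is hypothetical, depends on your build tree):
    #   python compare.py /usr/local/lib/opensaf/osafamfnd <built osafamfnd>
    if len(sys.argv) == 3:
        ok = same_binary(sys.argv[1], sys.argv[2])
        print("identical" if ok else "different or missing")
```

A matching digest only shows that the installed file is the one that was built; it does not show which patches went into that build, so comparing the patched source trees is still needed.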
> >
> > Thanks
> > -Nagu
> >> -----Original Message-----
> >> From: Gary Lee [mailto:gary....@dektech.com.au]
> >> Sent: 15 February 2016 13:36
> >> To: Nagendra Kumar
> >> Cc: minh chau; hans.nordeb...@ericsson.com; Praveen Malviya;
> >> opensaf-de...@lists.sourceforge.net
> >> Subject: Re: [devel] [PATCH 0 of 5] Review Request for amf: Add
> >> support for cloud resilience [#1620] V2
> >>
> >> Hi Nagu
> >>
> >> I think we need to make sure we’re all looking at the same source code.
> >>
> >> I have trouble recreating some of the problems you’ve seen, but I see
> >> other problems.
> >>
> >> Perhaps we can set up a fork of opensaf-staging on source forge, and
> >> check in the patches?
> >>
> >> Thanks
> >> Gary
> >>
> >>
> >>> On 15 Feb 2016, at 4:00 PM, Gary Lee <gary....@dektech.com.au> wrote:
> >>>
> >>> Hi Nagu
> >>>
> >>> Just wanted to confirm that when you attach gdb to a process, the
> >>> process is amf_demo, and not amfnd?
> >>> Thanks
> >>> Gary
> >>>
> >>>> On 13 Feb 2016, at 1:35 AM, Nagendra Kumar <nagendr...@oracle.com> wrote:
> >>>> TC #27: Same configuration as TC #24:
> >>>> Add a new Csi in running demo appl: Keep gdb in SU1 comp in
> >>>> amf_csi_set_callback and add new csi to existing si. Stop controller,
> >>>> and then respond from gdb. Start controller. Only Act assignment is
> >>>> given to SU1 component. Standby csi assignment is not given to SU2 component:
> >>>> safCSIComp=safComp=AmfDemo\,safSu=SU1\,safSg=AmfDemo\,safApp=AmfDemo1,safCsi=AmfDemo,safSi=AmfDemo,safApp=AmfDemo1
> >>>> safCSIComp=safComp=AmfDemo\,safSu=SU2\,safSg=AmfDemo\,safApp=AmfDemo1,safCsi=AmfDemo,safSi=AmfDemo,safApp=AmfDemo1
> >>>> safCSIComp=safComp=AmfDemo\,safSu=SU1\,safSg=AmfDemo\,safApp=AmfDemo1,safCsi=AmfDemo1,safSi=AmfDemo,safApp=AmfDemo1
> >>>>
> >>>> Logs "TC 27" are attached in ticket.
> >>>>
> >>>> TC #28: Same configuration as TC #24:
> >>>> Delete a Csi in running demo appl: Add a new csi and keep gdb in
> >>>> SU1 comp in amf_csi_remove_callback and then delete csi. Stop controller,
> >>>> and then respond from gdb. Start controller. Amfd crashes:
> >>>> Feb 12 15:52:32 PM_SC-1 osafamfnd[16096]: Started
> >>>> Feb 12 15:52:32 PM_SC-1 osafamfnd[16096]: NO Sending node up due to NCSMDS_UP
> >>>> Feb 12 15:52:32 PM_SC-1 osafamfd[16086]: NO Received node_up from 2010f: msg_id 1
> >>>> Feb 12 15:52:32 PM_SC-1 osafamfd[16086]: csi.cc:1470: avd_compcsi_recreate: Assertion 'csi' failed.
> >>>> Feb 12 15:52:32 PM_SC-1 osafamfnd[16096]: WA AMF director unexpectedly crashed
> >>>> Feb 12 15:52:32 PM_SC-1 osafamfnd[16096]: WA AMF director unexpectedly crashed
> >>>>
> >>>> Logs "TC 28" are attached in ticket.
> >>>>
> >>>> TC #29: Configuration: One controller, and two PLs. 2N SG, SU1 and
> >>>> SU2 with 3 comp with one csi and SI associated. Si under si deps.
> >>>> safSi=B,safApp=Test depends on safSi=A,safApp=Test and
> >>>> safSi=C,safApp=Test depends on safSi=B,safApp=Test. SU1(Act) on PL-3
> >>>> and SU2(Standby) on PL-4.
> >>>> Lock SI A and keep gdb in the quiesced callback and stop controller.
> >>>> Respond from gdb and start controller. After the controller comes up, SU2 has
> >>>> two Standby assignments and no active assignments, which looks like a
> >>>> serious problem.
> >>>> safSISU=safSu=ABC2\,safSg=2N\,safApp=Test,safSi=B,safApp=Test
> >>>>     saAmfSISUHAState=STANDBY(2)
> >>>>     saAmfSISUHAReadinessState=READY_FOR_ASSIGNMENT(1)
> >>>> safSISU=safSu=ABC2\,safSg=2N\,safApp=Test,safSi=C,safApp=Test
> >>>>     saAmfSISUHAState=STANDBY(2)
> >>>>     saAmfSISUHAReadinessState=READY_FOR_ASSIGNMENT(1)
> >>>>
> >>>> At the same time, there are two error message in controller:
> >>>> Feb 12 19:32:01 PM_SC-1 osafamfd[1572]: EM sg_2n_fsm.cc:1439: safSu=ABC2,safSg=2N,safApp=Test (31)
> >>>> Feb 12 19:32:01 PM_SC-1 osafamfd[1572]: EM sg_2n_fsm.cc:1439: safSu=ABC2,safSg=2N,safApp=Test (31)
> >>>>
> >>>> Logs "TC 29" are attached in ticket.
> >>>>
> >>>> TC #30: Configuration same as TC #29: This case, Lock SI B, which
> >>>> is dependent on SI A and is sponsor of SI C.
> >>>> Only assignment of SI B is removed and SI C assignment remains:
> >>>> safSISU=safSu=ABC1\,safSg=2N\,safApp=Test,safSi=C,safApp=Test
> >>>>     saAmfSISUHAState=ACTIVE(1)
> >>>>     saAmfSISUHAReadinessState=READY_FOR_ASSIGNMENT(1)
> >>>> safSISU=safSu=ABC2\,safSg=2N\,safApp=Test,safSi=A,safApp=Test
> >>>>     saAmfSISUHAState=STANDBY(2)
> >>>>     saAmfSISUHAReadinessState=READY_FOR_ASSIGNMENT(1)
> >>>> safSISU=safSu=ABC1\,safSg=2N\,safApp=Test,safSi=A,safApp=Test
> >>>>     saAmfSISUHAState=ACTIVE(1)
> >>>>     saAmfSISUHAReadinessState=READY_FOR_ASSIGNMENT(1)
> >>>> safSISU=safSu=ABC2\,safSg=2N\,safApp=Test,safSi=C,safApp=Test
> >>>>     saAmfSISUHAState=STANDBY(2)
> >>>>     saAmfSISUHAReadinessState=READY_FOR_ASSIGNMENT(1)
> >>>>
> >>>> Logs "TC 30" are attached in ticket.
> >>>>
> >>>> TC #31: Same configuration as TC #30. Lock SI B, which is dependent
> >>>> on SI A and is sponsor of SI C. Assignment of SI B is removed and tolerance
> >>>> timer will start running.
> >>>> Reboot the controller. The assignment of SI C should be removed
> >>>> because its sponsor is in locked state.
> >>>> Logs "TC 30" are attached in ticket.
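[Editor's sketch] The expectation stated in TC #30 and TC #31 (locking a sponsor SI should eventually remove the assignments of its dependents) can be written as a toy model. This is not Amfd's algorithm: it ignores the tolerance timer, assumes each SI has at most one sponsor, and the function name is invented for this note. Only the SI names and the dependency chain come from the test case description.

```python
def sis_expected_unassigned(sponsor_of, locked):
    """Return the SIs expected to end up without assignments.

    sponsor_of maps a dependent SI to its single sponsor SI (toy model).
    locked is the set of SIs whose admin state is locked.
    """
    affected = set(locked)
    changed = True
    while changed:
        changed = False
        for dependent, sponsor in sponsor_of.items():
            if sponsor in affected and dependent not in affected:
                affected.add(dependent)
                changed = True
    return affected


# Dependency chain from TC #29: C depends on B, B depends on A.
SPONSOR_OF = {
    "safSi=B,safApp=Test": "safSi=A,safApp=Test",
    "safSi=C,safApp=Test": "safSi=B,safApp=Test",
}

if __name__ == "__main__":
    # Locking SI B, as in TC #30 and TC #31.
    for si in sorted(sis_expected_unassigned(SPONSOR_OF, {"safSi=B,safApp=Test"})):
        print(si)
```

Under this model, locking SI B yields both B and C as unassigned while A keeps its assignments, which is the outcome TC #31 says should hold and TC #30 reports is not reached.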
> >>>>
> >>>> Thanks
> >>>> -Nagu
> >>>>
> >>>>> -----Original Message-----
> >>>>> From: Nagendra Kumar
> >>>>> Sent: 11 February 2016 20:06
> >>>>> To: minh chau; hans.nordeb...@ericsson.com; gary....@dektech.com.au;
> >>>>> Praveen Malviya
> >>>>> Cc: opensaf-devel@lists.sourceforge.net
> >>>>> Subject: Re: [devel] FW: [PATCH 0 of 5] Review Request for amf:
> >>>>> Add support for cloud resilience [#1620] V2
> >>>>>
> >>>>> TC #21: Node shutdown and same as TC #17. Logs "TC 21" are
> >>>>> attached in ticket.
> >>>>>
> >>>>> TC #22: Node lock and unlock(2nd time), result same as TC #18.
> >>>>> Logs "TC 22" are attached in ticket.
> >>>>>
> >>>>> TC #23: Node Group shutdown operation: Stop controller during
> >>>>> shutdown operation and it will remain in shutdown state even if
> >>>>> all the assignments are removed.
> >>>>>
> >>>>> TC #24: Configuration: Start SC-1 and PL-3 and PL-4, create 2N Red
> >>>>> model SU1(Act) on PL-3 and SU2(Standby) on PL-4. Create a node group
> >>>>> having PL-3 and PL-4.
> >>>>> Now, lock and lock-in the node group. Reboot the controller. Check
> >>>>> the admin state of node group, it will be 3. Now, do unlock-in of
> >>>>> node group, SU1 and SU2 are not instantiated(which is wrong). Node group
> >>>>> admin state is 2 now(locked). Logs "TC 24" are attached in ticket.
> >>>>>
> >>>>> TC #25: Same configuration as TC #24:
> >>>>> Lock the node group and keep gdb in amf_csi_set_callback in SU1's
> >>>>> comp(Act) and amf_csi_remove_callback in SU2's comp(Standby).
> >>>>> Reboot the controller. When SC-1 comes up, respond from gdb from SU1 and
> >>>>> SU2 components. The following error comes, that means extra susi
> >>>>> remove is coming:
> >>>>> Feb 11 19:46:27 PM_PL-3 osafamfnd[16881]: ER susi_assign_evh: 'safSu=SU1,safSg=AmfDemo,safApp=AmfDemo1' has no assignments
> >>>>>
> >>>>> Logs "TC 25" are attached in ticket.
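[Editor's sketch] Several syslog excerpts in this thread contain assertion failures of the form `file:line: function: Assertion 'expr' failed`. When going through the logs attached to the ticket, a small filter like the one below could pull those lines out. The regular expression is inferred from the excerpts quoted here and is an assumption about the log format, not a documented OpenSAF format; the function name is invented.

```python
import re

# Pattern inferred from lines such as:
#   ... osafamfd[16086]: csi.cc:1470: avd_compcsi_recreate: Assertion 'csi' failed.
ASSERT_RE = re.compile(
    r"(?P<proc>\w+)\[(?P<pid>\d+)\]: "
    r"(?P<file>[\w.]+):(?P<line>\d+): "
    r"(?P<func>\w+): Assertion '(?P<expr>[^']+)' failed"
)


def find_assertions(lines):
    """Return (process, file, line, function, expression) per assertion line."""
    found = []
    for text in lines:
        match = ASSERT_RE.search(text)
        if match:
            found.append(
                (
                    match.group("proc"),
                    match.group("file"),
                    int(match.group("line")),
                    match.group("func"),
                    match.group("expr"),
                )
            )
    return found


# Sample lines copied from the syslog excerpts in this thread.
SAMPLE = [
    "Feb 12 15:52:32 PM_SC-1 osafamfd[16086]: csi.cc:1470: "
    "avd_compcsi_recreate: Assertion 'csi' failed.",
    "Feb 11 19:56:41 PM_PL-3 osafamfnd[17561]: di.cc:850: "
    "avnd_di_susi_resp_send: Assertion 'si' failed.",
    "Feb 11 19:56:42 PM_PL-3 osafclmna[17552]: AL AMF Node Director is "
    "down, terminate this process",
]

if __name__ == "__main__":
    for hit in find_assertions(SAMPLE):
        print(hit)
```

Grouping the hits by file and line makes it easier to see that different test cases hit the same assertion (for example di.cc:850 on both PL-3 and PL-4).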
> >>>>>
> >>>>> TC #26: Same configuration as TC #24:
> >>>>> Lock the node group and keep gdb in amf_csi_remove_callback in
> >>>>> SU1's comp(Act) and amf_csi_remove_callback in SU2's comp(Standby).
> >>>>> Reboot the controller. When SC-1 comes up, respond from gdb from SU1 and
> >>>>> SU2 components. Amfnd on PL-3 and PL-4 crashes:
> >>>>> PL-3:
> >>>>> Feb 11 19:56:41 PM_PL-3 osafamfnd[17561]: di.cc:850: avnd_di_susi_resp_send: Assertion 'si' failed.
> >>>>> Feb 11 19:56:42 PM_PL-3 osafclmna[17552]: AL AMF Node Director is down, terminate this process
> >>>>> PL-4:
> >>>>> Feb 11 19:56:40 PM_PL-4 osafamfnd[10912]: di.cc:850: avnd_di_susi_resp_send: Assertion 'si' failed.
> >>>>> Feb 11 19:56:40 PM_PL-4 osafimmnd[10893]: NO Global discard node received for nodeId:2030f pid:17542
> >>>>> Feb 11 19:56:40 PM_PL-4 osafimmnd[10893]: NO Implementer disconnected 13 <0, 2030f(down)> (MsgQueueService131855)
> >>>>> Feb 11 19:56:40 PM_PL-4 amf_demo[11007]: AL AMF Node Director is down, terminate this process
> >>>>>
> >>>>> Logs "TC 26" are attached in ticket.
> >>>>>
> >>>>> Thanks
> >>>>> -Nagu
> >>>>>> -----Original Message-----
> >>>>>> From: Nagendra Kumar
> >>>>>> Sent: 10 February 2016 17:30
> >>>>>> To: minh chau; hans.nordeb...@ericsson.com;
> >>>>>> gary....@dektech.com.au; Praveen Malviya
> >>>>>> Cc: opensaf-devel@lists.sourceforge.net
> >>>>>> Subject: Re: [devel] FW: [PATCH 0 of 5] Review Request for amf:
> >>>>>> Add support for cloud resilience [#1620] V2
> >>>>>>
> >>>>>> TC #16. Same configuration as #12: Run SI shutdown and keep sleep
> >>>>>> of 5 sec before saAmfCSIQuiescingComplete and stop controller
> >>>>>> and then after sleep, reject saAmfCSIQuiescingComplete with
> >>>>>> SA_AIS_ERR_FAILED_OPERATION.
All the assignment from SU1 on > PL-3 > >>>>>> and > >>>>>> SU2 on PL-4 are removed and SI admin state is 2(locked): > >>>>>> saAmfSIAdminState SA_UINT32_T 2 (0x2) > >>>>>> > >>>>>> "Si going into locked state" is different behaviour when > >>>>>> controller is up and running and run this test case. In case, > >>>>>> controller is available, SI will be in unlocked state and all the > >>>>>> assignments will be on SU2 as Act and SU3 as Standby (on PL-4). > >>>>>> This need either correction > >>>>> or documentation. > >>>>>> TC #17. Same configuration as #12: Run SG shutdown and > >>>>> keep sleep > >>>>>> of 5 sec before saAmfCSIQuiescingComplete and stop controller > >>>>>> and then after sleep, reject saAmfCSIQuiescingComplete with > >>>>>> SA_AIS_ERR_FAILED_OPERATION. Amfnd crashes[Please note that > this > >>>>>> test case works with controller up]: > >>>>>> Syslog and bt: > >>>>>> Feb 10 11:44:29 PM_PL-3 osafamfnd[15508]: NO component with > >>>>>> QUIESCED/QUIESCING assignment failed Feb 10 11:44:29 PM_PL-3 > >>>>>> osafamfnd[15508]: NO recovery action 'comp restart' escalated to > >>>>>> 'comp failover' > >>>>>> Feb 10 11:44:29 PM_PL-3 osafamfnd[15508]: NO SU failover > >>>>>> probation timer started (timeout: 1200000000000 ns) Feb 10 > >>>>>> 11:44:29 PM_PL-3 > >>>>>> osafamfnd[15508]: NO Performing failover of > >>>>>> 'safSu=SU1,safSg=AmfDemo,safApp=AmfDemo1' (SU failover count: > 1) > >>>>>> Feb > >>>>>> 10 11:44:29 PM_PL-3 osafamfnd[15508]: NO > >>>>>> > 'safComp=AmfDemo,safSu=SU1,safSg=AmfDemo,safApp=AmfDemo1' > >>>>>> recovery action escalated from 'componentRestart' to > >>>>> 'componentFailover' > >>>>>> Feb 10 11:44:29 PM_PL-3 osafamfnd[15508]: NO > >>>>>> > 'safComp=AmfDemo,safSu=SU1,safSg=AmfDemo,safApp=AmfDemo1' > >>>>>> faulted due to 'csiSetcallbackFailed' : Recovery is > 'componentFailover' > >>>>>> Feb 10 11:44:29 PM_PL-3 osafamfnd[15508]: NO > >>>>>> 'safSu=SU1,safSg=AmfDemo,safApp=AmfDemo1' Presence State > >>>>> INSTANTIATED > >>>>>> => 
TERMINATING Feb 10 11:44:29 PM_PL-3 osafamfnd[15508]: NO > >>>>> Removed > >>>>>> 'safSi=AmfDemo,safApp=AmfDemo1' from > >>>>>> 'safSu=SU1,safSg=AmfDemo,safApp=AmfDemo1' > >>>>>> Feb 10 11:44:29 PM_PL-3 amf_demo[15721]: saAmfHAStateGet > FAILED > >> - 7 > >>>>>> Feb 10 11:44:29 PM_PL-3 amf_demo[15721]: exiting (caught term > >>>>>> signal) Feb 10 11:44:29 PM_PL-3 osafamfnd[15508]: NO > >>>>>> avnd_di_oper_send() deferred as AMF director is offline Feb 10 > >>>>>> 11:44:29 PM_PL-3 > >>>>>> osafimmnd[15760]: AL AMF Node Director is down, terminate this > >>>>>> process > >>>>>> > >>>>>> Program terminated with signal 11, Segmentation fault. > >>>>>> #0 0x0000000000412b50 in > >>>>>> avnd_comp_cmplete_all_assignment(avnd_cb_tag*, > avnd_comp_tag*) > >> () > >>>>>> (gdb) bt > >>>>>> #0 0x0000000000412b50 in > >>>>>> avnd_comp_cmplete_all_assignment(avnd_cb_tag*, > avnd_comp_tag*) > >> () > >>>>>> #1 0x000000000040a093 in > >>>>>> avnd_comp_clc_terming_cleansucc_hdler(avnd_cb_tag*, > >>>>> avnd_comp_tag*) > >>>>>> () > >>>>>> #2 0x000000000040c7d4 in avnd_comp_clc_fsm_run(avnd_cb_tag*, > >>>>>> avnd_comp_tag*, avnd_comp_clc_pres_fsm_ev) () > >>>>>> #3 0x000000000040ce49 in avnd_evt_clc_resp_evh(avnd_cb_tag*, > >>>>>> avnd_evt_tag*) () > >>>>>> #4 0x000000000042133f in avnd_main_process() () at main.cc:667 > >>>>>> #5 0x0000000000405517 in main () at main.cc:186 > >>>>>> (gdb) thread apply all bt > >>>>>> > >>>>>> Thread 4 (Thread 0x7fdaf3c05b00 (LWP 15512)): > >>>>>> #0 0x00007fdaf2b2b415 in __lll_unlock_wake () from > >>>>>> /lib64/libpthread.so.0 > >>>>>> #1 0x00007fdaf2b27ac4 in _L_unlock_553 () from > >>>>>> /lib64/libpthread.so.0 > >>>>>> #2 0x00007fdaf2b279f7 in __pthread_mutex_unlock_usercnt () from > >>>>>> /lib64/libpthread.so.0 > >>>>>> #3 0x00007fdaf37edac3 in ncs_os_lock () from > >>>>>> /usr/local/lib/libopensaf_core.so.0 > >>>>>> #4 0x00007fdaf37e084d in ncs_ipc_send () from > >>>>>> /usr/local/lib/libopensaf_core.so.0 > >>>>>> #5 0x000000000041eea1 in 
avnd_evt_send(avnd_cb_tag*, > >>>>>> avnd_evt_tag*) > >>>>>> () > >>>>>> #6 0x000000000040a2cb in > >>>>>> > comp_clc_resp_callback(NCS_OS_PROC_EXECUTE_TIMED_CB_INFO*) > >> () > >>>>>> #7 0x00007fdaf37ecdfb in give_exec_mod_cb () from > >>>>>> /usr/local/lib/libopensaf_core.so.0 > >>>>>> #8 0x00007fdaf37ecfde in ncs_exec_mod_hdlr () from > >>>>>> /usr/local/lib/libopensaf_core.so.0 > >>>>>> #9 0x00007fdaf2b247b6 in start_thread () from > >>>>>> /lib64/libpthread.so.0 > >>>>>> #10 0x00007fdaf20da9cd in clone () from /lib64/libc.so.6 > >>>>>> #11 0x0000000000000000 in ?? () > >>>>>> > >>>>>> Thread 3 (Thread 0x7fdaf3c25b00 (LWP 15510)): > >>>>>> #0 0x00007fdaf20d14f6 in poll () from /lib64/libc.so.6 > >>>>>> #1 0x00007fdaf3817623 in mdtm_process_recv_events () from > >>>>>> /usr/local/lib/libopensaf_core.so.0 > >>>>>> #2 0x00007fdaf2b247b6 in start_thread () from > >>>>>> /lib64/libpthread.so.0 > >>>>>> #3 0x00007fdaf20da9cd in clone () from /lib64/libc.so.6 > >>>>>> #4 0x0000000000000000 in ?? () > >>>>>> > >>>>>> Thread 2 (Thread 0x7fdaf3c58b00 (LWP 15509)): > >>>>>> #0 0x00007fdaf20d14f6 in poll () from /lib64/libc.so.6 > >>>>>> #1 0x00007fdaf37db22f in osaf_ppoll () from > >>>>>> /usr/local/lib/libopensaf_core.so.0 > >>>>>> #2 0x00007fdaf37e2acf in ncs_tmr_wait () from > >>>>>> /usr/local/lib/libopensaf_core.so.0 > >>>>>> #3 0x00007fdaf2b247b6 in start_thread () from > >>>>>> /lib64/libpthread.so.0 > >>>>>> #4 0x00007fdaf20da9cd in clone () from /lib64/libc.so.6 > >>>>>> #5 0x0000000000000000 in ?? 
() > >>>>>> > >>>>>> Thread 1 (Thread 0x7fdaf3c28720 (LWP 15508)): > >>>>>> #0 0x0000000000412b50 in > >>>>>> avnd_comp_cmplete_all_assignment(avnd_cb_tag*, > avnd_comp_tag*) > >> () > >>>>>> #1 0x000000000040a093 in > >>>>>> avnd_comp_clc_terming_cleansucc_hdler(avnd_cb_tag*, > >>>>> avnd_comp_tag*) > >>>>>> () > >>>>>> #2 0x000000000040c7d4 in avnd_comp_clc_fsm_run(avnd_cb_tag*, > >>>>>> avnd_comp_tag*, avnd_comp_clc_pres_fsm_ev) () > >>>>>> #3 0x000000000040ce49 in avnd_evt_clc_resp_evh(avnd_cb_tag*, > >>>>>> avnd_evt_tag*) () > >>>>>> #4 0x000000000042133f in avnd_main_process() () at main.cc:667 > >>>>>> #5 0x0000000000405517 in main () at main.cc:186 > >>>>>> > >>>>>> TC #18. Same configuration as #12: Run SG lock and keep > >>>>>> gdb > >>>>> in > >>>>>> amf_csi_remove_callback and stop controller and then start the > >>>>>> controller and make is up. Now release Amfnd from gdb so that it > >>>>>> can respond to csi remove(Please note that controller has reboot > >>>>>> and is available now). Now, issue SG unlock. 
Amfnd crashes on > >>>>>> PL-3 and PL-4 at the same location[Please note that this test > >>>>>> case works with controller > >>>>> up]: > >>>>>> Syslog and Bt: > >>>>>> Feb 10 16:35:51 PM_PL-3 amf_demo[26623]: CSI Remove for all CSIs > >>>>>> Feb > >>>>>> 10 16:35:51 PM_PL-3 osafamfnd[26545]: NO Removed > >>>>>> 'safSi=AmfDemo,safApp=AmfDemo1' from > >>>>>> 'safSu=SU1,safSg=AmfDemo,safApp=AmfDemo1' > >>>>>> Feb 10 16:36:02 PM_PL-3 osafamfnd[26545]: NO Assigning > >>>>>> 'safSi=AmfDemo,safApp=AmfDemo1' ACTIVE to > >>>>>> 'safSu=SU1,safSg=AmfDemo,safApp=AmfDemo1' > >>>>>> Feb 10 16:36:02 PM_PL-3 amf_demo[26623]: CSI Set - add > >>>>>> 'safCsi=AmfDemo,safSi=AmfDemo,safApp=AmfDemo1' HAState > Active > >>>>>> Feb 10 16:36:02 PM_PL-3 amf_demo[26623]: name: abcdef, > value: > >> val1 > >>>>>> Feb 10 16:36:02 PM_PL-3 amf_demo[26623]: name: abcdef, > value: > >> val2 > >>>>>> Feb 10 16:36:02 PM_PL-3 osafamfnd[26545]: NO Assigned > >>>>>> 'safSi=AmfDemo,safApp=AmfDemo1' ACTIVE to > >>>>>> 'safSu=SU1,safSg=AmfDemo,safApp=AmfDemo1' > >>>>>> Feb 10 16:36:02 PM_PL-3 osafamfnd[26545]: di.cc:850: > >>>>>> avnd_di_susi_resp_send: Assertion 'si' failed. > >>>>>> Feb 10 16:36:02 PM_PL-3 osafclmna[26536]: AL AMF Node Director > is > >>>>>> down, terminate this process > >>>>>> > >>>>>> Core was generated by `/usr/local/lib/opensaf/osafamfnd -- > >>>>>> tracemask=0xffffffff'. > >>>>>> Program terminated with signal 6, Aborted. 
> >>>>>> #0 0x00007f022ebd9b55 in raise () from /lib64/libc.so.6 > >>>>>> (gdb) bt > >>>>>> #0 0x00007f022ebd9b55 in raise () from /lib64/libc.so.6 > >>>>>> #1 0x00007f022ebdb131 in abort () from /lib64/libc.so.6 > >>>>>> #2 0x00007f023038331b in __osafassert_fail () from > >>>>>> /usr/local/lib/libopensaf_core.so.0 > >>>>>> #3 0x000000000041b399 in avnd_di_susi_resp_send(avnd_cb_tag*, > >>>>>> avnd_su_tag*, avnd_su_si_rec*) () > >>>>>> #4 0x000000000042e9fa in avnd_su_si_oper_done(avnd_cb_tag*, > >>>>>> avnd_su_tag*, avnd_su_si_rec*) () > >>>>>> #5 0x0000000000411622 in > >> avnd_comp_csi_assign_done(avnd_cb_tag*, > >>>>>> avnd_comp_tag*, avnd_comp_csi_rec*) () > >>>>>> #6 0x0000000000407397 in avnd_evt_ava_resp_evh(avnd_cb_tag*, > >>>>>> avnd_evt_tag*) () > >>>>>> #7 0x000000000042133f in avnd_main_process() () at main.cc:667 > >>>>>> #8 0x0000000000405517 in main () at main.cc:186 > >>>>>> > >>>>>> TC #19. Same configuration as #12: Run Node lock and > >>>>>> keep > >>>>> sleep of > >>>>>> 5 sec in amf_csi_set_callback and stop controller. Reject quisced > >>>>>> assignment in amf_csi_set_callback, Amfnd crashes. Syslog and gdb > >>>>>> is the same as in TC #17. > >>>>>> > >>>>>> TC #20. Same configuration as #12: Issue Node shutdown: > >>>>> and keep > >>>>>> sleep of 5 sec in amf_csi_set_callback before sending > >>>>>> saAmfResponse() and stop controller. 
Amfnd crashes: > >>>>>> Syslog: > >>>>>> > >>>>>> Feb 10 17:21:10 PM_PL-3 osafamfnd[29330]: NO component with > >>>>>> QUIESCED/QUIESCING a > >>>>>> ssignment failed > >>>>>> Feb 10 17:21:10 PM_PL-3 osafamfnd[29330]: NO recovery action > >> 'comp > >>>>>> restart' esca lated to > >>>>>> 'comp failover' > >>>>>> Feb 10 17:21:10 PM_PL-3 osafamfnd[29330]: NO SU failover > >>>>>> probation > >>>>> timer > >>>>>> started (timeout: > >>>>>> 1200000000000 ns) > >>>>>> Feb 10 17:21:10 PM_PL-3 osafamfnd[29330]: NO Performing failover > >> of > >>>>>> 'safSu=SU1,s > >> afSg=AmfDemo,safApp=AmfDemo1' > >>>>>> (SU failover count: 1) > >>>>>> Feb 10 17:21:10 PM_PL-3 osafamfnd[29330]: NO > >>>>>> 'safComp=AmfDemo,safSu=SU1,safSg=Am > >>>>>> fDemo,safApp=AmfDemo1' recovery action escalated from > >>>>>> 'componentRestart' to 'com > >>>>>> ponentFailover' > >>>>>> Feb 10 17:21:10 PM_PL-3 osafamfnd[29330]: NO > >>>>>> 'safComp=AmfDemo,safSu=SU1,safSg=Am > >>>>>> fDemo,safApp=AmfDemo1' faulted due to 'csiSetcallbackFailed' : > >>>>>> Recovery > >>>>> is > >>>>>> 'comp onentFailover' > >>>>>> Feb 10 17:21:10 PM_PL-3 osafamfnd[29330]: NO > >>>>>> 'safSu=SU1,safSg=AmfDemo,safApp=Amf > >> Demo1' > >>>>>> Presence State INSTANTIATED => TERMINATING Feb 10 17:21:10 > >> PM_PL-3 > >>>>>> osafamfnd[29330]: NO Removed > >>>>>> 'safSi=AmfDemo,safApp=AmfDe > >>>>>> mo1' from > >>>>>> 'safSu=SU1,safSg=AmfDemo,safApp=AmfDemo1' > >>>>>> Feb 10 17:21:10 PM_PL-3 amf_demo[29519]: saAmfHAStateGet > FAILED > >> - 7 > >>>>>> Feb 10 17:21:10 PM_PL-3 osafimmnd[29561]: AL AMF Node Director > is > >>>>>> down, terminat e this > >>>>>> process > >>>>>> Feb 10 17:21:10 PM_PL-3 amf_demo[29519]: AL AMF Node Director > is > >>>>> down, > >>>>>> terminate this process > >>>>>> Feb 10 17:21:10 PM_PL-3 amf_demo[29519]: exiting (caught term > >>>>>> signal) Feb 10 17:21:10 PM_PL-3 osafclmna[29321]: AL AMF Node > >>>>>> Director is > >>>>> down, > >>>>>> terminat e this process > >>>>>> > >>>>>> Bt: > >>>>>> Core was 
generated by `/usr/local/lib/opensaf/osafamfnd -- > >>>>>> tracemask=0xffffffff'. > >>>>>> Program terminated with signal 11, Segmentation fault. > >>>>>> #0 0x00000000004117c9 in > >> avnd_comp_csi_assign_done(avnd_cb_tag*, > >>>>>> avnd_comp_tag*, avnd_comp_csi_rec*) () > >>>>>> (gdb) bt > >>>>>> #0 0x00000000004117c9 in > >> avnd_comp_csi_assign_done(avnd_cb_tag*, > >>>>>> avnd_comp_tag*, avnd_comp_csi_rec*) () > >>>>>> #1 0x0000000000406a3b in > >>>>>> avnd_evt_ava_csi_quiescing_compl_evh(avnd_cb_tag*, > >> avnd_evt_tag*) > >>>>>> () > >>>>>> #2 0x000000000042133f in avnd_main_process() () at main.cc:667 > >>>>>> #3 0x0000000000405517 in main () at main.cc:186 > >>>>>> > >>>>>> > >>>>>> Thanks > >>>>>> -Nagu > >>>>>> > >>>>>>> -----Original Message----- > >>>>>>> From: Nagendra Kumar > >>>>>>> Sent: 09 February 2016 21:39 > >>>>>>> To: minh chau; hans.nordeb...@ericsson.com; > >>>>> gary....@dektech.com.au; > >>>>>>> Praveen Malviya > >>>>>>> Cc: opensaf-devel@lists.sourceforge.net > >>>>>>> Subject: Re: [devel] FW: [PATCH 0 of 5] Review Request for amf: > >>>>>>> Add > >>>>>> support > >>>>>>> for cloud resilience [#1620] V2 > >>>>>>> > >>>>>>> 15. Same configuration as Test Case #12, SI lock. Keep gdb in > >>>>>>> both the SUs for csi remove and keep timeout as 100 sec. Slock > >>>>>>> SI and stop > >>>>> controller. > >>>>>>> Start controller and allow csi remove to timeout. > >>>>>>> Two things: > >>>>>>> SU2 has Standby assignment(which is wrong), SU1 has not > >>>>> assignment. 
> >>>>>>> Error at PL-4 : SU-SI record addition failed > >>>>>>> > >>>>>>> PM_SC-1:/home/nagu/views/staging # amf-state siass > >>>>>>> safSISU=safSu=PL- > >>>>> 4\,safSg=NoRed\,safApp=OpenSAF,safSi=NoRed3,safApp=OpenSAF > >>>>>>> saAmfSISUHAState=ACTIVE(1) > >>>>>>> saAmfSISUHAReadinessState=READY_FOR_ASSIGNMENT(1) > >>>>>>> safSISU=safSu=PL- > >>>>>>> 3\,safSg=NoRed\,safApp=OpenSAF,safSi=NoRed2,safApp=OpenSAF > >>>>>>> saAmfSISUHAState=ACTIVE(1) > >>>>>>> saAmfSISUHAReadinessState=READY_FOR_ASSIGNMENT(1) > >>>>>>> > >> > safSISU=safSu=SU2\,safSg=AmfDemo\,safApp=AmfDemo1,safSi=AmfDemo,s > >>>>>>> afApp=AmfDemo1 > >>>>>>> saAmfSISUHAState=STANDBY(2) > >>>>>>> saAmfSISUHAReadinessState=READY_FOR_ASSIGNMENT(1) > >>>>>>> safSISU=safSu=SC- > >>>>>>> 1\,safSg=NoRed\,safApp=OpenSAF,safSi=NoRed1,safApp=OpenSAF > >>>>>>> saAmfSISUHAState=ACTIVE(1) > >>>>>>> saAmfSISUHAReadinessState=READY_FOR_ASSIGNMENT(1) > >>>>>>> safSISU=safSu=SC-1\,safSg=2N\,safApp=OpenSAF,safSi=SC- > >>>>>>> 2N,safApp=OpenSAF > >>>>>>> saAmfSISUHAState=ACTIVE(1) > >>>>>>> saAmfSISUHAReadinessState=READY_FOR_ASSIGNMENT(1) > >>>>>>> > >>>>>>> Syslog of PL-4: > >>>>>>> > >>>>>>> Feb 9 21:24:50 PM_PL-4 osafamfnd[7998]: NO > >>>>>>> 'safSu=SU2,safSg=AmfDemo,safApp=AmfDemo1' component > restart > >>>>>>> probation timer started (timeout: 60000000000 ns) Feb 9 > >>>>>>> 21:24:50 > >>>>>>> PM_PL- > >>>>>> 4 > >>>>>>> osafamfnd[7998]: NO Restarting a component of > >>>>>>> 'safSu=SU2,safSg=AmfDemo,safApp=AmfDemo1' (comp restart > count: > >> 1) > >>>>>> Feb > >>>>>>> 9 21:24:50 PM_PL-4 osafamfnd[7998]: NO > >>>>>>> > 'safComp=AmfDemo,safSu=SU2,safSg=AmfDemo,safApp=AmfDemo1' > >>>>>>> faulted due to 'csiRemovecallbackTimeout' : Recovery is > >>>>>> 'componentRestart' > >>>>>>> Feb 9 21:24:55 PM_PL-4 amf_demo_script: killproc > >>>>>>> /opt/amf_demo/amf_demo failed Feb 9 21:24:55 PM_PL-4 > >>>>>>> amf_demo[8200]: > >>>>>>> > 'safComp=AmfDemo,safSu=SU2,safSg=AmfDemo,safApp=AmfDemo1' > >>>>>>> started Feb 9 
21:24:55 PM_PL-4 osafamfnd[7998]: NO Removed > >>>>>>> 'safSi=AmfDemo1,safApp=AmfDemo1' from > >>>>>>> 'safSu=SU2,safSg=AmfDemo,safApp=AmfDemo1' > >>>>>>> Feb 9 21:24:55 PM_PL-4 amf_demo[8200]: HC started with AMF > Feb > >> 9 > >>>>>>> 21:24:55 PM_PL-4 amf_demo[8200]: Registered with AMF Feb 9 > >>>>> 21:24:55 > >>>>>>> PM_PL-4 amf_demo[8200]: CSI Set - add > >>>>>>> 'safCsi=AmfDemo,safSi=AmfDemo,safApp=AmfDemo1' HAState > >> Standby > >>>>>>> Feb 9 21:24:55 PM_PL-4 amf_demo[8200]: name: abcdef, > value: > >>>>> val1 > >>>>>>> Feb 9 21:24:55 PM_PL-4 amf_demo[8200]: name: abcdef, > value: > >>>>> val2 > >>>>>>> Feb 9 21:24:55 PM_PL-4 osafamfnd[7998]: CR SU-SI record > >>>>>>> addition failed, SU= > safSu=SU2,safSg=AmfDemo,safApp=AmfDemo1 : > >>>>>>> SI=safSi=AmfDemo,safApp=AmfDemo1 Feb 9 21:24:55 PM_PL-4 > >>>>>>> amf_demo[8200]: Health check 1 Feb 9 21:25:50 PM_PL-4 > >>>>>>> osafamfnd[7998]: NO > >> 'safSu=SU2,safSg=AmfDemo,safApp=AmfDemo1' > >>>>>>> Component or SU restart probation timer expired > >>>>>>> > >>>>>>> Thanks > >>>>>>> -Nagu > >>>>>>> > >>>>>>> ---------------------------------------------------------------- > >>>>>>> -- > >>>>>>> -- > >>>>>>> ---------- > >>>>>>> Site24x7 APM Insight: Get Deep Visibility into Application > >>>>>>> Performance APM > >>>>>> + > >>>>>>> Mobile APM + RUM: Monitor 3 App instances at just $35/Month > >>>>>>> Monitor end-to-end web transactions and take corrective actions > >>>>>>> now > >>>>>> Troubleshoot > >>>>>>> faster and improve end-user experience. Signup Now! 
> >>>>>>> http://pubads.g.doubleclick.net/gampad/clk?id=272487151&iu=/4140
> >>>>>>> _______________________________________________
> >>>>>>> Opensaf-devel mailing list
> >>>>>>> Opensaf-devel@lists.sourceforge.net
> >>>>>>> https://lists.sourceforge.net/lists/listinfo/opensaf-devel