[tickets] [opensaf:tickets] #2213 AMFND: Coredump if suFailover while shutting down
- **status**: review --> fixed
- **Comment**:

changeset: 8656:a56101161326
branch: opensaf-5.0.x
parent: 8651:a90faf589254
user: Nagendra Kumar
date: Tue Mar 07 13:18:45 2017 +0530
summary: amfnd: avoid null pointer access [#2213]

changeset: 8657:a203318fb21e
branch: opensaf-5.1.x
parent: 8652:a7c62f1de1a3
user: Nagendra Kumar
date: Tue Mar 07 13:19:02 2017 +0530
summary: amfnd: avoid null pointer access [#2213]

changeset: 8658:136a8f432da6
tag: tip
parent: 8655:45be1e612ab6
user: Nagendra Kumar
date: Tue Mar 07 13:19:16 2017 +0530
summary: amfnd: avoid null pointer access [#2213]

[staging:a56101]
[staging:a20331]
[staging:136a8f]

---

** [tickets:#2213] AMFND: Coredump if suFailover while shutting down**

**Status:** fixed
**Milestone:** 5.2.RC1
**Created:** Fri Dec 02, 2016 04:54 AM UTC by Minh Hon Chau
**Last Updated:** Tue Mar 07, 2017 07:21 AM UTC
**Owner:** Nagendra Kumar
**Attachments:**

- [log.tgz](https://sourceforge.net/p/opensaf/tickets/2213/attachment/log.tgz) (548.6 kB; application/x-compressed)

Seen amfnd coredump in PL5, with the backtrace (bt) below, while the cluster was shutting down:

~~~
Thread 1 (Thread 0x7f92a8925780 (LWP 411)):
#0  __strcmp_sse2 () at ../sysdeps/x86_64/multiarch/../strcmp.S:1358
No locals.
#1  0x00449cc9 in avsv_dblist_sastring_cmp (key1=, key2=) at util.c:361
        i = 0
        str1 =
        str2 =
#2  0x7f92a84b1f95 in ncs_db_link_list_find (list_ptr=0x1ee89f0, key=0x656d6e6769737361 ) at ncsdlib.c:169
        start_ptr = 0x1ee3168
#3  0x00416dc0 in avnd_comp_cmplete_all_csi_rec (cb=0x666940 <_avnd_cb>, comp=0x1ee8200) at comp.cc:2652
        curr = 0x1ee8060
        prv = 0x1ee3150
        __FUNCTION__ = "avnd_comp_cmplete_all_csi_rec"
#4  0x0040ca47 in avnd_instfail_su_failover (failed_comp=0x1ee8200, su=0x1ee74e0, cb=0x666940 <_avnd_cb>) at clc.cc:3161
        rc =
#5  avnd_comp_clc_st_chng_prc (cb=cb@entry=0x666940 <_avnd_cb>, comp=comp@entry=0x1ee8200, prv_st=prv_st@entry=SA_AMF_PRESENCE_RESTARTING, final_st=final_st@entry=SA_AMF_PRESENCE_TERMINATION_FAILED) at clc.cc:967
        csi = 0x0
        __FUNCTION__ = "avnd_comp_clc_st_chng_prc"
        ev = AVND_SU_PRES_FSM_EV_MAX
        is_en =
        rc = 1
#6  0x0040f530 in avnd_comp_clc_fsm_run (cb=cb@entry=0x666940 <_avnd_cb>, comp=comp@entry=0x1ee8200, ev=AVND_COMP_CLC_PRES_FSM_EV_CLEANUP_FAIL) at clc.cc:906
        prv_st =
        final_st =
        rc = 1
        __FUNCTION__ = "avnd_comp_clc_fsm_run"
#7  0x0040fdea in avnd_evt_clc_resp_evh (cb=0x666940 <_avnd_cb>, evt=0x7f9298c0) at clc.cc:414
        __FUNCTION__ = "avnd_evt_clc_resp_evh"
        ev =
        clc_evt = 0x7f9298e0
        comp = 0x1ee8200
        rc = 1
#8  0x0042676f in avnd_evt_process (evt=0x7f9298c0) at main.cc:626
        cb = 0x666940 <_avnd_cb>
        rc = 1
#9  avnd_main_process () at main.cc:577
        ret =
        fds = {{fd = 12, events = 1, revents = 1}, {fd = 16, events = 1, revents = 0}, {fd = 14, events = 1, revents = 0}, {fd = 0, events = 0, revents = 0}}
        evt = 0x7f9298c0
        __FUNCTION__ = "avnd_main_process"
        result =
        rc =
#10 0x004058f3 in main (argc=1, argv=0x7ffe700c5c78) at main.cc:202
        error = 0
1358    ../sysdeps/x86_64/multiarch/../strcmp.S: No such file or directory.
~~~

In syslog of PL5:

2016-11-20 22:01:21 PL-5 osafamfnd[411]: NO 'safSu=3,safSg=1,safApp=npm_1' component restart probation timer started (timeout: 600 ns)
2016-11-20 22:01:21 PL-5 osafamfnd[411]: NO Restarting a component of 'safSu=3,safSg=1,safApp=npm_1' (comp restart count: 1)
2016-11-20 22:01:21 PL-5 osafamfnd[411]: NO 'safComp=A,safSu=3,safSg=1,safApp=npm_1' faulted due to 'csiRemovecallbackTimeout' : Recovery is 'componentRestart'
2016-11-20 22:01:21 PL-5 osafamfnd[411]: NO 'safSu=3,safSg=1,safApp=npm_1' Presence State INSTANTIATED => RESTARTING
2016-11-20 22:01:21 PL-5 osafamfnd[411]: NO 'safSu=3,safSg=1,safApp=nway_1' component restart probation timer started (timeout: 600 ns)
2016-11-20 22:01:21 PL-5 osafamfnd[411]: NO Restarting a component of 'safSu=3,safSg=1,safApp=nway_1' (comp restart count: 1)
2016-11-20 22:01:21 PL-5 osafamfnd[411]: NO 'safComp=A,safSu=3,safSg=1,safApp=nway_1' faulted due to 'csiRemovecallbackTimeout' : Recovery is 'componentRestart'
2016-11-20 22:01:21 PL-5 osafamfnd[411]: NO 'safSu=3,safSg=1,safApp=nway_1' Presence State INSTANTIATED => RESTARTING
2016-11-20 22:01:21 PL-5 osafamfnd[411]: NO 'safSu=4,safSg=1,safApp=npm_2' component restart probation timer started (timeout: 600 ns)
2016-11-20 22:01:21 PL-5 osafamfnd[411]: NO Restarting a component of 'safSu=4,safSg=1,safApp=npm_2' (comp restart count: 1)
2016-11-20 22:01:21 PL-5 osafamfnd[411]: NO 'safComp=A,safSu=4,safSg=1,safApp=npm_2' faulted due to 'csiRemovecallbackTimeout' : Recovery is 'componentRestart'
2016-11-20 22:01:21 PL-5 osafamfnd[411]: NO 'safSu=4,safSg=1,safApp=npm_2' Presence State INSTANTIATED => RESTARTING
2016-11-20 22:01:21 PL-5 amfclccli[729]: CLEANUP request 'safComp=A,safSu=4,safSg=1,safApp=npm_2'
2016-11-20 22:01:21 PL-5 amfclccli[728]: CLEANUP request 'safComp=A,safSu=3,safSg=1,safApp=nway_1'
2016-11-20 22:01:21 PL-5 amfclccli[727]: CLEANUP request 'safComp=A,safSu=3,safSg=1,safApp=npm_1'
2016-11-20 22:02:12 PL-5 osafamfnd[411]: NO Removed 'safSi=2,safApp=nway_1' from 'safSu=3,safSg=1,safApp=nway_1'
2016-11-20 22:02:12 PL-5 osafimmnd[394]: NO Global discard node received for nodeId:2040f pid:399
2016-11-20 22:02:12 PL-5 osafdtmd[380]: NO Lost contact with 'PL-4'
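
A note on frame #2 of the backtrace: the `key` value `0x656d6e6769737361` does not look like a heap address. Read as eight little-endian bytes (the byte order on x86_64), it is printable ASCII. The sketch below decodes it; `bytes_as_text` is a hypothetical helper written for this note, not part of OpenSAF. This may hint that the compared key lay in memory that had already been released and reused for string data, but the backtrace alone does not prove that.

```cpp
#include <cstdint>
#include <string>

// Interpret a 64-bit value as eight bytes in little-endian order
// (lowest byte first, as stored on x86_64) and return them as text.
// Non-printable bytes are shown as '.'.
std::string bytes_as_text(std::uint64_t value) {
  std::string out;
  for (int i = 0; i < 8; ++i) {
    const unsigned char byte =
        static_cast<unsigned char>((value >> (8 * i)) & 0xff);
    out += (byte >= 0x20 && byte < 0x7f) ? static_cast<char>(byte) : '.';
  }
  return out;
}
```

Calling `bytes_as_text(0x656d6e6769737361ULL)` returns `"assignme"`, so `strcmp` was handed a "pointer" made of text bytes rather than a valid address.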
[tickets] [opensaf:tickets] #2213 AMFND: Coredump if suFailover while shutting down
- **status**: assigned --> review
- **Version**: --> 5.1 GA
[tickets] [opensaf:tickets] #2213 AMFND: Coredump if suFailover while shutting down
I think it will.
[tickets] [opensaf:tickets] #2213 AMFND: Coredump if suFailover while shutting down
Will this work?

diff --git a/src/amf/amfnd/comp.cc b/src/amf/amfnd/comp.cc
--- a/src/amf/amfnd/comp.cc
+++ b/src/amf/amfnd/comp.cc
@@ -2650,6 +2650,9 @@ void avnd_comp_cmplete_all_csi_rec(AVND_
       /* generate csi-remove-done event... csi may be deleted */
       (void)avnd_comp_csi_remove_done(cb, comp, curr);
 
+      if (curr == nullptr)
+        break;
+
       if (0 == m_AVND_COMPDB_REC_CSI_GET(*comp, curr->name.c_str())) {
         curr = (prv) ?
[tickets] [opensaf:tickets] #2213 AMFND: Coredump if suFailover while shutting down
Hi Nagu,

I cannot reproduce this scenario. I think there is one place we could improve to avoid the coredump. It is in avnd_comp_cmplete_all_csi_rec():

~~~
{
  ...
  /* generate csi-remove-done event... csi may be deleted */
  (void)avnd_comp_csi_remove_done(cb, comp, curr);

  if (0 == m_AVND_COMPDB_REC_CSI_GET(*comp, curr->name.c_str())) {
    curr = (prv) ? m_AVND_CSI_REC_FROM_COMP_DLL_NODE_GET(m_NCS_DBLIST_FIND_NEXT(>comp_dll_node))
                 : m_AVND_CSI_REC_FROM_COMP_DLL_NODE_GET(m_NCS_DBLIST_FIND_FIRST(>csi_list));
  } else
    ...
}
~~~

The call to avnd_comp_csi_remove_done() is recursive and can lead to deletion of the csi. Maybe we can also check that @curr is a non-null pointer in the "if" block of m_AVND_COMPDB_REC_CSI_GET?

Thanks,
Minh
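
The hazard described for avnd_comp_cmplete_all_csi_rec(), a loop that calls a function which may delete the record being visited (and possibly others) and then reads that record again, can be illustrated in isolation. The following is a minimal sketch under stated assumptions, not the change committed for this ticket: `CsiRec`, `Comp`, `RemoveDone` and `complete_all_csi` are hypothetical stand-ins, and the OpenSAF macros and list types are replaced by the standard library. The technique shown is a swapped-in one: copy the key before the callback and re-scan the list by key, never touching the old pointer afterwards.

```cpp
#include <functional>
#include <list>
#include <memory>
#include <set>
#include <string>

// Hypothetical stand-ins for a CSI record and its owning component.
struct CsiRec {
  std::string name;
};
struct Comp {
  std::list<std::unique_ptr<CsiRec>> csi_list;
};

// Callback that may erase the given record, and others, from comp.csi_list.
using RemoveDone = std::function<void(Comp&, CsiRec&)>;

// Invoke remove_done once per record that is still present when its turn
// comes. Returns the number of callbacks made.
int complete_all_csi(Comp& comp, const RemoveDone& remove_done) {
  std::set<std::string> visited;  // keys copied out of the records
  int calls = 0;
  for (;;) {
    // Re-scan from the head each time: no iterator or pointer is kept
    // across the callback, so nothing can dangle.
    CsiRec* next = nullptr;
    for (auto& rec : comp.csi_list) {
      if (visited.count(rec->name) == 0) {
        next = rec.get();
        break;
      }
    }
    if (next == nullptr) break;  // every remaining record was visited

    visited.insert(next->name);  // copy the key BEFORE the callback
    remove_done(comp, *next);    // 'next' may be freed from here on
    ++calls;
  }
  return calls;
}
```

The re-scan makes the loop quadratic in the number of records, which seems acceptable for a short per-component list; the point is only that `curr->name` is never read after the callback returns. A null check on `curr` by itself would not catch a pointer that is non-null but already freed.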
[tickets] [opensaf:tickets] #2213 AMFND: Coredump if suFailover while shutting down
Hi Nagu,

We have seen it only once so far and did not have a trace. I will try to reproduce it on the latest changeset to see if it happens again.

Thanks,
Minh
[tickets] [opensaf:tickets] #2213 AMFND: Coredump if suFailover while shutting down
Hi Minh,

I am not getting any clue about the faults. Can you please provide traces, if possible (via email)?

Thanks
-Nagu
[tickets] [opensaf:tickets] #2213 AMFND: Coredump if suFailover while shutting down
- **status**: unassigned --> assigned
- **assigned_to**: Nagendra Kumar

---

**[tickets:#2213] AMFND: Coredump if suFailover while shutting down**

**Status:** assigned
**Milestone:** 5.2.RC1
**Created:** Fri Dec 02, 2016 04:54 AM UTC by Minh Hon Chau
**Last Updated:** Fri Dec 02, 2016 04:54 AM UTC
**Owner:** Nagendra Kumar
**Attachments:**

- [log.tgz](https://sourceforge.net/p/opensaf/tickets/2213/attachment/log.tgz) (548.6 kB; application/x-compressed)
[tickets] [opensaf:tickets] #2213 AMFND: Coredump if suFailover while shutting down
---

**[tickets:#2213] AMFND: Coredump if suFailover while shutting down**

**Status:** unassigned
**Milestone:** 5.2.FC
**Created:** Fri Dec 02, 2016 04:54 AM UTC by Minh Hon Chau
**Last Updated:** Fri Dec 02, 2016 04:54 AM UTC
**Owner:** nobody
**Attachments:**

- [log.tgz](https://sourceforge.net/p/opensaf/tickets/2213/attachment/log.tgz) (548.6 kB; application/x-compressed)

Seen amfnd coredump in PL5 with bt as below while cluster is shutting down

~~~
Thread 1 (Thread 0x7f92a8925780 (LWP 411)):
#0 __strcmp_sse2 () at ../sysdeps/x86_64/multiarch/../strcmp.S:1358
No locals.
#1 0x00449cc9 in avsv_dblist_sastring_cmp (key1=, key2=) at util.c:361
        i = 0
        str1 =
        str2 =
#2 0x7f92a84b1f95 in ncs_db_link_list_find (list_ptr=0x1ee89f0, key=0x656d6e6769737361 ) at ncsdlib.c:169
        start_ptr = 0x1ee3168
#3 0x00416dc0 in avnd_comp_cmplete_all_csi_rec (cb=0x666940 <_avnd_cb>, comp=0x1ee8200) at comp.cc:2652
        curr = 0x1ee8060
        prv = 0x1ee3150
        __FUNCTION__ = "avnd_comp_cmplete_all_csi_rec"
#4 0x0040ca47 in avnd_instfail_su_failover (failed_comp=0x1ee8200, su=0x1ee74e0, cb=0x666940 <_avnd_cb>) at clc.cc:3161
        rc =
#5 avnd_comp_clc_st_chng_prc (cb=cb@entry=0x666940 <_avnd_cb>, comp=comp@entry=0x1ee8200, prv_st=prv_st@entry=SA_AMF_PRESENCE_RESTARTING, final_st=final_st@entry=SA_AMF_PRESENCE_TERMINATION_FAILED) at clc.cc:967
        csi = 0x0
        __FUNCTION__ = "avnd_comp_clc_st_chng_prc"
        ev = AVND_SU_PRES_FSM_EV_MAX
        is_en =
        rc = 1
#6 0x0040f530 in avnd_comp_clc_fsm_run (cb=cb@entry=0x666940 <_avnd_cb>, comp=comp@entry=0x1ee8200, ev=AVND_COMP_CLC_PRES_FSM_EV_CLEANUP_FAIL) at clc.cc:906
        prv_st =
        final_st =
        rc = 1
        __FUNCTION__ = "avnd_comp_clc_fsm_run"
#7 0x0040fdea in avnd_evt_clc_resp_evh (cb=0x666940 <_avnd_cb>, evt=0x7f9298c0) at clc.cc:414
        __FUNCTION__ = "avnd_evt_clc_resp_evh"
        ev =
        clc_evt = 0x7f9298e0
        comp = 0x1ee8200
        rc = 1
#8 0x0042676f in avnd_evt_process (evt=0x7f9298c0) at main.cc:626
        cb = 0x666940 <_avnd_cb>
        rc = 1
#9 avnd_main_process () at main.cc:577
        ret =
        fds = {{fd = 12, events = 1, revents = 1}, {fd = 16, events = 1, revents = 0}, {fd = 14, events = 1, revents = 0}, {fd = 0, events = 0, revents = 0}}
        evt = 0x7f9298c0
        __FUNCTION__ = "avnd_main_process"
        result =
        rc =
#10 0x004058f3 in main (argc=1, argv=0x7ffe700c5c78) at main.cc:202
        error = 0
1358 ../sysdeps/x86_64/multiarch/../strcmp.S: No such file or directory.
~~~

In syslog of PL5:

2016-11-20 22:01:21 PL-5 osafamfnd[411]: NO 'safSu=3,safSg=1,safApp=npm_1' component restart probation timer started (timeout: 600 ns)
2016-11-20 22:01:21 PL-5 osafamfnd[411]: NO Restarting a component of 'safSu=3,safSg=1,safApp=npm_1' (comp restart count: 1)
2016-11-20 22:01:21 PL-5 osafamfnd[411]: NO 'safComp=A,safSu=3,safSg=1,safApp=npm_1' faulted due to 'csiRemovecallbackTimeout' : Recovery is 'componentRestart'
2016-11-20 22:01:21 PL-5 osafamfnd[411]: NO 'safSu=3,safSg=1,safApp=npm_1' Presence State INSTANTIATED => RESTARTING
2016-11-20 22:01:21 PL-5 osafamfnd[411]: NO 'safSu=3,safSg=1,safApp=nway_1' component restart probation timer started (timeout: 600 ns)
2016-11-20 22:01:21 PL-5 osafamfnd[411]: NO Restarting a component of 'safSu=3,safSg=1,safApp=nway_1' (comp restart count: 1)
2016-11-20 22:01:21 PL-5 osafamfnd[411]: NO 'safComp=A,safSu=3,safSg=1,safApp=nway_1' faulted due to 'csiRemovecallbackTimeout' : Recovery is 'componentRestart'
2016-11-20 22:01:21 PL-5 osafamfnd[411]: NO 'safSu=3,safSg=1,safApp=nway_1' Presence State INSTANTIATED => RESTARTING
2016-11-20 22:01:21 PL-5 osafamfnd[411]: NO 'safSu=4,safSg=1,safApp=npm_2' component restart probation timer started (timeout: 600 ns)
2016-11-20 22:01:21 PL-5 osafamfnd[411]: NO Restarting a component of 'safSu=4,safSg=1,safApp=npm_2' (comp restart count: 1)
2016-11-20 22:01:21 PL-5 osafamfnd[411]: NO 'safComp=A,safSu=4,safSg=1,safApp=npm_2' faulted due to 'csiRemovecallbackTimeout' : Recovery is 'componentRestart'
2016-11-20 22:01:21 PL-5 osafamfnd[411]: NO 'safSu=4,safSg=1,safApp=npm_2' Presence State INSTANTIATED => RESTARTING
2016-11-20 22:01:21 PL-5 amfclccli[729]: CLEANUP request 'safComp=A,safSu=4,safSg=1,safApp=npm_2'
2016-11-20 22:01:21 PL-5 amfclccli[728]: CLEANUP request 'safComp=A,safSu=3,safSg=1,safApp=nway_1'
2016-11-20 22:01:21 PL-5 amfclccli[727]: CLEANUP request 'safComp=A,safSu=3,safSg=1,safApp=npm_1'
2016-11-20 22:02:12 PL-5 osafamfnd[411]: NO Removed 'safSi=2,safApp=nway_1' from 'safSu=3,safSg=1,safApp=nway_1'
2016-11-20 22:02:12 PL-5 osafimmnd[394]: NO Global discard node received for nodeId:2040f pid:399
2016-11-20 22:02:12 PL-5 osafdtmd[380]: NO Lost contact with 'PL-4'
2016-11-20 22:02:13 PL-5 opensafd: