[tickets] [opensaf:tickets] #2321 Incorrect error messages "mkfifo already exists" observed in syslog
Thanks. I have sent the patch for review. Please review it.

---

** [tickets:#2321] Incorrect error messages "mkfifo already exists" observed in syslog**
**Status:** review
**Milestone:** 5.2.RC1
**Created:** Thu Feb 23, 2017 05:46 AM UTC by Ritu Raj
**Last Updated:** Thu Mar 02, 2017 07:00 AM UTC
**Owner:** Nagendra Kumar

# Environment details
OS: SUSE 64-bit
Changeset: 8603 (5.2.MO-1)

# Summary
Incorrect error messages "mkfifo already exists" are observed in syslog after performing an OpenSAF stop and start operation.

# Steps
1. Started OpenSAF on a single controller.
2. Stopped OpenSAF and started it again. While starting OpenSAF again on the same node, the following error messages were observed in syslog for the components osafamfnd and osafamfwd:

~~~
Feb 23 16:21:34 SO-SLOT-1 osafamfnd[21955]: mkfifo already exists: /var/lib/opensaf/osafamfnd.fifo File exists
Feb 23 16:21:34 SO-SLOT-1 osafamfnd[21955]: Started
Feb 23 16:21:34 SO-SLOT-1 osafamfwd[22062]: mkfifo already exists: /var/lib/opensaf/osafamfwd.fifo File exists
Feb 23 16:21:34 SO-SLOT-1 osafamfwd[22062]: Started
~~~

---

Sent from sourceforge.net because opensaf-tickets@lists.sourceforge.net is subscribed to https://sourceforge.net/p/opensaf/tickets/
To unsubscribe from further messages, a project admin can change settings at https://sourceforge.net/p/opensaf/admin/tickets/options. Or, if this is a mailing list, you can unsubscribe from the mailing list.

Opensaf-tickets mailing list
Opensaf-tickets@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-tickets
[tickets] [opensaf:tickets] #2213 AMFND: Coredump if suFailover while shutting down
Will this work?

~~~
diff --git a/src/amf/amfnd/comp.cc b/src/amf/amfnd/comp.cc
--- a/src/amf/amfnd/comp.cc
+++ b/src/amf/amfnd/comp.cc
@@ -2650,6 +2650,9 @@ void avnd_comp_cmplete_all_csi_rec(AVND_
 		/* generate csi-remove-done event... csi may be deleted */
 		(void)avnd_comp_csi_remove_done(cb, comp, curr);
 
+		if (curr == nullptr)
+			break;
+
 		if (0 == m_AVND_COMPDB_REC_CSI_GET(*comp, curr->name.c_str())) {
 			curr = (prv) ?
~~~

---

** [tickets:#2213] AMFND: Coredump if suFailover while shutting down**
**Status:** assigned
**Milestone:** 5.2.RC1
**Created:** Fri Dec 02, 2016 04:54 AM UTC by Minh Hon Chau
**Last Updated:** Thu Mar 02, 2017 06:09 AM UTC
**Owner:** Nagendra Kumar
**Attachments:** [log.tgz](https://sourceforge.net/p/opensaf/tickets/2213/attachment/log.tgz) (548.6 kB; application/x-compressed)

Seen amfnd coredump in PL5 with bt as below while cluster is shutting down

~~~
Thread 1 (Thread 0x7f92a8925780 (LWP 411)):
#0  __strcmp_sse2 () at ../sysdeps/x86_64/multiarch/../strcmp.S:1358
No locals.
#1  0x00449cc9 in avsv_dblist_sastring_cmp (key1=, key2=) at util.c:361
        i = 0
        str1 =
        str2 =
#2  0x7f92a84b1f95 in ncs_db_link_list_find (list_ptr=0x1ee89f0, key=0x656d6e6769737361) at ncsdlib.c:169
        start_ptr = 0x1ee3168
#3  0x00416dc0 in avnd_comp_cmplete_all_csi_rec (cb=0x666940 <_avnd_cb>, comp=0x1ee8200) at comp.cc:2652
        curr = 0x1ee8060
        prv = 0x1ee3150
        __FUNCTION__ = "avnd_comp_cmplete_all_csi_rec"
#4  0x0040ca47 in avnd_instfail_su_failover (failed_comp=0x1ee8200, su=0x1ee74e0, cb=0x666940 <_avnd_cb>) at clc.cc:3161
        rc =
#5  avnd_comp_clc_st_chng_prc (cb=cb@entry=0x666940 <_avnd_cb>, comp=comp@entry=0x1ee8200, prv_st=prv_st@entry=SA_AMF_PRESENCE_RESTARTING, final_st=final_st@entry=SA_AMF_PRESENCE_TERMINATION_FAILED) at clc.cc:967
        csi = 0x0
        __FUNCTION__ = "avnd_comp_clc_st_chng_prc"
        ev = AVND_SU_PRES_FSM_EV_MAX
        is_en =
        rc = 1
#6  0x0040f530 in avnd_comp_clc_fsm_run (cb=cb@entry=0x666940 <_avnd_cb>, comp=comp@entry=0x1ee8200, ev=AVND_COMP_CLC_PRES_FSM_EV_CLEANUP_FAIL) at clc.cc:906
        prv_st =
        final_st =
        rc = 1
        __FUNCTION__ = "avnd_comp_clc_fsm_run"
#7  0x0040fdea in avnd_evt_clc_resp_evh (cb=0x666940 <_avnd_cb>, evt=0x7f9298c0) at clc.cc:414
        __FUNCTION__ = "avnd_evt_clc_resp_evh"
        ev =
        clc_evt = 0x7f9298e0
        comp = 0x1ee8200
        rc = 1
#8  0x0042676f in avnd_evt_process (evt=0x7f9298c0) at main.cc:626
        cb = 0x666940 <_avnd_cb>
        rc = 1
#9  avnd_main_process () at main.cc:577
        ret =
        fds = {{fd = 12, events = 1, revents = 1}, {fd = 16, events = 1, revents = 0}, {fd = 14, events = 1, revents = 0}, {fd = 0, events = 0, revents = 0}}
        evt = 0x7f9298c0
        __FUNCTION__ = "avnd_main_process"
        result =
        rc =
#10 0x004058f3 in main (argc=1, argv=0x7ffe700c5c78) at main.cc:202
        error = 0
1358	../sysdeps/x86_64/multiarch/../strcmp.S: No such file or directory.
~~~

In syslog of PL5:

~~~
2016-11-20 22:01:21 PL-5 osafamfnd[411]: NO 'safSu=3,safSg=1,safApp=npm_1' component restart probation timer started (timeout: 600 ns)
2016-11-20 22:01:21 PL-5 osafamfnd[411]: NO Restarting a component of 'safSu=3,safSg=1,safApp=npm_1' (comp restart count: 1)
2016-11-20 22:01:21 PL-5 osafamfnd[411]: NO 'safComp=A,safSu=3,safSg=1,safApp=npm_1' faulted due to 'csiRemovecallbackTimeout' : Recovery is 'componentRestart'
2016-11-20 22:01:21 PL-5 osafamfnd[411]: NO 'safSu=3,safSg=1,safApp=npm_1' Presence State INSTANTIATED => RESTARTING
2016-11-20 22:01:21 PL-5 osafamfnd[411]: NO 'safSu=3,safSg=1,safApp=nway_1' component restart probation timer started (timeout: 600 ns)
2016-11-20 22:01:21 PL-5 osafamfnd[411]: NO Restarting a component of 'safSu=3,safSg=1,safApp=nway_1' (comp restart count: 1)
2016-11-20 22:01:21 PL-5 osafamfnd[411]: NO 'safComp=A,safSu=3,safSg=1,safApp=nway_1' faulted due to 'csiRemovecallbackTimeout' : Recovery is 'componentRestart'
2016-11-20 22:01:21 PL-5 osafamfnd[411]: NO 'safSu=3,safSg=1,safApp=nway_1' Presence State INSTANTIATED => RESTARTING
2016-11-20 22:01:21 PL-5 osafamfnd[411]: NO 'safSu=4,safSg=1,safApp=npm_2' component restart probation timer started (timeout: 600 ns)
2016-11-20 22:01:21 PL-5 osafamfnd[411]: NO Restarting a component of 'safSu=4,safSg=1,safApp=npm_2' (comp restart count: 1)
2016-11-20 22:01:21 PL-5 osafamfnd[411]: NO 'safComp=A,safSu=4,safSg=1,safApp=npm_2' faulted due to 'csiRemovecallbackTimeout' : Recovery is 'componentRestart'
2016-11-20 22:01:21 PL-5 osafamfnd[411]: NO 'safSu=4,safSg=1,safApp=npm_2' Presence State INSTANTIATED => RESTARTING
2016-11-20 22:01:21 PL-5 amfclccli[729]: CLEANUP request 'safComp=A,safSu=4,safSg=1,safApp=npm_2'
2016-11-20 22:01:21 PL-5 amfclccli[728]: CLEANUP request 'safComp=A,safSu=3,safSg=1,safApp=nway_1'
2016-11-20 22:01:21 PL-5 amfclccli[727]: CLEANUP request 'safComp=A,safSu=3,safSg=1,safApp=npm_1'
2016-11-20 22:02:12 PL-5 osafamfnd[411]: NO Removed 'safSi=2,safApp=nway_1' from 'safSu=3,safSg=1,safApp=nway_1'
2016-11-20 22:02:12 PL-5 osafimmnd[394]: NO Global discard node received for
~~~
[tickets] [opensaf:tickets] #2321 Incorrect error messages "mkfifo already exists" observed in syslog
- **status**: unassigned --> review
- **assigned_to**: Nagendra Kumar

---

** [tickets:#2321] Incorrect error messages "mkfifo already exists" observed in syslog**
**Status:** review
**Milestone:** 5.2.RC1
**Created:** Thu Feb 23, 2017 05:46 AM UTC by Ritu Raj
**Last Updated:** Wed Mar 01, 2017 02:25 PM UTC
**Owner:** Nagendra Kumar
[tickets] [opensaf:tickets] #2213 AMFND: Coredump if suFailover while shutting down
Hi Nagu,

I cannot reproduce this scenario. I think there's one place we could improve to avoid the coredump. It's in avnd_comp_cmplete_all_csi_rec():

~~~
{
	...
	/* generate csi-remove-done event... csi may be deleted */
	(void)avnd_comp_csi_remove_done(cb, comp, curr);

	if (0 == m_AVND_COMPDB_REC_CSI_GET(*comp, curr->name.c_str())) {
		curr = (prv) ?
			m_AVND_CSI_REC_FROM_COMP_DLL_NODE_GET(m_NCS_DBLIST_FIND_NEXT(&prv->comp_dll_node)) :
			m_AVND_CSI_REC_FROM_COMP_DLL_NODE_GET(m_NCS_DBLIST_FIND_FIRST(&comp->csi_list));
	} else
	...
}
~~~

The call to avnd_comp_csi_remove_done() is a recursion and can lead to a deletion of the csi. Maybe we can also check that @curr is a non-null pointer in the "if" block of m_AVND_COMPDB_REC_CSI_GET?

Thanks,
Minh

---

** [tickets:#2213] AMFND: Coredump if suFailover while shutting down**
**Status:** assigned
**Milestone:** 5.2.RC1
**Created:** Fri Dec 02, 2016 04:54 AM UTC by Minh Hon Chau
**Last Updated:** Wed Mar 01, 2017 09:06 PM UTC
**Owner:** Nagendra Kumar
**Attachments:** [log.tgz](https://sourceforge.net/p/opensaf/tickets/2213/attachment/log.tgz) (548.6 kB; application/x-compressed)
[tickets] [opensaf:tickets] #2324 amfd: SG admin state is not honored during node group unlock_instantiation
- **status**: assigned --> review

---

** [tickets:#2324] amfd: SG admin state is not honored during node group unlock_instantiation**
**Status:** review
**Milestone:** 5.2.RC1
**Created:** Fri Feb 24, 2017 07:20 AM UTC by Tai Dinh
**Last Updated:** Wed Mar 01, 2017 12:08 PM UTC
**Owner:** Tai Dinh

SG admin state is not honored during node group unlock_instantiation. This can always be reproduced:

- Put the SG into the locked_instantiation state. The SUs are now UNINSTANTIATED.
- Lock/lock_in the node group that the SG or its SUs are hosted on.
- Unlock_in the node group. The expectation here is that the SUs should be kept in the UNINSTANTIATED state, but they're not.
- AMF always tries to instantiate the SUs without checking their SG state:

~~~
if ((su->saAmfSUAdminState != SA_AMF_ADMIN_LOCKED_INSTANTIATION) &&
    (su->su_on_node->saAmfNodeAdminState != SA_AMF_ADMIN_LOCKED_INSTANTIATION) &&
    (su->saAmfSUOperState == SA_AMF_OPERATIONAL_ENABLED) &&
    (su->saAmfSUPresenceState == SA_AMF_PRESENCE_UNINSTANTIATED) &&
    (su->su_on_node->saAmfNodeOperState == SA_AMF_OPERATIONAL_ENABLED)) {
	if ((su->saAmfSUPreInstantiable == false) ||
	    (su->su_on_node->node_state != AVD_AVND_STATE_PRESENT))
		continue;
	if (sg->saAmfSGNumPrefInserviceSUs > su_try_inst) {
		if (avd_snd_presence_msg(avd_cb, su, false) != NCSCC_RC_SUCCESS) {
			LOG_NO("Failed to send Instantiation of '%s'", su->name.c_str());
		} else {
			su->su_on_node->su_cnt_admin_oper++;
			su_try_inst++;
		}
	}
}
~~~
[tickets] [opensaf:tickets] #2213 AMFND: Coredump if suFailover while shutting down
Hi Nagu, We only have seen it once so far and didn't have trace, I try to reproduce it again on latest changeset to see if it happens. Thanks, Minh --- ** [tickets:#2213] AMFND: Coredump if suFailover while shutting down** **Status:** assigned **Milestone:** 5.2.RC1 **Created:** Fri Dec 02, 2016 04:54 AM UTC by Minh Hon Chau **Last Updated:** Wed Mar 01, 2017 11:30 AM UTC **Owner:** Nagendra Kumar **Attachments:** - [log.tgz](https://sourceforge.net/p/opensaf/tickets/2213/attachment/log.tgz) (548.6 kB; application/x-compressed) Seen amfnd coredump in PL5 with bt as below while cluster is shutting down ~~~ Thread 1 (Thread 0x7f92a8925780 (LWP 411)): #0 __strcmp_sse2 () at ../sysdeps/x86_64/multiarch/../strcmp.S:1358 No locals. #1 0x00449cc9 in avsv_dblist_sastring_cmp (key1=, key2=) at util.c:361 i = 0 str1 = str2 = #2 0x7f92a84b1f95 in ncs_db_link_list_find (list_ptr=0x1ee89f0, key=0x656d6e6769737361 ) at ncsdlib.c:169 start_ptr = 0x1ee3168 #3 0x00416dc0 in avnd_comp_cmplete_all_csi_rec (cb=0x666940 <_avnd_cb>, comp=0x1ee8200) at comp.cc:2652 curr = 0x1ee8060 prv = 0x1ee3150 __FUNCTION__ = "avnd_comp_cmplete_all_csi_rec" #4 0x0040ca47 in avnd_instfail_su_failover (failed_comp=0x1ee8200, su=0x1ee74e0, cb=0x666940 <_avnd_cb>) at clc .cc:3161 rc = #5 avnd_comp_clc_st_chng_prc (cb=cb@entry=0x666940 <_avnd_cb>, comp=comp@entry=0x1ee8200, prv_st=prv_st@entry= SA_AMF_PRESENCE_RESTARTING, final_st=final_st@entry=SA_AMF_PRESENCE_TERMINATION_FAILED) at clc.cc:967 csi = 0x0 __FUNCTION__ = "avnd_comp_clc_st_chng_prc" ev = AVND_SU_PRES_FSM_EV_MAX is_en = rc = 1 #6 0x0040f530 in avnd_comp_clc_fsm_run (cb=cb@entry=0x666940 <_avnd_cb>, comp=comp@entry=0x1ee8200, ev= AVND_COMP_CLC_PRES_FSM_EV_CLEANUP_FAIL) at clc.cc:906 prv_st = final_st = rc = 1 __FUNCTION__ = "avnd_comp_clc_fsm_run" #7 0x0040fdea in avnd_evt_clc_resp_evh (cb=0x666940 <_avnd_cb>, evt=0x7f9298c0) at clc.cc:414 __FUNCTION__ = "avnd_evt_clc_resp_evh" ev = clc_evt = 0x7f9298e0 comp = 0x1ee8200 rc = 1 #8 0x0042676f in 
avnd_evt_process (evt=0x7f9298c0) at main.cc:626 cb = 0x666940 <_avnd_cb> rc = 1 #9 avnd_main_process () at main.cc:577 ret = fds = {{fd = 12, events = 1, revents = 1}, {fd = 16, events = 1, revents = 0}, {fd = 14, events = 1, revents = 0}, {fd = 0, events = 0, revents = 0}} evt = 0x7f9298c0 __FUNCTION__ = "avnd_main_process" result = rc = #10 0x004058f3 in main (argc=1, argv=0x7ffe700c5c78) at main.cc:202 error = 0 1358../sysdeps/x86_64/multiarch/../strcmp.S: No such file or directory. ~~~ In syslog of PL5: 2016-11-20 22:01:21 PL-5 osafamfnd[411]: NO 'safSu=3,safSg=1,safApp=npm_1' component restart probation timer started (timeout: 600 ns) 2016-11-20 22:01:21 PL-5 osafamfnd[411]: NO Restarting a component of 'safSu=3,safSg=1,safApp=npm_1' (comp restart count: 1) 2016-11-20 22:01:21 PL-5 osafamfnd[411]: NO 'safComp=A,safSu=3,safSg=1,safApp=npm_1' faulted due to 'csiRemovecallbackTimeout' : Recovery is 'componentRestart' 2016-11-20 22:01:21 PL-5 osafamfnd[411]: NO 'safSu=3,safSg=1,safApp=npm_1' Presence State INSTANTIATED => RESTARTING 2016-11-20 22:01:21 PL-5 osafamfnd[411]: NO 'safSu=3,safSg=1,safApp=nway_1' component restart probation timer started (timeout: 600 ns) 2016-11-20 22:01:21 PL-5 osafamfnd[411]: NO Restarting a component of 'safSu=3,safSg=1,safApp=nway_1' (comp restart count: 1) 2016-11-20 22:01:21 PL-5 osafamfnd[411]: NO 'safComp=A,safSu=3,safSg=1,safApp=nway_1' faulted due to 'csiRemovecallbackTimeout' : Recovery is 'componentRestart' 2016-11-20 22:01:21 PL-5 osafamfnd[411]: NO 'safSu=3,safSg=1,safApp=nway_1' Presence State INSTANTIATED => RESTARTING 2016-11-20 22:01:21 PL-5 osafamfnd[411]: NO 'safSu=4,safSg=1,safApp=npm_2' component restart probation timer started (timeout: 600 ns) 2016-11-20 22:01:21 PL-5 osafamfnd[411]: NO Restarting a component of 'safSu=4,safSg=1,safApp=npm_2' (comp restart count: 1) 2016-11-20 22:01:21 PL-5 osafamfnd[411]: NO 'safComp=A,safSu=4,safSg=1,safApp=npm_2' faulted due to 'csiRemovecallbackTimeout' : Recovery is 
'componentRestart' 2016-11-20 22:01:21 PL-5 osafamfnd[411]: NO 'safSu=4,safSg=1,safApp=npm_2' Presence State INSTANTIATED => RESTARTING 2016-11-20 22:01:21 PL-5 amfclccli[729]: CLEANUP request 'safComp=A,safSu=4,safSg=1,safApp=npm_2' 2016-11-20 22:01:21 PL-5 amfclccli[728]: CLEANUP request 'safComp=A,safSu=3,safSg=1,safApp=nway_1' 2016-11-20 22:01:21 PL-5 amfclccli[727]: CLEANUP request 'safComp=A,safSu=3,safSg=1,safApp=npm_1' 2016-11-20 22:02:12 PL-5 osafamfnd[411]: NO Removed 'safSi=2,safApp=nway_1' from 'safSu=3,safSg=1,safApp=nway_1' 2016-11-20 22:02:12 PL-5 osafimmnd[394]: NO
[tickets] [opensaf:tickets] #2321 Incorrect error messages "mkfifo already exists" observed in syslog
I suggest changing amfnd and amfwd to call daemon_exit() instead of exit().

---

** [tickets:#2321] Incorrect error messages "mkfifo already exists" observed in syslog**
**Status:** unassigned
**Milestone:** 5.2.RC1
**Created:** Thu Feb 23, 2017 05:46 AM UTC by Ritu Raj
**Last Updated:** Wed Mar 01, 2017 01:03 PM UTC
**Owner:** nobody
[tickets] [opensaf:tickets] #2321 Incorrect error messages "mkfifo already exists" observed in syslog
OK, I see. This is because amfnd and amfwd call daemonize() but then call exit() instead of daemon_exit(). Their respective FIFO files are therefore not deleted.

---

** [tickets:#2321] Incorrect error messages "mkfifo already exists" observed in syslog**
**Status:** unassigned
**Milestone:** 5.2.RC1
**Created:** Thu Feb 23, 2017 05:46 AM UTC by Ritu Raj
**Last Updated:** Wed Mar 01, 2017 11:40 AM UTC
**Owner:** nobody
[tickets] [opensaf:tickets] #2324 amfd: SG admin state is not honored during node group unlock_instantiation
- **status**: unassigned --> assigned
- **assigned_to**: Tai Dinh

---

** [tickets:#2324] amfd: SG admin state is not honored during node group unlock_instantiation**
**Status:** assigned
**Milestone:** 5.2.RC1
**Created:** Fri Feb 24, 2017 07:20 AM UTC by Tai Dinh
**Last Updated:** Fri Feb 24, 2017 11:07 AM UTC
**Owner:** Tai Dinh
[tickets] [opensaf:tickets] #2321 Incorrect error messages "mkfifo already exists" observed in syslog
These messages are only observed when we start OpenSAF after performing an "opensaf stop" operation.

Sequence: stop OpenSAF -> start OpenSAF (the ERR message will be observed in syslog).

There is no impact of these syslog ERR messages on OpenSAF services; OpenSAF starts successfully without any issues.

---

** [tickets:#2321] Incorrect error messages "mkfifo already exists" observed in syslog**
**Status:** unassigned
**Milestone:** 5.2.RC1
**Created:** Thu Feb 23, 2017 05:46 AM UTC by Ritu Raj
**Last Updated:** Wed Mar 01, 2017 11:17 AM UTC
**Owner:** nobody
[tickets] [opensaf:tickets] #2213 AMFND: Coredump if suFailover while shutting down
Hi Minh,

I am not getting any clue about the faults. Can you please provide traces if possible (via email)?

Thanks,
-Nagu

---

** [tickets:#2213] AMFND: Coredump if suFailover while shutting down**
**Status:** assigned
**Milestone:** 5.2.RC1
**Created:** Fri Dec 02, 2016 04:54 AM UTC by Minh Hon Chau
**Last Updated:** Wed Mar 01, 2017 10:57 AM UTC
**Owner:** Nagendra Kumar
**Attachments:** [log.tgz](https://sourceforge.net/p/opensaf/tickets/2213/attachment/log.tgz) (548.6 kB; application/x-compressed)
[tickets] [opensaf:tickets] #2321 Incorrect error messages "mkfifo already exists" observed in syslog
Perhaps this message can be changed to a trace message? The FIFO file is deleted at graceful shutdown, but if e.g. the node has crashed, the FIFO file is not removed, and at the next node start this log message will be written.

---

** [tickets:#2321] Incorrect error messages "mkfifo already exists" observed in syslog**
**Status:** unassigned
**Milestone:** 5.2.RC1
**Created:** Thu Feb 23, 2017 05:46 AM UTC by Ritu Raj
**Last Updated:** Wed Mar 01, 2017 10:59 AM UTC
**Owner:** nobody
[tickets] [opensaf:tickets] #2321 Incorrect error messages "mkfifo already exists" observed in syslog
Hi Hans N, are these logs required in syslog?

---

** [tickets:#2321] Incorrect error messages "mkfifo already exists" observed in syslog**

**Status:** unassigned **Milestone:** 5.2.RC1 **Created:** Thu Feb 23, 2017 05:46 AM UTC by Ritu Raj **Last Updated:** Wed Mar 01, 2017 07:39 AM UTC **Owner:** nobody

# Environment details
OS : Suse 64bit
Changeset : 8603 (5.2.MO-1)

# Summary
Incorrect error messages "mkfifo already exists" observed in syslog after performing OpenSAF stop and start operations.

# Steps
1. Started OpenSAF on a single controller
2. Stopped OpenSAF and started it again; while starting OpenSAF again on the same node, the following error messages were observed in syslog for the components osafamfnd and osafamfwd:

Feb 23 16:21:34 SO-SLOT-1 osafamfnd[21955]: mkfifo already exists: /var/lib/opensaf/osafamfnd.fifo File exists
Feb 23 16:21:34 SO-SLOT-1 osafamfnd[21955]: Started
Feb 23 16:21:34 SO-SLOT-1 osafamfwd[22062]: mkfifo already exists: /var/lib/opensaf/osafamfwd.fifo File exists
Feb 23 16:21:34 SO-SLOT-1 osafamfwd[22062]: Started
[tickets] [opensaf:tickets] #2213 AMFND: Coredump if suFailover while shutting down
- **status**: unassigned --> assigned
- **assigned_to**: Nagendra Kumar

---

** [tickets:#2213] AMFND: Coredump if suFailover while shutting down**

**Status:** assigned **Milestone:** 5.2.RC1 **Created:** Fri Dec 02, 2016 04:54 AM UTC by Minh Hon Chau **Last Updated:** Fri Dec 02, 2016 04:54 AM UTC **Owner:** Nagendra Kumar **Attachments:** - [log.tgz](https://sourceforge.net/p/opensaf/tickets/2213/attachment/log.tgz) (548.6 kB; application/x-compressed)

Seen amfnd coredump in PL5 with bt as below while cluster is shutting down

~~~
Thread 1 (Thread 0x7f92a8925780 (LWP 411)):
#0  __strcmp_sse2 () at ../sysdeps/x86_64/multiarch/../strcmp.S:1358
        No locals.
#1  0x00449cc9 in avsv_dblist_sastring_cmp (key1=, key2=) at util.c:361
        i = 0
        str1 =
        str2 =
#2  0x7f92a84b1f95 in ncs_db_link_list_find (list_ptr=0x1ee89f0, key=0x656d6e6769737361 ) at ncsdlib.c:169
        start_ptr = 0x1ee3168
#3  0x00416dc0 in avnd_comp_cmplete_all_csi_rec (cb=0x666940 <_avnd_cb>, comp=0x1ee8200) at comp.cc:2652
        curr = 0x1ee8060
        prv = 0x1ee3150
        __FUNCTION__ = "avnd_comp_cmplete_all_csi_rec"
#4  0x0040ca47 in avnd_instfail_su_failover (failed_comp=0x1ee8200, su=0x1ee74e0, cb=0x666940 <_avnd_cb>) at clc.cc:3161
        rc =
#5  avnd_comp_clc_st_chng_prc (cb=cb@entry=0x666940 <_avnd_cb>, comp=comp@entry=0x1ee8200, prv_st=prv_st@entry=SA_AMF_PRESENCE_RESTARTING, final_st=final_st@entry=SA_AMF_PRESENCE_TERMINATION_FAILED) at clc.cc:967
        csi = 0x0
        __FUNCTION__ = "avnd_comp_clc_st_chng_prc"
        ev = AVND_SU_PRES_FSM_EV_MAX
        is_en =
        rc = 1
#6  0x0040f530 in avnd_comp_clc_fsm_run (cb=cb@entry=0x666940 <_avnd_cb>, comp=comp@entry=0x1ee8200, ev=AVND_COMP_CLC_PRES_FSM_EV_CLEANUP_FAIL) at clc.cc:906
        prv_st =
        final_st =
        rc = 1
        __FUNCTION__ = "avnd_comp_clc_fsm_run"
#7  0x0040fdea in avnd_evt_clc_resp_evh (cb=0x666940 <_avnd_cb>, evt=0x7f9298c0) at clc.cc:414
        __FUNCTION__ = "avnd_evt_clc_resp_evh"
        ev =
        clc_evt = 0x7f9298e0
        comp = 0x1ee8200
        rc = 1
#8  0x0042676f in avnd_evt_process (evt=0x7f9298c0) at main.cc:626
        cb = 0x666940 <_avnd_cb>
        rc = 1
#9  avnd_main_process () at main.cc:577
        ret =
        fds = {{fd = 12, events = 1, revents = 1}, {fd = 16, events = 1, revents = 0}, {fd = 14, events = 1, revents = 0}, {fd = 0, events = 0, revents = 0}}
        evt = 0x7f9298c0
        __FUNCTION__ = "avnd_main_process"
        result =
        rc =
#10 0x004058f3 in main (argc=1, argv=0x7ffe700c5c78) at main.cc:202
        error = 0
1358    ../sysdeps/x86_64/multiarch/../strcmp.S: No such file or directory.
~~~

In syslog of PL5:

2016-11-20 22:01:21 PL-5 osafamfnd[411]: NO 'safSu=3,safSg=1,safApp=npm_1' component restart probation timer started (timeout: 600 ns)
2016-11-20 22:01:21 PL-5 osafamfnd[411]: NO Restarting a component of 'safSu=3,safSg=1,safApp=npm_1' (comp restart count: 1)
2016-11-20 22:01:21 PL-5 osafamfnd[411]: NO 'safComp=A,safSu=3,safSg=1,safApp=npm_1' faulted due to 'csiRemovecallbackTimeout' : Recovery is 'componentRestart'
2016-11-20 22:01:21 PL-5 osafamfnd[411]: NO 'safSu=3,safSg=1,safApp=npm_1' Presence State INSTANTIATED => RESTARTING
2016-11-20 22:01:21 PL-5 osafamfnd[411]: NO 'safSu=3,safSg=1,safApp=nway_1' component restart probation timer started (timeout: 600 ns)
2016-11-20 22:01:21 PL-5 osafamfnd[411]: NO Restarting a component of 'safSu=3,safSg=1,safApp=nway_1' (comp restart count: 1)
2016-11-20 22:01:21 PL-5 osafamfnd[411]: NO 'safComp=A,safSu=3,safSg=1,safApp=nway_1' faulted due to 'csiRemovecallbackTimeout' : Recovery is 'componentRestart'
2016-11-20 22:01:21 PL-5 osafamfnd[411]: NO 'safSu=3,safSg=1,safApp=nway_1' Presence State INSTANTIATED => RESTARTING
2016-11-20 22:01:21 PL-5 osafamfnd[411]: NO 'safSu=4,safSg=1,safApp=npm_2' component restart probation timer started (timeout: 600 ns)
2016-11-20 22:01:21 PL-5 osafamfnd[411]: NO Restarting a component of 'safSu=4,safSg=1,safApp=npm_2' (comp restart count: 1)
2016-11-20 22:01:21 PL-5 osafamfnd[411]: NO 'safComp=A,safSu=4,safSg=1,safApp=npm_2' faulted due to 'csiRemovecallbackTimeout' : Recovery is 'componentRestart'
2016-11-20 22:01:21 PL-5 osafamfnd[411]: NO 'safSu=4,safSg=1,safApp=npm_2' Presence State INSTANTIATED => RESTARTING
2016-11-20 22:01:21 PL-5 amfclccli[729]: CLEANUP request 'safComp=A,safSu=4,safSg=1,safApp=npm_2'
2016-11-20 22:01:21 PL-5 amfclccli[728]: CLEANUP request 'safComp=A,safSu=3,safSg=1,safApp=nway_1'
2016-11-20 22:01:21 PL-5 amfclccli[727]: CLEANUP request 'safComp=A,safSu=3,safSg=1,safApp=npm_1'
2016-11-20 22:02:12 PL-5 osafamfnd[411]: NO Removed 'safSi=2,safApp=nway_1' from 'safSu=3,safSg=1,safApp=nway_1'
2016-11-20 22:02:12 PL-5 osafimmnd[394]: NO Global discard node received for nodeId:2040f pid:399
2016-11-20 22:02:12 PL-5
[tickets] [opensaf:tickets] #2106 amf: Admin Operations on middleware SUs / SIs should not be supported
- **status**: accepted --> review

---

** [tickets:#2106] amf: Admin Operations on middleware SUs / SIs should not be supported**

**Status:** review **Milestone:** 5.2.RC1 **Created:** Sun Oct 09, 2016 11:18 AM UTC by Srikanth R **Last Updated:** Wed Mar 01, 2017 10:02 AM UTC **Owner:** Nagendra Kumar

Changeset : 8190 5.1.GA

-> Bring up a single controller SC-1
-> Now perform lock and unlock operations on a middleware SU, i.e. safSu=SC-2,safSg=NoRed,safApp=OpenSAF, which is hosted on SC-2.
-> The admin lock operation succeeds, but the admin unlock operation times out with the assignment of one of the middleware SIs.

Following is the opensafd status after the unlock operation:

safSISU=safSu=SC-1\,safSg=2N\,safApp=OpenSAF,safSi=SC-2N,safApp=OpenSAF saAmfSISUHAState=ACTIVE(1)
safSISU=safSu=SC-1\,safSg=NoRed\,safApp=OpenSAF,safSi=NoRed1,safApp=OpenSAF saAmfSISUHAState=ACTIVE(1)
safSISU=safSu=SC-2\,safSg=NoRed\,safApp=OpenSAF,safSi=NoRed2,safApp=OpenSAF saAmfSISUHAState=ACTIVE(1)

Admin operations on middleware objects should not be supported.
[tickets] [opensaf:tickets] #2328 log: logtest 5 17 fail
- **status**: assigned --> review

---

** [tickets:#2328] log: logtest 5 17 fail**

**Status:** review **Milestone:** 5.2.RC1 **Created:** Wed Mar 01, 2017 07:56 AM UTC by Canh Truong **Last Updated:** Wed Mar 01, 2017 07:56 AM UTC **Owner:** Canh Truong

2017-03-01 00:02:45,535 DEBUG - *** Test 17 (Suite 5): Add logRecordDestinationConfiguration. OK
2017-03-01 00:02:45,860 ERROR - *** FAILED in Test 17 (Suite 5): Add logRecordDestinationConfiguration. OK
Command: 'logtest 5 17' rc: 255 - output:
Suite 5: LOG OI tests, Service configuration object
Check OpenSafLogConfig Fail
 17 FAILED (expected EXIT_SUCCESS, got EXIT_FAILURE (1)) Add logRecordDestinationConfiguration. OK

Steps to reproduce:
logtest 18 1
logtest 5 17

Some test cases in test suite logtest 18 do not restore the system to its initial state after testing. This causes the check in logtest 5 17 to fail.
[tickets] [opensaf:tickets] #2106 amf: Admin Operations on middleware SUs / SIs should not be supported
- **status**: unassigned --> accepted
- **assigned_to**: Nagendra Kumar
- **Part**: - --> d
- **Version**: --> 5.1 GA

---

** [tickets:#2106] amf: Admin Operations on middleware SUs / SIs should not be supported**

**Status:** accepted **Milestone:** 5.2.RC1 **Created:** Sun Oct 09, 2016 11:18 AM UTC by Srikanth R **Last Updated:** Sun Oct 09, 2016 11:18 AM UTC **Owner:** Nagendra Kumar

Changeset : 8190 5.1.GA

-> Bring up a single controller SC-1
-> Now perform lock and unlock operations on a middleware SU, i.e. safSu=SC-2,safSg=NoRed,safApp=OpenSAF, which is hosted on SC-2.
-> The admin lock operation succeeds, but the admin unlock operation times out with the assignment of one of the middleware SIs.

Following is the opensafd status after the unlock operation:

safSISU=safSu=SC-1\,safSg=2N\,safApp=OpenSAF,safSi=SC-2N,safApp=OpenSAF saAmfSISUHAState=ACTIVE(1)
safSISU=safSu=SC-1\,safSg=NoRed\,safApp=OpenSAF,safSi=NoRed1,safApp=OpenSAF saAmfSISUHAState=ACTIVE(1)
safSISU=safSu=SC-2\,safSg=NoRed\,safApp=OpenSAF,safSi=NoRed2,safApp=OpenSAF saAmfSISUHAState=ACTIVE(1)

Admin operations on middleware objects should not be supported.
[tickets] [opensaf:tickets] #2094 Standby controller goes for reboot on stopping openSaf with STONITH enabled cluster
- **status**: unassigned --> duplicate
- **Comment**: Closing as duplicate of #2160

---

** [tickets:#2094] Standby controller goes for reboot on stopping openSaf with STONITH enabled cluster**

**Status:** duplicate **Milestone:** 5.2.RC1 **Created:** Wed Oct 05, 2016 07:28 AM UTC by Chani Srivastava **Last Updated:** Wed Mar 01, 2017 06:46 AM UTC **Owner:** nobody

OS : Ubuntu 64bit
Changeset : 7997 (5.1.FC)
Setup : 2-node cluster (both controllers), remote fencing enabled

Steps:
1. Bring up OpenSaf on two nodes
2. Enable STONITH
3. Stop opensaf on Standby

Active controller triggers reboot of standby.

SC-1 syslog:

Oct 5 13:01:23 SC-1 osafimmd[5535]: NO MDS event from svc_id 25 (change:4, dest:565215202263055)
Oct 5 13:01:23 SC-1 osafimmnd[5545]: NO Global discard node received for nodeId:2020f pid:3579
Oct 5 13:01:23 SC-1 osafimmnd[5545]: NO Implementer disconnected 14 <0, 2020f(down)> (@safAmfService2020f)
Oct 5 13:01:24 SC-1 osafamfd[5592]: **NO Node 'SC-2' left the cluster**
Oct 5 13:01:24 SC-1 osaffmd[5526]: NO Node Down event for node id 2020f:
Oct 5 13:01:24 SC-1 osaffmd[5526]: NO Current role: ACTIVE
Oct 5 13:01:24 SC-1 osaffmd[5526]: **Rebooting OpenSAF NodeId = 131599 EE Name = SC-2, Reason: Received Node Down for peer controller, OwnNodeId = 131343, SupervisionTime = 60**
Oct 5 13:01:25 SC-1 external/libvirt[5893]: [5906]: notice: **Domain SC-2 was stopped**
Oct 5 13:01:27 SC-1 kernel: [ 5355.132093] tipc: Resetting link <1.1.1:eth0-1.1.2:eth0>, peer not responding
Oct 5 13:01:27 SC-1 kernel: [ 5355.132123] tipc: Lost link <1.1.1:eth0-1.1.2:eth0> on network plane A
Oct 5 13:01:27 SC-1 kernel: [ 5355.132126] tipc: Lost contact with <1.1.2>
Oct 5 13:01:27 SC-1 external/libvirt[5893]: [5915]: notice: Domain SC-2 was started
Oct 5 13:01:42 SC-1 kernel: [ 5370.557180] tipc: Established link <1.1.1:eth0-1.1.2:eth0> on network plane A
Oct 5 13:01:42 SC-1 osafimmd[5535]: NO MDS event from svc_id 25 (change:3, dest:565217457979407)
Oct 5 13:01:42 SC-1 osafimmd[5535]: NO New IMMND process is on STANDBY Controller at 2020f
Oct 5 13:01:42 SC-1 osafimmd[5535]: WA IMMND on controller (not currently coord) requests sync
Oct 5 13:01:42 SC-1 osafimmd[5535]: NO Node 2020f request sync sync-pid:1176 epoch:0
Oct 5 13:01:43 SC-1 osafimmnd[5545]: NO Announce sync, epoch:4
Oct 5 13:01:43 SC-1 osafimmd[5535]: NO Successfully announced sync. New ruling epoch:4
Oct 5 13:01:43 SC-1 osafimmnd[5545]: NO SERVER STATE: IMM_SERVER_READY --> IMM_SERVER_SYNC_SERVER
Oct 5 13:01:43 SC-1 osafimmnd[5545]: NO NODE STATE-> IMM_NODE_R_AVAILABLE
Oct 5 13:01:43 SC-1 osafimmloadd: NO Sync starting
Oct 5 13:01:43 SC-1 osafimmloadd: IN Synced 346 objects in total
Oct 5 13:01:43 SC-1 osafimmnd[5545]: NO NODE STATE-> IMM_NODE_FULLY_AVAILABLE 18430
Oct 5 13:01:43 SC-1 osafimmnd[5545]: NO Epoch set to 4 in ImmModel
Oct 5 13:01:43 SC-1 osafimmd[5535]: NO ACT: New Epoch for IMMND process at node 2010f old epoch: 3 new epoch:4
Oct 5 13:01:43 SC-1 osafimmd[5535]: NO ACT: New Epoch for IMMND process at node 2020f old epoch: 0 new epoch:4
Oct 5 13:01:43 SC-1 osafimmloadd: NO Sync ending normally
Oct 5 13:01:43 SC-1 osafimmnd[5545]: NO SERVER STATE: IMM_SERVER_SYNC_SERVER --> IMM_SERVER_READY
Oct 5 13:01:43 SC-1 osafamfd[5592]: NO Received node_up from 2020f: msg_id 1
Oct 5 13:01:43 SC-1 osafamfd[5592]: NO Node 'SC-2' joined the cluster
Oct 5 13:01:43 SC-1 osafimmnd[5545]: NO Implementer connected: 16 (MsgQueueService131599) <467, 2010f>
Oct 5 13:01:43 SC-1 osafimmnd[5545]: NO Implementer locally disconnected. Marking it as doomed 16 <467, 2010f> (MsgQueueService131599)
Oct 5 13:01:43 SC-1 osafimmnd[5545]: NO Implementer disconnected 16 <467, 2010f> (MsgQueueService131599)
Oct 5 13:01:44 SC-1 osafrded[5518]: NO Peer up on node 0x2020f
Oct 5 13:01:44 SC-1 osaffmd[5526]: NO clm init OK
Oct 5 13:01:44 SC-1 osafimmd[5535]: NO MDS event from svc_id 24 (change:5, dest:13)
Oct 5 13:01:44 SC-1 osaffmd[5526]: NO Peer clm node name: SC-2
Oct 5 13:01:44 SC-1 osafrded[5518]: NO Got peer info request from node 0x2020f with role STANDBY
Oct 5 13:01:44 SC-1 osafrded[5518]: NO Got peer info response from node 0x2020f with role STANDBY
[tickets] [opensaf:tickets] #2116 EDS faulted on new Active controller after being promoted from QUIESCED to ACTIVE
OS : Suse 64bit Changeset : 8634 ( 5.2.FC) Setup : 4 nodes ( 2 controllers and 2 payloads & PBE enabled) similar issue observed again while running switchover scenarios. Mar 1 14:50:58 TestBed-R2 osafsmfd[487]: NO Verify Timeout = 1000 Mar 1 14:50:58 TestBed-R2 osafsmfd[487]: NO smfKeepDuState = 0 Mar 1 14:50:50 TestBed-R2 osaflcknd[403]: ER GLND agent node not found: 2020f5754c046 Mar 1 14:50:49 TestBed-R2 osafntfimcnd[3041]: WA ntfimcn_imm_init saImmOiInitialize_2() returned SA_AIS_ERR_TIMEOUT (5) Mar 1 14:50:48 TestBed-R2 osafimmnd[32765]: NO Implementer connected: 23 (safLckService) <135, 2020f> Mar 1 14:50:57 TestBed-R2 osafevtd[388]: ER saImmOiImplementerSet failed with error: 5 Mar 1 14:50:57 TestBed-R2 osaflckd[421]: ER saImmOiImplementerSet FAILED, rc = 5 Mar 1 14:50:58 TestBed-R2 osafamfnd[349]: NO 'safComp=EDS,safSu=SC-2,safSg=2N,safApp=OpenSAF' faulted due to 'avaDown' : Recovery is 'nodeFailfast' Mar 1 14:50:58 TestBed-R2 osafamfnd[349]: ER safComp=EDS,safSu=SC-2,safSg=2N,safApp=OpenSAF Faulted due to:avaDown Recovery is:nodeFailfast Mar 1 14:50:58 TestBed-R2 osafamfnd[349]: Rebooting OpenSAF NodeId = 131599 EE Name = , Reason: Component faulted: recovery is node failfast, OwnNodeId = 131599, SupervisionTime = 60 Mar 1 14:50:57 TestBed-R2 osafclmd[329]: ER saImmOiImplementerSet failed, rc = 5 Mar 1 14:50:58 TestBed-R2 osaflogd[309]: WA saImmOiClassImplementerSet returned SA_AIS_ERR_TIMEOUT (5) Mar 1 14:50:58 TestBed-R2 osafimmnd[32765]: NO Implementer connected: 24 (safEvtService) <123, 2020f> Mar 1 14:50:58 TestBed-R2 opensaf_reboot: Rebooting local node; timeout=60 Mar 1 14:50:59 TestBed-R2 osafntfimcnd[3041]: WA ntfimcn_imm_init saImmOiInitialize_2() returned SA_AIS_ERR_TIMEOUT (5) Mar 1 14:50:59 TestBed-R2 osafimmnd[32765]: WA MDS Send Failed Mar 1 14:50:59 TestBed-R2 osafimmnd[32765]: WA Failed to send response to agent/client over MDS --- ** [tickets:#2116] EDS faulted on new Active controller after being promoted from QUIESCED to ACTIVE** 
**Status:** unassigned **Milestone:** 5.0.2 **Created:** Thu Oct 13, 2016 09:49 AM UTC by Ritu Raj **Last Updated:** Thu Nov 24, 2016 05:26 AM UTC **Owner:** nobody **Attachments:** - [messages](https://sourceforge.net/p/opensaf/tickets/2116/attachment/messages) (2.9 MB; application/octet-stream) - [osafevtd](https://sourceforge.net/p/opensaf/tickets/2116/attachment/osafevtd) (102.4 kB; application/octet-stream) # Environment details OS : Suse 64bit Changeset : 8190 ( 5.1.GA) Setup : 3 nodes ( 3 controllers with headless feature enabled & PBE disabled) # Summary EDS faulted on new Active controller after being promoted from QUIESCED to ACTIVE # Steps followed & Observed behaviour 1. Initially started OpenSAF on 3 controller with HEADLESS feature enabled (SC-1 ACTIVE, SC-2 Standby, SC-3 QUIESCED) 2. Stop OpenSAF on both the controller(Active/Standby) simultaneously 3. QUIESCED controller become Active as clmna Starting to promote this node to a system controller Oct 13 14:29:05 SCALE_SLOT-73 osafclmna[3434]: NO Starting to promote this node to a system controller Oct 13 14:29:05 SCALE_SLOT-73 osafrded[3443]: NO Requesting ACTIVE role Oct 13 14:29:10 SCALE_SLOT-73 osafimmd[3462]: IN AMF HA ACTIVE request Oct 13 14:29:10 SCALE_SLOT-73 osaffmd[3452]: NO Stopped activation supervision due to new AMF state 1 Oct 13 14:29:10 SCALE_SLOT-73 osafamfd[3513]: NO Received node_up from 2030f: msg_id 1 Oct 13 14:29:10 SCALE_SLOT-73 osafamfd[3513]: NO Node 'SC-3' joined the cluster 3. 
After a few seconds, EDS faulted and the node went for reboot:

Oct 13 14:30:11 SCALE_SLOT-73 osafamfnd[3523]: NO 'safComp=EDS,safSu=SC-3,safSg=2N,safApp=OpenSAF' faulted due to 'avaDown' : Recovery is 'nodeFailfast'
Oct 13 14:30:11 SCALE_SLOT-73 osafamfnd[3523]: ER safComp=EDS,safSu=SC-3,safSg=2N,safApp=OpenSAF Faulted due to:avaDown Recovery is:nodeFailfast
Oct 13 14:30:11 SCALE_SLOT-73 osafamfnd[3523]: Rebooting OpenSAF NodeId = 131855 EE Name = , Reason: Component faulted: recovery is node failfast, OwnNodeId = 131855, SupervisionTime = 60

**Notes**
1. Syslog attached
2. osafevtd trace attached
[tickets] [opensaf:tickets] #2327 imm: deadlock between IMM and CLM when integrated with CLM
- **summary**: imm: deadlock between IMM when integrated with CLM --> imm: deadlock between IMM and CLM when integrated with CLM --- ** [tickets:#2327] imm: deadlock between IMM and CLM when integrated with CLM** **Status:** review **Milestone:** 5.2.RC1 **Created:** Wed Mar 01, 2017 06:22 AM UTC by Srikanth R **Last Updated:** Wed Mar 01, 2017 09:16 AM UTC **Owner:** Neelakanta Reddy **Attachments:** - [opensafStartup.tgz](https://sourceforge.net/p/opensaf/tickets/2327/attachment/opensafStartup.tgz) (1.4 MB; application/x-compressed-tar) Changeset: 8634 5.2.FC SLES single node TIPC setup. Issue : opensafd failed to startup on active controller for the first time. Below is the output from syslog Mar 6 01:27:19 SUSE-S1-C1 opensafd[11180]: NO Monitoring of CLMD started Mar 6 01:27:19 SUSE-S1-C1 osafclmna[11211]: NO safNode=SC-1,safCluster=myClmCluster Joined cluster, nodeid=2010f Mar 6 01:27:19 SUSE-S1-C1 osafamfd[11301]: Started Mar 6 01:27:29 SUSE-S1-C1 osafamfd[11301]: WA saClmInitialize_4 returned 5 Mar 6 01:27:29 SUSE-S1-C1 osafamfd[11301]: ER saImmOiInitialize failed 5 Mar 6 01:27:29 SUSE-S1-C1 osafamfd[11301]: ER avd_imm_init FAILED Mar 6 01:27:29 SUSE-S1-C1 osafamfd[11301]: ER initialize_for_assignment FAILED 2 Mar 6 01:27:29 SUSE-S1-C1 osafamfd[11301]: ER initialize failed, exiting Mar 6 01:27:29 SUSE-S1-C1 opensafd[11180]: ER Failed DESC:AMFD Mar 6 01:27:29 SUSE-S1-C1 opensafd[11180]: ER Going for recovery Below is the output from clmd. Mar 6 1:27:29.273608 osafclmd [11291:src/clm/clmd/clms_mds.c:1194] << clms_mds_svc_event Mar 6 1:27:29.273644 osafclmd [11291:src/mbc/mbcsv_mds.c:0420] << mbcsv_mds_evt: Msg is not from same vdest, discarding Mar 6 1:27:29.269263 osafclmd [11291:src/imm/agent/imma_oi_api.cc:2783] << rt_object_update_common Mar 6 1:27:29.273697 osafclmd [11291:src/clm/clmd/clms_imm.c:0842] IN saImmOiRtObjectUpdate failed for cluster object with rc = 5. 
Trying again
Mar 6 1:27:29.273709 osafclmd [11291:src/clm/clmd/clms_imm.c:0871] << clms_cluster_update_rattr

Traces of clmd, amfd, amfnd, immd and immnd, along with mds.log and syslog, are attached. This issue is intermittent: observed two times out of three when starting on a lone active controller.
[tickets] [opensaf:tickets] #2327 imm: deadlock between IMM when integrated with CLM
- **summary**: Opensaf failed to start on active controller ( random) --> imm: deadlock between IMM when integrated with CLM - **status**: accepted --> review --- ** [tickets:#2327] imm: deadlock between IMM when integrated with CLM** **Status:** review **Milestone:** 5.2.RC1 **Created:** Wed Mar 01, 2017 06:22 AM UTC by Srikanth R **Last Updated:** Wed Mar 01, 2017 08:42 AM UTC **Owner:** Neelakanta Reddy **Attachments:** - [opensafStartup.tgz](https://sourceforge.net/p/opensaf/tickets/2327/attachment/opensafStartup.tgz) (1.4 MB; application/x-compressed-tar) Changeset: 8634 5.2.FC SLES single node TIPC setup. Issue : opensafd failed to startup on active controller for the first time. Below is the output from syslog Mar 6 01:27:19 SUSE-S1-C1 opensafd[11180]: NO Monitoring of CLMD started Mar 6 01:27:19 SUSE-S1-C1 osafclmna[11211]: NO safNode=SC-1,safCluster=myClmCluster Joined cluster, nodeid=2010f Mar 6 01:27:19 SUSE-S1-C1 osafamfd[11301]: Started Mar 6 01:27:29 SUSE-S1-C1 osafamfd[11301]: WA saClmInitialize_4 returned 5 Mar 6 01:27:29 SUSE-S1-C1 osafamfd[11301]: ER saImmOiInitialize failed 5 Mar 6 01:27:29 SUSE-S1-C1 osafamfd[11301]: ER avd_imm_init FAILED Mar 6 01:27:29 SUSE-S1-C1 osafamfd[11301]: ER initialize_for_assignment FAILED 2 Mar 6 01:27:29 SUSE-S1-C1 osafamfd[11301]: ER initialize failed, exiting Mar 6 01:27:29 SUSE-S1-C1 opensafd[11180]: ER Failed DESC:AMFD Mar 6 01:27:29 SUSE-S1-C1 opensafd[11180]: ER Going for recovery Below is the output from clmd. Mar 6 1:27:29.273608 osafclmd [11291:src/clm/clmd/clms_mds.c:1194] << clms_mds_svc_event Mar 6 1:27:29.273644 osafclmd [11291:src/mbc/mbcsv_mds.c:0420] << mbcsv_mds_evt: Msg is not from same vdest, discarding Mar 6 1:27:29.269263 osafclmd [11291:src/imm/agent/imma_oi_api.cc:2783] << rt_object_update_common Mar 6 1:27:29.273697 osafclmd [11291:src/clm/clmd/clms_imm.c:0842] IN saImmOiRtObjectUpdate failed for cluster object with rc = 5. 
Trying again
Mar 6 1:27:29.273709 osafclmd [11291:src/clm/clmd/clms_imm.c:0871] << clms_cluster_update_rattr

Traces of clmd, amfd, amfnd, immd and immnd, along with mds.log and syslog, are attached. This issue is intermittent: observed two times out of three when starting on a lone active controller.
[tickets] [opensaf:tickets] #2323 imm: CCB operations fail after SC absence (Headless)
- **summary**: imm: CCB operations fail after SC absence --> imm: CCB operations fail after SC absence (Headless)
- **Comment**: Added "headless" clarification because "SC absence" can be misunderstood as just one (out of normally two) SCs being absent.

---

** [tickets:#2323] imm: CCB operations fail after SC absence (Headless)**

**Status:** review **Milestone:** 5.0.2 **Created:** Thu Feb 23, 2017 03:36 PM UTC by Hung Nguyen **Last Updated:** Wed Mar 01, 2017 07:04 AM UTC **Owner:** Hung Nguyen **Attachments:** - [logs_n_traces.tgz](https://sourceforge.net/p/opensaf/tickets/2323/attachment/logs_n_traces.tgz) (658.6 kB; application/gzip)

Reproduce steps:
~~~
1. Start SC-1
2. Commit some CCBs
# immcfg -c Test test=0
# immcfg -c Test test=1
# immcfg -c Test test=2
# immcfg -c Test test=3
3. Start PL-3
4. Restart SC-1
5. When SC-1 is back, it fails to add operations to CCB
# immcfg -c Test test=10
error - saImmOmCcbObjectCreate_2 FAILED with SA_AIS_ERR_FAILED_OPERATION (21)
OI reports: IMM: Resource abort: CCB is not in an expected state
error - saImmOmCcbApply FAILED: SA_AIS_ERR_FAILED_OPERATION (21)
OI reports: IMM: Resource abort: CCB is not in an expected state
~~~

**cb->mLatestCcbId** was not updated on PL-3 when it joined the cluster, so it still had a value of zero. When SC-1 came back from headless, the IMMND on PL-3 sent a re-introduce message to the IMMD on SC-1 with **cb->mLatestCcbId = 0**. The IMMD failed to update **cb->ccb_id_count**, so new CCB ids start from **0+1** instead of **mLatestCcbId + 1**. That results in a conflict with the CCB in **sCcbVector** and the CCB operation failure.

Logs and traces are attached.
[tickets] [opensaf:tickets] #2327 Opensaf failed to start on active controller ( random)
The case is similar to #1731, when NTF is integrated with CLM --- ** [tickets:#2327] Opensaf failed to start on active controller ( random)** **Status:** accepted **Milestone:** 5.2.RC1 **Created:** Wed Mar 01, 2017 06:22 AM UTC by Srikanth R **Last Updated:** Wed Mar 01, 2017 07:28 AM UTC **Owner:** Neelakanta Reddy **Attachments:** - [opensafStartup.tgz](https://sourceforge.net/p/opensaf/tickets/2327/attachment/opensafStartup.tgz) (1.4 MB; application/x-compressed-tar) Changeset: 8634 5.2.FC SLES single node TIPC setup. Issue : opensafd failed to startup on active controller for the first time. Below is the output from syslog Mar 6 01:27:19 SUSE-S1-C1 opensafd[11180]: NO Monitoring of CLMD started Mar 6 01:27:19 SUSE-S1-C1 osafclmna[11211]: NO safNode=SC-1,safCluster=myClmCluster Joined cluster, nodeid=2010f Mar 6 01:27:19 SUSE-S1-C1 osafamfd[11301]: Started Mar 6 01:27:29 SUSE-S1-C1 osafamfd[11301]: WA saClmInitialize_4 returned 5 Mar 6 01:27:29 SUSE-S1-C1 osafamfd[11301]: ER saImmOiInitialize failed 5 Mar 6 01:27:29 SUSE-S1-C1 osafamfd[11301]: ER avd_imm_init FAILED Mar 6 01:27:29 SUSE-S1-C1 osafamfd[11301]: ER initialize_for_assignment FAILED 2 Mar 6 01:27:29 SUSE-S1-C1 osafamfd[11301]: ER initialize failed, exiting Mar 6 01:27:29 SUSE-S1-C1 opensafd[11180]: ER Failed DESC:AMFD Mar 6 01:27:29 SUSE-S1-C1 opensafd[11180]: ER Going for recovery Below is the output from clmd. Mar 6 1:27:29.273608 osafclmd [11291:src/clm/clmd/clms_mds.c:1194] << clms_mds_svc_event Mar 6 1:27:29.273644 osafclmd [11291:src/mbc/mbcsv_mds.c:0420] << mbcsv_mds_evt: Msg is not from same vdest, discarding Mar 6 1:27:29.269263 osafclmd [11291:src/imm/agent/imma_oi_api.cc:2783] << rt_object_update_common Mar 6 1:27:29.273697 osafclmd [11291:src/clm/clmd/clms_imm.c:0842] IN saImmOiRtObjectUpdate failed for cluster object with rc = 5. 
Trying again
Mar 6 1:27:29.273709 osafclmd [11291:src/clm/clmd/clms_imm.c:0871] << clms_cluster_update_rattr

Traces of clmd, amfd, amfnd, immd and immnd, along with mds.log and syslog, are attached. This issue is intermittent: observed two times out of three when starting on a lone active controller.
[tickets] [opensaf:tickets] #2289 opensafd (nid): coredump while standby starting
I am not observing this any more on my setup.

---

** [tickets:#2289] opensafd (nid): coredump while standby starting**

**Status:** unassigned
**Milestone:** 5.2.RC1
**Created:** Tue Feb 07, 2017 06:31 AM UTC by A V Mahesh (AVM)
**Last Updated:** Wed Mar 01, 2017 07:34 AM UTC
**Owner:** nobody

Restarting the standby with TCP, opensafd dumps core:

~~~
(gdb) bt
#0 0x7f2f05cb0b55 in raise () from /lib64/libc.so.6
#1 0x7f2f05cb2131 in abort () from /lib64/libc.so.6
#2 0x7f2f06704955 in __gnu_cxx::__verbose_terminate_handler() () at ../../../../gcc-4.8.3/libstdc++-v3/libsupc++/vterminate.cc:95
#3 0x7f2f06702af6 in __cxxabiv1::__terminate(void (*)()) () at ../../../../gcc-4.8.3/libstdc++-v3/libsupc++/eh_terminate.cc:38
#4 0x7f2f06702b23 in std::terminate() () at ../../../../gcc-4.8.3/libstdc++-v3/libsupc++/eh_terminate.cc:48
#5 0x7f2f06702d42 in __cxa_throw () at ../../../../gcc-4.8.3/libstdc++-v3/libsupc++/eh_throw.cc:87
#6 0x7f2f0670322d in operator new(unsigned long) () at ../../../../gcc-4.8.3/libstdc++-v3/libsupc++/new_op.cc:56
#7 0x7f2f06761979 in std::string::_Rep::_S_create(unsigned long, unsigned long, std::allocator const&) () at /home/build/x86_64-unknown-linux-gnu/libstdc++-v3/include/ext/new_allocator.h:104
#8 0x7f2f0676256b in std::string::_Rep::_M_clone(std::allocator const&, unsigned long) () at /home/build/x86_64-unknown-linux-gnu/libstdc++-v3/include/bits/basic_string.tcc:629
#9 0x7f2f06762bec in std::basic_string::basic_string(std::string const&) () at /home/build/x86_64-unknown-linux-gnu/libstdc++-v3/include/bits/basic_string.h:229
#10 0x7f2f07262c39 in handle_data_request(pollfd*, std::string const&) () at /usr/include/c++/4.8.3/bits/basic_string.h:2405
#11 0x7f2f0726320f in svc_monitor_thread(void*) () at src/nid/nodeinit.cc:1539
#12 0x7f2f05ff97b6 in start_thread () from /lib64/libpthread.so.0
#13 0x7f2f05d559cd in clone () from /lib64/libc.so.6
#14 0x in ?? ()
(gdb) q
~~~

Syslog from the restart:

~~~
Feb 7 11:41:13 SC-2 opensafd: OpenSAF services successfully stopped
Feb 7 11:41:21 SC-2 opensafd: Starting OpenSAF Services(5.1.M0 - ) (Using TCP)
Feb 7 11:41:21 SC-2 osafdtmd[5329]: mkfifo already exists: /var/lib/opensaf/osafdtmd.fifo File exists
Feb 7 11:41:21 SC-2 osafdtmd[5329]: Started
Feb 7 11:41:21 SC-2 osaftransportd[5336]: Started
Feb 7 11:41:21 SC-2 osafclmna[5343]: Started
Feb 7 11:41:21 SC-2 osafrded[5352]: Started
Feb 7 11:41:22 SC-2 osaffmd[5361]: Started
Feb 7 11:41:22 SC-2 osaffmd[5361]: NO Remote fencing is disabled
Feb 7 11:41:22 SC-2 osafimmd[5371]: Started
Feb 7 11:41:22 SC-2 osafimmd[5371]: NO *** SC_ABSENCE_ALLOWED (Headless Hydra) is configured: 900 ***
Feb 7 11:41:22 SC-2 osafimmnd[5382]: Started
Feb 7 11:41:22 SC-2 osafimmnd[5382]: NO Persistent Back-End capability configured, Pbe file:imm.db (suffix may get added)
Feb 7 11:41:22 SC-2 opensafd[5318]: NO Monitoring of TRANSPORT started
Feb 7 11:41:22 SC-2 osafclmna[5343]: NO Starting to promote this node to a system controller
Feb 7 11:41:22 SC-2 osafrded[5352]: NO Requesting ACTIVE role
Feb 7 11:41:22 SC-2 osafrded[5352]: NO RDE role set to Undefined
Feb 7 11:41:22 SC-2 osafdtmd[5329]: NO Established contact with 'PL-3'
Feb 7 11:41:22 SC-2 osafdtmd[5329]: NO Established contact with 'SC-1'
Feb 7 11:41:22 SC-2 osafdtmd[5329]: NO Established contact with 'PL-4'
Feb 7 11:41:22 SC-2 osafrded[5352]: NO Peer up on node 0x2010f
Feb 7 11:41:22 SC-2 osafimmnd[5382]: NO IMMD service is UP ... ScAbsenseAllowed?:0 introduced?:0
Feb 7 11:41:22 SC-2 osafrded[5352]: NO Got peer info request from node 0x2010f with role ACTIVE
Feb 7 11:41:22 SC-2 osafrded[5352]: NO Got peer info response from node 0x2010f with role ACTIVE
Feb 7 11:41:22 SC-2 osafrded[5352]: NO RDE role set to QUIESCED
Feb 7 11:41:22 SC-2 osafrded[5352]: NO Giving up election against 0x2010f with role ACTIVE. My role is now QUIESCED
Feb 7 11:41:22 SC-2 osafclmna[5343]: NO safNode=SC-2,safCluster=myClmCluster Joined cluster, nodeid=2020f
Feb 7 11:41:22 SC-2 osafimmnd[5382]: NO Fevs count adjusted to 2835 preLoadPid: 0
Feb 7 11:41:22 SC-2 osafimmnd[5382]: NO SERVER STATE: IMM_SERVER_ANONYMOUS --> IMM_SERVER_CLUSTER_WAITING
Feb 7 11:41:22 SC-2 osafimmnd[5382]: NO SERVER STATE: IMM_SERVER_CLUSTER_WAITING --> IMM_SERVER_LOADING_PENDING
Feb 7 11:41:22 SC-2 osafimmnd[5382]: NO SERVER STATE: IMM_SERVER_LOADING_PENDING --> IMM_SERVER_SYNC_PENDING
Feb 7 11:41:22 SC-2 osafimmnd[5382]: NO NODE STATE-> IMM_NODE_ISOLATED
Feb 7 11:41:23 SC-2 osafimmnd[5382]: NO NODE STATE-> IMM_NODE_W_AVAILABLE
Feb 7 11:41:23 SC-2 osafimmnd[5382]: NO SERVER STATE: IMM_SERVER_SYNC_PENDING --> IMM_SERVER_SYNC_CLIENT
Feb 7 11:41:23 SC-2 osafimmnd[5382]: NO NODE STATE->
~~~