[tickets] [opensaf:tickets] #2321 Incorrect error messages "mkfifo already exists" observed in syslog

2017-03-01 Thread Nagendra Kumar
Thanks. I have sent the patch for review. Please review it.



---

** [tickets:#2321] Incorrect error messages "mkfifo already exists" observed in 
syslog**

**Status:** review
**Milestone:** 5.2.RC1
**Created:** Thu Feb 23, 2017 05:46 AM UTC by Ritu Raj
**Last Updated:** Thu Mar 02, 2017 07:00 AM UTC
**Owner:** Nagendra Kumar


# Environment details
OS : Suse 64bit
Changeset :  8603( 5.2.MO-1)

# Summary
Incorrect error messages "mkfifo already exists" are observed in syslog after performing an OpenSAF stop and start operation.

# Steps
1. Started OpenSAF on a single controller
2. Stopped OpenSAF and started it again; while starting OpenSAF on the same node, the following error messages were observed in syslog for the components osafamfnd and osafamfwd:

Feb 23 16:21:34 SO-SLOT-1 osafamfnd[21955]: mkfifo already exists: /var/lib/opensaf/osafamfnd.fifo File exists
Feb 23 16:21:34 SO-SLOT-1 osafamfnd[21955]: Started

Feb 23 16:21:34 SO-SLOT-1 osafamfwd[22062]: mkfifo already exists: /var/lib/opensaf/osafamfwd.fifo File exists
Feb 23 16:21:34 SO-SLOT-1 osafamfwd[22062]: Started







[tickets] [opensaf:tickets] #2213 AMFND: Coredump if suFailover while shutting down

2017-03-01 Thread Nagendra Kumar
Will this work?

~~~
diff --git a/src/amf/amfnd/comp.cc b/src/amf/amfnd/comp.cc
--- a/src/amf/amfnd/comp.cc
+++ b/src/amf/amfnd/comp.cc
@@ -2650,6 +2650,9 @@ void avnd_comp_cmplete_all_csi_rec(AVND_
 		/* generate csi-remove-done event... csi may be deleted */
 		(void)avnd_comp_csi_remove_done(cb, comp, curr);
 
+		if (curr == nullptr)
+			break;
+
 		if (0 == m_AVND_COMPDB_REC_CSI_GET(*comp, curr->name.c_str())) {
 			curr =
 			    (prv) ?
~~~



---

** [tickets:#2213] AMFND: Coredump if suFailover while shutting down**

**Status:** assigned
**Milestone:** 5.2.RC1
**Created:** Fri Dec 02, 2016 04:54 AM UTC by Minh Hon Chau
**Last Updated:** Thu Mar 02, 2017 06:09 AM UTC
**Owner:** Nagendra Kumar
**Attachments:**

- [log.tgz](https://sourceforge.net/p/opensaf/tickets/2213/attachment/log.tgz) 
(548.6 kB; application/x-compressed)


Seen amfnd coredump in PL5 with bt as below while cluster is shutting down
~~~
Thread 1 (Thread 0x7f92a8925780 (LWP 411)):
#0  __strcmp_sse2 () at ../sysdeps/x86_64/multiarch/../strcmp.S:1358
No locals.
#1  0x00449cc9 in avsv_dblist_sastring_cmp (key1=, key2=) at util.c:361
i = 0
str1 = 
str2 = 
#2  0x7f92a84b1f95 in ncs_db_link_list_find (list_ptr=0x1ee89f0, key=0x656d6e6769737361 ) at ncsdlib.c:169
start_ptr = 0x1ee3168
#3  0x00416dc0 in avnd_comp_cmplete_all_csi_rec (cb=0x666940 <_avnd_cb>, comp=0x1ee8200) at comp.cc:2652
curr = 0x1ee8060
prv = 0x1ee3150
__FUNCTION__ = "avnd_comp_cmplete_all_csi_rec"
#4  0x0040ca47 in avnd_instfail_su_failover (failed_comp=0x1ee8200, su=0x1ee74e0, cb=0x666940 <_avnd_cb>) at clc.cc:3161
rc = 
#5  avnd_comp_clc_st_chng_prc (cb=cb@entry=0x666940 <_avnd_cb>, comp=comp@entry=0x1ee8200, prv_st=prv_st@entry=SA_AMF_PRESENCE_RESTARTING, final_st=final_st@entry=SA_AMF_PRESENCE_TERMINATION_FAILED) at clc.cc:967
csi = 0x0
__FUNCTION__ = "avnd_comp_clc_st_chng_prc"
ev = AVND_SU_PRES_FSM_EV_MAX
is_en = 
rc = 1
#6  0x0040f530 in avnd_comp_clc_fsm_run (cb=cb@entry=0x666940 <_avnd_cb>, comp=comp@entry=0x1ee8200, ev=AVND_COMP_CLC_PRES_FSM_EV_CLEANUP_FAIL) at clc.cc:906
prv_st = 
final_st = 
rc = 1
__FUNCTION__ = "avnd_comp_clc_fsm_run"
#7  0x0040fdea in avnd_evt_clc_resp_evh (cb=0x666940 <_avnd_cb>, evt=0x7f9298c0) at clc.cc:414
__FUNCTION__ = "avnd_evt_clc_resp_evh"
ev = 
clc_evt = 0x7f9298e0
comp = 0x1ee8200
rc = 1
#8  0x0042676f in avnd_evt_process (evt=0x7f9298c0) at main.cc:626
cb = 0x666940 <_avnd_cb>
rc = 1
#9  avnd_main_process () at main.cc:577
ret = 
fds = {{fd = 12, events = 1, revents = 1}, {fd = 16, events = 1, revents = 0}, {fd = 14, events = 1, revents = 0}, {fd = 0, events = 0, revents = 0}}
evt = 0x7f9298c0
__FUNCTION__ = "avnd_main_process"
result = 
rc = 
#10 0x004058f3 in main (argc=1, argv=0x7ffe700c5c78) at main.cc:202
error = 0
1358    ../sysdeps/x86_64/multiarch/../strcmp.S: No such file or directory.
~~~
In syslog of PL5:

2016-11-20 22:01:21 PL-5 osafamfnd[411]: NO 'safSu=3,safSg=1,safApp=npm_1' component restart probation timer started (timeout: 600 ns)
2016-11-20 22:01:21 PL-5 osafamfnd[411]: NO Restarting a component of 'safSu=3,safSg=1,safApp=npm_1' (comp restart count: 1)
2016-11-20 22:01:21 PL-5 osafamfnd[411]: NO 'safComp=A,safSu=3,safSg=1,safApp=npm_1' faulted due to 'csiRemovecallbackTimeout' : Recovery is 'componentRestart'
2016-11-20 22:01:21 PL-5 osafamfnd[411]: NO 'safSu=3,safSg=1,safApp=npm_1' Presence State INSTANTIATED => RESTARTING
2016-11-20 22:01:21 PL-5 osafamfnd[411]: NO 'safSu=3,safSg=1,safApp=nway_1' component restart probation timer started (timeout: 600 ns)
2016-11-20 22:01:21 PL-5 osafamfnd[411]: NO Restarting a component of 'safSu=3,safSg=1,safApp=nway_1' (comp restart count: 1)
2016-11-20 22:01:21 PL-5 osafamfnd[411]: NO 'safComp=A,safSu=3,safSg=1,safApp=nway_1' faulted due to 'csiRemovecallbackTimeout' : Recovery is 'componentRestart'
2016-11-20 22:01:21 PL-5 osafamfnd[411]: NO 'safSu=3,safSg=1,safApp=nway_1' Presence State INSTANTIATED => RESTARTING
2016-11-20 22:01:21 PL-5 osafamfnd[411]: NO 'safSu=4,safSg=1,safApp=npm_2' component restart probation timer started (timeout: 600 ns)
2016-11-20 22:01:21 PL-5 osafamfnd[411]: NO Restarting a component of 'safSu=4,safSg=1,safApp=npm_2' (comp restart count: 1)
2016-11-20 22:01:21 PL-5 osafamfnd[411]: NO 'safComp=A,safSu=4,safSg=1,safApp=npm_2' faulted due to 'csiRemovecallbackTimeout' : Recovery is 'componentRestart'
2016-11-20 22:01:21 PL-5 osafamfnd[411]: NO 'safSu=4,safSg=1,safApp=npm_2' Presence State INSTANTIATED => RESTARTING
2016-11-20 22:01:21 PL-5 amfclccli[729]: CLEANUP request 'safComp=A,safSu=4,safSg=1,safApp=npm_2'
2016-11-20 22:01:21 PL-5 amfclccli[728]: CLEANUP request 'safComp=A,safSu=3,safSg=1,safApp=nway_1'
2016-11-20 22:01:21 PL-5 amfclccli[727]: CLEANUP request 'safComp=A,safSu=3,safSg=1,safApp=npm_1'
2016-11-20 22:02:12 PL-5 osafamfnd[411]: NO Removed 'safSi=2,safApp=nway_1' from 'safSu=3,safSg=1,safApp=nway_1'
2016-11-20 22:02:12 PL-5 osafimmnd[394]: NO Global discard node received for nodeId:2040f pid:399
2016-11-20 22:02:12 PL-5 

[tickets] [opensaf:tickets] #2321 Incorrect error messages "mkfifo already exists" observed in syslog

2017-03-01 Thread Nagendra Kumar
- **status**: unassigned --> review
- **assigned_to**: Nagendra Kumar



---

** [tickets:#2321] Incorrect error messages "mkfifo already exists" observed in 
syslog**

**Status:** review
**Milestone:** 5.2.RC1
**Created:** Thu Feb 23, 2017 05:46 AM UTC by Ritu Raj
**Last Updated:** Wed Mar 01, 2017 02:25 PM UTC
**Owner:** Nagendra Kumar









[tickets] [opensaf:tickets] #2213 AMFND: Coredump if suFailover while shutting down

2017-03-01 Thread Minh Hon Chau
Hi Nagu,

I cannot reproduce this scenario. I think there's one place we could improve to avoid the coredump.
It's in avnd_comp_cmplete_all_csi_rec() 
~~~
{
...
	/* generate csi-remove-done event... csi may be deleted */
	(void)avnd_comp_csi_remove_done(cb, comp, curr);

	if (0 == m_AVND_COMPDB_REC_CSI_GET(*comp, curr->name.c_str())) {
		curr =
		    (prv) ?
		    m_AVND_CSI_REC_FROM_COMP_DLL_NODE_GET(m_NCS_DBLIST_FIND_NEXT(&prv->comp_dll_node)) :
		    m_AVND_CSI_REC_FROM_COMP_DLL_NODE_GET(m_NCS_DBLIST_FIND_FIRST(&comp->csi_list));
	} else
	   
...
}
~~~
The call to avnd_comp_csi_remove_done() is recursive and can lead to deletion of the csi. Maybe we can also check that @curr is a non-null pointer in the "if" block around m_AVND_COMPDB_REC_CSI_GET?

Thanks,
Minh


---

** [tickets:#2213] AMFND: Coredump if suFailover while shutting down**

**Status:** assigned
**Milestone:** 5.2.RC1
**Created:** Fri Dec 02, 2016 04:54 AM UTC by Minh Hon Chau
**Last Updated:** Wed Mar 01, 2017 09:06 PM UTC
**Owner:** Nagendra Kumar
**Attachments:**

- [log.tgz](https://sourceforge.net/p/opensaf/tickets/2213/attachment/log.tgz) 
(548.6 kB; application/x-compressed)



[tickets] [opensaf:tickets] #2324 amfd: SG admin state is not honored during node group unlock_instantiation

2017-03-01 Thread Tai Dinh
- **status**: assigned --> review



---

** [tickets:#2324] amfd: SG admin state is not honored during node group unlock_instantiation**

**Status:** review
**Milestone:** 5.2.RC1
**Created:** Fri Feb 24, 2017 07:20 AM UTC by Tai Dinh
**Last Updated:** Wed Mar 01, 2017 12:08 PM UTC
**Owner:** Tai Dinh


SG admin state is not honored during node group unlock_instantiation.

This can always be reproduced:
- Put the SG into the locked_instantiation state. The SUs are now UNINSTANTIATED.
- Lock/lock_in the node group that the SG or its SUs are hosted on.
- Unlock_in the node group. The expectation here is that the SUs should be kept in the UNINSTANTIATED state, but they're not.
- AMF always tries to instantiate the SUs without checking their SG admin state:

~~~
if ((su->saAmfSUAdminState != SA_AMF_ADMIN_LOCKED_INSTANTIATION) &&
    (su->su_on_node->saAmfNodeAdminState != SA_AMF_ADMIN_LOCKED_INSTANTIATION) &&
    (su->saAmfSUOperState == SA_AMF_OPERATIONAL_ENABLED) &&
    (su->saAmfSUPresenceState == SA_AMF_PRESENCE_UNINSTANTIATED) &&
    (su->su_on_node->saAmfNodeOperState == SA_AMF_OPERATIONAL_ENABLED)) {
	if ((su->saAmfSUPreInstantiable == false) ||
	    (su->su_on_node->node_state != AVD_AVND_STATE_PRESENT))
		continue;

	if (sg->saAmfSGNumPrefInserviceSUs > su_try_inst) {
		if (avd_snd_presence_msg(avd_cb, su, false) != NCSCC_RC_SUCCESS) {
			LOG_NO("Failed to send Instantiation of '%s'", su->name.c_str());
		} else {
			su->su_on_node->su_cnt_admin_oper++;
			su_try_inst++;
		}
	}
}
~~~
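
A minimal sketch of the kind of guard this implies, assuming the surrounding loop iterates over the SUs of sg; the exact placement and check are illustrative, not the actual patch:

~~~
/* Hypothetical guard: honor the SG admin state before trying to
 * instantiate its SUs during node group unlock_instantiation. */
if (sg->saAmfSGAdminState == SA_AMF_ADMIN_LOCKED_INSTANTIATION)
	continue; /* SG is locked for instantiation: keep its SUs UNINSTANTIATED */
~~~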





[tickets] [opensaf:tickets] #2213 AMFND: Coredump if suFailover while shutting down

2017-03-01 Thread Minh Hon Chau
Hi Nagu,

We have only seen it once so far and didn't have traces. I'll try to reproduce it again on the latest changeset to see if it happens.

Thanks,
Minh


---

** [tickets:#2213] AMFND: Coredump if suFailover while shutting down**

**Status:** assigned
**Milestone:** 5.2.RC1
**Created:** Fri Dec 02, 2016 04:54 AM UTC by Minh Hon Chau
**Last Updated:** Wed Mar 01, 2017 11:30 AM UTC
**Owner:** Nagendra Kumar
**Attachments:**

- [log.tgz](https://sourceforge.net/p/opensaf/tickets/2213/attachment/log.tgz) 
(548.6 kB; application/x-compressed)



[tickets] [opensaf:tickets] #2321 Incorrect error messages "mkfifo already exists" observed in syslog

2017-03-01 Thread Hans Nordebäck
I suggest changing amfnd and amfwd to call daemon_exit instead of exit.
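
For illustration, a minimal sketch of the difference, assuming daemonize() created the fifo shown in the ticket's logs; the helper name and fifo-path handling here are assumptions, not the actual daemon code:

~~~
// Sketch: exit through a cleanup helper so the fifo created at
// daemonize() time is unlinked, instead of calling exit() and
// leaving it behind for the next start to trip over.
#include <cstdlib>
#include <unistd.h>

static const char *fifo_path = "/var/lib/opensaf/osafamfnd.fifo"; // from the logs

static void daemon_exit_sketch(int status) {
  unlink(fifo_path); // remove the fifo so the next start won't see EEXIST
  _exit(status);
}
~~~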


---

** [tickets:#2321] Incorrect error messages "mkfifo already exists" observed in 
syslog**

**Status:** unassigned
**Milestone:** 5.2.RC1
**Created:** Thu Feb 23, 2017 05:46 AM UTC by Ritu Raj
**Last Updated:** Wed Mar 01, 2017 01:03 PM UTC
**Owner:** nobody









[tickets] [opensaf:tickets] #2321 Incorrect error messages "mkfifo already exists" observed in syslog

2017-03-01 Thread Hans Nordebäck
OK, I see. This is because amfnd and amfwd call daemonize but don't call daemon_exit; they call exit() instead. Their respective fifo files are therefore not deleted.


---

** [tickets:#2321] Incorrect error messages "mkfifo already exists" observed in 
syslog**

**Status:** unassigned
**Milestone:** 5.2.RC1
**Created:** Thu Feb 23, 2017 05:46 AM UTC by Ritu Raj
**Last Updated:** Wed Mar 01, 2017 11:40 AM UTC
**Owner:** nobody









[tickets] [opensaf:tickets] #2324 amfd: SG admin state is not honored during node group unlock_instantiation

2017-03-01 Thread Tai Dinh
- **status**: unassigned --> assigned
- **assigned_to**: Tai Dinh



---

** [tickets:#2324] amfd: SG admin state is not honored during node group unlock_instantiation**

**Status:** assigned
**Milestone:** 5.2.RC1
**Created:** Fri Feb 24, 2017 07:20 AM UTC by Tai Dinh
**Last Updated:** Fri Feb 24, 2017 11:07 AM UTC
**Owner:** Tai Dinh







[tickets] [opensaf:tickets] #2321 Incorrect error messages "mkfifo already exists" observed in syslog

2017-03-01 Thread Ritu Raj
These messages are only observed when we start OpenSAF after performing an "opensaf stop" operation.

Sequence: stop OpenSAF --> start OpenSAF (the ERR messages will be observed in syslog)

There is no impact of these syslog ERRs on OpenSAF services; OpenSAF starts successfully without any issues.



---

** [tickets:#2321] Incorrect error messages "mkfifo already exists" observed in 
syslog**

**Status:** unassigned
**Milestone:** 5.2.RC1
**Created:** Thu Feb 23, 2017 05:46 AM UTC by Ritu Raj
**Last Updated:** Wed Mar 01, 2017 11:17 AM UTC
**Owner:** nobody









[tickets] [opensaf:tickets] #2213 AMFND: Coredump if suFailover while shutting down

2017-03-01 Thread Nagendra Kumar
Hi Minh, I am not getting any clue about the faults. Can you please provide traces if possible (via email)?
Thanks
-Nagu


---

** [tickets:#2213] AMFND: Coredump if suFailover while shutting down**

**Status:** assigned
**Milestone:** 5.2.RC1
**Created:** Fri Dec 02, 2016 04:54 AM UTC by Minh Hon Chau
**Last Updated:** Wed Mar 01, 2017 10:57 AM UTC
**Owner:** Nagendra Kumar
**Attachments:**

- [log.tgz](https://sourceforge.net/p/opensaf/tickets/2213/attachment/log.tgz) 
(548.6 kB; application/x-compressed)



[tickets] [opensaf:tickets] #2321 Incorrect error messages "mkfifo already exists" observed in syslog

2017-03-01 Thread Hans Nordebäck
Perhaps this message can be changed to a trace message? The fifo file is deleted at graceful shutdown, but if e.g. the node has crashed, the fifo file is not removed, and at the next node start this log message will be written.
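
As a sketch of that option (the helper is hypothetical, and the TRACE/LOG_ER calls are shown only as comments since the real OpenSAF logging macros are not reproduced here):

~~~
// Sketch: tolerate a fifo left over from a crash and only trace it,
// instead of writing an error to syslog.
#include <cerrno>
#include <sys/stat.h>

static bool create_fifo_quietly(const char *path) {
  if (mkfifo(path, 0666) == -1) {
    if (errno == EEXIST) {
      // Leftover from a non-graceful shutdown; reuse it.
      // TRACE("fifo %s already exists, reusing it", path);
      return true;
    }
    // LOG_ER("mkfifo %s failed", path);
    return false;
  }
  return true;
}
~~~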


---

** [tickets:#2321] Incorrect error messages "mkfifo already exists" observed in 
syslog**

**Status:** unassigned
**Milestone:** 5.2.RC1
**Created:** Thu Feb 23, 2017 05:46 AM UTC by Ritu Raj
**Last Updated:** Wed Mar 01, 2017 10:59 AM UTC
**Owner:** nobody









[tickets] [opensaf:tickets] #2321 Incorrect error messages "mkfifo already exists" observed in syslog

2017-03-01 Thread Nagendra Kumar
Hi Hans N, are these logs required in syslog?


---

** [tickets:#2321] Incorrect error messages "mkfifo already exists" observed in 
syslog**

**Status:** unassigned
**Milestone:** 5.2.RC1
**Created:** Thu Feb 23, 2017 05:46 AM UTC by Ritu Raj
**Last Updated:** Wed Mar 01, 2017 07:39 AM UTC
**Owner:** nobody









[tickets] [opensaf:tickets] #2213 AMFND: Coredump if suFailover while shutting down

2017-03-01 Thread Nagendra Kumar
- **status**: unassigned --> assigned
- **assigned_to**: Nagendra Kumar



---

** [tickets:#2213] AMFND: Coredump if suFailover while shutting down**

**Status:** assigned
**Milestone:** 5.2.RC1
**Created:** Fri Dec 02, 2016 04:54 AM UTC by Minh Hon Chau
**Last Updated:** Fri Dec 02, 2016 04:54 AM UTC
**Owner:** Nagendra Kumar
**Attachments:**

- [log.tgz](https://sourceforge.net/p/opensaf/tickets/2213/attachment/log.tgz) 
(548.6 kB; application/x-compressed)



[tickets] [opensaf:tickets] #2106 amf: Admin Operations on middleware SUs / SIs should not be supported

2017-03-01 Thread Nagendra Kumar
- **status**: accepted --> review



---

** [tickets:#2106] amf: Admin Operations on middleware SUs  / SIs should not be 
supported**

**Status:** review
**Milestone:** 5.2.RC1
**Created:** Sun Oct 09, 2016 11:18 AM UTC by Srikanth R
**Last Updated:** Wed Mar 01, 2017 10:02 AM UTC
**Owner:** Nagendra Kumar


Changeset : 8190 5.1.GA

-> Bring up a single controller SC-1
-> Now perform lock and unlock operations on a middleware SU, i.e. safSu=SC-2,safSg=NoRed,safApp=OpenSAF, which is hosted on SC-2.
-> The admin lock operation succeeds, but the admin unlock operation times out with the assignment to one of the middleware SIs.

 Following is the opensafd status after the unlock operation.
 
safSISU=safSu=SC-1\,safSg=2N\,safApp=OpenSAF,safSi=SC-2N,safApp=OpenSAF
saAmfSISUHAState=ACTIVE(1)
safSISU=safSu=SC-1\,safSg=NoRed\,safApp=OpenSAF,safSi=NoRed1,safApp=OpenSAF
saAmfSISUHAState=ACTIVE(1)
safSISU=safSu=SC-2\,safSg=NoRed\,safApp=OpenSAF,safSi=NoRed2,safApp=OpenSAF
saAmfSISUHAState=ACTIVE(1)

  Admin operations on middleware objects should not be supported.
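
A minimal sketch of the kind of check this implies (the DN suffix match and the error code are illustrative assumptions, not the actual fix):

~~~
#include <string>

// Sketch: treat any object whose DN ends in the OpenSAF middleware app,
// e.g. 'safSu=SC-2,safSg=NoRed,safApp=OpenSAF', as off-limits to admin ops.
static bool is_middleware_dn(const std::string &dn) {
  static const std::string suffix = "safApp=OpenSAF";
  return dn.size() >= suffix.size() &&
         dn.compare(dn.size() - suffix.size(), suffix.size(), suffix) == 0;
}

// In the admin-operation callback, one could then reject such targets,
// e.g. by answering with SA_AIS_ERR_NOT_SUPPORTED.
~~~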




[tickets] [opensaf:tickets] #2328 log: logtest 5 17 fail

2017-03-01 Thread Canh Truong
- **status**: assigned --> review



---

** [tickets:#2328] log: logtest 5 17 fail**

**Status:** review
**Milestone:** 5.2.RC1
**Created:** Wed Mar 01, 2017 07:56 AM UTC by Canh Truong
**Last Updated:** Wed Mar 01, 2017 07:56 AM UTC
**Owner:** Canh Truong


2017-03-01 00:02:45,535 DEBUG - *** Test 17 (Suite 5): Add logRecordDestinationConfiguration. OK
2017-03-01 00:02:45,860 ERROR - *** FAILED in Test 17 (Suite 5): Add logRecordDestinationConfiguration. OK
Command: 'logtest 5 17'
rc: 255 - output:
Suite 5: LOG OI tests, Service configuration object
Check OpenSafLogConfig Fail
17 FAILED (expected EXIT_SUCCESS, got EXIT_FAILURE (1)) Add logRecordDestinationConfiguration. OK

Steps to reproduce:
logtest 18 1
logtest 5 17

Some test cases in test suite logtest 18 do not restore the system to its state from the beginning of the test case after testing. This causes the check in logtest 5 17 to fail.





[tickets] [opensaf:tickets] #2106 amf: Admin Operations on middleware SUs / SIs should not be supported

2017-03-01 Thread Nagendra Kumar
- **status**: unassigned --> accepted
- **assigned_to**: Nagendra Kumar
- **Part**: - --> d
- **Version**:  --> 5.1 GA



---

** [tickets:#2106] amf: Admin Operations on middleware SUs  / SIs should not be 
supported**

**Status:** accepted
**Milestone:** 5.2.RC1
**Created:** Sun Oct 09, 2016 11:18 AM UTC by Srikanth R
**Last Updated:** Sun Oct 09, 2016 11:18 AM UTC
**Owner:** Nagendra Kumar






[tickets] [opensaf:tickets] #2094 Standby controller goes for reboot on stopping openSaf with STONITH enabled cluster

2017-03-01 Thread Chani Srivastava
- **status**: unassigned --> duplicate
- **Comment**:

Closing as duplicate of #2160



---

** [tickets:#2094] Standby controller goes for reboot on stopping openSaf with 
STONITH enabled cluster**

**Status:** duplicate
**Milestone:** 5.2.RC1
**Created:** Wed Oct 05, 2016 07:28 AM UTC by Chani Srivastava
**Last Updated:** Wed Mar 01, 2017 06:46 AM UTC
**Owner:** nobody


OS : Ubuntu 64bit
Changeset : 7997 ( 5.1.FC)
Setup : 2-node cluster (both controllers) Remote fencing enabled

Steps:
1. Bring up OpenSAF on two nodes
2. Enable STONITH
3. Stop OpenSAF on the standby

The active controller triggers a reboot of the standby.

SC-1 Syslog

Oct  5 13:01:23 SC-1 osafimmd[5535]: NO MDS event from svc_id 25 (change:4, dest:565215202263055)
Oct  5 13:01:23 SC-1 osafimmnd[5545]: NO Global discard node received for nodeId:2020f pid:3579
Oct  5 13:01:23 SC-1 osafimmnd[5545]: NO Implementer disconnected 14 <0, 2020f(down)> (@safAmfService2020f)
Oct  5 13:01:24 SC-1 osafamfd[5592]: **NO Node 'SC-2' left the cluster**
Oct  5 13:01:24 SC-1 osaffmd[5526]: NO Node Down event for node id 2020f:
Oct  5 13:01:24 SC-1 osaffmd[5526]: NO Current role: ACTIVE
Oct  5 13:01:24 SC-1 osaffmd[5526]: **Rebooting OpenSAF NodeId = 131599 EE Name = SC-2, Reason: Received Node Down for peer controller, OwnNodeId = 131343, SupervisionTime = 60
Oct  5 13:01:25 SC-1 external/libvirt[5893]: [5906]: notice: Domain SC-2 was stopped**
Oct  5 13:01:27 SC-1 kernel: [ 5355.132093] tipc: Resetting link <1.1.1:eth0-1.1.2:eth0>, peer not responding
Oct  5 13:01:27 SC-1 kernel: [ 5355.132123] tipc: Lost link <1.1.1:eth0-1.1.2:eth0> on network plane A
Oct  5 13:01:27 SC-1 kernel: [ 5355.132126] tipc: Lost contact with <1.1.2>
Oct  5 13:01:27 SC-1 external/libvirt[5893]: [5915]: notice: Domain SC-2 was started
Oct  5 13:01:42 SC-1 kernel: [ 5370.557180] tipc: Established link <1.1.1:eth0-1.1.2:eth0> on network plane A
Oct  5 13:01:42 SC-1 osafimmd[5535]: NO MDS event from svc_id 25 (change:3, dest:565217457979407)
Oct  5 13:01:42 SC-1 osafimmd[5535]: NO New IMMND process is on STANDBY Controller at 2020f
Oct  5 13:01:42 SC-1 osafimmd[5535]: WA IMMND on controller (not currently coord) requests sync
Oct  5 13:01:42 SC-1 osafimmd[5535]: NO Node 2020f request sync sync-pid:1176 epoch:0
Oct  5 13:01:43 SC-1 osafimmnd[5545]: NO Announce sync, epoch:4
Oct  5 13:01:43 SC-1 osafimmd[5535]: NO Successfully announced sync. New ruling epoch:4
Oct  5 13:01:43 SC-1 osafimmnd[5545]: NO SERVER STATE: IMM_SERVER_READY --> IMM_SERVER_SYNC_SERVER
Oct  5 13:01:43 SC-1 osafimmnd[5545]: NO NODE STATE-> IMM_NODE_R_AVAILABLE
Oct  5 13:01:43 SC-1 osafimmloadd: NO Sync starting
Oct  5 13:01:43 SC-1 osafimmloadd: IN Synced 346 objects in total
Oct  5 13:01:43 SC-1 osafimmnd[5545]: NO NODE STATE-> IMM_NODE_FULLY_AVAILABLE 18430
Oct  5 13:01:43 SC-1 osafimmnd[5545]: NO Epoch set to 4 in ImmModel
Oct  5 13:01:43 SC-1 osafimmd[5535]: NO ACT: New Epoch for IMMND process at node 2010f old epoch: 3  new epoch:4
Oct  5 13:01:43 SC-1 osafimmd[5535]: NO ACT: New Epoch for IMMND process at node 2020f old epoch: 0  new epoch:4
Oct  5 13:01:43 SC-1 osafimmloadd: NO Sync ending normally
Oct  5 13:01:43 SC-1 osafimmnd[5545]: NO SERVER STATE: IMM_SERVER_SYNC_SERVER --> IMM_SERVER_READY
Oct  5 13:01:43 SC-1 osafamfd[5592]: NO Received node_up from 2020f: msg_id 1
Oct  5 13:01:43 SC-1 osafamfd[5592]: NO Node 'SC-2' joined the cluster
Oct  5 13:01:43 SC-1 osafimmnd[5545]: NO Implementer connected: 16 (MsgQueueService131599) <467, 2010f>
Oct  5 13:01:43 SC-1 osafimmnd[5545]: NO Implementer locally disconnected. Marking it as doomed 16 <467, 2010f> (MsgQueueService131599)
Oct  5 13:01:43 SC-1 osafimmnd[5545]: NO Implementer disconnected 16 <467, 2010f> (MsgQueueService131599)
Oct  5 13:01:44 SC-1 osafrded[5518]: NO Peer up on node 0x2020f
Oct  5 13:01:44 SC-1 osaffmd[5526]: NO clm init OK
Oct  5 13:01:44 SC-1 osafimmd[5535]: NO MDS event from svc_id 24 (change:5, dest:13)
Oct  5 13:01:44 SC-1 osaffmd[5526]: NO Peer clm node name: SC-2
Oct  5 13:01:44 SC-1 osafrded[5518]: NO Got peer info request from node 0x2020f with role STANDBY
Oct  5 13:01:44 SC-1 osafrded[5518]: NO Got peer info response from node 0x2020f with role STANDBY






[tickets] [opensaf:tickets] #2116 EDS faulted on new Active controller after being promoted from QUIESCED to ACTIVE

2017-03-01 Thread M Chadrasekhar
OS : Suse 64bit
Changeset : 8634 ( 5.2.FC)
Setup : 4 nodes ( 2 controllers and 2 payloads & PBE enabled)

A similar issue was observed again while running switchover scenarios.

Mar  1 14:50:58 TestBed-R2 osafsmfd[487]: NO Verify Timeout = 1000
Mar  1 14:50:58 TestBed-R2 osafsmfd[487]: NO smfKeepDuState = 0
Mar  1 14:50:50 TestBed-R2 osaflcknd[403]: ER GLND agent node not found: 2020f5754c046
Mar  1 14:50:49 TestBed-R2 osafntfimcnd[3041]: WA ntfimcn_imm_init saImmOiInitialize_2() returned SA_AIS_ERR_TIMEOUT (5)
Mar  1 14:50:48 TestBed-R2 osafimmnd[32765]: NO Implementer connected: 23 (safLckService) <135, 2020f>
Mar  1 14:50:57 TestBed-R2 osafevtd[388]: ER saImmOiImplementerSet failed with error: 5
Mar  1 14:50:57 TestBed-R2 osaflckd[421]: ER saImmOiImplementerSet FAILED, rc = 5
Mar  1 14:50:58 TestBed-R2 osafamfnd[349]: NO 'safComp=EDS,safSu=SC-2,safSg=2N,safApp=OpenSAF' faulted due to 'avaDown' : Recovery is 'nodeFailfast'
Mar  1 14:50:58 TestBed-R2 osafamfnd[349]: ER safComp=EDS,safSu=SC-2,safSg=2N,safApp=OpenSAF Faulted due to:avaDown Recovery is:nodeFailfast
Mar  1 14:50:58 TestBed-R2 osafamfnd[349]: Rebooting OpenSAF NodeId = 131599 EE Name = , Reason: Component faulted: recovery is node failfast, OwnNodeId = 131599, SupervisionTime = 60
Mar  1 14:50:57 TestBed-R2 osafclmd[329]: ER saImmOiImplementerSet failed, rc = 5
Mar  1 14:50:58 TestBed-R2 osaflogd[309]: WA saImmOiClassImplementerSet returned SA_AIS_ERR_TIMEOUT (5)
Mar  1 14:50:58 TestBed-R2 osafimmnd[32765]: NO Implementer connected: 24 (safEvtService) <123, 2020f>
Mar  1 14:50:58 TestBed-R2 opensaf_reboot: Rebooting local node; timeout=60
Mar  1 14:50:59 TestBed-R2 osafntfimcnd[3041]: WA ntfimcn_imm_init saImmOiInitialize_2() returned SA_AIS_ERR_TIMEOUT (5)
Mar  1 14:50:59 TestBed-R2 osafimmnd[32765]: WA MDS Send Failed
Mar  1 14:50:59 TestBed-R2 osafimmnd[32765]: WA Failed to send response to agent/client over MDS



---

** [tickets:#2116] EDS faulted on new Active controller after being promoted 
from QUIESCED to ACTIVE**

**Status:** unassigned
**Milestone:** 5.0.2
**Created:** Thu Oct 13, 2016 09:49 AM UTC by Ritu Raj
**Last Updated:** Thu Nov 24, 2016 05:26 AM UTC
**Owner:** nobody
**Attachments:**

- 
[messages](https://sourceforge.net/p/opensaf/tickets/2116/attachment/messages) 
(2.9 MB; application/octet-stream)
- 
[osafevtd](https://sourceforge.net/p/opensaf/tickets/2116/attachment/osafevtd) 
(102.4 kB; application/octet-stream)


# Environment details
OS : Suse 64bit
Changeset : 8190 ( 5.1.GA)
Setup : 3 nodes ( 3 controllers with headless feature enabled & PBE disabled)

# Summary
EDS faulted on new Active controller after being promoted from QUIESCED to 
ACTIVE

# Steps followed & Observed behaviour
1. Initially started OpenSAF on 3 controllers with the headless feature enabled (SC-1 ACTIVE, SC-2 Standby, SC-3 QUIESCED)
2. Stopped OpenSAF on both controllers (Active/Standby) simultaneously
3. The QUIESCED controller became Active, as clmna started to promote the node to a system controller:

Oct 13 14:29:05 SCALE_SLOT-73 osafclmna[3434]: NO Starting to promote this node to a system controller
Oct 13 14:29:05 SCALE_SLOT-73 osafrded[3443]: NO Requesting ACTIVE role

Oct 13 14:29:10 SCALE_SLOT-73 osafimmd[3462]: IN AMF HA ACTIVE request
Oct 13 14:29:10 SCALE_SLOT-73 osaffmd[3452]: NO Stopped activation supervision due to new AMF state 1
Oct 13 14:29:10 SCALE_SLOT-73 osafamfd[3513]: NO Received node_up from 2030f: msg_id 1
Oct 13 14:29:10 SCALE_SLOT-73 osafamfd[3513]: NO Node 'SC-3' joined the cluster

4. After a few seconds, EDS faulted and the node went for reboot:

Oct 13 14:30:11 SCALE_SLOT-73 osafamfnd[3523]: NO 'safComp=EDS,safSu=SC-3,safSg=2N,safApp=OpenSAF' faulted due to 'avaDown' : Recovery is 'nodeFailfast'
Oct 13 14:30:11 SCALE_SLOT-73 osafamfnd[3523]: ER safComp=EDS,safSu=SC-3,safSg=2N,safApp=OpenSAF Faulted due to:avaDown Recovery is:nodeFailfast
Oct 13 14:30:11 SCALE_SLOT-73 osafamfnd[3523]: Rebooting OpenSAF NodeId = 131855 EE Name = , Reason: Component faulted: recovery is node failfast, OwnNodeId = 131855, SupervisionTime = 60


# Notes
1. Syslog attached
2. osafevtd trace attached




[tickets] [opensaf:tickets] #2327 imm: deadlock between IMM and CLM when integrated with CLM

2017-03-01 Thread Neelakanta Reddy
- **summary**: imm: deadlock between IMM when integrated with CLM --> imm: 
deadlock between IMM and CLM when integrated with CLM



---

** [tickets:#2327] imm: deadlock between IMM and CLM when integrated with CLM**

**Status:** review
**Milestone:** 5.2.RC1
**Created:** Wed Mar 01, 2017 06:22 AM UTC by Srikanth R
**Last Updated:** Wed Mar 01, 2017 09:16 AM UTC
**Owner:** Neelakanta Reddy
**Attachments:**

- 
[opensafStartup.tgz](https://sourceforge.net/p/opensaf/tickets/2327/attachment/opensafStartup.tgz)
 (1.4 MB; application/x-compressed-tar)


Changeset: 8634 5.2.FC
SLES single node TIPC setup.


Issue: opensafd failed to start up on the active controller the first time.

Below is the output from syslog:

Mar  6 01:27:19 SUSE-S1-C1 opensafd[11180]: NO Monitoring of CLMD started
Mar  6 01:27:19 SUSE-S1-C1 osafclmna[11211]: NO 
safNode=SC-1,safCluster=myClmCluster Joined cluster, nodeid=2010f
Mar  6 01:27:19 SUSE-S1-C1 osafamfd[11301]: Started
Mar  6 01:27:29 SUSE-S1-C1 osafamfd[11301]: WA saClmInitialize_4 returned 5
Mar  6 01:27:29 SUSE-S1-C1 osafamfd[11301]: ER saImmOiInitialize failed 5
Mar  6 01:27:29 SUSE-S1-C1 osafamfd[11301]: ER avd_imm_init FAILED
Mar  6 01:27:29 SUSE-S1-C1 osafamfd[11301]: ER initialize_for_assignment FAILED 
2
Mar  6 01:27:29 SUSE-S1-C1 osafamfd[11301]: ER initialize failed, exiting
Mar  6 01:27:29 SUSE-S1-C1 opensafd[11180]: ER Failed   DESC:AMFD
Mar  6 01:27:29 SUSE-S1-C1 opensafd[11180]: ER Going for recovery

Below is the output from the clmd trace:
Mar  6  1:27:29.273608 osafclmd [11291:src/clm/clmd/clms_mds.c:1194] << 
clms_mds_svc_event
Mar  6  1:27:29.273644 osafclmd [11291:src/mbc/mbcsv_mds.c:0420] << 
mbcsv_mds_evt: Msg is not from same vdest, discarding
Mar  6  1:27:29.269263 osafclmd [11291:src/imm/agent/imma_oi_api.cc:2783] << 
rt_object_update_common
Mar  6  1:27:29.273697 osafclmd [11291:src/clm/clmd/clms_imm.c:0842] IN 
saImmOiRtObjectUpdate failed for cluster object with rc = 5. Trying again
Mar  6  1:27:29.273709 osafclmd [11291:src/clm/clmd/clms_imm.c:0871] << 
clms_cluster_update_rattr


Traces of clmd, amfd, amfnd, immd and immnd, along with mds.log and syslog, 
are attached.

This issue is random: it was observed two times out of three when starting on 
a lone active controller.
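
For reference, return code 5 in these logs is SA_AIS_ERR_TIMEOUT in the SAF 
error enumeration. SAF clients normally ride out transient startup failures 
with a bounded retry loop around initialization; below is a minimal sketch of 
such a wrapper (hypothetical helper name and retry policy, not the actual amfd 
code). In the deadlock reported here every attempt times out, so even a loop 
like this eventually gives up and amfd exits:

~~~
// Sketch of a typical SAF initialize-with-retry wrapper (assumed policy;
// the real amfd logic differs). Uses the standard saClmInitialize_4 API.
#include <saAis.h>
#include <saClm.h>
#include <syslog.h>
#include <unistd.h>

static SaAisErrorT clm_initialize_with_retry(SaClmHandleT* handle,
                                             const SaClmCallbacksT_4* cbs) {
  SaVersionT version = {'B', 4, 1};  // releaseCode, major, minor
  SaAisErrorT rc = SA_AIS_ERR_TRY_AGAIN;
  for (int attempt = 0; attempt < 10; ++attempt) {
    rc = saClmInitialize_4(handle, cbs, &version);
    if (rc != SA_AIS_ERR_TRY_AGAIN && rc != SA_AIS_ERR_TIMEOUT)
      break;   // success or a hard error
    sleep(1);  // back off before retrying
  }
  if (rc != SA_AIS_OK)
    syslog(LOG_WARNING, "saClmInitialize_4 returned %d", rc);
  return rc;
}
~~~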


---



[tickets] [opensaf:tickets] #2327 imm: deadlock between IMM when integrated with CLM

2017-03-01 Thread Neelakanta Reddy
- **summary**: Opensaf failed to start on active controller  ( random) --> imm: 
deadlock between IMM when integrated with CLM
- **status**: accepted --> review



---

** [tickets:#2327] imm: deadlock between IMM when integrated with CLM**

**Status:** review
**Milestone:** 5.2.RC1
**Created:** Wed Mar 01, 2017 06:22 AM UTC by Srikanth R
**Last Updated:** Wed Mar 01, 2017 08:42 AM UTC
**Owner:** Neelakanta Reddy
**Attachments:**

- [opensafStartup.tgz](https://sourceforge.net/p/opensaf/tickets/2327/attachment/opensafStartup.tgz) (1.4 MB; application/x-compressed-tar)


Changeset: 8634 5.2.FC
SLES single node TIPC setup.


Issue: opensafd failed to start up on the active controller the first time.

Below is the output from syslog:

Mar  6 01:27:19 SUSE-S1-C1 opensafd[11180]: NO Monitoring of CLMD started
Mar  6 01:27:19 SUSE-S1-C1 osafclmna[11211]: NO 
safNode=SC-1,safCluster=myClmCluster Joined cluster, nodeid=2010f
Mar  6 01:27:19 SUSE-S1-C1 osafamfd[11301]: Started
Mar  6 01:27:29 SUSE-S1-C1 osafamfd[11301]: WA saClmInitialize_4 returned 5
Mar  6 01:27:29 SUSE-S1-C1 osafamfd[11301]: ER saImmOiInitialize failed 5
Mar  6 01:27:29 SUSE-S1-C1 osafamfd[11301]: ER avd_imm_init FAILED
Mar  6 01:27:29 SUSE-S1-C1 osafamfd[11301]: ER initialize_for_assignment FAILED 
2
Mar  6 01:27:29 SUSE-S1-C1 osafamfd[11301]: ER initialize failed, exiting
Mar  6 01:27:29 SUSE-S1-C1 opensafd[11180]: ER Failed   DESC:AMFD
Mar  6 01:27:29 SUSE-S1-C1 opensafd[11180]: ER Going for recovery

Below is the output from the clmd trace:
Mar  6  1:27:29.273608 osafclmd [11291:src/clm/clmd/clms_mds.c:1194] << 
clms_mds_svc_event
Mar  6  1:27:29.273644 osafclmd [11291:src/mbc/mbcsv_mds.c:0420] << 
mbcsv_mds_evt: Msg is not from same vdest, discarding
Mar  6  1:27:29.269263 osafclmd [11291:src/imm/agent/imma_oi_api.cc:2783] << 
rt_object_update_common
Mar  6  1:27:29.273697 osafclmd [11291:src/clm/clmd/clms_imm.c:0842] IN 
saImmOiRtObjectUpdate failed for cluster object with rc = 5. Trying again
Mar  6  1:27:29.273709 osafclmd [11291:src/clm/clmd/clms_imm.c:0871] << 
clms_cluster_update_rattr


Traces of clmd, amfd, amfnd, immd and immnd, along with mds.log and syslog, 
are attached.

This issue is random: it was observed two times out of three when starting on 
a lone active controller.


---



[tickets] [opensaf:tickets] #2323 imm: CCB operations fail after SC absence (Headless)

2017-03-01 Thread Anders Bjornerstedt
- **summary**: imm: CCB operations fail after SC absence --> imm: CCB 
operations fail after SC absence (Headless)
- **Comment**:

Added "headless" clarification because "AC absence" can be missunderstood as 
just one (out of normally two) SCs being absent.



---

** [tickets:#2323] imm: CCB operations fail after SC absence (Headless)**

**Status:** review
**Milestone:** 5.0.2
**Created:** Thu Feb 23, 2017 03:36 PM UTC by Hung Nguyen
**Last Updated:** Wed Mar 01, 2017 07:04 AM UTC
**Owner:** Hung Nguyen
**Attachments:**

- [logs_n_traces.tgz](https://sourceforge.net/p/opensaf/tickets/2323/attachment/logs_n_traces.tgz) (658.6 kB; application/gzip)


Reproduce steps:
~~~
1. Start SC-1
2. Commit some CCBs
# immcfg -c Test test=0
# immcfg -c Test test=1
# immcfg -c Test test=2
# immcfg -c Test test=3
3. Start PL-3
4. Restart SC-1
5. When SC-1 is back, it fails to add operations to CCB
# immcfg -c Test test=10
error - saImmOmCcbObjectCreate_2 FAILED with SA_AIS_ERR_FAILED_OPERATION 
(21)
OI reports: IMM: Resource abort: CCB is not in an expected state
error - saImmOmCcbApply FAILED: SA_AIS_ERR_FAILED_OPERATION (21)
OI reports: IMM: Resource abort: CCB is not in an expected state
~~~

**cb->mLatestCcbId** was not updated on PL-3 when it joined the cluster, so it 
still had a value of zero.

When SC-1 came back from headless, IMMND on PL-3 sent its re-introduce message 
to IMMD on SC-1 with **cb->mLatestCcbId = 0**.

IMMD then failed to update **cb->ccb_id_count**, so each new CCB id starts 
from **0 + 1** instead of **mLatestCcbId + 1**.

That results in a conflict with an existing CCB in **sCcbVector**, and the CCB 
operation fails.
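
The fix direction follows from that description: when IMMD handles the 
re-introduce, the announced latest CCB id should only ever move its counter 
forward. A minimal sketch of the idea (hypothetical names and structure; the 
real IMMD state and message handling are more involved):

~~~
// Hypothetical sketch only -- cb->ccb_id_count and the re-introduce
// message are real concepts from this ticket, but the names below are
// illustrative, not the actual OpenSAF immd code.
#include <algorithm>
#include <cstdint>

struct ImmdCb {
  uint32_t ccb_id_count;  // highest CCB id handed out so far
};

// Called when an IMMND re-introduces itself after headless.
// announced_latest_ccb is the mLatestCcbId carried in the message; it may
// legitimately be 0 (a node that never saw a CCB), so the counter must
// only move forward -- never be reset downwards.
void on_immnd_reintroduce(ImmdCb* cb, uint32_t announced_latest_ccb) {
  cb->ccb_id_count = std::max(cb->ccb_id_count, announced_latest_ccb);
}
~~~

With that guard, the next CCB id is always at least **mLatestCcbId + 1**, 
avoiding the clash with the CCBs already recorded in **sCcbVector**.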

Logs and traces are attached.


---



[tickets] [opensaf:tickets] #2327 Opensaf failed to start on active controller ( random)

2017-03-01 Thread Neelakanta Reddy
This case is similar to #1731, where NTF is integrated with CLM.


---

** [tickets:#2327] Opensaf failed to start on active controller  ( random)**

**Status:** accepted
**Milestone:** 5.2.RC1
**Created:** Wed Mar 01, 2017 06:22 AM UTC by Srikanth R
**Last Updated:** Wed Mar 01, 2017 07:28 AM UTC
**Owner:** Neelakanta Reddy
**Attachments:**

- [opensafStartup.tgz](https://sourceforge.net/p/opensaf/tickets/2327/attachment/opensafStartup.tgz) (1.4 MB; application/x-compressed-tar)


Changeset: 8634 5.2.FC
SLES single node TIPC setup.


Issue: opensafd failed to start up on the active controller the first time.

Below is the output from syslog:

Mar  6 01:27:19 SUSE-S1-C1 opensafd[11180]: NO Monitoring of CLMD started
Mar  6 01:27:19 SUSE-S1-C1 osafclmna[11211]: NO 
safNode=SC-1,safCluster=myClmCluster Joined cluster, nodeid=2010f
Mar  6 01:27:19 SUSE-S1-C1 osafamfd[11301]: Started
Mar  6 01:27:29 SUSE-S1-C1 osafamfd[11301]: WA saClmInitialize_4 returned 5
Mar  6 01:27:29 SUSE-S1-C1 osafamfd[11301]: ER saImmOiInitialize failed 5
Mar  6 01:27:29 SUSE-S1-C1 osafamfd[11301]: ER avd_imm_init FAILED
Mar  6 01:27:29 SUSE-S1-C1 osafamfd[11301]: ER initialize_for_assignment FAILED 
2
Mar  6 01:27:29 SUSE-S1-C1 osafamfd[11301]: ER initialize failed, exiting
Mar  6 01:27:29 SUSE-S1-C1 opensafd[11180]: ER Failed   DESC:AMFD
Mar  6 01:27:29 SUSE-S1-C1 opensafd[11180]: ER Going for recovery

Below is the output from the clmd trace:
Mar  6  1:27:29.273608 osafclmd [11291:src/clm/clmd/clms_mds.c:1194] << 
clms_mds_svc_event
Mar  6  1:27:29.273644 osafclmd [11291:src/mbc/mbcsv_mds.c:0420] << 
mbcsv_mds_evt: Msg is not from same vdest, discarding
Mar  6  1:27:29.269263 osafclmd [11291:src/imm/agent/imma_oi_api.cc:2783] << 
rt_object_update_common
Mar  6  1:27:29.273697 osafclmd [11291:src/clm/clmd/clms_imm.c:0842] IN 
saImmOiRtObjectUpdate failed for cluster object with rc = 5. Trying again
Mar  6  1:27:29.273709 osafclmd [11291:src/clm/clmd/clms_imm.c:0871] << 
clms_cluster_update_rattr


Traces of clmd, amfd, amfnd, immd and immnd, along with mds.log and syslog, 
are attached.

This issue is random: it was observed two times out of three when starting on 
a lone active controller.


---



[tickets] [opensaf:tickets] #2289 opensafd (nid): coredump while standby starting

2017-03-01 Thread Praveen
I am not observing this any more on my setup. 




---

** [tickets:#2289] opensafd (nid): coredump while standby starting**

**Status:** unassigned
**Milestone:** 5.2.RC1
**Created:** Tue Feb 07, 2017 06:31 AM UTC by A V Mahesh (AVM)
**Last Updated:** Wed Mar 01, 2017 07:34 AM UTC
**Owner:** nobody


Restarting the Standby with TCP, opensafd core dumps:


(gdb) bt
#0  0x7f2f05cb0b55 in raise () from /lib64/libc.so.6
#1  0x7f2f05cb2131 in abort () from /lib64/libc.so.6
#2  0x7f2f06704955 in __gnu_cxx::__verbose_terminate_handler() () at 
../../../../gcc-4.8.3/libstdc++-v3/libsupc++/vterminate.cc:95
#3  0x7f2f06702af6 in __cxxabiv1::__terminate(void (*)()) () at 
../../../../gcc-4.8.3/libstdc++-v3/libsupc++/eh_terminate.cc:38
#4  0x7f2f06702b23 in std::terminate() () at 
../../../../gcc-4.8.3/libstdc++-v3/libsupc++/eh_terminate.cc:48
#5  0x7f2f06702d42 in __cxa_throw () at 
../../../../gcc-4.8.3/libstdc++-v3/libsupc++/eh_throw.cc:87
#6  0x7f2f0670322d in operator new(unsigned long) () at 
../../../../gcc-4.8.3/libstdc++-v3/libsupc++/new_op.cc:56
#7  0x7f2f06761979 in std::string::_Rep::_S_create(unsigned long, unsigned 
long, std::allocator const&) ()
at 
/home/build/x86_64-unknown-linux-gnu/libstdc++-v3/include/ext/new_allocator.h:104
#8  0x7f2f0676256b in std::string::_Rep::_M_clone(std::allocator 
const&, unsigned long) () at 
/home/build/x86_64-unknown-linux-gnu/libstdc++-v3/include/bits/basic_string.tcc:629
#9  0x7f2f06762bec in std::basic_string::basic_string(std::string const&) ()
at 
/home/build/x86_64-unknown-linux-gnu/libstdc++-v3/include/bits/basic_string.h:229
#10 0x7f2f07262c39 in handle_data_request(pollfd*, std::string const&) () 
at /usr/include/c++/4.8.3/bits/basic_string.h:2405
#11 0x7f2f0726320f in svc_monitor_thread(void*) () at 
src/nid/nodeinit.cc:1539
#12 0x7f2f05ff97b6 in start_thread () from /lib64/libpthread.so.0
#13 0x7f2f05d559cd in clone () from /lib64/libc.so.6
#14 0x in ?? ()
(gdb) q
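
The throw originates in operator new while a std::string is being copied 
inside handle_data_request(), which points at either genuine memory 
exhaustion or a string built from an unterminated or garbage buffer read off 
the monitored FIFO. A defensive-read sketch (hypothetical helper; the real 
svc_monitor_thread in src/nid/nodeinit.cc is more involved):

~~~
// Hypothetical hardening sketch, not the actual nodeinit.cc code:
// bound and NUL-terminate whatever is read from the FIFO before any
// std::string is constructed from it.
#include <cerrno>
#include <string>
#include <unistd.h>

static bool read_fifo_message(int fd, std::string* out) {
  char buf[256];
  ssize_t n = read(fd, buf, sizeof(buf) - 1);
  if (n <= 0)
    return false;  // EOF or error (inspect errno when n < 0)
  buf[n] = '\0';   // guarantee termination
  out->assign(buf, static_cast<size_t>(n));  // length-bounded copy
  return true;
}
~~~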



Feb  7 11:41:13 SC-2 opensafd: OpenSAF services successfully stopped
Feb  7 11:41:21 SC-2 opensafd: Starting OpenSAF Services(5.1.M0 - ) (Using TCP)
Feb  7 11:41:21 SC-2 osafdtmd[5329]: mkfifo already exists: 
/var/lib/opensaf/osafdtmd.fifo File exists
Feb  7 11:41:21 SC-2 osafdtmd[5329]: Started
Feb  7 11:41:21 SC-2 osaftransportd[5336]: Started
Feb  7 11:41:21 SC-2 osafclmna[5343]: Started
Feb  7 11:41:21 SC-2 osafrded[5352]: Started
Feb  7 11:41:22 SC-2 osaffmd[5361]: Started
Feb  7 11:41:22 SC-2 osaffmd[5361]: NO Remote fencing is disabled
Feb  7 11:41:22 SC-2 osafimmd[5371]: Started
Feb  7 11:41:22 SC-2 osafimmd[5371]: NO *** SC_ABSENCE_ALLOWED (Headless 
Hydra) is configured: 900 ***
Feb  7 11:41:22 SC-2 osafimmnd[5382]: Started
Feb  7 11:41:22 SC-2 osafimmnd[5382]: NO Persistent Back-End capability 
configured, Pbe file:imm.db (suffix may get added)
Feb  7 11:41:22 SC-2 opensafd[5318]: NO Monitoring of TRANSPORT started
Feb  7 11:41:22 SC-2 osafclmna[5343]: NO Starting to promote this node to a 
system controller
Feb  7 11:41:22 SC-2 osafrded[5352]: NO Requesting ACTIVE role
Feb  7 11:41:22 SC-2 osafrded[5352]: NO RDE role set to Undefined
Feb  7 11:41:22 SC-2 osafdtmd[5329]: NO Established contact with 'PL-3'
Feb  7 11:41:22 SC-2 osafdtmd[5329]: NO Established contact with 'SC-1'
Feb  7 11:41:22 SC-2 osafdtmd[5329]: NO Established contact with 'PL-4'
Feb  7 11:41:22 SC-2 osafrded[5352]: NO Peer up on node 0x2010f
Feb  7 11:41:22 SC-2 osafimmnd[5382]: NO IMMD service is UP ... 
ScAbsenseAllowed?:0 introduced?:0
Feb  7 11:41:22 SC-2 osafrded[5352]: NO Got peer info request from node 0x2010f 
with role ACTIVE
Feb  7 11:41:22 SC-2 osafrded[5352]: NO Got peer info response from node 
0x2010f with role ACTIVE
Feb  7 11:41:22 SC-2 osafrded[5352]: NO RDE role set to QUIESCED
Feb  7 11:41:22 SC-2 osafrded[5352]: NO Giving up election against 0x2010f with 
role ACTIVE. My role is now QUIESCED
Feb  7 11:41:22 SC-2 osafclmna[5343]: NO safNode=SC-2,safCluster=myClmCluster 
Joined cluster, nodeid=2020f
Feb  7 11:41:22 SC-2 osafimmnd[5382]: NO Fevs count adjusted to 2835 
preLoadPid: 0
Feb  7 11:41:22 SC-2 osafimmnd[5382]: NO SERVER STATE: IMM_SERVER_ANONYMOUS --> 
IMM_SERVER_CLUSTER_WAITING
Feb  7 11:41:22 SC-2 osafimmnd[5382]: NO SERVER STATE: 
IMM_SERVER_CLUSTER_WAITING --> IMM_SERVER_LOADING_PENDING
Feb  7 11:41:22 SC-2 osafimmnd[5382]: NO SERVER STATE: 
IMM_SERVER_LOADING_PENDING --> IMM_SERVER_SYNC_PENDING
Feb  7 11:41:22 SC-2 osafimmnd[5382]: NO NODE STATE-> IMM_NODE_ISOLATED
Feb  7 11:41:23 SC-2 osafimmnd[5382]: NO NODE STATE-> IMM_NODE_W_AVAILABLE
Feb  7 11:41:23 SC-2 osafimmnd[5382]: NO SERVER STATE: IMM_SERVER_SYNC_PENDING 
--> IMM_SERVER_SYNC_CLIENT
Feb  7 11:41:23 SC-2 osafimmnd[5382]: NO NODE STATE->