[tickets] [opensaf:tickets] #1918 AMF: Informative logging

2016-11-28 Thread Nagendra Kumar
- **status**: review --> fixed
- **Comment**:

changeset:   8375:d65c5d3798c7
tag:         tip
user:        Nagendra Kumar
date:        Tue Nov 29 11:50:26 2016 +0530
summary:     amf: add more information in logging [#1918]

[staging:d65c5d]




---

** [tickets:#1918] AMF: Informative logging**

**Status:** fixed
**Milestone:** 5.2.FC
**Created:** Fri Jul 15, 2016 11:30 AM UTC by Minh Hon Chau
**Last Updated:** Tue Nov 01, 2016 11:36 AM UTC
**Owner:** Nagendra Kumar


Some error/warning logs in AMF currently do not give enough information 
about the unexpected situation. 
For example:
1- LOG_ER("%s: invalid node state %u", __FUNCTION__, node->node_state);
-> it should tell which node (name/id) is in the invalid state
2- LOG_ER("Wrong sg fsm state %u", su->sg_of_su->sg_fsm_state);
-> it should tell which sg (by name) is in the wrong fsm state
3- LOG_ER("Invalid node_name. Check node_id");
-> it should tell the node name from the msg that amfd cannot find
4- LOG_ER("Internal error, could not send message to avnd");
-> it should tell at least the node id of the avnd that the msg cannot be sent to
5- LOG_ER("%s: no susis", __FUNCTION__);
-> it should tell which su has no susis
6- LOG_ER("avnd_di_msg_send FAILED");
-> it's helpful to know which msg failed to be sent, so we know which 
msg is missing at amfd
...

Informative logging helps debugging on a running system where the fault 
sometimes cannot be reproduced (so there would be no trace file from a later 
reproduction attempt); in some cases the fault can be identified straight away 
without asking for traces.

This ticket will scan through amfd/amfnd file by file and add more information 
to error/warning cases. It started at 5.1 FC and could continue in later 
releases. Some rules:
1. The log must name the object that the error happened on
2. The log must give the error code if a return-code check fails
3. When sending a msg fails, log the msg type and the object that the msg 
carries (if any)
4. ... 
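
As a rough illustration of rules 1 to 3, here is a minimal C sketch. It does not use the real OpenSAF `LOG_ER` macro or the real AMF structures: `demo_node`, `format_invalid_node_state()` and `format_send_failure()` are hypothetical names, and the messages are formatted into a buffer only so that the resulting text is easy to inspect.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical, simplified stand-in for an AMF node object. */
struct demo_node {
	const char *name;    /* node name/DN, may be NULL or empty */
	uint32_t node_id;
	uint32_t node_state;
};

/* Rule 1: name the object (node name and id) next to the offending value.
 * Returns what snprintf() returns. */
int format_invalid_node_state(char *out, size_t out_sz, const char *func,
			      const struct demo_node *node)
{
	assert(out != NULL && func != NULL && node != NULL);
	const char *name = (node->name != NULL && strlen(node->name) > 0)
			       ? node->name : "<unnamed>";
	return snprintf(out, out_sz,
			"%s: invalid node state %" PRIu32 " for node '%s' (id %" PRIx32 ")",
			func, node->node_state, name, node->node_id);
}

/* Rules 2 and 3: include the return code, the msg type and the destination
 * node when a send fails. Returns what snprintf() returns. */
int format_send_failure(char *out, size_t out_sz, const char *func,
			uint32_t msg_type, uint32_t dest_node_id, int rc)
{
	assert(out != NULL && func != NULL);
	return snprintf(out, out_sz,
			"%s: failed to send msg type %" PRIu32
			" to avnd on node %" PRIx32 ", rc=%d",
			func, msg_type, dest_node_id, rc);
}
```

In real code the formatted text would simply be the argument list of the existing log macro; the point is only that each message carries the object identity and the error code.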


---

Sent from sourceforge.net because opensaf-tickets@lists.sourceforge.net is 
subscribed to https://sourceforge.net/p/opensaf/tickets/

To unsubscribe from further messages, a project admin can change settings at 
https://sourceforge.net/p/opensaf/admin/tickets/options.  Or, if this is a 
mailing list, you can unsubscribe from the mailing list.
--
Opensaf-tickets mailing list
Opensaf-tickets@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-tickets


[tickets] [opensaf:tickets] #2078 amfd: remove db_template.h

2016-11-28 Thread Nagendra Kumar
- **status**: review --> fixed
- **Comment**:

changeset:   8374:2f61c6d2026a
tag:         tip
parent:      8371:47f9ab816d30
user:        Nagendra Kumar
date:        Tue Nov 29 11:45:33 2016 +0530
summary:     amfd: remove db_template.h file [#2078]




---

** [tickets:#2078] amfd: remove db_template.h**

**Status:** fixed
**Milestone:** 5.2.FC
**Created:** Wed Sep 28, 2016 07:24 AM UTC by Gary Lee
**Last Updated:** Tue Nov 01, 2016 09:33 AM UTC
**Owner:** Nagendra Kumar


amfd/db_template.h can be removed. There is already 
osaf/libs/common/amf/include/amf_db_template.h





[tickets] [opensaf:tickets] #2203 mds: SIGPIPE in mds_register_callback()

2016-11-28 Thread Hans Nordebäck
- **Comment**:

changeset:   8373:03fc556a1e6b
branch:      opensaf-5.1.x
tag:         tip
parent:      8367:a426229f5bf2
user:        Hans Nordeback
date:        Wed Nov 23 15:37:52 2016 +0100
files:       osaf/libs/core/mds/mds_main.c
description:
mds: avoid SIGPIPE in mds_register_callback() [#2203]


changeset:   8372:8fec915ef08d
branch:      opensaf-5.0.x
parent:      8364:6813756d7af0
user:        Hans Nordeback
date:        Wed Nov 23 15:37:52 2016 +0100
files:       osaf/libs/core/mds/mds_main.c
description:
mds: avoid SIGPIPE in mds_register_callback() [#2203]


changeset:   8369:12f09f2e23ad
user:        Hans Nordeback
date:        Fri Nov 25 16:27:47 2016 +0100
files:       osaf/libs/core/mds/mds_main.c
description:
mds: avoid SIGPIPE in mds_register_callback() [#2203]




---

** [tickets:#2203] mds: SIGPIPE in mds_register_callback()**

**Status:** fixed
**Milestone:** 5.2.FC
**Created:** Wed Nov 23, 2016 09:31 AM UTC by Hans Nordebäck
**Last Updated:** Mon Nov 28, 2016 09:36 AM UTC
**Owner:** Hans Nordebäck


> cat bt_20161118_11:15:23_207 :

signal: 13 pid: 207 uid: 0
/usr/local/lib/libopensaf_core.so.0(+0x1bce9)[0x7eff44157ce9]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x113d0)[0x7eff43d283d0]
/lib/x86_64-linux-gnu/libpthread.so.0(send+0x7f)[0x7eff43d27a3f]
/usr/local/lib/libopensaf_core.so.0(+0x41901)[0x7eff4417d901]
/usr/local/lib/libopensaf_core.so.0(+0x1d06c)[0x7eff4415906c]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x76fa)[0x7eff43d1e6fa]
/lib/x86_64-linux-gnu/libc.so.6(clone+0x6d)[0x7eff4329cb5d]

objdump:

if ((n = send(fd, buf, sz, 0)) == -1)
   418f1:   41 8d 14 06             lea    (%r14,%rax,1),%edx
   418f5:   31 c9                   xor    %ecx,%ecx
   418f7:   48 89 de                mov    %rbx,%rsi
   418fa:   89 ef                   mov    %ebp,%edi
   418fc:   e8 df 54 fd ff          callq  16de0 
   41901:   83 f8 ff                cmp    $0x,%eax
   41904:   0f 85 66 fe ff ff       jne    41770 

syslog(LOG_ERR, "MDS: %s: send to pid %d failed - %s",
__FUNCTION__, creds->pid, 
strerror(errno));

This is in:
mds_register_callback()
 :
if ((n = send(fd, buf, sz, 0)) == -1)
syslog(LOG_ERR, "MDS: %s: send to pid %d failed - %s",
__FUNCTION__, creds->pid, 
strerror(errno));

Flag MSG_NOSIGNAL should be used in the call to send.
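
To illustrate the effect of the flag, here is a minimal sketch, assuming a connected stream socket on Linux. It is not the actual change in mds_main.c: `send_nosigpipe()` and `demo_send_to_closed_peer()` are hypothetical helpers. With `MSG_NOSIGNAL`, a send to a peer that has gone away fails with `EPIPE` instead of raising SIGPIPE in the caller.

```c
#include <assert.h>
#include <errno.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Sends sz bytes on fd without raising SIGPIPE if the peer is gone.
 * Returns the number of bytes sent, or -1 with errno set. */
ssize_t send_nosigpipe(int fd, const void *buf, size_t sz)
{
	assert(buf != NULL);
	return send(fd, buf, sz, MSG_NOSIGNAL);
}

/* Demo: create a connected UNIX socket pair, close one end, then send on
 * the other. Returns the errno seen by the sender (EPIPE is expected),
 * 0 if the send unexpectedly succeeded, or -1 if the setup failed. */
int demo_send_to_closed_peer(void)
{
	int sv[2];

	if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) == -1)
		return -1;

	close(sv[1]); /* simulate a peer that has already exited */

	errno = 0;
	ssize_t n = send_nosigpipe(sv[0], "x", 1);
	int err = (n == -1) ? errno : 0;

	close(sv[0]);
	return err;
}
```

The caller still has to handle the -1 return, for example by logging `strerror(errno)` as the existing syslog call does.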




[tickets] [opensaf:tickets] #2188 amfd: avd_imm_impl_set fails causing node reboot

2016-11-28 Thread Minh Hon Chau
Hi Praveen,

I have only tried to reproduce the coredump on the latest changeset, but have 
not seen it again with trace. I don't have any new patch officially ready to be 
tested, so please push the patch that you think can fix the problem, and then 
change the status of this ticket as well.

Thanks,
Minh


---

** [tickets:#2188] amfd: avd_imm_impl_set fails causing node reboot**

**Status:** assigned
**Milestone:** 5.0.2
**Created:** Tue Nov 15, 2016 06:39 AM UTC by Gary Lee
**Last Updated:** Tue Nov 29, 2016 04:44 AM UTC
**Owner:** Praveen
**Attachments:**

- [amfd-core.txt](https://sourceforge.net/p/opensaf/tickets/2188/attachment/amfd-core.txt) (9.9 kB; text/plain)


avd_imm_impl_set fails causing node reboot

It seems there may have been simultaneous IMM reinit threads running.

Nov 14 02:24:26 SC-2-2 osafamfd[4174]: NO Re-initializing with IMM
Nov 14 02:24:26 SC-2-2 osafamfd[4174]: NO Re-initializing with IMM
Nov 14 02:24:26 SC-2-2 osafimmnd[16412]: NO Implementer connected: 44 (safAmfService) <526, 2020f>
Nov 14 02:24:26 SC-2-2 osafamfd[4174]: NO Finished re-initializing with IMM
Nov 14 02:24:26 SC-2-2 osafamfd[4174]: ER saImmOiImplementerSet failed 14
Nov 14 02:24:26 SC-2-2 osafamfd[4174]: ER exiting since avd_imm_impl_set failed
Nov 14 02:24:26 SC-2-2 osafimmnd[16412]: NO Implementer locally disconnected. Marking it as doomed 44 <526, 2020f> (safAmfService)
Nov 14 02:24:26 SC-2-2 osafamfnd[4192]: WA AMF director unexpectedly crashed
Nov 14 02:24:26 SC-2-2 osafamfnd[4192]: Rebooting OpenSAF NodeId = 131599 EE Name = , Reason: local AVD down(Adest) or both AVD down(Vdest) received, OwnNodeId = 131599, SupervisionTime = 60
Nov 14 02:24:26 SC-2-2 osafimmnd[16412]: NO Implementer disconnected 44 <526, 2020f> (safAmfService)




[tickets] [opensaf:tickets] #2188 amfd: avd_imm_impl_set fails causing node reboot

2016-11-28 Thread Praveen
Hi Minh,
I think the check suggested by Gary can be included.
Please push the version that you have tested along with this minor correction.

Thanks,
Praveen


---

** [tickets:#2188] amfd: avd_imm_impl_set fails causing node reboot**

**Status:** assigned
**Milestone:** 5.0.2
**Created:** Tue Nov 15, 2016 06:39 AM UTC by Gary Lee
**Last Updated:** Tue Nov 29, 2016 12:29 AM UTC
**Owner:** Praveen
**Attachments:**

- [amfd-core.txt](https://sourceforge.net/p/opensaf/tickets/2188/attachment/amfd-core.txt) (9.9 kB; text/plain)






[tickets] [opensaf:tickets] #2188 amfd: avd_imm_impl_set fails causing node reboot

2016-11-28 Thread Gary Lee
- **status**: fixed --> assigned



---

** [tickets:#2188] amfd: avd_imm_impl_set fails causing node reboot**

**Status:** assigned
**Milestone:** 5.0.2
**Created:** Tue Nov 15, 2016 06:39 AM UTC by Gary Lee
**Last Updated:** Mon Nov 28, 2016 10:38 AM UTC
**Owner:** Praveen
**Attachments:**

- [amfd-core.txt](https://sourceforge.net/p/opensaf/tickets/2188/attachment/amfd-core.txt) (9.9 kB; text/plain)






[tickets] [opensaf:tickets] Re: #2188 amfd: avd_imm_impl_set fails causing node reboot

2016-11-28 Thread Minh Hon Chau
Retried the tests many times, but have not seen it.


---

** [tickets:#2188] amfd: avd_imm_impl_set fails causing node reboot**

**Status:** fixed
**Milestone:** 5.0.2
**Created:** Tue Nov 15, 2016 06:39 AM UTC by Gary Lee
**Last Updated:** Mon Nov 28, 2016 10:38 AM UTC
**Owner:** Praveen
**Attachments:**

- [amfd-core.txt](https://sourceforge.net/p/opensaf/tickets/2188/attachment/amfd-core.txt) (9.9 kB; text/plain)






[tickets] [opensaf:tickets] Re: #2210 AMFD: Loss of RT attribute update before headless

2016-11-28 Thread Minh Hon Chau
It's the wrong trace, so I removed it.


---

** [tickets:#2210] AMFD: Loss of RT attribute update before headless**

**Status:** unassigned
**Milestone:** 5.2.FC
**Created:** Mon Nov 28, 2016 10:18 PM UTC by Minh Hon Chau
**Last Updated:** Mon Nov 28, 2016 10:21 PM UTC
**Owner:** nobody


A loss of an IMM RT saAmfSIAdminState update in AMFD has been seen just before 
the cluster goes headless. It results in a coredump after headless.

One scenario is:
- Issue amf-admin shutdown SI, delay csi quiescing callback
- Stop SCs, release csi quiescing callback
- Restart SCs

Observation: the saAmfSIAdminState is read as UNLOCKED while the related SUSI 
was QUIESCED, and a coredump occurs as below

~~~
Thread 1 (Thread 0x7fec174a0780 (LWP 493)):
#0  0x004fbfd5 in SG_2N::node_fail_si_oper (this=0x24109d0, 
su=0x2413440) at sg_2n_fsm.cc:3102
s_susi = 0x8f5000b
susi_temp = 0x5fa169
o_su = 0x2417f98
__FUNCTION__ = "node_fail_si_oper"
cb = 0x919240 <_control_block>
#1  0x004fe69c in SG_2N::node_fail (this=0x24109d0, cb=0x919240 
<_control_block>, su=0x2413440) at sg_2n_fsm.cc:3469
a_susi = 0x1
s_susi = 0x7fffedecd2d0
o_su = 0x5a50bd 
flag = 2
__FUNCTION__ = "node_fail"
su_ha_state = 0
#2  0x00513010 in AVD_SG::failover_absent_assignment (this=0x24109d0) 
at sg.cc:2273
su = @0x2411330: 0x2413440
__for_range = std::vector of length 2, capacity 2 = {0x2413440, 
0x24111e0}
__for_begin = 
__for_end = 
__FUNCTION__ = "failover_absent_assignment"
#3  0x0043be65 in avd_cluster_tmr_init_evh (cb=0x919240 
<_control_block>, evt=0x7fec04000df0) at cluster.cc:103
i_sg = 0x24109d0
it = {first = "safSg=1,safApp=osaftest", second = }
__FUNCTION__ = "avd_cluster_tmr_init_evh"
su = 0x0
node = 0x240f9b0
~~~





[tickets] [opensaf:tickets] #2210 AMFD: Loss of RT attribute update before headless

2016-11-28 Thread Minh Hon Chau
- Description has changed:

Diff:



--- old
+++ new
@@ -35,5 +35,3 @@
 su = 0x0
 node = 0x240f9b0
 ~~~
-
-Trace is attached






---

** [tickets:#2210] AMFD: Loss of RT attribute update before headless**

**Status:** unassigned
**Milestone:** 5.2.FC
**Created:** Mon Nov 28, 2016 10:18 PM UTC by Minh Hon Chau
**Last Updated:** Mon Nov 28, 2016 10:18 PM UTC
**Owner:** nobody







[tickets] [opensaf:tickets] #2210 AMFD: Loss of RT attribute update before headless

2016-11-28 Thread Minh Hon Chau
- Attachments has changed:

Diff:



--- old
+++ new
@@ -1 +0,0 @@
-osafamfd.tgz (1.1 MB; application/x-compressed)






---

** [tickets:#2210] AMFD: Loss of RT attribute update before headless**

**Status:** unassigned
**Milestone:** 5.2.FC
**Created:** Mon Nov 28, 2016 10:18 PM UTC by Minh Hon Chau
**Last Updated:** Mon Nov 28, 2016 10:18 PM UTC
**Owner:** nobody



Trace is attached





[tickets] [opensaf:tickets] #2210 AMFD: Loss of RT attribute update before headless

2016-11-28 Thread Minh Hon Chau



---

** [tickets:#2210] AMFD: Loss of RT attribute update before headless**

**Status:** unassigned
**Milestone:** 5.2.FC
**Created:** Mon Nov 28, 2016 10:18 PM UTC by Minh Hon Chau
**Last Updated:** Mon Nov 28, 2016 10:18 PM UTC
**Owner:** nobody
**Attachments:**

- [osafamfd.tgz](https://sourceforge.net/p/opensaf/tickets/2210/attachment/osafamfd.tgz) (1.1 MB; application/x-compressed)







[tickets] [opensaf:tickets] #2209 SMF: ONE-STEP upgrade failed due to duplicated entities in AU/DU

2016-11-28 Thread Tai Dinh
Hi Neel,

I think the patch is still aligned with that specification.
Note the difference from rolling and single step/forModify procedures, where we 
remove not only duplicated AU/DU but also overlapping AU/DU (e.g. if the 
original rolling proc has AU/DU on SU level and another proc has AU/DU on node 
level for the node that hosts that SU, the SU will also be removed). For 
forAddRemove we only remove the duplicated AU/DU, i.e. all AU/DU that are 
present in the original campaign will still be present in the merged campaign.

/Tai


---

** [tickets:#2209] SMF: ONE-STEP upgrade failed due to duplicated entities in 
AU/DU**

**Status:** unassigned
**Milestone:** 5.1.1
**Created:** Mon Nov 28, 2016 07:01 AM UTC by Tai Dinh
**Last Updated:** Mon Nov 28, 2016 10:41 AM UTC
**Owner:** nobody
**Attachments:**

- [one_step_upgrade_fix.patch](https://sourceforge.net/p/opensaf/tickets/2209/attachment/one_step_upgrade_fix.patch) (3.0 kB; application/octet-stream)


Execution of a ONE-STEP upgrade will fail if the original campaign contains a 
forAddRemove Single Step procedure that has entities duplicated in another 
procedure.

Nov 26 18:30:02 SC-2-2 osafsmfd[4929]: NO STEP: Lock deactivation units
Nov 26 18:30:02 SC-2-2 osafsmfd[4929]: NO createNodeGroup: saImmOmCcbApply() Fail 'SA_AIS_ERR_FAILED_OPERATION (21)'
Nov 26 18:30:02 SC-2-2 osafsmfd[4929]: NO changeNodeGroupAdminState: createNodeGroup() Fail SA_AIS_ERR_FAILED_OPERATION (21)
Nov 26 18:30:02 SC-2-2 osafsmfd[4929]: NO lock: changeNodeGroupAdminState() Fail SA_AIS_ERR_FAILED_OPERATION (21)
Nov 26 18:30:02 SC-2-2 osafsmfd[4929]: ER Failed to Lock deactivation units in step=safSmfStep=0001
Nov 26 18:30:02 SC-2-2 osafsmfd[4929]: ER Step execution failed, Try undoing the step
Nov 26 18:30:02 SC-2-2 osafsmfd[4929]: NO SmfStepStateUndoing::execute start undoing step.
Nov 26 18:30:02 SC-2-2 osafsmfd[4929]: ER Rollback of cluster reboot activate step is not implemented
Nov 26 18:30:02 SC-2-2 osafsmfd[4929]: ER Step undoing failed
Nov 26 18:30:02 SC-2-2 osafsmfd[4929]: NO Step safSmfStep=0001 in procedure safSmfProc=SmfSSMergedProc failed, step result 5
Nov 26 18:30:02 SC-2-2 osafsmfd[4929]: NO CAMP: Procedure safSmfProc=SmfSSMergedProc returned FAILED
Nov 26 18:30:11 SC-2-2 osafsmfd[4929]: ER Failed to rollback campaign, wrong state 10

The reason is that, while calculating/optimizing the AU/DU of the merged 
procedure, the original AU/DU of that single step procedure is always appended 
to the result procedure without checking for duplicated entities.

This needs to be fixed by removing any duplicated entities that are already 
present in the tmpDU before optimization.

See attachment for a proposed fix.
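
Independently of the attached patch (which is not reproduced here), the idea of dropping duplicated entities before optimization can be sketched in C as below. This is only an illustration: `demo_remove_duplicate_dns()` is a hypothetical helper working on a plain array of DN strings, whereas smfd keeps its AU/DU in its own data structures.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Removes duplicated DN strings in place, keeping the first occurrence of
 * each DN and preserving the original order. Returns the new element
 * count. Quadratic, which is acceptable for short AU/DU lists. */
size_t demo_remove_duplicate_dns(const char *dns[], size_t count)
{
	assert(dns != NULL || count == 0);

	size_t kept = 0;
	for (size_t i = 0; i < count; i++) {
		int seen = 0;
		for (size_t j = 0; j < kept; j++) {
			if (strcmp(dns[j], dns[i]) == 0) {
				seen = 1;
				break;
			}
		}
		if (!seen)
			dns[kept++] = dns[i];
	}
	return kept;
}
```

Only exact duplicates are removed here; overlapping entities (an SU hosted on a node that is also listed) are left alone, which matches what is described for forAddRemove in this thread.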




[tickets] [opensaf:tickets] #2209 SMF: ONE-STEP upgrade failed due to duplicated entities in AU/DU

2016-11-28 Thread Neelakanta Reddy
Hi Tai,

The patch looks good,
but the SMF PR document states the following:

For rolling and single step/forModify procedures:
- all deactivation units (DU) will be collected, and redundant and overlapping 
  DUs will be removed, e.g. an SU from one procedure which is within a node 
  from another procedure will be removed.
- deactivation/activation units (DU/AU) are symmetrical, i.e. DU will also be 
  used as AU.

For single step/forAddRemove procedures:
- DU/AU will be used as specified in the original procedure.


---

** [tickets:#2209] SMF: ONE-STEP upgrade failed due to duplicated entities in 
AU/DU**

**Status:** unassigned
**Milestone:** 5.1.1
**Created:** Mon Nov 28, 2016 07:01 AM UTC by Tai Dinh
**Last Updated:** Mon Nov 28, 2016 07:01 AM UTC
**Owner:** nobody
**Attachments:**

- [one_step_upgrade_fix.patch](https://sourceforge.net/p/opensaf/tickets/2209/attachment/one_step_upgrade_fix.patch) (3.0 kB; application/octet-stream)






[tickets] [opensaf:tickets] #2188 amfd: avd_imm_impl_set fails causing node reboot

2016-11-28 Thread Gary Lee
Hi

imm_sel_obj is unsigned. So perhaps this is an alternative:

diff --git a/osaf/services/saf/amf/amfd/main.cc b/osaf/services/saf/amf/amfd/main.cc
--- a/osaf/services/saf/amf/amfd/main.cc
+++ b/osaf/services/saf/amf/amfd/main.cc
@@ -646,7 +646,7 @@ static void main_loop(void)
 		fds[FD_CLM].events = POLLIN;
 	}
 
-	if (cb->immOiHandle != 0) {
+	if (cb->immOiHandle != 0 && cb->avd_imm_status == AVD_IMM_INIT_DONE) {
 		fds[FD_IMM].fd = cb->imm_sel_obj;
 		fds[FD_IMM].events = POLLIN;
 		nfds = FD_IMM + 1;


---

** [tickets:#2188] amfd: avd_imm_impl_set fails causing node reboot**

**Status:** fixed
**Milestone:** 5.0.2
**Created:** Tue Nov 15, 2016 06:39 AM UTC by Gary Lee
**Last Updated:** Mon Nov 28, 2016 08:28 AM UTC
**Owner:** Praveen
**Attachments:**

- 
[amfd-core.txt](https://sourceforge.net/p/opensaf/tickets/2188/attachment/amfd-core.txt)
 (9.9 kB; text/plain)


avd_imm_impl_set fails causing node reboot

It seems there may have been simultaneous IMM reinit threads running.

Nov 14 02:24:26 SC-2-2 osafamfd[4174]: NO Re-initializing with IMM
Nov 14 02:24:26 SC-2-2 osafamfd[4174]: NO Re-initializing with IMM
Nov 14 02:24:26 SC-2-2 osafimmnd[16412]: NO Implementer connected: 44 
(safAmfService) <526, 2020f>
Nov 14 02:24:26 SC-2-2 osafamfd[4174]: NO Finished re-initializing with IMM
Nov 14 02:24:26 SC-2-2 osafamfd[4174]: ER saImmOiImplementerSet failed 14
Nov 14 02:24:26 SC-2-2 osafamfd[4174]: ER exiting since avd_imm_impl_set failed
Nov 14 02:24:26 SC-2-2 osafimmnd[16412]: NO Implementer locally disconnected. 
Marking it as doomed 44 <526, 2020f> (safAmfService)
Nov 14 02:24:26 SC-2-2 osafamfnd[4192]: WA AMF director unexpectedly crashed
Nov 14 02:24:26 SC-2-2 osafamfnd[4192]: Rebooting OpenSAF NodeId = 131599 EE 
Name = , Reason: local AVD down(Adest) or both AVD down(Vdest) received, 
OwnNodeId = 131599, SupervisionTime = 60
Nov 14 02:24:26 SC-2-2 osafimmnd[16412]: NO Implementer disconnected 44 <526, 
2020f> (safAmfService)


---



[tickets] [opensaf:tickets] #2120 pyosaf: clm utils are missing dispatch function

2016-11-28 Thread Hans Nordebäck
- **status**: review --> fixed
- **Comment**:

changeset:   8371:47f9ab816d30
tag: tip
user:Robert Fekete 
date:Mon Nov 28 11:06:53 2016 +0100
files:   python/pyosaf/utils/clm/__init__.py
description:
pyosaf: clm utils are missing dispatch function [#2120]





---

** [tickets:#2120] pyosaf: clm utils are missing dispatch function**

**Status:** fixed
**Milestone:** 5.2.FC
**Created:** Mon Oct 17, 2016 06:38 AM UTC by Robert Fekete
**Last Updated:** Fri Oct 28, 2016 08:43 AM UTC
**Owner:** Robert Fekete


The clm utils __init__.py file is missing a function to invoke a dispatch.


---



[tickets] [opensaf:tickets] #2203 mds: SIGPIPE in mds_register_callback()

2016-11-28 Thread Hans Nordebäck
- **status**: review --> fixed
- **Comment**:

changeset:   8369:12f09f2e23ad
tag: tip
user:Hans Nordeback 
date:Fri Nov 25 16:27:47 2016 +0100
files:   osaf/libs/core/mds/mds_main.c
description:
mds: avoid SIGPIPE in mds_register_callback() [#2203]




---

** [tickets:#2203] mds: SIGPIPE in mds_register_callback()**

**Status:** fixed
**Milestone:** 5.2.FC
**Created:** Wed Nov 23, 2016 09:31 AM UTC by Hans Nordebäck
**Last Updated:** Wed Nov 23, 2016 10:55 AM UTC
**Owner:** Hans Nordebäck


> cat bt_20161118_11:15:23_207 :

signal: 13 pid: 207 uid: 0
/usr/local/lib/libopensaf_core.so.0(+0x1bce9)[0x7eff44157ce9]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x113d0)[0x7eff43d283d0]
/lib/x86_64-linux-gnu/libpthread.so.0(send+0x7f)[0x7eff43d27a3f]
/usr/local/lib/libopensaf_core.so.0(+0x41901)[0x7eff4417d901]
/usr/local/lib/libopensaf_core.so.0(+0x1d06c)[0x7eff4415906c]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x76fa)[0x7eff43d1e6fa]
/lib/x86_64-linux-gnu/libc.so.6(clone+0x6d)[0x7eff4329cb5d]

objdump:

if ((n = send(fd, buf, sz, 0)) == -1)
   418f1:   41 8d 14 06             lea    (%r14,%rax,1),%edx
   418f5:   31 c9                   xor    %ecx,%ecx
   418f7:   48 89 de                mov    %rbx,%rsi
   418fa:   89 ef                   mov    %ebp,%edi
   418fc:   e8 df 54 fd ff          callq  16de0
   41901:   83 f8 ff                cmp    $0xffffffff,%eax
   41904:   0f 85 66 fe ff ff       jne    41770

syslog(LOG_ERR, "MDS: %s: send to pid %d failed - %s",
__FUNCTION__, creds->pid, 
strerror(errno));

This is in:
mds_register_callback()
 :
if ((n = send(fd, buf, sz, 0)) == -1)
syslog(LOG_ERR, "MDS: %s: send to pid %d failed - %s",
__FUNCTION__, creds->pid, 
strerror(errno));

Flag MSG_NOSIGNAL should be used in the call to send.
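
A minimal standalone sketch of the effect (not the mds_main.c change itself): with MSG_NOSIGNAL, a send to a socket whose peer has gone away fails with EPIPE instead of raising SIGPIPE in the sender. The helper names are hypothetical, and a socketpair stands in for the real connection.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical wrapper: send without raising SIGPIPE when the peer
 * has already closed its end of the connection. */
static ssize_t send_nosignal(int fd, const void *buf, size_t sz)
{
	assert(buf != NULL);
	/* A broken connection is reported as -1 with errno == EPIPE
	 * rather than by delivering SIGPIPE to the process. */
	return send(fd, buf, sz, MSG_NOSIGNAL);
}

/* Sends one byte to a socket whose peer is closed. Returns the errno
 * of the failed send, 0 if the send unexpectedly succeeded, or -1 if
 * the socketpair could not be created. */
static int demo_send_to_closed_peer(void)
{
	int sv[2];

	if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0)
		return -1;
	close(sv[1]);			/* simulate the peer going away */

	errno = 0;
	ssize_t n = send_nosignal(sv[0], "x", 1);
	int err = (n == -1) ? errno : 0;

	close(sv[0]);
	return err;
}
```

The existing error path (the syslog call with strerror(errno)) would then get to report the failure instead of the process being killed by signal 13.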


---



[tickets] [opensaf:tickets] #2204 nid: Use the FIFO monitoring for started services

2016-11-28 Thread Hans Nordebäck
- **status**: accepted --> review



---

** [tickets:#2204] nid: Use the FIFO monitoring for started services**

**Status:** review
**Milestone:** 5.2.FC
**Created:** Wed Nov 23, 2016 03:35 PM UTC by Hans Nordebäck
**Last Updated:** Wed Nov 23, 2016 03:35 PM UTC
**Owner:** Hans Nordebäck


Use the FIFO monitoring (also used by the transport monitor) to monitor started 
services until the nid phase is completed. 

[#2158]


---



[tickets] [opensaf:tickets] #2206 ntf: syslog is flooded with "ntfs_mds_msg_send FAILED" related to notification send failure.

2016-11-28 Thread Praveen
- **status**: accepted --> review



---

** [tickets:#2206] ntf: syslog is flooded with "ntfs_mds_msg_send FAILED" 
related to notification send failure.**

**Status:** review
**Milestone:** 5.0.2
**Created:** Thu Nov 24, 2016 09:41 AM UTC by Praveen
**Last Updated:** Thu Nov 24, 2016 09:41 AM UTC
**Owner:** Praveen


Steps to reproduce:
1) Bring the controller up.
2) In one terminal, subscribe for notifications with the "ntfsubscribe" command.
3) In another terminal, repeatedly send notifications by running the command 
"ntfsend -r 1".
4) While ntfsend is sending notifications, kill ntfsubscribe in the other terminal.
5) syslog gets flooded with the following messages:
 Nov 24 12:38:00 SC-1 osafntfd[11771]: ER ntfs_mds_msg_send FAILED
Nov 24 12:38:00 SC-1 osafntfd[11771]: ER ntfs_mds_msg_send to ntfa failed rc: 2
Nov 24 12:38:00 SC-1 osafntfd[11771]: ER ntfs_mds_msg_send FAILED
Nov 24 12:38:04 SC-1 osafntfd[11771]: message repeated 1267 times: [ ER 
ntfs_mds_msg_send FAILED]

Analysis:
  Inside the MDS callback, NTFS posts the NTFA down event with NORMAL priority, 
whereas API events are posted with HIGH priority. 
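
One plausible reading of this, sketched in C with made-up types (this is not the NTFS mailbox implementation): if the mailbox always hands out the highest-priority event first, a down event posted with NORMAL priority is only seen after every pending HIGH-priority API event, so each of those still tries to send to the agent that is already gone.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical event model for this sketch. */
enum prio_sketch { PRIO_NORMAL = 0, PRIO_HIGH = 1 };
enum kind_sketch { EVT_API_SEND, EVT_AGENT_DOWN };

struct evt_sketch {
	enum kind_sketch kind;
	enum prio_sketch prio;
};

#define MBX_CAP 16
struct mbx_sketch {
	struct evt_sketch q[MBX_CAP];
	size_t n;
};

/* Appends an event; returns 0 on success, -1 if the mailbox is full. */
static int mbx_post(struct mbx_sketch *m, enum kind_sketch k,
		    enum prio_sketch p)
{
	assert(m != NULL);
	if (m->n == MBX_CAP)
		return -1;
	m->q[m->n].kind = k;
	m->q[m->n].prio = p;
	m->n++;
	return 0;
}

/* Removes the oldest event of the highest priority present.
 * Returns 0 on success, -1 if the mailbox is empty. */
static int mbx_take(struct mbx_sketch *m, struct evt_sketch *out)
{
	assert(m != NULL && out != NULL);
	if (m->n == 0)
		return -1;
	size_t best = 0;
	for (size_t i = 1; i < m->n; i++)
		if (m->q[i].prio > m->q[best].prio)
			best = i;	/* strictly higher: keeps FIFO order */
	*out = m->q[best];
	for (size_t i = best; i + 1 < m->n; i++)
		m->q[i] = m->q[i + 1];
	m->n--;
	return 0;
}
```

Posting EVT_AGENT_DOWN with PRIO_NORMAL followed by two EVT_API_SEND with PRIO_HIGH yields both send events before the down event; posting the down event with PRIO_HIGH would keep the arrival order.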


---



[tickets] [opensaf:tickets] #2208 log: Build failure on openSUSE 42.2

2016-11-28 Thread Anders Widell
- **status**: review --> fixed
- **Comment**:

changeset:   8370:fce39a954fed
user:Anders Widell 
date:Mon Nov 28 09:24:35 2016 +0100
summary: log: Fix build failure on openSUSE 42.2 [#2208]

[staging:fce39a]




---

** [tickets:#2208] log: Build failure on openSUSE 42.2**

**Status:** fixed
**Milestone:** 5.2.FC
**Created:** Fri Nov 25, 2016 01:59 PM UTC by Anders Widell
**Last Updated:** Fri Nov 25, 2016 02:26 PM UTC
**Owner:** Anders Widell


I get the following build failure on openSUSE 42.2:

~~~
  CC   logtest-tet_Log_misc.o
tet_LogOiOps.c: In function ‘saLogOi_22’:
tet_LogOiOps.c:568:2: error: call to function ‘logFinalize’ without a real 
prototype [-Werror=unprototyped-calls]
  logFinalize(logHandle);
  ^
In file included from logtest.h:35:0,
 from tet_LogOiOps.c:26:
./logutil.h:52:13: note: ‘logFinalize’ was declared here
 SaAisErrorT logFinalize();
 ^
tet_saLogLimitGet.c: In function ‘saLogLimitGet_01’:
tet_saLogLimitGet.c:31:2: error: call to function ‘logFinalize’ without a real 
prototype [-Werror=unprototyped-calls]
  logFinalize(logHandle);
  ^
In file included from logtest.h:35:0,
 from tet_saLogLimitGet.c:18:
./logutil.h:52:13: note: ‘logFinalize’ was declared here
 SaAisErrorT logFinalize();
 ^
cc1: all warnings being treated as errors
Makefile:866: recipe for target 'logtest-tet_saLogLimitGet.o' failed
make[2]: *** [logtest-tet_saLogLimitGet.o] Error 1
make[2]: *** Waiting for unfinished jobs
cc1: all warnings being treated as errors
Makefile:880: recipe for target 'logtest-tet_LogOiOps.o' failed

~~~
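
For background, a small sketch of the C rule behind the error (not the committed fix; the names below are stand-ins): before C23, a declaration with empty parentheses such as `SaAisErrorT logFinalize();` is not a prototype, so calls to it cannot be checked against a parameter list. Spelling out the parameters, or `void` when there are none, makes it a real prototype.

```c
#include <assert.h>

/* Stand-in for the real handle type in this sketch. */
typedef unsigned long long handle_sketch_t;

/* Pre-C23, this form declares a function WITHOUT a prototype:
 *
 *     int log_finalize_sketch();
 *
 * The prototype below states the parameter list explicitly. */
int log_finalize_sketch(handle_sketch_t handle);

int log_finalize_sketch(handle_sketch_t handle)
{
	/* Placeholder body: succeed for any non-zero handle. */
	return handle != 0 ? 0 : -1;
}

/* Example call site: the argument is checked against the prototype. */
static int close_log_sketch(handle_sketch_t handle)
{
	int rc = log_finalize_sketch(handle);
	assert(rc == 0 || rc == -1);	/* the sketch returns only these */
	return rc;
}
```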


---



[tickets] [opensaf:tickets] #2188 amfd: avd_imm_impl_set fails causing node reboot

2016-11-28 Thread Minh Hon Chau
Ah, I will try to run the test and see; I will post the trace if I hit it again.


---

** [tickets:#2188] amfd: avd_imm_impl_set fails causing node reboot**

**Status:** fixed
**Milestone:** 5.0.2
**Created:** Tue Nov 15, 2016 06:39 AM UTC by Gary Lee
**Last Updated:** Mon Nov 28, 2016 08:25 AM UTC
**Owner:** Praveen
**Attachments:**

- 
[amfd-core.txt](https://sourceforge.net/p/opensaf/tickets/2188/attachment/amfd-core.txt)
 (9.9 kB; text/plain)




---



[tickets] [opensaf:tickets] #2188 amfd: avd_imm_impl_set fails causing node reboot

2016-11-28 Thread Minh Hon Chau
Hi Praveen,

I think your fix can solve the coredump, but I am a bit worried about the other 
places where immOiHandle is accessed in the main thread upon other events, e.g. 
failover, report_admin_op_error(), immutil_saImmOiXXX. To be safe, I guess we 
could put the little change back, since it does call saImmOiFinalize() first:
~~~
@@ -1954,16 +1964,16 @@ void avd_imm_reinit_bg(void)
int rc = 0;

-   (void) saImmOiFinalize(avd_cb->immOiHandle);
-
-   avd_cb->immOiHandle = 0;
-   avd_cb->is_implementer = false;
-
pthread_attr_init();
~~~

It was removed because it's executed twice in a row (the second time in 
avd_imm_reinit_bg_thread()), I guess. If that's not a big problem, can we 
leave it in? Otherwise we can reorganize the while () loop in 
avd_imm_reinit_bg_thread().

Thanks,
Minh




---

** [tickets:#2188] amfd: avd_imm_impl_set fails causing node reboot**

**Status:** fixed
**Milestone:** 5.0.2
**Created:** Tue Nov 15, 2016 06:39 AM UTC by Gary Lee
**Last Updated:** Mon Nov 28, 2016 05:45 AM UTC
**Owner:** Praveen
**Attachments:**

- 
[amfd-core.txt](https://sourceforge.net/p/opensaf/tickets/2188/attachment/amfd-core.txt)
 (9.9 kB; text/plain)




---



[tickets] [opensaf:tickets] #2205 imm: IMMND crashes when receiving D2ND_ABORT_CCB

2016-11-28 Thread Hung Nguyen
- **status**: accepted --> review



---

** [tickets:#2205] imm: IMMND crashes when receiving D2ND_ABORT_CCB**

**Status:** review
**Milestone:** 5.0.2
**Created:** Thu Nov 24, 2016 07:23 AM UTC by Hung Nguyen
**Last Updated:** Thu Nov 24, 2016 07:30 AM UTC
**Owner:** Hung Nguyen
**Attachments:**

- 
[osafNode.immnd.bz2](https://sourceforge.net/p/opensaf/tickets/2205/attachment/osafNode.immnd.bz2)
 (18.9 MB; application/octet-stream)


~~~
Nov 16 10:06:17 SC-2-1 osafimmnd[5608]: 
../../../../../../../opensaf/osaf/services/saf/immsv/immnd/ImmModel.cc:6169: 
ccbAbort: Assertion '*nodeId == ccb->mAugCcbParent->mOriginatingNode' failed.
~~~

~~~
Nov 16 10:06:17.260296 osafimmnd [5608:immsv_evt.c:5473] T8 Received: 
IMMND_EVT_A2ND_OI_CCB_AUG_INIT (91) from 0
Nov 16 10:06:17.260303 osafimmnd [5608:immnd_evt.c:10304] >> 
immnd_evt_ccb_augment_init
Nov 16 10:06:17.260310 osafimmnd [5608:ImmModel.cc:6502] >> ccbAugmentInit
Nov 16 10:06:17.260323 osafimmnd [5608:ImmModel.cc:6555] TR Augment CCB in 
state MODIFY_OP
Nov 16 10:06:17.260329 osafimmnd [5608:ImmModel.cc:6592] TR 
omuti->second:0x14051f0
Nov 16 10:06:17.260359 osafimmnd [5608:ImmModel.cc:6593] TR 
omuti->second->mContinuationId:24 == rsp->inv:24
Nov 16 10:06:17.260366 osafimmnd [5608:ImmModel.cc:6600] TR obj:0x1405460
Nov 16 10:06:17.260371 osafimmnd [5608:ImmModel.cc:6658] << ccbAugmentInit

Nov 16 10:06:17.261479 osafimmnd [5608:immsv_evt.c:5473] T8 Received: 
IMMND_EVT_D2ND_ABORT_CCB (62) from 0
Nov 16 10:06:17.261486 osafimmnd [5608:immnd_evt.c:7684] >> 
immnd_evt_proc_ccb_finalize
Nov 16 10:06:17.261490 osafimmnd [5608:immnd_evt.c:6921] >> immnd_evt_ccb_abort
Nov 16 10:06:17.261495 osafimmnd [5608:immnd_evt.c:6925] TR We expect there to 
be a PBE
Nov 16 10:06:17.261501 osafimmnd [5608:ImmModel.cc:6079] >> ccbAbort
Nov 16 10:06:17.261506 osafimmnd [5608:ImmModel.cc:6088] T5 ABORT CCB 79
Nov 16 10:06:17.261539 osafimmnd [5608:ImmModel.cc:6151] NO Ccb 79 ABORTED 
(immcfg_SC-2-1_9735)
~~~


When IMMND received A2ND_OI_CCB_AUG_INIT, the CCB state was changed to CCB_READY.
Then, when the D2ND_ABORT_CCB message arrived, \*nodeId was not updated in 
ImmModel::ccbAbort(), and the following assertion later failed:

~~~
osafassert(*nodeId == ccb->mAugCcbParent->mOriginatingNode);
~~~

Attached are the IMMND traces.



---
