from:"Gary Lee"

[devel] OpenSAF 5.24.02 release

2024-02-28 Thread Gary Lee via Opensaf-devel

The OpenSAF community is pleased to announce the availability of the OpenSAF 
5.24.02 release.

The source code for OpenSAF 5.24.02 and the corresponding documentation can be 
downloaded using the following links: 
[opensaf-5.24.02.tar.gz](http://sourceforge.net/projects/opensaf/files/releases/opensaf-5.24.02.tar.gz/download),
[opensaf-documentation-5.24.02.tar.gz](http://sourceforge.net/projects/opensaf/files/docs/opensaf-documentation-5.24.02.tar.gz/download).

For a complete list of new features in this release, please refer to the 
[NEWS](https://sourceforge.net/p/opensaf/wiki/NEWS-5.24.02/) at the wiki.

See the [ChangeLog](https://sourceforge.net/p/opensaf/wiki/ChangeLog-5.24.02/) 
for a full list of changes in this release.


___
Opensaf-devel mailing list
Opensaf-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-devel

[devel] Announcement of the OpenSAF 5.23.07 release

2023-07-30 Thread Gary Lee via Opensaf-devel

The OpenSAF community is pleased to announce the availability of the OpenSAF
5.23.07 release. The source code for OpenSAF 5.23.07 and the corresponding
documentation can be downloaded using the following links:

http://sourceforge.net/projects/opensaf/files/releases/opensaf-5.23.07.tar.gz/download

http://sourceforge.net/projects/opensaf/files/docs/opensaf-documentation-5.23.07.tar.gz/download

For a complete list of new features in this release, please refer to the
NEWS at the wiki:

https://sourceforge.net/p/opensaf/wiki/NEWS-5.23.07/

See the ChangeLog for a full list of changes in this release:

https://sourceforge.net/p/opensaf/wiki/ChangeLog-5.23.07/

 

Thank you for your continued interest in OpenSAF and to everyone who has
contributed to this release.

 




[devel] Announcement of the OpenSAF 5.23.03 release

2023-03-27 Thread Gary Lee

The OpenSAF community is pleased to announce the availability of the OpenSAF
5.23.03 release. The source code for OpenSAF 5.23.03 and the corresponding
documentation can be downloaded using the following links:

http://sourceforge.net/projects/opensaf/files/releases/opensaf-5.23.03.tar.gz/download

http://sourceforge.net/projects/opensaf/files/docs/opensaf-documentation-5.23.03.tar.gz/download

For a complete list of new features in this release, please refer to the
NEWS at the wiki:

https://sourceforge.net/p/opensaf/wiki/NEWS-5.23.03/

See the ChangeLog for a full list of changes in this release:

https://sourceforge.net/p/opensaf/wiki/ChangeLog-5.23.03/

Thank you for your continued interest in OpenSAF and to everyone who has
contributed to this release.



[devel] Announcement of the OpenSAF 5.22.11 release

2022-11-17 Thread Gary Lee

The OpenSAF community is pleased to announce the availability of the OpenSAF
5.22.11 release. The source code for OpenSAF 5.22.11 and the corresponding
documentation can be downloaded using the following links:

http://sourceforge.net/projects/opensaf/files/releases/opensaf-5.22.11.tar.gz/download

http://sourceforge.net/projects/opensaf/files/docs/opensaf-documentation-5.22.11.tar.gz/download

For a complete list of new features in this release, please refer to the
NEWS at the wiki:

https://sourceforge.net/p/opensaf/wiki/NEWS-5.22.11/

See the ChangeLog for a full list of changes in this release:

https://sourceforge.net/p/opensaf/wiki/ChangeLog-5.22.11/

Thank you for your continued interest in OpenSAF and to everyone who has
contributed to this release.



[devel] Announcement of the OpenSAF 5.22.06 release

2022-05-31 Thread Gary Lee

The OpenSAF community is pleased to announce the availability of the OpenSAF 
5.22.06 release. The source code for OpenSAF 5.22.06 and the corresponding 
documentation can be downloaded using the following links:

http://sourceforge.net/projects/opensaf/files/releases/opensaf-5.22.06.tar.gz/download

http://sourceforge.net/projects/opensaf/files/docs/opensaf-documentation-5.22.06.tar.gz/download

For a complete list of new features in this release, please refer to the NEWS 
at the wiki:

https://sourceforge.net/p/opensaf/wiki/NEWS-5.22.06/

See the ChangeLog for a full list of changes in this release:

https://sourceforge.net/p/opensaf/wiki/ChangeLog-5.22.06/

Thank you for your continued interest in OpenSAF and to everyone who has 
contributed to this release.



[devel] Announcement of the OpenSAF 5.22.01 release

2022-01-23 Thread Gary Lee


The OpenSAF community is pleased to announce the availability of the
OpenSAF 5.22.01 release. The source code for OpenSAF 5.22.01 and the
corresponding documentation can be downloaded using the following links:

http://sourceforge.net/projects/opensaf/files/releases/opensaf-5.22.01.tar.gz/download

http://sourceforge.net/projects/opensaf/files/docs/opensaf-documentation-5.22.01.tar.gz/download

For a complete list of new features in this release, please refer to the
NEWS at the wiki:

https://sourceforge.net/p/opensaf/wiki/NEWS-5.22.01/

See the ChangeLog for a full list of changes in this release:

https://sourceforge.net/p/opensaf/wiki/ChangeLog-5.22.01/

Thank you for your continued interest in OpenSAF and to everyone who has
contributed to this release.





[devel] Announcement of the OpenSAF 5.21.09 release

2021-09-14 Thread Gary Lee

The OpenSAF community is pleased to announce the availability of the
OpenSAF 5.21.09 release. The source code for OpenSAF 5.21.09 and the
corresponding documentation can be downloaded using the following links:

http://sourceforge.net/projects/opensaf/files/releases/opensaf-5.21.09.tar.gz/download

http://sourceforge.net/projects/opensaf/files/docs/opensaf-documentation-5.21.09.tar.gz/download

For a complete list of new features in this release, please refer to the
NEWS at the wiki:

https://sourceforge.net/p/opensaf/wiki/NEWS-5.21.09/

See the ChangeLog for a full list of changes in this release:

https://sourceforge.net/p/opensaf/wiki/ChangeLog-5.21.09/

Thank you for your continued interest in OpenSAF and to everyone who has
contributed to this release.


[devel] Announcement of the OpenSAF 5.21.06 release

2021-05-31 Thread Gary Lee

The OpenSAF community is pleased to announce the availability of the
OpenSAF 5.21.06 release. The source code for OpenSAF 5.21.06 and the
corresponding documentation can be downloaded using the following links:

http://sourceforge.net/projects/opensaf/files/releases/opensaf-5.21.06.tar.gz/download

http://sourceforge.net/projects/opensaf/files/docs/opensaf-documentation-5.21.06.tar.gz/download

For a complete list of new features in this release, please refer to the
NEWS at the wiki:

https://sourceforge.net/p/opensaf/wiki/NEWS-5.21.06/

See the ChangeLog for a full list of changes in this release:

https://sourceforge.net/p/opensaf/wiki/ChangeLog-5.21.06/

Thank you for your continued interest in OpenSAF and to everyone who has
contributed to this release.


[devel] Announcement of the OpenSAF 5.20.11 release

2020-11-30 Thread Gary Lee

The OpenSAF community is pleased to announce the availability of the
OpenSAF 5.20.11 release. The source code for OpenSAF 5.20.11 and the
corresponding documentation can be downloaded using the following links:

http://sourceforge.net/projects/opensaf/files/releases/opensaf-5.20.11.tar.gz/download

http://sourceforge.net/projects/opensaf/files/docs/opensaf-documentation-5.20.11.tar.gz/download

For a complete list of new features in this release, please refer to the
NEWS at the wiki:

https://sourceforge.net/p/opensaf/wiki/NEWS-5.20.11/

See the ChangeLog for a full list of changes in this release:

https://sourceforge.net/p/opensaf/wiki/ChangeLog-5.20.11/

Note that starting from the August 2017 release, we are using a new
version numbering scheme for OpenSAF. The components in the OpenSAF
version number 5.20.11 represent the major release (5), followed by the
year (20) and month (11) when the release was made. This change was made
as a step towards introducing continuous delivery in the OpenSAF project.
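The numbering scheme described above can be illustrated with a small parser. This is only a sketch: `ReleaseVersion` and `ParseReleaseVersion` are hypothetical names made up for this example and are not part of OpenSAF.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical helper type for illustration; not an OpenSAF type.
struct ReleaseVersion {
  int major;  // major release, e.g. 5
  int year;   // two-digit year of the release, e.g. 20 for 2020
  int month;  // month of the release, 1..12
};

// Parses a version string of the form "major.yy.mm" (for example "5.20.11").
// Returns false if the text does not match that shape or the month or year
// is out of range; `out` is only written on success.
bool ParseReleaseVersion(const std::string& text, ReleaseVersion* out) {
  ReleaseVersion v = {0, 0, 0};
  int consumed = 0;
  // %n records how many characters were consumed, so trailing junk is caught.
  if (std::sscanf(text.c_str(), "%d.%d.%d%n", &v.major, &v.year, &v.month,
                  &consumed) != 3) {
    return false;
  }
  if (consumed != static_cast<int>(text.size())) return false;
  if (v.year < 0 || v.year > 99) return false;
  if (v.month < 1 || v.month > 12) return false;
  *out = v;
  return true;
}
```

With this sketch, "5.20.11" yields major 5, year 20 and month 11, while a string such as "5.20" is rejected.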

Thank you for your continued interest in OpenSAF and to everyone who has
contributed to this release.




[devel] Announcement of the OpenSAF 5.20.08 release

2020-08-30 Thread Gary Lee

The OpenSAF community is pleased to announce the availability of the
OpenSAF 5.20.08 release. The source code for OpenSAF 5.20.08 and the
corresponding documentation can be downloaded using the following links:

http://sourceforge.net/projects/opensaf/files/releases/opensaf-5.20.08.tar.gz/download

http://sourceforge.net/projects/opensaf/files/docs/opensaf-documentation-5.20.08.tar.gz/download

For a complete list of new features in this release, please refer to the
NEWS at the wiki:

https://sourceforge.net/p/opensaf/wiki/NEWS-5.20.08/

See the ChangeLog for a full list of changes in this release:

https://sourceforge.net/p/opensaf/wiki/ChangeLog-5.20.08/

Note that starting from the August 2017 release, we are using a new
version numbering scheme for OpenSAF. The components in the OpenSAF
version number 5.20.08 represent the major release (5), followed by the
year (20) and month (08) when the release was made. This change was made
as a step towards introducing continuous delivery in the OpenSAF project.

Thank you for your continued interest in OpenSAF and to everyone who has
contributed to this release.


[devel] Announcement of the OpenSAF 5.20.05 release

2020-05-29 Thread Gary Lee

The OpenSAF community is pleased to announce the availability of the
OpenSAF 5.20.05 release. The source code for OpenSAF 5.20.05 and the
corresponding documentation can be downloaded using the following links:

http://sourceforge.net/projects/opensaf/files/releases/opensaf-5.20.05.tar.gz/download

http://sourceforge.net/projects/opensaf/files/docs/opensaf-documentation-5.20.05.tar.gz/download

For a complete list of new features in this release, please refer to the
NEWS at the wiki:

https://sourceforge.net/p/opensaf/wiki/NEWS-5.20.05/

See the ChangeLog for a full list of changes in this release:

https://sourceforge.net/p/opensaf/wiki/ChangeLog-5.20.05/

Note that starting from the August 2017 release, we are using a new
version numbering scheme for OpenSAF. The components in the OpenSAF
version number 5.20.05 represent the major release (5), followed by the
year (20) and month (05) when the release was made. This change was made
as a step towards introducing continuous delivery in the OpenSAF project.

Thank you for your continued interest in OpenSAF and to everyone who has
contributed to this release.


Re: [devel] [PATCH 1/1] amf: Debug info logged at Emergency level [#3179]

2020-04-30 Thread Gary Lee

ack (review only)

Thanks

From: Peter McIntyre 
Sent: 30 April 2020 18:55
To: Minh Hon Chau ; Thang Duc Nguyen 

Cc: opensaf-devel@lists.sourceforge.net 
Subject: [devel] [PATCH 1/1] amf: Debug info logged at Emergency level [#3179]

In many places in the amf code, debug info is logged with LOG_EM (emergency 
level), which is not appropriate for this kind of message. These messages 
should be logged at the error level instead.

The fix is to change LOG_EM to LOG_ER.
---
 src/amf/amfd/ndfsm.cc  |  2 +-
 src/amf/amfd/ndproc.cc |  8 +++---
 src/amf/amfd/role.cc   |  8 +++---
 src/amf/amfd/sg_2n_fsm.cc  | 50 +-
 src/amf/amfd/sg_nored_fsm.cc   | 36 
 src/amf/amfd/sg_npm_fsm.cc | 40 +--
 src/amf/amfd/sg_nway_fsm.cc| 18 ++--
 src/amf/amfd/sg_nwayact_fsm.cc | 32 +++---
 src/amf/amfd/timer.cc  |  4 +--
 src/amf/amfd/util.cc   | 16 +--
 10 files changed, 107 insertions(+), 107 deletions(-)

diff --git a/src/amf/amfd/ndfsm.cc b/src/amf/amfd/ndfsm.cc
index e2235b2e9..674ef863a 100644
--- a/src/amf/amfd/ndfsm.cc
+++ b/src/amf/amfd/ndfsm.cc
@@ -1145,7 +1145,7 @@ uint32_t avd_node_down(AVD_CL_CB *cb, SaClmNodeIdT node_id) {

   if ((avnd = avd_node_find_nodeid(node_id)) == nullptr) {
 /* log error that the node id is invalid */
-LOG_EM("%s:%u: %u", __FILE__, __LINE__, node_id);
+LOG_ER("%s:%u: %u", __FILE__, __LINE__, node_id);
 return NCSCC_RC_FAILURE;
   }

diff --git a/src/amf/amfd/ndproc.cc b/src/amf/amfd/ndproc.cc
index 0d30dfe71..29c574167 100644
--- a/src/amf/amfd/ndproc.cc
+++ b/src/amf/amfd/ndproc.cc
@@ -202,7 +202,7 @@ void avd_reg_su_evh(AVD_CL_CB *cb, AVD_EVT *evt) {

   /* log an error since this shouldn't happen */

-  LOG_EM("%s:%u: %u", __FILE__, __LINE__, n2d_msg->msg_info.n2d_reg_su.error);
+  LOG_ER("%s:%u: %u", __FILE__, __LINE__, n2d_msg->msg_info.n2d_reg_su.error);

   /* call the routine to failover all the effected nodes
* due to restarting this node
@@ -1041,7 +1041,7 @@ void avd_data_update_req_evh(AVD_CL_CB *cb, AVD_EVT *evt) {
   break;
 default:
   /* log error that a the object value is invalid */
-  LOG_EM("%s:%u: %u", __FILE__, __LINE__,
+  LOG_ER("%s:%u: %u", __FILE__, __LINE__,
  n2d_msg->msg_info.n2d_data_req.param_info.attr_id);
   break;
   } /* switch(n2d_msg->msg_info.n2d_data_req.param_info.obj_id) */
@@ -1168,7 +1168,7 @@ void avd_data_update_req_evh(AVD_CL_CB *cb, AVD_EVT *evt) {
   break;
 default:
   /* log error that a the object value is invalid */
-  LOG_EM("%s:%u: %u", __FILE__, __LINE__,
+  LOG_ER("%s:%u: %u", __FILE__, __LINE__,
  n2d_msg->msg_info.n2d_data_req.param_info.attr_id);
   break;
   } /* switch(n2d_msg->msg_info.n2d_data_req.param_info.obj_id) */
@@ -1177,7 +1177,7 @@ void avd_data_update_req_evh(AVD_CL_CB *cb, AVD_EVT *evt) {
 }
 default:
   /* log error that a the table value is invalid */
-  LOG_EM("%s:%u: %u", __FILE__, __LINE__,
+  LOG_ER("%s:%u: %u", __FILE__, __LINE__,
  n2d_msg->msg_info.n2d_data_req.param_info.class_id);
   goto done;
   break;
diff --git a/src/amf/amfd/role.cc b/src/amf/amfd/role.cc
index 15b0458d2..24374de7c 100644
--- a/src/amf/amfd/role.cc
+++ b/src/amf/amfd/role.cc
@@ -598,7 +598,7 @@ static uint32_t avd_role_failover_qsd_actv(AVD_CL_CB *cb, SaAmfHAStateT role) {
do node down processing for other node */
 avd_node_mark_absent(avnd_other);
   } else {
-LOG_EM("%s:%u: %u", __FILE__, __LINE__, NCSCC_RC_FAILURE);
+LOG_ER("%s:%u: %u", __FILE__, __LINE__, NCSCC_RC_FAILURE);
   }

   return NCSCC_RC_SUCCESS;
@@ -701,7 +701,7 @@ void avd_role_switch_ncs_su_evh(AVD_CL_CB *cb, AVD_EVT *evt) {

   /* get the avnd from node_id */
   if (nullptr == (avnd = avd_node_find_nodeid(cb->node_id_avd))) {
-LOG_EM("%s:%u: %u", __FILE__, __LINE__, cb->node_id_avd);
+LOG_ER("%s:%u: %u", __FILE__, __LINE__, cb->node_id_avd);
 return;
   }
   other_avnd = avd_node_find_nodeid(cb->node_id_avd_other);
@@ -852,12 +852,12 @@ try_again:
   if (NCSCC_RC_SUCCESS !=
   (status = avsv_set_ckpt_role(cb, SA_AMF_HA_QUIESCED))) {
 /* Log error */
-LOG_EM("%s:%u: %u", __FILE__, __LINE__, status);
+LOG_ER("%s:%u: %u", __FILE__, __LINE__, status);
   }

   /* Now Dispatch all the messages from the MBCSv mail-box */
   if (NCSCC_RC_SUCCESS != (rc = avsv_mbcsv_dispatch(cb, SA_DISPATCH_ALL))) {
-LOG_EM("%s:%u: %u", __FILE__, __LINE__, cb->node_id_avd_other);
+LOG_ER("%s:%u: %u", __FILE__, __LINE__, cb->node_id_avd_other);
 cb->swap_switch = false;
 return;
   }
diff --git a/src/amf/amfd/sg_2n_fsm.cc b/src/amf/amfd/sg_2n_fsm.cc
index e38288db7..525e30049 100644
--- a/src/amf/amfd/sg_2n_fsm.cc
+++
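On the severity levels mentioned in the patch description: the sketch below assumes that LOG_EM and LOG_ER end up as the standard syslog priorities LOG_EMERG and LOG_ERR, which is what the description suggests but is not shown in the quoted diff. The helper names are hypothetical. In `<syslog.h>` a numerically smaller priority is more severe, and the emergency level is documented as "system is unusable", which is why routine error traces do not belong there.

```cpp
#include <cassert>
#include <string>
#include <syslog.h>

// Returns true when priority `a` is more severe than priority `b`.
// syslog priorities are ordered so that a smaller value is more severe.
bool MoreSevere(int a, int b) { return a < b; }

// Short description of a syslog priority, following the wording of the
// syslog(3) manual page. Hypothetical helper for illustration only.
const char* PriorityDescription(int priority) {
  switch (priority) {
    case LOG_EMERG:   return "system is unusable";
    case LOG_ALERT:   return "action must be taken immediately";
    case LOG_CRIT:    return "critical conditions";
    case LOG_ERR:     return "error conditions";
    case LOG_WARNING: return "warning conditions";
    case LOG_NOTICE:  return "normal but significant condition";
    case LOG_INFO:    return "informational";
    case LOG_DEBUG:   return "debug-level messages";
    default:          return "unknown";
  }
}
```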

Re: [devel] [PATCH 1/1] amfnd: fix unexpected reboot after split-brain recovery [#3162]

2020-03-05 Thread Gary Lee

Hi Thuan

One comment inline with [GL].

Thanks
Gary


From: Thuan Tran 
Sent: 04 March 2020 18:28
To: Thang Duc Nguyen ; Minh Hon Chau 
; Gary Lee 
Cc: opensaf-devel@lists.sourceforge.net ; 
Thuan Tran 
Subject: [PATCH 1/1] amfnd: fix unexpected reboot after split-brain recovery 
[#3162]

- During split-brain recovery with headless mode enabled, IMMND may be
restarted. If AMFND does not wait for the IMMND restart but reinitializes CLM,
the CLM callback is triggered and clm_to_amf_node() is called. AMFND then gets
stuck initializing IMM OM, which delays the IMMND restart and the resending of
node_up, so AMFD orders a reboot of the node.
- Do not call saClmDispatch() while IMMND is down.
---
 src/amf/amfnd/avnd_cb.h |  1 +
 src/amf/amfnd/clc.cc| 10 ++
 src/amf/amfnd/main.cc   |  4 +++-
 3 files changed, 14 insertions(+), 1 deletion(-)

diff --git a/src/amf/amfnd/avnd_cb.h b/src/amf/amfnd/avnd_cb.h
index 8b0cc2304..0fa0590ff 100644
--- a/src/amf/amfnd/avnd_cb.h
+++ b/src/amf/amfnd/avnd_cb.h
@@ -125,6 +125,7 @@ typedef struct avnd_cb_tag {
   SaTimeT scs_absence_max_duration;
   /* the timer for supervision of the absence of SC */
   AVND_TMR sc_absence_tmr;
+  bool immnd_down;
 } AVND_CB;

 #define AVND_CB_NULL ((AVND_CB *)0)
diff --git a/src/amf/amfnd/clc.cc b/src/amf/amfnd/clc.cc
index f78e1a707..227bf6a5a 100644
--- a/src/amf/amfnd/clc.cc
+++ b/src/amf/amfnd/clc.cc
@@ -3106,6 +3106,9 @@ uint32_t avnd_comp_clc_cmd_execute(AVND_CB *cb, AVND_COMP *comp,
   unsigned int i;
   SaStringT env;
   size_t env_set_nmemb;
+  size_t comma = comp->saAmfCompType.find_last_of(",");
+  size_t end = comp->saAmfCompType.length();
+  std::string compBaseType = comp->saAmfCompType.substr(comma + 1, end);

   TRACE_ENTER2("'%s':CLC CLI command type:'%s'", comp->name.c_str(),
clc_cmd_type[cmd_type]);
@@ -,6 +3336,13 @@ uint32_t avnd_comp_clc_cmd_execute(AVND_CB *cb, AVND_COMP *comp,
 // outcome of command is reported in comp_clc_resp_callback()
   }

+  if (compBaseType.compare("safCompType=OpenSafCompTypeIMMND") == 0) {
+if (cmd_type == AVND_COMP_CLC_CMD_TYPE_CLEANUP)
+  cb->immnd_down = true;
+else if (cmd_type == AVND_COMP_CLC_CMD_TYPE_INSTANTIATE)
+  cb->immnd_down = false;
+  }
+
   TRACE_2("success");
   goto done;

diff --git a/src/amf/amfnd/main.cc b/src/amf/amfnd/main.cc
index d7857fabe..447e2aa82 100644
--- a/src/amf/amfnd/main.cc
+++ b/src/amf/amfnd/main.cc
@@ -334,6 +334,7 @@ AVND_CB *avnd_cb_create() {

   cb->is_avd_down = true;
   cb->amfd_sync_required = false;
+  cb->immnd_down = false;

   // retrieve hydra configuration from IMM
   hydra_config_get(cb);
@@ -609,7 +610,8 @@ void avnd_main_process(void) {
   exit(0);
 }

-if (avnd_cb->clmHandle && (fds[FD_CLM].revents & POLLIN)) {
+if (!avnd_cb->immnd_down && avnd_cb->clmHandle &&
+(fds[FD_CLM].revents & POLLIN)) {

[GL] I think, in general, it's probably bad practice to skip an event when it 
is ready to be processed. This could end up in a tight loop, spiking CPU usage.

   // LOG_NO("DEBUG-> CLM event fd: %d sel_obj: %llu, clm handle: %llu",
   // fds[FD_CLM].fd, avnd_cb->clm_sel_obj, avnd_cb->clmHandle);
   result = saClmDispatch(avnd_cb->clmHandle, SA_DISPATCH_ALL);
--
2.17.1
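The [GL] remark above can be illustrated outside of amfnd. If a descriptor stays readable and the loop simply skips it, the next poll() call returns immediately. One possible alternative, which is an assumption of this sketch and was not proposed in the thread, is to take the descriptor out of the poll set while it must not be serviced: POSIX specifies that poll() ignores entries whose fd is negative. The function names below are hypothetical.

```cpp
#include <cassert>
#include <poll.h>
#include <unistd.h>

// Polls a single descriptor for readability with the given timeout (ms).
// When `enabled` is false the entry is given a negative fd, which poll()
// ignores, so pending data on `fd` does not wake the caller.
int PollOnce(int fd, bool enabled, int timeout_ms) {
  struct pollfd pfd;
  pfd.fd = enabled ? fd : -1;
  pfd.events = POLLIN;
  pfd.revents = 0;
  return poll(&pfd, 1, timeout_ms);
}

// Creates a pipe holding one unread byte and checks both behaviours.
// Returns true if the enabled poll reports the data at once and the
// disabled poll times out without reporting it.
bool DemoSkipWithoutSpinning() {
  int fds[2];
  if (pipe(fds) != 0) return false;
  const char byte = 'x';
  bool ok = write(fds[1], &byte, 1) == 1;
  ok = ok && PollOnce(fds[0], true, 0) == 1;    // data pending: reported
  ok = ok && PollOnce(fds[0], false, 10) == 0;  // entry ignored: timeout
  close(fds[0]);
  close(fds[1]);
  return ok;
}
```

Whether this fits amfnd depends on how its fds array is rebuilt on each loop iteration, which the quoted hunk does not show.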



Re: [devel] [PATCH 1/1] osaf: fix etcd3.plugin watch takeover_request [#3158]

2020-02-24 Thread Gary Lee

Hi Thuan

ack (review) with minor comment.

Here -a is used for AND but && is used elsewhere in the file. We could be more 
consistent.

#value is cleaned after a lease time, keep watching

Maybe change to

#value is cleared after lease time, keep watching

Thanks
Gary

From: Thuan Tran 
Sent: 20 February 2020 22:21
To: Gary Lee ; Vu Minh Nguyen 
; Minh Hon Chau ; Thang 
Duc Nguyen 
Cc: opensaf-devel@lists.sourceforge.net ; 
Thuan Tran 
Subject: [PATCH 1/1] osaf: fix etcd3.plugin watch takeover_request [#3158]

After a takeover_request is rejected, its value is cleared once the lease
time expires, and the watch mistakenly reports this as a change to an empty
value. osafrded then handles that change and reboots itself as if it had lost
connectivity to the consensus service.
---
 src/osaf/consensus/plugins/etcd3.plugin | 8 
 1 file changed, 8 insertions(+)

diff --git a/src/osaf/consensus/plugins/etcd3.plugin 
b/src/osaf/consensus/plugins/etcd3.plugin
index 60559a0e9..e8fa6b6e7 100644
--- a/src/osaf/consensus/plugins/etcd3.plugin
+++ b/src/osaf/consensus/plugins/etcd3.plugin
@@ -362,6 +362,14 @@ watch() {
   return 1
 fi
   elif [ "$orig_value" != "$current_value" ]; then
+if [ "$watch_key" == "$takeover_request" ]; then
+  state=$(echo $orig_value | awk '{print $4}')
+  if [ "$state" == "REJECTED" -a -z "$current_value" ]; then
+#value is cleaned after a lease time, keep watching
+orig_value=""
+continue
+  fi
+fi
 echo $current_value
 return 0
   fi
--
2.17.1
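The condition added to watch() in the patch above is written in shell. For readers who prefer the logic spelled out, here is a rough C++ rendition. It is a sketch only: the function names are hypothetical, and the only thing taken from the patch is that the state is the fourth whitespace-separated field of the original value.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Returns the n-th (1-based) whitespace-separated field of `line`, or an
// empty string if there are fewer fields, similar to awk '{print $n}'.
std::string Field(const std::string& line, int n) {
  std::istringstream in(line);
  std::string token;
  for (int i = 0; i < n; ++i) {
    if (!(in >> token)) return "";
  }
  return token;
}

// Returns true when a change on the watched key should be ignored and the
// watch should continue: the key is the takeover request, the previous
// value was in state REJECTED, and the new value is empty (lease expired).
bool ShouldKeepWatching(const std::string& watch_key,
                        const std::string& takeover_key,
                        const std::string& orig_value,
                        const std::string& current_value) {
  return watch_key == takeover_key &&
         Field(orig_value, 4) == "REJECTED" &&
         current_value.empty();
}
```

Any other change, including a rejected request being replaced by a new non-empty value, is still reported to the caller.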



Re: [devel] [PATCH 1/1] clmd: retry once to send message to clmna [#3156]

2020-02-17 Thread Gary Lee

Hi Thuan

Would this be simpler?

+  while (retry < 1) {
+    rc = clms_mds_msg_send(cb, &clm_msg, &evt->fr_dest, &evt->mds_ctxt,
+                           MDS_SEND_PRIORITY_HIGH, NCSMDS_SVC_ID_CLMNA);
+    if (rc != NCSCC_RC_SUCCESS) {
+      ...
+      osaf_nanosleep();
+      ++retry;
+    } else {
+      break;
+    }
+  }

Thanks
Gary
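As a point of comparison for the loop shapes discussed in this thread, a bounded retry can also be written by counting attempts rather than retries, which makes the upper bound easy to read off ("retry once" means at most two sends). This is a generic sketch, not the code that was merged; `SendWithRetry`, `RetryResult` and the callback parameters are hypothetical and stand in for clms_mds_msg_send() and osaf_nanosleep().

```cpp
#include <cassert>
#include <functional>

// Outcome of a bounded retry: whether a send succeeded and how many
// attempts were made in total.
struct RetryResult {
  bool ok;
  int attempts;
};

// Calls `send` until it returns true or `max_attempts` calls were made.
// `backoff` runs only between attempts, never after the final one.
RetryResult SendWithRetry(const std::function<bool()>& send,
                          const std::function<void()>& backoff,
                          int max_attempts) {
  RetryResult result = {false, 0};
  while (result.attempts < max_attempts) {
    ++result.attempts;
    if (send()) {
      result.ok = true;
      break;
    }
    if (result.attempts < max_attempts) backoff();
  }
  return result;
}
```

Calling it with max_attempts set to 2 gives one initial send plus one retry, with a single backoff in between when the first send fails.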


From: Thuan Tran 
Sent: 18 February 2020 17:38
To: Vu Minh Nguyen ; Minh Hon Chau 
; Thang Duc Nguyen ; 
Gary Lee 
Cc: opensaf-devel@lists.sourceforge.net ; 
Thuan Tran 
Subject: [PATCH 1/1] clmd: retry once to send message to clmna [#3156]

- When a node comes up after a reboot, clmd may receive the join request
before the clmna svc_up event has arrived, so sending the response back to
clmna fails. This leads to amfnd timing out while initializing the CLM agent
and delays the sending of node up. amfd may then order a reboot of that node
if the node-up delay (osafAmfDelayNodeFailoverNodeWaitTimeout) is set smaller
than the total time amfnd retries until it initializes the CLM agent
successfully.
- Retrying once to send the message to clmna helps avoid this scenario.
---
 src/clm/clmd/clms_evt.cc | 18 --
 1 file changed, 16 insertions(+), 2 deletions(-)

diff --git a/src/clm/clmd/clms_evt.cc b/src/clm/clmd/clms_evt.cc
index 1059c6cfa..59e9c4156 100644
--- a/src/clm/clmd/clms_evt.cc
+++ b/src/clm/clmd/clms_evt.cc
@@ -34,6 +34,7 @@
 #include "base/logtrace.h"
 #include "base/ncsgl_defs.h"
 #include "base/osaf_utility.h"
+#include "base/osaf_time.h"
 #include "clm/clmd/clms.h"

 static uint32_t process_api_evt(CLMSV_CLMS_EVT *evt);
@@ -535,6 +536,7 @@ uint32_t proc_node_up_msg(CLMS_CB *cb, CLMSV_CLMS_EVT *evt) {
   SaNameT node_name = {0};
   CLMSV_MSG clm_msg;
   SaBoolT check_member;
+  int retry = 0;

   TRACE_ENTER2("Node up mesg for nodename length %d %s",
nodeup_info->node_name.length, nodeup_info->node_name.value);
@@ -636,8 +638,20 @@ uint32_t proc_node_up_msg(CLMS_CB *cb, CLMSV_CLMS_EVT *evt) {
   clm_msg.info.api_resp_info.type = CLMSV_CLUSTER_JOIN_RESP;
   clm_msg.info.api_resp_info.param.node_name = node_name;
   /*rc will be updated down in the positive flow */
-  rc = clms_mds_msg_send(cb, &clm_msg, &evt->fr_dest, &evt->mds_ctxt,
- MDS_SEND_PRIORITY_HIGH, NCSMDS_SVC_ID_CLMNA);
+  do {
+rc = clms_mds_msg_send(cb, &clm_msg, &evt->fr_dest, &evt->mds_ctxt,
+  MDS_SEND_PRIORITY_HIGH, NCSMDS_SVC_ID_CLMNA);
+if (rc != NCSCC_RC_SUCCESS && retry < 1) {
+  /* If a node reboot up, clmna svc_up is not yet come but clmd
+   * get message join request then send message back clmna failed.
+   * It leads to amfnd timeout init clm agent and delay send node up.
+   * This may cause amfd order reboot that node if node up delay
+   * (osafAmfDelayNodeFailoverNodeWaitTimeout) is set smaller than
+   * total time amfnd retry until init clm agent successfully.
+   * If retry here, it would help avoid this scenario */
+  osaf_nanosleep();
+}
+  } while (rc != NCSCC_RC_SUCCESS && retry++ < 1);
   /*if mds send failed, we need to report failure */
   if (rc != NCSCC_RC_SUCCESS) {
 LOG_NO("%s: send failed. dest:%" PRIx64, __FUNCTION__, evt->fr_dest);
--
2.17.1



Re: [devel] [PATCH 1/1] rde: correct to promote node to active [#3108]

2020-02-04 Thread Gary Lee

Hi

Ack (tested)

-Original Message-
From: thang.d.nguyen [mailto:thang.d.ngu...@dektech.com.au] 
Sent: Tuesday, 4 February 2020 1:37 PM
To: Gary Lee 
Cc: opensaf-devel@lists.sourceforge.net; Thang Duc Nguyen

Subject: [PATCH 1/1] rde: correct to promote node to active [#3108]

If relaxed node promotion is enabled, allow this node to be promoted active
if it can see a peer SC and this node has the lowest node ID.
---
 src/rde/rded/role.cc | 14 +-
 src/rde/rded/role.h  |  1 +
 2 files changed, 14 insertions(+), 1 deletion(-)

diff --git a/src/rde/rded/role.cc b/src/rde/rded/role.cc
index 593ccf0eb..7ca020d5d 100644
--- a/src/rde/rded/role.cc
+++ b/src/rde/rded/role.cc
@@ -260,7 +260,8 @@ bool Role::IsCandidate() {
   // if relaxed node promotion is enabled, allow this node to be promoted
   // active if it can see a peer SC and this node has the lowest node ID
   if (consensus_service.IsRelaxedNodePromotionEnabled() == true &&
-  cb->state == State::kNotActiveSeenPeer) {
+  cb->state == State::kNotActiveSeenPeer &&
+  IsLowestNodeid() == true) {
 LOG_NO("Relaxed node promotion enabled. This node is a candidate.");
 result = true;
   }
@@ -279,6 +280,17 @@ bool Role::IsPeerPresent() {
   return result;
 }
 
+bool Role::IsLowestNodeid() {
+  bool result = true;
+  RDE_CONTROL_BLOCK* cb = rde_get_control_block();
+
+  for (auto peer_id : cb->peer_controllers) {
+if (peer_id < own_node_id_)
+  return false;
+  }
+  return result;
+}
+
 uint32_t Role::SetRole(PCS_RDA_ROLE new_role) {
   TRACE_ENTER();
   PCS_RDA_ROLE old_role = role_;
diff --git a/src/rde/rded/role.h b/src/rde/rded/role.h
index 9c63cbe7b..9bf1b10bd 100644
--- a/src/rde/rded/role.h
+++ b/src/rde/rded/role.h
@@ -38,6 +38,7 @@ class Role {
   void AddPeer(NODE_ID node_id);
   bool IsCandidate();
   bool IsPeerPresent();
+  bool IsLowestNodeid();
   void SetPeerState(PCS_RDA_ROLE node_role, NODE_ID node_id);
   timespec* Poll(timespec* ts);
   uint32_t SetRole(PCS_RDA_ROLE new_role);
--
2.17.1
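The tie-break rule used by the patch above (a node is a candidate only if no known peer has a lower node ID) can be tried in isolation. The sketch below is standalone and makes assumptions: `NodeId` stands in for NODE_ID, and the peers are held in a std::set, whereas the real container type of cb->peer_controllers is not visible in the quoted diff.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <set>

using NodeId = uint32_t;  // stand-in for OpenSAF's NODE_ID (assumption)

// Returns true if no peer has a node ID lower than `own`.
// A node with no known peers is trivially the lowest.
bool IsLowestNodeId(NodeId own, const std::set<NodeId>& peers) {
  return std::none_of(peers.begin(), peers.end(),
                      [own](NodeId peer) { return peer < own; });
}
```

Because every controller evaluates the same rule against the same set of IDs, at most one of the mutually visible controllers considers itself the candidate, provided node IDs are unique.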




Re: [devel] [PATCH 1/1] fmd: Do not send RDE to set active role if opensaf_quick_reboot is executed [#3146]

2020-01-23 Thread Gary Lee

Hi Minh

ack

—

From: Minh Chau 
Sent: Friday, January 24, 2020 11:35:29 AM
To: Gary Lee 
Cc: opensaf-devel@lists.sourceforge.net ; 
Minh Hon Chau 
Subject: [PATCH 1/1] fmd: Do not send RDE to set active role if 
opensaf_quick_reboot is executed [#3146]

If an SC is separated from the cluster, fmd calls opensaf_quick_reboot().
The reboot script returns before the node has actually gone down.
In the code after opensaf_quick_reboot(), fmd tells rde to promote the node
to active. Hence, there is a short period with two active SCs.

This patch makes fmd stop asking RDE to set the active role after
opensaf_quick_reboot().

Note: There are a few other places where the function does not return after
calling opensaf_quick_reboot(). This patch only fixes the issue in fm; the
other places will be revisited.
---
 src/fm/fmd/fm_rda.cc | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/src/fm/fmd/fm_rda.cc b/src/fm/fmd/fm_rda.cc
index fca417f79..479eb2149 100644
--- a/src/fm/fmd/fm_rda.cc
+++ b/src/fm/fmd/fm_rda.cc
@@ -86,6 +86,7 @@ void promote_node(FM_CB *fm_cb) {
 LOG_ER("Unable to set active controller in consensus service");
 opensaf_quick_reboot("Unable to set active controller "
   "in consensus service");
+return;
   } else if (rc == SA_AIS_ERR_EXIST) {
 // @todo if we don't reboot, we don't seem to recover from this. Can we
 // improve?
@@ -94,6 +95,7 @@ void promote_node(FM_CB *fm_cb) {
 "cluster?");
 opensaf_quick_reboot("A controller is already active. We were separated "
  "from the cluster?");
+return;
   }

   PCS_RDA_REQ rda_req;
--
2.20.1
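The effect of the two added return statements can be shown with a small model. This is a sketch with made-up names (`ActionLog`, `RequestReboot`, `PromoteNode`, `PromoteResult`) that only records what would happen; it assumes, as the patch description states, that the reboot request returns to the caller before the node goes down.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Records actions instead of performing them. All names are hypothetical.
struct ActionLog {
  std::vector<std::string> actions;
};

enum class PromoteResult { kOk, kFailed, kAlreadyActive };

// Models a reboot request that returns to the caller before the node
// actually goes down.
void RequestReboot(ActionLog* log, const std::string& reason) {
  log->actions.push_back("reboot: " + reason);
}

// Models the promotion path: on failure a reboot is requested and the
// function returns, so the role change is never sent.
void PromoteNode(ActionLog* log, PromoteResult rc) {
  if (rc == PromoteResult::kFailed) {
    RequestReboot(log, "unable to set active controller");
    return;  // without this, "set active role" below would still be sent
  } else if (rc == PromoteResult::kAlreadyActive) {
    RequestReboot(log, "a controller is already active");
    return;
  }
  log->actions.push_back("set active role");
}
```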



Re: [devel] [PATCH 1/1] rde: Reboot node if another active controller is detected [#3142]

2020-01-15 Thread Gary Lee

Hi Minh

Ack with comment. Please include this explanation.

+// a reboot is required, as clmna on other nodes may not start
+// an election because it thinks this node is going to be active
+opensaf_quick_reboot("Another controller is already active");

Thanks
Gary


From: Minh Chau 
Sent: 16 January 2020 13:06
To: Gary Lee ; hans.nordeb...@ericsson.com 
; Vu Minh Nguyen 
Cc: opensaf-devel@lists.sourceforge.net ; 
Minh Hon Chau 
Subject: [PATCH 1/1] rde: Reboot node if another active controller is detected 
[#3142]

---
 src/rde/rded/role.cc | 1 +
 1 file changed, 1 insertion(+)

diff --git a/src/rde/rded/role.cc b/src/rde/rded/role.cc
index b890117..9446ccb 100644
--- a/src/rde/rded/role.cc
+++ b/src/rde/rded/role.cc
@@ -107,6 +107,7 @@ void Role::PromoteNode(const uint64_t cluster_size,
   rc = consensus_service.PromoteThisNode(true, cluster_size);
   if (rc == SA_AIS_ERR_EXIST) {
 LOG_WA("Another controller is already active");
+opensaf_quick_reboot("Another controller is already active");
 return;
   } else if (rc != SA_AIS_OK && relaxed_mode == true) {
 LOG_WA("Unable to set active controller in consensus service");
--
2.7.4



Re: [devel] [PATCH 1/1] rde: Reboot node if another active controller is detected [#3142]

2020-01-15 Thread Gary Lee

Hi Minh

ack


From: Minh Chau 
Sent: 16 January 2020 13:06
To: Gary Lee ; hans.nordeb...@ericsson.com 
; Vu Minh Nguyen 
Cc: opensaf-devel@lists.sourceforge.net ; 
Minh Hon Chau 
Subject: [PATCH 1/1] rde: Reboot node if another active controller is detected 
[#3142]

---
 src/rde/rded/role.cc | 1 +
 1 file changed, 1 insertion(+)

diff --git a/src/rde/rded/role.cc b/src/rde/rded/role.cc
index b890117..9446ccb 100644
--- a/src/rde/rded/role.cc
+++ b/src/rde/rded/role.cc
@@ -107,6 +107,7 @@ void Role::PromoteNode(const uint64_t cluster_size,
   rc = consensus_service.PromoteThisNode(true, cluster_size);
   if (rc == SA_AIS_ERR_EXIST) {
 LOG_WA("Another controller is already active");
+opensaf_quick_reboot("Another controller is already active");
 return;
   } else if (rc != SA_AIS_OK && relaxed_mode == true) {
 LOG_WA("Unable to set active controller in consensus service");
--
2.7.4



Re: [devel] [PATCH 1/1] log: fix memory leak that was introduced in 3116 [#3138]

2020-01-09 Thread Gary Lee

Hi Vu

ack (review only)


From: Vu Minh Nguyen 
Sent: 09 January 2020 21:51
To: Minh Hon Chau ; Gary Lee 
Cc: opensaf-devel@lists.sourceforge.net ; 
Vu Minh Nguyen 
Subject: [PATCH 1/1] log: fix memory leak that was introduced in 3116 [#3138]

---
 src/log/logd/lgs_evt.cc | 3 +++
 src/log/logd/lgs_mbcsv_cache.cc | 2 ++
 2 files changed, 5 insertions(+)

diff --git a/src/log/logd/lgs_evt.cc b/src/log/logd/lgs_evt.cc
index 7501a282b..f169ea1e9 100644
--- a/src/log/logd/lgs_evt.cc
+++ b/src/log/logd/lgs_evt.cc
@@ -1348,6 +1348,7 @@ static uint32_t proc_write_log_async_msg(lgs_cb_t *cb, lgsv_lgs_evt_t *evt) {
stream->fixedLogRecordSize, buf_size, logOutputString,
++stream->logRecordId, node_name)) == 0) {
 AckToWriteAsync(param, evt->fr_dest, SA_AIS_ERR_INVALID_PARAM);
+free(logOutputString);
 return NCSCC_RC_SUCCESS;
   }

@@ -1356,6 +1357,8 @@ static uint32_t proc_write_log_async_msg(lgs_cb_t *cb, lgsv_lgs_evt_t *evt) {
   evt->fr_dest, node_name);
   auto data = std::make_shared(info, logOutputString, n);
   Cache::instance()->Write(data);
+
+  lgs_free_write_log(param);
   return NCSCC_RC_SUCCESS;
 }

diff --git a/src/log/logd/lgs_mbcsv_cache.cc b/src/log/logd/lgs_mbcsv_cache.cc
index cde26432a..b190c5bea 100644
--- a/src/log/logd/lgs_mbcsv_cache.cc
+++ b/src/log/logd/lgs_mbcsv_cache.cc
@@ -230,6 +230,8 @@ uint32_t ckpt_proc_pop_write_async(lgs_cb_t* cb, void* data) {
   if (top->seq_id_ != seq_id) {
 LOG_ER("Out of sync! Expected seq: (%" PRIu64 "), Got: (%" PRIu64 ")",
seq_id, top->seq_id_);
+lgs_free_edu_mem(param->log_record);
+lgs_free_edu_mem(param->log_file);
 return NCSCC_RC_FAILURE;
   }

--
2.17.1



Re: [devel] [PATCH 1/1] amf: allow update node failover state in cold sync [#3136]

2019-12-30 Thread Gary Lee

Hi Thuan

Ack

Thanks
Gary


From: thuan.tran
Sent: 30 December 2019 21:20
To: Thang Duc Nguyen; Gary Lee; Minh Hon Chau
Cc: opensaf-devel@lists.sourceforge.net; Thuan Tran
Subject: [PATCH 1/1] amf: allow update node failover state in cold sync [#3136]

The failover state of nodes that join during cold sync is not updated on
the standby amfd, so when the standby amfd later fails over to active it
mistakenly orders a reboot of these nodes.
---
 src/amf/amfd/chkop.cc | 1 +
 1 file changed, 1 insertion(+)

diff --git a/src/amf/amfd/chkop.cc b/src/amf/amfd/chkop.cc
index 15408b657..1ed6dd632 100644
--- a/src/amf/amfd/chkop.cc
+++ b/src/amf/amfd/chkop.cc
@@ -1347,6 +1347,7 @@ static uint32_t avsv_validate_reo_type_in_csync(AVD_CL_CB *cb,
 case AVSV_CKPT_AVND_NODE_STATE:
 case AVSV_CKPT_AVND_RCV_MSG_ID:
 case AVSV_CKPT_AVND_SND_MSG_ID:
+case AVSV_CKPT_NODE_FAILOVER_STATE:
   if (cb->synced_reo_type >= AVSV_CKPT_AVD_NODE_CONFIG)
 status = NCSCC_RC_SUCCESS;
   break;
--
2.17.1



Re: [devel] [PATCH 1/5] log: improve the resilience of log service [#3116]

2019-12-23 Thread Gary Lee

Hi Vu

Very, very minor comments with [GL].

Thanks
Gary

-----Original Message-----
From: Vu Minh Nguyen [mailto:vu.m.ngu...@dektech.com.au]
Sent: Thursday, 28 November 2019 7:24 PM
To: lennart.l...@ericsson.com; Gary Lee; Minh Hon Chau
Cc: opensaf-devel@lists.sourceforge.net; Vu Minh Nguyen
Subject: [PATCH 1/5] log: improve the resilience of log service [#3116]

To improve the resilience of the OpenSAF LOG service when the underlying
file system is unresponsive, a queue is introduced to hold async write
requests for up to a configurable time of around 15 - 30 seconds.

The readiness of the I/O thread is checked periodically; once it becomes
ready, the front element of the queue is processed first. If an element
stays in the queue longer than the configured time, SA_AIS_ERR_TRY_AGAIN
is returned to the client.

The queue capacity and the resilience timeout are configurable via the
attributes `logMaxPendingWriteRequests` and `logResilienceTimeout`.

By default, this feature is disabled to keep the log server backward compatible.
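
The queueing behaviour described in this commit message can be modelled
roughly as below. This is only an illustrative Python sketch, not the code in
lgs_cache.cc: the class PendingWriteQueue, its method names, and the use of
plain strings for the return codes are assumptions made for the example, and
the disabled case (a capacity of zero) is not modelled.

```python
import collections

TRY_AGAIN = "SA_AIS_ERR_TRY_AGAIN"
OK = "SA_AIS_OK"


class PendingWriteQueue:
    """Hypothetical model of a bounded FIFO of pending async writes."""

    def __init__(self, max_pending, resilience_timeout):
        self.max_pending = max_pending                # cf. logMaxPendingWriteRequests
        self.resilience_timeout = resilience_timeout  # cf. logResilienceTimeout (seconds)
        self.items = collections.deque()              # entries are (enqueue_time, request)

    def submit(self, request, now):
        """Queue a request, or reject it at once when the queue is full."""
        if len(self.items) >= self.max_pending:
            return TRY_AGAIN
        self.items.append((now, request))
        return None  # still pending, no acknowledgement yet

    def poll(self, now, io_ready):
        """Periodic check: expire old entries, then serve the front one."""
        acks = []
        # The queue is FIFO, so the oldest entry is always at the front.
        while self.items and now - self.items[0][0] > self.resilience_timeout:
            _, request = self.items.popleft()
            acks.append((request, TRY_AGAIN))
        if io_ready and self.items:
            _, request = self.items.popleft()
            acks.append((request, OK))
        return acks
```

Passing the clock in as `now` keeps the sketch deterministic; a real server
would read a monotonic clock instead.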
---
 src/log/Makefile.am  |  21 +-
 src/log/config/logsv_classes.xml |  43 ++-
 src/log/logd/lgs_cache.cc| 469 +++
 src/log/logd/lgs_cache.h | 287 +++
 src/log/logd/lgs_config.cc   |  78 -
 src/log/logd/lgs_config.h|  10 +-
 src/log/logd/lgs_evt.cc  | 161 +++
 src/log/logd/lgs_evt.h   |  10 +
 src/log/logd/lgs_file.cc |   8 +-
 src/log/logd/lgs_filehdl.cc  |  58 ++--
 src/log/logd/lgs_imm.cc  |  40 ++-
 src/log/logd/lgs_main.cc |  24 +-
 src/log/logd/lgs_mbcsv.cc| 447 +++--
 src/log/logd/lgs_mbcsv.h |  19 +-
 src/log/logd/lgs_mbcsv_cache.cc  | 372 
 src/log/logd/lgs_mbcsv_cache.h   | 110 
 src/log/logd/lgs_mbcsv_v1.cc |   1 +
 src/log/logd/lgs_mbcsv_v2.cc |   2 +
 18 files changed, 1889 insertions(+), 271 deletions(-)
 create mode 100644 src/log/logd/lgs_cache.cc
 create mode 100644 src/log/logd/lgs_cache.h
 create mode 100644 src/log/logd/lgs_mbcsv_cache.cc
 create mode 100644 src/log/logd/lgs_mbcsv_cache.h

diff --git a/src/log/Makefile.am b/src/log/Makefile.am
index f63a4a053..3367ef4f6 100644
--- a/src/log/Makefile.am
+++ b/src/log/Makefile.am
@@ -95,7 +95,9 @@ noinst_HEADERS += \
src/log/logd/lgs_nildest.h \
src/log/logd/lgs_unixsock_dest.h \
src/log/logd/lgs_common.h \
-   src/log/logd/lgs_amf.h
+   src/log/logd/lgs_amf.h \
+   src/log/logd/lgs_cache.h \
+   src/log/logd/lgs_mbcsv_cache.h
 
 
 bin_PROGRAMS += bin/saflogger
@@ -123,6 +125,15 @@ bin_osaflogd_CPPFLAGS = \
-DSA_EXTENDED_NAME_SOURCE \
$(AM_CPPFLAGS)
 
+# Enable this flag to simulate the case that file system is unresponsive
+# during write log record. Mainly for testing the following enhancement:
+# log: improve the resilience of log service [#3116].
+# When enabled, log handle thread will be suspended 17 seconds every 02 write
+# requests and only take affect if the `logMaxPendingWriteRequests` is set
+# to an non-zero value.
+bin_osaflogd_CPPFLAGS += -DSIMULATE_NFS_UNRESPONSE
+
+
 bin_osaflogd_SOURCES = \
src/log/logd/lgs_amf.cc \
src/log/logd/lgs_clm.cc \
@@ -147,7 +158,9 @@ bin_osaflogd_SOURCES = \
src/log/logd/lgs_util.cc \
src/log/logd/lgs_dest.cc \
src/log/logd/lgs_nildest.cc \
-   src/log/logd/lgs_unixsock_dest.cc
+   src/log/logd/lgs_unixsock_dest.cc \
+   src/log/logd/lgs_cache.cc \
+   src/log/logd/lgs_mbcsv_cache.cc
 
 bin_osaflogd_LDADD = \
lib/libosaf_common.la \
@@ -183,6 +196,10 @@ bin_logtest_CPPFLAGS = \
-DSA_EXTENDED_NAME_SOURCE \
$(AM_CPPFLAGS)
 
+# Enable this flag to add test cases for following enhancement:
+# log: improve the resilience of log service [#3116].
+bin_logtest_CPPFLAGS += -DSIMULATE_NFS_UNRESPONSE
+
 bin_logtest_SOURCES = \
src/log/apitest/logtest.c \
src/log/apitest/logutil.c \
diff --git a/src/log/config/logsv_classes.xml b/src/log/config/logsv_classes.xml
index 9359823ff..084e8915d 100644
--- a/src/log/config/logsv_classes.xml
+++ b/src/log/config/logsv_classes.xml
@@ -195,7 +195,7 @@ to ensure that default global values in the
implementation are also changed acco
SA_UINT32_T
SA_CONFIG
SA_WRITABLE
-1024
+   1024


logStreamFileFormat
@@ -208,42 +208,42 @@ to ensure that default global values in the
implementation are also changed acco
SA_UINT32_T
SA_CONFIG
SA_WRITABLE
-0
+   0


logStreamSystemLowLimit
SA_UINT32_T
SA_CONFIG
SA_WRITABLE

Re: [devel] [PATCH 4/5] log: update README file for improvement of log resilience [#3116]

2019-12-23 Thread Gary Lee

Hi Vu

Very minor comments with [GL].

Gary

-----Original Message-----
From: Vu Minh Nguyen [mailto:vu.m.ngu...@dektech.com.au]
Sent: Thursday, 28 November 2019 7:25 PM
To: lennart.l...@ericsson.com; Gary Lee; Minh Hon Chau
Cc: opensaf-devel@lists.sourceforge.net; Vu Minh Nguyen
Subject: [PATCH 4/5] log: update README file for improvement of log resilience [#3116]

---
 src/log/README | 38 ++
 1 file changed, 38 insertions(+)

diff --git a/src/log/README b/src/log/README
index b83d472e4..ab96a8157 100644
--- a/src/log/README
+++ b/src/log/README
@@ -764,3 +764,41 @@ on AMF role is unnecessary delay the CLM state of a Node  
(CLM state will available as soon as CLM started), so LGS is a taking  AVD Up 
event as trigger to do CLM initialize.
  
+
+4. Improve the resilience of OpenSAF LOG service (#3116)
+-
+When the file system is unresponsive, log client gets try-again from 
+write callback very shortly after I/O timeout reaches the setting; the 

[GL] "reaches the setting" sounds confusing. What setting?

+value of I/O timeout is configurable via the attribute logFileIoTimeout 
+within this valid range [500ms – 5000ms]. This is legacy behavior.
+
+This ticket improves the resilience of LOG service, so that log service 

+can cache async write requests up to an configurable time that is 

[GL] a configurable

+around 15-30 seconds before returning status to log client via write async 
callback.
+
+The cache size is configurable via a new attribute 
`logMaxPendingWriteRequests`.
+Default value is zero (0) - means this feature is disabled. The valid 
+range is [current queue size - 1000]. To know what is the current size 
+of the queue, fetching the value of pure runtime attribute 

[GL] To find the current size of the queue, fetch the ...

+`logCurrentPendingWriteRequests` of `OpenSafLogCurrentConfig` class. 
+When the cache size reaches the limit, all coming requests will get 
+acknowledgement right away with SA_AIS_ERR_TRY_AGAIN.
+
+The resilient timeout can also be configurable via a new attribute 
+`logResilienceTimeout`. The valid range is [15-30] seconds. When a 
+pending write async can be dropped and removed from the queue in cases:
+a) Stays in the queue longer than the given resilient timeout.
+b) The targeting stream has been closed.
+
+The queue is always kept in sync with standby.
+
+Besides, log agent has a light list keeping track all invocations which 
+not yet get acknowledgements from log server. If cluster goes to 
+headless; in other words, log server is disappeared and all cached data 
+has been lost, log agent (library) will notify all lost invocations to 
+log client via write async callback with SA_AIS_ERR_TRY_AGAIN error code.
+
+To test this feature, a gcc flag is added during compile time to 
+simulate the case the underlying file system is unresponsive, and it 
+only takes affect when the cache size is given to an non-zero value. 

[GL] it only takes effect when the cache size is set to a non-zero value

+With that, the I/O thread will sleep *16 seconds* every 02 write requests.
\ No newline at end of file
--
2.17.1




Re: [devel] [PATCH 1/1] amfd: Fix the data types of attributes inconsistency in get_config() [#3128]

2019-12-16 Thread Gary Lee

Hi

Ack ( review )

thanks
Gary

—

From: phuc.h.chau
Sent: Monday, December 16, 2019 6:59:38 PM
To: Vu Minh Nguyen
Cc: opensaf-devel@lists.sourceforge.net
Subject: [devel] [PATCH 1/1] amfd: Fix the data types of attributes inconsistency in get_config() [#3128]

In amfd, the attributes osafAmfDelayNodeFailoverTimeout and
osafAmfDelayNodeFailoverNodeWaitTimeout are of type time_t, but
Configuration::get_config() uses a uint32_t to hold their values, which
leads to stack memory corruption.
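
To illustrate why the size mismatch matters, here is a small Python sketch
using ctypes. It is not OpenSAF code: the Frame structure and the helper
names are made up for the example, and c_uint64 stands in for time_t on the
assumption of a typical 64-bit Linux build where time_t is 8 bytes wide.

```python
import ctypes


class Frame(ctypes.Structure):
    """Two adjacent 32-bit slots, standing in for neighbouring stack variables."""
    _fields_ = [("value", ctypes.c_uint32),
                ("neighbour", ctypes.c_uint32)]


def store_as_64bit(frame, number):
    """Write a 64-bit value at the address of the 32-bit `value` field.

    This mimics a callee that believes it was handed a pointer to a
    64-bit time value. The write stays inside the 8-byte Frame, so the
    demo itself is memory-safe.
    """
    slot = ctypes.cast(ctypes.pointer(frame), ctypes.POINTER(ctypes.c_uint64))
    slot[0] = number


def demo():
    frame = Frame(value=0, neighbour=0xDEADBEEF)
    store_as_64bit(frame, 0x1111111122222222)
    # `neighbour` now holds one half of the 64-bit value (which half
    # depends on byte order), so its original content is gone.
    return frame.neighbour
```

In the C++ code the overwritten bytes would belong to whatever happens to sit
next to the uint32_t on the stack, which is why the patch switches the local
variable to time_t.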
---
 src/amf/amfd/config.cc | 17 +
 1 file changed, 9 insertions(+), 8 deletions(-)
 mode change 100644 => 100755 src/amf/amfd/config.cc

diff --git a/src/amf/amfd/config.cc b/src/amf/amfd/config.cc
old mode 100644
new mode 100755
index af72840..375f050
--- a/src/amf/amfd/config.cc
+++ b/src/amf/amfd/config.cc
@@ -43,20 +43,20 @@ static void ccb_apply_modify_hdlr(struct CcbUtilOperationData *opdata) {
   configuration->restrict_auto_repair(enabled);
 } else if (!strcmp(attr_mod->modAttr.attrName,
 "osafAmfDelayNodeFailoverTimeout")) {
-  uint32_t delay = 0;  // default to 0 if attribute is blank
+  time_t delay = 0;  // default to 0 if attribute is blank
   if (attr_mod->modType != SA_IMM_ATTR_VALUES_DELETE &&
   attr_mod->modAttr.attrValues != nullptr) {
-delay = (*((SaUint32T *)attr_mod->modAttr.attrValues[0]));
+delay = (*((time_t *)attr_mod->modAttr.attrValues[0]));
   }
   avd_cb->node_failover_delay = delay;
   TRACE("osafAmfDelayNodeFailoverTimeout changed to '%llu'",
  avd_cb->node_failover_delay);
 } else if (!strcmp(attr_mod->modAttr.attrName,
 "osafAmfDelayNodeFailoverNodeWaitTimeout")) {
-  uint32_t delay = kDefaultNodeWaitTime;
+  time_t delay = kDefaultNodeWaitTime;
   if (attr_mod->modType != SA_IMM_ATTR_VALUES_DELETE &&
   attr_mod->modAttr.attrValues != nullptr) {
-delay = (*((SaUint32T *)attr_mod->modAttr.attrValues[0]));
+delay = (*((time_t *)attr_mod->modAttr.attrValues[0]));
   }
   avd_cb->node_failover_node_wait = delay;
   TRACE("osafAmfDelayNodeFailoverNodeWaitTimeout changed to '%llu'",
@@ -166,18 +166,19 @@ SaAisErrorT Configuration::get_config(void) {
  (SaImmAttrValuesT_2 ***)) ==
  SA_AIS_OK) {
 uint32_t value;
+time_t time_value;
 TRACE("reading configuration '%s'", osaf_extended_name_borrow());
 if (immutil_getAttr("osafAmfRestrictAutoRepairEnable", attributes, 0,
 ) == SA_AIS_OK) {
   configuration->restrict_auto_repair(static_cast(value));
 }
 if (immutil_getAttr("osafAmfDelayNodeFailoverTimeout", attributes, 0,
-&value) == SA_AIS_OK) {
-  avd_cb->node_failover_delay = value;
+&time_value) == SA_AIS_OK) {
+  avd_cb->node_failover_delay = time_value;
 }
 if (immutil_getAttr("osafAmfDelayNodeFailoverNodeWaitTimeout", attributes, 0,
-&value) == SA_AIS_OK) {
-  avd_cb->node_failover_node_wait = value;
+&time_value) == SA_AIS_OK) {
+  avd_cb->node_failover_node_wait = time_value;
 }
   }

--
2.7.4




[devel] [PATCH 0/1] Review Request for osaf: return a help message if no parameter is specified [#3118]

2019-11-12 Thread Gary Lee

Summary: osaf: return a help message if no parameter is specified [#3118]
Review request for Ticket(s): 3118
Peer Reviewer(s): Minh, Thuan
Pull request to: *** LIST THE PERSON WITH PUSH ACCESS HERE ***
Affected branch(es): develop
Development branch: ticket-3118
Base revision: 2a7ec1f63710f9e8f679bbceb18032e0ebb1b46a
Personal repository: git://git.code.sf.net/u/userid-2226215/review


Impacted area       Impact y/n
--------------------------------------------
 Docs                    n
 Build system            n
 RPM/packaging           n
 Configuration files     n
 Startup scripts         n
 SAF services            n
 OpenSAF services        n
 Core libraries          y
 Samples                 n
 Tests                   n
 Other                   n


Comments (indicate scope for each "y" above):
-

revision 4fd8ba91a1943a6ed696f86763b6ee804bccc27c
Author: Gary Lee 
Date:   Wed, 13 Nov 2019 17:09:35 +1100

osaf: return a help message if no parameter is specified [#3118]



Complete diffstat:
--
 src/osaf/consensus/plugins/tcp/tcp.plugin | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)


Testing Commands:
-
Run tcp.plugin without an argument

Testing, Expected Results:
--
A help message should be printed instead of crashing
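
For illustration, the expected behaviour can be sketched with argparse as
below. This is not the tcp.plugin source: the class PluginSketch and its
`get` command are hypothetical, and only the idea of returning
parser.format_help() when no command is given is taken from the patch.

```python
import argparse


class PluginSketch:
    """Hypothetical, cut-down dispatcher in the style of the plugin."""

    def __init__(self):
        self.parser = argparse.ArgumentParser(prog="tcp.plugin")
        # Both positionals are optional so an empty command line parses.
        self.parser.add_argument("command", nargs="?", default=None,
                                 help="operation to run")
        self.parser.add_argument("args", nargs="*",
                                 help="arguments for the operation")

    def get(self, key):
        """Made-up command used only for this example."""
        return {"code": 0, "output": "value-for-" + key}

    def run(self, argv):
        options = self.parser.parse_args(argv)
        if options.command:
            # Dispatch to the method named after the command.
            return getattr(self, options.command)(*options.args)
        # No command given: return the usage text instead of failing.
        return {"code": 0, "output": self.parser.format_help()}
```

Calling `PluginSketch().run([])` returns the usage text in the 'output' field,
whereas an unguarded getattr() with an empty command name would raise.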

Conditions of Submission:
-
Ack or in 7 days


Arch      Built     Started    Linux distro
-------------------------------------------
mips        n          n
mips64      n          n
x86         n          n
x86_64      y          y
powerpc     n          n
powerpc64   n          n


Reviewer Checklist:
---
[Submitters: make sure that your review doesn't trigger any checkmarks!]


Your checkin has not passed review because (see checked entries):

___ Your RR template is generally incomplete; it has too many blank entries
that need proper data filled in.

___ You have failed to nominate the proper persons for review and push.

___ Your patches do not have proper short+long header

___ You have grammar/spelling in your header that is unacceptable.

___ You have exceeded a sensible line length in your headers/comments/text.

___ You have failed to put in a proper Trac Ticket # into your commits.

___ You have incorrectly put/left internal data in your comments/files
(i.e. internal bug tracking tool IDs, product names etc)

___ You have not given any evidence of testing beyond basic build tests.
Demonstrate some level of runtime or other sanity testing.

___ You have ^M present in some of your files. These have to be removed.

___ You have needlessly changed whitespace or added whitespace crimes
like trailing spaces, or spaces before tabs.

___ You have mixed real technical changes with whitespace and other
cosmetic code cleanup changes. These have to be separate commits.

___ You need to refactor your submission into logical chunks; there is
too much content into a single commit.

___ You have extraneous garbage in your review (merge commits etc)

___ You have giant attachments which should never have been sent;
Instead you should place your content in a public tree to be pulled.

___ You have too many commits attached to an e-mail; resend as threaded
commits, or place in a public tree for a pull.

___ You have resent this content multiple times without a clear indication
of what has changed between each re-send.

___ You have failed to adequately and individually address all of the
comments and change requests that were proposed in the initial review.

___ You have a misconfigured ~/.gitconfig file (i.e. user.name, user.email etc)

___ Your computer have a badly configured date and time; confusing the
the threaded patch review.

___ Your changes affect IPC mechanism, and you don't present any results
for in-service upgradability test.

___ Your changes affect user manual and documentation, your patch series
do not contain the patch that updates the Doxygen manual.




[devel] [PATCH 1/1] osaf: return a help message if no parameter is specified [#3118]

2019-11-12 Thread Gary Lee

---
 src/osaf/consensus/plugins/tcp/tcp.plugin | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/src/osaf/consensus/plugins/tcp/tcp.plugin b/src/osaf/consensus/plugins/tcp/tcp.plugin
index 1b5ddf5..0be20fc 100755
--- a/src/osaf/consensus/plugins/tcp/tcp.plugin
+++ b/src/osaf/consensus/plugins/tcp/tcp.plugin
@@ -149,7 +149,12 @@ class ArbitratorPlugin(object):
 params = []
 if args:
 params.append(args)
-return getattr(self, command)(*params)
+if command:
+return getattr(self, command)(*params)
+else:
+ret = {'code': 0,
+   'output': parser.format_help()}
+return ret
 
 def get_node_name(self):
 node_file = open(self.node_name_file)
-- 
2.7.4




Re: [devel] [PATCH 1/1] amf: amfnd should ignore amfd down event during shutting down [#3117]

2019-11-07 Thread Gary Lee


ack (review only)

On 7/11/19 8:33 pm, thuan.tran wrote:

When the cluster is stopped via immadm, amfnd (which is shutting down) may
see the amfd down event and order a node reboot.
---
  src/amf/amfnd/di.cc | 6 ++
  1 file changed, 6 insertions(+)

diff --git a/src/amf/amfnd/di.cc b/src/amf/amfnd/di.cc
index 2043c6064..1f310b949 100644
--- a/src/amf/amfnd/di.cc
+++ b/src/amf/amfnd/di.cc
@@ -664,6 +664,12 @@ uint32_t avnd_evt_mds_avd_dn_evh(AVND_CB *cb, AVND_EVT *evt) {
  
LOG_WA("AMF director unexpectedly crashed");
  
+  if (m_AVND_IS_SHUTTING_DOWN(cb)) {

+LOG_WA("Ignore because AMFND is in SHUTDOWN state");
+TRACE_LEAVE();
+return rc;
+  }
+
// if headless is disabled OR if the amfd down came from the local node, just
// reboot
if (cb->scs_absence_max_duration == 0 ||





Re: [devel] [PATCH 1/1] amfnd: reset transition descriptor during comp restart [#3103]

2019-10-20 Thread Gary Lee


Hi Alex

ack

Thanks

Gary

On 18/10/19 2:56 am, Jones, Alex wrote:

If a component is configured to restart, instead of failover, on failure,
the previous transition descriptor is passed to the CSI set callback after
the restart.

The transition descriptor is not reset by amfnd in this case.

Always reset the transition descriptor to NEW_ASSIGN during a reassignment
due to restart.
---
src/amf/amfnd/comp.cc | 3 +++
1 file changed, 3 insertions(+)

diff --git a/src/amf/amfnd/comp.cc b/src/amf/amfnd/comp.cc
index a12171c28..10c77a462 100644
--- a/src/amf/amfnd/comp.cc
+++ b/src/amf/amfnd/comp.cc
@@ -1448,6 +1448,9 @@ uint32_t avnd_comp_csi_reassign(AVND_CB *cb, AVND_COMP *comp) {

m_AVND_COMP_CSI_CURR_ASSIGN_STATE_SET(
curr, AVND_COMP_CSI_ASSIGN_STATE_ASSIGNING);

+ // reset the transition descriptor
+ curr->trans_desc = SA_AMF_CSI_NEW_ASSIGN;
+
/* invoke the callback */
rc = avnd_comp_cbk_send(cb, curr->comp, AVSV_AMF_CSI_SET, 0, curr);
}
--
2.20.1




Re: [devel] [PATCH 1/1] mds: Disable mds flow control for mds broadcast/multicast message [#3101]

2019-10-20 Thread Gary Lee


Hi Minh

ack (review only)

Thanks

On 17/10/19 2:00 pm, Minh Chau wrote:

MDS flow control has already been disabled for unfragmented broadcast/multicast
messages when TIPC multicast is enabled. This patch revisits that change and
extends it to fragmented messages.
---
  src/mds/mds_tipc_fctrl_intf.cc   | 47 
  src/mds/mds_tipc_fctrl_msg.h | 11 +++---
  src/mds/mds_tipc_fctrl_portid.cc | 47 ++--
  src/mds/mds_tipc_fctrl_portid.h  |  3 ++-
  4 files changed, 69 insertions(+), 39 deletions(-)

diff --git a/src/mds/mds_tipc_fctrl_intf.cc b/src/mds/mds_tipc_fctrl_intf.cc
index b803bfe..fe3dbd5 100644
--- a/src/mds/mds_tipc_fctrl_intf.cc
+++ b/src/mds/mds_tipc_fctrl_intf.cc
@@ -133,7 +133,7 @@ uint32_t process_flow_event(const Event& evt) {
kChunkAckSize, sock_buf_size);
portid_map[TipcPortId::GetUniqueId(evt.id_)] = portid;
rc = portid->ReceiveData(evt.mseq_, evt.mfrag_,
-evt.fseq_, evt.svc_id_);
+evt.fseq_, evt.svc_id_, evt.snd_type_, is_mcast_enabled);
  } else if (evt.type_ == Event::Type::kEvtRcvIntro) {
portid = new TipcPortId(evt.id_, data_sock_fd,
kChunkAckSize, sock_buf_size);
@@ -147,7 +147,7 @@ uint32_t process_flow_event(const Event& evt) {
} else {
  if (evt.type_ == Event::Type::kEvtRcvData) {
rc = portid->ReceiveData(evt.mseq_, evt.mfrag_,
-  evt.fseq_, evt.svc_id_);
+  evt.fseq_, evt.svc_id_, evt.snd_type_, is_mcast_enabled);
  }
  if (evt.type_ == Event::Type::kEvtRcvChunkAck) {
portid->ReceiveChunkAck(evt.fseq_, evt.chunk_size_);
@@ -430,6 +430,7 @@ uint32_t mds_tipc_fctrl_drop_data(uint8_t *buffer, uint16_t len,
  
HeaderMessage header;

header.Decode(buffer);
+  Event* pevt = nullptr;
// if mds support flow control
if ((header.pro_ver_ & MDS_PROT_VER_MASK) == MDS_PROT_FCTRL) {
  if (header.pro_id_ == MDS_PROT_FCTRL_ID) {
@@ -438,9 +439,10 @@ uint32_t mds_tipc_fctrl_drop_data(uint8_t *buffer, uint16_t len,
  ChunkAck ack;
  ack.Decode(buffer);
  // send to the event thread
-if (m_NCS_IPC_SEND(&mbx_events,
-new Event(Event::Type::kEvtSendChunkAck, id, ack.svc_id_,
-header.mseq_, header.mfrag_, ack.acked_fseq_, ack.chunk_size_),
+pevt = new Event(Event::Type::kEvtSendChunkAck, id, ack.svc_id_,
+header.mseq_, header.mfrag_, ack.acked_fseq_);
+pevt->chunk_size_ = ack.chunk_size_;
+if (m_NCS_IPC_SEND(&mbx_events, pevt,
  NCS_IPC_PRIORITY_HIGH) != NCSCC_RC_SUCCESS) {
m_MDS_LOG_ERR("FCTRL: Failed to send msg to mbx_events, Error[%s]",
strerror(errno));
@@ -453,9 +455,9 @@ uint32_t mds_tipc_fctrl_drop_data(uint8_t *buffer, uint16_t len,
DataMessage data;
data.Decode(buffer);
// send to the event thread
-  if (m_NCS_IPC_SEND(&mbx_events,
-  new Event(Event::Type::kEvtDropData, id, data.svc_id_,
-  header.mseq_, header.mfrag_, header.fseq_),
+  pevt = new Event(Event::Type::kEvtDropData, id, data.svc_id_,
+  header.mseq_, header.mfrag_, header.fseq_);
+  if (m_NCS_IPC_SEND(&mbx_events, pevt,
NCS_IPC_PRIORITY_HIGH) != NCSCC_RC_SUCCESS) {
  m_MDS_LOG_ERR("FCTRL: Failed to send msg to mbx_events, Error[%s]",
  strerror(errno));
@@ -474,6 +476,7 @@ uint32_t mds_tipc_fctrl_rcv_data(uint8_t *buffer, uint16_t len,
  
HeaderMessage header;

header.Decode(buffer);
+  Event* pevt = nullptr;
// if mds support flow control
if ((header.pro_ver_ & MDS_PROT_VER_MASK) == MDS_PROT_FCTRL) {
  if (header.pro_id_ == MDS_PROT_FCTRL_ID) {
@@ -482,9 +485,10 @@ uint32_t mds_tipc_fctrl_rcv_data(uint8_t *buffer, uint16_t len,
  ChunkAck ack;
  ack.Decode(buffer);
  // send to the event thread
-if (m_NCS_IPC_SEND(&mbx_events,
-new Event(Event::Type::kEvtRcvChunkAck, id, ack.svc_id_,
-header.mseq_, header.mfrag_, ack.acked_fseq_, ack.chunk_size_),
+pevt = new Event(Event::Type::kEvtRcvChunkAck, id, ack.svc_id_,
+header.mseq_, header.mfrag_, ack.acked_fseq_);
+pevt->chunk_size_ = ack.chunk_size_;
+if (m_NCS_IPC_SEND(&mbx_events, pevt,
  NCS_IPC_PRIORITY_HIGH) != NCSCC_RC_SUCCESS) {
m_MDS_LOG_ERR("FCTRL: Failed to send msg to mbx_events, Error[%s]",
strerror(errno));
@@ -494,9 +498,9 @@ uint32_t mds_tipc_fctrl_rcv_data(uint8_t *buffer, uint16_t len,
  Nack nack;
  nack.Decode(buffer);
  // send to the event thread
-if (m_NCS_IPC_SEND(&mbx_events,
-new Event(Event::Type::kEvtRcvNack, id, nack.svc_id_,
-header.mseq_, header.mfrag_, nack.nacked_fseq_),
+pevt = new Event(Event::Type::kEvtRcvNack, id, nack.svc_id_,
+header.mseq_, header.mfrag_, nack.nacked_fseq_);
+if

Re: [devel] [PATCH 1/1] mds: add more tests for mds flow control [#3091]

2019-10-14 Thread Gary Lee


Hi Thuan

Looks OK (review only).

Thanks

Gary

On 14/10/19 8:44 pm, thuan.tran wrote:

mdstest for overload
- two senders overload one receiver
- one sender overloads 2 receivers

mdstest for SNA (Serial Number Arithmetic)
- without overload, mds sender gradually sends more than 65535 messages
   and receivers should receive them all
- with overload, mds sender sends a burst of greater than 65535 messages
   and receivers should receive them all

mdstest for #1960 backward compatibility, in order to test the txprob timer
- sender enables, receiver disables
- sender disables, receiver enables
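
For readers unfamiliar with Serial Number Arithmetic, the comparison that
lets a sequence counter wrap past 65535 can be sketched as below. This is an
illustrative Python example, not the MDS implementation: the 16-bit width is
an assumption inferred from the 65535 figure above, and the function names
are made up.

```python
SEQ_BITS = 16                      # assumed width of the sequence number
SEQ_MOD = 1 << SEQ_BITS            # 65536
SEQ_HALF = 1 << (SEQ_BITS - 1)     # 32768


def seq_next(seq, step=1):
    """Advance a sequence number, wrapping around at SEQ_MOD."""
    return (seq + step) % SEQ_MOD


def seq_before(a, b):
    """True if `a` precedes `b` in serial-number order.

    The forward distance from a to b is taken modulo SEQ_MOD; `a` is
    considered earlier when that distance is non-zero and less than half
    of the number space.
    """
    distance = (b - a) % SEQ_MOD
    return 0 < distance < SEQ_HALF
```

With this ordering, 65535 is treated as coming before 0, so a receiver that
expects the next sequence number keeps working across the wrap.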
---
  src/mds/apitest/mdstipc.h |   6 +
  src/mds/apitest/mdstipc_api.c | 480 +-
  2 files changed, 421 insertions(+), 65 deletions(-)

diff --git a/src/mds/apitest/mdstipc.h b/src/mds/apitest/mdstipc.h
index 2bd44b4fa..5fd7b9c6e 100644
--- a/src/mds/apitest/mdstipc.h
+++ b/src/mds/apitest/mdstipc.h
@@ -145,6 +145,12 @@ typedef struct tet_mds_recvd_msg_info {
uint16_t len;
  } TET_MDS_RECVD_MSG_INFO;
  
+typedef struct COUNTER {

+  MDS_DEST fr_dest;
+  uint32_t msg_count;
+  struct COUNTER *next;
+} COUNTER;
+
  /* GLOBAL variables */
  TET_ADEST gl_tet_adest;
  TET_VDEST
diff --git a/src/mds/apitest/mdstipc_api.c b/src/mds/apitest/mdstipc_api.c
index f667d7385..5c0e28ab2 100644
--- a/src/mds/apitest/mdstipc_api.c
+++ b/src/mds/apitest/mdstipc_api.c
@@ -31,6 +31,7 @@
  
  #define MSG_SIZE MDS_DIRECT_BUF_MAXSIZE

  static MDS_CLIENT_MSG_FORMAT_VER gl_set_msg_fmt_ver;
+COUNTER *gl_head_counters = NULL;
  
  MDS_SVC_ID svc_ids[3] = {2006, 2007, 2008};
  
@@ -13105,9 +13106,62 @@ void tet_create_default_PWE_VDEST_tp()

test_validate(FAIL, 0);
  }
  
-void tet_sender(uint32_t msg_count, uint32_t msg_size)

+static void reset_counters(void)
+{
+   COUNTER *tmp = gl_head_counters;
+   while (tmp != NULL) {
+   gl_head_counters = tmp->next;
+   free(tmp);
+   tmp = gl_head_counters;
+   }
+}
+
+static uint32_t increase_counters(MDS_DEST dest)
+{
+   COUNTER *tmp = gl_head_counters;
+   while (tmp != NULL) {
+   if (tmp->fr_dest == dest) {
+   tmp->msg_count++;
+   printf("\nGot %d message from %x\n",
+   tmp->msg_count, dest);
+   return tmp->msg_count;
+   }
+   tmp = tmp->next;
+   }
+   if (tmp == NULL) {
+   COUNTER *new = (COUNTER *)malloc(sizeof(COUNTER));
+   new->fr_dest = dest;
+   new->msg_count = 1;
+   new->next = gl_head_counters;
+   gl_head_counters = new;
+   printf("\nGot %d message from %x\n",
+   new->msg_count, dest);
+   return new->msg_count;
+   }
+   return 0;
+}
+
+static bool verify_counters(uint32_t expect_num)
+{
+   COUNTER *tmp = gl_head_counters;
+   if (tmp == NULL) {
+   printf("\nNo message\n");
+   return false;
+   }
+   while (tmp != NULL) {
+   if (tmp->msg_count != expect_num) {
+   printf("\nGot %d message from %x\n",
+   tmp->msg_count, tmp->fr_dest);
+   return false;
+   }
+   tmp = tmp->next;
+   }
+   return true;
+}
+
+void tet_sender(MDS_SVC_ID svc_id, uint32_t msg_num, uint32_t msg_size,
+   int svc_num, MDS_SVC_ID to_svcids[])
  {
-   int live = 100; // sender live max 100s
TET_MDS_MSG *mesg;
if (msg_size > TET_MSG_SIZE_MIN) {
printf("\nSender: msg_size > TET_MSG_SIZE_MIN\n");
@@ -13117,72 +13171,84 @@ void tet_sender(uint32_t msg_count, uint32_t msg_size)
memset(mesg, 0, sizeof(TET_MDS_MSG));
  
  	printf("\nStarted Sender (pid:%d) svc_id=%d\n",

-   (int)getpid(), NCSMDS_SVC_ID_INTERNAL_MIN);
+   (int)getpid(), svc_id);
if (adest_get_handle() != NCSCC_RC_SUCCESS) {
printf("\n: Sender FAIL to get adest handle\n");
exit(1);
}
  
  	if (mds_service_install(gl_tet_adest.mds_pwe1_hdl,

-   NCSMDS_SVC_ID_INTERNAL_MIN, 1,
+   svc_id, 1,
NCSMDS_SCOPE_NONE, false, false) != NCSCC_RC_SUCCESS) {
printf("\nSender FAIL to install the service\n");
exit(1);
}
  
-	MDS_SVC_ID svcids[] = {NCSMDS_SVC_ID_EXTERNAL_MIN};

if (mds_service_subscribe(
-   gl_tet_adest.mds_pwe1_hdl, NCSMDS_SVC_ID_INTERNAL_MIN,
-   NCSMDS_SCOPE_INTRANODE, 1, svcids) != NCSCC_RC_SUCCESS) {
+   gl_tet_adest.mds_pwe1_hdl, svc_id,
+   NCSMDS_SCOPE_INTRANODE,
+   svc_num, to_svcids) != NCSCC_RC_SUCCESS) {
printf("\nSender

Re: [devel] [PATCH 1/1] osaf: perform handshake in tcp_server in new thread [#3099]

2019-10-10 Thread Gary Lee


Hi

I should have put one more comment in.

Currently, the handshake is done in the equivalent of accept() running 
in the 'main thread'. If a client is malicious or faulty, then no one 
else can connect. But finish_request() is run from the thread created 
for each client.
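
A rough way to see the effect is the Python sketch below. It is not the
tcp_server.py code and does no real TLS or network I/O: FakeRequest and
DemoServer are hypothetical stand-ins, used only to show that with
ThreadingMixIn a blocking handshake placed in finish_request() ties up one
worker thread rather than the caller.

```python
import socketserver
import threading


class FakeRequest:
    """Hypothetical stand-in for an accepted TLS socket (no real I/O)."""

    def __init__(self, gate=None):
        self.gate = gate              # when set, the handshake blocks on it
        self.done = threading.Event()

    def do_handshake(self):
        if self.gate is not None:
            self.gate.wait(timeout=5)


class DemoServer(socketserver.ThreadingMixIn, socketserver.TCPServer):
    daemon_threads = True

    def __init__(self):
        # bind_and_activate=False: the demo never listens on a port.
        super().__init__(("127.0.0.1", 0), socketserver.BaseRequestHandler,
                         bind_and_activate=False)

    def finish_request(self, request, client_address):
        # Called from the per-request thread started by ThreadingMixIn.
        request.do_handshake()
        request.done.set()

    def shutdown_request(self, request):
        pass  # a FakeRequest has nothing to close


def demo():
    server = DemoServer()
    gate = threading.Event()
    stalled, healthy = FakeRequest(gate), FakeRequest()
    server.process_request(stalled, ("client-a", 0))  # returns at once
    server.process_request(healthy, ("client-b", 0))
    healthy_served = healthy.done.wait(timeout=5)
    stalled_pending = not stalled.done.is_set()
    gate.set()                                        # let the slow client finish
    stalled_served = stalled.done.wait(timeout=5)
    server.server_close()
    return healthy_served, stalled_pending, stalled_served
```

Here the healthy client is served while the stalled one is still waiting in
its own thread, which is the behaviour the patch is after.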


Gary

On 11/10/19 2:22 pm, Gary Lee wrote:

---
  src/osaf/consensus/plugins/tcp/tcp_server.py | 7 ++-
  1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/src/osaf/consensus/plugins/tcp/tcp_server.py b/src/osaf/consensus/plugins/tcp/tcp_server.py
index a7f22f2..c10859c 100755
--- a/src/osaf/consensus/plugins/tcp/tcp_server.py
+++ b/src/osaf/consensus/plugins/tcp/tcp_server.py
@@ -73,10 +73,15 @@ class ThreadedRPCServer(ThreadingMixIn,
  certfile=CERTFILE,
  keyfile=KEYFILE,
  cert_reqs=ssl.CERT_NONE,
-ssl_version=ssl.PROTOCOL_TLSv1_2)
+ssl_version=ssl.PROTOCOL_TLSv1_2,
+do_handshake_on_connect=False)
  self.server_bind()
  self.server_activate()
  
+def finish_request(self, request, client_address):

+ request.do_handshake()
+ return SimpleXMLRPCServer.finish_request(self, request, client_address)
+
  
  class Arbitrator(object):

  """ Implementation of a simple arbitrator """





[devel] [PATCH 0/1] Review Request for osaf: perform handshake in tcp_server in new thread [#3099]

2019-10-10 Thread Gary Lee

Summary: osaf: perform handshake in tcp_server in new thread [#3099]
Review request for Ticket(s): 3099
Peer Reviewer(s): Hans, Minh, Thuan
Pull request to: *** LIST THE PERSON WITH PUSH ACCESS HERE ***
Affected branch(es): develop
Development branch: ticket-3099
Base revision: e4c3c0c95644238fc84f31352e8ef289d9820ab4
Personal repository: git://git.code.sf.net/u/userid-2226215/review


Impacted area       Impact y/n
--------------------------------------------
 Docs                    n
 Build system            n
 RPM/packaging           n
 Configuration files     n
 Startup scripts         n
 SAF services            n
 OpenSAF services        n
 Core libraries          n
 Samples                 y
 Tests                   n
 Other                   n


Comments (indicate scope for each "y" above):
-

revision fed332c489eb687982071013a8cb64e1932960e0
Author: Gary Lee 
Date:   Fri, 11 Oct 2019 14:08:50 +1100

osaf: perform handshake in tcp_server in new thread [#3099]



Complete diffstat:
--
 src/osaf/consensus/plugins/tcp/tcp_server.py | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)


Testing Commands:
-
1) Run tcp_server.py manually
2) telnet localhost  and don't enter anything
3) Run tcp.plugin and make sure it receives a response from the server

Testing, Expected Results:
--
As above. Without this patch, Step 3 will not work
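
The failure mode behind these steps can be sketched in Python. If the TLS handshake runs at accept time, a client that connects and sends nothing (the telnet session in step 2) stalls the accept loop. With `do_handshake_on_connect=False` the handshake is deferred to `finish_request`, which `ThreadingMixIn` runs in a per-request thread. The sketch is an illustration only, not the patched file: the class name `DeferredHandshakeServer` and its constructor arguments are hypothetical, and it builds an `ssl.SSLContext` instead of passing `certfile`/`keyfile`/`ssl_version` keywords as the diff does.

```python
import socketserver
import ssl
from xmlrpc.server import SimpleXMLRPCServer


class DeferredHandshakeServer(socketserver.ThreadingMixIn, SimpleXMLRPCServer):
    """XML-RPC server whose TLS handshake runs in the per-request thread.

    Hypothetical name; certfile and keyfile are paths supplied by the caller.
    """

    def __init__(self, addr, certfile, keyfile):
        # Create the listening socket but do not bind yet, so it can be wrapped.
        SimpleXMLRPCServer.__init__(self, addr, logRequests=False,
                                    bind_and_activate=False)
        context = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
        context.load_cert_chain(certfile, keyfile)
        # Accepted sockets inherit do_handshake_on_connect=False, so accept()
        # returns immediately even if the peer never speaks.
        self.socket = context.wrap_socket(self.socket, server_side=True,
                                          do_handshake_on_connect=False)
        self.server_bind()
        self.server_activate()

    def finish_request(self, request, client_address):
        # Called from the worker thread started by ThreadingMixIn, so a
        # stalled handshake blocks only this one connection.
        request.do_handshake()
        return SimpleXMLRPCServer.finish_request(self, request, client_address)
```

A silent client still occupies one worker thread until it disconnects; a socket timeout set before `do_handshake()` would bound that, though the patch itself does not add one.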

Conditions of Submission:
-
Ack from anyone


Arch      Built     Started    Linux distro
-------------------------------------------
mips        n          n
mips64      n          n
x86         n          n
x86_64      y          y
powerpc     n          n
powerpc64   n          n


Reviewer Checklist:
---
[Submitters: make sure that your review doesn't trigger any checkmarks!]


Your checkin has not passed review because (see checked entries):

___ Your RR template is generally incomplete; it has too many blank entries
that need proper data filled in.

___ You have failed to nominate the proper persons for review and push.

___ Your patches do not have proper short+long header

___ You have grammar/spelling in your header that is unacceptable.

___ You have exceeded a sensible line length in your headers/comments/text.

___ You have failed to put in a proper Trac Ticket # into your commits.

___ You have incorrectly put/left internal data in your comments/files
(i.e. internal bug tracking tool IDs, product names etc)

___ You have not given any evidence of testing beyond basic build tests.
Demonstrate some level of runtime or other sanity testing.

___ You have ^M present in some of your files. These have to be removed.

___ You have needlessly changed whitespace or added whitespace crimes
like trailing spaces, or spaces before tabs.

___ You have mixed real technical changes with whitespace and other
cosmetic code cleanup changes. These have to be separate commits.

___ You need to refactor your submission into logical chunks; there is
too much content into a single commit.

___ You have extraneous garbage in your review (merge commits etc)

___ You have giant attachments which should never have been sent;
Instead you should place your content in a public tree to be pulled.

___ You have too many commits attached to an e-mail; resend as threaded
commits, or place in a public tree for a pull.

___ You have resent this content multiple times without a clear indication
of what has changed between each re-send.

___ You have failed to adequately and individually address all of the
comments and change requests that were proposed in the initial review.

___ You have a misconfigured ~/.gitconfig file (i.e. user.name, user.email etc)

___ Your computer have a badly configured date and time; confusing the
the threaded patch review.

___ Your changes affect IPC mechanism, and you don't present any results
for in-service upgradability test.

___ Your changes affect user manual and documentation, your patch series
do not contain the patch that updates the Doxygen manual.




[devel] [PATCH 1/1] osaf: perform handshake in tcp_server in new thread [#3099]

2019-10-10 Thread Gary Lee

---
 src/osaf/consensus/plugins/tcp/tcp_server.py | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/src/osaf/consensus/plugins/tcp/tcp_server.py b/src/osaf/consensus/plugins/tcp/tcp_server.py
index a7f22f2..c10859c 100755
--- a/src/osaf/consensus/plugins/tcp/tcp_server.py
+++ b/src/osaf/consensus/plugins/tcp/tcp_server.py
@@ -73,10 +73,15 @@ class ThreadedRPCServer(ThreadingMixIn,
 certfile=CERTFILE,
 keyfile=KEYFILE,
 cert_reqs=ssl.CERT_NONE,
-ssl_version=ssl.PROTOCOL_TLSv1_2)
+ssl_version=ssl.PROTOCOL_TLSv1_2,
+do_handshake_on_connect=False)
 self.server_bind()
 self.server_activate()
 
+    def finish_request(self, request, client_address):
+        request.do_handshake()
+        return SimpleXMLRPCServer.finish_request(self, request, client_address)
+
 
 class Arbitrator(object):
 """ Implementation of a simple arbitrator """
-- 
2.7.4




[devel] [PATCH 1/1] osaf: return new takeover_request immediately [#3098]

2019-10-09 Thread Gary Lee

If a takeover_request is created just before the active controller
calls 'watch takeover_request', then it's possible that the
active rded instance is not informed of the request.

When 'watch takeover_request' is called, check if there's already
a takeover_request in 'NEW' state and return immediately.
---
 src/osaf/consensus/plugins/etcd3.plugin | 13 +++--
 1 file changed, 11 insertions(+), 2 deletions(-)

diff --git a/src/osaf/consensus/plugins/etcd3.plugin b/src/osaf/consensus/plugins/etcd3.plugin
index d926885..4e09ef6 100644
--- a/src/osaf/consensus/plugins/etcd3.plugin
+++ b/src/osaf/consensus/plugins/etcd3.plugin
@@ -337,13 +337,22 @@ watch() {
   orig_value=$(get "$watch_key")
   result=$?
 
-  if [ "$result" -le "1" ]; then
+  if [ "$result" -le 1 ]; then
+  if [ "$result" -eq 0 ] && [ "$watch_key" == "$takeover_request" ]; then
+state=$(echo $orig_value | awk '{print $4}')
+if [ "$state" == "NEW" ]; then
+  # takeover_request already exists; maybe it was created
+  # while this node was being promoted
+  echo $orig_value
+  return 0
+fi
+  fi
 while true
 do
   sleep $heartbeat_interval
   current_value=$(get "$watch_key")
   result=$?
-  if [ "$result" -gt "1" ]; then
+  if [ "$result" -gt 1 ]; then
 # etcd down?
 if [ "$watch_key" == "$takeover_request" ]; then
   hostname=`cat $node_name_file`
-- 
2.7.4
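
The race that this patch closes can be restated as a small Python sketch. It is a model of the idea, not a port of etcd3.plugin: `watch`, `get`, `takeover_key` and `interval` are hypothetical names, `get` is assumed to return the stored value as a string, and the loop's return-on-change behaviour is an assumption about the parts of the plugin not shown in the diff. The only details taken from the patch are that the state is the fourth whitespace-separated field and that a state of `NEW` is returned immediately.

```python
import time


def watch(get, key, takeover_key="takeover_request", interval=0.1):
    """Return the value of `key` once it changes.

    If `key` is the takeover request and it is already in state NEW when the
    watch starts, return it at once instead of waiting for a further change.
    """
    original = get(key)
    if key == takeover_key:
        fields = original.split()
        # Same test as: state=$(echo $orig_value | awk '{print $4}')
        if len(fields) >= 4 and fields[3] == "NEW":
            return original
    while True:
        time.sleep(interval)
        current = get(key)
        if current != original:
            return current
```

Without the early return, a request written just before the watch began would become the baseline value, and the loop would wait for a change that never comes.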




[devel] [PATCH 0/1] Review Request for osaf: return new takeover_request immediately [#3098]

2019-10-09 Thread Gary Lee

Summary: osaf: return new takeover_request immediately [#3098]
Review request for Ticket(s): 3098
Peer Reviewer(s): Minh, Thuan, Thang, Hans
Pull request to: *** LIST THE PERSON WITH PUSH ACCESS HERE ***
Affected branch(es): develop
Development branch: ticket-3098
Base revision: cafbc5d02c90b57c7c94a7735ce8e002224b3d6b
Personal repository: git://git.code.sf.net/u/userid-2226215/review


Impacted area       Impact y/n

 Docs                    n
 Build system            n
 RPM/packaging           n
 Configuration files     n
 Startup scripts         n
 SAF services            n
 OpenSAF services        n
 Core libraries          n
 Samples                 y
 Tests                   n
 Other                   n


Comments (indicate scope for each "y" above):
-

revision 903ebd435993cce00350c60827e35b15a78ca3c8
Author: Gary Lee 
Date:   Thu, 10 Oct 2019 14:53:41 +1100

osaf: return new takeover_request immediately [#3098]

If a takeover_request is created just before the active controller
calls 'watch takeover_request', then it's possible that the
active rded instance is not informed of the request.

When 'watch takeover_request' is called, check if there's already
a takeover_request in 'NEW' state and return immediately.



Complete diffstat:
--
 src/osaf/consensus/plugins/etcd3.plugin | 13 +++--
 1 file changed, 11 insertions(+), 2 deletions(-)


Testing Commands:
-
*** LIST THE COMMAND LINE TOOLS/STEPS TO TEST YOUR CHANGES ***


Testing, Expected Results:
--
*** PASTE COMMAND OUTPUTS / TEST RESULTS ***


Conditions of Submission:
-
ack from anyone


Arch      Built     Started    Linux distro
-------------------------------------------
mips        n          n
mips64      n          n
x86         n          n
x86_64      y          y
powerpc     n          n
powerpc64   n          n



Re: [devel] [PATCH 1/1] osaf: add tcp arbitrator [#3064]

2019-10-04 Thread Gary Lee

Hi Hans

OK that’s a good idea.

thanks
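
The suggestion quoted below, generating the certificate with a small script instead of shipping one, could look roughly like this in Python. It is a sketch under assumptions: the function names, the default `days` value and the `subject` string are hypothetical, and `-subj` is added only so that `openssl req` does not prompt interactively. The remaining flags are the ones listed in the README part of the patch.

```python
import subprocess


def build_openssl_command(key_path="key.pem", cert_path="certificate.pem",
                          days=365, subject="/CN=example-arbitrator"):
    """Return the argv for a self-signed certificate request.

    days and subject are placeholder defaults, not values from the patch.
    """
    return [
        "openssl", "req",
        "-newkey", "rsa:2048",   # new 2048-bit RSA key
        "-nodes",                # leave the private key unencrypted
        "-keyout", key_path,
        "-x509",                 # emit a self-signed certificate, not a CSR
        "-days", str(days),
        "-subj", subject,        # avoid interactive prompts
        "-out", cert_path,
    ]


def generate_self_signed_cert(**kwargs):
    """Run openssl; raises CalledProcessError if the command fails."""
    subprocess.run(build_openssl_command(**kwargs), check=True)
```

Calling `generate_self_signed_cert()` requires the `openssl` binary on the PATH and writes both files into the current directory.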

> On 4 Oct 2019, at 11:06 pm, Hans Nordebäck  
> wrote:
> 
> Hi Gary, ack, review only. One comment/suggestion can we provide a
> small script that generates the x509 certificate (use e.g. openssl X509
> ... ) instead of including a self signed cert? /BR Hans
>> On Tue, 2019-10-01 at 12:53 +1000, Gary Lee wrote:
>> ---
>> src/osaf/consensus/plugins/tcp/README  |  41 ++
>> src/osaf/consensus/plugins/tcp/certificate.pem |  20 +
>> src/osaf/consensus/plugins/tcp/key.pem |  28 ++
>> src/osaf/consensus/plugins/tcp/tcp.plugin  | 520
>> +
>> src/osaf/consensus/plugins/tcp/tcp_server.py   | 157 
>> 5 files changed, 766 insertions(+)
>> create mode 100644 src/osaf/consensus/plugins/tcp/README
>> create mode 100644 src/osaf/consensus/plugins/tcp/certificate.pem
>> create mode 100644 src/osaf/consensus/plugins/tcp/key.pem
>> create mode 100755 src/osaf/consensus/plugins/tcp/tcp.plugin
>> create mode 100755 src/osaf/consensus/plugins/tcp/tcp_server.py
>> 
>> diff --git a/src/osaf/consensus/plugins/tcp/README
>> b/src/osaf/consensus/plugins/tcp/README
>> new file mode 100644
>> index 000..6f739e8
>> --- /dev/null
>> +++ b/src/osaf/consensus/plugins/tcp/README
>> @@ -0,0 +1,41 @@
>> +TCP arbitrator
>> +
>> +The TCP arbitrator may be useful for deployments where deploying
>> etcd is not
>> +feasible. An example arbitrator is provided to help prevent split
>> brain in
>> +clusters that contain up to 2 system controllers.
>> +
>> +The example arbitrator is a simple python based program that can be
>> deployed on
>> +a single payload or a node external to the cluster.
>> +
>> +Two main pieces of information are stored on the arbitrator: the
>> hostname of the
>> +current active controller and a heartbeat timestamp.
>> +
>> +An active controller sends a heartbeat to the arbitrator every 100ms
>> using TLS
>> +over a persistent TCP connection. It should self-fence if it is
>> unable to
>> +heartbeat, as it is likely to be separated from the arbitrator.
>> +
>> +A candidate active controller must check the existing controller is
>> not
>> +heartbeating before promoting itself active. On a cluster using
>> TIPC,
>> +the timeout value is the TIPC link tolerance timeout. On a TCP based
>> cluster,
>> +the timeout is calculated from FMS_TAKEOVER_REQUEST_VALID_TIME.
>> +
>> +Suggested fmd.conf configuration:
>> +
>> +export FMS_SPLIT_BRAIN_PREVENTION=1
>> +export FMS_KEYVALUE_STORE_PLUGIN_CMD=/full/path/to/tcp.plugin
>> +export FMS_TAKEOVER_PRIORITISE_PARTITION_SIZE=0 (any other setting
>> is ignored)
>> +export FMS_RELAXED_NODE_PROMOTION=1
>> +
>> +The above settings will allow a controller to be elected active
>> during
>> +cluster startup, even if the arbitrator is not yet running.
>> +If the arbitrator becomes temporarily unavailable, the controllers
>> will
>> +remain running if they can see each other. If an active controller
>> becomes
>> +isolated from the standby *and* the arbitrator, it will self-fence
>> and the
>> +standby will become active (if located in the same network partition
>> as
>> +the arbitrator).
>> +
>> +The provided self-signed certificate is an example only, and was
>> generated using:
>> +
>> +openssl req -newkey rsa:2048 -nodes -keyout key.pem -x509 -days
>> 10 -out certificate.pem
>> +
>> +It must be replaced in an actual deployment!!
>> diff --git a/src/osaf/consensus/plugins/tcp/certificate.pem
>> b/src/osaf/consensus/plugins/tcp/certificate.pem
>> new file mode 100644
>> index 000..e0b4993
>> --- /dev/null
>> +++ b/src/osaf/consensus/plugins/tcp/certificate.pem
>> @@ -0,0 +1,20 @@
>> +-BEGIN CERTIFICATE-
>> +MIIDUTCCAjmgAwIBAgIJANrPYThNMllvMA0GCSqGSIb3DQEBCwUAMD4xCzAJBgNV
>> +BAYTAkFVMQ4wDAYDVQQIDAVTdGF0ZTENMAsGA1UEBwwEQ2l0eTEQMA4GA1UECgwH
>> +T3BlblNBRjAgFw0xOTA5MzAwMDMxNTRaGA8yMjkzMDcxNTAwMzE1NFowPjELMAkG
>> +A1UEBhMCQVUxDjAMBgNVBAgMBVN0YXRlMQ0wCwYDVQQHDARDaXR5MRAwDgYDVQQK
>> +DAdPcGVuU0FGMIIBIjANBgkqhkiG9w0BAQEFAAOCAQ8AMIIBCgKCAQEA5pCFKYnS
>> ++pi0gzrRWPRYg1sak9VpNK+MkKbj+m0bptRt/8JvosV62js4q5Da3ldq2AAcEJyf
>> +gd02YZ4HUDdCMgMtlWT1CAx89rNpozRwyj5g+4cfmOqiz7ApeZ9yqltInjG720DT
>> +lam2/R4/00zmFGAqD2ZGPiOY93bjYx+GhtiHcDvpJuZS2Z2vQ/Dd09v6Omhus0rZ
>> +WMrENyfavc7HwFv2z/qi4Hsb/Aa9ZuAXUKp1Q2cvC0XWdRJMdZaZfGUlTfY6X8ar
>> +hSnswHJJKIjBq/0jYpztntOubceOuGVyezxPVXPw5qiBLO7ZyYNgN9IMoF6Rbu9y
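
The arbitrator behaviour described in the README above (it stores the active controller's hostname and a heartbeat timestamp, and a candidate may promote itself only after the current active has stopped heartbeating) can be sketched as follows. This is a minimal in-memory model, not the code in tcp_server.py: `ArbitratorState`, `heartbeat`, `try_promote` and the injectable `clock` are hypothetical names.

```python
import time


class ArbitratorState:
    """Holds the two items the README lists: active hostname and heartbeat time."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self.active = None
        self.last_heartbeat = None

    def heartbeat(self, hostname):
        """Refresh the timestamp; only the current active controller may do so."""
        if self.active != hostname:
            return False
        self.last_heartbeat = self._clock()
        return True

    def try_promote(self, candidate, timeout):
        """Make `candidate` active unless another controller has heartbeated
        within the last `timeout` seconds."""
        now = self._clock()
        if self.active is not None and self.active != candidate:
            if now - self.last_heartbeat < timeout:
                return False
        self.active = candidate
        self.last_heartbeat = now
        return True
```

In the README's terms, `timeout` would be the TIPC link tolerance timeout on a TIPC cluster, or a value derived from FMS_TAKEOVER_REQUEST_VALID_TIME on a TCP cluster. A controller whose `heartbeat` call fails or cannot reach the arbitrator is the one expected to self-fence.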
>> +

[devel] [PATCH 0/1] Review Request for amf: add asserts to problematic areas identified by codechecker [#3077]

2019-10-02 Thread Gary Lee

Summary: amf: add asserts to problematic areas identified by codechecker [#3077]
Review request for Ticket(s): 3077
Peer Reviewer(s): Hans, Minh, Thuan
Pull request to: *** LIST THE PERSON WITH PUSH ACCESS HERE ***
Affected branch(es): develop
Development branch: ticket-3077
Base revision: 05064a1cfd0aeaf824dce7602d535654b3457e30
Personal repository: git://git.code.sf.net/u/userid-2226215/review


Impacted area       Impact y/n

 Docs                    n
 Build system            n
 RPM/packaging           n
 Configuration files     n
 Startup scripts         n
 SAF services            y
 OpenSAF services        n
 Core libraries          n
 Samples                 n
 Tests                   n
 Other                   n


Comments (indicate scope for each "y" above):
-

revision 39c8ca156da2acbaecb83ae76ce7d9bc480a4c64
Author: Gary Lee 
Date:   Thu, 3 Oct 2019 15:07:30 +1000

amf: add asserts to problematic areas identified by codechecker [#3077]



Complete diffstat:
--
 src/amf/amfd/sg_nway_fsm.cc | 2 ++
 src/amf/amfd/sgtype.cc  | 1 +
 src/amf/amfnd/comp.cc   | 2 ++
 src/amf/amfnd/susm.cc   | 1 +
 4 files changed, 6 insertions(+)


Testing Commands:
-
*** LIST THE COMMAND LINE TOOLS/STEPS TO TEST YOUR CHANGES ***


Testing, Expected Results:
--
*** PASTE COMMAND OUTPUTS / TEST RESULTS ***


Conditions of Submission:
-
ack from anyone


Arch      Built     Started    Linux distro
-------------------------------------------
mips        n          n
mips64      n          n
x86         n          n
x86_64      y          y
powerpc     n          n
powerpc64   n          n



Re: [devel] [PATCH 0/1] Review Request for osaf: add tcp arbitrator [#3064]

2019-10-01 Thread Gary Lee

Hi Alex

> I see in the README this is usable for "clusters that contain up to 2
> system controllers." What are the limiting factors for applying it to a
> cluster with more than 2 system controllers (where the others are
> running as spares)?

The TCP arbitrator is intended for use with FMS_RELAXED_NODE_PROMOTION=1.

Otherwise, it becomes a single point of failure.

With this setting, two SCs can remain up if they can see each other.

If roaming SC is enabled, consider the case where two spare SCs become 
isolated in a network partition (partition 2), while existing 
active/standby/arbitrator is in partition 1. We would end up with dual 
actives as the SCs in partition 2 will also become active/standby.

Hope that explains it better.

Gary
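
The dual-active scenario in this reply can be illustrated with a toy decision function. It is a deliberately simplified model of the reasoning in the message, not of the FMS implementation: `may_become_active` and all of its parameters are hypothetical, and "relaxed promotion" is reduced to "a controller may go active without the arbitrator if it can see at least one other SC".

```python
def may_become_active(arbitrator_reachable, arbitrator_grants,
                      visible_peer_scs, relaxed_promotion=True):
    """Toy rule: obey the arbitrator when it is reachable; otherwise fall
    back to 'can I see another SC?' if relaxed promotion is enabled."""
    if arbitrator_reachable:
        return arbitrator_grants
    return relaxed_promotion and visible_peer_scs > 0


# Partition 1: the existing active still reaches the arbitrator.
partition1_active = may_become_active(True, True, visible_peer_scs=1)

# Partition 2: two spare SCs see each other but not the arbitrator.
partition2_active = may_become_active(False, False, visible_peer_scs=1)

# Two-SC cluster: an isolated SC sees no peer and no arbitrator.
isolated_sc_active = may_become_active(False, False, visible_peer_scs=0)
```

Under this model both partitions end up with an active controller once spares exist, whereas with only two SCs an isolated controller has no peer to fall back on.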

On 1/10/19 12:53 pm, Gary Lee wrote:

Summary: osaf: add tcp arbitrator [#3064]
Review request for Ticket(s): 3064
Peer Reviewer(s): Minh, Hans, AndersW
Pull request to: *** LIST THE PERSON WITH PUSH ACCESS HERE ***
Affected branch(es): develop
Development branch: ticket-3064
Base revision: 46e9e0f310a6c21dbc89a9ffd8bee26829342c0c
Personal repository: git://git.code.sf.net/u/userid-2226215/review

Impacted area       Impact y/n

  Docs                    n
  Build system            n
  RPM/packaging           n
  Configuration files     n
  Startup scripts         n
  SAF services            n
  OpenSAF services        y
  Core libraries          n
  Samples                 n
  Tests                   n
  Other                   n

Comments (indicate scope for each "y" above):
-

revision feea45602df54671c8e769f2e234b03ad6dcdaeb
Author: Gary Lee 
Date:   Tue, 1 Oct 2019 12:47:13 +1000

osaf: add tcp arbitrator [#3064]

Added Files:

  src/osaf/consensus/plugins/tcp/certificate.pem
  src/osaf/consensus/plugins/tcp/key.pem
  src/osaf/consensus/plugins/tcp/README
  src/osaf/consensus/plugins/tcp/tcp.plugin
  src/osaf/consensus/plugins/tcp/tcp_server.py

Complete diffstat:
--
  src/osaf/consensus/plugins/tcp/README  |  41 ++
  src/osaf/consensus/plugins/tcp/certificate.pem |  20 +
  src/osaf/consensus/plugins/tcp/key.pem |  28 ++
  src/osaf/consensus/plugins/tcp/tcp.plugin  | 520 +
  src/osaf/consensus/plugins/tcp/tcp_server.py   | 157 
  5 files changed, 766 insertions(+)

Testing Commands:
-
*** LIST THE COMMAND LINE TOOLS/STEPS TO TEST YOUR CHANGES ***

Testing, Expected Results:
--
*** PASTE COMMAND OUTPUTS / TEST RESULTS ***

Conditions of Submission:
-
ack from anyone

Arch      Built     Started    Linux distro
-------------------------------------------
mips        n          n
mips64      n          n
x86         n          n
x86_64      y          y
powerpc     n          n
powerpc64   n          n


[devel] [PATCH 0/1] Review Request for osaf: add tcp arbitrator [#3064]

2019-09-30 Thread Gary Lee

Summary: osaf: add tcp arbitrator [#3064]
Review request for Ticket(s): 3064
Peer Reviewer(s): Minh, Hans, AndersW
Pull request to: *** LIST THE PERSON WITH PUSH ACCESS HERE ***
Affected branch(es): develop
Development branch: ticket-3064
Base revision: 46e9e0f310a6c21dbc89a9ffd8bee26829342c0c
Personal repository: git://git.code.sf.net/u/userid-2226215/review


Impacted area       Impact y/n

 Docs                    n
 Build system            n
 RPM/packaging           n
 Configuration files     n
 Startup scripts         n
 SAF services            n
 OpenSAF services        y
 Core libraries          n
 Samples                 n
 Tests                   n
 Other                   n


Comments (indicate scope for each "y" above):
-

revision feea45602df54671c8e769f2e234b03ad6dcdaeb
Author: Gary Lee 
Date:   Tue, 1 Oct 2019 12:47:13 +1000

osaf: add tcp arbitrator [#3064]



Added Files:

 src/osaf/consensus/plugins/tcp/certificate.pem
 src/osaf/consensus/plugins/tcp/key.pem
 src/osaf/consensus/plugins/tcp/README
 src/osaf/consensus/plugins/tcp/tcp.plugin
 src/osaf/consensus/plugins/tcp/tcp_server.py


Complete diffstat:
--
 src/osaf/consensus/plugins/tcp/README  |  41 ++
 src/osaf/consensus/plugins/tcp/certificate.pem |  20 +
 src/osaf/consensus/plugins/tcp/key.pem |  28 ++
 src/osaf/consensus/plugins/tcp/tcp.plugin  | 520 +
 src/osaf/consensus/plugins/tcp/tcp_server.py   | 157 
 5 files changed, 766 insertions(+)


Testing Commands:
-
*** LIST THE COMMAND LINE TOOLS/STEPS TO TEST YOUR CHANGES ***


Testing, Expected Results:
--
*** PASTE COMMAND OUTPUTS / TEST RESULTS ***


Conditions of Submission:
-
ack from anyone


Arch      Built     Started    Linux distro
-------------------------------------------
mips        n          n
mips64      n          n
x86         n          n
x86_64      y          y
powerpc     n          n
powerpc64   n          n



Re: [devel] [PATCH 1/1] amfd: correct handling complete/apply callback on standby sc [#3082]

2019-09-16 Thread Gary Lee


Hi Thang

ack (review only)

Thanks

Gary

On 16/9/19 4:44 pm, thang.d.nguyen wrote:

While the standby SC is coming up, AMF config objects may be deleted on
the active SC. This causes a NOT_EXIST error on the standby node.
AMFD on the standby should ignore this error in that case.
---
  src/amf/amfd/app.cc| 29 -
  src/amf/amfd/comp.cc   | 18 +++---
  src/amf/amfd/compcstype.cc | 14 ++
  src/amf/amfd/csi.cc| 24 ++--
  src/amf/amfd/nodegroup.cc  |  7 ---
  src/amf/amfd/sg.cc | 32 ++--
  src/amf/amfd/sgtype.cc | 11 +++
  src/amf/amfd/si.cc | 29 ++---
  src/amf/amfd/su.cc | 35 ---
  src/amf/amfd/sutype.cc | 12 
  10 files changed, 162 insertions(+), 49 deletions(-)

diff --git a/src/amf/amfd/app.cc b/src/amf/amfd/app.cc
index 67e5e3e9d..17a259199 100644
--- a/src/amf/amfd/app.cc
+++ b/src/amf/amfd/app.cc
@@ -296,6 +296,11 @@ static void app_ccb_apply_cb(CcbUtilOperationData_t 
*opdata) {
  case CCBUTIL_MODIFY: {
const SaImmAttrModificationT_2 *attr_mod;
app = app_db->find(Amf::to_string(&opdata->objectName));
+  if (app == nullptr && avd_cb->is_active() == false) {
+LOG_WA("App modify apply (STDBY): app does not exist");
+break;
+  }
+  assert(app != nullptr);
int i = 0;
while ((attr_mod = opdata->param.modify.attrMods[i++]) != nullptr) {
  const SaImmAttrValuesT_2 *attribute = &attr_mod->modAttr;
@@ -448,11 +453,12 @@ SaAisErrorT avd_app_config_get(void) {
searchParam.searchOneAttr.attrValueType = SA_IMM_ATTR_SASTRINGT;
searchParam.searchOneAttr.attrValue = 
  
-  if (immutil_saImmOmSearchInitialize_2(

+  if ((rc = immutil_saImmOmSearchInitialize_2(
avd_cb->immOmHandle, nullptr, SA_IMM_SUBTREE,
SA_IMM_SEARCH_ONE_ATTR | SA_IMM_SEARCH_GET_SOME_ATTR, ,
-  configAttributes, ) != SA_AIS_OK) {
-LOG_ER("%s: saImmOmSearchInitialize_2 failed: %u", __FUNCTION__, error);
+  configAttributes, )) != SA_AIS_OK) {
+LOG_ER("%s: saImmOmSearchInitialize_2 failed: %u", __FUNCTION__, rc);
+error = rc;
  goto done1;
}
  
@@ -468,9 +474,22 @@ SaAisErrorT avd_app_config_get(void) {
  
  app_add_to_model(app);
  
-if (avd_sg_config_get(Amf::to_string(), app) != SA_AIS_OK) goto done2;

+if ((rc = avd_sg_config_get(Amf::to_string(), app)) != SA_AIS_OK) {
+  if ((rc == SA_AIS_ERR_NOT_EXIST) && (avd_cb->is_active() == false)) {
+avd_app_delete(app);
+continue;
+  } else {
+goto done2;
+  }
+}
  
-if (avd_si_config_get(app) != SA_AIS_OK) goto done2;

+if ((rc = avd_si_config_get(app)) != SA_AIS_OK) {
+  if ((rc == SA_AIS_ERR_NOT_EXIST) && (avd_cb->is_active() == false)) {
+avd_app_delete(app);
+  } else {
+goto done2;
+  }
+}
}
  
if (rc == SA_AIS_ERR_NOT_EXIST) {

diff --git a/src/amf/amfd/comp.cc b/src/amf/amfd/comp.cc
index 0ff365e55..7e46584db 100644
--- a/src/amf/amfd/comp.cc
+++ b/src/amf/amfd/comp.cc
@@ -1509,6 +1509,7 @@ SaAisErrorT avd_comp_config_get(const std::string 
_name, AVD_SU *su) {
 SA_IMM_SEARCH_ONE_ATTR | SA_IMM_SEARCH_GET_SOME_ATTR, ,
 configAttributes, )) != SA_AIS_OK) {
  LOG_ER("%s: saImmOmSearchInitialize_2 failed: %u", __FUNCTION__, rc);
+error = rc;
  goto done1;
}
  
@@ -1524,9 +1525,15 @@ SaAisErrorT avd_comp_config_get(const std::string _name, AVD_SU *su) {

  num_of_comp_in_su++;
  comp_add_to_model(comp);
  
-if (avd_compcstype_config_get(Amf::to_string(_name), comp) !=

-SA_AIS_OK)
-  goto done2;
+if ((rc = avd_compcstype_config_get(Amf::to_string(_name), comp)) !=
+SA_AIS_OK) {
+  if ((rc == SA_AIS_ERR_NOT_EXIST) && (avd_cb->is_active() == false)) {
+avd_comp_delete(comp);
+num_of_comp_in_su--;
+  } else {
+goto done2;
+  }
+}
}
  
/* If there are no component in the SU, we treat it as invalid configuration.

@@ -1695,6 +1702,10 @@ static SaAisErrorT 
ccb_completed_modify_hdlr(CcbUtilOperationData_t *opdata) {
TRACE_ENTER();
  
comp = comp_db->find(Amf::to_string(&opdata->objectName));

+  if (comp == nullptr && avd_cb->is_active() == false) {
+LOG_WA("Comp modify completed (STDBY): comp does not exist");
+return SA_AIS_OK;
+  }
  
while ((attr_mod = opdata->param.modify.attrMods[i++]) != nullptr) {

  const SaImmAttrValuesT_2 *attribute = &attr_mod->modAttr;
@@ -2479,6 +2490,7 @@ void comp_ccb_apply_delete_hdlr(struct 
CcbUtilOperationData *opdata) {
  
AVD_COMP *comp = comp_db->find(Amf::to_string(&opdata->objectName));

if (comp == nullptr && avd_cb->is_active() == false) {
+LOG_WA("Comp modify apply (STDBY): comp does not exist");
  return;
}
/* comp should be found in the database even if it was
diff --git
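
The pattern repeated across the hunks above (on the standby, treat NOT_EXIST while reading child configuration as "the object was deleted on the active in the meantime", drop the parent and carry on; on the active, keep treating it as an error) can be summarised in a short Python sketch. It models the control flow only: `NotExistError`, `load_apps`, `load_children` and `is_active` are hypothetical stand-ins for the AMFD functions, not bindings to them.

```python
class NotExistError(Exception):
    """Stand-in for an SA_AIS_ERR_NOT_EXIST return code."""


def load_apps(app_names, load_children, is_active):
    """Load each app's children and return the names that were kept.

    On the standby a missing object is skipped; on the active the error
    propagates, as in the 'goto done2' branches of the patch.
    """
    kept = []
    for name in app_names:
        try:
            load_children(name)
        except NotExistError:
            if is_active:
                raise
            # Standby: the object vanished during startup, so drop this app.
            continue
        kept.append(name)
    return kept
```

The sketch mirrors the `avd_app_delete(app); continue;` branch in app.cc; the comp.cc hunk applies the same idea one level down and also decrements its component counter.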

[devel] [PATCH 0/1] Review Request for amfd: fix coredump during downgrade if delayed failover is enabled V2 [#3078]

2019-09-12 Thread Gary Lee

Summary: amfd: fix coredump during downgrade if delayed failover is enabled [#3078]
Review request for Ticket(s): 3078
Peer Reviewer(s): Minh
Pull request to: *** LIST THE PERSON WITH PUSH ACCESS HERE ***
Affected branch(es): develop
Development branch: ticket-3078
Base revision: 4ac5b9921c64657900a029774636a00de41d8232
Personal repository: git://git.code.sf.net/u/userid-2226215/review


Impacted area       Impact y/n

 Docs                    n
 Build system            n
 RPM/packaging           n
 Configuration files     n
 Startup scripts         n
 SAF services            y
 OpenSAF services        n
 Core libraries          n
 Samples                 n
 Tests                   n
 Other                   n


Comments (indicate scope for each "y" above):
-

revision 4a13618129f61b3a24502722d8c7b84bb465639e
Author: Gary Lee 
Date:   Thu, 12 Sep 2019 17:17:51 +1000

amfd: fix coredump during downgrade if delayed failover is enabled [#3078]

If delayed failover is enabled, and a downgrade to a version without #3060
occurs, then the standby running a newer version with #3060 may complain
about an out of sync error during warm sync.



Complete diffstat:
------------------
 src/amf/amfd/ckpt_dec.cc | 23 +++
 1 file changed, 19 insertions(+), 4 deletions(-)


Testing Commands:
-----------------
*** LIST THE COMMAND LINE TOOLS/STEPS TO TEST YOUR CHANGES ***


Testing, Expected Results:
--------------------------
*** PASTE COMMAND OUTPUTS / TEST RESULTS ***


Conditions of Submission:
-------------------------
*** HOW MANY DAYS BEFORE PUSHING, CONSENSUS ETC ***


Arch      Built     Started    Linux distro
-------------------------------------------
mips        n          n
mips64      n          n
x86         n          n
x86_64      n          n
powerpc     n          n
powerpc64   n          n


Reviewer Checklist:
-------------------
[Submitters: make sure that your review doesn't trigger any checkmarks!]


Your checkin has not passed review because (see checked entries):

___ Your RR template is generally incomplete; it has too many blank entries
that need proper data filled in.

___ You have failed to nominate the proper persons for review and push.

___ Your patches do not have proper short+long header

___ You have grammar/spelling in your header that is unacceptable.

___ You have exceeded a sensible line length in your headers/comments/text.

___ You have failed to put in a proper Trac Ticket # into your commits.

___ You have incorrectly put/left internal data in your comments/files
(i.e. internal bug tracking tool IDs, product names etc)

___ You have not given any evidence of testing beyond basic build tests.
Demonstrate some level of runtime or other sanity testing.

___ You have ^M present in some of your files. These have to be removed.

___ You have needlessly changed whitespace or added whitespace crimes
like trailing spaces, or spaces before tabs.

___ You have mixed real technical changes with whitespace and other
cosmetic code cleanup changes. These have to be separate commits.

___ You need to refactor your submission into logical chunks; there is
too much content in a single commit.

___ You have extraneous garbage in your review (merge commits etc)

___ You have giant attachments which should never have been sent;
Instead you should place your content in a public tree to be pulled.

___ You have too many commits attached to an e-mail; resend as threaded
commits, or place in a public tree for a pull.

___ You have resent this content multiple times without a clear indication
of what has changed between each re-send.

___ You have failed to adequately and individually address all of the
comments and change requests that were proposed in the initial review.

___ You have a misconfigured ~/.gitconfig file (i.e. user.name, user.email etc)

___ Your computer has a badly configured date and time, confusing the
threaded patch review.

___ Your changes affect IPC mechanism, and you don't present any results
for in-service upgradability test.

___ Your changes affect user manual and documentation, but your patch series
does not contain the patch that updates the Doxygen manual.




[devel] [PATCH 1/1] amfd: fix coredump during downgrade if delayed failover is enabled [#3078]

2019-09-12 Thread Gary Lee

If delayed failover is enabled, and a downgrade to a version without #3060
occurs, then the standby running a newer version with #3060 may complain
about an out of sync error during warm sync.
---
 src/amf/amfd/ckpt_dec.cc | 23 +++
 1 file changed, 19 insertions(+), 4 deletions(-)

diff --git a/src/amf/amfd/ckpt_dec.cc b/src/amf/amfd/ckpt_dec.cc
index 6288b4f..75213f8 100644
--- a/src/amf/amfd/ckpt_dec.cc
+++ b/src/amf/amfd/ckpt_dec.cc
@@ -2721,10 +2721,25 @@ uint32_t avd_dec_warm_sync_rsp(AVD_CL_CB *cb, NCS_MBCSV_CB_DEC *dec) {
 if (updt_cnt->ng_updt != cb->async_updt_cnt.ng_updt)
   LOG_ER("ng_updt counters mismatch: Active: %u Standby: %u",
  updt_cnt->ng_updt, cb->async_updt_cnt.ng_updt);
-if (updt_cnt->failover_updt != cb->async_updt_cnt.failover_updt)
-  LOG_ER("failover_updt counters mismatch: Active: %u Standby: %u",
- updt_cnt->failover_updt, cb->async_updt_cnt.failover_updt);
-
+if (updt_cnt->failover_updt != cb->async_updt_cnt.failover_updt) {
+  if (dec->i_peer_version >= AVD_MBCSV_SUB_PART_VERSION_10) {
+LOG_ER("failover_updt counters mismatch: Active: %u Standby: %u",
+   updt_cnt->failover_updt, cb->async_updt_cnt.failover_updt);
+  } else {
+// Versions before 10 did not support failover_updt
+// After a downgrade scenario, where the active is < v10
+// and this node is >= v10, then there will be failover_updt mismatch
+// If so, just set the value to what's on the older active
+cb->async_updt_cnt.failover_updt = updt_cnt->failover_updt;
+
+// check again
+if (0 == memcmp(updt_cnt, &cb->async_updt_cnt,
+sizeof(AVSV_ASYNC_UPDT_CNT))) {
+  cb->stby_sync_state = AVD_STBY_IN_SYNC;
+  return status;
+}
+  }
+}
 LOG_ER("Out of sync detected in warm sync response, exiting");
 osafassert(0);
 
-- 
2.7.4
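
For readers following the thread, the reconciliation step in this patch can be sketched outside of amfd. This is a minimal sketch under stated assumptions: `UpdateCounters` is a hypothetical two-field stand-in for `AVSV_ASYNC_UPDT_CNT` (the real structure has many more counters), `kFirstVersionWithFailoverUpdt` stands in for `AVD_MBCSV_SUB_PART_VERSION_10`, and `reconcile_warm_sync` is an invented name. It shows why the whole counter block is compared again after adopting the active's `failover_updt` value, rather than returning immediately.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical, trimmed-down counter block (two uint32_t, so no padding).
struct UpdateCounters {
  uint32_t ng_updt;
  uint32_t failover_updt;
};

// Hypothetical constant standing in for AVD_MBCSV_SUB_PART_VERSION_10.
constexpr uint16_t kFirstVersionWithFailoverUpdt = 10;

// Returns true if the standby can be treated as in sync with the active.
// `local` is updated in place when the peer is too old to support
// failover_updt, mirroring the "adopt the active's value" step in the patch.
bool reconcile_warm_sync(const UpdateCounters &active, UpdateCounters &local,
                         uint16_t peer_version) {
  if (std::memcmp(&active, &local, sizeof(UpdateCounters)) == 0) return true;

  if (active.failover_updt != local.failover_updt &&
      peer_version < kFirstVersionWithFailoverUpdt) {
    // An older active does not support this counter, so a difference here
    // is expected after a downgrade and is not a real sync error.
    local.failover_updt = active.failover_updt;
    // Compare everything again: another counter may still genuinely differ.
    return std::memcmp(&active, &local, sizeof(UpdateCounters)) == 0;
  }
  return false;
}
```

With an old peer and only `failover_updt` differing the function reports in sync; if `ng_updt` also differs it still reports out of sync, which is the case an unconditional early return would miss.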




[devel] [PATCH 1/1] amfd: fix coredump during downgrade if delayed failover is enabled [#3078]

2019-09-12 Thread Gary Lee

If delayed failover is enabled, and a downgrade to a version without #3060
occurs, then the standby running a newer version with #3060 may complain
about an out of sync error during warm sync.
---
 src/amf/amfd/ckpt_dec.cc | 23 +++
 1 file changed, 19 insertions(+), 4 deletions(-)

diff --git a/src/amf/amfd/ckpt_dec.cc b/src/amf/amfd/ckpt_dec.cc
index 6288b4f..5d4b3f5 100644
--- a/src/amf/amfd/ckpt_dec.cc
+++ b/src/amf/amfd/ckpt_dec.cc
@@ -2721,10 +2721,25 @@ uint32_t avd_dec_warm_sync_rsp(AVD_CL_CB *cb, NCS_MBCSV_CB_DEC *dec) {
 if (updt_cnt->ng_updt != cb->async_updt_cnt.ng_updt)
   LOG_ER("ng_updt counters mismatch: Active: %u Standby: %u",
  updt_cnt->ng_updt, cb->async_updt_cnt.ng_updt);
-if (updt_cnt->failover_updt != cb->async_updt_cnt.failover_updt)
-  LOG_ER("failover_updt counters mismatch: Active: %u Standby: %u",
- updt_cnt->failover_updt, cb->async_updt_cnt.failover_updt);
-
+if (updt_cnt->failover_updt != cb->async_updt_cnt.failover_updt) {
+  if (dec->i_peer_version >= AVD_MBCSV_SUB_PART_VERSION_10) {
+LOG_ER("failover_updt counters mismatch: Active: %u Standby: %u",
+   updt_cnt->failover_updt, cb->async_updt_cnt.failover_updt);
+  } else {
+// Versions before 10 did not support failover_updt
+// After a downupgrade scenario, where the active is < v10
+// and this node is >= v10, then there will be failover_updt mismatch
+// If so, just set the value to what's on the older active
+cb->async_updt_cnt.failover_updt = updt_cnt->failover_updt;
+
+// check again
+if (0 == memcmp(updt_cnt, &cb->async_updt_cnt,
+sizeof(AVSV_ASYNC_UPDT_CNT))) {
+  cb->stby_sync_state = AVD_STBY_IN_SYNC;
+  return status;
+}
+  }
+}
 LOG_ER("Out of sync detected in warm sync response, exiting");
 osafassert(0);
 
-- 
2.7.4




[devel] [PATCH 0/1] Review Request for amfd: fix coredump during downgrade if delayed failover is enabled V2 [#3078]

2019-09-12 Thread Gary Lee

Summary: amfd: fix coredump during downgrade if delayed failover is enabled [#3078]
Review request for Ticket(s): 3078
Peer Reviewer(s): Minh
Pull request to: *** LIST THE PERSON WITH PUSH ACCESS HERE ***
Affected branch(es): develop
Development branch: ticket-3078
Base revision: 4ac5b9921c64657900a029774636a00de41d8232
Personal repository: git://git.code.sf.net/u/userid-2226215/review


Impacted area       Impact y/n
--------------------------------
 Docs                    n
 Build system            n
 RPM/packaging           n
 Configuration files     n
 Startup scripts         n
 SAF services            y
 OpenSAF services        n
 Core libraries          n
 Samples                 n
 Tests                   n
 Other                   n


Comments (indicate scope for each "y" above):
---------------------------------------------

revision c6c9d6b8efcd9c8b992b82621bbf7ea8f53865a1
Author: Gary Lee 
Date:   Thu, 12 Sep 2019 17:08:56 +1000

amfd: fix coredump during downgrade if delayed failover is enabled [#3078]

If delayed failover is enabled, and a downgrade to a version without #3060
occurs, then the standby running a newer version with #3060 may complain
about an out of sync error during warm sync.



Complete diffstat:
------------------
 src/amf/amfd/ckpt_dec.cc | 23 +++
 1 file changed, 19 insertions(+), 4 deletions(-)


Testing Commands:
-----------------
*** LIST THE COMMAND LINE TOOLS/STEPS TO TEST YOUR CHANGES ***


Testing, Expected Results:
--------------------------
*** PASTE COMMAND OUTPUTS / TEST RESULTS ***


Conditions of Submission:
-------------------------
*** HOW MANY DAYS BEFORE PUSHING, CONSENSUS ETC ***


Arch      Built     Started    Linux distro
-------------------------------------------
mips        n          n
mips64      n          n
x86         n          n
x86_64      y          y
powerpc     n          n
powerpc64   n          n



Re: [devel] [PATCH 3/9] mds: Add implementation for TIPC buffer overflow solution [#1960]

2019-09-10 Thread Gary Lee


Please ignore the Encode/Decode comment.

On 10/9/19 6:02 pm, Gary Lee wrote:

Hi Minh & Thuan

Some minor comments marked with [GL].

On 14/8/19 4:38 pm, Minh Chau wrote:

This is a collaborative patch of two participants: Thuan, Minh.

Main changes:
- Add mds_tipc_fctrl_intf.h, mds_tipc_fctrl_intf.cc: These two files
introduce new functions which are called in mds_dt_tipc.c if the flow
control is enabled
- Add mds_tipc_fctrl_portid.h, mds_tipc_fctrl_portid.cc: These files
implement the tipc portid instance, which supports the sliding window and
the mds msg queue
- Add mds_tipc_fctrl_msg.h, mds_tipc_fctrl_msg.cc: These files define
the event and messages which are used for this solution.
---
  src/mds/Makefile.am              |  10 +-
  src/mds/mds_dt.h                 |   8 +-
  src/mds/mds_dt_tipc.c            | 188 +---
  src/mds/mds_tipc_fctrl_intf.cc   | 376 +++
  src/mds/mds_tipc_fctrl_intf.h    |  47 +
  src/mds/mds_tipc_fctrl_msg.cc    | 142 +++
  src/mds/mds_tipc_fctrl_msg.h     | 129 ++
  src/mds/mds_tipc_fctrl_portid.cc | 261 +++
  src/mds/mds_tipc_fctrl_portid.h  |  87 +
  9 files changed, 1184 insertions(+), 64 deletions(-)
  create mode 100644 src/mds/mds_tipc_fctrl_intf.cc
  create mode 100644 src/mds/mds_tipc_fctrl_intf.h
  create mode 100644 src/mds/mds_tipc_fctrl_msg.cc
  create mode 100644 src/mds/mds_tipc_fctrl_msg.h
  create mode 100644 src/mds/mds_tipc_fctrl_portid.cc
  create mode 100644 src/mds/mds_tipc_fctrl_portid.h

diff --git a/src/mds/Makefile.am b/src/mds/Makefile.am
index 2d7b652..d849e8f 100644
--- a/src/mds/Makefile.am
+++ b/src/mds/Makefile.am
@@ -48,10 +48,16 @@ lib_libopensaf_core_la_SOURCES += \
  if ENABLE_TIPC_TRANSPORT
  noinst_HEADERS += src/mds/mds_dt_tipc.h \
  src/mds/mds_tipc_recvq_stats.h \
-    src/mds/mds_tipc_recvq_stats_impl.h
+    src/mds/mds_tipc_recvq_stats_impl.h \
+    src/mds/mds_tipc_fctrl_intf.h \
+    src/mds/mds_tipc_fctrl_portid.h \
+    src/mds/mds_tipc_fctrl_msg.h
  lib_libopensaf_core_la_SOURCES += src/mds/mds_dt_tipc.c \
  src/mds/mds_tipc_recvq_stats.cc \
-    src/mds/mds_tipc_recvq_stats_impl.cc
+    src/mds/mds_tipc_recvq_stats_impl.cc \
+    src/mds/mds_tipc_fctrl_intf.cc \
+    src/mds/mds_tipc_fctrl_portid.cc \
+    src/mds/mds_tipc_fctrl_msg.cc
  endif
    if ENABLE_TESTS
diff --git a/src/mds/mds_dt.h b/src/mds/mds_dt.h
index b645bb4..d9e8633 100644
--- a/src/mds/mds_dt.h
+++ b/src/mds/mds_dt.h
@@ -162,7 +162,7 @@ uint32_t mdtm_del_from_ref_tbl(MDS_SUBTN_REF_VAL ref);
  uint32_t mds_tmr_mailbox_processing(void);
  uint32_t mdtm_get_from_ref_tbl(MDS_SUBTN_REF_VAL ref, MDS_SVC_HDL *svc_hdl);
  uint32_t mdtm_add_frag_hdr(uint8_t *buf_ptr, uint16_t len, uint32_t seq_num,
-                   uint16_t frag_byte);
+                   uint16_t frag_byte, uint16_t fctrl_seq_num);
  uint32_t mdtm_free_reassem_msg_mem(MDS_ENCODED_MSG *msg);
  uint32_t mdtm_process_recv_data(uint8_t *buf, uint16_t len, uint64_t tipc_id,
                  uint32_t *buff_dump);
@@ -240,9 +240,13 @@ bool mdtm_mailbox_mbx_cleanup(NCSCONTEXT arg, NCSCONTEXT msg);
  
  #define MDS_PROT 0xA0
  #define MDS_VERSION 0x08
-#define MDS_PROT_VER_MASK (MDS_PROT | MDS_VERSION)
+#define MDS_PROT_VER_MASK 0xFC
  #define MDTM_PRI_MASK 0x3
  
+/* MDS protocol/version for flow control */
+#define MDS_PROT_FCTRL (0xB0 | MDS_VERSION)
+#define MDS_PROT_FCTRL_ID 0x00AC13F5
+
  /* Added for the subscription changes */
  #define MDS_NCS_CHASSIS_ID (m_NCS_GET_NODE_ID & 0x00ff)
  #define MDS_TIPC_COMMON_ID 0x01001000
diff --git a/src/mds/mds_dt_tipc.c b/src/mds/mds_dt_tipc.c
index 86b52bb..fef1c50 100644
--- a/src/mds/mds_dt_tipc.c
+++ b/src/mds/mds_dt_tipc.c
@@ -47,6 +47,7 @@
  #include "mds_dt_tipc.h"
  #include "mds_dt_tcp_disc.h"
  #include "mds_core.h"
+#include "mds_tipc_fctrl_intf.h"
  #include "mds_tipc_recvq_stats.h"
  #include "base/osaf_utility.h"
  #include "base/osaf_poll.h"
@@ -165,20 +166,22 @@ NCS_PATRICIA_TREE mdtm_reassembly_list;
  uint32_t mdtm_global_frag_num;
  
  const unsigned int MAX_RECV_THRESHOLD = 30;
+uint8_t gl_mds_pro_ver = MDS_PROT | MDS_VERSION;
  
-static bool get_tipc_port_id(int sock, uint32_t* port_id) {
+static bool get_tipc_port_id(int sock, struct tipc_portid* port_id) {
      struct sockaddr_tipc addr;
      socklen_t sz = sizeof(addr);
  
      memset(&addr, 0, sizeof(addr));
-    *port_id = 0;
+    port_id->node = 0;
+    port_id->ref = 0;
      if (0 > getsockname(sock, (struct sockaddr *)&addr, &sz)) {
          syslog(LOG_ERR, "MDTM:TIPC Failed to get socket name, err: %s",
                 strerror(errno));
          return false;
      }
  
-    *port_id = addr.addr.id.ref;
+    *port_id = addr.addr.id;
      return true;
  }
  @@ -240,12 +243,13 @@ uint32_t mdtm_tipc_init(NODE_ID
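
A note on the `MDS_PROT_VER_MASK` change in the mds_dt.h hunk quoted above, as a minimal sketch. The constant values are copied from the hunk; the `classify` and `priority` helpers and the `Kind` enum are hypothetical, and how mds_dt_tipc.c really applies the mask is not visible in the quoted part of the patch. The arithmetic only shows that the old mask value (0xA8) cannot tell the flow control byte 0xB8 apart from the data byte 0xA8, while 0xFC can.

```cpp
#include <cstdint>

// Values copied from the mds_dt.h hunk.
constexpr uint8_t kMdsProt = 0xA0;
constexpr uint8_t kMdsVersion = 0x08;
constexpr uint8_t kMdsProtFctrl = 0xB0 | kMdsVersion;        // 0xB8
constexpr uint8_t kOldProtVerMask = kMdsProt | kMdsVersion;  // 0xA8
constexpr uint8_t kNewProtVerMask = 0xFC;
constexpr uint8_t kPriMask = 0x3;

// Hypothetical classification of a protocol/version byte.
enum class Kind { kData, kFlowControl, kUnknown };

constexpr Kind classify(uint8_t byte, uint8_t mask) {
  // Keep only the protocol/version bits selected by the mask.
  const uint8_t prot_ver = static_cast<uint8_t>(byte & mask);
  if (prot_ver == kMdsProtFctrl) return Kind::kFlowControl;
  if (prot_ver == (kMdsProt | kMdsVersion)) return Kind::kData;
  return Kind::kUnknown;
}

// The two low bits are left for the priority, as MDTM_PRI_MASK suggests.
constexpr uint8_t priority(uint8_t byte) {
  return static_cast<uint8_t>(byte & kPriMask);
}
```

Under the old mask, 0xB8 & 0xA8 gives 0xA8, so a flow control byte would be classified as ordinary data; under 0xFC the upper six bits are preserved and the two values stay distinct.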

Re: [devel] [PATCH 3/9] mds: Add implementation for TIPC buffer overflow solution [#1960]

2019-09-10 Thread Gary Lee


Hi Minh & Thuan

Some minor comments marked with [GL].

On 14/8/19 4:38 pm, Minh Chau wrote:

This is a collaborative patch of two participants: Thuan, Minh.

Main changes:
- Add mds_tipc_fctrl_intf.h, mds_tipc_fctrl_intf.cc: These two files
introduce new functions which are called in mds_dt_tipc.c if the flow
control is enabled
- Add mds_tipc_fctrl_portid.h, mds_tipc_fctrl_portid.cc: These files
implement the tipc portid instance, which supports the sliding window and
the mds msg queue
- Add mds_tipc_fctrl_msg.h, mds_tipc_fctrl_msg.cc: These files define
the event and messages which are used for this solution.
---
  src/mds/Makefile.am              |  10 +-
  src/mds/mds_dt.h                 |   8 +-
  src/mds/mds_dt_tipc.c            | 188 +---
  src/mds/mds_tipc_fctrl_intf.cc   | 376 +++
  src/mds/mds_tipc_fctrl_intf.h    |  47 +
  src/mds/mds_tipc_fctrl_msg.cc    | 142 +++
  src/mds/mds_tipc_fctrl_msg.h     | 129 ++
  src/mds/mds_tipc_fctrl_portid.cc | 261 +++
  src/mds/mds_tipc_fctrl_portid.h  |  87 +
  9 files changed, 1184 insertions(+), 64 deletions(-)
  create mode 100644 src/mds/mds_tipc_fctrl_intf.cc
  create mode 100644 src/mds/mds_tipc_fctrl_intf.h
  create mode 100644 src/mds/mds_tipc_fctrl_msg.cc
  create mode 100644 src/mds/mds_tipc_fctrl_msg.h
  create mode 100644 src/mds/mds_tipc_fctrl_portid.cc
  create mode 100644 src/mds/mds_tipc_fctrl_portid.h

diff --git a/src/mds/Makefile.am b/src/mds/Makefile.am
index 2d7b652..d849e8f 100644
--- a/src/mds/Makefile.am
+++ b/src/mds/Makefile.am
@@ -48,10 +48,16 @@ lib_libopensaf_core_la_SOURCES += \
  if ENABLE_TIPC_TRANSPORT
  noinst_HEADERS += src/mds/mds_dt_tipc.h \
src/mds/mds_tipc_recvq_stats.h \
-   src/mds/mds_tipc_recvq_stats_impl.h
+   src/mds/mds_tipc_recvq_stats_impl.h \
+   src/mds/mds_tipc_fctrl_intf.h \
+   src/mds/mds_tipc_fctrl_portid.h \
+   src/mds/mds_tipc_fctrl_msg.h
  lib_libopensaf_core_la_SOURCES += src/mds/mds_dt_tipc.c \
src/mds/mds_tipc_recvq_stats.cc \
-   src/mds/mds_tipc_recvq_stats_impl.cc
+   src/mds/mds_tipc_recvq_stats_impl.cc \
+   src/mds/mds_tipc_fctrl_intf.cc \
+   src/mds/mds_tipc_fctrl_portid.cc \
+   src/mds/mds_tipc_fctrl_msg.cc
  endif
  
  if ENABLE_TESTS

diff --git a/src/mds/mds_dt.h b/src/mds/mds_dt.h
index b645bb4..d9e8633 100644
--- a/src/mds/mds_dt.h
+++ b/src/mds/mds_dt.h
@@ -162,7 +162,7 @@ uint32_t mdtm_del_from_ref_tbl(MDS_SUBTN_REF_VAL ref);
  uint32_t mds_tmr_mailbox_processing(void);
  uint32_t mdtm_get_from_ref_tbl(MDS_SUBTN_REF_VAL ref, MDS_SVC_HDL *svc_hdl);
  uint32_t mdtm_add_frag_hdr(uint8_t *buf_ptr, uint16_t len, uint32_t seq_num,
-   uint16_t frag_byte);
+   uint16_t frag_byte, uint16_t fctrl_seq_num);
  uint32_t mdtm_free_reassem_msg_mem(MDS_ENCODED_MSG *msg);
  uint32_t mdtm_process_recv_data(uint8_t *buf, uint16_t len, uint64_t tipc_id,
  uint32_t *buff_dump);
@@ -240,9 +240,13 @@ bool mdtm_mailbox_mbx_cleanup(NCSCONTEXT arg, NCSCONTEXT msg);
  
  #define MDS_PROT 0xA0
  #define MDS_VERSION 0x08
-#define MDS_PROT_VER_MASK (MDS_PROT | MDS_VERSION)
+#define MDS_PROT_VER_MASK 0xFC
  #define MDTM_PRI_MASK 0x3
  
+/* MDS protocol/version for flow control */
+#define MDS_PROT_FCTRL (0xB0 | MDS_VERSION)
+#define MDS_PROT_FCTRL_ID 0x00AC13F5
+
  /* Added for the subscription changes */
  #define MDS_NCS_CHASSIS_ID (m_NCS_GET_NODE_ID & 0x00ff)
  #define MDS_TIPC_COMMON_ID 0x01001000
diff --git a/src/mds/mds_dt_tipc.c b/src/mds/mds_dt_tipc.c
index 86b52bb..fef1c50 100644
--- a/src/mds/mds_dt_tipc.c
+++ b/src/mds/mds_dt_tipc.c
@@ -47,6 +47,7 @@
  #include "mds_dt_tipc.h"
  #include "mds_dt_tcp_disc.h"
  #include "mds_core.h"
+#include "mds_tipc_fctrl_intf.h"
  #include "mds_tipc_recvq_stats.h"
  #include "base/osaf_utility.h"
  #include "base/osaf_poll.h"
@@ -165,20 +166,22 @@ NCS_PATRICIA_TREE mdtm_reassembly_list;
  uint32_t mdtm_global_frag_num;
  
  const unsigned int MAX_RECV_THRESHOLD = 30;
+uint8_t gl_mds_pro_ver = MDS_PROT | MDS_VERSION;
  
-static bool get_tipc_port_id(int sock, uint32_t* port_id) {
+static bool get_tipc_port_id(int sock, struct tipc_portid* port_id) {
      struct sockaddr_tipc addr;
      socklen_t sz = sizeof(addr);
  
      memset(&addr, 0, sizeof(addr));
-    *port_id = 0;
+    port_id->node = 0;
+    port_id->ref = 0;
      if (0 > getsockname(sock, (struct sockaddr *)&addr, &sz)) {
          syslog(LOG_ERR, "MDTM:TIPC Failed to get socket name, err: %s",
                 strerror(errno));
          return false;
      }
  
-    *port_id = addr.addr.id.ref;
+    *port_id = addr.addr.id;
      return true;
  }
  
@@ -240,12 +243,13 @@ uint32_t mdtm_tipc_init(NODE_ID nodeid, uint32_t *mds_tipc_ref)

}
  
  	/* Code for getting the self tipc random number */

-   if
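
The `get_tipc_port_id` hunk above switches from returning only the `ref` field to returning the whole `struct tipc_portid` (`ref` plus `node`). One plausible reading, offered here only as an assumption since the quoted text does not state the motivation, is that per-peer flow control state has to be keyed on both fields, because the same `ref` value can appear on different nodes. A minimal sketch of such a table follows; `PortId`, `PeerState` and `PeerTable` are hypothetical names and are not taken from mds_tipc_fctrl_portid.h.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <tuple>

// Hypothetical mirror of the two fields of struct tipc_portid.
struct PortId {
  uint32_t ref;
  uint32_t node;
};

// Order by (node, ref) so that PortId can be used as a map key.
inline bool operator<(const PortId &a, const PortId &b) {
  return std::tie(a.node, a.ref) < std::tie(b.node, b.ref);
}

// Hypothetical per-peer state, for example the next expected sequence number.
struct PeerState {
  uint16_t next_seq = 0;
};

class PeerTable {
 public:
  // Returns the state for `id`, creating a default entry on first use.
  PeerState &get(const PortId &id) { return peers_[id]; }
  std::size_t size() const { return peers_.size(); }

 private:
  std::map<PortId, PeerState> peers_;
};
```

Two peers with the same `ref` but different `node` values end up in separate entries, which a table keyed on `ref` alone could not guarantee.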

[devel] [PATCH 0/1] Review Request for amfd: fix coredump during downgrade if delayed failover is enabled [#3078]

2019-09-08 Thread Gary Lee

Summary: amfd: fix coredump during downgrade if delayed failover is enabled [#3078]
Review request for Ticket(s): 3078
Peer Reviewer(s): Minh
Pull request to: *** LIST THE PERSON WITH PUSH ACCESS HERE ***
Affected branch(es): develop
Development branch: ticket-3078
Base revision: 88ba98b8e45621508b528010e524b89068a05d8e
Personal repository: git://git.code.sf.net/u/userid-2226215/review


Impacted area       Impact y/n
--------------------------------
 Docs                    n
 Build system            n
 RPM/packaging           n
 Configuration files     n
 Startup scripts         n
 SAF services            y
 OpenSAF services        n
 Core libraries          n
 Samples                 n
 Tests                   n
 Other                   n


Comments (indicate scope for each "y" above):
---------------------------------------------

revision f3aac6813bc4fa002f3dbc726f325ed26a70fda4
Author: Gary Lee 
Date:   Mon, 9 Sep 2019 11:20:34 +1000

amfd: fix coredump during downgrade if delayed failover is enabled [#3078]

If delayed failover is enabled, and a downgrade to a version without #3060
occurs, then the standby running a newer version with #3060 may complain
about an out of sync error during warm sync.



Complete diffstat:
------------------
 src/amf/amfd/ckpt_dec.cc | 19 +++
 1 file changed, 15 insertions(+), 4 deletions(-)


Testing Commands:
-----------------
*** LIST THE COMMAND LINE TOOLS/STEPS TO TEST YOUR CHANGES ***


Testing, Expected Results:
--------------------------
*** PASTE COMMAND OUTPUTS / TEST RESULTS ***


Conditions of Submission:
-------------------------
*** HOW MANY DAYS BEFORE PUSHING, CONSENSUS ETC ***


Arch      Built     Started    Linux distro
-------------------------------------------
mips        n          n
mips64      n          n
x86         n          n
x86_64      y          y
powerpc     n          n
powerpc64   n          n



[devel] [PATCH 1/1] amfd: fix coredump during downgrade if delayed failover is enabled [#3078]

2019-09-08 Thread Gary Lee

If delayed failover is enabled, and a downgrade to a version without #3060
occurs, then the standby running a newer version with #3060 may complain
about an out of sync error during warm sync.
---
 src/amf/amfd/ckpt_dec.cc | 19 +++
 1 file changed, 15 insertions(+), 4 deletions(-)

diff --git a/src/amf/amfd/ckpt_dec.cc b/src/amf/amfd/ckpt_dec.cc
index 6288b4f..3c253d2 100644
--- a/src/amf/amfd/ckpt_dec.cc
+++ b/src/amf/amfd/ckpt_dec.cc
@@ -2721,10 +2721,21 @@ uint32_t avd_dec_warm_sync_rsp(AVD_CL_CB *cb, NCS_MBCSV_CB_DEC *dec) {
 if (updt_cnt->ng_updt != cb->async_updt_cnt.ng_updt)
   LOG_ER("ng_updt counters mismatch: Active: %u Standby: %u",
  updt_cnt->ng_updt, cb->async_updt_cnt.ng_updt);
-if (updt_cnt->failover_updt != cb->async_updt_cnt.failover_updt)
-  LOG_ER("failover_updt counters mismatch: Active: %u Standby: %u",
- updt_cnt->failover_updt, cb->async_updt_cnt.failover_updt);
-
+if (updt_cnt->failover_updt != cb->async_updt_cnt.failover_updt) {
+  if (dec->i_peer_version >= AVD_MBCSV_SUB_PART_VERSION_10) {
+LOG_ER("failover_updt counters mismatch: Active: %u Standby: %u",
+   updt_cnt->failover_updt, cb->async_updt_cnt.failover_updt);
+  } else {
+// Versions before 10 did not support failover_updt
+// After a downupgrade scenario, where the active is < v10
+// and this node is >= v10, then there will be failover_updt mismatch
+// If so, just set the value to what's on the older active
+cb->async_updt_cnt.failover_updt = updt_cnt->failover_updt;
+// failover_updt must be the LAST comparison made, otherwise
+// these if statements need will some refactoring
+return status;
+  }
+}
 LOG_ER("Out of sync detected in warm sync response, exiting");
 osafassert(0);
 
-- 
2.7.4




[devel] [PATCH 1/1] amf: handle errors identified by codechecker [#3077]

2019-09-02 Thread Gary Lee

add assertions where pointers should not be null
fix a couple of typos
---
 src/amf/amfd/comp.cc           |  1 +
 src/amf/amfd/csi.cc            |  3 ++-
 src/amf/amfd/cstype.cc         |  2 ++
 src/amf/amfd/hlt.cc            |  1 +
 src/amf/amfd/nodeswbundle.cc   |  2 +-
 src/amf/amfd/ntf.cc            |  1 +
 src/amf/amfd/sg_npm_fsm.cc     | 34 +++---
 src/amf/amfd/sg_nway_fsm.cc    |  2 +-
 src/amf/amfd/sgproc.cc         |  1 +
 src/amf/amfd/su.cc             |  1 +
 src/amf/amfd/sutype.cc         |  3 ++-
 src/amf/amfd/svctype.cc        |  1 +
 src/amf/amfd/svctypecstypes.cc |  1 +
 src/amf/amfnd/cbq.cc           |  2 ++
 src/amf/amfnd/clc.cc           |  1 +
 src/amf/amfnd/comp.cc          |  4 
 src/amf/amfnd/compdb.cc        |  2 +-
 src/amf/amfnd/susm.cc          | 11 +++
 18 files changed, 53 insertions(+), 20 deletions(-)

diff --git a/src/amf/amfd/comp.cc b/src/amf/amfd/comp.cc
index 0ff365e..5c6a283 100644
--- a/src/amf/amfd/comp.cc
+++ b/src/amf/amfd/comp.cc
@@ -2117,6 +2117,7 @@ static void comp_ccb_apply_modify_hdlr(struct 
CcbUtilOperationData *opdata) {
   attribute->attrValuesNumber);
 
 if (!strcmp(attribute->attrName, "saAmfCompType")) {
+  osafassert(value != nullptr);
   SaNameT *dn = (SaNameT *)value;
   const std::string oldType(comp->saAmfCompType);
   if (oldType.compare(Amf::to_string(dn)) == 0) {
diff --git a/src/amf/amfd/csi.cc b/src/amf/amfd/csi.cc
index f7e3730..1856610 100644
--- a/src/amf/amfd/csi.cc
+++ b/src/amf/amfd/csi.cc
@@ -913,7 +913,8 @@ static void ccb_apply_delete_hdlr(CcbUtilOperationData_t 
*opdata) {
 goto done;
   }
 
-  TRACE("'%s'", csi ? csi->name.c_str() : nullptr);
+  osafassert(csi != nullptr);
+  TRACE("'%s'", csi->name.c_str());
 
   /* Check whether si has been assigned to any SU. */
   if ((nullptr != csi->si->list_of_sisu) && (csi->compcsi_cnt != 0)) {
diff --git a/src/amf/amfd/cstype.cc b/src/amf/amfd/cstype.cc
index cadc6df..683d3cd 100644
--- a/src/amf/amfd/cstype.cc
+++ b/src/amf/amfd/cstype.cc
@@ -62,6 +62,7 @@ static AVD_CS_TYPE *cstype_create(const std::string ,
  * @param cst
  */
 static void cstype_delete(AVD_CS_TYPE *cst) {
+  osafassert(cst != nullptr);
   cstype_db->erase(cst->name);
   cst->saAmfCSAttrName.clear();
   delete cst;
@@ -205,6 +206,7 @@ static SaAisErrorT 
cstype_ccb_completed_hdlr(CcbUtilOperationData_t *opdata) {
 opdata->userData = nullptr;
 break;
   }
+  osafassert(cst != nullptr);
   if (cst->list_of_csi != nullptr) {
 /* check whether there exists a delete operation for
  * each of the CSI in the cs_type list in the current CCB
diff --git a/src/amf/amfd/hlt.cc b/src/amf/amfd/hlt.cc
index 27863db..4c2737e 100644
--- a/src/amf/amfd/hlt.cc
+++ b/src/amf/amfd/hlt.cc
@@ -75,6 +75,7 @@ static SaAisErrorT 
ccb_completed_delete_hdlr(CcbUtilOperationData_t *opdata) {
 opdata->userData = nullptr;
 goto done;
   }
+  osafassert(comp != nullptr);
   for (curr_susi = comp->su->list_of_susi; curr_susi != nullptr;
curr_susi = curr_susi->su_next)
 for (compcsi = curr_susi->list_of_csicomp; compcsi;
diff --git a/src/amf/amfd/nodeswbundle.cc b/src/amf/amfd/nodeswbundle.cc
index 4ab79f7..cf280cb 100644
--- a/src/amf/amfd/nodeswbundle.cc
+++ b/src/amf/amfd/nodeswbundle.cc
@@ -125,7 +125,7 @@ static int is_swbdl_delete_ok(const std::string _dn,
   if (node == nullptr && avd_cb->is_active() == false) {
 return 1;
   }
-
+  osafassert(node != nullptr);
   if (!is_swbdl_delete_ok_for_node(bundle_dn, node_dn, node->list_of_ncs_su,
opdata))
 return 0;
diff --git a/src/amf/amfd/ntf.cc b/src/amf/amfd/ntf.cc
index eb2654a..52ee745 100644
--- a/src/amf/amfd/ntf.cc
+++ b/src/amf/amfd/ntf.cc
@@ -505,6 +505,7 @@ SaAisErrorT avd_try_send_notification(NtfSend* job) {
 >notification.alarmNotification.notificationHandle;
   }
 
+  osafassert(notificationHandle != nullptr);
   // Try to send the notification if not sent.
   if (job->already_sent == false) {
 rc = saNtfNotificationSend(*notificationHandle);
diff --git a/src/amf/amfd/sg_npm_fsm.cc b/src/amf/amfd/sg_npm_fsm.cc
index 0ef094d..0e91eb5 100644
--- a/src/amf/amfd/sg_npm_fsm.cc
+++ b/src/amf/amfd/sg_npm_fsm.cc
@@ -2773,23 +2773,26 @@ static uint32_t avd_sg_npm_susi_sucss_si_oper(AVD_CL_CB 
*cb, AVD_SU *su,
* modify standby all to the Quiesced SU. Remove the SI from
* admin pointer and add the quiesced SU to the SU oper list.
*/
-  if (su->sg_of_su->admin_si->list_of_sisu == i_susi) {
-o_susi = i_susi->si_next;
-  } else {
-o_susi = su->sg_of_su->admin_si->list_of_sisu;
-  }
+  i_susi = avd_su_susi_find(cb, su, su->sg_of_su->admin_si->name);
+  if (i_susi != nullptr) {
+if (su->sg_of_su->admin_si->list_of_sisu == i_susi) {
+  o_susi = i_susi->si_next;
+} else {
+
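
The csi.cc hunk in this patch replaces `TRACE("'%s'", csi ? csi->name.c_str() : nullptr)` with an assertion followed by an unconditional dereference. A minimal sketch of why that shape is preferable: passing a null pointer for a `%s` conversion is undefined behaviour in the printf family, so the conditional did not really protect anything. The standard `assert` is used below as a stand-in for `osafassert`, and `Csi` and `describe` are hypothetical names; unlike this sketch, a project-specific assert may stay active in release builds.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical, trimmed-down stand-in for the CSI object.
struct Csi {
  std::string name;
};

// State the invariant first, then use the pointer without a fallback branch.
// A static analyser can then see that `csi` is non-null on every path that
// reaches the formatting call.
std::string describe(const Csi *csi) {
  assert(csi != nullptr);
  char buf[64];
  std::snprintf(buf, sizeof(buf), "'%s'", csi->name.c_str());
  return buf;
}
```

The same pattern applies to the other hunks in the patch that add an assertion right before the first dereference.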

[devel] [PATCH 0/1] Review Request for amf: handle errors identified by codechecker [#3077]

2019-09-02 Thread Gary Lee

Summary: amf: handle errors identified by codechecker [#3077]
Review request for Ticket(s): 3077
Peer Reviewer(s): Minh, Hans
Pull request to: *** LIST THE PERSON WITH PUSH ACCESS HERE ***
Affected branch(es): develop
Development branch: ticket-3077
Base revision: 2bc054ca85b56bc03bdc9be965593b56124aad00
Personal repository: git://git.code.sf.net/u/userid-2226215/review


Impacted area       Impact y/n
--------------------------------
 Docs                    n
 Build system            n
 RPM/packaging           n
 Configuration files     n
 Startup scripts         n
 SAF services            y
 OpenSAF services        y
 Core libraries          n
 Samples                 n
 Tests                   n
 Other                   n


Comments (indicate scope for each "y" above):
---------------------------------------------

revision 24b75d78a013c554d5f9731e69a7150c11217ad7
Author: Gary Lee 
Date:   Tue, 3 Sep 2019 12:06:36 +1000

amf: handle errors identified by codechecker [#3077]

add assertions where pointers should not be null
fix a couple of typos
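
As an illustration of the "assert where a pointer should not be null" pattern
described above, here is a minimal sketch. It uses the standard assert()
rather than osafassert(), and the Node type and FindNode() helper are
hypothetical stand-ins, not the amfd data structures.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical stand-in for a node record; not the real AVD_AVND type.
struct Node {
  std::string name;
};

// Hypothetical lookup that may legitimately return nullptr.
inline const Node* FindNode(const Node* table, int size,
                            const std::string& name) {
  for (int i = 0; i < size; ++i) {
    if (table[i].name == name) return &table[i];
  }
  return nullptr;
}

// The caller guarantees that the node exists, so a null result is a logic
// error. Asserting here states the invariant before the dereference, which
// is also what a static analyzer needs to see.
inline std::size_t NameLength(const Node* table, int size,
                              const std::string& name) {
  const Node* node = FindNode(table, size, name);
  assert(node != nullptr);
  return node->name.size();
}
```

The assertion does not replace error handling on paths where a null result is
expected; it only covers the cases where null would mean a bug.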



Complete diffstat:
--
 src/amf/amfd/comp.cc   |  1 +
 src/amf/amfd/csi.cc|  3 ++-
 src/amf/amfd/cstype.cc |  2 ++
 src/amf/amfd/hlt.cc|  1 +
 src/amf/amfd/nodeswbundle.cc   |  2 +-
 src/amf/amfd/ntf.cc|  1 +
 src/amf/amfd/sg_npm_fsm.cc | 34 +++---
 src/amf/amfd/sg_nway_fsm.cc|  2 +-
 src/amf/amfd/sgproc.cc |  1 +
 src/amf/amfd/su.cc |  1 +
 src/amf/amfd/sutype.cc |  3 ++-
 src/amf/amfd/svctype.cc|  1 +
 src/amf/amfd/svctypecstypes.cc |  1 +
 src/amf/amfnd/cbq.cc   |  2 ++
 src/amf/amfnd/clc.cc   |  1 +
 src/amf/amfnd/comp.cc  |  4 
 src/amf/amfnd/compdb.cc|  2 +-
 src/amf/amfnd/susm.cc  | 11 +++
 18 files changed, 53 insertions(+), 20 deletions(-)


Testing Commands:
-
*** LIST THE COMMAND LINE TOOLS/STEPS TO TEST YOUR CHANGES ***


Testing, Expected Results:
--
*** PASTE COMMAND OUTPUTS / TEST RESULTS ***


Conditions of Submission:
-
*** HOW MANY DAYS BEFORE PUSHING, CONSENSUS ETC ***


Arch        Built   Started   Linux distro
------------------------------------------
mips          n        n
mips64        n        n
x86           n        n
x86_64        y        y
powerpc       n        n
powerpc64     n        n


Reviewer Checklist:
---
[Submitters: make sure that your review doesn't trigger any checkmarks!]


Your checkin has not passed review because (see checked entries):

___ Your RR template is generally incomplete; it has too many blank entries
that need proper data filled in.

___ You have failed to nominate the proper persons for review and push.

___ Your patches do not have proper short+long header

___ You have grammar/spelling in your header that is unacceptable.

___ You have exceeded a sensible line length in your headers/comments/text.

___ You have failed to put in a proper Trac Ticket # into your commits.

___ You have incorrectly put/left internal data in your comments/files
(i.e. internal bug tracking tool IDs, product names etc)

___ You have not given any evidence of testing beyond basic build tests.
Demonstrate some level of runtime or other sanity testing.

___ You have ^M present in some of your files. These have to be removed.

___ You have needlessly changed whitespace or added whitespace crimes
like trailing spaces, or spaces before tabs.

___ You have mixed real technical changes with whitespace and other
cosmetic code cleanup changes. These have to be separate commits.

___ You need to refactor your submission into logical chunks; there is
too much content into a single commit.

___ You have extraneous garbage in your review (merge commits etc)

___ You have giant attachments which should never have been sent;
Instead you should place your content in a public tree to be pulled.

___ You have too many commits attached to an e-mail; resend as threaded
commits, or place in a public tree for a pull.

___ You have resent this content multiple times without a clear indication
of what has changed between each re-send.

___ You have failed to adequately and individually address all of the
comments and change requests that were proposed in the initial review.

___ You have a misconfigured ~/.gitconfig file (i.e. user.name, user.email etc)

___ Your computer has a badly configured date and time, confusing the
threaded patch review.

___ Your changes affect IPC mechanism, and you don't present any results
for in-service upgradability test.

___ Your changes affect user manual and documentation, your patch series
do not contain the patch that updates the Doxygen manual.




Re: [devel] [PATCH 1/1] util: Fenced should only write a log record when two active controllers are seen [#3073]

2019-08-22 Thread Gary Lee


Hi Hans

ack (review only)

Thanks

Gary

On 22/8/19 5:49 pm, Hans Nordebäck wrote:

---
  tools/devel/fenced/node_state_hdlr_pl.cc | 4 ++--
  1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/tools/devel/fenced/node_state_hdlr_pl.cc b/tools/devel/fenced/node_state_hdlr_pl.cc
index c74fe72b9..6bf032e5a 100644
--- a/tools/devel/fenced/node_state_hdlr_pl.cc
+++ b/tools/devel/fenced/node_state_hdlr_pl.cc
@@ -169,8 +169,8 @@ void NodeStateHdlrPl::check_isolation() {
isolated_ = NodeIsolationState::kNotIsolated;
syslog(LOG_NOTICE, "one active controller detected");
  } else {
-  isolated_ = NodeIsolationState::kIsolated;
-  syslog(LOG_NOTICE, "%d active controllers detected, split brain", no_of_active);
+  isolated_ = NodeIsolationState::kNotIsolated;
+  syslog(LOG_NOTICE, "%d active controllers detected", no_of_active);
  }
}
  notify:




[devel] [PATCH 1/1] amfd: set failover_state on standby [#3072]

2019-08-21 Thread Gary Lee

Otherwise, after two controller failovers, unexpected
reboot of previously rebooted payloads may occur.
---
 src/amf/amfd/node_state_machine.cc | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/src/amf/amfd/node_state_machine.cc b/src/amf/amfd/node_state_machine.cc
index efe2085..d38f79e 100644
--- a/src/amf/amfd/node_state_machine.cc
+++ b/src/amf/amfd/node_state_machine.cc
@@ -63,6 +63,12 @@ void NodeStateMachine::SetState(uint32_t state) {
 LOG_NO("New state '%u'", state);
   }
 
+  // this is needed for cold sync, in case this node (currently standby)
+  // becomes active later
+  AVD_AVND *node = avd_node_find_nodeid(node_id_);
+  osafassert(node != nullptr);
+  node->failover_state = state;
+
   switch (state) {
 case NodeState::kStart:
   state_ = std::make_shared(this);
-- 
2.7.4




[devel] [PATCH 0/1] Review Request for amfd: set failover_state on standby [#3072]

2019-08-21 Thread Gary Lee

Summary: amfd: set failover_state on standby [#3072]
Review request for Ticket(s): 3072
Peer Reviewer(s): Minh, Hans
Pull request to: *** LIST THE PERSON WITH PUSH ACCESS HERE ***
Affected branch(es): develop
Development branch: ticket-3072
Base revision: 729f71fbfff0eea6d4a6a394780142b87a9fb472
Personal repository: git://git.code.sf.net/u/userid-2226215/review


Impacted area            Impact y/n
-----------------------------------
 Docs                        n
 Build system                n
 RPM/packaging               n
 Configuration files         n
 Startup scripts             n
 SAF services                y
 OpenSAF services            n
 Core libraries              n
 Samples                     n
 Tests                       n
 Other                       n


Comments (indicate scope for each "y" above):
-

revision 252c36529095306e57a859177f9a74f47809b50d
Author: Gary Lee 
Date:   Thu, 22 Aug 2019 14:08:39 +1000

amfd: set failover_state on standby [#3072]

Otherwise, after two controller failovers, unexpected
reboot of previously rebooted payloads may occur.



Complete diffstat:
--
 src/amf/amfd/node_state_machine.cc | 6 ++
 1 file changed, 6 insertions(+)


Testing Commands:
-
1) set failover delay to 5s, node wait timeout to 15s
2) reboot PL-3
3) reboot active SC
4) reboot active SC again

Testing, Expected Results:
--
Ensure PL-3 does not get rebooted 15s after step 4 above.

Conditions of Submission:
-
ack from anyone

Arch        Built   Started   Linux distro
------------------------------------------
mips          n        n
mips64        n        n
x86           n        n
x86_64        y        y
powerpc       n        n
powerpc64     n        n



Re: [devel] [PATCH 1/1] mbc: fix some coding errors [#3070]

2019-08-18 Thread Gary Lee


Hi Thuan

ack (review only)

Thanks

Gary

On 14/8/19 8:24 pm, thuan.tran wrote:

---
  src/mbc/mbcsv_api.c  | 6 +++---
  src/mbc/mbcsv_peer.c | 2 +-
  2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/src/mbc/mbcsv_api.c b/src/mbc/mbcsv_api.c
index 84a2b8771..3a84fdfda 100644
--- a/src/mbc/mbcsv_api.c
+++ b/src/mbc/mbcsv_api.c
@@ -619,7 +619,7 @@ uint32_t mbcsv_process_close_request(NCS_MBCSV_ARG *arg)
if (NULL ==
(mbc_reg = (MBCSV_REG *)m_MBCSV_TAKE_HANDLE(arg->i_mbcsv_hdl))) {
TRACE_2("bad handle specified");
-   rc = SA_AIS_ERR_BAD_HANDLE;
+   return SA_AIS_ERR_BAD_HANDLE;
}
  
  	m_NCS_LOCK(&mbc_reg->svc_lock, NCS_LOCK_WRITE);

@@ -685,7 +685,7 @@ uint32_t mbcsv_process_chg_role_request(NCS_MBCSV_ARG *arg)
if (NULL ==
(mbc_reg = (MBCSV_REG *)m_MBCSV_TAKE_HANDLE(arg->i_mbcsv_hdl))) {
TRACE_2("bad handle specified");
-   rc = SA_AIS_ERR_BAD_HANDLE;
+   return SA_AIS_ERR_BAD_HANDLE;
}
  
  	m_NCS_LOCK(&mbc_reg->svc_lock, NCS_LOCK_READ);

@@ -804,7 +804,7 @@ uint32_t mbcsv_process_snd_ckpt_request(NCS_MBCSV_ARG *arg)
if (NULL ==
(mbc_reg = (MBCSV_REG *)m_MBCSV_TAKE_HANDLE(arg->i_mbcsv_hdl))) {
TRACE_2("bad handle specified");
-   rc = SA_AIS_ERR_BAD_HANDLE;
+   return SA_AIS_ERR_BAD_HANDLE;
}
  
  	m_NCS_LOCK(&mbc_reg->svc_lock, NCS_LOCK_READ);
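
The three mbcsv_api.c hunks above replace "rc = SA_AIS_ERR_BAD_HANDLE;" with an
immediate return, because execution would otherwise continue and use the null
registration pointer. A minimal sketch of that pattern follows; the types,
constants and TakeHandle() helper are hypothetical and do not reproduce the
MBCSV API.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical return codes, not the real SA_AIS_* values.
constexpr std::uint32_t kOk = 1;
constexpr std::uint32_t kBadHandle = 9;

// Hypothetical registration record.
struct Registration {
  int lock_count = 0;
};

// Hypothetical handle lookup: returns nullptr for an unknown handle.
inline Registration* TakeHandle(Registration* table, int size, int handle) {
  return (handle >= 0 && handle < size) ? &table[handle] : nullptr;
}

inline std::uint32_t ProcessRequest(Registration* table, int size,
                                    int handle) {
  Registration* reg = TakeHandle(table, size, handle);
  if (reg == nullptr) {
    // Return at once. Only recording the error in a local variable would
    // fall through to the dereference below.
    return kBadHandle;
  }
  ++reg->lock_count;  // safe: reg is known to be non-null here
  return kOk;
}
```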

diff --git a/src/mbc/mbcsv_peer.c b/src/mbc/mbcsv_peer.c
index 1d4b257a3..1a9eeb125 100644
--- a/src/mbc/mbcsv_peer.c
+++ b/src/mbc/mbcsv_peer.c
@@ -54,7 +54,7 @@ the messages received from the peer.
  
  static const char *disc_trace[] = {"Peer UP msg", "Peer DOWN msg",

   "Peer INFO msg", "Peer INFO resp msg",
-  "Peer Role change msg"
+  "Peer Role change msg",
   "Invalid peer discovery msg"};
  typedef enum {ANCHOR_SEARCH, NODE_ID_SEARCH} SearchMode;
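
The missing comma in disc_trace matters because adjacent string literals are
concatenated by the compiler, so the array silently loses one element and every
later index is shifted. A small sketch with hypothetical message names shows
the effect:

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical names. The comma is missing after "MSG_B(1)", so the compiler
// joins it with "MSG_C(2)" into a single element.
static const char* const kBrokenNames[] = {"MSG_A(0)",
                                           "MSG_B(1)"
                                           "MSG_C(2)",
                                           "MSG_D(3)"};

// Same table with the comma restored.
static const char* const kFixedNames[] = {"MSG_A(0)", "MSG_B(1)", "MSG_C(2)",
                                          "MSG_D(3)"};

// Number of elements in a built-in array.
template <typename T, std::size_t N>
constexpr std::size_t ArraySize(const T (&)[N]) {
  return N;
}

// A compile-time check like this would have caught the missing comma.
static_assert(ArraySize(kFixedNames) == 4, "one name per message type");
```

With the broken table, indexing by the last message type reads past the end of
the array, which is why a size check next to the table is a cheap safeguard.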
  




Re: [devel] [PATCH 1/1] rde: missing comma between elements in array [#3069]

2019-08-18 Thread Gary Lee


Hi Thuan

ack, will push on your behalf.

Thanks

On 14/8/19 7:42 pm, thuan.tran wrote:

---
  src/rde/rded/rde_main.cc | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/rde/rded/rde_main.cc b/src/rde/rded/rde_main.cc
index 1a7e58792..6594b3d49 100644
--- a/src/rde/rded/rde_main.cc
+++ b/src/rde/rded/rde_main.cc
@@ -53,7 +53,7 @@ const char *rde_msg_name[] = {"-",
"RDE_MSG_PEER_DOWN(2)",
"RDE_MSG_PEER_INFO_REQ(3)",
"RDE_MSG_PEER_INFO_RESP(4)",
-  "RDE_MSG_NEW_ACTIVE_CALLBACK(5)"
+  "RDE_MSG_NEW_ACTIVE_CALLBACK(5)",
"RDE_MSG_NODE_UP(6)",
"RDE_MSG_NODE_DOWN(7)",
"RDE_MSG_TAKEOVER_REQUEST_CALLBACK(8)",




Re: [devel] [PATCH 1/1] nid: use the tipc command instead of tipc-config [#2104]

2019-08-05 Thread Gary Lee


Hi Vu

ack (review only)

Thanks

On 1/8/19 12:53 pm, Vu Minh Nguyen wrote:

The tipc-config command is obsolete and no longer maintained. We should
switch to using the "tipc" command instead.
---
  Makefile.am   |  3 ++-
  opensaf.spec.in   |  1 +
  .../archive/scripts => scripts}/tipc-config   | 15 --
  src/nid/configure_tipc.in | 16 ++-
  src/nid/opensafd.in   | 20 +++
  tools/cluster_sim_uml/build_uml   |  2 +-
  6 files changed, 35 insertions(+), 22 deletions(-)
  rename {tools/cluster_sim_uml/archive/scripts => scripts}/tipc-config (83%)

diff --git a/Makefile.am b/Makefile.am
index b3d6553c1..6d86ec180 100644
--- a/Makefile.am
+++ b/Makefile.am
@@ -159,7 +159,8 @@ dist_osaf_execbin_SCRIPTS += \
$(top_srcdir)/scripts/opensaf_reboot \
$(top_srcdir)/scripts/opensaf_sc_active \
$(top_srcdir)/scripts/opensaf_scale_out \
-   $(top_srcdir)/scripts/plm_scale_out
+   $(top_srcdir)/scripts/plm_scale_out \
+   $(top_srcdir)/scripts/tipc-config
  
  include $(top_srcdir)/src/ais/Makefile.am

  include $(top_srcdir)/src/base/Makefile.am
diff --git a/opensaf.spec.in b/opensaf.spec.in
index 0effd59cd..37be5de6d 100644
--- a/opensaf.spec.in
+++ b/opensaf.spec.in
@@ -950,6 +950,7 @@ fi
  %{_pkglibdir}/plm_scale_out
  %{_pkglibdir}/opensaf_sc_active
  %{_pkglibdir}/configure_tipc
+%{_pkglibdir}/tipc-config
  
  
  %files amf-libs

diff --git a/tools/cluster_sim_uml/archive/scripts/tipc-config b/scripts/tipc-config
similarity index 83%
rename from tools/cluster_sim_uml/archive/scripts/tipc-config
rename to scripts/tipc-config
index f9fd47937..34eb9a539 100755
--- a/tools/cluster_sim_uml/archive/scripts/tipc-config
+++ b/scripts/tipc-config
@@ -1,4 +1,4 @@
-#!/bin/ash
+#!/bin/bash
  #
  #  -*- OpenSAF  -*-
  #
@@ -39,7 +39,18 @@ fi
  while [ $# -gt 0 ]; do
  case "$1" in
-addr)
-   echo "node address: $(/sbin/tipc node get address)"
+   addr=$(/sbin/tipc node get address)
+   hex_pattern="^[0-9a-fA-F]+$"
+   if [[ $addr =~ $hex_pattern ]]; then
+   dec_addr=$((16#$addr))
+   # the algorithm is based on /usr/include/linux/tipc.h
+   # to form tipc node address into 'Z.C.N' format.
+   tipc_zone=$((dec_addr >> 24))
+   tipc_cluster=$(((dec_addr >> 12) & 0xfff))
+   tipc_node=$((dec_addr & 0xfff))
+   addr="<$tipc_zone.$tipc_cluster.$tipc_node>"
+   fi
+   echo "node address: $addr"
;;
-a=*)
/sbin/tipc node set address "$(echo "$1" | cut -d= -f2)"
diff --git a/src/nid/configure_tipc.in b/src/nid/configure_tipc.in
index 33621a0ef..5d0bf6efb 100644
--- a/src/nid/configure_tipc.in
+++ b/src/nid/configure_tipc.in
@@ -78,12 +78,13 @@ if ! [ -x "${tipc}" ] && ! [ -x "${tipc_config}" ]; then
  exit 1
  fi
  
+# Prefer using `tipc` over the obsoleted `tipc-config`

+if [ -x "${tipc}" ]; then
+tipc_config="${pkglibdir}"/tipc-config
+fi
+
  if [ "$MANAGE_TIPC" != "yes" ] && ! [ -s "$pkglocalstatedir/node_id" ]; then
-if [ -x "${tipc}" ]; then
-   addr=$(tipc node get address | cut -d'<' -f2 | cut -d'>' -f1)
-else
-   addr=$(tipc-config -addr | cut -d'<' -f2 | cut -d'>' -f1)
-fi
+   addr=$(${tipc_config} -addr | cut -d'<' -f2 | cut -d'>' -f1)
  addr=$(echo "$addr" | cut -d. -f3)
  CHASSIS_ID=2
  SLOT_ID=$((addr & 255))
@@ -98,11 +99,6 @@ fi
  ETH_NAME=$2
  TIPC_NETID=$3
  
-if ! [ -x "${tipc_config}" ]; then

-echo "error: tipc-config is not available"
-exit 1
-fi
-
  # Get the Chassis Id and Slot Id from @sysconfdir@/@PACKAGE_NAME@/chassis_id and @sysconfdir@/@PACKAGE_NAME@/slot_id
  if ! test -f "$CHASSIS_ID_FILE"; then
 echo "$CHASSIS_ID_FILE doesnt exists, exiting "
diff --git a/src/nid/opensafd.in b/src/nid/opensafd.in
index 94888039a..f85cf5b0c 100644
--- a/src/nid/opensafd.in
+++ b/src/nid/opensafd.in
@@ -50,7 +50,7 @@ osafcshash=@INTERNAL_VERSION_ID@
  unload_tipc() {
  
	# Unload TIPC if already loaded

-   if [ $MANAGE_TIPC = "yes" ] && grep tipc /proc/modules >/dev/null 2>&1; then
+   if [ "$MANAGE_TIPC" = "yes" ] && grep tipc /proc/modules >/dev/null 2>&1; then
modprobe -r tipc >/dev/null 2>&1
if [ $? -eq 1 ]; then
logger -t $osafprog "warning: TIPC module unloading failed"
@@ -59,13 +59,17 @@ unload_tipc() {
  }
  
  check_tipc() {

-   # Exit if tipc-config is not installed
-   if [ "$MANAGE_TIPC" = "yes" ] && [ ! -x /sbin/tipc-config ]; then
-   which tipc-config >/dev/null 2>&1
-   if [ $? -eq 1 ] ; then
-   logger -s -t $osafprog "Can't find tipc-config in the PATH, exiting."
-
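
The "addr" branch added to scripts/tipc-config earlier in this patch turns a
hexadecimal node address into the 'Z.C.N' form with shifts and masks. The
same arithmetic can be sketched in C++ as follows. The function names are
illustrative only, and the input is assumed to be a plain hex string of at
most eight digits, as the shell pattern in the patch implies.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Parses up to eight hex digits into a 32-bit value. Returns false for empty
// input, for overlong input, or when any character is not a hex digit.
inline bool ParseHexAddress(const std::string& text, std::uint32_t* out) {
  if (text.empty() || text.size() > 8) return false;
  std::uint32_t value = 0;
  for (char c : text) {
    int digit;
    if (c >= '0' && c <= '9') {
      digit = c - '0';
    } else if (c >= 'a' && c <= 'f') {
      digit = c - 'a' + 10;
    } else if (c >= 'A' && c <= 'F') {
      digit = c - 'A' + 10;
    } else {
      return false;
    }
    value = (value << 4) | static_cast<std::uint32_t>(digit);
  }
  *out = value;
  return true;
}

// Formats the address as "<Z.C.N>" using the same shifts and masks as the
// shell code: zone = top 8 bits, cluster = next 12 bits, node = low 12 bits.
inline std::string FormatTipcAddress(std::uint32_t addr) {
  const std::uint32_t zone = addr >> 24;
  const std::uint32_t cluster = (addr >> 12) & 0xfff;
  const std::uint32_t node = addr & 0xfff;
  return "<" + std::to_string(zone) + "." + std::to_string(cluster) + "." +
         std::to_string(node) + ">";
}
```

For example, the hex string 1001003 splits into zone 1, cluster 1 and node 3.
Input that is already in 'Z.C.N' form fails the hex check and would be passed
through unchanged, as in the script.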

Re: [devel] [PATCH 0/1] Review Request for amfd: add support for dynamically changing saAmfRank of SaAmfSIRankedSU [#3058]

2019-07-28 Thread Gary Lee


Hi Alex

Ack, review only.

Thanks

Gary

On 19/7/19 5:04 am, Jones, Alex wrote:
Summary: amfd: add support for dynamically changing saAmfRank of SaAmfSIRankedSU [#3058]

Review request for Ticket(s): 3058
Peer Reviewer(s): Nagu, Hans, Gary
Pull request to:
Affected branch(es): develop
Development branch: ticket-3058
Base revision: ec296cbb38761831929a97a8d94d177130f656c9
Personal repository: git://git.code.sf.net/u/trguitar/review


Impacted area Impact y/n

Docs n
Build system n
RPM/packaging n
Configuration files n
Startup scripts n
SAF services y
OpenSAF services n
Core libraries n
Samples n
Tests n
Other n


Comments (indicate scope for each "y" above):
-

revision 620fd473bfa6f28598a6171ac82b8a7e19056d1b
Author:Alex Jones 
Date:Thu, 18 Jul 2019 14:43:29 -0400

amfd: add support for dynamically changing saAmfRank of SaAmfSIRankedSU [#3058]


Allow saAmfRank of SaAmfSIRankedSU to be changed at runtime



Complete diffstat:
--
src/amf/amfd/si.cc | 103 +
src/amf/amfd/si.h | 3 ++
src/amf/amfd/siass.cc | 38 +
src/amf/amfd/sirankedsu.cc | 73 +++-
src/amf/amfd/util.cc | 30 -
5 files changed, 243 insertions(+), 4 deletions(-)


Testing Commands:
-
1) create N-way service group with SUs and components
2) create SaAmfSIRankedSU objects for the SUs
3) Once the assignments have been made on the components, change the value of
saAmfRank in one of the SaAmfSIRankedSU objects


Testing, Expected Results:
--
1) changing the value should be accepted
2) failover choice by amfd should reflect the new rank

Conditions of Submission:
-
Aug 5, or ack from developer

Arch Built Started Linux distro
---
mips n n
mips64 n n
x86 n n
x86_64 y y
powerpc n n
powerpc64 n n






[devel] [PATCH 0/1] Review Request for amfd: include failover info in coldsync [#3060]

2019-07-19 Thread Gary Lee

Summary: amfd: include failover info in coldsync [#3060]
Review request for Ticket(s): 3060
Peer Reviewer(s): Minh, Hans, Thang, Thuan 
Pull request to: *** LIST THE PERSON WITH PUSH ACCESS HERE ***
Affected branch(es): develop
Development branch: ticket-3060
Base revision: ec296cbb38761831929a97a8d94d177130f656c9
Personal repository: git://git.code.sf.net/u/userid-2226215/review


Impacted area            Impact y/n
-----------------------------------
 Docs                        n
 Build system                n
 RPM/packaging               n
 Configuration files         n
 Startup scripts             n
 SAF services                y
 OpenSAF services            n
 Core libraries              n
 Samples                     n
 Tests                       n
 Other                       n


Comments (indicate scope for each "y" above):
-

revision 9443abefdeaae481dbe483b708db8d467619b8c1
Author: Gary Lee 
Date:   Fri, 19 Jul 2019 16:02:19 +1000

amfd: include failover info in coldsync [#3060]

Failover information is not currently included in coldsync. This means
if a delayed failover is in progress *before* a standby controller is
available, *and* a controller failover occurs, then information about
the delayed failover is lost.
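
To keep cold sync working between amfd versions, the patch only encodes and
decodes the new field when the peer reports checkpoint version 10 or later.
The sketch below shows that version-gating idea in isolation. NodeRecord, the
byte layout and the helper functions are hypothetical and much simpler than
the real MBCSV encoding.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// First checkpoint version that carries the extra field; plays the role of
// AVD_MBCSV_SUB_PART_VERSION_10 in the patch.
constexpr std::uint16_t kVersionWithFailoverState = 10;

// Hypothetical record, not the real AVD_AVND layout.
struct NodeRecord {
  std::uint32_t rcv_msg_id = 0;
  std::uint32_t snd_msg_id = 0;
  std::uint32_t failover_state = 0;  // keeps its default when the peer is older
};

// Appends a 32-bit value in big-endian byte order.
inline void PutU32(std::vector<std::uint8_t>* buf, std::uint32_t value) {
  for (int shift = 24; shift >= 0; shift -= 8) {
    buf->push_back(static_cast<std::uint8_t>((value >> shift) & 0xff));
  }
}

// Reads a big-endian 32-bit value and advances *pos; at() throws on underrun.
inline std::uint32_t GetU32(const std::vector<std::uint8_t>& buf,
                            std::size_t* pos) {
  std::uint32_t value = 0;
  for (int i = 0; i < 4; ++i) {
    value = (value << 8) | buf.at((*pos)++);
  }
  return value;
}

// The new field is appended last, and only for peers that understand it.
inline std::vector<std::uint8_t> Encode(const NodeRecord& rec,
                                        std::uint16_t peer_version) {
  std::vector<std::uint8_t> buf;
  PutU32(&buf, rec.rcv_msg_id);
  PutU32(&buf, rec.snd_msg_id);
  if (peer_version >= kVersionWithFailoverState) {
    PutU32(&buf, rec.failover_state);
  }
  return buf;
}

// The decoder applies the same version test, so both sides agree on layout.
inline NodeRecord Decode(const std::vector<std::uint8_t>& buf,
                         std::uint16_t peer_version) {
  NodeRecord rec;
  std::size_t pos = 0;
  rec.rcv_msg_id = GetU32(buf, &pos);
  rec.snd_msg_id = GetU32(buf, &pos);
  if (peer_version >= kVersionWithFailoverState) {
    rec.failover_state = GetU32(buf, &pos);
  }
  return rec;
}
```

Appending the field at the end and gating both directions on the same version
test is what lets an older peer keep reading the prefix it already knows.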



Complete diffstat:
--
 src/amf/amfd/chkop.cc  |  4 ++
 src/amf/amfd/ckpt.h|  4 +-
 src/amf/amfd/ckpt_dec.cc   | 77 --
 src/amf/amfd/ckpt_edu.cc   |  2 +
 src/amf/amfd/ckpt_enc.cc   |  5 ++-
 src/amf/amfd/node.h|  3 ++
 src/amf/amfd/node_state_machine.cc |  2 +
 src/amf/amfd/util.cc   |  1 +
 8 files changed, 76 insertions(+), 22 deletions(-)


Testing Commands:
-
1. Enable delayed node failover and network fence a PL while
there is no standby SC. Before the failover occurs,
power up the standby SC, and force a controller failover.

2. Ensure different versions of amfd can cold sync with each
other.

Testing, Expected Results:
--
1. The standby SC (now active) should continue the node failover.

2. Cold sync succeeds between the different amfd versions.

Conditions of Submission:
-
ack from any reviewer

Arch        Built   Started   Linux distro
------------------------------------------
mips          n        n
mips64        n        n
x86           n        n
x86_64        y        y
powerpc       n        n
powerpc64     n        n



[devel] [PATCH 1/1] amfd: include failover info in coldsync [#3060]

2019-07-19 Thread Gary Lee

Failover information is not currently included in coldsync. This means
if a delayed failover is in progress *before* a standby controller is
available, *and* a controller failover occurs, then information about
the delayed failover is lost.
---
 src/amf/amfd/chkop.cc  |  4 ++
 src/amf/amfd/ckpt.h|  4 +-
 src/amf/amfd/ckpt_dec.cc   | 77 --
 src/amf/amfd/ckpt_edu.cc   |  2 +
 src/amf/amfd/ckpt_enc.cc   |  5 ++-
 src/amf/amfd/node.h|  3 ++
 src/amf/amfd/node_state_machine.cc |  2 +
 src/amf/amfd/util.cc   |  1 +
 8 files changed, 76 insertions(+), 22 deletions(-)

diff --git a/src/amf/amfd/chkop.cc b/src/amf/amfd/chkop.cc
index e9a68f4..56b0142 100644
--- a/src/amf/amfd/chkop.cc
+++ b/src/amf/amfd/chkop.cc
@@ -1051,6 +1051,10 @@ uint32_t avsv_send_ckpt_data(AVD_CL_CB *cb, uint32_t action,
 avd_cb->avd_peer_ver);
 return NCSCC_RC_SUCCESS;
   }
+  if (avd_cb->avd_peer_ver >= AVD_MBCSV_SUB_PART_VERSION_10) {
+cb->async_updt_cnt.failover_updt++;
+  }
+
   break;
 default:
   return NCSCC_RC_SUCCESS;
diff --git a/src/amf/amfd/ckpt.h b/src/amf/amfd/ckpt.h
index 875776a..2e15387 100644
--- a/src/amf/amfd/ckpt.h
+++ b/src/amf/amfd/ckpt.h
@@ -35,9 +35,10 @@
 #define AMF_AMFD_CKPT_H_
 
 // current version
-#define AVD_MBCSV_SUB_PART_VERSION 9
+#define AVD_MBCSV_SUB_PART_VERSION 10
 
 // supported versions
+#define AVD_MBCSV_SUB_PART_VERSION_10 10
 #define AVD_MBCSV_SUB_PART_VERSION_9 9
 #define AVD_MBCSV_SUB_PART_VERSION_8 8
 #define AVD_MBCSV_SUB_PART_VERSION_7 7
@@ -109,6 +110,7 @@ typedef struct avsv_async_updt_cnt {
   uint32_t compcstype_updt;
   uint32_t si_trans_updt;
   uint32_t ng_updt;
+  uint32_t failover_updt;
 } AVSV_ASYNC_UPDT_CNT;
 
 /*
diff --git a/src/amf/amfd/ckpt_dec.cc b/src/amf/amfd/ckpt_dec.cc
index a46f6d3..6288b4f 100644
--- a/src/amf/amfd/ckpt_dec.cc
+++ b/src/amf/amfd/ckpt_dec.cc
@@ -178,6 +178,31 @@ const AVSV_DECODE_COLD_SYNC_RSP_DATA_FUNC_PTR dec_cs_data_func_list[] = {
 dec_cs_comp_config, dec_cs_comp_cs_type_config, dec_cs_siass,
 dec_cs_si_trans,dec_cs_async_updt_cnt};
 
+void set_node_failover_state(AVD_CL_CB *cb, const SaClmNodeIdT node_id,
+const uint32_t state) {
+  TRACE_ENTER();
+
+  if (state == NodeState::NodeStates::kUndefined) {
+// not in failover list
+return;
+  }
+
+  auto failed_node = cb->failover_list.find(node_id);
+  if (failed_node != cb->failover_list.end()) {
+failed_node->second->SetState(state);
+  } else {
+LOG_NO("Node '%u' not found in failover_list. Create new entry",
+node_id);
+auto new_node = std::make_shared(cb, node_id);
+// node must be added to failover_list before SetState() is called.
+// If the state is 'end', then it will be deleted by SetState().
+// Otherwise, we will leave a node in 'End' state mistakenly in
+// failover_list.
+cb->failover_list[node_id] = new_node;
+new_node->SetState(state);
+  }
+}
+
 void decode_cb(NCS_UBAID *ub, AVD_CL_CB *cb, const uint16_t peer_version) {
   osaf_decode_uint32(ub, reinterpret_cast(>init_state));
   osaf_decode_satimet(ub, >cluster_init_time);
@@ -254,6 +279,9 @@ void decode_node_config(NCS_UBAID *ub, AVD_AVND *avnd,
   osaf_decode_uint32(ub, &avnd->rcv_msg_id);
   osaf_decode_uint32(ub, &avnd->snd_msg_id);
   osaf_extended_name_free(_name);
+  if (peer_version >= AVD_MBCSV_SUB_PART_VERSION_10) {
+    osaf_decode_uint32(ub, &avnd->failover_state);
+  }
   TRACE_LEAVE();
 }
 
@@ -585,7 +613,7 @@ void decode_siass(NCS_UBAID *ub, AVSV_SU_SI_REL_CKPT_MSG *su_si_ckpt,
 su_si_ckpt->csi_add_rem = static_cast(csi_add_rem);
 osaf_decode_sanamet(ub, _si_ckpt->comp_name);
 osaf_decode_sanamet(ub, _si_ckpt->csi_name);
-  };
+  }
 }
 
 /\
@@ -2199,6 +2227,7 @@ static uint32_t dec_cs_node_config(AVD_CL_CB *cb, NCS_MBCSV_CB_DEC *dec,
   for (count = 0; count < num_of_obj; count++) {
 decode_node_config(>i_uba, , dec->i_peer_version);
 status = avd_ckpt_node(cb, , dec->i_action);
+set_node_failover_state(cb, avnd.node_info.nodeId, avnd.failover_state);
 osafassert(status == NCSCC_RC_SUCCESS);
   }
 
@@ -2552,14 +2581,23 @@ static uint32_t dec_cs_async_updt_cnt(AVD_CL_CB *cb, NCS_MBCSV_CB_DEC *dec,
   /*
* Decode and send async update counts for all the data structures.
*/
-  if (dec->i_peer_version >= AVD_MBCSV_SUB_PART_VERSION_7) {
+  if (dec->i_peer_version >= AVD_MBCSV_SUB_PART_VERSION_10) {
 TRACE(
-"Peer AMFD version is >= AVD_MBCSV_SUB_PART_VERSION_7,"
+"Peer AMFD version is >= AVD_MBCSV_SUB_PART_VERSION_10,"
 "peer ver:%d",
 avd_cb->avd_peer_ver);
 status = m_NCS_EDU_VER_EXEC(>edu_hdl, avsv_edp_ckpt_msg_async_updt_cnt,
 >i_uba, EDP_OP_TYPE_DEC, _cnt,
 , dec->i_peer_version);
+

[devel] [PATCH 2/4] fmd: add active promotion supervision timer [#3029]

2019-07-09 Thread Gary Lee

Add a supervision timer so that a controller reboots if it cannot obtain
the consensus lock within the allocation period
(2 * FMS_TAKEOVER_REQUEST_VALID_TIME).

The peer controller can then safely perform a node failover
after this period of time.
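
A rough model of the supervision idea, assuming nothing about the fmd timer
implementation: the timer is armed with a deadline, stopped on promotion, and
treated as fatal if it is still running once the deadline has passed. The
class name is hypothetical and the caller supplies the timeout value.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical deadline-based supervision timer. Time is passed in by the
// caller so that the logic can be exercised without sleeping.
class SupervisionTimer {
 public:
  using Clock = std::chrono::steady_clock;

  // Arms the timer: it expires `timeout` after `now`.
  void Start(Clock::time_point now, std::chrono::milliseconds timeout) {
    deadline_ = now + timeout;
    running_ = true;
  }

  // Called when the supervised event (here: promotion to active) happens.
  void Stop() { running_ = false; }

  // True only if the timer is still armed and the deadline has been reached.
  // In the patch, this condition leads to a reboot of the controller.
  bool Expired(Clock::time_point now) const {
    return running_ && now >= deadline_;
  }

 private:
  Clock::time_point deadline_{};
  bool running_ = false;
};
```

Because a controller that misses the deadline reboots itself, the peer can
rely on the same bound before it performs the node failover.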
---
 src/fm/fmd/fm_cb.h|  2 ++
 src/fm/fmd/fm_main.cc | 14 -
 src/fm/fmd/fm_rda.cc  | 87 ++-
 3 files changed, 74 insertions(+), 29 deletions(-)

diff --git a/src/fm/fmd/fm_cb.h b/src/fm/fmd/fm_cb.h
index 6eb0d54..b5ea5ae 100644
--- a/src/fm/fmd/fm_cb.h
+++ b/src/fm/fmd/fm_cb.h
@@ -39,6 +39,7 @@ typedef enum {
   FM_TMR_TYPE_MIN,
   FM_TMR_PROMOTE_ACTIVE,
   FM_TMR_ACTIVATION_SUPERVISION,
+  FM_TMR_CONSENSUS_SERVICE_SUPERVISION,
   FM_TMR_TYPE_MAX
 } FM_TMR_TYPE;
 
@@ -83,6 +84,7 @@ struct FM_CB {
   /* Timers */
   FM_TMR promote_active_tmr{};
   FM_TMR activation_supervision_tmr{};
+  FM_TMR consensus_service_supervision_tmr{};
 
   /* Time in terms of one hundredth of seconds (500 for 5 secs.) */
   uint32_t active_promote_tmr_val{};
diff --git a/src/fm/fmd/fm_main.cc b/src/fm/fmd/fm_main.cc
index 2eb3c16..4a843cc 100644
--- a/src/fm/fmd/fm_main.cc
+++ b/src/fm/fmd/fm_main.cc
@@ -59,7 +59,8 @@ static uint32_t fm_get_args(FM_CB *);
 static uint32_t fms_fms_exchange_node_info(FM_CB *);
 static uint32_t fms_fms_inform_terminating(FM_CB *fm_cb);
 static uint32_t fm_nid_notify(uint32_t);
-static uint32_t fm_tmr_start(FM_TMR *, SaTimeT);
+uint32_t fm_tmr_start(FM_TMR *, SaTimeT);
+void fm_tmr_stop(FM_TMR *tmr);
 static SaAisErrorT get_peer_clm_node_name(NODE_ID);
 static SaAisErrorT fm_clm_init();
 static void fm_mbx_msg_handler(FM_CB *, FM_EVT *);
@@ -449,6 +450,8 @@ static uint32_t fm_get_args(FM_CB *fm_cb) {
   /* Set timer variables */
   fm_cb->promote_active_tmr.type = FM_TMR_PROMOTE_ACTIVE;
   fm_cb->activation_supervision_tmr.type = FM_TMR_ACTIVATION_SUPERVISION;
+  fm_cb->consensus_service_supervision_tmr.type =
+FM_TMR_CONSENSUS_SERVICE_SUPERVISION;
 
   char *node_isolation_timeout = getenv("FMS_NODE_ISOLATION_TIMEOUT");
   if (node_isolation_timeout != NULL) {
@@ -704,6 +707,11 @@ static void fm_mbx_msg_handler(FM_CB *fm_cb, FM_EVT *fm_mbx_evt) {
"Activation timer supervision "
"expired: no ACTIVE assignment received "
"within the time limit");
+  } else if (fm_mbx_evt->info.fm_tmr->type ==
+ FM_TMR_CONSENSUS_SERVICE_SUPERVISION) {
+opensaf_quick_reboot("Consensus service supervision "
+ "expired: controller was not promoted "
+ "within the time limit");
   }
   break;
 
@@ -728,6 +736,10 @@ static void fm_evt_proc_rda_callback(FM_CB *cb, FM_EVT *evt) {
   uint32_t rc = NCSCC_RC_SUCCESS;
 
   TRACE_ENTER2("%d", (int)evt->info.rda_info.role);
+  if (evt->info.rda_info.role == PCS_RDA_ACTIVE) {
+LOG_NO("Controller promoted. Stop supervision timer");
+fm_tmr_stop(&fm_cb->consensus_service_supervision_tmr);
+  }
   if (evt->info.rda_info.role != PCS_RDA_ACTIVE &&
   cb->activation_supervision_tmr.status == FM_TMR_RUNNING) {
 fm_tmr_stop(&cb->activation_supervision_tmr);
diff --git a/src/fm/fmd/fm_rda.cc b/src/fm/fmd/fm_rda.cc
index d3063ba..c072cb0 100644
--- a/src/fm/fmd/fm_rda.cc
+++ b/src/fm/fmd/fm_rda.cc
@@ -23,6 +23,8 @@
 #include "osaf/consensus/consensus.h"
 #include "rde/agent/rda_papi.h"
 
+extern uint32_t fm_tmr_start(FM_TMR *tmr, SaTimeT period);
+extern void fm_tmr_stop(FM_TMR *tmr);
 extern void rda_cb(uint32_t cb_hdl, PCS_RDA_CB_INFO *cb_info,
PCSRDA_RETURN_CODE error_code);
 /
@@ -64,6 +66,47 @@ done:
   return rc;
 }
 
+void promote_node(FM_CB *fm_cb) {
+  TRACE_ENTER();
+
+  Consensus consensus_service;
+  if (consensus_service.PrioritisePartitionSize() == true) {
+// Allow topology events to be processed first. The MDS thread may
+// be processing MDS down events and updating cluster_size concurrently.
+// We need cluster_size to be as accurate as possible, without waiting
+// too long for node down events.
+std::this_thread::sleep_for(std::chrono::seconds(2));
+  }
+
+  uint32_t rc;
+  rc = consensus_service.PromoteThisNode(true, fm_cb->cluster_size);
+  if (rc != SA_AIS_OK && rc != SA_AIS_ERR_EXIST) {
+LOG_ER("Unable to set active controller in consensus service");
+opensaf_quick_reboot("Unable to set active controller "
+  "in consensus service");
+  } else if (rc == SA_AIS_ERR_EXIST) {
+// @todo if we don't reboot, we don't seem to recover from this. Can we
+// improve?
+LOG_ER(
+"A controller is already active. We were separated from the "
+"cluster?");
+opensaf_quick_reboot("A controller is already active. We were separated "
+ "from the cluster?");
+  }
+
+  PCS_RDA_REQ rda_req;
+
+  /* set the RDA role to active */
+

[devel] [PATCH 4/4] osaf: make wait time configurable [#3029]

2019-07-09 Thread Gary Lee

If FMS_TAKEOVER_PRIORITISE_PARTITION_SIZE is enabled,
make the time that we wait for MDS node events configurable.
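
A minimal sketch of how such a setting can be read, assuming plain `std::getenv`. `GetEnvUint32` is a hypothetical stand-in for the `base::GetEnv` call used in the diff below; falling back to the default on malformed input is an assumption of this sketch and may differ from what `base::GetEnv` does. The variable name and the default of 4 seconds come from the patch.

```cpp
#include <cstdint>
#include <cstdlib>

// Hypothetical stand-in for base::GetEnv: returns the variable parsed as an
// unsigned decimal number, or default_value if it is unset, empty, not a
// number, or too large for uint32_t (the fallback policy is an assumption).
uint32_t GetEnvUint32(const char* name, uint32_t default_value) {
  const char* raw = std::getenv(name);
  if (raw == nullptr || *raw == '\0') return default_value;
  char* end = nullptr;
  unsigned long parsed = std::strtoul(raw, &end, 10);
  if (*end != '\0' || parsed > UINT32_MAX) return default_value;
  return static_cast<uint32_t>(parsed);
}

// Seconds to wait for MDS node events before judging partition size.
uint32_t MdsWaitTimeSeconds() {
  return GetEnvUint32("FMS_TAKEOVER_PRIORITISE_PARTITION_SIZE_MDS_WAIT_TIME",
                      4);
}
```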
---
 src/fm/fmd/fm_rda.cc| 4 +++-
 src/fm/fmd/fmd.conf | 5 +
 src/osaf/consensus/consensus.cc | 9 +
 src/osaf/consensus/consensus.h  | 2 ++
 src/rde/rded/role.cc| 4 +++-
 5 files changed, 22 insertions(+), 2 deletions(-)

diff --git a/src/fm/fmd/fm_rda.cc b/src/fm/fmd/fm_rda.cc
index c072cb0..fca417f 100644
--- a/src/fm/fmd/fm_rda.cc
+++ b/src/fm/fmd/fm_rda.cc
@@ -75,7 +75,9 @@ void promote_node(FM_CB *fm_cb) {
 // be processing MDS down events and updating cluster_size concurrently.
 // We need cluster_size to be as accurate as possible, without waiting
 // too long for node down events.
-std::this_thread::sleep_for(std::chrono::seconds(2));
+std::this_thread::sleep_for(
+  std::chrono::seconds(
+consensus_service.PrioritisePartitionSizeWaitTime()));
   }
 
   uint32_t rc;
diff --git a/src/fm/fmd/fmd.conf b/src/fm/fmd/fmd.conf
index 209e484..4dbf53a 100644
--- a/src/fm/fmd/fmd.conf
+++ b/src/fm/fmd/fmd.conf
@@ -36,6 +36,11 @@ export FMS_TAKEOVER_REQUEST_VALID_TIME=20
 # Default is 1
 #export FMS_TAKEOVER_PRIORITISE_PARTITION_SIZE=1
 
+# If FMS_TAKEOVER_PRIORITISE_PARTITION_SIZE is set to 1, wait until
+# this number of seconds for MDS events before making a decision
+# on partition size. Default is 4 seconds
+#export FMS_TAKEOVER_PRIORITISE_PARTITION_SIZE_MDS_WAIT_TIME=4
+
 # Default behaviour is not to allow promotion of this node to Active
 # unless a lock can be obtained, if split brain prevention is enabled.
 # Uncomment the next line to allow promotion of this node at cluster startup,
diff --git a/src/osaf/consensus/consensus.cc b/src/osaf/consensus/consensus.cc
index 814885e..0e37fa3 100644
--- a/src/osaf/consensus/consensus.cc
+++ b/src/osaf/consensus/consensus.cc
@@ -207,6 +207,10 @@ bool Consensus::PrioritisePartitionSize() const {
   return prioritise_partition_size_;
 }
 
+uint32_t Consensus::PrioritisePartitionSizeWaitTime() const {
+  return prioritise_partition_size_mds_wait_time_;
+}
+
 uint32_t Consensus::TakeoverValidTime() const {
   return takeover_valid_time_;
 }
@@ -253,6 +257,8 @@ void Consensus::ProcessEnvironmentSettings() {
   uint32_t use_remote_fencing = base::GetEnv("FMS_USE_REMOTE_FENCING", 0);
   uint32_t prioritise_partition_size =
 base::GetEnv("FMS_TAKEOVER_PRIORITISE_PARTITION_SIZE", 1);
+  uint32_t prioritise_partition_size_mds_wait_time =
+base::GetEnv("FMS_TAKEOVER_PRIORITISE_PARTITION_SIZE_MDS_WAIT_TIME", 4);
   uint32_t relaxed_node_promotion =
 base::GetEnv("FMS_RELAXED_NODE_PROMOTION", 0);
   config_file_ = base::GetEnv("FMS_CONF_FILE", "");
@@ -281,6 +287,9 @@ void Consensus::ProcessEnvironmentSettings() {
   if (use_consensus_ == true && relaxed_node_promotion == 1) {
 relaxed_node_promotion_ = true;
   }
+
+  prioritise_partition_size_mds_wait_time_ =
+prioritise_partition_size_mds_wait_time;
 }
 
 bool Consensus::ReloadConfiguration() {
diff --git a/src/osaf/consensus/consensus.h b/src/osaf/consensus/consensus.h
index 1fabf90..1aba561 100644
--- a/src/osaf/consensus/consensus.h
+++ b/src/osaf/consensus/consensus.h
@@ -61,6 +61,7 @@ class Consensus {
   bool IsRelaxedNodePromotionEnabled() const;
 
   bool PrioritisePartitionSize() const;
+  uint32_t PrioritisePartitionSizeWaitTime() const;
 
   uint32_t TakeoverValidTime() const;
 
@@ -100,6 +101,7 @@ class Consensus {
   bool use_consensus_{false};
   bool use_remote_fencing_{false};
   bool prioritise_partition_size_{true};
+  uint32_t prioritise_partition_size_mds_wait_time_{4};
   bool relaxed_node_promotion_{false};
   uint32_t takeover_valid_time_{20};
   uint32_t max_takeover_retry_{0};
diff --git a/src/rde/rded/role.cc b/src/rde/rded/role.cc
index b8c8157..b890117 100644
--- a/src/rde/rded/role.cc
+++ b/src/rde/rded/role.cc
@@ -83,7 +83,9 @@ void Role::MonitorCallback(const std::string& key, const std::string& new_value,
 consensus_service.PrioritisePartitionSize() == true) {
   // don't send this to the main thread straight away, as it will
   // need some time to process topology changes.
-  std::this_thread::sleep_for(std::chrono::seconds(4));
+  std::this_thread::sleep_for(
+std::chrono::seconds(
+  consensus_service.PrioritisePartitionSizeWaitTime()));
 }
   } else {
 msg->type = RDE_MSG_NEW_ACTIVE_CALLBACK;
-- 
2.7.4




[devel] [PATCH 0/4] Review Request for amfd: improve controller failover behavior V2 [#3029]

2019-07-09 Thread Gary Lee

Summary: amfd: improve controller failover behavior [#3029]
Review request for Ticket(s): 3029
Peer Reviewer(s): Canh, Minh, Hans 
Pull request to: *** LIST THE PERSON WITH PUSH ACCESS HERE ***
Affected branch(es): develop
Development branch: ticket-3029
Base revision: 71852f322b42437f074bfa4c618c021798357143
Personal repository: git://git.code.sf.net/u/userid-2226215/review


Impacted area       Impact y/n

 Docs                    n
 Build system            n
 RPM/packaging           n
 Configuration files     n
 Startup scripts         n
 SAF services            y
 OpenSAF services        y
 Core libraries          y
 Samples                 n
 Tests                   n
 Other                   n


Comments (indicate scope for each "y" above):
-

revision 4feee2b631afa3393ae9e53fd6575c3768861dca
Author: Gary Lee 
Date:   Tue, 9 Jul 2019 14:38:49 +1000

osaf: make wait time configurable [#3029]

If FMS_TAKEOVER_PRIORITISE_PARTITION_SIZE is enabled,
make the time that we wait for MDS node events configurable.



revision 2c419ba5fffb85272f0d15118b561bcfc1de4814
Author: Gary Lee 
Date:   Tue, 9 Jul 2019 14:38:49 +1000

amfd: improve controller failover behavior [#3029]

If consensus service is enabled, only perform node failover
after peer controller has self-fenced
(after 2 * FMS_TAKEOVER_REQUEST_VALID_TIME seconds).

This also means if node failover delay is set to a large value,
we do not unnecessarily wait too long before failing over assignments
previously assigned to the peer controller.

Remove unused fmd_conf_file variable.

Change some LOG_ER calls to LOG_WA.
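
The failover rule in the amfd commit message above can be sketched as a single predicate. The struct and function names are hypothetical. The conditions follow the visible part of `delay_failover()` in the amfd patch of this series, whose body is cut off in this archive, so treating the peer-controller branch as "delay" is an assumption.

```cpp
#include <cstdint>

// Hypothetical bundle of the inputs the decision depends on.
struct FailoverInputs {
  uint32_t node_failover_delay;  // configured delay; 0 means no delay
  uint32_t failed_node_id;       // node that went down
  uint32_t peer_controller_id;   // node id of the other controller
  bool consensus_enabled;        // consensus service in use
  bool remote_fencing_enabled;   // remote fencing in use
};

// Returns true when failover of the failed node should be postponed.
bool ShouldDelayFailover(const FailoverInputs& in) {
  if (in.node_failover_delay > 0) return true;
  // Assumption: with consensus on and no remote fencing, wait for the peer
  // controller to self-fence before failing over its assignments.
  return in.failed_node_id == in.peer_controller_id &&
         in.consensus_enabled && !in.remote_fencing_enabled;
}
```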



revision 7c4fff483477082ca66a26f921a50b3bc1240538
Author: Gary Lee 
Date:   Tue, 9 Jul 2019 14:38:49 +1000

fmd: add active promotion supervision timer [#3029]

Add supervision timer so controller will reboot if it cannot obtain
consensus lock within the allocation period
(2* FMS_TAKEOVER_REQUEST_VALID_TIME).

The peer controller can then safely perform a node failover
after this period of time.



revision 8b596a228402ff99b26906138daf920c23e965e7
Author: Gary Lee 
Date:   Tue, 9 Jul 2019 14:38:49 +1000

osaf: add function to return takeover request expiry time [#3029]



Complete diffstat:
--
 src/amf/amfd/cb.h  |  1 -
 src/amf/amfd/clm.cc|  4 +-
 src/amf/amfd/main.cc   |  1 -
 src/amf/amfd/ndfsm.cc  |  8 ++--
 src/amf/amfd/ndproc.cc | 19 
 src/amf/amfd/node_state.cc | 23 +-
 src/amf/amfd/node_state_machine.cc | 19 
 src/amf/amfd/node_state_machine.h  |  2 +
 src/amf/amfd/proc.h|  1 +
 src/fm/fmd/fm_cb.h |  2 +
 src/fm/fmd/fm_main.cc  | 14 +-
 src/fm/fmd/fm_rda.cc   | 89 ++
 src/fm/fmd/fmd.conf|  5 +++
 src/osaf/consensus/consensus.cc| 13 ++
 src/osaf/consensus/consensus.h |  4 ++
 src/rde/rded/role.cc   |  4 +-
 16 files changed, 160 insertions(+), 49 deletions(-)


Testing Commands:
-
1) Ensure a 2N application is active on standby controller,
   and standby on the active controller
2) Isolate active & standby controller


Testing, Expected Results:
--
amfd should failover 2N application only after
2 * FMS_TAKEOVER_REQUEST_VALID_TIME seconds

Conditions of Submission:
-
ack from any reviewer

Arch        Built   Started   Linux distro
-------------------------------------------
mips          n        n
mips64        n        n
x86           n        n
x86_64        y        y
powerpc       n        n
powerpc64     n        n


Reviewer Checklist:
---
[Submitters: make sure that your review doesn't trigger any checkmarks!]


Your checkin has not passed review because (see checked entries):

___ Your RR template is generally incomplete; it has too many blank entries
that need proper data filled in.

___ You have failed to nominate the proper persons for review and push.

___ Your patches do not have proper short+long header

___ You have grammar/spelling in your header that is unacceptable.

___ You have exceeded a sensible line length in your headers/comments/text.

___ You have failed to put in a proper Trac Ticket # into your commits.

___ You have incorrectly put/left internal data in your comments/files
(i.e. internal bug tracking tool IDs, product names etc)

___ You have not given any evidence of testing beyond basic build tests.
Demonstrate some level of runtime or other sanity testing.

___ You have ^M present in some of your files. These have to be removed.

___ You have needlessly changed whitespace or added whitespace crimes
like trailing spaces, or spaces before tabs.

___ You have mixed real technical changes with whitespace and other
cosmetic code c

Re: [devel] [PATCH 1/1] amfd: disallow delete of CtCs object if Ct maps to comp [#3028]

2019-07-03 Thread Gary Lee


Hi Phuc

ack, will push on your behalf.

Thanks

Gary

On 25/6/19 7:13 pm, phuc.h.chau wrote:

Amfd crashes when an SU is unlocked. The crash occurs in
avd_snd_susi_msg(), which calls get_comp_capability()
with csi and comp as input parameters.

In get_comp_capability(), no CtCs object is available,
so ctcstype_db->find returns NULL to ctcs_type.
Accessing ctcs_type->saAmfCtCompCapability then crashes
amfd because ctcs_type is NULL.
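
The validation rule that the diff below adds can be sketched on its own, assuming plain standard containers in place of the amfd data structures. `CtCsTypeDeleteAllowed` and both parameters are hypothetical names: deleting the CtCs object is allowed only if every component of the matching component type is also deleted in the same CCB.

```cpp
#include <set>
#include <string>
#include <vector>

// comps_of_type: DNs of all components that use the component type.
// deleted_in_ccb: DNs that the current CCB deletes.
// Returns false as soon as one component would outlive the CtCs object.
bool CtCsTypeDeleteAllowed(const std::vector<std::string>& comps_of_type,
                           const std::set<std::string>& deleted_in_ccb) {
  for (const auto& dn : comps_of_type) {
    if (deleted_in_ccb.count(dn) == 0) return false;
  }
  return true;
}
```

A component type with no components passes trivially, which matches the else branch of the patch where only a trace is written.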
---
  src/amf/amfd/ctcstype.cc | 46 +-
  1 file changed, 45 insertions(+), 1 deletion(-)

diff --git a/src/amf/amfd/ctcstype.cc b/src/amf/amfd/ctcstype.cc
index 5dffdae..7e62358 100644
--- a/src/amf/amfd/ctcstype.cc
+++ b/src/amf/amfd/ctcstype.cc
@@ -187,13 +187,57 @@ static SaAisErrorT ctcstype_ccb_completed_cb(CcbUtilOperationData_t *opdata) {
opdata, "Modification of SaAmfCtCsType not supported");
break;
  case CCBUTIL_DELETE:
+  AVD_CTCS_TYPE *ctcstype;
+  AVD_COMP_TYPE *comp_type;
+  AVD_COMP *comp;
+  CcbUtilOperationData_t *t_opData;
+
+  ctcstype = ctcstype_db->find(Amf::to_string(&opdata->objectName));
+  if (ctcstype != nullptr) {
+std::string cst_name, ct_name;
+avsv_sanamet_init(Amf::to_string(&opdata->objectName),
+  cst_name, "safCSType=");
+avsv_sanamet_init(cst_name, ct_name, "safVersion");
+TRACE("'%s'", ct_name.c_str());
+comp_type = comptype_db->find(ct_name);
+if ((comp_type) && (nullptr != comp_type->list_of_comp)) {
+  /* check whether there exists a delete operation for
+  * each of the Comp in the comp_type list in the current CCB
+  */
+  bool comp_exist = false;
+  TRACE("SaAmfCompType '%s' has components", comp_type->name.c_str());
+  comp = comp_type->list_of_comp;
+  while (comp != nullptr) {
+TRACE("%s", osaf_extended_name_borrow(&comp->comp_info.name));
+t_opData = ccbutil_getCcbOpDataByDN(opdata->ccbId,
+&comp->comp_info.name);
+TRACE("%p", t_opData);
+if ((t_opData == nullptr) ||
+(t_opData->operationType != CCBUTIL_DELETE)) {
+  TRACE("OperationType: %p", t_opData);
+  comp_exist = true;
+  break;
+}
+comp = comp->comp_type_list_comp_next;
+  }
+  if (comp_exist == true) {
+rc = SA_AIS_ERR_BAD_OPERATION;
+report_ccb_validation_error(opdata, "SaAmfCompType '%s' is in use",
+comp_type->name.c_str());
+goto done;
+  }
+} else {
+TRACE("SaAmfCompType '%p'. SaAmfCompType '%s' has no components",
+  comp_type, ct_name.c_str());
+  }
+  }
rc = SA_AIS_OK;
break;
  default:
osafassert(0);
break;
}
-
+done:
TRACE_LEAVE2("%u", rc);
return rc;
  }




[devel] [PATCH 1/3] osaf: add function to return takeover request expiry time [#3029]

2019-07-03 Thread Gary Lee

---
 src/osaf/consensus/consensus.cc | 4 
 src/osaf/consensus/consensus.h  | 2 ++
 2 files changed, 6 insertions(+)

diff --git a/src/osaf/consensus/consensus.cc b/src/osaf/consensus/consensus.cc
index 0bebab2..814885e 100644
--- a/src/osaf/consensus/consensus.cc
+++ b/src/osaf/consensus/consensus.cc
@@ -207,6 +207,10 @@ bool Consensus::PrioritisePartitionSize() const {
   return prioritise_partition_size_;
 }
 
+uint32_t Consensus::TakeoverValidTime() const {
+  return takeover_valid_time_;
+}
+
 std::string Consensus::CurrentActive() const {
   TRACE_ENTER();
   if (use_consensus_ == false) {
diff --git a/src/osaf/consensus/consensus.h b/src/osaf/consensus/consensus.h
index eb12b2c..1fabf90 100644
--- a/src/osaf/consensus/consensus.h
+++ b/src/osaf/consensus/consensus.h
@@ -62,6 +62,8 @@ class Consensus {
 
   bool PrioritisePartitionSize() const;
 
+  uint32_t TakeoverValidTime() const;
+
   // Determine if plugin is telling us to self-fence due to loss
   // of connectivity to the KV store
   bool SelfFence(const std::string& request) const;
-- 
2.7.4




[devel] [PATCH 0/3] Review Request for amfd: improve controller failover behavior [#3029]

2019-07-03 Thread Gary Lee

Summary: osaf: add function to return takeover request expiry time [#3029]
Review request for Ticket(s): 3029
Peer Reviewer(s): Minh, Hans 
Pull request to: *** LIST THE PERSON WITH PUSH ACCESS HERE ***
Affected branch(es): develop
Development branch: ticket-3029
Base revision: 4f86e371d28a385f689011a0effef8aaae65e713
Personal repository: git://git.code.sf.net/u/userid-2226215/review


Impacted area       Impact y/n

 Docs                    n
 Build system            n
 RPM/packaging           n
 Configuration files     n
 Startup scripts         n
 SAF services            y
 OpenSAF services        y
 Core libraries          n
 Samples                 n
 Tests                   n
 Other                   n


Comments (indicate scope for each "y" above):
-

revision 1f48477cdcd92356cd446ad81741f9373724be7c
Author: Gary Lee 
Date:   Wed, 3 Jul 2019 16:19:17 +1000

amfd: improve controller failover behavior [#3029]

If consensus service is enabled, only perform node failover
after peer controller has self-fenced
(after 2 * FMS_TAKEOVER_REQUEST_VALID_TIME seconds).

This also means if node failover delay is set to a large value,
we do not unnecessarily wait too long before failing over assignments
previously assigned to the peer controller.

Remove unused fmd_conf_file variable.

Change some LOG_ER calls to LOG_WA.



revision 5e03fc3e30920989080f6617ca404f7f60f4a8cc
Author: Gary Lee 
Date:   Wed, 3 Jul 2019 16:19:10 +1000

fmd: add active promotion supervision timer [#3029]

Add supervision timer so controller will reboot if it cannot obtain
consensus lock within the allocation period
(2* FMS_TAKEOVER_REQUEST_VALID_TIME).

The peer controller can then safely perform a node failover
after this period of time.



revision c2a9e9d8712952526660efe678daee39f85d1d68
Author: Gary Lee 
Date:   Wed, 3 Jul 2019 15:34:36 +1000

osaf: add function to return takeover request expiry time [#3029]



Complete diffstat:
--
 src/amf/amfd/cb.h  |  1 -
 src/amf/amfd/clm.cc|  4 +-
 src/amf/amfd/main.cc   |  1 -
 src/amf/amfd/ndfsm.cc  |  8 ++--
 src/amf/amfd/ndproc.cc | 19 ++
 src/amf/amfd/node_state.cc | 23 +--
 src/amf/amfd/node_state_machine.cc | 19 ++
 src/amf/amfd/node_state_machine.h  |  2 +
 src/amf/amfd/proc.h|  1 +
 src/fm/fmd/fm_cb.h |  2 +
 src/fm/fmd/fm_main.cc  | 14 ++-
 src/fm/fmd/fm_rda.cc   | 78 ++
 src/osaf/consensus/consensus.cc|  4 ++
 src/osaf/consensus/consensus.h |  2 +
 14 files changed, 134 insertions(+), 44 deletions(-)


Testing Commands:
-
1) Ensure a 2N application is active on standby controller,
   and standby on the active controller
2) Isolate active & standby controller

Testing, Expected Results:
--
amfd should failover 2N application only after
2 * FMS_TAKEOVER_REQUEST_VALID_TIME seconds

Conditions of Submission:
-
Ack from reviewer

Arch        Built   Started   Linux distro
-------------------------------------------
mips          n        n
mips64        n        n
x86           n        n
x86_64        y        y
powerpc       n        n
powerpc64     n        n


Reviewer Checklist:
---
[Submitters: make sure that your review doesn't trigger any checkmarks!]


Your checkin has not passed review because (see checked entries):

___ Your RR template is generally incomplete; it has too many blank entries
that need proper data filled in.

___ You have failed to nominate the proper persons for review and push.

___ Your patches do not have proper short+long header

___ You have grammar/spelling in your header that is unacceptable.

___ You have exceeded a sensible line length in your headers/comments/text.

___ You have failed to put in a proper Trac Ticket # into your commits.

___ You have incorrectly put/left internal data in your comments/files
(i.e. internal bug tracking tool IDs, product names etc)

___ You have not given any evidence of testing beyond basic build tests.
Demonstrate some level of runtime or other sanity testing.

___ You have ^M present in some of your files. These have to be removed.

___ You have needlessly changed whitespace or added whitespace crimes
like trailing spaces, or spaces before tabs.

___ You have mixed real technical changes with whitespace and other
cosmetic code cleanup changes. These have to be separate commits.

___ You need to refactor your submission into logical chunks; there is
too much content into a single commit.

___ You have extraneous garbage in your review (merge commits etc)

___ You have giant attachments which should never have been sent;
Instead you should place your content in a public tre

[devel] [PATCH 2/3] fmd: add active promotion supervision timer [#3029]

2019-07-03 Thread Gary Lee

Add supervision timer so controller will reboot if it cannot obtain
consensus lock within the allocation period
(2* FMS_TAKEOVER_REQUEST_VALID_TIME).

The peer controller can then safely perform a node failover
after this period of time.
---
 src/fm/fmd/fm_cb.h|  2 ++
 src/fm/fmd/fm_main.cc | 14 -
 src/fm/fmd/fm_rda.cc  | 78 +++
 3 files changed, 69 insertions(+), 25 deletions(-)

diff --git a/src/fm/fmd/fm_cb.h b/src/fm/fmd/fm_cb.h
index 6eb0d54..b5ea5ae 100644
--- a/src/fm/fmd/fm_cb.h
+++ b/src/fm/fmd/fm_cb.h
@@ -39,6 +39,7 @@ typedef enum {
   FM_TMR_TYPE_MIN,
   FM_TMR_PROMOTE_ACTIVE,
   FM_TMR_ACTIVATION_SUPERVISION,
+  FM_TMR_CONSENSUS_SERVICE_SUPERVISION,
   FM_TMR_TYPE_MAX
 } FM_TMR_TYPE;
 
@@ -83,6 +84,7 @@ struct FM_CB {
   /* Timers */
   FM_TMR promote_active_tmr{};
   FM_TMR activation_supervision_tmr{};
+  FM_TMR consensus_service_supervision_tmr{};
 
   /* Time in terms of one hundredth of seconds (500 for 5 secs.) */
   uint32_t active_promote_tmr_val{};
diff --git a/src/fm/fmd/fm_main.cc b/src/fm/fmd/fm_main.cc
index 2eb3c16..4a843cc 100644
--- a/src/fm/fmd/fm_main.cc
+++ b/src/fm/fmd/fm_main.cc
@@ -59,7 +59,8 @@ static uint32_t fm_get_args(FM_CB *);
 static uint32_t fms_fms_exchange_node_info(FM_CB *);
 static uint32_t fms_fms_inform_terminating(FM_CB *fm_cb);
 static uint32_t fm_nid_notify(uint32_t);
-static uint32_t fm_tmr_start(FM_TMR *, SaTimeT);
+uint32_t fm_tmr_start(FM_TMR *, SaTimeT);
+void fm_tmr_stop(FM_TMR *tmr);
 static SaAisErrorT get_peer_clm_node_name(NODE_ID);
 static SaAisErrorT fm_clm_init();
 static void fm_mbx_msg_handler(FM_CB *, FM_EVT *);
@@ -449,6 +450,8 @@ static uint32_t fm_get_args(FM_CB *fm_cb) {
   /* Set timer variables */
   fm_cb->promote_active_tmr.type = FM_TMR_PROMOTE_ACTIVE;
   fm_cb->activation_supervision_tmr.type = FM_TMR_ACTIVATION_SUPERVISION;
+  fm_cb->consensus_service_supervision_tmr.type =
+FM_TMR_CONSENSUS_SERVICE_SUPERVISION;
 
   char *node_isolation_timeout = getenv("FMS_NODE_ISOLATION_TIMEOUT");
   if (node_isolation_timeout != NULL) {
@@ -704,6 +707,11 @@ static void fm_mbx_msg_handler(FM_CB *fm_cb, FM_EVT *fm_mbx_evt) {
"Activation timer supervision "
"expired: no ACTIVE assignment received "
"within the time limit");
+  } else if (fm_mbx_evt->info.fm_tmr->type ==
+ FM_TMR_CONSENSUS_SERVICE_SUPERVISION) {
+opensaf_quick_reboot("Consensus service supervision "
+ "expired: controller was not promoted "
+ "within the time limit");
   }
   break;
 
@@ -728,6 +736,10 @@ static void fm_evt_proc_rda_callback(FM_CB *cb, FM_EVT *evt) {
   uint32_t rc = NCSCC_RC_SUCCESS;
 
   TRACE_ENTER2("%d", (int)evt->info.rda_info.role);
+  if (evt->info.rda_info.role == PCS_RDA_ACTIVE) {
+LOG_NO("Controller promoted. Stop supervision timer");
+fm_tmr_stop(&fm_cb->consensus_service_supervision_tmr);
+  }
   if (evt->info.rda_info.role != PCS_RDA_ACTIVE &&
   cb->activation_supervision_tmr.status == FM_TMR_RUNNING) {
 fm_tmr_stop(&cb->activation_supervision_tmr);
diff --git a/src/fm/fmd/fm_rda.cc b/src/fm/fmd/fm_rda.cc
index d3063ba..0544152 100644
--- a/src/fm/fmd/fm_rda.cc
+++ b/src/fm/fmd/fm_rda.cc
@@ -23,6 +23,8 @@
 #include "osaf/consensus/consensus.h"
 #include "rde/agent/rda_papi.h"
 
+extern uint32_t fm_tmr_start(FM_TMR *tmr, SaTimeT period);
+extern void fm_tmr_stop(FM_TMR *tmr);
 extern void rda_cb(uint32_t cb_hdl, PCS_RDA_CB_INFO *cb_info,
PCSRDA_RETURN_CODE error_code);
 /
@@ -64,6 +66,47 @@ done:
   return rc;
 }
 
+void promote_node(FM_CB *fm_cb) {
+  TRACE_ENTER();
+
+  Consensus consensus_service;
+  if (consensus_service.PrioritisePartitionSize() == true) {
+// Allow topology events to be processed first. The MDS thread may
+// be processing MDS down events and updating cluster_size concurrently.
+// We need cluster_size to be as accurate as possible, without waiting
+// too long for node down events.
+std::this_thread::sleep_for(std::chrono::seconds(2));
+  }
+
+  uint32_t rc;
+  rc = consensus_service.PromoteThisNode(true, fm_cb->cluster_size);
+  if (rc != SA_AIS_OK && rc != SA_AIS_ERR_EXIST) {
+LOG_ER("Unable to set active controller in consensus service");
+opensaf_quick_reboot("Unable to set active controller "
+  "in consensus service");
+  } else if (rc == SA_AIS_ERR_EXIST) {
+// @todo if we don't reboot, we don't seem to recover from this. Can we
+// improve?
+LOG_ER(
+"A controller is already active. We were separated from the "
+"cluster?");
+opensaf_quick_reboot("A controller is already active. We were separated "
+ "from the cluster?");
+  }
+
+  PCS_RDA_REQ rda_req;
+
+  /* set the RDA role to active */
+

[devel] [PATCH 3/3] amfd: improve controller failover behavior [#3029]

2019-07-03 Thread Gary Lee

If consensus service is enabled, only perform node failover
after peer controller has self-fenced
(after 2 * FMS_TAKEOVER_REQUEST_VALID_TIME seconds).

This also means if node failover delay is set to a large value,
we do not unnecessarily wait too long before failing over assignments
previously assigned to the peer controller.

Remove unused fmd_conf_file variable.

Change some LOG_ER calls to LOG_WA.
---
 src/amf/amfd/cb.h  |  1 -
 src/amf/amfd/clm.cc|  4 ++--
 src/amf/amfd/main.cc   |  1 -
 src/amf/amfd/ndfsm.cc  |  8 
 src/amf/amfd/ndproc.cc | 19 +++
 src/amf/amfd/node_state.cc | 23 ---
 src/amf/amfd/node_state_machine.cc | 19 +++
 src/amf/amfd/node_state_machine.h  |  2 ++
 src/amf/amfd/proc.h|  1 +
 9 files changed, 59 insertions(+), 19 deletions(-)

diff --git a/src/amf/amfd/cb.h b/src/amf/amfd/cb.h
index 89cf15d..7ac743e 100644
--- a/src/amf/amfd/cb.h
+++ b/src/amf/amfd/cb.h
@@ -202,7 +202,6 @@ typedef struct cl_cb_tag {
   AVD_TMR heartbeat_tmr; /* The timer for sending heart beats to nd. */
   SaTimeT heartbeat_tmr_period;
   uint32_t minimum_cluster_size;
-  std::string fmd_conf_file;
 
   uint32_t nodes_exit_cnt; /* The counter to identifies the number
   of nodes that have exited the membership
diff --git a/src/amf/amfd/clm.cc b/src/amf/amfd/clm.cc
index aeae939..cfbe36a 100644
--- a/src/amf/amfd/clm.cc
+++ b/src/amf/amfd/clm.cc
@@ -203,7 +203,7 @@ static void clm_node_exit_complete(SaClmNodeIdT nodeId) {
   }
 
   if (avd_cb->failover_list.count(node->node_info.nodeId) == 0 &&
-avd_cb->node_failover_delay == 0) {
+delay_failover(avd_cb, node->node_info.nodeId) == false) {
 avd_node_failover(node);
 avd_node_delete_nodeid(node);
   }
@@ -322,7 +322,7 @@ static void clm_track_cb(
 LOG_IN("%s: CLM node '%s' is not an AMF cluster member; MDS down received",
__FUNCTION__, node_name.c_str());
 if (avd_cb->failover_list.count(node->node_info.nodeId) == 0 &&
-  avd_cb->node_failover_delay == 0) {
+  delay_failover(avd_cb, node->node_info.nodeId) == false) {
   avd_node_delete_nodeid(node);
 }
 goto done;
diff --git a/src/amf/amfd/main.cc b/src/amf/amfd/main.cc
index e3d0957..03857a1 100644
--- a/src/amf/amfd/main.cc
+++ b/src/amf/amfd/main.cc
@@ -582,7 +582,6 @@ static uint32_t initialize(void) {
   }
   cb->minimum_cluster_size =
   base::GetEnv("OSAF_AMF_MIN_CLUSTER_SIZE", uint32_t{2});
-  cb->fmd_conf_file = base::GetEnv("FMS_CONF_FILE", "");
 
   node_list_db = new AmfDb;
   amfnd_svc_db = new std::set;
diff --git a/src/amf/amfd/ndfsm.cc b/src/amf/amfd/ndfsm.cc
index 7099196..16b2def 100644
--- a/src/amf/amfd/ndfsm.cc
+++ b/src/amf/amfd/ndfsm.cc
@@ -811,7 +811,7 @@ void avd_mds_avnd_down_evh(AVD_CL_CB *cb, AVD_EVT *evt) {
   std::shared_ptr failed_node =
 cb->failover_list.at(evt->info.node_id);
   failed_node->MdsDown();
-} else if (cb->node_failover_delay > 0) {
+} else if (delay_failover(cb, evt->info.node_id) == true) {
   LOG_NO("Node '%s' is down. Start failover delay timer",
   node->node_name.c_str());
 
@@ -821,10 +821,10 @@ void avd_mds_avnd_down_evh(AVD_CL_CB *cb, AVD_EVT *evt) {
 }
 
 if (avd_cb->avail_state_avd == SA_AMF_HA_ACTIVE) {
-  if (cb->node_failover_delay == 0) {
+  check_quorum(cb);
+  if (delay_failover(cb, evt->info.node_id) == false) {
 avd_node_failover(node);
   }
-  check_quorum(cb);
   node->node_info.member = SA_FALSE;
   // Update standby out of sync if standby sc goes down
   if (avd_cb->node_id_avd_other == node->node_info.nodeId) {
@@ -833,7 +833,7 @@ void avd_mds_avnd_down_evh(AVD_CL_CB *cb, AVD_EVT *evt) {
 m_AVSV_SEND_CKPT_UPDT_ASYNC_UPDT(avd_cb, node,
  AVSV_CKPT_AVD_NODE_CONFIG);
   }
-} else if (cb->node_failover_delay == 0) {
+} else if (delay_failover(cb, evt->info.node_id) == false) {
   /* Remove dynamic info for node but keep in nodeid tree.
* Possibly used at the end of controller failover to
* to failover payload nodes.
diff --git a/src/amf/amfd/ndproc.cc b/src/amf/amfd/ndproc.cc
index 5f5cbcd..0d30dfe 100644
--- a/src/amf/amfd/ndproc.cc
+++ b/src/amf/amfd/ndproc.cc
@@ -1277,6 +1277,25 @@ void avd_node_failover(AVD_AVND *node, const bool mw_only) {
   TRACE_LEAVE();
 }
 
+bool delay_failover(const AVD_CL_CB *cb, const SaClmNodeIdT node_id) {
+  TRACE_ENTER();
+  Consensus consensus_service;
+  bool delay = false;
+
+  if (cb->node_failover_delay > 0) {
+  delay = true;
+  } else if (node_id == cb->node_id_avd_other &&
+ consensus_service.IsEnabled() == true &&
+ consensus_service.IsRemoteFencingEnabled() == false) {
+// even though node failover delay is set to

Re: [devel] [PATCH 1/1] amf: check null before access to config objects [#3055]

2019-07-01 Thread Gary Lee


Hi Thang

ack (review only)

Thanks

Gary

On 2/7/19 12:25 pm, thang.d.nguyen wrote:

While a controller is coming up, it creates config objects from IMM.
If an object has been deleted in the meantime, the amfd that is coming up
still receives the CCB object delete callback. It then validates the
callback and crashes because it accesses a null pointer.
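
The guard that the app.cc hunk below places around the delete path can be summarised as a small decision function. The names are hypothetical; only the decision table is read from the `if` conditions and `osafassert` calls in that hunk.

```cpp
// What the apply callback does for a CCBUTIL_DELETE operation.
enum class DeleteAction { kSkip, kDelete, kAssertFail };

// object_found: the lookup in the local database returned a non-null object.
// is_active: this amfd is the active controller.
DeleteAction OnDeleteApply(bool object_found, bool is_active) {
  if (object_found) return DeleteAction::kDelete;
  // A missing object is tolerated only on a controller that is not active;
  // on the active controller the assertion would still fire.
  return is_active ? DeleteAction::kAssertFail : DeleteAction::kSkip;
}
```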
---
  src/amf/amfd/app.cc  | 17 ++---
  src/amf/amfd/apptype.cc  | 13 +++--
  src/amf/amfd/comptype.cc | 10 +-
  3 files changed, 30 insertions(+), 10 deletions(-)

diff --git a/src/amf/amfd/app.cc b/src/amf/amfd/app.cc
index 424d828..67e5e3e 100644
--- a/src/amf/amfd/app.cc
+++ b/src/amf/amfd/app.cc
@@ -319,13 +319,16 @@ static void app_ccb_apply_cb(CcbUtilOperationData_t *opdata) {
  }
  case CCBUTIL_DELETE:
app = app_db->find(Amf::to_string(&opdata->objectName));
-  /* by this time all the SGs and SIs under this
-   * app object should have been *DELETED* just
-   * do a sanity check here
-   */
-  osafassert(app->list_of_sg == nullptr);
-  osafassert(app->list_of_si == nullptr);
-  avd_app_delete(app);
+  if ((app != nullptr) || (avd_cb->is_active() == true)) {
+/* by this time all the SGs and SIs under this
+ * app object should have been *DELETED* just
+ * do a sanity check here
+ */
+osafassert(app);
+osafassert(app->list_of_sg == nullptr);
+osafassert(app->list_of_si == nullptr);
+avd_app_delete(app);
+  }
break;
  default:
osafassert(0);
diff --git a/src/amf/amfd/apptype.cc b/src/amf/amfd/apptype.cc
index c22147f..20c94cb 100644
--- a/src/amf/amfd/apptype.cc
+++ b/src/amf/amfd/apptype.cc
@@ -155,6 +155,12 @@ static SaAisErrorT apptype_ccb_completed_cb(CcbUtilOperationData_t *opdata) {
break;
  case CCBUTIL_DELETE:
app_type = avd_apptype_get(object_name);
+  if (app_type == nullptr && avd_cb->is_active() == false) {
+opdata->userData = nullptr;
+rc = SA_AIS_OK;
+break;
+  }
+  osafassert(app_type);
if (nullptr != app_type->list_of_app) {
  /* check whether there exists a delete operation for
   * each of the App in the app_type list in the current CCB
@@ -201,8 +207,11 @@ static void apptype_ccb_apply_cb(CcbUtilOperationData_t 
*opdata) {
apptype_add_to_model(app_type);
break;
  case CCBUTIL_DELETE:
-  app_type = static_cast<AVD_APP_TYPE *>(opdata->userData);
-  apptype_delete(&app_type);
+  if ((opdata->userData != nullptr) || (avd_cb->is_active() == true)) {
+app_type = static_cast<AVD_APP_TYPE *>(opdata->userData);
+osafassert(app_type);
+apptype_delete(&app_type);
+  }
break;
  default:
osafassert(0);
diff --git a/src/amf/amfd/comptype.cc b/src/amf/amfd/comptype.cc
index 38582cc..48a333e 100644
--- a/src/amf/amfd/comptype.cc
+++ b/src/amf/amfd/comptype.cc
@@ -630,7 +630,9 @@ static void comptype_ccb_apply_cb(CcbUtilOperationData_t 
*opdata) {
comptype_db_add(comp_type);
break;
  case CCBUTIL_DELETE:
-  comptype_delete(static_cast<AVD_COMP_TYPE *>(opdata->userData));
+  if ((opdata->userData != nullptr) || (avd_cb->is_active() == true)) {
+comptype_delete(static_cast<AVD_COMP_TYPE *>(opdata->userData));
+  }
break;
  case CCBUTIL_MODIFY:
ccb_apply_modify_hdlr(opdata);
@@ -802,6 +804,12 @@ static SaAisErrorT 
comptype_ccb_completed_cb(CcbUtilOperationData_t *opdata) {
break;
  case CCBUTIL_DELETE:
comp_type = comptype_db->find(Amf::to_string(&opdata->objectName));
+  if (comp_type == nullptr && avd_cb->is_active() == false) {
+rc = SA_AIS_OK;
+opdata->userData = nullptr;
+break;
+  }
+  osafassert(comp_type);
if (nullptr != comp_type->list_of_comp) {
  /* check whether there exists a delete operation for
   * each of the Comp in the comp_type list in the current CCB
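The guard used throughout the patch above can be sketched in isolation. This is only an illustration under stated assumptions: `Db` and `delete_on_apply` are hypothetical names, a `std::map` stands in for the amfd config databases, and plain `assert` stands in for `osafassert`. A missing object is tolerated when the instance is not active, and treated as a programming error when it is active.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical stand-in for a config database keyed by DN.
using Db = std::map<std::string, int>;

// Returns true if an entry was deleted. On a non-active instance a missing
// entry is tolerated; on the active instance it is treated as a bug.
bool delete_on_apply(Db& db, const std::string& dn, bool is_active) {
  auto it = db.find(dn);
  if (it == db.end() && !is_active) {
    return false;  // object already gone, nothing to do
  }
  assert(it != db.end());  // the active instance must know the object
  db.erase(it);
  return true;
}
```

The condition `found || is_active` in the patch is the negation of the early-return test here, so both forms skip the delete only in the "not found and not active" case.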




Re: [devel] [PATCH 1/1] utils: Use a fence daemon as an alternative to payload reboot fencing [#3048]

2019-06-20 Thread Gary Lee


Hi Hans

Looks good, ack (review only).

One very, very minor comment:

# systemd services managed by fenced. Separate service names by 
whitespace, e.g. "opensafd"

SERVICES_TO_FENCE="opensafd"

I guess you could put a second service in the example :-)
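As a rough illustration of the whitespace-separated format described in the config comment above, the value could be split as follows. This is a sketch only: `split_services` is a hypothetical helper, not the parsing code in the fenced patch, and `example.service` is a placeholder name rather than a real OpenSAF unit.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: split a SERVICES_TO_FENCE-style value on whitespace.
std::vector<std::string> split_services(const std::string& value) {
  std::istringstream in(value);
  std::vector<std::string> services;
  std::string name;
  while (in >> name) {  // operator>> skips any run of spaces or tabs
    services.push_back(name);
  }
  return services;
}
```

With this approach a value such as "opensafd example.service" yields two entries, and repeated or trailing spaces are ignored.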

Thanks

Gary

On 5/6/19 6:36 pm, Hans Nordebäck wrote:

---
  src/fm/Makefile.am|   6 +-
  src/fm/fmd/fm_amf.cc  |  14 +
  src/fm/fmd/tipc_server.cc |  93 ++
  src/fm/fmd/tipc_server.h  |  45 +++
  tools/devel/fenced/Makefile   |  63 
  tools/devel/fenced/README_TOOLS   |  15 +
  tools/devel/fenced/command.cc | 134 
  tools/devel/fenced/command.h  |  43 +++
  tools/devel/fenced/cpp_macros.h   |  33 ++
  tools/devel/fenced/fenced.conf|  17 +
  tools/devel/fenced/fenced_main.cc | 179 +++
  tools/devel/fenced/node_state_file.cc |  87 ++
  tools/devel/fenced/node_state_file.h  |  41 +++
  tools/devel/fenced/node_state_hdlr.cc |  54 
  tools/devel/fenced/node_state_hdlr.h  |  45 +++
  tools/devel/fenced/node_state_hdlr_factory.cc |  66 
  tools/devel/fenced/node_state_hdlr_factory.h  |  35 +++
  tools/devel/fenced/node_state_hdlr_pl.cc  | 292 ++
  tools/devel/fenced/node_state_hdlr_pl.h   |  60 
  tools/devel/fenced/node_state_hdlr_sc.cc  |  42 +++
  tools/devel/fenced/node_state_hdlr_sc.h   |  41 +++
  tools/devel/fenced/osaffenced.service |  14 +
  tools/devel/fenced/service.cc |  53 
  tools/devel/fenced/service.h  |  42 +++
  tools/devel/fenced/timer.cc   |  62 
  tools/devel/fenced/timer.h|  53 
  tools/devel/fenced/watchdog.cc|  37 +++
  tools/devel/fenced/watchdog.h |  39 +++
  28 files changed, 1703 insertions(+), 2 deletions(-)
  create mode 100644 src/fm/fmd/tipc_server.cc
  create mode 100644 src/fm/fmd/tipc_server.h
  create mode 100755 tools/devel/fenced/Makefile
  create mode 100644 tools/devel/fenced/README_TOOLS
  create mode 100644 tools/devel/fenced/command.cc
  create mode 100644 tools/devel/fenced/command.h
  create mode 100644 tools/devel/fenced/cpp_macros.h
  create mode 100644 tools/devel/fenced/fenced.conf
  create mode 100644 tools/devel/fenced/fenced_main.cc
  create mode 100644 tools/devel/fenced/node_state_file.cc
  create mode 100644 tools/devel/fenced/node_state_file.h
  create mode 100644 tools/devel/fenced/node_state_hdlr.cc
  create mode 100644 tools/devel/fenced/node_state_hdlr.h
  create mode 100644 tools/devel/fenced/node_state_hdlr_factory.cc
  create mode 100644 tools/devel/fenced/node_state_hdlr_factory.h
  create mode 100644 tools/devel/fenced/node_state_hdlr_pl.cc
  create mode 100644 tools/devel/fenced/node_state_hdlr_pl.h
  create mode 100644 tools/devel/fenced/node_state_hdlr_sc.cc
  create mode 100644 tools/devel/fenced/node_state_hdlr_sc.h
  create mode 100644 tools/devel/fenced/osaffenced.service
  create mode 100644 tools/devel/fenced/service.cc
  create mode 100644 tools/devel/fenced/service.h
  create mode 100644 tools/devel/fenced/timer.cc
  create mode 100644 tools/devel/fenced/timer.h
  create mode 100644 tools/devel/fenced/watchdog.cc
  create mode 100644 tools/devel/fenced/watchdog.h

diff --git a/src/fm/Makefile.am b/src/fm/Makefile.am
index 0f254b94f..325847ae9 100644
--- a/src/fm/Makefile.am
+++ b/src/fm/Makefile.am
@@ -20,7 +20,8 @@ noinst_HEADERS += \
src/fm/fmd/fm_cb.h \
src/fm/fmd/fm_evt.h \
src/fm/fmd/fm_mds.h \
-   src/fm/fmd/fm_mem.h
+   src/fm/fmd/fm_mem.h \
+   src/fm/fmd/tipc_server.h
  
  osaf_execbin_PROGRAMS += bin/osaffmd

  nodist_pkgclccli_SCRIPTS += \
@@ -44,7 +45,8 @@ bin_osaffmd_SOURCES = \
src/fm/fmd/fm_amf.cc \
src/fm/fmd/fm_main.cc \
src/fm/fmd/fm_mds.cc \
-   src/fm/fmd/fm_rda.cc
+   src/fm/fmd/fm_rda.cc \
+   src/fm/fmd/tipc_server.cc
  
  bin_osaffmd_LDADD = \

lib/libSaAmf.la \
diff --git a/src/fm/fmd/fm_amf.cc b/src/fm/fmd/fm_amf.cc
index e99f3ba7e..8cf284f97 100644
--- a/src/fm/fmd/fm_amf.cc
+++ b/src/fm/fmd/fm_amf.cc
@@ -34,6 +34,12 @@
  
**/
  
  #include "fm.h"

+#include "tipc_server.h"
+
+namespace {
+TIPCServer tipc_srv;
+}
+
  extern uint32_t gl_fm_hdl;
  
  uint32_t fm_amf_init(FM_AMF_CB *fm_amf_cb);

@@ -151,6 +157,11 @@ void fm_saf_CSI_set_callback(SaInvocationT invocation, 
const SaNameT *compName,
  } else {
fm_cb->amf_state = new_haState;
fm_cb->csi_assigned = true;
+  if (new_haState == SA_AMF_HA_ACTIVE) {
+tipc_srv.publish();
+  } else {
+tipc_srv.unpublish();
+  }
  }
  error = saAmfResponse(fm_amf_cb->amf_hdl, invocation, error);
}
@@

[devel] [PATCH 0/1] Review Request for amfd: prevent infinite loop V3 [#3050]

2019-06-19 Thread Gary Lee

Summary: amfd: prevent infinite loop [#3050]
Review request for Ticket(s): 3050
Peer Reviewer(s): Hans, Minh
Pull request to: *** LIST THE PERSON WITH PUSH ACCESS HERE ***
Affected branch(es): develop
Development branch: ticket-3050
Base revision: 68efc6010fda86d62300a687bbd8c52cba232479
Personal repository: git://git.code.sf.net/u/userid-2226215/review


Impacted area         Impact y/n
--------------------------------
 Docs                     n
 Build system             n
 RPM/packaging            n
 Configuration files      n
 Startup scripts          n
 SAF services             y
 OpenSAF services         n
 Core libraries           n
 Samples                  n
 Tests                    n
 Other                    n


Comments (indicate scope for each "y" above):
-

revision 67404028af391330860c8edb45fc0442fb90a283
Author: Gary Lee 
Date:   Thu, 20 Jun 2019 12:07:57 +1000

amfd: prevent infinite loop [#3050]

In handle_event_in_failover_state(), we iterate through
queue_evt in a while loop, but process_event() can insert
items into the queue inside the loop, and we may end
up never exiting the while loop.



Complete diffstat:
--
 src/amf/amfd/main.cc | 10 --
 1 file changed, 8 insertions(+), 2 deletions(-)


Testing Commands:
-
See ticket

Testing, Expected Results:
--
amfd does not go into an infinite loop and get
terminated by the watchdog

Conditions of Submission:
-
ack from anyone

Arch        Built   Started   Linux distro
-------------------------------------------
mips          n        n
mips64        n        n
x86           n        n
x86_64        y        y
powerpc       n        n
powerpc64     n        n


Reviewer Checklist:
---
[Submitters: make sure that your review doesn't trigger any checkmarks!]


Your checkin has not passed review because (see checked entries):

___ Your RR template is generally incomplete; it has too many blank entries
that need proper data filled in.

___ You have failed to nominate the proper persons for review and push.

___ Your patches do not have proper short+long header

___ You have grammar/spelling in your header that is unacceptable.

___ You have exceeded a sensible line length in your headers/comments/text.

___ You have failed to put in a proper Trac Ticket # into your commits.

___ You have incorrectly put/left internal data in your comments/files
(i.e. internal bug tracking tool IDs, product names etc)

___ You have not given any evidence of testing beyond basic build tests.
Demonstrate some level of runtime or other sanity testing.

___ You have ^M present in some of your files. These have to be removed.

___ You have needlessly changed whitespace or added whitespace crimes
like trailing spaces, or spaces before tabs.

___ You have mixed real technical changes with whitespace and other
cosmetic code cleanup changes. These have to be separate commits.

___ You need to refactor your submission into logical chunks; there is
too much content into a single commit.

___ You have extraneous garbage in your review (merge commits etc)

___ You have giant attachments which should never have been sent;
Instead you should place your content in a public tree to be pulled.

___ You have too many commits attached to an e-mail; resend as threaded
commits, or place in a public tree for a pull.

___ You have resent this content multiple times without a clear indication
of what has changed between each re-send.

___ You have failed to adequately and individually address all of the
comments and change requests that were proposed in the initial review.

___ You have a misconfigured ~/.gitconfig file (i.e. user.name, user.email etc)

___ Your computer has a badly configured date and time, confusing the
threaded patch review.

___ Your changes affect IPC mechanism, and you don't present any results
for in-service upgradability test.

___ Your changes affect user manual and documentation, your patch series
do not contain the patch that updates the Doxygen manual.




[devel] [PATCH 1/1] amfd: prevent infinite loop [#3050]

2019-06-19 Thread Gary Lee

In handle_event_in_failover_state(), we iterate through
queue_evt in a while loop, but process_event() can insert
items into the queue inside the loop, and we may end
up never exiting the while loop.
---
 src/amf/amfd/main.cc | 10 --
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/src/amf/amfd/main.cc b/src/amf/amfd/main.cc
index 50daa59..e3d0957 100644
--- a/src/amf/amfd/main.cc
+++ b/src/amf/amfd/main.cc
@@ -406,12 +406,18 @@ static void handle_event_in_failover_state(AVD_EVT *evt) {
 
 /* Dequeue, all the messages from the queue
and process them now */
-
-while (!cb->evt_queue.empty()) {
+auto size_before_loop = cb->evt_queue.size();
+std::queue<AVD_EVT_QUEUE *>::size_type count = 0;
+while (count < size_before_loop) {
+  // note: process_event() may insert items into
+  // the queue, so terminate loop when we have
+  // processed all the original elements
+  // to avoid infinite loop
   AVD_EVT_QUEUE *queue_evt = cb->evt_queue.front();
   cb->evt_queue.pop();
   process_event(cb, queue_evt->evt);
   delete queue_evt;
+  ++count;
 }
 
 /* Walk through all the nodes to check if any of the nodes state is
-- 
2.7.4
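The loop bound introduced by the patch above can be shown on its own. The sketch below assumes a queue of `int` and a hypothetical `drain_snapshot` helper in place of the amfd event types. It processes only the elements that were queued when the call started, so a handler that pushes new work cannot keep the loop alive.

```cpp
#include <cstddef>
#include <functional>
#include <queue>

// Process only the items that were queued when the call started.
// `handler` may push new items onto `q`; those are left for a later pass.
// Returns the number of items processed.
std::size_t drain_snapshot(
    std::queue<int>& q,
    const std::function<void(int, std::queue<int>&)>& handler) {
  const std::size_t size_before_loop = q.size();
  std::size_t count = 0;
  while (count < size_before_loop) {
    int item = q.front();
    q.pop();
    handler(item, q);  // may grow q, which does not change the loop bound
    ++count;
  }
  return count;
}
```

This relies on the handler never removing elements. If it could, the queue might run empty before `count` reaches the snapshot size.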




Re: [devel] [PATCH 1/1] amfd: prevent infinite loop [#3050]

2019-06-19 Thread Gary Lee


Hi

I think I have to re-work this.

23.3.3.4 of the C++11 standard says:

Effects: An erase operation that erases the last element of a deque 
invalidates only the past-the-end iterator and all iterators and 
references to the erased elements.


So I've probably done the wrong thing here.
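One way to avoid holding deque iterators across push_back() and pop_front() calls is to move the pending items into a local container before processing them. The sketch below is an alternative technique, not what either version of the patch does. `drain_by_swap` and the `int` element type are assumptions made for illustration.

```cpp
#include <cstddef>
#include <deque>
#include <functional>

// Swap the pending items into a local batch, then iterate over the batch.
// Items pushed into `pending` by `handler` never touch `batch`, so the
// iterators used by the range-for loop stay valid.
std::size_t drain_by_swap(
    std::deque<int>& pending,
    const std::function<void(int, std::deque<int>&)>& handler) {
  std::deque<int> batch;
  batch.swap(pending);  // pending is now empty, batch owns the old items
  for (int item : batch) {
    handler(item, pending);  // may push_back into pending
  }
  return batch.size();
}
```

One difference to keep in mind: while the batch is being processed, any code that inspects the shared queue sees only the newly added items.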

On 19/6/19 1:24 pm, Gary Lee wrote:

In handle_event_in_failover_state(), we iterate through
queue_evt in a while loop, but process_event() can insert
items into the queue inside the loop, and we may end
up never exiting the while loop.
---
  src/amf/amfd/cb.h  |  3 ++-
  src/amf/amfd/main.cc   | 13 +
  src/amf/amfd/ndfsm.cc  |  4 ++--
  src/amf/amfd/ndproc.cc |  4 ++--
  4 files changed, 15 insertions(+), 9 deletions(-)

diff --git a/src/amf/amfd/cb.h b/src/amf/amfd/cb.h
index 89cf15d..4418db6 100644
--- a/src/amf/amfd/cb.h
+++ b/src/amf/amfd/cb.h
@@ -38,6 +38,7 @@
  #include 
  
  #include 

+#include 
  #include 
  #include 
  #include 
@@ -166,7 +167,7 @@ typedef struct cl_cb_tag {
std::queue nd_msg_queue_list{};
  
/* Event Queue to hold the events during fail-over */

-  std::queue<AVD_EVT_QUEUE *> evt_queue{};
+  std::deque<AVD_EVT_QUEUE *> evt_queue{};
/*
 * MBCSv related variables.
 */
diff --git a/src/amf/amfd/main.cc b/src/amf/amfd/main.cc
index 50daa59..d22bcb6 100644
--- a/src/amf/amfd/main.cc
+++ b/src/amf/amfd/main.cc
@@ -395,7 +395,7 @@ static void handle_event_in_failover_state(AVD_EVT *evt) {
  /* Enqueue this event */
  queue_evt = new AVD_EVT_QUEUE();
  queue_evt->evt = evt;
-cb->evt_queue.push(queue_evt);
+cb->evt_queue.push_back(queue_evt);
}
  
std::map::const_iterator it =

@@ -407,9 +407,14 @@ static void handle_event_in_failover_state(AVD_EVT *evt) {
  /* Dequeue, all the messages from the queue
 and process them now */
  
-while (!cb->evt_queue.empty()) {

-  AVD_EVT_QUEUE *queue_evt = cb->evt_queue.front();
-  cb->evt_queue.pop();
+// get ref to end of queue, to make sure we don't get stuck
+// iterating through the deque, as events may be added into
+// evt_queue inside the loop (to be refactored?)
+auto end_iter = cb->evt_queue.end();
+auto iter = cb->evt_queue.begin();
+while (iter != end_iter) {
+  AVD_EVT_QUEUE *queue_evt = *iter++;
+  cb->evt_queue.pop_front();
process_event(cb, queue_evt->evt);
delete queue_evt;
  }
diff --git a/src/amf/amfd/ndfsm.cc b/src/amf/amfd/ndfsm.cc
index 8c8f3c5..b763c79 100644
--- a/src/amf/amfd/ndfsm.cc
+++ b/src/amf/amfd/ndfsm.cc
@@ -69,7 +69,7 @@ void avd_process_state_info_queue(AVD_CL_CB *cb) {
for (i = 0; i < queue_size; i++) {
  queue_evt = cb->evt_queue.front();
  osafassert(queue_evt->evt);
-cb->evt_queue.pop();
+cb->evt_queue.pop_front();
  
  TRACE("rcv_evt: %u", queue_evt->evt->rcv_evt);
  
@@ -95,7 +95,7 @@ void avd_process_state_info_queue(AVD_CL_CB *cb) {

delete queue_evt->evt;
delete queue_evt;
  } else {
-  cb->evt_queue.push(queue_evt);
+  cb->evt_queue.push_back(queue_evt);
  }
}
  
diff --git a/src/amf/amfd/ndproc.cc b/src/amf/amfd/ndproc.cc

index 5f5cbcd..433b00a 100644
--- a/src/amf/amfd/ndproc.cc
+++ b/src/amf/amfd/ndproc.cc
@@ -350,7 +350,7 @@ void avd_nd_sisu_state_info_evh(AVD_CL_CB *cb, AVD_EVT 
*evt) {
  state_info_evt->evt = new AVD_EVT{};
  state_info_evt->evt->rcv_evt = evt->rcv_evt;
  state_info_evt->evt->info.avnd_msg = n2d_msg;
-cb->evt_queue.push(state_info_evt);
+cb->evt_queue.push_back(state_info_evt);
} else {
  LOG_WA(
  "Ignore this sisu_state_info message since node sync window has 
closed");
@@ -392,7 +392,7 @@ void avd_nd_compcsi_state_info_evh(AVD_CL_CB *cb, AVD_EVT 
*evt) {
  state_info_evt->evt = new AVD_EVT{};
  state_info_evt->evt->rcv_evt = evt->rcv_evt;
  state_info_evt->evt->info.avnd_msg = n2d_msg;
-cb->evt_queue.push(state_info_evt);
+cb->evt_queue.push_back(state_info_evt);
} else {
  LOG_WA(
  "Ignore this compcsi_state_info message since node sync window has 
closed");




[devel] [PATCH 0/1] Review Request for amfd: prevent infinite loop V2 [#3050]

2019-06-18 Thread Gary Lee

Summary: amfd: prevent infinite loop [#3050]
Review request for Ticket(s): 3050
Peer Reviewer(s): Minh, Hans
Pull request to: *** LIST THE PERSON WITH PUSH ACCESS HERE ***
Affected branch(es): develop
Development branch: ticket-3050
Base revision: 68efc6010fda86d62300a687bbd8c52cba232479
Personal repository: git://git.code.sf.net/u/userid-2226215/review


Impacted area         Impact y/n
--------------------------------
 Docs                     n
 Build system             n
 RPM/packaging            n
 Configuration files      n
 Startup scripts          n
 SAF services             y
 OpenSAF services         n
 Core libraries           n
 Samples                  n
 Tests                    n
 Other                    n


Comments (indicate scope for each "y" above):
-

revision 7455f8da651fb14838140da1c80fe0bf2db443fd
Author: Gary Lee 
Date:   Wed, 19 Jun 2019 13:12:35 +1000

amfd: prevent infinite loop [#3050]

In handle_event_in_failover_state(), we iterate through
queue_evt in a while loop, but process_event() can insert
items into the queue inside the loop, and we may end
up never exiting the while loop.



Complete diffstat:
--
 src/amf/amfd/cb.h  |  3 ++-
 src/amf/amfd/main.cc   | 13 +
 src/amf/amfd/ndfsm.cc  |  4 ++--
 src/amf/amfd/ndproc.cc |  4 ++--
 4 files changed, 15 insertions(+), 9 deletions(-)


Testing Commands:
-
See ticket for reproduction steps.

Testing, Expected Results:
--
amfd does not go into an infinite loop and get
terminated by the watchdog

Conditions of Submission:
-
ack from reviewer

Arch        Built   Started   Linux distro
-------------------------------------------
mips          n        n
mips64        n        n
x86           n        n
x86_64        y        y
powerpc       n        n
powerpc64     n        n






[devel] [PATCH 1/1] amfd: prevent infinite loop [#3050]

2019-06-18 Thread Gary Lee

In handle_event_in_failover_state(), we iterate through
queue_evt in a while loop, but process_event() can insert
items into the queue inside the loop, and we may end
up never exiting the while loop.
---
 src/amf/amfd/cb.h  |  3 ++-
 src/amf/amfd/main.cc   | 13 +
 src/amf/amfd/ndfsm.cc  |  4 ++--
 src/amf/amfd/ndproc.cc |  4 ++--
 4 files changed, 15 insertions(+), 9 deletions(-)

diff --git a/src/amf/amfd/cb.h b/src/amf/amfd/cb.h
index 89cf15d..4418db6 100644
--- a/src/amf/amfd/cb.h
+++ b/src/amf/amfd/cb.h
@@ -38,6 +38,7 @@
 #include 
 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -166,7 +167,7 @@ typedef struct cl_cb_tag {
   std::queue nd_msg_queue_list{};
 
   /* Event Queue to hold the events during fail-over */
-  std::queue<AVD_EVT_QUEUE *> evt_queue{};
+  std::deque<AVD_EVT_QUEUE *> evt_queue{};
   /*
* MBCSv related variables.
*/
diff --git a/src/amf/amfd/main.cc b/src/amf/amfd/main.cc
index 50daa59..d22bcb6 100644
--- a/src/amf/amfd/main.cc
+++ b/src/amf/amfd/main.cc
@@ -395,7 +395,7 @@ static void handle_event_in_failover_state(AVD_EVT *evt) {
 /* Enqueue this event */
 queue_evt = new AVD_EVT_QUEUE();
 queue_evt->evt = evt;
-cb->evt_queue.push(queue_evt);
+cb->evt_queue.push_back(queue_evt);
   }
 
   std::map::const_iterator it =
@@ -407,9 +407,14 @@ static void handle_event_in_failover_state(AVD_EVT *evt) {
 /* Dequeue, all the messages from the queue
and process them now */
 
-while (!cb->evt_queue.empty()) {
-  AVD_EVT_QUEUE *queue_evt = cb->evt_queue.front();
-  cb->evt_queue.pop();
+// get ref to end of queue, to make sure we don't get stuck
+// iterating through the deque, as events may be added into
+// evt_queue inside the loop (to be refactored?)
+auto end_iter = cb->evt_queue.end();
+auto iter = cb->evt_queue.begin();
+while (iter != end_iter) {
+  AVD_EVT_QUEUE *queue_evt = *iter++;
+  cb->evt_queue.pop_front();
   process_event(cb, queue_evt->evt);
   delete queue_evt;
 }
diff --git a/src/amf/amfd/ndfsm.cc b/src/amf/amfd/ndfsm.cc
index 8c8f3c5..b763c79 100644
--- a/src/amf/amfd/ndfsm.cc
+++ b/src/amf/amfd/ndfsm.cc
@@ -69,7 +69,7 @@ void avd_process_state_info_queue(AVD_CL_CB *cb) {
   for (i = 0; i < queue_size; i++) {
 queue_evt = cb->evt_queue.front();
 osafassert(queue_evt->evt);
-cb->evt_queue.pop();
+cb->evt_queue.pop_front();
 
 TRACE("rcv_evt: %u", queue_evt->evt->rcv_evt);
 
@@ -95,7 +95,7 @@ void avd_process_state_info_queue(AVD_CL_CB *cb) {
   delete queue_evt->evt;
   delete queue_evt;
 } else {
-  cb->evt_queue.push(queue_evt);
+  cb->evt_queue.push_back(queue_evt);
 }
   }
 
diff --git a/src/amf/amfd/ndproc.cc b/src/amf/amfd/ndproc.cc
index 5f5cbcd..433b00a 100644
--- a/src/amf/amfd/ndproc.cc
+++ b/src/amf/amfd/ndproc.cc
@@ -350,7 +350,7 @@ void avd_nd_sisu_state_info_evh(AVD_CL_CB *cb, AVD_EVT 
*evt) {
 state_info_evt->evt = new AVD_EVT{};
 state_info_evt->evt->rcv_evt = evt->rcv_evt;
 state_info_evt->evt->info.avnd_msg = n2d_msg;
-cb->evt_queue.push(state_info_evt);
+cb->evt_queue.push_back(state_info_evt);
   } else {
 LOG_WA(
 "Ignore this sisu_state_info message since node sync window has 
closed");
@@ -392,7 +392,7 @@ void avd_nd_compcsi_state_info_evh(AVD_CL_CB *cb, AVD_EVT 
*evt) {
 state_info_evt->evt = new AVD_EVT{};
 state_info_evt->evt->rcv_evt = evt->rcv_evt;
 state_info_evt->evt->info.avnd_msg = n2d_msg;
-cb->evt_queue.push(state_info_evt);
+cb->evt_queue.push_back(state_info_evt);
   } else {
 LOG_WA(
 "Ignore this compcsi_state_info message since node sync window has 
closed");
-- 
2.7.4




Re: [devel] [PATCH 1/1] amfd: do not queue sync messages from 'lost' nodes [#3050]

2019-06-10 Thread Gary Lee


Hi Minh

On 11/6/19 10:33 am, Minh Hon Chau wrote:

Hi Gary,

Those variables, e.g. node_sync_window_closed, are used before the 
headless sync completes. If there is a failover during the headless 
sync, the new active will start the headless sync again, so those 
variables have not needed to be checkpointed. But here the scenario 
happens in split brain, in which the new active is in a separated 
network instead of coming from headless, so I guess we do need to 
checkpoint it, but the checkpoint should be done after the headless 
sync?


I will checkpoint node_sync_window_closed in a new version. As you 
pointed out, using the timer alone isn't sufficient, as sync messages 
could arrive before the active controller's amfnd has sent node_up (and 
therefore before the timer has been started).



And the change in timer.h does not seem to be related to this ticket?


The values in the timer structure aren't initialized at startup, so 
fields like is_active have indeterminate values. It would be good just 
to set them to known values.


Thanks

Gary
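The point above about uninitialized timer fields can be illustrated with default member initializers. This is a minimal sketch, not the actual change to timer.h: `TimerSketch`, `timer_id` and `timeout_ms` are hypothetical names, and only `is_active` is taken from the discussion.

```cpp
#include <cstdint>

// Hypothetical timer record; field names other than is_active are assumptions.
struct TimerSketch {
  void* timer_id = nullptr;
  std::uint32_t timeout_ms = 0;
  bool is_active = false;
};

// Returns a timer that was declared without any explicit initializer.
TimerSketch make_default_timer() {
  TimerSketch t;  // members take the default member initializers above
  return t;
}
```

With initializers in the type itself, every instance starts from known values regardless of how or where it is declared.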




Re: [devel] [PATCH 1/1] amfd: disallow delete of CtCs object if Ct maps to comp [#3028]

2019-06-05 Thread Gary Lee


Hi Phuc

Some comments below.

Thanks

Gary

On 23/5/19 4:48 pm, phuc.h.chau wrote:

amfd crashes when an SU is unlocked. The crash occurs in
avd_snd_susi_msg(), where get_comp_capability() is called
with csi and comp as input parameters.

In get_comp_capability(), no CtCs object is available,
so ctcstype_db->find() returns NULL for ctcs_type.
amfd then crashes when it accesses ctcs_type->saAmfCtCompCapability,
because ctcs_type is NULL.
---
  src/amf/amfd/ctcstype.cc | 65 +++-
  1 file changed, 64 insertions(+), 1 deletion(-)

diff --git a/src/amf/amfd/ctcstype.cc b/src/amf/amfd/ctcstype.cc
index 5dffdae..3f30ebc 100644
--- a/src/amf/amfd/ctcstype.cc
+++ b/src/amf/amfd/ctcstype.cc
@@ -28,6 +28,10 @@
  
  AmfDb *ctcstype_db = nullptr;
  
+static void find_ct_name_from_association(const std::string& haystack,

+  std::string *dn,
+  const char *needle);
+
  static void ctcstype_db_add(AVD_CTCS_TYPE *ctcstype) {
unsigned int rc = ctcstype_db->insert(ctcstype->name, ctcstype);
osafassert(rc == NCSCC_RC_SUCCESS);
@@ -187,16 +191,75 @@ static SaAisErrorT 
ctcstype_ccb_completed_cb(CcbUtilOperationData_t *opdata) {
opdata, "Modification of SaAmfCtCsType not supported");
break;
  case CCBUTIL_DELETE:
+  AVD_CTCS_TYPE *ctcstype;
+  AVD_COMP_TYPE *comp_type;
+  AVD_COMP *comp;
+  CcbUtilOperationData_t *t_opData;
+
+  ctcstype = ctcstype_db->find(Amf::to_string(&opdata->objectName));
+  if (ctcstype != nullptr) {
+std::string ct_name;
+find_ct_name_from_association(Amf::to_string(&opdata->objectName),
+  &ct_name, ",safVersion");
+TRACE("'%s'", ct_name.c_str());
+comp_type = comptype_db->find(ct_name);
+if ((comp_type) && (nullptr != comp_type->list_of_comp)) {
+  /* check whether there exists a delete operation for
+  * each of the Comp in the comp_type list in the current CCB
+  */
+  bool comp_exist = false;
+  TRACE("SaAmfCompType '%s' has components", comp_type->name.c_str());
+  comp = comp_type->list_of_comp;
+  while (comp != nullptr) {
+TRACE("%s", osaf_extended_name_borrow(&comp->comp_info.name));
+t_opData = ccbutil_getCcbOpDataByDN(opdata->ccbId,
+&comp->comp_info.name);
+TRACE("%p", t_opData);
+if ((t_opData == nullptr) ||
+(t_opData->operationType != CCBUTIL_DELETE)) {
+  TRACE("Here %p", t_opData);

[Gary] Maybe replace "Here" with a more useful description.

+  comp_exist = true;
+  break;
+}
+comp = comp->comp_type_list_comp_next;
+  }
+  if (comp_exist == true) {
+rc = SA_AIS_ERR_BAD_OPERATION;
+report_ccb_validation_error(opdata, "SaAmfCompType '%s' is in use",
+comp_type->name.c_str());
+goto done;
+  }
+} else {
+TRACE("SaAmfCompType '%p'. SaAmfCompType '%s' has no components",
+  comp_type, ct_name.c_str());
+  }
+  }
rc = SA_AIS_OK;
break;
  default:
osafassert(0);
break;
}
-
+done:
TRACE_LEAVE2("%u", rc);
return rc;
  }



[Gary] avsv_sanamet_init() should already do what you need below.


+/**
+* Initialize a DN by searching for needle in haystack
+* where two times safVersion comes.
+* @param haystack
+* @param dn
+* @param needle
+* @note: "safSupportedCsType=safVersion=1\,
+* safCSType=AmfDemo1,safVersion=1,safCompType=AmfDemo1"
+*/
+static void find_ct_name_from_association(const std::string& haystack,
+  std::string *dn,
+  const char *needle) {
+  std::string::size_type pos = haystack.find(needle);
+  *dn = haystack.substr(pos + 1);
+  TRACE("dn %s", (*dn).c_str());
+}
  
  static void ctcstype_ccb_apply_cb(CcbUtilOperationData_t *opdata) {

AVD_CTCS_TYPE *ctcstype;
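A note on the helper in the quoted patch: if the needle is not found, `std::string::find` returns `npos`, `pos + 1` wraps around to 0, and `substr` then copies the whole input. A guarded variant might look like the sketch below. The name `dn_after_needle` and the `bool` return value are assumptions for illustration. As noted in the review, an existing amf helper may already cover this case.

```cpp
#include <string>

// Copies the part of `haystack` that follows the first `needle`, skipping
// the needle's leading character, into `dn`. Returns false and leaves `dn`
// untouched when `needle` does not occur.
bool dn_after_needle(const std::string& haystack, const std::string& needle,
                     std::string* dn) {
  const std::string::size_type pos = haystack.find(needle);
  if (pos == std::string::npos) {
    return false;
  }
  *dn = haystack.substr(pos + 1);
  return true;
}
```

For the association DN given in the patch comment, searching for ",safVersion" skips the escaped "\,safCSType" part and yields "safVersion=1,safCompType=AmfDemo1".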




Re: [devel] [PATCH 1/1] amfnd: fix error reading from deallocated memory [#2568]

2019-06-04 Thread Gary Lee


Hi Thanh

I will push on your behalf.

Thanks

Gary

On 5/6/19 12:29 pm, Thanh Nguyen wrote:

The invalid reads come from the following functions:
- avnd_evt_mds_ava_dn_evh() (amf/amfnd/comp.cc)
- avsv_create_association_class_dn() (amf/common/util.c)
The other changes fix errors reported by cppcheck.
---
  src/amf/amfnd/comp.cc | 17 +
  src/amf/common/util.c |  6 +++---
  2 files changed, 12 insertions(+), 11 deletions(-)

diff --git a/src/amf/amfnd/comp.cc b/src/amf/amfnd/comp.cc
index 38b9224..857c1dc 100644
--- a/src/amf/amfnd/comp.cc
+++ b/src/amf/amfnd/comp.cc
@@ -428,8 +428,10 @@ uint32_t avnd_evt_mds_ava_dn_evh(AVND_CB *cb, AVND_EVT 
*evt) {
   entry from the cbk list and delete the cbq */
m_AVND_COMP_CBQ_INV_GET(comp, comp->term_cbq_inv_value, cbk_rec);
comp->term_cbq_inv_value = 0;
+  uint32_t opq_hdl = 0;
+  if (cbk_rec) opq_hdl = cbk_rec->opq_hdl;
rc = avnd_comp_clc_fsm_run(cb, comp, 
AVND_COMP_CLC_PRES_FSM_EV_TERM_SUCC);
-  if (cbk_rec) avnd_comp_cbq_rec_pop_and_del(cb, comp, cbk_rec->opq_hdl, 
false);
+  if (cbk_rec) avnd_comp_cbq_rec_pop_and_del(cb, comp, opq_hdl, false);
goto done;
  }
  /* found the matching comp; trigger error processing */
@@ -2228,9 +2230,7 @@ uint32_t avnd_amf_resp_send(AVND_CB *cb, 
AVSV_AMF_API_TYPE type,
AVND_MSG msg;
AVSV_ND2ND_AVND_MSG *avnd_msg;
uint32_t rc = NCSCC_RC_SUCCESS;
-  MDS_DEST i_to_dest;
AVSV_NDA_AVA_MSG *temp_ptr = nullptr;
-  NODE_ID node_id = 0;
MDS_SYNC_SND_CTXT temp_ctxt;
TRACE_ENTER();
  
@@ -2267,8 +2267,8 @@ uint32_t avnd_amf_resp_send(AVND_CB *cb, AVSV_AMF_API_TYPE type,

  msg.info.avnd->type = AVND_AVND_AVA_MSG;
  msg.type = AVND_MSG_AVND;
  /* Send it to AvND */
-node_id = m_NCS_NODE_ID_FROM_MDS_DEST(*dest);
-i_to_dest = avnd_get_mds_dest_from_nodeid(cb, node_id);
+NODE_ID node_id = m_NCS_NODE_ID_FROM_MDS_DEST(*dest);
+MDS_DEST i_to_dest = avnd_get_mds_dest_from_nodeid(cb, node_id);
  rc = avnd_avnd_mds_send(cb, i_to_dest, &msg);
} else {
  /* now send the response */
@@ -2646,7 +2646,8 @@ void avnd_comp_cmplete_all_assignment(AVND_CB *cb, 
AVND_COMP *comp) {
   */
  temp_csi = m_AVND_COMPDB_REC_CSI_GET_FIRST(*comp);
  
-if (cbk->cbk_info->param.csi_set.ha != temp_csi->si->curr_state) {

+if (temp_csi &&
+   (cbk->cbk_info->param.csi_set.ha != temp_csi->si->curr_state)) {
avnd_comp_cbq_rec_pop_and_del(cb, comp, cbk->opq_hdl, true);
continue;
  }
@@ -2788,7 +2789,7 @@ uint32_t comp_restart_initiate(AVND_COMP *comp) {
  rc = avnd_comp_curr_info_del(cb, it.second);
  if (NCSCC_RC_SUCCESS != rc) goto done;
  
-	// unregister the contained comp

+// unregister the contained comp
  rc = avnd_comp_unregister_contained(cb, it.second);
  if (NCSCC_RC_SUCCESS != rc) goto done;
  
@@ -2956,7 +2957,7 @@ void avnd_comp_pres_state_set(const AVND_CB *cb, AVND_COMP *comp,

   (SA_AMF_PRESENCE_ORPHANED == prv_st {
  if (cb->is_avd_down == false) {
avnd_di_uns32_upd_send(AVSV_SA_AMF_COMP, saAmfCompPresenceState_ID,
- comp->name.c_str(), comp->pres);
+ comp->name, comp->pres);
  }
}
  
diff --git a/src/amf/common/util.c b/src/amf/common/util.c

index ec76c32..d17b766 100644
--- a/src/amf/common/util.c
+++ b/src/amf/common/util.c
@@ -240,12 +240,12 @@ void avsv_create_association_class_dn(const SaNameT 
*child_dn,
}
  
  	if (dn) {

+   TRACE("dn: %s", buf);
osaf_extended_name_steal(buf, dn);
}
-   TRACE_LEAVE2("child_dn: %s parent_dn: %s dn: %s",
+   TRACE_LEAVE2("child_dn: %s parent_dn: %s",
child_dn_ptr ? child_dn_ptr : "no child dn",
-   parent_dn_ptr ? parent_dn_ptr : "no parent dn",
-   buf);
+   parent_dn_ptr ? parent_dn_ptr : "no parent dn");
  }
  
  void avsv_sanamet_init_from_association_dn(const SaNameT *haystack, SaNameT *dn,
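The comp.cc fix above follows a common pattern: copy the field you need before calling code that may free the object that holds it. The sketch below shows that pattern under stated assumptions. `CallbackRecord`, `ComponentSketch` and `run_state_machine` are hypothetical stand-ins, and the stand-in always releases the record, whereas the real state machine may or may not.

```cpp
#include <cstdint>
#include <memory>

struct CallbackRecord {
  std::uint32_t opq_hdl;
};

struct ComponentSketch {
  std::unique_ptr<CallbackRecord> record;
};

// Stand-in for a state-machine step that releases the record as a side effect.
void run_state_machine(ComponentSketch& comp) { comp.record.reset(); }

// Reads the handle first, then runs the step. The saved copy stays valid
// even though the record itself no longer exists afterwards.
std::uint32_t terminate_and_get_handle(ComponentSketch& comp) {
  std::uint32_t opq_hdl = 0;
  if (comp.record) opq_hdl = comp.record->opq_hdl;
  run_state_machine(comp);
  return opq_hdl;
}
```

Reading `comp.record->opq_hdl` after `run_state_machine()` would dereference a released pointer, which is the kind of access the patch removes.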



___
Opensaf-devel mailing list
Opensaf-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-devel

[devel] [PATCH 0/1] Review Request for amfd: do not queue sync messages from 'lost' nodes [#3050]

2019-06-04 Thread Gary Lee

Summary: amfd: do not queue sync messages from 'lost' nodes [#3050]
Review request for Ticket(s): 3050
Peer Reviewer(s): Hans, Minh
Pull request to: *** LIST THE PERSON WITH PUSH ACCESS HERE ***
Affected branch(es): develop
Development branch: ticket-3050
Base revision: 135b0b8862da9a036553c5db02062edb278089aa
Personal repository: git://git.code.sf.net/u/userid-2226215/review


Impacted area       Impact y/n
--------------------------------
 Docs                    n
 Build system            n
 RPM/packaging           n
 Configuration files     n
 Startup scripts         n
 SAF services            y
 OpenSAF services        n
 Core libraries          n
 Samples                 n
 Tests                   n
 Other                   n


Comments (indicate scope for each "y" above):
-

revision 9d64d3c1d386f1019103d12588ab46fa830ee793
Author: Gary Lee 
Date:   Wed, 5 Jun 2019 13:49:45 +1000

amfd: do not queue sync messages from 'lost' nodes [#3050]

The 'lost' nodes will be rebooted, thus there is no need
to queue sync messages from these nodes.

In addition, node_sync_window_closed is not reliable as it's not
checkpointed. Should we remove all uses of it in another ticket?

Instead, check if the timer is running.



Complete diffstat:
--
 src/amf/amfd/cb.h  |  2 ++
 src/amf/amfd/ndproc.cc | 30 ++
 src/amf/amfd/timer.h   | 12 ++--
 3 files changed, 30 insertions(+), 14 deletions(-)


Testing Commands:
-
See ticket for reproduction steps.

Testing, Expected Results:
--
Sync messages should be discarded and not put back into the queue.

2019-06-05 12:52:31.833 SC-2 osafamfd[254]: NO Receive message with event type:12, msg_type:31, from node:2030f, msg_id:0
2019-06-05 12:52:31.834 SC-2 osafamfd[254]: WA sisu_state_info messages received from lost node (2030f)
2019-06-05 12:52:31.834 SC-2 osafamfd[254]: NO Receive message with event type:13, msg_type:32, from node:2030f, msg_id:0
2019-06-05 12:52:31.834 SC-2 osafamfd[254]: WA compcsi_state_info messages received from lost node (2030f)

Conditions of Submission:
-
Ack from anyone

Arch        Built   Started   Linux distro
-------------------------------------------
mips        n       n
mips64      n       n
x86         n       n
x86_64      y       y
powerpc     n       n
powerpc64   n       n


Reviewer Checklist:
---
[Submitters: make sure that your review doesn't trigger any checkmarks!]


Your checkin has not passed review because (see checked entries):

___ Your RR template is generally incomplete; it has too many blank entries
that need proper data filled in.

___ You have failed to nominate the proper persons for review and push.

___ Your patches do not have proper short+long header

___ You have grammar/spelling in your header that is unacceptable.

___ You have exceeded a sensible line length in your headers/comments/text.

___ You have failed to put in a proper Trac Ticket # into your commits.

___ You have incorrectly put/left internal data in your comments/files
(i.e. internal bug tracking tool IDs, product names etc)

___ You have not given any evidence of testing beyond basic build tests.
Demonstrate some level of runtime or other sanity testing.

___ You have ^M present in some of your files. These have to be removed.

___ You have needlessly changed whitespace or added whitespace crimes
like trailing spaces, or spaces before tabs.

___ You have mixed real technical changes with whitespace and other
cosmetic code cleanup changes. These have to be separate commits.

___ You need to refactor your submission into logical chunks; there is
too much content in a single commit.

___ You have extraneous garbage in your review (merge commits etc)

___ You have giant attachments which should never have been sent;
Instead you should place your content in a public tree to be pulled.

___ You have too many commits attached to an e-mail; resend as threaded
commits, or place in a public tree for a pull.

___ You have resent this content multiple times without a clear indication
of what has changed between each re-send.

___ You have failed to adequately and individually address all of the
comments and change requests that were proposed in the initial review.

___ You have a misconfigured ~/.gitconfig file (i.e. user.name, user.email etc)

___ Your computer has a badly configured date and time, confusing the
threaded patch review.

___ Your changes affect IPC mechanism, and you don't present any results
for in-service upgradability test.

___ Your changes affect user manual and documentation, your patch series
do not contain the patch that updates the Doxygen manual.




[devel] [PATCH 1/1] amfd: do not queue sync messages from 'lost' nodes [#3050]

2019-06-04 Thread Gary Lee

The 'lost' nodes will be rebooted, thus there is no need
to queue sync messages from these nodes.

In addition, node_sync_window_closed is not reliable as it's not
checkpointed. Should we remove all uses of it in another ticket?

Instead, check if the timer is running.
---
 src/amf/amfd/cb.h  |  2 ++
 src/amf/amfd/ndproc.cc | 30 ++
 src/amf/amfd/timer.h   | 12 ++--
 3 files changed, 30 insertions(+), 14 deletions(-)

diff --git a/src/amf/amfd/cb.h b/src/amf/amfd/cb.h
index 89cf15d..8902d78 100644
--- a/src/amf/amfd/cb.h
+++ b/src/amf/amfd/cb.h
@@ -237,6 +237,8 @@ typedef struct cl_cb_tag {
*/
   bool active_services_exist;
   bool all_nodes_synced;
+  // @todo this should be checkpointed to standby? otherwise
+  // after a controller failover, it will still be false?
   bool node_sync_window_closed;
 
   /*
diff --git a/src/amf/amfd/ndproc.cc b/src/amf/amfd/ndproc.cc
index 5f5cbcd..20008d9 100644
--- a/src/amf/amfd/ndproc.cc
+++ b/src/amf/amfd/ndproc.cc
@@ -345,19 +345,26 @@ void avd_nd_sisu_state_info_evh(AVD_CL_CB *cb, AVD_EVT *evt) {
   evt->info.avnd_msg->msg_info.n2d_nd_sisu_state_info.node_id,
   evt->info.avnd_msg->msg_info.n2d_nd_sisu_state_info.msg_id);
 
-  if (cb->node_sync_window_closed == false) {
+  const SaClmNodeIdT node_id =
+evt->info.avnd_msg->msg_info.n2d_nd_sisu_state_info.node_id;
+
+  if (cb->failover_list.find(node_id) != cb->failover_list.end()) {
+// ignore msg
+LOG_WA("sisu_state_info messages received from lost node (%x)",
+   node_id);
+  } else if (cb->node_sync_tmr.is_active == true) {
 AVD_EVT_QUEUE *state_info_evt = new AVD_EVT_QUEUE();
 state_info_evt->evt = new AVD_EVT{};
 state_info_evt->evt->rcv_evt = evt->rcv_evt;
 state_info_evt->evt->info.avnd_msg = n2d_msg;
 cb->evt_queue.push(state_info_evt);
+return;
   } else {
 LOG_WA(
-"Ignore this sisu_state_info message since node sync window has closed");
-avsv_dnd_msg_free(n2d_msg);
+  "Ignore this sisu_state_info message since node sync window has closed");
   }
 
-  TRACE_LEAVE();
+  avsv_dnd_msg_free(n2d_msg);
 }
 
 /*
@@ -387,19 +394,26 @@ void avd_nd_compcsi_state_info_evh(AVD_CL_CB *cb, AVD_EVT *evt) {
   evt->info.avnd_msg->msg_info.n2d_nd_csicomp_state_info.node_id,
   evt->info.avnd_msg->msg_info.n2d_nd_csicomp_state_info.msg_id);
 
-  if (cb->node_sync_window_closed == false) {
+  const SaClmNodeIdT node_id =
+evt->info.avnd_msg->msg_info.n2d_nd_csicomp_state_info.node_id;
+
+  if (cb->failover_list.find(node_id) != cb->failover_list.end()) {
+// ignore msg
+LOG_WA("compcsi_state_info messages received from lost node (%x)",
+   node_id);
+  } else if (cb->node_sync_tmr.is_active == true) {
 AVD_EVT_QUEUE *state_info_evt = new AVD_EVT_QUEUE();
 state_info_evt->evt = new AVD_EVT{};
 state_info_evt->evt->rcv_evt = evt->rcv_evt;
 state_info_evt->evt->info.avnd_msg = n2d_msg;
 cb->evt_queue.push(state_info_evt);
+return;
   } else {
 LOG_WA(
-"Ignore this compcsi_state_info message since node sync window has closed");
-avsv_dnd_msg_free(n2d_msg);
+  "Ignore this compcsi_state_info message since node sync window has closed");
   }
 
-  TRACE_LEAVE();
+  avsv_dnd_msg_free(n2d_msg);
 }
 
 /**
diff --git a/src/amf/amfd/timer.h b/src/amf/amfd/timer.h
index 5316879..6db04c7 100644
--- a/src/amf/amfd/timer.h
+++ b/src/amf/amfd/timer.h
@@ -52,12 +52,12 @@ typedef enum avd_tmr_type {
 
 /* AVD Timer definition */
 typedef struct avd_tmr_tag {
-  tmr_t tmr_id;
-  AVD_TMR_TYPE type;
-  SaClmNodeIdT node_id;
-  std::string spons_si_name;
-  std::string dep_si_name;
-  bool is_active;
+  tmr_t tmr_id{};
+  AVD_TMR_TYPE type{AVD_TMR_MAX};
+  SaClmNodeIdT node_id{};
+  std::string spons_si_name{};
+  std::string dep_si_name{};
+  bool is_active{};
 } AVD_TMR;
 
 /* macro to start the cluster init timer. The cb structure
-- 
2.7.4
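
The handling order introduced by this patch can be sketched in isolation. The following is a minimal, self-contained illustration and not the amfd code: `SyncAction`, `ClassifySyncMessage`, and the plain `std::set<uint32_t>` standing in for `cb->failover_list` are hypothetical names chosen for the example.

```cpp
#include <cstdint>
#include <set>

// Hypothetical outcome of handling one sync message; not an OpenSAF type.
enum class SyncAction { kDiscardLostNode, kQueue, kDiscardWindowClosed };

// Decide what to do with a sisu/compcsi state info message.
// lost_nodes stands in for cb->failover_list and sync_timer_active for
// cb->node_sync_tmr.is_active (both simplified for illustration).
SyncAction ClassifySyncMessage(const std::set<uint32_t>& lost_nodes,
                               bool sync_timer_active, uint32_t node_id) {
  if (lost_nodes.find(node_id) != lost_nodes.end()) {
    // The node is going to be rebooted, so its sync data is of no use.
    return SyncAction::kDiscardLostNode;
  }
  if (sync_timer_active) {
    // Node sync is still in progress: keep the message for later.
    return SyncAction::kQueue;
  }
  // The sync window has closed: drop the message.
  return SyncAction::kDiscardWindowClosed;
}
```

In the patch, only the queueing branch returns early; the two discard branches fall through to `avsv_dnd_msg_free(n2d_msg)`, so the message is freed in exactly one place.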




Re: [devel] [PATCH 1/1] amfnd: fix error reading from deallocated memory [#2568]

2019-06-03 Thread Gary Lee


Hi Thanh

ack (review only).

Thanks

On 4/6/19 8:48 am, Thanh Nguyen wrote:

Invalid read is from the following
- avnd_evt_mds_ava_dn_evh() (amf/amfnd/comp.cc)
- avsv_create_association_class_dn() (amf/common/util.c)
The other changes fix errors reported by cppcheck.
---
  src/amf/amfnd/comp.cc | 16 
  src/amf/common/util.c |  6 +++---
  2 files changed, 11 insertions(+), 11 deletions(-)

diff --git a/src/amf/amfnd/comp.cc b/src/amf/amfnd/comp.cc
index 38b9224..facbace 100644
--- a/src/amf/amfnd/comp.cc
+++ b/src/amf/amfnd/comp.cc
@@ -428,8 +428,9 @@ uint32_t avnd_evt_mds_ava_dn_evh(AVND_CB *cb, AVND_EVT 
*evt) {
   entry from the cbk list and delete the cbq */
m_AVND_COMP_CBQ_INV_GET(comp, comp->term_cbq_inv_value, cbk_rec);
comp->term_cbq_inv_value = 0;
+  uint32_t opq_hdl = cbk_rec? cbk_rec->opq_hdl: 0;
rc = avnd_comp_clc_fsm_run(cb, comp, AVND_COMP_CLC_PRES_FSM_EV_TERM_SUCC);
-  if (cbk_rec) avnd_comp_cbq_rec_pop_and_del(cb, comp, cbk_rec->opq_hdl, false);
+  if (cbk_rec) avnd_comp_cbq_rec_pop_and_del(cb, comp, opq_hdl, false);
goto done;
  }
  /* found the matching comp; trigger error processing */
@@ -2228,9 +2229,7 @@ uint32_t avnd_amf_resp_send(AVND_CB *cb, 
AVSV_AMF_API_TYPE type,
AVND_MSG msg;
AVSV_ND2ND_AVND_MSG *avnd_msg;
uint32_t rc = NCSCC_RC_SUCCESS;
-  MDS_DEST i_to_dest;
AVSV_NDA_AVA_MSG *temp_ptr = nullptr;
-  NODE_ID node_id = 0;
MDS_SYNC_SND_CTXT temp_ctxt;
TRACE_ENTER();
  
@@ -2267,8 +2266,8 @@ uint32_t avnd_amf_resp_send(AVND_CB *cb, AVSV_AMF_API_TYPE type,

  msg.info.avnd->type = AVND_AVND_AVA_MSG;
  msg.type = AVND_MSG_AVND;
  /* Send it to AvND */
-node_id = m_NCS_NODE_ID_FROM_MDS_DEST(*dest);
-i_to_dest = avnd_get_mds_dest_from_nodeid(cb, node_id);
+NODE_ID node_id = m_NCS_NODE_ID_FROM_MDS_DEST(*dest);
+MDS_DEST i_to_dest = avnd_get_mds_dest_from_nodeid(cb, node_id);
  rc = avnd_avnd_mds_send(cb, i_to_dest, );
} else {
  /* now send the response */
@@ -2646,7 +2645,8 @@ void avnd_comp_cmplete_all_assignment(AVND_CB *cb, 
AVND_COMP *comp) {
   */
  temp_csi = m_AVND_COMPDB_REC_CSI_GET_FIRST(*comp);
  
-if (cbk->cbk_info->param.csi_set.ha != temp_csi->si->curr_state) {

+if (temp_csi &&
+   (cbk->cbk_info->param.csi_set.ha != temp_csi->si->curr_state)) {
avnd_comp_cbq_rec_pop_and_del(cb, comp, cbk->opq_hdl, true);
continue;
  }
@@ -2788,7 +2788,7 @@ uint32_t comp_restart_initiate(AVND_COMP *comp) {
  rc = avnd_comp_curr_info_del(cb, it.second);
  if (NCSCC_RC_SUCCESS != rc) goto done;
  
-	// unregister the contained comp

+// unregister the contained comp
  rc = avnd_comp_unregister_contained(cb, it.second);
  if (NCSCC_RC_SUCCESS != rc) goto done;
  
@@ -2956,7 +2956,7 @@ void avnd_comp_pres_state_set(const AVND_CB *cb, AVND_COMP *comp,

   (SA_AMF_PRESENCE_ORPHANED == prv_st {
  if (cb->is_avd_down == false) {
avnd_di_uns32_upd_send(AVSV_SA_AMF_COMP, saAmfCompPresenceState_ID,
- comp->name.c_str(), comp->pres);
+ comp->name, comp->pres);
  }
}
  
diff --git a/src/amf/common/util.c b/src/amf/common/util.c

index ec76c32..d17b766 100644
--- a/src/amf/common/util.c
+++ b/src/amf/common/util.c
@@ -240,12 +240,12 @@ void avsv_create_association_class_dn(const SaNameT 
*child_dn,
}
  
  	if (dn) {

+   TRACE("dn: %s", buf);
osaf_extended_name_steal(buf, dn);
}
-   TRACE_LEAVE2("child_dn: %s parent_dn: %s dn: %s",
+   TRACE_LEAVE2("child_dn: %s parent_dn: %s",
child_dn_ptr ? child_dn_ptr : "no child dn",
-   parent_dn_ptr ? parent_dn_ptr : "no parent dn",
-   buf);
+   parent_dn_ptr ? parent_dn_ptr : "no parent dn");
  }
  
  void avsv_sanamet_init_from_association_dn(const SaNameT *haystack, SaNameT *dn,
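
The first hunk of the patch above reads `opq_hdl` before running the state machine, presumably because the record can be released during that call. A minimal sketch of that ordering follows. `CallbackRecord`, `RunStateMachine`, and `HandleAfterTermination` are hypothetical stand-ins, and `std::unique_ptr` is used only to make the release visible in a small example.

```cpp
#include <cstdint>
#include <memory>

// Hypothetical stand-in for a callback queue record.
struct CallbackRecord {
  uint32_t opq_hdl;
};

// Hypothetical stand-in for a state machine run that releases the record
// as a side effect.
void RunStateMachine(std::unique_ptr<CallbackRecord>& rec) { rec.reset(); }

// Returns the handle to use for the follow-up cleanup call.
uint32_t HandleAfterTermination(std::unique_ptr<CallbackRecord>& rec) {
  // Copy the field first: rec may no longer be valid after the call below.
  const uint32_t opq_hdl = rec ? rec->opq_hdl : 0;
  RunStateMachine(rec);
  return opq_hdl;  // safe: the released record is not touched again
}
```

Reading `rec->opq_hdl` after `RunStateMachine()` would be the kind of invalid read the patch title describes.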




Re: [devel] [PATCH 1/1] mds: use new TIPC getsockopt to log receive queue utilization [#3038]

2019-05-27 Thread Gary Lee


Hi Hans

ack (review only)

Thanks

On 20/5/19 10:27 pm, Hans Nordebäck wrote:

---
  00-README.conf   |  14 +++
  src/base/Makefile.am |   1 +
  src/base/statistics.h|  88 +
  src/mds/Makefile.am  |   8 +-
  src/mds/mds_dt_tipc.c|   3 +
  src/mds/mds_tipc_recvq_stats.cc  |  29 +
  src/mds/mds_tipc_recvq_stats.h   |  32 +
  src/mds/mds_tipc_recvq_stats_impl.cc | 178 +++
  src/mds/mds_tipc_recvq_stats_impl.h  |  39 ++
  9 files changed, 390 insertions(+), 2 deletions(-)
  create mode 100644 src/base/statistics.h
  create mode 100644 src/mds/mds_tipc_recvq_stats.cc
  create mode 100644 src/mds/mds_tipc_recvq_stats.h
  create mode 100644 src/mds/mds_tipc_recvq_stats_impl.cc
  create mode 100644 src/mds/mds_tipc_recvq_stats_impl.h

diff --git a/00-README.conf b/00-README.conf
index 8f20e5209..da1825f06 100644
--- a/00-README.conf
+++ b/00-README.conf
@@ -737,3 +737,17 @@ initiate a 'self-fencing' by rebooting the node, if it 
determines the node
  should no longer be active according to the consensus service, to prevent
  a split-brain situation.
  
+TIPC receive queue utilization

+==
+
+Setting the environment variable MDS_RECVQ_STATS_LOG_FREQ_SEC in a service config
+file enables TIPC receive queue utilization statistics. The value is how often,
+in seconds, the statistics are written to syslog.
+
+Example amfd.conf:
+
+export MDS_RECVQ_STATS_LOG_FREQ_SEC=5
+
+then every 5 seconds a log record is written:
+
+May 20 12:23:30 SC-1 local0.notice osafamfd[545]: NO TIPC receive queue utilization (in %): min: 3.86 max: 4.38 mean: 4.15 std dev: 0.18
diff --git a/src/base/Makefile.am b/src/base/Makefile.am
index ce93562e5..025fb86a2 100644
--- a/src/base/Makefile.am
+++ b/src/base/Makefile.am
@@ -157,6 +157,7 @@ noinst_HEADERS += \
src/base/saf_error.h \
src/base/saf_mem.h \
src/base/sprr_dl_api.h \
+   src/base/statistics.h \
src/base/string_parse.h \
src/base/sysf_exc_scr.h \
src/base/sysf_ipc.h \
diff --git a/src/base/statistics.h b/src/base/statistics.h
new file mode 100644
index 0..9ce980fc1
--- /dev/null
+++ b/src/base/statistics.h
@@ -0,0 +1,88 @@
+/*  -*- OpenSAF  -*-
+ *
+ * (C) Copyright 2019 The OpenSAF Foundation
+ * Copyright Ericsson AB 2019 - All Rights Reserved.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. This file and program are licensed
+ * under the GNU Lesser General Public License Version 2.1, February 1999.
+ * The complete license can be accessed from the following location:
+ * http://opensource.org/licenses/lgpl-license.php
+ * See the Copying file included with the OpenSAF distribution for full
+ * licensing terms.
+ *
+ * Author(s): Ericsson AB
+ *
+ */
+
+#ifndef STATISTICS_H_
+#define STATISTICS_H_
+
+#include 
+
+namespace base {
+
+class Statistics {
+ public:
+  void clear() {
+n_ = 0;
+  }
+
+  void push(double x) {
+n_++;
+
+// See Knuth, The Art of Computer Programming, Volume 2: Seminumerical Algorithms, 4.2.2. Accuracy of Floating Point Arithmetic,
+// using the recurrence formulas:
+// M1 = x1, Mk = Mk-1 + (xk - Mk-1) / k  (15)
+// S1 = 0, Sk = Sk-1 + (xk - Mk-1) * (xk - Mk)  (16)
+// for 2 <= k <= n; the standard deviation is sqrt(Sn / (n - 1))
+if (n_ == 1) {
+  prev_m_ = current_m_ = x;
+  prev_s_ = 0;
+  min_ = x;
+  max_ = x;
+} else {
+  current_m_ = prev_m_ + (x - prev_m_) / n_;
+  current_s_ =  prev_s_ + (x - prev_m_) * (x - current_m_);
+
+  if (x > max_) max_ = x;
+  if (x < min_) min_ = x;
+  prev_m_ = current_m_;
+  prev_s_ = current_s_;
+}
+  }
+
+  double mean() const {
+return (n_ > 0) ?  current_m_ : 0;
+  }
+
+  double variance() const {
+return (n_ > 1) ? current_s_ / (n_ - 1) : 0;
+  }
+
+  double std_dev() const {
+return sqrt(variance());
+  }
+
+  double min() const {
+return min_;
+  }
+  double max() const {
+return max_;
+  }
+
+ private:
+  int n_{0};
+  double prev_m_{0};
+  double current_m_{0};
+  double prev_s_{0};
+  double current_s_{0};
+  double min_{0};
+  double max_{0};
+};
+
+}  // namespace base
+
+#endif  // STATISTICS_H_
+
diff --git a/src/mds/Makefile.am b/src/mds/Makefile.am
index 3724d2ea8..2d7b652e9 100644
--- a/src/mds/Makefile.am
+++ b/src/mds/Makefile.am
@@ -46,8 +46,12 @@ lib_libopensaf_core_la_SOURCES += \
src/mds/ncs_vda.c
  
  if ENABLE_TIPC_TRANSPORT

-noinst_HEADERS += src/mds/mds_dt_tipc.h
-lib_libopensaf_core_la_SOURCES += src/mds/mds_dt_tipc.c
+noinst_HEADERS += src/mds/mds_dt_tipc.h \
+   src/mds/mds_tipc_recvq_stats.h \
+   src/mds/mds_tipc_recvq_stats_impl.h
+lib_libopensaf_core_la_SOURCES += src/mds/mds_dt_tipc.c \
+   src/mds/mds_tipc_recvq_stats.cc
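
The running mean and variance recurrence that `base::Statistics` cites from Knuth can be tried out on its own. The class below is a minimal sketch of the same recurrence and not the code from the patch; `RunningStats` and its method names are chosen here for illustration.

```cpp
#include <cmath>
#include <cstddef>

// Illustrative running statistics accumulator (not the OpenSAF class).
class RunningStats {
 public:
  void Push(double x) {
    ++n_;
    if (n_ == 1) {
      // First sample: M1 = x1, S1 = 0.
      mean_ = x;
      s_ = 0.0;
      min_ = max_ = x;
      return;
    }
    const double prev_mean = mean_;
    // Mk = Mk-1 + (xk - Mk-1) / k
    mean_ = prev_mean + (x - prev_mean) / static_cast<double>(n_);
    // Sk = Sk-1 + (xk - Mk-1) * (xk - Mk)
    s_ += (x - prev_mean) * (x - mean_);
    if (x < min_) min_ = x;
    if (x > max_) max_ = x;
  }

  double Mean() const { return n_ > 0 ? mean_ : 0.0; }
  // Sample variance, Sn / (n - 1); defined as 0 for fewer than two samples.
  double Variance() const {
    return n_ > 1 ? s_ / static_cast<double>(n_ - 1) : 0.0;
  }
  double StdDev() const { return std::sqrt(Variance()); }
  double Min() const { return min_; }
  double Max() const { return max_; }

 private:
  std::size_t n_{0};
  double mean_{0.0};
  double s_{0.0};
  double min_{0.0};
  double max_{0.0};
};
```

Because each sample updates the mean and the sum of squared deviations in place, no samples need to be stored, which suits a periodic statistics log.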

[devel] [PATCH 0/1] Review Request for rded: improve self-fencing response time [#3039]

2019-05-26 Thread Gary Lee

Summary: rded: improve self-fencing response time [#3039]
Review request for Ticket(s): 3039
Peer Reviewer(s): Hans
Pull request to: *** LIST THE PERSON WITH PUSH ACCESS HERE ***
Affected branch(es): develop
Development branch: ticket-3039
Base revision: 1bff38564b69175fa4a0ea2cb1d40bd432581bd6
Personal repository: git://git.code.sf.net/u/userid-2226215/review


Impacted area       Impact y/n
--------------------------------
 Docs                    n
 Build system            n
 RPM/packaging           n
 Configuration files     n
 Startup scripts         n
 SAF services            n
 OpenSAF services        y
 Core libraries          n
 Samples                 n
 Tests                   n
 Other                   n


Comments (indicate scope for each "y" above):
-

revision f8b4a473feafd23ce9d130a8ad245c5da75ab9b4
Author: Gary Lee 
Date:   Mon, 27 May 2019 09:54:40 +1000

rded: improve self-fencing response time [#3039]

When connectivity to consensus service is lost, it is recorded
in a state variable. When all RDE peers are lost, the node will
now self-fence immediately.



Complete diffstat:
--
 src/rde/rded/rde_cb.h|  5 +
 src/rde/rded/rde_main.cc | 18 --
 src/rde/rded/role.cc | 24 
 src/rde/rded/role.h  |  3 +++
 4 files changed, 48 insertions(+), 2 deletions(-)


Testing Commands:
-
'export FMS_RELAXED_NODE_PROMOTION=1' in fmd.conf
Block cluster from accessing consensus service
Reboot standby SC

Testing, Expected Results:
--
Active SC should self-fence immediately after noticing peer RDE is down
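
The condition under which the active controller now self-fences on a peer-down event can be written as one predicate. This is a minimal sketch of the check in the RDE_MSG_CONTROLLER_DOWN handler of the accompanying patch, not the rded code itself: `FencingInputs` and `ShouldSelfFence` are hypothetical names, and the fields replace the calls made on the `Consensus` and `Role` objects.

```cpp
enum class ConsensusState { kUnknown = 0, kConnected, kDisconnected };

// Hypothetical snapshot of the inputs to the decision; field names are
// illustrative and not taken from the rded sources.
struct FencingInputs {
  bool is_active;               // this node holds the active role
  bool consensus_enabled;       // consensus service is configured
  bool relaxed_node_promotion;  // FMS_RELAXED_NODE_PROMOTION is set
  bool peer_present;            // at least one peer RDE is still connected
  ConsensusState consensus_state;
};

// True when the last peer has gone and the consensus service was already
// known to be unreachable.
bool ShouldSelfFence(const FencingInputs& in) {
  return in.is_active && in.consensus_enabled &&
         in.consensus_state == ConsensusState::kDisconnected &&
         in.relaxed_node_promotion && !in.peer_present;
}
```

In the patch, the reboot itself is additionally guarded by `IsRemoteFencingEnabled() == false`.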

Conditions of Submission:
-
ack, or in 7 days

Arch        Built   Started   Linux distro
-------------------------------------------
mips        n       n
mips64      n       n
x86         n       n
x86_64      y       y
powerpc     n       n
powerpc64   n       n






[devel] [PATCH 1/1] rded: improve self-fencing response time [#3039]

2019-05-26 Thread Gary Lee

When connectivity to consensus service is lost, it is recorded
in a state variable. When all RDE peers are lost, the node will
now self-fence immediately.
---
 src/rde/rded/rde_cb.h|  5 +
 src/rde/rded/rde_main.cc | 18 --
 src/rde/rded/role.cc | 24 
 src/rde/rded/role.h  |  3 +++
 4 files changed, 48 insertions(+), 2 deletions(-)

diff --git a/src/rde/rded/rde_cb.h b/src/rde/rded/rde_cb.h
index 9a0919c..e35fdab 100644
--- a/src/rde/rded/rde_cb.h
+++ b/src/rde/rded/rde_cb.h
@@ -18,6 +18,7 @@
 #ifndef RDE_RDED_RDE_CB_H_
 #define RDE_RDED_RDE_CB_H_
 
+#include <atomic>
 #include 
 #include 
 #include "base/osaf_utility.h"
@@ -37,6 +38,8 @@
 enum class State {kNotActive = 0, kNotActiveSeenPeer, kActiveElected,
   kActiveElectedSeenPeer, kActiveFailover};
 
+enum class ConsensusState {kUnknown = 0, kConnected, kDisconnected};
+
 struct RDE_CONTROL_BLOCK {
   SYSF_MBX mbx;
   NCSCONTEXT task_handle;
@@ -49,6 +52,8 @@ struct RDE_CONTROL_BLOCK {
   // used for discovering peer controllers, regardless of their role
   std::set peer_controllers{};
   State state{State::kNotActive};
+  std::atomic<ConsensusState> consensus_service_state{ConsensusState::kUnknown};
+  std::atomic<bool> state_refresh_thread_started{false}; // consensus service
 };
 
 enum RDE_MSG_TYPE {
diff --git a/src/rde/rded/rde_main.cc b/src/rde/rded/rde_main.cc
index 456d2ce..1a7e587 100644
--- a/src/rde/rded/rde_main.cc
+++ b/src/rde/rded/rde_main.cc
@@ -178,6 +178,19 @@ static void handle_mbx_event() {
 case RDE_MSG_CONTROLLER_DOWN:
   rde_cb->peer_controllers.erase(msg->fr_node_id);
   TRACE("peer_controllers: size %zu", rde_cb->peer_controllers.size());
+  if (role->role() == PCS_RDA_ACTIVE) {
+Consensus consensus_service;
+if (consensus_service.IsEnabled() == true &&
+rde_cb->consensus_service_state == ConsensusState::kDisconnected &&
+consensus_service.IsRelaxedNodePromotionEnabled() == true &&
+role->IsPeerPresent() == false) {
+LOG_NO("Lost connectivity to consensus service. No peer present");
+if (consensus_service.IsRemoteFencingEnabled() == false) {
+opensaf_quick_reboot("Lost connectivity to consensus service. "
+ "Rebooting this node");
+}
+}
+  }
   break;
 case RDE_MSG_TAKEOVER_REQUEST_CALLBACK: {
   rde_cb->monitor_takeover_req_thread_running = false;
@@ -214,7 +227,7 @@ static void handle_mbx_event() {
   if (consensus_service.IsRelaxedNodePromotionEnabled() == true) {
   if (rde_cb->state == State::kActiveElected) {
 TRACE("Relaxed mode is enabled");
-TRACE(" No peer SC yet seen, ignore consensus service failure");
+TRACE("No peer SC yet seen, ignore consensus service failure");
 // if relaxed node promotion is enabled, and we have yet to see
 // a peer SC after being promoted, tolerate consensus service
 // not working
@@ -227,13 +240,14 @@ static void handle_mbx_event() {
 // we have seen the peer, and peer is still connected, tolerate
 // consensus service not working
 fencing_required = false;
+rde_cb->consensus_service_state = ConsensusState::kDisconnected;
   }
   }
   if (fencing_required == true) {
 LOG_NO("Lost connectivity to consensus service");
 if (consensus_service.IsRemoteFencingEnabled() == false) {
 opensaf_quick_reboot("Lost connectivity to consensus service. "
-   "Rebooting this node");
+ "Rebooting this node");
 }
   }
 }
diff --git a/src/rde/rded/role.cc b/src/rde/rded/role.cc
index 3effc25..b8c8157 100644
--- a/src/rde/rded/role.cc
+++ b/src/rde/rded/role.cc
@@ -215,6 +215,18 @@ timespec* Role::Poll(timespec* ts) {
 is_candidate).detach();
   }
 }
+  } else if (role_ == PCS_RDA_ACTIVE) {
+RDE_CONTROL_BLOCK* cb = rde_get_control_block();
+if (cb->consensus_service_state == ConsensusState::kUnknown ||
+cb->consensus_service_state == ConsensusState::kDisconnected) {
+  // consensus service was previously disconnected, refresh state
+  Consensus consensus_service;
+  if (consensus_service.IsEnabled() == true &&
+cb->state_refresh_thread_started == false) {
+cb->state_refresh_thread_started = true;
+std::thread(&Role::RefreshConsensusState, this, cb).detach();
+  }
+}
   }
   return timeout;
 }
@@ -351,3 +363,15 @@ void Role::PromoteNodeLate() {
   this, cb->cluster_members.size(),
   true).detach();
 }
+
+void Role::RefreshConsensusState(RDE_CONTROL_BLOCK* cb) {
+  TRACE_ENTER();
+
+  Consensus consensus_service;
+  if (consensus_service.IsWritable()

Re: [devel] [PATCH 1/1] amfnd: reboot to recovery if msg id received by amfd mismatch with msg id sent by amfnd [#3040]

2019-05-20 Thread Gary Lee


Hi Thang

Looks good to me. Nagu, any comments?

Thanks

Gary

On 15/5/19 12:14 am, thang.d.nguyen wrote:

During SC failover, a message received by the ACTIVE AMFD may not have
been checkpointed to the AMFD on the STANDBY SC. AMFND nevertheless
processes the ack for that message and removes it from its queue.
When the STANDBY SC becomes ACTIVE, the message IDs of AMFD and AMFND
no longer match. As a consequence, CLM track start cannot be invoked
to update the cluster member nodes if these nodes were rebooted.

In this case, amfnd needs to reboot the node automatically to recover.
---
  src/amf/amfnd/verify.cc | 15 ++-
  1 file changed, 10 insertions(+), 5 deletions(-)

diff --git a/src/amf/amfnd/verify.cc b/src/amf/amfnd/verify.cc
index 5726ad9..ddb1d15 100644
--- a/src/amf/amfnd/verify.cc
+++ b/src/amf/amfnd/verify.cc
@@ -116,12 +116,14 @@ uint32_t avnd_evt_avd_verify_evh(AVND_CB *cb, AVND_EVT 
*evt) {
avnd_diq_rec_del(cb, rec);
continue;
  } else {
+  if ((rcv_id + 1) == (*((uint32_t *)(&rec->msg.info.avd->msg_info))) &&
+  (msg_found == false)) {
+msg_found = true;
+  }
avnd_diq_rec_send(cb, rec);
  
TRACE_1("AVND record %u sent, upon fail-over",

*((uint32_t *)(&rec->msg.info.avd->msg_info)));
-
-  msg_found = true;
  }
  ++iter;
}
@@ -129,9 +131,12 @@ uint32_t avnd_evt_avd_verify_evh(AVND_CB *cb, AVND_EVT 
*evt) {
if ((cb->snd_msg_id != info->rcv_id_cnt) && (msg_found == false)) {
  /* Log error, seems to be some problem.*/
  LOG_EM(
-"AVND record not found, after failover, snd_msg_id = %u, receive id = %u",
-cb->snd_msg_id, info->rcv_id_cnt);
-return NCSCC_RC_FAILURE;
+"AVND record not found for msg id = %u", (rcv_id + 1));
+opensaf_reboot(
+avnd_cb->node_info.nodeId,
+osaf_extended_name_borrow(&avnd_cb->node_info.executionEnvironment),
+"AVND record not found, after failover");
+exit(0);
}
  
/*
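
The check that decides whether the node must reboot after a failover can be sketched without the amfnd data structures. The following is a minimal illustration under stated assumptions: the pending queue is modeled as a plain vector of message ids, and `NextExpectedMessageQueued` and `RebootNeeded` are hypothetical helpers, not functions from the patch.

```cpp
#include <cstdint>
#include <vector>

// True if the pending queue still holds the message the director expects
// next, i.e. the one with id rcv_id + 1.
bool NextExpectedMessageQueued(const std::vector<uint32_t>& pending_msg_ids,
                               uint32_t rcv_id) {
  for (uint32_t id : pending_msg_ids) {
    if (id == rcv_id + 1) return true;
  }
  return false;
}

// True when the node director has sent more messages than the new active
// director has received and the next expected message can no longer be
// resent. This is the situation in which the patch reboots the node.
bool RebootNeeded(uint32_t snd_msg_id, uint32_t rcv_id,
                  const std::vector<uint32_t>& pending_msg_ids) {
  return snd_msg_id != rcv_id &&
         !NextExpectedMessageQueued(pending_msg_ids, rcv_id);
}
```

For example, with `snd_msg_id` 5 and `rcv_id` 3, a queue holding ids 4 and 5 can be replayed, while a queue holding only id 5 has lost message 4 and cannot be resynchronized.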




[devel] [PATCH 0/1] Review Request for base: strip leading and trailing quotes [#3041]

2019-05-17 Thread Gary Lee

Summary: base: strip leading and trailing quotes [#3041]
Review request for Ticket(s): 3041
Peer Reviewer(s): Hans, Minh, Vu 
Pull request to:
Affected branch(es): develop
Development branch: ticket-3041
Base revision: 55466efcacc6d83f104ee747ebc189688ccc2de1
Personal repository: git://git.code.sf.net/u/userid-2226215/review


Impacted area       Impact y/n
--------------------------------
 Docs                    n
 Build system            n
 RPM/packaging           n
 Configuration files     n
 Startup scripts         n
 SAF services            n
 OpenSAF services        n
 Core libraries          y
 Samples                 n
 Tests                   n
 Other                   n


Comments (indicate scope for each "y" above):
-

revision 6bd164279a2fbd881c4700566960f3ede728f4df
Author: Gary Lee 
Date:   Fri, 17 May 2019 22:09:05 +1000

base: strip leading and trailing quotes [#3041]

ConfigFileReader enables runtime 'reload' of .conf files.
However, if the environment variable is surrounded by quotes,
it adds the quotes to the value which is not the expected behaviour.

export FOO="foo"

FOO should contain just foo, not "foo".



Complete diffstat:
--
 src/base/config_file_reader.cc  | 15 +++
 src/osaf/consensus/consensus.cc |  1 +
 2 files changed, 16 insertions(+)


Testing Commands:
-
pkill -SIGUSR2 osaffmd (turn on tracing)
pkill -SIGHUP osaffmd (reload)

Testing, Expected Results:
--
Check osaffmd trace:

<143>1 2019-05-17T20:11:56.865293+10:00 SC-1 osaffmd 188 osaffmd [meta sequenceId="13"] 188:osaf/consensus/consensus.cc:298 TR Setting 'FMS_HA_ENV_HEALTHCHECK_KEY' to 'Default'

It should not say '"Default"', but 'Default'

Conditions of Submission:
-
Ack from anyone

Arch        Built   Started   Linux distro
-------------------------------------------
mips        n       n
mips64      n       n
x86         n       n
x86_64      y       y
powerpc     n       n
powerpc64   n       n






[devel] [PATCH 1/1] base: strip leading and trailing quotes [#3041]

2019-05-17 Thread Gary Lee

ConfigFileReader enables runtime 'reload' of .conf files.
However, if the environment variable is surrounded by quotes,
it adds the quotes to the value which is not the expected behaviour.

export FOO="foo"

FOO should contain just foo, not "foo".
---
 src/base/config_file_reader.cc  | 15 +++
 src/osaf/consensus/consensus.cc |  1 +
 2 files changed, 16 insertions(+)

diff --git a/src/base/config_file_reader.cc b/src/base/config_file_reader.cc
index 63cad7d..0132547 100644
--- a/src/base/config_file_reader.cc
+++ b/src/base/config_file_reader.cc
@@ -36,6 +36,18 @@ static void trim(std::string& str) {
   right_trim(str);
 }
 
+static void strip_quotes(std::string& str) {
+  // trim leading and trailing quotes
+  if (str.front() == '"' ||
+  str.front() == '\'') {
+str.erase(0, 1);  // delete first char
+  }
+  if (str.back() == '"' ||
+str.back() == '\'') {
+str.pop_back();  // delete last char
+  }
+}
+
 ConfigFileReader::SettingsMap ConfigFileReader::ParseFile(
 const std::string& filename) {
   const std::string prefix("export");
@@ -80,6 +92,9 @@ ConfigFileReader::SettingsMap ConfigFileReader::ParseFile(
   std::string value = line.substr(equal + 1);
   trim(value);
 
+  strip_quotes(key);
+  strip_quotes(value);
+
   map[key] = value;
 }
 file.close();
diff --git a/src/osaf/consensus/consensus.cc b/src/osaf/consensus/consensus.cc
index 480f7d2..0bebab2 100644
--- a/src/osaf/consensus/consensus.cc
+++ b/src/osaf/consensus/consensus.cc
@@ -295,6 +295,7 @@ bool Consensus::ReloadConfiguration() {
   continue;
 }
 int rc;
+TRACE("Setting '%s' to '%s'", kv.first.c_str(), kv.second.c_str());
 rc = setenv(kv.first.c_str(), kv.second.c_str(), 1);
 osafassert(rc == 0);
   }
-- 
2.7.4
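
Review note on the patch above: `std::string::front()` and `back()` have undefined behaviour on an empty string, and a line such as `export FOO=` may leave `value` empty after trimming. The following is a minimal sketch of a guarded variant, not the committed code. `StripQuotes` is a hypothetical name, and unlike the patch it only strips a matching pair of quotes and returns a copy instead of modifying in place.

```cpp
#include <string>

// Hypothetical helper (not part of the patch): removes one pair of matching
// leading/trailing quotes. A lone or mismatched quote is left untouched.
std::string StripQuotes(const std::string& str) {
  // front()/back() are undefined on an empty string, so check the size first.
  if (str.size() < 2) return str;
  const char first = str.front();
  const char last = str.back();
  if ((first == '"' || first == '\'') && first == last) {
    return str.substr(1, str.size() - 2);
  }
  return str;
}
```

With this variant, `StripQuotes("\"foo\"")` yields `foo`, while an empty value or a single stray quote passes through unchanged.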




Re: [devel] [PATCH 1/1] amfnd: don't attempt su failover if active controller is rebooting [#3035]

2019-05-10 Thread Gary Lee


Hi Alex

ack (review only)

Gary

On 8/5/19 5:46 am, Jones, Alex wrote:
In N+M model CSI-remove responses can get lost if active controller reboots.
In this case SG will be stuck in unstable state, and standby will never get
assignments.

We are the active controller, active for N+M, SU failover is set, and
failfast on termination failure is set for the nodes. If a component in the
SU crashes, and another component fails during cleanup, the node does
failfast. It currently attempts to do su failover in this case, but the
csi-remove responses from the payload can get lost because we are rebooting.
They eventually show up on the new active, but we get message-id errors.

Set a flag when the active controller is about to reboot. If the flag is set,
then don't do SU failover. Let the new active take care of the failover.
---
src/amf/amfd/node.cc | 1 +
src/amf/amfd/node.h | 1 +
src/amf/amfd/sgproc.cc | 7 +++
src/amf/amfd/util.cc | 3 +++
4 files changed, 12 insertions(+)

diff --git a/src/amf/amfd/node.cc b/src/amf/amfd/node.cc
index 7fc764f22..b8d8a7d77 100644
--- a/src/amf/amfd/node.cc
+++ b/src/amf/amfd/node.cc
@@ -121,6 +121,7 @@ void AVD_AVND::initialize() {
clm_pend_inv = {};
clm_change_start_preceded = {};
recvr_fail_sw = {};
+ actv_ctrl_reboot_in_progress = {};
admin_ng = {};
}

diff --git a/src/amf/amfd/node.h b/src/amf/amfd/node.h
index ecee5c591..dbe48dc43 100644
--- a/src/amf/amfd/node.h
+++ b/src/amf/amfd/node.h
@@ -140,6 +140,7 @@ class AVD_AVND {
CLM completed cb. */
bool recvr_fail_sw; /* to indicate there was node reboot because of node
failover/switchover.*/
+ bool actv_ctrl_reboot_in_progress;
AVD_AMF_NG *admin_ng; /* points to the nodegroup on which admin operation is
going on.*/
uint16_t node_up_msg_count; /* to count of node_up msg that director had
diff --git a/src/amf/amfd/sgproc.cc b/src/amf/amfd/sgproc.cc
index 1537acac3..7c8d9a558 100644
--- a/src/amf/amfd/sgproc.cc
+++ b/src/amf/amfd/sgproc.cc
@@ -478,6 +478,13 @@ static uint32_t sg_su_failover_func(AVD_SU *su) {
goto done;
}

+ if (su->su_on_node->actv_ctrl_reboot_in_progress) {
+ TRACE("'%s' is already going down, so not doing SU failover",
+ su->name.c_str());
+ rc = NCSCC_RC_SUCCESS;
+ goto done;
+ }
+
su->set_oper_state(SA_AMF_OPERATIONAL_DISABLED);
su->set_readiness_state(SA_AMF_READINESS_OUT_OF_SERVICE);
if (su->saAmfSUAdminState == SA_AMF_ADMIN_LOCKED)
diff --git a/src/amf/amfd/util.cc b/src/amf/amfd/util.cc
index 14a4e0485..0dc3e99e3 100644
--- a/src/amf/amfd/util.cc
+++ b/src/amf/amfd/util.cc
@@ -1802,6 +1802,9 @@ void avd_d2n_reboot_snd(AVD_AVND *node) {
if (avd_d2n_msg_snd(avd_cb, node, d2n_msg) != NCSCC_RC_SUCCESS) {
LOG_ER("%s: snd to %x failed", __FUNCTION__, node->node_info.nodeId);
d2n_msg_free(d2n_msg);
+ } else if (node->node_info.nodeId == avd_cb->node_id_avd) {
+ TRACE("rebooting active amf director which is ourself");
+ node->actv_ctrl_reboot_in_progress = true;
}
}

--
2.17.2
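
The control flow in the quoted patch can be sketched in isolation as below. This is a simplified model under stated assumptions: `Node`, `MarkRebootSent` and `ShouldRunSuFailover` are hypothetical stand-ins for `AVD_AVND`, the new branch in `avd_d2n_reboot_snd()` and the early exit in `sg_su_failover_func()`; only the flag name is taken from the patch.

```cpp
#include <cstdint>

// Hypothetical, simplified stand-in for AVD_AVND.
struct Node {
  uint32_t node_id = 0;
  bool actv_ctrl_reboot_in_progress = false;
};

// Models the new branch in avd_d2n_reboot_snd(): if the reboot message was
// sent successfully and the target is the local (active) director's own node,
// remember that a reboot is under way.
void MarkRebootSent(Node& node, uint32_t own_node_id, bool send_ok) {
  if (send_ok && node.node_id == own_node_id) {
    node.actv_ctrl_reboot_in_progress = true;
  }
}

// Models the early exit in sg_su_failover_func(): skip SU failover when the
// hosting node is the active controller that is already going down.
bool ShouldRunSuFailover(const Node& hosting_node) {
  return !hosting_node.actv_ctrl_reboot_in_progress;
}
```

The flag is only set on the success path of the send, so a failed reboot request leaves SU failover enabled.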








Re: [devel] [PATCH 1/1] mbc: prevent infinite peer_up message loop [#3021]

2019-04-29 Thread Gary Lee


Hi

I will push this on Wednesday if there are no comments.

Thanks

Gary

On 26/3/19 1:16 pm, Gary Lee wrote:

If the active and standby SCs are split into network partitions, it is
possible a RED_UP never arrives even though we have already
received MBC PEER_UP. The service using MBC will then get stuck
in an infinite loop and probably fail health checks.

To cater for 'normal' race conditions between MDS topology and data
messages, allow only up to 255 loops. If this is exceeded, the msg
will be discarded.
---
  src/mbc/mbcsv_evt_msg.h |  2 ++
  src/mbc/mbcsv_peer.c| 10 ++
  2 files changed, 12 insertions(+)

diff --git a/src/mbc/mbcsv_evt_msg.h b/src/mbc/mbcsv_evt_msg.h
index f11a553..9eef747 100644
--- a/src/mbc/mbcsv_evt_msg.h
+++ b/src/mbc/mbcsv_evt_msg.h
@@ -197,6 +197,8 @@ typedef struct mbcsv_evt {
  MBCSV_EVT_MDS_SUBSCR_INFO mds_sub_evt;
} info;
  
+  uint32_t hops;

+
  } MBCSV_EVT;
  
  /***

diff --git a/src/mbc/mbcsv_peer.c b/src/mbc/mbcsv_peer.c
index b45904f..1d4b257 100644
--- a/src/mbc/mbcsv_peer.c
+++ b/src/mbc/mbcsv_peer.c
@@ -826,6 +826,15 @@ uint32_t mbcsv_process_peer_up_info(MBCSV_EVT *msg, 
CKPT_INST *ckpt,
memcpy(evt, msg, sizeof(MBCSV_EVT));
  
  			TRACE_4("Still RED_UP event not arrived of the peer");

+   if (evt->hops < 255) {
+   ++evt->hops;
+   } else {
+   LOG_WA("RED_UP missing, discarding peer up");
+   m_NCS_UNLOCK(&mbcsv_cb.peer_list_lock,
+   NCS_LOCK_WRITE);
+   m_MMGR_FREE_MBCSV_EVT(evt);
+   return NCSCC_RC_FAILURE;
+   }
  
  			/* Again post the event, till RED_UP event arrives */

if (NCSCC_RC_SUCCESS !=
@@ -833,6 +842,7 @@ uint32_t mbcsv_process_peer_up_info(MBCSV_EVT *msg, 
CKPT_INST *ckpt,
TRACE_LEAVE2("ipc send failed");
m_NCS_UNLOCK(&mbcsv_cb.peer_list_lock,
 NCS_LOCK_WRITE);
+   m_MMGR_FREE_MBCSV_EVT(evt);
return NCSCC_RC_FAILURE;
}
  




[devel] [PATCH 0/1] Review Request for mbc: prevent infinite peer_up message loop [#3021]

2019-03-25 Thread Gary Lee

Summary: mbc: prevent infinite peer_up message loop [#3021]
Review request for Ticket(s): 3021
Peer Reviewer(s): Canh, Anders, Hans
Pull request to: *** LIST THE PERSON WITH PUSH ACCESS HERE ***
Affected branch(es): develop
Development branch: ticket-3021
Base revision: 7f68859e0dc70179eff72515f28bc69ffd1ab208
Personal repository: git://git.code.sf.net/u/userid-2226215/review


Impacted area        Impact y/n
--------------------------------
 Docs                    n
 Build system            n
 RPM/packaging           n
 Configuration files     n
 Startup scripts         n
 SAF services            n
 OpenSAF services        n
 Core libraries          y
 Samples                 n
 Tests                   n
 Other                   n


Comments (indicate scope for each "y" above):
-

revision 4825d97b7e9565daae7b36aaba7a7c8717ff627c
Author: Gary Lee 
Date:   Tue, 26 Mar 2019 13:08:16 +1100

mbc: prevent infinite peer_up message loop [#3021]

If the active and standby SCs are split into network partitions, it is
possible a RED_UP never arrives even though we have already
received MBC PEER_UP. The service using MBC will then get stuck
in an infinite loop and probably fail health checks.

To cater for 'normal' race conditions between MDS topology and data
messages, allow only up to 255 loops. If this is exceeded, the msg
will be discarded.



Complete diffstat:
--
 src/mbc/mbcsv_evt_msg.h |  2 ++
 src/mbc/mbcsv_peer.c| 10 ++
 2 files changed, 12 insertions(+)


Testing Commands:
-
Ran test that splits SCs and reproduces the reported issue

Testing, Expected Results:
--
No more amfd coredumps due to watchdog timeouts

Conditions of Submission:
-
Ack from reviewer

Arch        Built     Started    Linux distro
---------------------------------------------
mips          n          n
mips64        n          n
x86           n          n
x86_64        y          y
powerpc       n          n
powerpc64     n          n



[devel] [PATCH 1/1] mbc: prevent infinite peer_up message loop [#3021]

2019-03-25 Thread Gary Lee

If the active and standby SCs are split into network partitions, it is
possible a RED_UP never arrives even though we have already
received MBC PEER_UP. The service using MBC will then get stuck
in an infinite loop and probably fail health checks.

To cater for 'normal' race conditions between MDS topology and data
messages, allow only up to 255 loops. If this is exceeded, the msg
will be discarded.
---
 src/mbc/mbcsv_evt_msg.h |  2 ++
 src/mbc/mbcsv_peer.c| 10 ++
 2 files changed, 12 insertions(+)

diff --git a/src/mbc/mbcsv_evt_msg.h b/src/mbc/mbcsv_evt_msg.h
index f11a553..9eef747 100644
--- a/src/mbc/mbcsv_evt_msg.h
+++ b/src/mbc/mbcsv_evt_msg.h
@@ -197,6 +197,8 @@ typedef struct mbcsv_evt {
 MBCSV_EVT_MDS_SUBSCR_INFO mds_sub_evt;
   } info;
 
+  uint32_t hops;
+
 } MBCSV_EVT;
 
 
/***
diff --git a/src/mbc/mbcsv_peer.c b/src/mbc/mbcsv_peer.c
index b45904f..1d4b257 100644
--- a/src/mbc/mbcsv_peer.c
+++ b/src/mbc/mbcsv_peer.c
@@ -826,6 +826,15 @@ uint32_t mbcsv_process_peer_up_info(MBCSV_EVT *msg, 
CKPT_INST *ckpt,
memcpy(evt, msg, sizeof(MBCSV_EVT));
 
TRACE_4("Still RED_UP event not arrived of the peer");
+   if (evt->hops < 255) {
+   ++evt->hops;
+   } else {
+   LOG_WA("RED_UP missing, discarding peer up");
+   m_NCS_UNLOCK(&mbcsv_cb.peer_list_lock,
+   NCS_LOCK_WRITE);
+   m_MMGR_FREE_MBCSV_EVT(evt);
+   return NCSCC_RC_FAILURE;
+   }
 
/* Again post the event, till RED_UP event arrives */
if (NCSCC_RC_SUCCESS !=
@@ -833,6 +842,7 @@ uint32_t mbcsv_process_peer_up_info(MBCSV_EVT *msg, 
CKPT_INST *ckpt,
TRACE_LEAVE2("ipc send failed");
m_NCS_UNLOCK(&mbcsv_cb.peer_list_lock,
 NCS_LOCK_WRITE);
+   m_MMGR_FREE_MBCSV_EVT(evt);
return NCSCC_RC_FAILURE;
}
 
-- 
2.7.4
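
The bounded re-post described in the commit message can be illustrated with a small standalone model. This is a sketch under assumptions, not the MBCSv code: `Event` stands in for `MBCSV_EVT`, a `std::deque` stands in for the mailbox, and `RepostOrDiscard` is a hypothetical name. Only the `hops` counter and the limit of 255 come from the patch.

```cpp
#include <cstdint>
#include <deque>

// Hypothetical stand-in for MBCSV_EVT; only the retry counter matters here.
struct Event {
  uint32_t hops = 0;
};

constexpr uint32_t kMaxHops = 255;  // limit taken from the patch

// Re-queues the event while RED_UP is still missing. Returns false once the
// event has already been re-posted kMaxHops times, in which case it is dropped.
bool RepostOrDiscard(Event evt, std::deque<Event>& mailbox) {
  if (evt.hops >= kMaxHops) {
    return false;  // give up: discard instead of looping forever
  }
  ++evt.hops;
  mailbox.push_back(evt);
  return true;
}
```

Because the counter travels with the event, the limit applies per PEER_UP message rather than globally, so an unrelated peer arriving later starts again from zero.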




[devel] [PATCH 0/1] Review Request for osaf: ensure an error is returned if takeover_request fails [#3023]

2019-03-25 Thread Gary Lee

Summary: osaf: ensure an error is returned if takeover_request fails [#3023]
Review request for Ticket(s): 3023
Peer Reviewer(s): Hans, Minh
Pull request to: *** LIST THE PERSON WITH PUSH ACCESS HERE ***
Affected branch(es): develop
Development branch: ticket-3023
Base revision: 819801c5414f73bfbdb3f4101958981ae1d29bb3
Personal repository: git://git.code.sf.net/u/userid-2226215/review


Impacted area        Impact y/n
--------------------------------
 Docs                    n
 Build system            n
 RPM/packaging           n
 Configuration files     n
 Startup scripts         n
 SAF services            n
 OpenSAF services        n
 Core libraries          y
 Samples                 n
 Tests                   n
 Other                   n


Comments (indicate scope for each "y" above):
-

revision 7034e7149d0cd4e74078287c516fc33fad21076f
Author: Gary Lee 
Date:   Tue, 26 Mar 2019 10:51:52 +1100

osaf: ensure an error is returned if takeover_request fails [#3023]

if we cannot read the result of a takeover_request, ensure we
return an error



Complete diffstat:
--
 src/osaf/consensus/consensus.cc | 2 ++
 1 file changed, 2 insertions(+)


Testing Commands:
-
ran regression tests

Testing, Expected Results:
--
OK

Conditions of Submission:
-
ack from any reviewer

Arch        Built     Started    Linux distro
---------------------------------------------
mips          n          n
mips64        n          n
x86           n          n
x86_64        y          y
powerpc       n          n
powerpc64     n          n



[devel] [PATCH 1/1] osaf: ensure an error is returned if takeover_request fails [#3023]

2019-03-25 Thread Gary Lee

if we cannot read the result of a takeover_request, ensure we
return an error
---
 src/osaf/consensus/consensus.cc | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/src/osaf/consensus/consensus.cc b/src/osaf/consensus/consensus.cc
index cf307b3..480f7d2 100644
--- a/src/osaf/consensus/consensus.cc
+++ b/src/osaf/consensus/consensus.cc
@@ -433,6 +433,8 @@ SaAisErrorT Consensus::CreateTakeoverRequest(const 
std::string& current_owner,
  return rc;
   }
 
+  // in case takeover request cannot be read
+  rc = SA_AIS_ERR_FAILED_OPERATION;
   // wait up to max_takeover_retry seconds for request to be answered
   retries = 0;
   while (retries < max_takeover_retry_) {
-- 
2.7.4
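
The idea behind the two added lines is a pessimistic default: the return code is preset to an error before the polling loop, so it can only become a success when a read actually succeeds. A minimal sketch follows, assuming a hypothetical `read_result` callback in place of the real key-value store read and a simplified `Status` enum in place of `SaAisErrorT`. The one-second wait between retries is omitted.

```cpp
#include <functional>
#include <string>

enum class Status { kOk, kFailedOperation };

// Hypothetical sketch: `read_result` returns true and fills `out` when the
// answer to the takeover request could be read.
Status WaitForAnswer(const std::function<bool(std::string&)>& read_result,
                     int max_retries) {
  // Pessimistic default: if no attempt manages to read the request, the
  // caller sees an error instead of a stale success code from an earlier step.
  Status rc = Status::kFailedOperation;
  for (int retries = 0; retries < max_retries; ++retries) {
    std::string answer;
    if (read_result(answer)) {
      rc = Status::kOk;
      break;
    }
  }
  return rc;
}
```

Note that the error is also returned when the loop body never runs, for example with `max_retries` set to 0.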




[devel] [PATCH 1/1] osaf: improve response time in etcd3.plugin [#3016]

2019-03-11 Thread Gary Lee

If the initial call to watch the takeover request in etcd3.plugin
is made when etcd has already been shut down (for example,
when etcd is running locally and the node is being shut down),
the plugin should return 0 with a fake takeover request to ensure
rded shuts down promptly. Otherwise, it will keep calling
watch, delaying node shutdown.
---
 src/osaf/consensus/plugins/etcd3.plugin | 11 +--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/src/osaf/consensus/plugins/etcd3.plugin 
b/src/osaf/consensus/plugins/etcd3.plugin
index acccd98..d926885 100644
--- a/src/osaf/consensus/plugins/etcd3.plugin
+++ b/src/osaf/consensus/plugins/etcd3.plugin
@@ -357,9 +357,16 @@ watch() {
 return 0
   fi
 done
+  else
+# etcd down?
+if [ "$watch_key" == "$takeover_request" ]; then
+  hostname=`cat $node_name_file`
+  echo "$hostname SC-0 1000 UNDEFINED"
+  return 0
+else
+  return 1
+fi
   fi
-
-  return 1
 }
 
 # argument parsing
-- 
2.7.4




[devel] [PATCH 0/1] Review Request for osaf: improve response time in etcd3.plugin [#3016]

2019-03-11 Thread Gary Lee

Summary: osaf: improve response time in etcd3.plugin [#3016]
Review request for Ticket(s): 3016
Peer Reviewer(s): Thuan
Pull request to: *** LIST THE PERSON WITH PUSH ACCESS HERE ***
Affected branch(es): develop
Development branch: ticket-3016
Base revision: 0a3f48cfaf9f443c405cfd7122904c5cbe607226
Personal repository: git://git.code.sf.net/u/userid-2226215/review


Impacted area        Impact y/n
--------------------------------
 Docs                    n
 Build system            n
 RPM/packaging           n
 Configuration files     n
 Startup scripts         n
 SAF services            n
 OpenSAF services        n
 Core libraries          n
 Samples                 y
 Tests                   n
 Other                   n


Comments (indicate scope for each "y" above):
-

revision ce0af7444b489620bc3f1a5ba5d876f563167b00
Author: Gary Lee 
Date:   Tue, 12 Mar 2019 11:20:35 +1100

osaf: improve response time in etcd3.plugin [#3016]

If the initial call to watch the takeover request in etcd3.plugin
is made when etcd has already been shut down (for example,
when etcd is running locally and the node is being shut down),
the plugin should return 0 with a fake takeover request to ensure
rded shuts down promptly. Otherwise, it will keep calling
watch, delaying node shutdown.



Complete diffstat:
--
 src/osaf/consensus/plugins/etcd3.plugin | 11 +--
 1 file changed, 9 insertions(+), 2 deletions(-)


Testing Commands:
-
*** LIST THE COMMAND LINE TOOLS/STEPS TO TEST YOUR CHANGES ***


Testing, Expected Results:
--
*** PASTE COMMAND OUTPUTS / TEST RESULTS ***


Conditions of Submission:
-
*** HOW MANY DAYS BEFORE PUSHING, CONSENSUS ETC ***


Arch        Built     Started    Linux distro
---------------------------------------------
mips          n          n
mips64        n          n
x86           n          n
x86_64        y          y
powerpc       n          n
powerpc64     n          n



Re: [devel] [PATCH 1/1] dtm: Fix dtm close socket due to duplication of adding node IP info [#2984]

2019-03-06 Thread Gary Lee


Hi Canh

One minor comment, KEY_TYPES should probably be called KeyTypes. Also, 
can you make it an enum class, rather than plain enum?
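
For illustration, the suggestion could look roughly like the sketch below. The enumerator values are copied from the patch; the enumerator names, the `uint8_t` underlying type and the `ToIndex` helper are assumptions made for this example only.

```cpp
#include <cstdint>
#include <type_traits>

// Sketch of the suggestion: a scoped enum with a CamelCase type name.
enum class KeyTypes : uint8_t {
  kDtmNodeIdKeyType = 0,
  kDtmNodeIpKeyType = 2,
};

// A scoped enum does not convert implicitly to int, so any code that uses the
// value as an index needs an explicit cast.
constexpr int ToIndex(KeyTypes type) { return static_cast<int>(type); }

static_assert(!std::is_convertible<KeyTypes, int>::value,
              "scoped enums must be cast explicitly");
```

The benefit is that callers can no longer pass a bare `0` or `2` where a key type is expected, which is what the old `int i` parameters allowed.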


Thanks

Gary

On 7/3/19 12:53 am, Hans Nordebäck wrote:

Hi Canh,

ack, review only. I think it would be good to separate the re-factoring
part in a separate ticket though.

/BR Hans

On 12/18/18 08:25, Canh Van Truong wrote:

During cluster start, one node (node 1) broadcasts an up msg to the other nodes.
The remote node (node 2) gets this msg and connects to node 1 (connect()).
Similarly, node 1 connects to node 2 after node 2 broadcasts its up msg.
Besides calling connect() to node 1, node 2 also adds the IP and ID info of
node 1 to its database.
But before that, node 2 may also accept the connection that comes from node 1.
The accept also adds the node ID of node 1. So the node ID info of node 1 is
added twice to the database in node 2. This causes the socket connection to be
closed and the node to restart.

The patch changes the connection processing to retrieve the node from the
database by node IP instead of node ID. This rejects the duplicate connection
establishment between the 2 nodes and also the duplicate addition of the node
IP to the database.
---
   src/dtm/dtmnd/dtm.h   | 11 --
   src/dtm/dtmnd/dtm_inter_trans.cc  |  3 +-
   src/dtm/dtmnd/dtm_node.cc |  2 +-
   src/dtm/dtmnd/dtm_node_db.cc  | 79 
---
   src/dtm/dtmnd/dtm_node_sockets.cc | 20 ++
   5 files changed, 72 insertions(+), 43 deletions(-)

diff --git a/src/dtm/dtmnd/dtm.h b/src/dtm/dtmnd/dtm.h
index 28c811e65..a06b8f503 100644
--- a/src/dtm/dtmnd/dtm.h
+++ b/src/dtm/dtmnd/dtm.h
@@ -45,6 +45,11 @@ typedef enum {
 DTM_MBX_MSG_TYPE = 5,
   } MBX_POST_TYPES;
   
+typedef enum {

+  DTM_NODE_ID_KEY_TYPE = 0,
+  DTM_NODE_IP_KEY_TYPE = 2,
+} KEY_TYPES;
+
   typedef struct dtm_rcv_msg_elem {
 void *next;
 MBX_POST_TYPES type;
@@ -99,10 +104,10 @@ typedef struct dtm_snd_msg_elem {
   
   extern void node_discovery_process(void *arg);

   extern uint32_t dtm_cb_init(DTM_INTERNODE_CB *dtms_cb);
-extern DTM_NODE_DB *dtm_node_get_by_id(uint32_t nodeid);
+extern DTM_NODE_DB *dtm_node_get(uint8_t *key, KEY_TYPES type);
   extern DTM_NODE_DB *dtm_node_getnext_by_id(uint32_t node_id);
-extern uint32_t dtm_node_add(DTM_NODE_DB *node, int i);
-extern uint32_t dtm_node_delete(DTM_NODE_DB *nnode, int i);
+extern uint32_t dtm_node_add(DTM_NODE_DB *node, KEY_TYPES type);
+extern uint32_t dtm_node_delete(DTM_NODE_DB *nnode, KEY_TYPES type);
   extern DTM_NODE_DB *dtm_node_new(const DTM_NODE_DB *new_node);
   extern void dtm_print_config(DTM_INTERNODE_CB *config);
   extern int dtm_read_config(DTM_INTERNODE_CB *config,
diff --git a/src/dtm/dtmnd/dtm_inter_trans.cc b/src/dtm/dtmnd/dtm_inter_trans.cc
index 9d8335466..9b4194614 100644
--- a/src/dtm/dtmnd/dtm_inter_trans.cc
+++ b/src/dtm/dtmnd/dtm_inter_trans.cc
@@ -235,9 +235,10 @@ static uint32_t dtm_internode_snd_msg_common(DTM_NODE_DB 
*node, uint8_t *buffer,
   uint32_t dtm_internode_snd_msg_to_node(uint8_t *buffer, uint16_t len,
  NODE_ID node_id) {
 DTM_NODE_DB *node = nullptr;
+  uint8_t *key = reinterpret_cast<uint8_t *>(&node_id);
   
 TRACE_ENTER();

-  node = dtm_node_get_by_id(node_id);
+  node = dtm_node_get(key, DTM_NODE_ID_KEY_TYPE);
   
 if (nullptr != node) {

   if (NCSCC_RC_SUCCESS != dtm_internode_snd_msg_common(node, buffer, len)) 
{
diff --git a/src/dtm/dtmnd/dtm_node.cc b/src/dtm/dtmnd/dtm_node.cc
index de2f94738..72506f262 100644
--- a/src/dtm/dtmnd/dtm_node.cc
+++ b/src/dtm/dtmnd/dtm_node.cc
@@ -125,7 +125,7 @@ uint32_t dtm_process_node_info(DTM_INTERNODE_CB *dtms_cb, 
DTM_NODE_DB *node,
 memcpy(node->node_name, data, nodename_len);
 node->node_name[nodename_len] = '\0';
 node->comm_status = true;
-  if (dtm_node_add(node, 0) != NCSCC_RC_SUCCESS) {
+  if (dtm_node_add(node, DTM_NODE_ID_KEY_TYPE) != NCSCC_RC_SUCCESS) {
   LOG_ER(
   "DTM:  A node already exists in the cluster with similar "
   "configuration (possible duplicate IP address and/or node id), please 
"
diff --git a/src/dtm/dtmnd/dtm_node_db.cc b/src/dtm/dtmnd/dtm_node_db.cc
index 1c9da4dac..1038f0918 100644
--- a/src/dtm/dtmnd/dtm_node_db.cc
+++ b/src/dtm/dtmnd/dtm_node_db.cc
@@ -123,24 +123,49 @@ uint32_t dtm_cb_init(DTM_INTERNODE_CB *dtms_cb) {
   }
   
   /**

- * Retrieve node from node db by nodeid
+ * Retrieve node from node db
*
- * @param nodeid
+ * @param key
+ * @param i
*
- * @return NCSCC_RC_SUCCESS
- * @return NCSCC_RC_FAILURE
+ * @return node
*
*/
-DTM_NODE_DB *dtm_node_get_by_id(uint32_t nodeid) {
+DTM_NODE_DB *dtm_node_get(uint8_t *key, KEY_TYPES type) {
 TRACE_ENTER();
 DTM_INTERNODE_CB *dtms_cb = dtms_gl_cb;
+  DTM_NODE_DB *node = nullptr;
   
-  DTM_NODE_DB *node = reinterpret_cast<DTM_NODE_DB *>(ncs_patricia_tree_get(
-  &dtms_cb->nodeid_tree, reinterpret_cast<uint8_t *>(&nodeid)));
-  if (node !=

Re: [devel] [PATCH 1/1] imm: fix racing in sending discard-node during network split [#3012]

2019-03-03 Thread Gary Lee

Hi Vu

Ack (review only)

Thanks

On 25/2/19, 6:30 pm, "Vu Minh Nguyen"  wrote:

When the cluster is split into two partitions but one node, such as PL-3,
stays connected to both partitions, the IMMND on PL-3 gets discard-node
messages both from the active IMMD on partition #1 and from the standby
IMMD on partition #2.

That race later caused the IMMND on PL-3 to crash due to the mismatch
found at finalize-sync.

This patch makes a minor change in the standby IMMD: rather than sending the
discard-node message even in the standby role, it puts the message in a queue
and only broadcasts it when the standby is assigned the active role.
---
 src/imm/immd/immd_proc.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/imm/immd/immd_proc.c b/src/imm/immd/immd_proc.c
index c16232d2d..69e23f2d3 100644
--- a/src/imm/immd/immd_proc.c
+++ b/src/imm/immd/immd_proc.c
@@ -778,7 +778,7 @@ uint32_t immd_process_immnd_down(IMMD_CB *cb, 
IMMD_IMMND_INFO_NODE *immnd_info,
}
}
 
-   if (active || !cb->immd_remote_up) {
+   if (active) {
/*
 ** HAFE - Let IMMND subscribe for IMMND up/down events instead?
 ** ABT - Not for now. IMMND up/down are only subscribed by
-- 
2.19.2
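
The behaviour described in the commit message (hold the discard-node message while standby, broadcast it on promotion) can be modelled as below. This is a hypothetical sketch of that description only: the class and method names are illustrative and are not taken from immd, and the real queueing lives elsewhere in the IMMD code.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical model of the behaviour described in the commit message.
class DiscardNodeSender {
 public:
  explicit DiscardNodeSender(bool active) : active_(active) {}

  // Called when an IMMND goes down.
  void OnImmndDown(const std::string& node) {
    if (active_) {
      broadcast_.push_back(node);  // active: send discard-node right away
    } else {
      pending_.push_back(node);    // standby: hold the message back
    }
  }

  // Called when the standby is assigned the active role.
  void OnPromotedToActive() {
    active_ = true;
    for (auto& node : pending_) broadcast_.push_back(std::move(node));
    pending_.clear();
  }

  // Nodes for which discard-node has been broadcast so far.
  const std::vector<std::string>& broadcast() const { return broadcast_; }

 private:
  bool active_;
  std::vector<std::string> pending_;
  std::vector<std::string> broadcast_;
};
```

In this model a node connected to both partitions only ever receives discard-node from whichever director is currently active, which is the property the patch aims for.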







Re: [devel] [PATCH 2/2] rded: do not send SUCCESS to main thread [#3008]

2019-02-20 Thread Gary Lee


Hi Hans

Without the return statement, RDE could potentially proceed with setting
itself to active, etc.

We didn't notice this because opensaf_reboot() has this, but we're no
longer calling that.


    if (use_fallback) {
        /* Wait for the alarm signal we set up earlier. */
        for (;;)
            pause();
    }

Probably a better fix is to add something similar to opensaf_quick_reboot().

Thanks

Gary

On 20/2/19 11:54 pm, Hans Nordebäck wrote:

Hi Gary,

a question, why was the return's added? /BR HansN

On 2/19/19 05:10, Gary Lee wrote:

do not send RDE_MSG_ACTIVE_PROMOTION_SUCCESS to
main thread if lock cannot be obtained
---
   src/rde/rded/role.cc | 2 ++
   1 file changed, 2 insertions(+)

diff --git a/src/rde/rded/role.cc b/src/rde/rded/role.cc
index 06e93c6..3effc25 100644
--- a/src/rde/rded/role.cc
+++ b/src/rde/rded/role.cc
@@ -114,6 +114,7 @@ void Role::PromoteNode(const uint64_t cluster_size,
   LOG_ER("Unable to set active controller in consensus service");
   opensaf_quick_reboot("Unable to set active controller "
   "in consensus service");
+return;
 }
   
 RDE_CONTROL_BLOCK* cb = rde_get_control_block();

@@ -135,6 +136,7 @@ void Role::PromoteNode(const uint64_t cluster_size,
   LOG_ER("Unable to set active controller in consensus service");
   opensaf_quick_reboot("Unable to set active controller in "
   "consensus service");
+return;
 }
 std::this_thread::sleep_for(std::chrono::seconds(1));
   }
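
Why the added `return` matters can be shown with a small model of the control flow. This is a sketch under assumptions: `PromoteNode` here is a free function with a hypothetical signature, `request_reboot` stands in for `opensaf_quick_reboot()` (which, per the discussion above, can return to the caller), and the boolean result stands in for sending `RDE_MSG_ACTIVE_PROMOTION_SUCCESS`.

```cpp
#include <functional>

// Hypothetical model of the control flow in Role::PromoteNode().
// Returns true when the success message would be sent to the main thread.
bool PromoteNode(bool lock_obtained,
                 const std::function<void()>& request_reboot) {
  if (!lock_obtained) {
    // The reboot request may return before the node actually goes down.
    request_reboot();
    return false;  // without this, execution falls through to the success path
  }
  return true;  // stands in for sending RDE_MSG_ACTIVE_PROMOTION_SUCCESS
}
```

An alternative, as suggested in the reply, would be to make the reboot helper itself block until the node goes down, so that callers cannot forget the early return.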




[devel] [PATCH 0/2] Review Request for fmd: improve failover response time [#3008]

2019-02-18 Thread Gary Lee

Summary: fmd: improve failover response time V2 [#3008]
Review request for Ticket(s): 3008
Peer Reviewer(s): Hans, Minh
Pull request to: *** LIST THE PERSON WITH PUSH ACCESS HERE ***
Affected branch(es): develop
Development branch: ticket-3008
Base revision: 5766361568498f8a496d87d8daafe9bffbd75ed9
Personal repository: git://git.code.sf.net/u/userid-2226215/review


Impacted area           Impact y/n

 Docs                    n
 Build system            n
 RPM/packaging           n
 Configuration files     n
 Startup scripts         n
 SAF services            y
 OpenSAF services        n
 Core libraries          n
 Samples                 n
 Tests                   n
 Other                   n


Comments (indicate scope for each "y" above):
-

revision 8ccffc2cd9cd117578227e9cd49421e5c578fec6
Author: Gary Lee 
Date:   Tue, 19 Feb 2019 14:57:53 +1100

rded: do not send SUCCESS to main thread [#3008]

do not send RDE_MSG_ACTIVE_PROMOTION_SUCCESS to
main thread if lock cannot be obtained



revision 28e17d107f4a079155e03d9f875a3c0262ea19f5
Author: Gary Lee 
Date:   Tue, 19 Feb 2019 14:57:53 +1100

fmd: improve failover response time [#3008]

Improve failover response time if split brain prevention is enabled
but FMS_TAKEOVER_PRIORITISE_PARTITION_SIZE is set to 0.

Also, return immediately if node promotion fails to avoid
sending active role to RDA.
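
The response time improvement described above comes down to skipping a fixed wait when partition size is not used in the takeover decision. A minimal sketch of that decision follows. PromotionDelay is a hypothetical helper introduced here for illustration only; the 4 second value is the one used in the patch.

```cpp
#include <chrono>

// Hypothetical helper, not part of OpenSAF: how long to wait for topology
// events before attempting to promote this node.
std::chrono::seconds PromotionDelay(bool consensus_enabled,
                                    bool prioritise_partition_size) {
  // The wait only helps when cluster_size feeds into the takeover
  // decision, so it is skipped otherwise.
  if (consensus_enabled && prioritise_partition_size) {
    return std::chrono::seconds(4);
  }
  return std::chrono::seconds(0);
}
```

With split brain prevention enabled and partition size prioritisation off, the helper returns zero, so promotion can start immediately.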



Complete diffstat:
--
 src/fm/fmd/fm_rda.cc | 14 +++++++++-----
 src/rde/rded/role.cc |  2 ++
 2 files changed, 11 insertions(+), 5 deletions(-)


Testing Commands:
-
*** LIST THE COMMAND LINE TOOLS/STEPS TO TEST YOUR CHANGES ***


Testing, Expected Results:
--
*** PASTE COMMAND OUTPUTS / TEST RESULTS ***


Conditions of Submission:
-
*** HOW MANY DAYS BEFORE PUSHING, CONSENSUS ETC ***


Arch        Built   Started   Linux distro
---
mips        n       n
mips64      n       n
x86         n       n
x86_64      y       y
powerpc     n       n
powerpc64   n       n


Reviewer Checklist:
---
[Submitters: make sure that your review doesn't trigger any checkmarks!]


Your checkin has not passed review because (see checked entries):

___ Your RR template is generally incomplete; it has too many blank entries
that need proper data filled in.

___ You have failed to nominate the proper persons for review and push.

___ Your patches do not have proper short+long header

___ You have grammar/spelling in your header that is unacceptable.

___ You have exceeded a sensible line length in your headers/comments/text.

___ You have failed to put in a proper Trac Ticket # into your commits.

___ You have incorrectly put/left internal data in your comments/files
(i.e. internal bug tracking tool IDs, product names etc)

___ You have not given any evidence of testing beyond basic build tests.
Demonstrate some level of runtime or other sanity testing.

___ You have ^M present in some of your files. These have to be removed.

___ You have needlessly changed whitespace or added whitespace crimes
like trailing spaces, or spaces before tabs.

___ You have mixed real technical changes with whitespace and other
cosmetic code cleanup changes. These have to be separate commits.

___ You need to refactor your submission into logical chunks; there is
too much content in a single commit.

___ You have extraneous garbage in your review (merge commits etc)

___ You have giant attachments which should never have been sent;
Instead you should place your content in a public tree to be pulled.

___ You have too many commits attached to an e-mail; resend as threaded
commits, or place in a public tree for a pull.

___ You have resent this content multiple times without a clear indication
of what has changed between each re-send.

___ You have failed to adequately and individually address all of the
comments and change requests that were proposed in the initial review.

___ You have a misconfigured ~/.gitconfig file (i.e. user.name, user.email etc)

___ Your computer has a badly configured date and time, confusing the
threaded patch review.

___ Your changes affect IPC mechanism, and you don't present any results
for in-service upgradability test.

___ Your changes affect the user manual and documentation, but your patch
series does not contain the patch that updates the Doxygen manual.




[devel] [PATCH 2/2] rded: do not send SUCCESS to main thread [#3008]

2019-02-18 Thread Gary Lee

do not send RDE_MSG_ACTIVE_PROMOTION_SUCCESS to
main thread if lock cannot be obtained
---
 src/rde/rded/role.cc | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/src/rde/rded/role.cc b/src/rde/rded/role.cc
index 06e93c6..3effc25 100644
--- a/src/rde/rded/role.cc
+++ b/src/rde/rded/role.cc
@@ -114,6 +114,7 @@ void Role::PromoteNode(const uint64_t cluster_size,
 LOG_ER("Unable to set active controller in consensus service");
 opensaf_quick_reboot("Unable to set active controller "
 "in consensus service");
+return;
   }
 
   RDE_CONTROL_BLOCK* cb = rde_get_control_block();
@@ -135,6 +136,7 @@ void Role::PromoteNode(const uint64_t cluster_size,
 LOG_ER("Unable to set active controller in consensus service");
 opensaf_quick_reboot("Unable to set active controller in "
 "consensus service");
+return;
   }
   std::this_thread::sleep_for(std::chrono::seconds(1));
 }
-- 
2.7.4




[devel] [PATCH 1/2] fmd: improve failover response time [#3008]

2019-02-18 Thread Gary Lee

Improve failover response time if split brain prevention is enabled
but FMS_TAKEOVER_PRIORITISE_PARTITION_SIZE is set to 0.

Also, return immediately if node promotion fails to avoid
sending active role to RDA.
---
 src/fm/fmd/fm_rda.cc | 14 +++++++++-----
 1 file changed, 9 insertions(+), 5 deletions(-)

diff --git a/src/fm/fmd/fm_rda.cc b/src/fm/fmd/fm_rda.cc
index 504757c..d3063ba 100644
--- a/src/fm/fmd/fm_rda.cc
+++ b/src/fm/fmd/fm_rda.cc
@@ -88,17 +88,20 @@ uint32_t fm_rda_set_role(FM_CB *fm_cb, PCS_RDA_ROLE role) {
 
   Consensus consensus_service;
   if (consensus_service.IsEnabled() == true) {
-// Allow topology events to be processed first. The MDS thread may
-// be processing MDS down events and updating cluster_size concurrently.
-// We need cluster_size to be as accurate as possible, without waiting
-// too long for node down events.
-std::this_thread::sleep_for(std::chrono::seconds(4));
+if (consensus_service.PrioritisePartitionSize() == true) {
+  // Allow topology events to be processed first. The MDS thread may
+  // be processing MDS down events and updating cluster_size concurrently.
+  // We need cluster_size to be as accurate as possible, without waiting
+  // too long for node down events.
+  std::this_thread::sleep_for(std::chrono::seconds(4));
+}
 
 rc = consensus_service.PromoteThisNode(true, fm_cb->cluster_size);
 if (rc != SA_AIS_OK && rc != SA_AIS_ERR_EXIST) {
   LOG_ER("Unable to set active controller in consensus service");
   opensaf_quick_reboot("Unable to set active controller "
   "in consensus service");
+  return NCSCC_RC_FAILURE;
 } else if (rc == SA_AIS_ERR_EXIST) {
   // @todo if we don't reboot, we don't seem to recover from this. Can we
   // improve?
@@ -107,6 +110,7 @@ uint32_t fm_rda_set_role(FM_CB *fm_cb, PCS_RDA_ROLE role) {
   "cluster?");
   opensaf_quick_reboot("A controller is already active. We were separated "
"from the cluster?");
+  return NCSCC_RC_FAILURE;
 }
   }
 
-- 
2.7.4



