Re: [devel] [PATCH 1/1] base: Check return code from unlink in nid_create_ipc [#2829]

2018-04-06 Thread Anders Widell
Ack with minor comment: instead of calling access(), you could maybe 
simply check for the ENOENT errno value from unlink()?
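
A minimal sketch of that suggestion, written as a standalone helper rather than as the actual nid_create_ipc() code. The function name RemoveIfPresent and the std::string error parameter are assumptions made for illustration; only unlink(), errno, ENOENT and strerror() come from the discussion above.

```cpp
#include <cerrno>
#include <cstring>
#include <string>
#include <unistd.h>

// Hypothetical helper: remove `path` if it exists.
// A missing file (ENOENT) is treated as success, so no prior access()
// check is needed. On any other failure the strerror() text is stored
// in *error and false is returned.
bool RemoveIfPresent(const char* path, std::string* error) {
  if (unlink(path) == 0 || errno == ENOENT) {
    return true;
  }
  *error = std::strerror(errno);
  return false;
}
```

Dropping the access() call also avoids the window between the existence check and the unlink() in which another process could create or remove the file.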


regards,

Anders Widell


On 04/05/2018 11:53 AM, Hans Nordeback wrote:

---
  src/nid/agent/nid_ipc.c | 9 ++++++++-
  1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/src/nid/agent/nid_ipc.c b/src/nid/agent/nid_ipc.c
index 4f43cd309..1a77fd8e2 100644
--- a/src/nid/agent/nid_ipc.c
+++ b/src/nid/agent/nid_ipc.c
@@ -28,6 +28,7 @@
  
  #include 

  #include 
+#include 
  #include "osaf/configmake.h"
  
  #include "nid/agent/nid_api.h"

@@ -56,7 +57,13 @@ uint32_t nid_create_ipc(char *strbuf)
mode_t mask;
  
  	/* Lets Remove any such file if it already exists */

-   unlink(NID_FIFO);
+   if (access(NID_FIFO, F_OK ) != -1 ) {
+   if (unlink(NID_FIFO) < 0) {
+   sprintf(strbuf, " FAILURE: Unable To Delete FIFO Error: %s\n",
+   strerror(errno));
+   return NCSCC_RC_FAILURE;
+   }
+   }
  
  	mask = umask(0);
  



--
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
___
Opensaf-devel mailing list
Opensaf-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-devel


Re: [devel] [PATCH 1/1] log: Handling of IMM OI BAD HANDLE in log server is incorrect [#2799]

2018-04-06 Thread Lennart Lund
Hi Canh,

I will do a small change based on the comment before pushing. Also see my 
comment [Lennart]

Thanks
Lennart

From: Canh Van Truong [mailto:canh.v.tru...@dektech.com.au]
Sent: 6 April 2018 08:29
To: Lennart Lund; Vu Minh Nguyen
Cc: opensaf-devel@lists.sourceforge.net
Subject: RE: [PATCH 1/1] log: Handling of IMM OI BAD HANDLE in log server is incorrect [#2799]


Hi Lennart,



Ack with one comment [Canh]



Thanks

Canh



-----Original Message-----
From: Lennart Lund [mailto:lennart.l...@ericsson.com]
Sent: Thursday, April 5, 2018 7:38 PM
To: vu.m.ngu...@dektech.com.au; canh.v.tru...@dektech.com.au
Cc: opensaf-devel@lists.sourceforge.net; Lennart Lund
Subject: [PATCH 1/1] log: Handling of IMM OI BAD HANDLE in log server is incorrect [#2799]



Recovery of the OI handle shall be started in all places where BAD HANDLE
can be returned. Creation of the OI must be done in a background thread.
It must be possible to stop an ongoing creation, e.g. if the server is
becoming standby.

---

src/log/Makefile.am  |   3 +

src/log/logd/lgs.h   |  24 ---

src/log/logd/lgs_amf.cc  |  25 +--

src/log/logd/lgs_cb.h|   3 -

src/log/logd/lgs_config.cc   |  28 ++-

src/log/logd/lgs_config.h|   5 +-

src/log/logd/lgs_evt.cc  |  56 +++---

src/log/logd/lgs_imm.cc  | 333 ---

src/log/logd/lgs_imm.h   |  53 +

src/log/logd/lgs_main.cc | 116 ---

src/log/logd/lgs_mbcsv_v2.cc |   7 +-

src/log/logd/lgs_mbcsv_v3.cc |   7 +-

src/log/logd/lgs_mbcsv_v5.cc |   3 +-

src/log/logd/lgs_oi_admin.cc | 464 +++

src/log/logd/lgs_oi_admin.h  | 108 ++

src/log/logd/lgs_recov.cc|   4 +-

src/log/logd/lgs_stream.cc   |  85 ++--

17 files changed, 857 insertions(+), 467 deletions(-)

create mode 100644 src/log/logd/lgs_imm.h

create mode 100644 src/log/logd/lgs_oi_admin.cc

create mode 100644 src/log/logd/lgs_oi_admin.h



diff --git a/src/log/Makefile.am b/src/log/Makefile.am

index 3d951eb5d..5d33d355b 100644

--- a/src/log/Makefile.am

+++ b/src/log/Makefile.am

@@ -79,6 +79,7 @@ noinst_HEADERS += \

   src/log/logd/lgs_file.h \

   src/log/logd/lgs_filehdl.h \

   src/log/logd/lgs_fmt.h \

+ src/log/logd/lgs_imm.h \

   src/log/logd/lgs_imm_gcfg.h \

   src/log/logd/lgs_mbcsv.h \

   src/log/logd/lgs_mbcsv_v1.h \

@@ -86,6 +87,7 @@ noinst_HEADERS += \

   src/log/logd/lgs_mbcsv_v3.h \

   src/log/logd/lgs_mbcsv_v5.h \

   src/log/logd/lgs_mbcsv_v6.h \

+ src/log/logd/lgs_oi_admin.h \

   src/log/logd/lgs_recov.h \

   src/log/logd/lgs_stream.h \

   src/log/logd/lgs_util.h \

@@ -139,6 +141,7 @@ bin_osaflogd_SOURCES = \

   src/log/logd/lgs_mbcsv_v5.cc \

   src/log/logd/lgs_mbcsv_v6.cc \

   src/log/logd/lgs_mds.cc \

+ src/log/logd/lgs_oi_admin.cc \

   src/log/logd/lgs_recov.cc \

   src/log/logd/lgs_stream.cc \

   src/log/logd/lgs_util.cc \

diff --git a/src/log/logd/lgs.h b/src/log/logd/lgs.h

index 18e6d9281..b1d773375 100644

--- a/src/log/logd/lgs.h

+++ b/src/log/logd/lgs.h

@@ -95,7 +95,6 @@ extern uint32_t mbox_msgs[NCS_IPC_PRIORITY_MAX];

extern bool mbox_full[NCS_IPC_PRIORITY_MAX];

extern uint32_t mbox_low[NCS_IPC_PRIORITY_MAX];

extern pthread_mutex_t lgs_mbox_init_mutex;

-extern pthread_mutex_t lgs_OI_init_mutex;



 extern uint32_t initialize_for_assignment(lgs_cb_t *cb, SaAmfHAStateT 
ha_state);



@@ -108,27 +107,4 @@ extern uint32_t lgs_mds_msg_send(lgs_cb_t *cb, lgsv_msg_t *msg, MDS_DEST *dest,

  MDS_SYNC_SND_CTXT *mds_ctxt,

  MDS_SEND_PRIORITY_TYPE prio);



-extern SaAisErrorT lgs_imm_create_configStream(lgs_cb_t *cb);

-extern void logRootDirectory_filemove(const std::string _logRootDirectory,

-  const std::string _logRootDirectory,

-  time_t *cur_time_in);

-extern void logDataGroupname_fileown(const char *new_logDataGroupname);

-

-extern void lgs_imm_impl_reinit_nonblocking(lgs_cb_t *cb);

-extern void lgs_imm_init_OI_handle(SaImmOiHandleT *immOiHandle,

-   SaSelectionObjectT *immSelectionObject);

-extern void lgs_imm_impl_set(SaImmOiHandleT *immOiHandle,

- SaSelectionObjectT *immSelectionObject);

-extern SaAisErrorT lgs_imm_init_configStreams(lgs_cb_t *cb);

-

-// Functions for recovery handling

-void lgs_cleanup_abandoned_streams();

-void lgs_delete_one_stream_object(const 

Re: [devel] [PATCH 1/1] log: Handling of IMM OI BAD HANDLE in log server is incorrect [#2799]

2018-04-06 Thread Canh Van Truong
Hi Lennart,

 

Ack with one comment [Canh]

 

Thanks

Canh

 

-----Original Message-----
From: Lennart Lund [mailto:lennart.l...@ericsson.com]
Sent: Thursday, April 5, 2018 7:38 PM
To: vu.m.ngu...@dektech.com.au; canh.v.tru...@dektech.com.au
Cc: opensaf-devel@lists.sourceforge.net; Lennart Lund
Subject: [PATCH 1/1] log: Handling of IMM OI BAD HANDLE in log server is incorrect [#2799]

 

Recovery of the OI handle shall be started in all places where BAD HANDLE
can be returned. Creation of the OI must be done in a background thread.
It must be possible to stop an ongoing creation, e.g. if the server is
becoming standby.
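
The stop requirement in the commit message can be sketched as a retry loop that checks a shared flag before each attempt. This is not the code in lgs_oi_admin.cc; the names CreateResult, CreateUntilStopped, try_create and the max_attempts limit are all assumptions made for illustration.

```cpp
#include <atomic>
#include <functional>

// Hypothetical outcome of a stoppable creation loop.
enum class CreateResult { kCreated, kStopped, kGaveUp };

// Calls `try_create` until it returns true, `stop` becomes true, or
// `max_attempts` attempts have been made. Meant to run in a background
// thread while another thread (e.g. the one handling a role change to
// standby) sets `stop`. A real implementation would normally sleep
// between attempts; that is left out to keep the sketch short.
CreateResult CreateUntilStopped(const std::function<bool()>& try_create,
                                const std::atomic<bool>& stop,
                                int max_attempts) {
  for (int attempt = 0; attempt < max_attempts; ++attempt) {
    if (stop.load()) {
      return CreateResult::kStopped;
    }
    if (try_create()) {
      return CreateResult::kCreated;
    }
  }
  return CreateResult::kGaveUp;
}
```

The flag is checked before every attempt, so a stop request takes effect after at most one more call to try_create.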

---

src/log/Makefile.am  |   3 +

src/log/logd/lgs.h   |  24 ---

src/log/logd/lgs_amf.cc  |  25 +--

src/log/logd/lgs_cb.h|   3 -

src/log/logd/lgs_config.cc   |  28 ++-

src/log/logd/lgs_config.h|   5 +-

src/log/logd/lgs_evt.cc  |  56 +++---

src/log/logd/lgs_imm.cc  | 333 ---

src/log/logd/lgs_imm.h   |  53 +

src/log/logd/lgs_main.cc | 116 ---

src/log/logd/lgs_mbcsv_v2.cc |   7 +-

src/log/logd/lgs_mbcsv_v3.cc |   7 +-

src/log/logd/lgs_mbcsv_v5.cc |   3 +-

src/log/logd/lgs_oi_admin.cc | 464 +++

src/log/logd/lgs_oi_admin.h  | 108 ++

src/log/logd/lgs_recov.cc|   4 +-

src/log/logd/lgs_stream.cc   |  85 ++--

17 files changed, 857 insertions(+), 467 deletions(-)

create mode 100644 src/log/logd/lgs_imm.h

create mode 100644 src/log/logd/lgs_oi_admin.cc

create mode 100644 src/log/logd/lgs_oi_admin.h

 

diff --git a/src/log/Makefile.am b/src/log/Makefile.am

index 3d951eb5d..5d33d355b 100644

--- a/src/log/Makefile.am

+++ b/src/log/Makefile.am

@@ -79,6 +79,7 @@ noinst_HEADERS += \

   src/log/logd/lgs_file.h \

   src/log/logd/lgs_filehdl.h \

   src/log/logd/lgs_fmt.h \

+ src/log/logd/lgs_imm.h \

   src/log/logd/lgs_imm_gcfg.h \

   src/log/logd/lgs_mbcsv.h \

   src/log/logd/lgs_mbcsv_v1.h \

@@ -86,6 +87,7 @@ noinst_HEADERS += \

   src/log/logd/lgs_mbcsv_v3.h \

   src/log/logd/lgs_mbcsv_v5.h \

   src/log/logd/lgs_mbcsv_v6.h \

+ src/log/logd/lgs_oi_admin.h \

   src/log/logd/lgs_recov.h \

   src/log/logd/lgs_stream.h \

   src/log/logd/lgs_util.h \

@@ -139,6 +141,7 @@ bin_osaflogd_SOURCES = \

   src/log/logd/lgs_mbcsv_v5.cc \

   src/log/logd/lgs_mbcsv_v6.cc \

   src/log/logd/lgs_mds.cc \

+ src/log/logd/lgs_oi_admin.cc \

   src/log/logd/lgs_recov.cc \

   src/log/logd/lgs_stream.cc \

   src/log/logd/lgs_util.cc \

diff --git a/src/log/logd/lgs.h b/src/log/logd/lgs.h

index 18e6d9281..b1d773375 100644

--- a/src/log/logd/lgs.h

+++ b/src/log/logd/lgs.h

@@ -95,7 +95,6 @@ extern uint32_t mbox_msgs[NCS_IPC_PRIORITY_MAX];

extern bool mbox_full[NCS_IPC_PRIORITY_MAX];

extern uint32_t mbox_low[NCS_IPC_PRIORITY_MAX];

extern pthread_mutex_t lgs_mbox_init_mutex;

-extern pthread_mutex_t lgs_OI_init_mutex;

 extern uint32_t initialize_for_assignment(lgs_cb_t *cb, SaAmfHAStateT
ha_state);

@@ -108,27 +107,4 @@ extern uint32_t lgs_mds_msg_send(lgs_cb_t *cb, lgsv_msg_t *msg, MDS_DEST *dest,

  MDS_SYNC_SND_CTXT *mds_ctxt,

  MDS_SEND_PRIORITY_TYPE prio);

-extern SaAisErrorT lgs_imm_create_configStream(lgs_cb_t *cb);

-extern void logRootDirectory_filemove(const std::string _logRootDirectory,

-  const std::string _logRootDirectory,

-  time_t *cur_time_in);

-extern void logDataGroupname_fileown(const char *new_logDataGroupname);

-

-extern void lgs_imm_impl_reinit_nonblocking(lgs_cb_t *cb);

-extern void lgs_imm_init_OI_handle(SaImmOiHandleT *immOiHandle,

-   SaSelectionObjectT *immSelectionObject);

-extern void lgs_imm_impl_set(SaImmOiHandleT *immOiHandle,

- SaSelectionObjectT *immSelectionObject);

-extern SaAisErrorT lgs_imm_init_configStreams(lgs_cb_t *cb);

-

-// Functions for recovery handling

-void lgs_cleanup_abandoned_streams();

-void lgs_delete_one_stream_object(const std::string _str);

-void lgs_search_stream_objects();

-SaUint32T *lgs_get_scAbsenceAllowed_attr(SaUint32T *attr_val);

-int lgs_get_streamobj_attr(SaImmAttrValuesT_2 ***attrib_out,

-   const std::string _name,

-   SaImmHandleT *immOmHandle);

-int lgs_free_streamobj_attr(SaImmHandleT immHandle);

-

#endif  // LOG_LOGD_LGS_H_

diff --git a/src/log/logd/lgs_amf.cc b/src/log/logd/lgs_amf.cc

index 6fa044ff2..89146 100644

--- a/src/log/logd/lgs_amf.cc

+++ b/src/log/logd/lgs_amf.cc

@@ -23,6 +23,7 @@

#include "osaf/immutil/immutil.h"


[devel] [PATCH 1/5] osaf: extend API to include a create key and an enhanced set key function [#2795]

2018-04-06 Thread Gary Lee
- add create_key function (fails if key already exists)
- add setkey_match_prev function (set value if previous value matches)
- add missing quotes
- add etcd3.plugin
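
The semantics of the two new operations can be illustrated with an in-memory stand-in. This sketch is not the plugin, which is a shell script calling etcdctl; the class name ToyStore and its method names are assumptions, and the return codes only mirror the "0 - success, 1 - already exists" convention documented for create_key in the patch.

```cpp
#include <map>
#include <string>

// Hypothetical in-memory key-value store used only to show the semantics.
class ToyStore {
 public:
  // Like create_key: returns 0 on success, 1 if the key already exists.
  int Create(const std::string& key, const std::string& value) {
    return data_.emplace(key, value).second ? 0 : 1;
  }

  // Like setkey_match_prev: sets `key` to `value` only if the stored value
  // equals `prev`. Returns 0 on success, 1 if the key is missing or the
  // stored value differs.
  int SetIfPrev(const std::string& key, const std::string& value,
                const std::string& prev) {
    auto it = data_.find(key);
    if (it == data_.end() || it->second != prev) {
      return 1;
    }
    it->second = value;
    return 0;
  }

 private:
  std::map<std::string, std::string> data_;
};
```

Both operations are conditional writes: a caller that loses a race gets a non-zero return value and the stored value is left unchanged.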
---
 src/osaf/consensus/plugins/etcd.plugin   |  86 +++-
 src/osaf/consensus/plugins/etcd3.plugin  | 355 +++
 src/osaf/consensus/plugins/sample.plugin |  67 +-
 3 files changed, 490 insertions(+), 18 deletions(-)
 create mode 100644 src/osaf/consensus/plugins/etcd3.plugin

diff --git a/src/osaf/consensus/plugins/etcd.plugin b/src/osaf/consensus/plugins/etcd.plugin
index 586059b32..6ed85ac92 100644
--- a/src/osaf/consensus/plugins/etcd.plugin
+++ b/src/osaf/consensus/plugins/etcd.plugin
@@ -29,7 +29,7 @@ readonly etcd_timeout="5s"
 #   0 - success,  is echoed to stdout
 #   non-zero - failure
 get() {
-  readonly key=$1
+  readonly key="$1"
 
   if value=$(etcdctl $etcd_options --timeout $etcd_timeout get 
"$directory$key" 2>&1)
   then
@@ -49,8 +49,8 @@ get() {
 #   0 - success
 #   non-zero - failure
 setkey() {
-  readonly key=$1
-  readonly value=$2
+  readonly key="$1"
+  readonly value="$2"
 
   if etcdctl $etcd_options --timeout $etcd_timeout set "$directory$key" \
 "$value" >/dev/null
@@ -61,6 +61,58 @@ setkey() {
   fi
 }
 
+# create
+#   create  and set to  in key-value store. Fails if the key
+#   already exists
+# params:
+#   $1 - 
+#   $2 - 
+# returns:
+#   0 - success
+#   1 - already exists
+#   2 or above - other failure
+create_key() {
+  readonly key="$1"
+  readonly value="$2"
+
+  if output=$(etcdctl $etcd_options --timeout $etcd_timeout mk "$directory$key" \
+"$value" 2>&1)
+  then
+return 0
+  else
+if echo "$output" | grep -q "already exists"
+then
+  return 1
+fi
+  fi
+
+  return 2
+}
+
+# set
+#   set  to  in key-value store, if the existing value matches
+#   
+# params:
+#   $1 - 
+#   $2 - 
+#   $3 - 
+# returns:
+#   0 - success
+#   non-zero - failure
+setkey_match_prev() {
+  readonly key="$1"
+  readonly value="$2"
+  readonly prev="$3"
+
+  if etcdctl $etcd_options --timeout $etcd_timeout set "$directory$key" \
+"$value" --swap-with-value "$prev" >/dev/null
+  then
+return 0
+  else
+return 1
+  fi
+}
+
 # erase
 #   erase  in key-value store
 # params:
@@ -69,7 +121,7 @@ setkey() {
 #   0 - success
 #   non-zero - failure
 erase() {
-  readonly key=$1
+  readonly key="$1"
 
   if etcdctl $etcd_options --timeout $etcd_timeout \
 rm "$directory$key" >/dev/null 2>&1
@@ -90,8 +142,8 @@ erase() {
 #   2 or above - other failure
 # NOTE: if lock is already acquired by , then timeout is extended
 lock() {
-  readonly owner=$1
-  readonly timeout=$2
+  readonly owner="$1"
+  readonly timeout="$2"
 
   if etcdctl $etcd_options --timeout $etcd_timeout \
 mk "$directory$keyname" "$owner" \
@@ -145,7 +197,7 @@ lock_owner() {
 #   2 or above - other failure
 #
 unlock() {
-  readonly owner=$1
+  readonly owner="$1"
   readonly forced=${2:-false}
 
   if [ "$forced" = false ]; then
@@ -185,7 +237,7 @@ unlock() {
 #   0 - success,  is echoed to stdout
 #   non-zero - failure
 watch() {
-  readonly key=$1
+  readonly key="$1"
 
   if value=$(etcdctl $etcd_options --timeout $etcd_timeout \
 watch "$directory$key" 2>&1)
@@ -216,6 +268,22 @@ case "$1" in
 setkey "$2" "$3"
 exit $?
 ;;
+  set_if_prev)
+if [ "$#" -ne 4 ]; then
+  echo "Usage: $0 set   "
+  exit 1
+fi
+setkey_match_prev "$2" "$3" "$4"
+exit $?
+;;
+  create)
+if [ "$#" -ne 3 ]; then
+  echo "Usage: $0 create  "
+  exit 1
+fi
+create_key "$2" "$3"
+exit $?
+;;
   erase)
 if [ "$#" -ne 2 ]; then
   echo "Usage: $0 erase "
@@ -269,7 +337,7 @@ case "$1" in
 exit $?
 ;;
   *)
-echo "Usage: $0 {get|set|erase|lock|unlock|lock_owner|watch|watch_lock}"
+echo "Usage: $0 {get|set|create|set_if_prev|erase|lock|unlock|lock_owner|watch|watch_lock}"
 ;;
 esac
 
diff --git a/src/osaf/consensus/plugins/etcd3.plugin b/src/osaf/consensus/plugins/etcd3.plugin
new file mode 100644
index 0..df05be540
--- /dev/null
+++ b/src/osaf/consensus/plugins/etcd3.plugin
@@ -0,0 +1,355 @@
+#!/usr/bin/env bash
+#  -*- OpenSAF  -*-
+#
+# (C) Copyright 2018 Ericsson AB 2018 - All Rights Reserved.
+#
+# This program is distributed in the hope that it will be useful, but
+# WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+# or FITNESS FOR A PARTICULAR PURPOSE. This file and program are licensed
+# under the GNU Lesser General Public License Version 2.1, February 1999.
+# The complete license can be accessed from the following location:
+# http://opensource.org/licenses/lgpl-license.php
+# See the Copying file included with the OpenSAF distribution for full
+# licensing terms.
+#
+# Please note: this API is subject to change and may be modified
+# in a future version of OpenSAF. Future API versions may not be
+# backward compatible. This plugin may need to be adapted.
+
+readonly 

[devel] [PATCH 5/5] rded: adapt to new Consensus API [#2795]

2018-04-06 Thread Gary Lee
- add 3 new internal messages:

RDE_MSG_NODE_UP
RDE_MSG_NODE_DOWN
RDE_MSG_TAKEOVER_REQUEST_CALLBACK

- subscribe to AMFND service up events to keep track of the number
  of cluster members

- listen for takeover requests in KV store
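
The member tracking described above can be sketched with a std::set keyed by node id, which is how the patch handles RDE_MSG_NODE_UP and RDE_MSG_NODE_DOWN. The class name ClusterMembers and the uint32_t id type are assumptions made for illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <set>

// Hypothetical wrapper around the set of node ids currently seen as up.
class ClusterMembers {
 public:
  // A repeated "up" event for the same node does not change the size.
  void NodeUp(uint32_t node_id) { members_.insert(node_id); }

  // A "down" event for an unknown node is a no-op.
  void NodeDown(uint32_t node_id) { members_.erase(node_id); }

  std::size_t size() const { return members_.size(); }

 private:
  std::set<uint32_t> members_;
};
```

A set rather than a plain counter keeps the count correct when up or down events are duplicated.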
---
 src/rde/rded/rde_cb.h| 12 ++--
 src/rde/rded/rde_main.cc | 72 ++--
 src/rde/rded/rde_mds.cc  | 39 ++
 src/rde/rded/rde_rda.cc  |  2 +-
 src/rde/rded/role.cc | 46 ++-
 src/rde/rded/role.h  |  2 +-
 6 files changed, 135 insertions(+), 38 deletions(-)

diff --git a/src/rde/rded/rde_cb.h b/src/rde/rded/rde_cb.h
index fc100849a..f5ad689c3 100644
--- a/src/rde/rded/rde_cb.h
+++ b/src/rde/rded/rde_cb.h
@@ -19,12 +19,13 @@
 #define RDE_RDED_RDE_CB_H_
 
 #include 
+#include 
 #include "base/osaf_utility.h"
 #include "mds/mds_papi.h"
 #include "rde/agent/rda_papi.h"
+#include "rde/common/rde_rda_common.h"
 #include "rde/rded/rde_amf.h"
 #include "rde/rded/rde_rda.h"
-#include "rde/common/rde_rda_common.h"
 
 /*
  **  RDE_CONTROL_BLOCK
@@ -39,7 +40,9 @@ struct RDE_CONTROL_BLOCK {
   bool task_terminate;
   RDE_RDA_CB rde_rda_cb;
   RDE_AMF_CB rde_amf_cb;
-  bool monitor_lock_thread_running;
+  bool monitor_lock_thread_running{false};
+  bool monitor_takeover_req_thread_running{false};
+  std::set cluster_members{};
 };
 
 enum RDE_MSG_TYPE {
@@ -47,7 +50,10 @@ enum RDE_MSG_TYPE {
   RDE_MSG_PEER_DOWN = 2,
   RDE_MSG_PEER_INFO_REQ = 3,
   RDE_MSG_PEER_INFO_RESP = 4,
-  RDE_MSG_NEW_ACTIVE_CALLBACK = 5
+  RDE_MSG_NEW_ACTIVE_CALLBACK = 5,
+  RDE_MSG_NODE_UP = 6,
+  RDE_MSG_NODE_DOWN = 7,
+  RDE_MSG_TAKEOVER_REQUEST_CALLBACK = 8
 };
 
 struct rde_peer_info {
diff --git a/src/rde/rded/rde_main.cc b/src/rde/rded/rde_main.cc
index 78e7256c1..a91f6896c 100644
--- a/src/rde/rded/rde_main.cc
+++ b/src/rde/rded/rde_main.cc
@@ -17,6 +17,7 @@
  */
 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -28,17 +29,16 @@
 #include 
 #include 
 #include 
-#include "osaf/consensus/consensus.h"
+#include "base/conf.h"
 #include "base/daemon.h"
 #include "base/logtrace.h"
+#include "base/ncs_main_papi.h"
 #include "base/osaf_poll.h"
 #include "mds/mds_papi.h"
-#include "base/ncs_main_papi.h"
 #include "nid/agent/nid_api.h"
-#include 
+#include "osaf/consensus/consensus.h"
 #include "rde/rded/rde_cb.h"
 #include "rde/rded/role.h"
-#include "base/conf.h"
 
 #define RDA_MAX_CLIENTS 32
 
@@ -47,13 +47,15 @@ enum { FD_TERM = 0, FD_AMF = 1, FD_MBX, FD_RDA_SERVER, FD_CLIENT_START };
 static void SendPeerInfoResp(MDS_DEST mds_dest);
 static void CheckForSplitBrain(const rde_msg *msg);
 
-const char *rde_msg_name[] = {
-"-",
-"RDE_MSG_PEER_UP(1)",
-"RDE_MSG_PEER_DOWN(2)",
-"RDE_MSG_PEER_INFO_REQ(3)",
-"RDE_MSG_PEER_INFO_RESP(4)",
-};
+const char *rde_msg_name[] = {"-",
+                              "RDE_MSG_PEER_UP(1)",
+                              "RDE_MSG_PEER_DOWN(2)",
+                              "RDE_MSG_PEER_INFO_REQ(3)",
+                              "RDE_MSG_PEER_INFO_RESP(4)",
+                              "RDE_MSG_NEW_ACTIVE_CALLBACK(5)",
+                              "RDE_MSG_NODE_UP(6)",
+                              "RDE_MSG_NODE_DOWN(7)",
+                              "RDE_MSG_TAKEOVER_REQUEST_CALLBACK(8)"};
 
 static RDE_CONTROL_BLOCK _rde_cb;
 static RDE_CONTROL_BLOCK *rde_cb = &_rde_cb;
@@ -130,11 +132,12 @@ static void handle_mbx_event() {
 if (my_node.compare(active_controller) != 0) {
   // we are meant to be active, but consensus service doesn't think so
   LOG_WA("Role does not match consensus service. New controller: %s",
-active_controller.c_str());
+ active_controller.c_str());
   if (consensus_service.IsRemoteFencingEnabled() == false) {
 LOG_ER("Probable split-brain. Rebooting this node");
+
 opensaf_reboot(0, nullptr,
-  "Split-brain detected by consensus service");
+   "Split-brain detected by consensus service");
   }
 }
 
@@ -144,6 +147,44 @@ static void handle_mbx_event() {
   }
   break;
 }
+case RDE_MSG_NODE_UP:
+  rde_cb->cluster_members.insert(msg->fr_node_id);
+  TRACE("cluster_size %zu", rde_cb->cluster_members.size());
+  break;
+case RDE_MSG_NODE_DOWN:
+  rde_cb->cluster_members.erase(msg->fr_node_id);
+  TRACE("cluster_size %zu", rde_cb->cluster_members.size());
+  break;
+case RDE_MSG_TAKEOVER_REQUEST_CALLBACK: {
+  rde_cb->monitor_takeover_req_thread_running = false;
+
+  if (role->role() == PCS_RDA_ACTIVE) {
+LOG_NO("Received takeover request. Our network size is %zu",
+   rde_cb->cluster_members.size());
+
+Consensus consensus_service;
+Consensus::TakeoverState state =
+consensus_service.HandleTakeoverRequest(
+rde_cb->cluster_members.size());
+
+if (state == 

[devel] [PATCH 4/5] fmd: adapt to new Consensus API [#2795]

2018-04-06 Thread Gary Lee
---
 src/fm/fmd/fm_main.cc | 26 +-
 src/fm/fmd/fm_mds.cc  |  2 ++
 src/fm/fmd/fm_rda.cc  | 15 +--
 3 files changed, 20 insertions(+), 23 deletions(-)

diff --git a/src/fm/fmd/fm_main.cc b/src/fm/fmd/fm_main.cc
index 73c9b9ccd..3371ec5e8 100644
--- a/src/fm/fmd/fm_main.cc
+++ b/src/fm/fmd/fm_main.cc
@@ -551,21 +551,12 @@ static void fm_mbx_msg_handler(FM_CB *fm_cb, FM_EVT *fm_mbx_evt) {
* trigerred quicker than the node_down event
* has been received.
*/
-  if (fm_cb->role == PCS_RDA_STANDBY) {
-const std::string current_active =
-consensus_service.CurrentActive();
-if (current_active.compare(osaf_extended_name_borrow(
-_cb->peer_clm_node_name)) == 0) {
-  // update consensus service, before fencing old active controller
-  consensus_service.DemoteCurrentActive();
-}
-  }
 
   if (fm_cb->use_remote_fencing) {
 if (fm_cb->peer_node_terminated == false) {
   // if peer_sc_up is true then
   // the node has come up already
-  if (fm_cb->peer_sc_up == false && fm_cb->immnd_down == true) {
+  if (consensus_service.IsEnabled() == false) {
 opensaf_reboot(fm_cb->peer_node_id,
(char *)fm_cb->peer_clm_node_name.value,
"Received Node Down for peer controller");
@@ -580,8 +571,7 @@ static void fm_mbx_msg_handler(FM_CB *fm_cb, FM_EVT *fm_mbx_evt) {
 fm_cb->mutex_.Lock();
 peer_node_name = fm_cb->peer_node_name;
 fm_cb->mutex_.Unlock();
-opensaf_reboot(fm_cb->peer_node_id,
-   peer_node_name.c_str(),
+opensaf_reboot(fm_cb->peer_node_id, peer_node_name.c_str(),
"Received Node Down for peer controller");
   }
   if (!((fm_cb->role == PCS_RDA_ACTIVE) &&
@@ -632,12 +622,6 @@ static void fm_mbx_msg_handler(FM_CB *fm_cb, FM_EVT *fm_mbx_evt) {
 }
 
 Consensus consensus_service;
-const std::string current_active = consensus_service.CurrentActive();
-if (current_active.compare(
-osaf_extended_name_borrow(_cb->peer_clm_node_name)) == 0) {
-  // update consensus service, before fencing old active controller
-  consensus_service.DemoteCurrentActive();
-}
 
 /* Now. Try resetting other blade */
 fm_cb->role = PCS_RDA_ACTIVE;
@@ -645,7 +629,8 @@ static void fm_mbx_msg_handler(FM_CB *fm_cb, FM_EVT *fm_mbx_evt) {
 LOG_NO("Reseting peer controller node id: %x",
unsigned(fm_cb->peer_node_id));
 if (fm_cb->use_remote_fencing) {
-  if (fm_cb->peer_node_terminated == false) {
+  if (fm_cb->peer_node_terminated == false &&
+  consensus_service.IsEnabled() == false) {
 opensaf_reboot(fm_cb->peer_node_id,
(char *)fm_cb->peer_clm_node_name.value,
"Received Node Down for peer controller");
@@ -658,8 +643,7 @@ static void fm_mbx_msg_handler(FM_CB *fm_cb, FM_EVT *fm_mbx_evt) {
   fm_cb->mutex_.Lock();
   peer_node_name = fm_cb->peer_node_name;
   fm_cb->mutex_.Unlock();
-  opensaf_reboot(fm_cb->peer_node_id,
- peer_node_name.c_str(),
+  opensaf_reboot(fm_cb->peer_node_id, peer_node_name.c_str(),
  "Received Node Down for Active peer");
 }
 fm_rda_set_role(fm_cb, PCS_RDA_ACTIVE);
diff --git a/src/fm/fmd/fm_mds.cc b/src/fm/fmd/fm_mds.cc
index 277a357d2..be25a5610 100644
--- a/src/fm/fmd/fm_mds.cc
+++ b/src/fm/fmd/fm_mds.cc
@@ -373,6 +373,7 @@ static uint32_t fm_mds_node_evt(FM_CB *cb,
 case NCSMDS_NODE_DOWN:
   if (cb->cluster_size != 0) {
 --cb->cluster_size;
+TRACE("cluster_size %" PRIu64, cb->cluster_size);
 TRACE("Node down event for node id %x, cluster size is now: %llu",
   node_evt->node_id, (unsigned long long)cb->cluster_size);
 check_for_node_isolation(cb);
@@ -397,6 +398,7 @@ static uint32_t fm_mds_node_evt(FM_CB *cb,
 
 case NCSMDS_NODE_UP:
   ++cb->cluster_size;
+  TRACE("cluster_size %" PRIu64, cb->cluster_size);
   TRACE("Node up event for node id %x, cluster size is now: %llu",
 node_evt->node_id, (unsigned long long)cb->cluster_size);
   check_for_node_isolation(cb);
diff --git a/src/fm/fmd/fm_rda.cc b/src/fm/fmd/fm_rda.cc
index 47e1f1d32..1bbf2369d 100644
--- a/src/fm/fmd/fm_rda.cc
+++ b/src/fm/fmd/fm_rda.cc
@@ -87,13 +87,24 @@ uint32_t fm_rda_set_role(FM_CB *fm_cb, PCS_RDA_ROLE role) {
   osafassert(role == PCS_RDA_ACTIVE);
 
   Consensus consensus_service;
-  rc = consensus_service.PromoteThisNode();
-  if (rc != SA_AIS_OK) {
+  rc = consensus_service.PromoteThisNode(true, 

[devel] [PATCH 2/5] osaf: add lock takeover request function [#2795]

2018-04-06 Thread Gary Lee
- add create and set (if previous value matches) functions to KeyValue class
- add Consensus::MonitorTakeoverRequest() function for use by RDE to answer
  takeover requests
- add Consensus::CreateTakeoverRequest() - before an SC is promoted to active,
  it will create a takeover request in the KV store. An existing SC can reject
  the lock takeover
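
How an existing SC might answer such a request can be sketched as a comparison of partition sizes. The patch text shown here does not state the actual rule used by Consensus::HandleTakeoverRequest(), so everything below is an assumption: the names TakeoverDecision and DecideTakeover are hypothetical, and the tie-breaking choice (the current active keeps its role on equal sizes) is only one possible policy.

```cpp
#include <cstdint>

// Hypothetical answer to a takeover request.
enum class TakeoverDecision { kAccept, kReject };

// Assumed rule, for illustration only: the current active SC accepts the
// takeover only if the requester sees strictly more cluster members than
// the active SC does. On a tie the current active keeps its role.
TakeoverDecision DecideTakeover(uint64_t requester_cluster_size,
                                uint64_t active_cluster_size) {
  return requester_cluster_size > active_cluster_size
             ? TakeoverDecision::kAccept
             : TakeoverDecision::kReject;
}
```

Under this assumed rule, an isolated standby that sees only itself would be rejected by an active SC that still sees the rest of the cluster, which matches the intent of selecting the active SC from the largest network partition.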
---
 src/osaf/consensus/consensus.cc | 385 ++--
 src/osaf/consensus/consensus.h  |  51 +-
 src/osaf/consensus/key_value.cc | 102 +++
 src/osaf/consensus/key_value.h  |  19 +-
 4 files changed, 456 insertions(+), 101 deletions(-)

diff --git a/src/osaf/consensus/consensus.cc b/src/osaf/consensus/consensus.cc
index cc04f3518..b639d72e1 100644
--- a/src/osaf/consensus/consensus.cc
+++ b/src/osaf/consensus/consensus.cc
@@ -15,13 +15,17 @@
 #include "osaf/consensus/consensus.h"
 #include 
 #include 
+#include 
 #include 
 #include "base/conf.h"
 #include "base/getenv.h"
 #include "base/logtrace.h"
 #include "base/ncssysf_def.h"
 
-SaAisErrorT Consensus::PromoteThisNode() {
+const std::string Consensus::kTakeoverRequestKeyname = "takeover_request";
+
+SaAisErrorT Consensus::PromoteThisNode(const bool graceful_takeover,
+const uint64_t cluster_size) {
   TRACE_ENTER();
   SaAisErrorT rc;
 
@@ -39,33 +43,30 @@ SaAisErrorT Consensus::PromoteThisNode() {
   }
 
   if (rc == SA_AIS_ERR_EXIST) {
+// there's a chance the lock has been released since the lock attempt
 // get the current active controller
-std::string current_active("");
-retries = 0;
-rc = KeyValue::LockOwner(current_active);
-while (rc != SA_AIS_OK && retries < kMaxRetry) {
-  ++retries;
-  std::this_thread::sleep_for(kSleepInterval);
-  rc = KeyValue::LockOwner(current_active);
-}
-if (rc != SA_AIS_OK) {
-  LOG_ER("Failed to get current lock owner. Will attempt to lock anyway");
+std::string current_active = CurrentActive();
+
+if (current_active.empty() == true) {
+  LOG_WA("Failed to get current lock owner. Will attempt to lock anyway");
 }
 
+bool take_over_request_created = false;
 LOG_NO("Current active controller is %s", current_active.c_str());
 
-// there's a chance the lock has been released since the lock attempt
 if (current_active.empty() == false) {
-  // remove current active controller's lock and fence it
-  retries = 0;
-  rc = KeyValue::Unlock(current_active);
-  while (rc == SA_AIS_ERR_TRY_AGAIN && retries < kMaxRetry) {
-LOG_IN("Trying to unlock");
-++retries;
-std::this_thread::sleep_for(kSleepInterval);
-rc = KeyValue::Unlock(current_active);
+  if (graceful_takeover == true) {
+rc = CreateTakeoverRequest(current_active, base::Conf::NodeName(),
+   cluster_size);
+if (rc != SA_AIS_OK) {
+  LOG_WA("Takeover request failed (%d)", rc);
+  return rc;
+}
+take_over_request_created = true;
   }
 
+  // remove current active controller's lock and fence it
+  rc = Demote(current_active);
   if (rc == SA_AIS_OK) {
 FenceNode(current_active);
   } else {
@@ -82,6 +83,23 @@ SaAisErrorT Consensus::PromoteThisNode() {
   std::this_thread::sleep_for(kSleepInterval);
   rc = KeyValue::Lock(base::Conf::NodeName(), kLockTimeout);
 }
+
+if (take_over_request_created == true) {
+  SaAisErrorT rc1;
+
+  // remove takeover request
+  rc1 = KeyValue::Erase(kTakeoverRequestKeyname);
+  retries = 0;
+  while (rc1 != SA_AIS_OK && retries < kMaxRetry) {
+++retries;
+std::this_thread::sleep_for(kSleepInterval);
+rc1 = KeyValue::Erase(kTakeoverRequestKeyname);
+  }
+
+  if (rc1 != SA_AIS_OK) {
+LOG_WA("Could not remove takeover request");
+  }
+}
   }
 
   if (rc == SA_AIS_OK) {
@@ -93,43 +111,23 @@ SaAisErrorT Consensus::PromoteThisNode() {
   return rc;
 }
 
-SaAisErrorT Consensus::Demote(const std::string& node = "") {
+SaAisErrorT Consensus::Demote(const std::string& node) {
   TRACE_ENTER();
   if (use_consensus_ == false) {
 return SA_AIS_OK;
   }
 
+  osafassert(node.empty() == false);
+
   SaAisErrorT rc = SA_AIS_ERR_FAILED_OPERATION;
   uint32_t retries = 0;
 
-  // check current active node
-  std::string current_active;
-  rc = KeyValue::LockOwner(current_active);
-  while (rc != SA_AIS_OK && retries < kMaxRetry) {
-++retries;
-std::this_thread::sleep_for(kSleepInterval);
-rc = KeyValue::LockOwner(current_active);
-  }
-
-  if (rc != SA_AIS_OK) {
-LOG_ER("Failed to get lock owner");
-return rc;
-  }
-
-  LOG_NO("Demoting %s as active controller", current_active.c_str());
-
-  if (node.empty() == false && node != current_active) {
-// node is not the current active controller!
-osafassert(false);
-  }
-
-  retries = 0;
-  rc = KeyValue::Unlock(current_active);
+  rc = KeyValue::Unlock(node);
   while (rc 

[devel] [PATCH 3/5] amfd: adapt to new Consensus API [#2795]

2018-04-06 Thread Gary Lee
---
 src/amf/amfd/role.cc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/amf/amfd/role.cc b/src/amf/amfd/role.cc
index c8aa9cf1f..790983ee7 100644
--- a/src/amf/amfd/role.cc
+++ b/src/amf/amfd/role.cc
@@ -1217,7 +1217,7 @@ uint32_t amfd_switch_stdby_actv(AVD_CL_CB *cb) {
   osaf_mutex_unlock_ordie(_reinit_mutex);
 
   Consensus consensus_service;
-  rc = consensus_service.PromoteThisNode();
+  rc = consensus_service.PromoteThisNode(false, 0);
   if (rc != SA_AIS_OK) {
 LOG_ER("Unable to set active controller in consensus service");
 osafassert(false);
-- 
2.14.1




[devel] [PATCH 0/5] Review Request for split-brain: select active SC from largest network partition V2 [#2795]

2018-04-06 Thread Gary Lee
Summary: Split-brain: select active SC from largest network partition V2 [#2795]
Review request for Ticket(s): 2795
Peer Reviewer(s): Anders, Hans, Ravi 
Pull request to: *** LIST THE PERSON WITH PUSH ACCESS HERE ***
Affected branch(es): develop
Development branch: ticket-2795
Base revision: b3c8028c3312ffe13c815dbe0249947a5c4947dc
Personal repository: git://git.code.sf.net/u/userid-2226215/review


Impacted area       Impact y/n
--------------------------------
 Docs                    n
 Build system            n
 RPM/packaging           n
 Configuration files     n
 Startup scripts         n
 SAF services            n
 OpenSAF services        y
 Core libraries          y
 Samples                 n
 Tests                   n
 Other                   n


Comments (indicate scope for each "y" above):
-

Changes from V1:

Delete takeover request *after* obtaining lock
Add etcd3.plugin


revision b23363e798e8f02eae550f520a1525afe11362bc
Author: Gary Lee 
Date:   Fri, 6 Apr 2018 16:07:41 +1000

rded: adapt to new Consensus API [#2795]

- add 3 new internal messages:

RDE_MSG_NODE_UP
RDE_MSG_NODE_DOWN
RDE_MSG_TAKEOVER_REQUEST_CALLBACK

- subscribe to AMFND service up events to keep track of the number
  of cluster members

- listen for takeover requests in KV store



revision 09843b483a03433a747ec650b1cbd33678661d90
Author: Gary Lee 
Date:   Fri, 6 Apr 2018 16:07:41 +1000

fmd: adapt to new Consensus API [#2795]



revision 88fb3cc8475dcaa0d075ab50990bb175b3ea2eca
Author: Gary Lee 
Date:   Fri, 6 Apr 2018 16:07:41 +1000

amfd: adapt to new Consensus API [#2795]



revision 15b585c34bc963e0f7d1c1627b04eec133288859
Author: Gary Lee 
Date:   Fri, 6 Apr 2018 16:07:41 +1000

osaf: add lock takeover request function [#2795]

- add create and set (if previous value matches) functions to KeyValue class
- add Consensus::MonitorTakeoverRequest() function for use by RDE to answer
  takeover requests
- add Consensus::CreateTakeoverRequest() - before an SC is promoted to active,
  it will create a takeover request in the KV store. An existing SC can reject
  the lock takeover



revision 51080e9a4779e32c9caf66d280dac2cdf848a68e
Author: Gary Lee 
Date:   Fri, 6 Apr 2018 16:07:41 +1000

osaf: extend API to include a create key and an enhanced set key function [#2795]

- add create_key function (fails if key already exists)
- add setkey_match_prev function (set value if previous value matches)
- add missing quotes
- add etcd3.plugin



Added Files:

 src/osaf/consensus/plugins/etcd3.plugin


Complete diffstat:
--
 src/amf/amfd/role.cc |   2 +-
 src/fm/fmd/fm_main.cc|  26 +--
 src/fm/fmd/fm_mds.cc |   2 +
 src/fm/fmd/fm_rda.cc |  15 +-
 src/osaf/consensus/consensus.cc  | 385 ++-
 src/osaf/consensus/consensus.h   |  51 +++-
 src/osaf/consensus/key_value.cc  | 102 +---
 src/osaf/consensus/key_value.h   |  19 +-
 src/osaf/consensus/plugins/etcd.plugin   |  86 ++-
 src/osaf/consensus/plugins/etcd3.plugin  | 355 
 src/osaf/consensus/plugins/sample.plugin |  67 +-
 src/rde/rded/rde_cb.h|  12 +-
 src/rde/rded/rde_main.cc |  72 --
 src/rde/rded/rde_mds.cc  |  39 +++-
 src/rde/rded/rde_rda.cc  |   2 +-
 src/rde/rded/role.cc |  46 ++--
 src/rde/rded/role.h  |   2 +-
 17 files changed, 1102 insertions(+), 181 deletions(-)


Testing Commands:
-
Some tests performed:

1) SI swap of safSi=SC-2N,safApp=OpenSAF
2) Isolate standby cluster (e.g. use iptables to block port 6700 on a TCP system)
3) Isolate active cluster

Testing, Expected Results:
--

1) No error
2) Standby will not be promoted to active because the takeover request is rejected
3) Standby will be promoted

Conditions of Submission:
-
*** HOW MANY DAYS BEFORE PUSHING, CONSENSUS ETC ***


Arch      Built     Started    Linux distro
-------------------------------------------
mips        n          n
mips64      n          n
x86         n          n
x86_64      y          y
powerpc     n          n
powerpc64   n          n


Reviewer Checklist:
---
[Submitters: make sure that your review doesn't trigger any checkmarks!]


Your checkin has not passed review because (see checked entries):

___ Your RR template is generally incomplete; it has too many blank entries
that need proper data filled in.

___ You have failed to nominate the proper persons for review and push.

___ Your patches do not have proper short+long header

___ You have grammar/spelling in your header that is