[devel] [PATCH 0/2] Review Request for base: Add support for OpenSAF local log file [#2306]

2018-04-11 Thread Minh Chau
Summary: base: Add support to direct OpenSAF logging to local node file [#2306]
Review request for Ticket(s): 2306
Peer Reviewer(s): Anders, Hans, Ravi
Pull request to: *** LIST THE PERSON WITH PUSH ACCESS HERE ***
Affected branch(es): develop
Development branch: ticket-2306
Base revision: aff54ff091727f27830443332b830890668749cf
Personal repository: git://git.code.sf.net/u/minh-chau/review


Impacted area   Impact y/n

 Docsn
 Build systemn
 RPM/packaging   n
 Configuration files n
 Startup scripts n
 SAF servicesy
 OpenSAF servicesn
 Core libraries  n
 Samples n
 Tests   n
 Other   n


Comments (indicate scope for each "y" above):
-
*** EXPLAIN/COMMENT THE PATCH SERIES HERE ***

revision f7f80c0372017bb3b5b94c927e08a86adf61e286
Author: Minh Chau 
Date:   Thu, 12 Apr 2018 09:02:46 +1000

base: Example of OSAF_LOCAL_LOG_FILE environment variable in amfd.conf file 
[#2306]

This commit is only an example to show where OSAF_LOCAL_LOG_FILE is
configured. This commit will not be pushed.
The introduction of OSAF_LOCAL_LOG_FILE for all services will be
in another commit.



revision 9ad63ddd3259950ab270aa9797742fbb801a6594
Author: Minh Chau 
Date:   Thu, 12 Apr 2018 08:36:05 +1000

base: Add support to direct OpenSAF logging to local node file [#2306]

Unify TraceLog and MdsLog class to one class (TraceLog) so it can be
used as a common log client.
Add new instance TraceLog for OpenSAF logging to local file, which can
be enabled/disabled via environment variable OSAF_LOCAL_NODE_LOG



Complete diffstat:
--
 src/amf/amfd/amfd.conf |   6 ++
 src/base/logtrace.cc   | 167 +
 src/base/logtrace.h|  50 +--
 src/mds/mds_log.cc | 114 +++--
 4 files changed, 146 insertions(+), 191 deletions(-)


Testing Commands:
-
*** LIST THE COMMAND LINE TOOLS/STEPS TO TEST YOUR CHANGES ***


Testing, Expected Results:
--
*** PASTE COMMAND OUTPUTS / TEST RESULTS ***


Conditions of Submission:
-
ack from reviewers


Arch  Built StartedLinux distro
---
mipsn  n
mips64  n  n
x86 n  n
x86_64  y  y
powerpc n  n
powerpc64   n  n


Reviewer Checklist:
---
[Submitters: make sure that your review doesn't trigger any checkmarks!]


Your checkin has not passed review because (see checked entries):

___ Your RR template is generally incomplete; it has too many blank entries
that need proper data filled in.

___ You have failed to nominate the proper persons for review and push.

___ Your patches do not have proper short+long header

___ You have grammar/spelling in your header that is unacceptable.

___ You have exceeded a sensible line length in your headers/comments/text.

___ You have failed to put in a proper Trac Ticket # into your commits.

___ You have incorrectly put/left internal data in your comments/files
(i.e. internal bug tracking tool IDs, product names etc)

___ You have not given any evidence of testing beyond basic build tests.
Demonstrate some level of runtime or other sanity testing.

___ You have ^M present in some of your files. These have to be removed.

___ You have needlessly changed whitespace or added whitespace crimes
like trailing spaces, or spaces before tabs.

___ You have mixed real technical changes with whitespace and other
cosmetic code cleanup changes. These have to be separate commits.

___ You need to refactor your submission into logical chunks; there is
too much content into a single commit.

___ You have extraneous garbage in your review (merge commits etc)

___ You have giant attachments which should never have been sent;
Instead you should place your content in a public tree to be pulled.

___ You have too many commits attached to an e-mail; resend as threaded
commits, or place in a public tree for a pull.

___ You have resent this content multiple times without a clear indication
of what has changed between each re-send.

___ You have failed to adequately and individually address all of the
comments and change requests that were proposed in the initial review.

___ You have a misconfigured ~/.gitconfig file (i.e. user.name, user.email etc)

___ Your computer have a badly configured date and time; confusing the
the threaded patch review.

___ Your changes affect IPC mechanism, and you don't present any results
for in-service upgradability test.

___ Your changes affect user manual and documentation, your patch series
do not contain the patch that 

[devel] [PATCH 2/2] base: Example of OSAF_LOCAL_LOG_FILE environment variable in amfd.conf file [#2306]

2018-04-11 Thread Minh Chau
This commit is only an example to show where OSAF_LOCAL_LOG_FILE is
configured. This commit will not be pushed.
The introduction of OSAF_LOCAL_LOG_FILE for all services will be
in another commit.
---
 src/amf/amfd/amfd.conf | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/src/amf/amfd/amfd.conf b/src/amf/amfd/amfd.conf
index 6ee111b..a5e4c21 100644
--- a/src/amf/amfd/amfd.conf
+++ b/src/amf/amfd/amfd.conf
@@ -19,3 +19,9 @@ export AVSV_HB_PERIOD=100
 
 # Uncomment the next line to enable info level logging
 #args="--loglevel=info"
+
+# Uncomment the next line to enable this service to log to OpenSAF local node 
log file
+# Only logging priority with equal or higher than LOG_WARNING to system log 
file
+# All logging will be recored in new local node log file
+# The log file is $PKGLOGDIR/osaf.log
+export OSAF_LOCAL_NODE_LOG=1
\ No newline at end of file
-- 
2.7.4


--
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
___
Opensaf-devel mailing list
Opensaf-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-devel


[devel] [PATCH 1/2] base: Add support to direct OpenSAF logging to local node file [#2306]

2018-04-11 Thread Minh Chau
Unify TraceLog and MdsLog class to one class (TraceLog) so it can be
used as a common log client.
Add new instance TraceLog for OpenSAF logging to local file, which can
be enabled/disabled via environment variable OSAF_LOCAL_NODE_LOG
---
 src/base/logtrace.cc | 167 ++-
 src/base/logtrace.h  |  50 +--
 src/mds/mds_log.cc   | 114 +++
 3 files changed, 140 insertions(+), 191 deletions(-)

diff --git a/src/base/logtrace.cc b/src/base/logtrace.cc
index b046fab..857e31c 100644
--- a/src/base/logtrace.cc
+++ b/src/base/logtrace.cc
@@ -36,15 +36,10 @@
 #include 
 #include 
 #include 
-#include "base/buffer.h"
-#include "base/conf.h"
-#include "base/log_message.h"
-#include "base/macros.h"
-#include "base/mutex.h"
+#include "base/getenv.h"
 #include "base/ncsgl_defs.h"
 #include "base/osaf_utility.h"
 #include "base/time.h"
-#include "base/unix_client_socket.h"
 #include "dtm/common/osaflog_protocol.h"
 
 namespace global {
@@ -55,65 +50,38 @@ const char *const prefix_name[] = {"EM", "AL", "CR", "ER", 
"WA", "NO", "IN",
"T6", "T7", "T8", ">>", "<<"};
 char *msg_id;
 int logmask;
+const char* osaf_log_file = "osaf.log";
+bool enable_osaf_log = false;
 
 }  // namespace global
 
-class TraceLog {
- public:
-  static bool Init();
-  static void Log(base::LogMessage::Severity severity, const char *fmt,
-  va_list ap);
-
- private:
-  TraceLog(const std::string , const std::string _name,
-   uint32_t proc_id, const std::string _id,
-   const std::string _name);
-  void LogInternal(base::LogMessage::Severity severity, const char *fmt,
-   va_list ap);
-  static constexpr const uint32_t kMaxSequenceId = uint32_t{0x7fff};
-  static TraceLog *instance_;
-  const base::LogMessage::HostName fqdn_;
-  const base::LogMessage::AppName app_name_;
-  const base::LogMessage::ProcId proc_id_;
-  const base::LogMessage::MsgId msg_id_;
-  uint32_t sequence_id_;
-  base::UnixClientSocket log_socket_;
-  base::Buffer<512> buffer_;
-  base::Mutex mutex_;
-
-  DELETE_COPY_AND_MOVE_OPERATORS(TraceLog);
-};
-
-TraceLog *TraceLog::instance_ = nullptr;
-
-TraceLog::TraceLog(const std::string , const std::string _name,
-   uint32_t proc_id, const std::string _id,
-   const std::string _name)
-: fqdn_{base::LogMessage::HostName{fqdn}},
-  app_name_{base::LogMessage::AppName{app_name}},
-  proc_id_{base::LogMessage::ProcId{std::to_string(proc_id)}},
-  msg_id_{base::LogMessage::MsgId{msg_id}},
-  sequence_id_{1},
-  log_socket_{socket_name, base::UnixSocket::kBlocking},
-  buffer_{},
-  mutex_{} {}
-
-bool TraceLog::Init() {
-  if (instance_ != nullptr) return false;
-  char app_name[49];
-  char pid_path[1024];
+static TraceLog gl_trace;
+static TraceLog gl_osaflog;
+TraceLog::TraceLog()
+: sequence_id_{1}, buffer_{} {
+  mutex_ = nullptr;
+  log_socket_ = nullptr;
+}
+
+TraceLog::~TraceLog() {
+  if (mutex_) delete mutex_;
+  if (log_socket_) delete log_socket_;
+}
+bool TraceLog::Init(const char *msg_id, WriteMode mode) {
+  char app_name[NAME_MAX];
+  char pid_path[PATH_MAX];
   uint32_t process_id = getpid();
   char *token, *saveptr;
   char *pid_name = nullptr;
 
-  memset(app_name, 0, 49);
-  memset(pid_path, 0, 1024);
+  memset(app_name, 0, NAME_MAX);
+  memset(pid_path, 0, PATH_MAX);
 
   snprintf(pid_path, sizeof(pid_path), "/proc/%" PRIu32 "/cmdline", 
process_id);
   FILE *f = fopen(pid_path, "r");
   if (f != nullptr) {
 size_t size;
-size = fread(pid_path, sizeof(char), 1024, f);
+size = fread(pid_path, sizeof(char), PATH_MAX, f);
 if (size > 0) {
   if ('\n' == pid_path[size - 1]) pid_path[size - 1] = '\0';
 }
@@ -131,19 +99,28 @@ bool TraceLog::Init() {
   }
   base::Conf::InitFullyQualifiedDomainName();
   const std::string  = base::Conf::FullyQualifiedDomainName();
-  instance_ = new TraceLog{fqdn, app_name, process_id, global::msg_id,
-   Osaflog::kServerSocketPath};
-  return instance_ != nullptr;
+
+  fqdn_ = base::LogMessage::HostName(fqdn);
+  app_name_ = base::LogMessage::AppName(app_name);
+  proc_id_ = base::LogMessage::ProcId{std::to_string(process_id)};
+  msg_id_ = base::LogMessage::MsgId{msg_id};
+  log_socket_ = new base::UnixClientSocket{Osaflog::kServerSocketPath,
+static_cast(mode)};
+  mutex_ = new base::Mutex{};
+
+  return true;
 }
 
 void TraceLog::Log(base::LogMessage::Severity severity, const char *fmt,
va_list ap) {
-  if (instance_ != nullptr) instance_->LogInternal(severity, fmt, ap);
+  if (log_socket_ != nullptr && mutex_ != nullptr) {
+LogInternal(severity, fmt, ap);
+  }
 }
 
 void TraceLog::LogInternal(base::LogMessage::Severity severity, const char 
*fmt,
va_list ap) {
-  base::Lock lock(mutex_);
+  base::Lock lock(*mutex_);
   uint32_t id = 

Re: [devel] [PATCH 1/1] imm: fix memory leaked in immnd [#2825]

2018-04-11 Thread Hans Nordebäck

ack, review only.

/Thanks HansN


On 04/05/2018 04:39 AM, Vu Minh Nguyen wrote:

The allocated memory is not freed before returning from the function
ImmModel::setCcbErrorString().
---
  src/imm/immnd/ImmModel.cc | 7 +--
  1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/src/imm/immnd/ImmModel.cc b/src/imm/immnd/ImmModel.cc
index f7c8fc0..87ded27 100644
--- a/src/imm/immnd/ImmModel.cc
+++ b/src/imm/immnd/ImmModel.cc
@@ -10910,7 +10910,6 @@ SaAisErrorT ImmModel::deleteObject(ObjectMap::iterator& 
oi, SaUint32T reqConn,
  void ImmModel::setCcbErrorString(CcbInfo* ccb, const char* errorString,
   va_list vl) {
int errLen = strlen(errorString) + 1;
-  char* fmtError = (char*)malloc(errLen);
int len;
va_list args;
int isValidationErrString = 0;
@@ -10921,6 +10920,9 @@ void ImmModel::setCcbErrorString(CcbInfo* ccb, const 
char* errorString,
  return;
}
  
+  char* fmtError = (char*)malloc(errLen);

+  osafassert(fmtError);
+
va_copy(args, vl);
len = vsnprintf(fmtError, errLen, errorString, args);
va_end(args);
@@ -10930,7 +10932,8 @@ void ImmModel::setCcbErrorString(CcbInfo* ccb, const 
char* errorString,
if (len > errLen) {
  char* newFmtError = (char*)realloc(fmtError, len);
  if (newFmtError == nullptr) {
-  TRACE_5("realloc error ,No memory ");
+  TRACE_5("realloc error, no memory");
+  free(fmtError);
return;
  } else {
fmtError = newFmtError;



--
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
___
Opensaf-devel mailing list
Opensaf-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-devel


[devel] [PATCH 0/1] Review Request for imm: fix cppcheck issues in apitest [#2384]

2018-04-11 Thread srinivas
Summary: imm: fix Cppcheck issues in imm/src/apitest [#2384]
Review request for Ticket(s): 2384
Peer Reviewer(s):  Vu Minh, Ravi
Pull request to: *** LIST THE PERSON WITH PUSH ACCESS HERE ***
Affected branch(es): develop
Development branch: ticket-2384
Base revision: aff54ff091727f27830443332b830890668749cf
Personal repository: git://git.code.sf.net/u/sri-mangipudy/review


Impacted area   Impact y/n

 Docsn
 Build systemn
 RPM/packaging   n
 Configuration files n
 Startup scripts n
 SAF servicesn
 OpenSAF servicesn
 Core libraries  n
 Samples n
 Tests   y
 Other   n

NOTE: Patch(es) contain lines longer than 80 characers

Comments (indicate scope for each "y" above):
-
fixed cppcheck issues

revision 065a81b536830a9967bc5b65e8d7fcd950edb2ef
Author: srinivas 
Date:   Wed, 11 Apr 2018 16:51:10 +0530

imm: fix Cppcheck issues in imm/src/apitest [#2384]



Complete diffstat:
--
 src/imm/apitest/implementer/applier.c  | 4 +---
 src/imm/apitest/implementer/test_SaImmOiCcb.c  | 6 ++
 src/imm/apitest/implementer/test_saImmOiAugmentCcbInitialize.c | 6 ++
 src/imm/apitest/implementer/test_saImmOiImplementerSet.c   | 3 +--
 src/imm/apitest/implementer/test_saImmOiLongDn.c   | 3 +--
 src/imm/apitest/implementer/test_saImmOiSaStringT.c| 3 +--
 src/imm/apitest/management/test_saImmOmLongDn.c| 6 ++
 src/imm/apitest/management/test_saImmOmSaStringT.c | 3 +--
 src/imm/apitest/management/test_saImmOmThreadInterference.c| 3 +--
 9 files changed, 12 insertions(+), 25 deletions(-)


Testing Commands:
-
*** LIST THE COMMAND LINE TOOLS/STEPS TO TEST YOUR CHANGES ***


Testing, Expected Results:
--
*** PASTE COMMAND OUTPUTS / TEST RESULTS ***


Conditions of Submission:
-
*** HOW MANY DAYS BEFORE PUSHING, CONSENSUS ETC ***


Arch  Built StartedLinux distro
---
mipsn  n
mips64  n  n
x86 n  n
x86_64  n  n
powerpc n  n
powerpc64   n  n


Reviewer Checklist:
---
[Submitters: make sure that your review doesn't trigger any checkmarks!]


Your checkin has not passed review because (see checked entries):

___ Your RR template is generally incomplete; it has too many blank entries
that need proper data filled in.

___ You have failed to nominate the proper persons for review and push.

___ Your patches do not have proper short+long header

___ You have grammar/spelling in your header that is unacceptable.

___ You have exceeded a sensible line length in your headers/comments/text.

___ You have failed to put in a proper Trac Ticket # into your commits.

___ You have incorrectly put/left internal data in your comments/files
(i.e. internal bug tracking tool IDs, product names etc)

___ You have not given any evidence of testing beyond basic build tests.
Demonstrate some level of runtime or other sanity testing.

___ You have ^M present in some of your files. These have to be removed.

___ You have needlessly changed whitespace or added whitespace crimes
like trailing spaces, or spaces before tabs.

___ You have mixed real technical changes with whitespace and other
cosmetic code cleanup changes. These have to be separate commits.

___ You need to refactor your submission into logical chunks; there is
too much content into a single commit.

___ You have extraneous garbage in your review (merge commits etc)

___ You have giant attachments which should never have been sent;
Instead you should place your content in a public tree to be pulled.

___ You have too many commits attached to an e-mail; resend as threaded
commits, or place in a public tree for a pull.

___ You have resent this content multiple times without a clear indication
of what has changed between each re-send.

___ You have failed to adequately and individually address all of the
comments and change requests that were proposed in the initial review.

___ You have a misconfigured ~/.gitconfig file (i.e. user.name, user.email etc)

___ Your computer have a badly configured date and time; confusing the
the threaded patch review.

___ Your changes affect IPC mechanism, and you don't present any results
for in-service upgradability test.

___ Your changes affect user manual and documentation, your patch series
do not contain the patch that updates the Doxygen manual.


--
Check out the vibrant tech community on one of the world's most
engaging tech sites, 

[devel] [PATCH 1/1] imm: fix Cppcheck issues in imm/src/apitest [#2384]

2018-04-11 Thread srinivas
---
 src/imm/apitest/implementer/applier.c  | 4 +---
 src/imm/apitest/implementer/test_SaImmOiCcb.c  | 6 ++
 src/imm/apitest/implementer/test_saImmOiAugmentCcbInitialize.c | 6 ++
 src/imm/apitest/implementer/test_saImmOiImplementerSet.c   | 3 +--
 src/imm/apitest/implementer/test_saImmOiLongDn.c   | 3 +--
 src/imm/apitest/implementer/test_saImmOiSaStringT.c| 3 +--
 src/imm/apitest/management/test_saImmOmLongDn.c| 6 ++
 src/imm/apitest/management/test_saImmOmSaStringT.c | 3 +--
 src/imm/apitest/management/test_saImmOmThreadInterference.c| 3 +--
 9 files changed, 12 insertions(+), 25 deletions(-)

diff --git a/src/imm/apitest/implementer/applier.c 
b/src/imm/apitest/implementer/applier.c
index a8d5cd2..a16440e 100644
--- a/src/imm/apitest/implementer/applier.c
+++ b/src/imm/apitest/implementer/applier.c
@@ -220,7 +220,6 @@ static void saImmOiAdminOperationCallback(
 
 int main(int argc, char *argv[])
 {
-   int c;
struct option long_options[] = {
{"parameter", required_argument, 0, 'p'},
{"operation-id", required_argument, 0, 'o'},
@@ -249,7 +248,7 @@ int main(int argc, char *argv[])
params[0] = NULL;
 
while (1) {
-   c = getopt_long(argc, argv, "p:o:t:a:h", long_options, NULL);
+   int c = getopt_long(argc, argv, "p:o:t:a:h", long_options, 
NULL);
 
if (c ==
-1) /* have all command-line options have been parsed? */
@@ -386,7 +385,6 @@ int main(int argc, char *argv[])
fprintf(stderr, "saImmOiDispatch returned %s\n",
saf_error(error));
exit(EXIT_FAILURE);
-   break;
}
}
}
diff --git a/src/imm/apitest/implementer/test_SaImmOiCcb.c 
b/src/imm/apitest/implementer/test_SaImmOiCcb.c
index 2ee5034..a768b4d 100644
--- a/src/imm/apitest/implementer/test_SaImmOiCcb.c
+++ b/src/imm/apitest/implementer/test_SaImmOiCcb.c
@@ -185,7 +185,6 @@ done:
 static void *objectImplementerThreadMain(void *arg)
 {
struct pollfd fds[2];
-   int ret;
char buf[384];
const SaImmOiImplementerNameT implementerName = buf;
SaSelectionObjectT selObj;
@@ -210,7 +209,7 @@ static void *objectImplementerThreadMain(void *arg)
/* We can receive five callbacks: create, delete, modify, completed &
 * apply */
while (1) {
-   ret = poll(fds, 2, -1);
+   int ret = poll(fds, 2, -1);
if (ret == -1)
fprintf(stderr, "poll error: %s\n", strerror(errno));
 
@@ -243,7 +242,6 @@ static void *objectImplementerThreadMain(void *arg)
 static void *classImplementerThreadMain(void *arg)
 {
struct pollfd fds[2];
-   int ret;
const SaImmOiImplementerNameT implementerName =
(SaImmOiImplementerNameT) __FUNCTION__;
SaSelectionObjectT selObj;
@@ -264,7 +262,7 @@ static void *classImplementerThreadMain(void *arg)
fds[1].events = POLLIN;
 
while (1) {
-   ret = poll(fds, 2, -1);
+   int ret = poll(fds, 2, -1);
if (ret == -1)
fprintf(stderr, "poll error: %s\n", strerror(errno));
 
diff --git a/src/imm/apitest/implementer/test_saImmOiAugmentCcbInitialize.c 
b/src/imm/apitest/implementer/test_saImmOiAugmentCcbInitialize.c
index 64f0312..c623018 100644
--- a/src/imm/apitest/implementer/test_saImmOiAugmentCcbInitialize.c
+++ b/src/imm/apitest/implementer/test_saImmOiAugmentCcbInitialize.c
@@ -391,7 +391,6 @@ static void *immOiObjectDispatchThread(void *arg)
SaImmOiHandleT handle;
SaSelectionObjectT selObj;
struct pollfd fds[2];
-   int ret;
 
TRACE_ENTER();
 
@@ -417,7 +416,7 @@ static void *immOiObjectDispatchThread(void *arg)
objectDispatchThreadIsSet = 1;
 
while (1) {
-   ret = poll(fds, 2, -1);
+   int ret = poll(fds, 2, -1);
if (ret == -1)
fprintf(stderr, "poll error: %s\n", strerror(errno));
 
@@ -454,7 +453,6 @@ static void *immOiClassDispatchThread(void *arg)
SaImmOiHandleT handle;
SaSelectionObjectT selObj;
struct pollfd fds[2];
-   int ret;
 
TRACE_ENTER();
 
@@ -474,7 +472,7 @@ static void *immOiClassDispatchThread(void *arg)
classDispatchThreadIsSet = 1;
 
while (1) {
-   ret = poll(fds, 2, -1);
+   int ret = poll(fds, 2, -1);
if (ret == -1)
fprintf(stderr, "poll error: %s\n", strerror(errno));
 
diff --git a/src/imm/apitest/implementer/test_saImmOiImplementerSet.c 
b/src/imm/apitest/implementer/test_saImmOiImplementerSet.c
index 549ab9f..11792e1 100644
--- 

[devel] [PATCH 5/5] rded: adapt to new Consensus API [#2795]

2018-04-11 Thread Gary Lee
- add 3 new internal message:

RDE_MSG_NODE_UP
RDE_MSG_NODE_DOWN
RDE_MSG_TAKEOVER_REQUEST_CALLBACK

- subscribe to AMFND service up events to keep track of the number
  of cluster members

- listen for takeover requests in KV store
---
 src/rde/rded/rde_cb.h| 12 ++--
 src/rde/rded/rde_main.cc | 75 ++--
 src/rde/rded/rde_mds.cc  | 39 +
 src/rde/rded/rde_rda.cc  |  2 +-
 src/rde/rded/role.cc | 46 -
 src/rde/rded/role.h  |  2 +-
 6 files changed, 137 insertions(+), 39 deletions(-)

diff --git a/src/rde/rded/rde_cb.h b/src/rde/rded/rde_cb.h
index fc100849a..f5ad689c3 100644
--- a/src/rde/rded/rde_cb.h
+++ b/src/rde/rded/rde_cb.h
@@ -19,12 +19,13 @@
 #define RDE_RDED_RDE_CB_H_
 
 #include 
+#include 
 #include "base/osaf_utility.h"
 #include "mds/mds_papi.h"
 #include "rde/agent/rda_papi.h"
+#include "rde/common/rde_rda_common.h"
 #include "rde/rded/rde_amf.h"
 #include "rde/rded/rde_rda.h"
-#include "rde/common/rde_rda_common.h"
 
 /*
  **  RDE_CONTROL_BLOCK
@@ -39,7 +40,9 @@ struct RDE_CONTROL_BLOCK {
   bool task_terminate;
   RDE_RDA_CB rde_rda_cb;
   RDE_AMF_CB rde_amf_cb;
-  bool monitor_lock_thread_running;
+  bool monitor_lock_thread_running{false};
+  bool monitor_takeover_req_thread_running{false};
+  std::set cluster_members{};
 };
 
 enum RDE_MSG_TYPE {
@@ -47,7 +50,10 @@ enum RDE_MSG_TYPE {
   RDE_MSG_PEER_DOWN = 2,
   RDE_MSG_PEER_INFO_REQ = 3,
   RDE_MSG_PEER_INFO_RESP = 4,
-  RDE_MSG_NEW_ACTIVE_CALLBACK = 5
+  RDE_MSG_NEW_ACTIVE_CALLBACK = 5,
+  RDE_MSG_NODE_UP = 6,
+  RDE_MSG_NODE_DOWN = 7,
+  RDE_MSG_TAKEOVER_REQUEST_CALLBACK = 8
 };
 
 struct rde_peer_info {
diff --git a/src/rde/rded/rde_main.cc b/src/rde/rded/rde_main.cc
index 78e7256c1..b395312d6 100644
--- a/src/rde/rded/rde_main.cc
+++ b/src/rde/rded/rde_main.cc
@@ -17,6 +17,7 @@
  */
 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -28,17 +29,16 @@
 #include 
 #include 
 #include 
-#include "osaf/consensus/consensus.h"
+#include "base/conf.h"
 #include "base/daemon.h"
 #include "base/logtrace.h"
+#include "base/ncs_main_papi.h"
 #include "base/osaf_poll.h"
 #include "mds/mds_papi.h"
-#include "base/ncs_main_papi.h"
 #include "nid/agent/nid_api.h"
-#include 
+#include "osaf/consensus/consensus.h"
 #include "rde/rded/rde_cb.h"
 #include "rde/rded/role.h"
-#include "base/conf.h"
 
 #define RDA_MAX_CLIENTS 32
 
@@ -47,13 +47,15 @@ enum { FD_TERM = 0, FD_AMF = 1, FD_MBX, FD_RDA_SERVER, 
FD_CLIENT_START };
 static void SendPeerInfoResp(MDS_DEST mds_dest);
 static void CheckForSplitBrain(const rde_msg *msg);
 
-const char *rde_msg_name[] = {
-"-",
-"RDE_MSG_PEER_UP(1)",
-"RDE_MSG_PEER_DOWN(2)",
-"RDE_MSG_PEER_INFO_REQ(3)",
-"RDE_MSG_PEER_INFO_RESP(4)",
-};
+const char *rde_msg_name[] = {"-",
+  "RDE_MSG_PEER_UP(1)",
+  "RDE_MSG_PEER_DOWN(2)",
+  "RDE_MSG_PEER_INFO_REQ(3)",
+  "RDE_MSG_PEER_INFO_RESP(4)",
+  "RDE_MSG_NEW_ACTIVE_CALLBACK(5)"
+  "RDE_MSG_NODE_UP(6)",
+  "RDE_MSG_NODE_DOWN(7)",
+  "RDE_MSG_TAKEOVER_REQUEST_CALLBACK(8)"};
 
 static RDE_CONTROL_BLOCK _rde_cb;
 static RDE_CONTROL_BLOCK *rde_cb = &_rde_cb;
@@ -127,14 +129,16 @@ static void handle_mbx_event() {
   LOG_NO("New active controller notification from consensus service");
 
   if (role->role() == PCS_RDA_ACTIVE) {
-if (my_node.compare(active_controller) != 0) {
+if (my_node.compare(active_controller) != 0 &&
+active_controller.empty() == false) {
   // we are meant to be active, but consensus service doesn't think so
   LOG_WA("Role does not match consensus service. New controller: %s",
-active_controller.c_str());
+ active_controller.c_str());
   if (consensus_service.IsRemoteFencingEnabled() == false) {
 LOG_ER("Probable split-brain. Rebooting this node");
+
 opensaf_reboot(0, nullptr,
-  "Split-brain detected by consensus service");
+   "Split-brain detected by consensus service");
   }
 }
 
@@ -144,6 +148,44 @@ static void handle_mbx_event() {
   }
   break;
 }
+case RDE_MSG_NODE_UP:
+  rde_cb->cluster_members.insert(msg->fr_node_id);
+  TRACE("cluster_size %zu", rde_cb->cluster_members.size());
+  break;
+case RDE_MSG_NODE_DOWN:
+  rde_cb->cluster_members.erase(msg->fr_node_id);
+  TRACE("cluster_size %zu", rde_cb->cluster_members.size());
+  break;
+case RDE_MSG_TAKEOVER_REQUEST_CALLBACK: {
+  rde_cb->monitor_takeover_req_thread_running = false;
+
+  if (role->role() == PCS_RDA_ACTIVE) {
+LOG_NO("Received takeover request. Our network size is %zu",
+   

[devel] [PATCH 1/5] osaf: extend API to include a create key and an enhanced set key function [#2795]

2018-04-11 Thread Gary Lee
- add create_key function (fails if key already exists)
- add setkey_match_prev function (set value if previous value matches)
- add missing quotes
- add etcd3.plugin
---
 src/osaf/consensus/plugins/etcd.plugin   |  86 +++-
 src/osaf/consensus/plugins/etcd3.plugin  | 366 +++
 src/osaf/consensus/plugins/sample.plugin |  67 +-
 3 files changed, 501 insertions(+), 18 deletions(-)
 create mode 100644 src/osaf/consensus/plugins/etcd3.plugin

diff --git a/src/osaf/consensus/plugins/etcd.plugin 
b/src/osaf/consensus/plugins/etcd.plugin
index 586059b32..6ed85ac92 100644
--- a/src/osaf/consensus/plugins/etcd.plugin
+++ b/src/osaf/consensus/plugins/etcd.plugin
@@ -29,7 +29,7 @@ readonly etcd_timeout="5s"
 #   0 - success,  is echoed to stdout
 #   non-zero - failure
 get() {
-  readonly key=$1
+  readonly key="$1"
 
   if value=$(etcdctl $etcd_options --timeout $etcd_timeout get 
"$directory$key" 2>&1)
   then
@@ -49,8 +49,8 @@ get() {
 #   0 - success
 #   non-zero - failure
 setkey() {
-  readonly key=$1
-  readonly value=$2
+  readonly key="$1"
+  readonly value="$2"
 
   if etcdctl $etcd_options --timeout $etcd_timeout set "$directory$key" \
 "$value" >/dev/null
@@ -61,6 +61,58 @@ setkey() {
   fi
 }
 
+# create
+#   create  and set to  in key-value store. Fails if the key
+#   already exists
+# params:
+#   $1 - 
+#   $2 - 
+# returns:
+#   0 - success
+#   1 - already exists
+#   2 or above - other failure
+create_key() {
+  readonly key="$1"
+  readonly value="$2"
+
+  if output=$(etcdctl $etcd_options --timeout $etcd_timeout mk 
"$directory$key" \
+"$value" 2>&1)
+  then
+return 0
+  else
+if echo $output | grep "already exists"
+then
+  return 1
+fi
+  fi
+
+  return 2
+}
+
+# set
+#   set  to  in key-value store, if the existing value matches
+#   
+# params:
+#   $1 - 
+#   $2 - 
+#   $3 - 
+# returns:
+#   0 - success
+#   non-zero - failure
+setkey_match_prev() {
+  readonly key="$1"
+  readonly value="$2"
+  readonly prev="$3"
+
+  if etcdctl $etcd_options --timeout $etcd_timeout set "$directory$key" \
+"$value" --swap-with-value "$prev" >/dev/null
+  then
+return 0
+  else
+return 1
+  fi
+}
+
 # erase
 #   erase  in key-value store
 # params:
@@ -69,7 +121,7 @@ setkey() {
 #   0 - success
 #   non-zero - failure
 erase() {
-  readonly key=$1
+  readonly key="$1"
 
   if etcdctl $etcd_options --timeout $etcd_timeout \
 rm "$directory$key" >/dev/null 2>&1
@@ -90,8 +142,8 @@ erase() {
 #   2 or above - other failure
 # NOTE: if lock is already acquired by , then timeout is extended
 lock() {
-  readonly owner=$1
-  readonly timeout=$2
+  readonly owner="$1"
+  readonly timeout="$2"
 
   if etcdctl $etcd_options --timeout $etcd_timeout \
 mk "$directory$keyname" "$owner" \
@@ -145,7 +197,7 @@ lock_owner() {
 #   2 or above - other failure
 #
 unlock() {
-  readonly owner=$1
+  readonly owner="$1"
   readonly forced=${2:-false}
 
   if [ "$forced" = false ]; then
@@ -185,7 +237,7 @@ unlock() {
 #   0 - success,  is echoed to stdout
 #   non-zero - failure
 watch() {
-  readonly key=$1
+  readonly key="$1"
 
   if value=$(etcdctl $etcd_options --timeout $etcd_timeout \
 watch "$directory$key" 2>&1)
@@ -216,6 +268,22 @@ case "$1" in
 setkey "$2" "$3"
 exit $?
 ;;
+  set_if_prev)
+if [ "$#" -ne 4 ]; then
+  echo "Usage: $0 set   "
+  exit 1
+fi
+setkey_match_prev "$2" "$3" "$4"
+exit $?
+;;
+  create)
+if [ "$#" -ne 3 ]; then
+  echo "Usage: $0 create  "
+  exit 1
+fi
+create_key "$2" "$3"
+exit $?
+;;
   erase)
 if [ "$#" -ne 2 ]; then
   echo "Usage: $0 erase "
@@ -269,7 +337,7 @@ case "$1" in
 exit $?
 ;;
   *)
-echo "Usage: $0 {get|set|erase|lock|unlock|lock_owner|watch|watch_lock}"
+echo "Usage: $0 
{get|set|create|set_if_prev|erase|lock|unlock|lock_owner|watch|watch_lock}"
 ;;
 esac
 
diff --git a/src/osaf/consensus/plugins/etcd3.plugin 
b/src/osaf/consensus/plugins/etcd3.plugin
new file mode 100644
index 0..451440567
--- /dev/null
+++ b/src/osaf/consensus/plugins/etcd3.plugin
@@ -0,0 +1,366 @@
+#!/usr/bin/env bash
+#  -*- OpenSAF  -*-
+#
+# (C) Copyright 2018 Ericsson AB 2018 - All Rights Reserved.
+#
+# This program is distributed in the hope that it will be useful, but
+# WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+# or FITNESS FOR A PARTICULAR PURPOSE. This file and program are licensed
+# under the GNU Lesser General Public License Version 2.1, February 1999.
+# The complete license can be accessed from the following location:
+# http://opensource.org/licenses/lgpl-license.php
+# See the Copying file included with the OpenSAF distribution for full
+# licensing terms.
+#
+# Please note: this API is subject to change and may be modified
+# in a future version of OpenSAF. Future API versions may not be
+# backward compatible. This plugin may need to be adapted.
+
+readonly 

[devel] [PATCH 0/5] Review Request for split-brain: select active SC from largest network partition V3 [#2795]

2018-04-11 Thread Gary Lee
Summary: split-brain: select active SC from largest network partition V3 
[#2795] 
Review request for Ticket(s): 2795
Peer Reviewer(s): Anders, Ravi, Hans 
Pull request to: *** LIST THE PERSON WITH PUSH ACCESS HERE ***
Affected branch(es): develop
Development branch: ticket-2795
Base revision: 1c302a300e449e8a8527671fbd6c7f4e2b41e95d
Personal repository: git://git.code.sf.net/u/userid-2226215/review


Impacted area   Impact y/n

 Docsn
 Build systemn
 RPM/packaging   n
 Configuration files n
 Startup scripts n
 SAF servicesn
 OpenSAF servicesy 
 Core libraries  y 
 Samples n
 Tests   n
 Other   n


Comments (indicate scope for each "y" above):
-

*** Changes from V2: ***

fmd: made cluster_size atomic
fmd: wait 3 seconds before promoting to active, to allow topology events to be 
processed first
osaf: add check for existing takeover request, before trying to lock
etcdv3 plugin: reliablity improvements


revision c7bc78656d5de11f6147727bd8612274fb6e438f
Author: Gary Lee 
Date:   Wed, 11 Apr 2018 17:16:46 +1000

rded: adapt to new Consensus API [#2795]

- add 3 new internal message:

RDE_MSG_NODE_UP
RDE_MSG_NODE_DOWN
RDE_MSG_TAKEOVER_REQUEST_CALLBACK

- subscribe to AMFND service up events to keep track of the number
  of cluster members

- listen for takeover requests in KV store



revision 4899e5d0f5abdff8f15eca8ad17d3b13b6a00393
Author: Gary Lee 
Date:   Wed, 11 Apr 2018 17:16:18 +1000

fmd: adapt to new Consensus API [#2795]



revision 812a315af21df06b2f9fdcc3d8fd5b7bbad3e550
Author: Gary Lee 
Date:   Wed, 11 Apr 2018 17:15:41 +1000

amfd: adapt to new Consensus API [#2795]



revision b8a37c1b8965826e5faffbfebc44a84bdb6433a1
Author: Gary Lee 
Date:   Wed, 11 Apr 2018 17:14:39 +1000

osaf: add lock takeover request fuction [#2795]

- add create and set (if previous value matches) functions to KeyValue class
- add Consensus::MonitorTakeoverRequest() function for use by RDE to answer 
takeover requests
- add Consensus::CreateTakeoverRequest() - before a SC is promoted to active, 
it will
  create a takeover request in the KV store. An existing SC can reject the lock 
takeover



revision 955be872ba5887b1b521eac9f7732dd3f6afc593
Author: Gary Lee 
Date:   Wed, 11 Apr 2018 17:13:45 +1000

osaf: extend API to include a create key and an enhanced set key function 
[#2795]

- add create_key function (fails if key already exists)
- add setkey_match_prev function (set value if previous value matches)
- add missing quotes
- add etcd3.plugin



Added Files:

 src/osaf/consensus/plugins/etcd3.plugin


Complete diffstat:
--
 src/amf/amfd/role.cc |   2 +-
 src/fm/fmd/fm_cb.h   |   2 +-
 src/fm/fmd/fm_main.cc|  26 +-
 src/fm/fmd/fm_mds.cc |   2 +
 src/fm/fmd/fm_rda.cc |  27 +-
 src/osaf/consensus/consensus.cc  | 435 ++-
 src/osaf/consensus/consensus.h   |  55 +++-
 src/osaf/consensus/key_value.cc  | 105 +---
 src/osaf/consensus/key_value.h   |  19 +-
 src/osaf/consensus/plugins/etcd.plugin   |  86 +-
 src/osaf/consensus/plugins/etcd3.plugin  | 366 ++
 src/osaf/consensus/plugins/sample.plugin |  67 -
 src/rde/rded/rde_cb.h|  12 +-
 src/rde/rded/rde_main.cc |  75 --
 src/rde/rded/rde_mds.cc  |  39 ++-
 src/rde/rded/rde_rda.cc  |   2 +-
 src/rde/rded/role.cc |  46 +++-
 src/rde/rded/role.h  |   2 +-
 18 files changed, 1180 insertions(+), 188 deletions(-)


Testing Commands:
-
1) SI swap of safSi=SC-2N,safApp=OpenSAF
2) Isolate standby cluster (eg. use iptables to block port 6700 on a TCP system)
3) Isolate active cluster

Testing, Expected Results:
--
1) No error
2) Standby will fail to be promoted as active as the takeover request is 
rejected
3) Standby will be promoted

Conditions of Submission:
-
Ack from any reviewer


Arch  Built StartedLinux distro
---
mipsn  n
mips64  n  n
x86 n  n
x86_64  y  y 
powerpc n  n
powerpc64   n  n


Reviewer Checklist:
---
[Submitters: make sure that your review doesn't trigger any checkmarks!]


Your checkin has not passed review because (see checked entries):

___ Your RR template is generally incomplete; it has too many blank entries
that need proper data filled in.

___ You have failed 

[devel] [PATCH 3/5] amfd: adapt to new Consensus API [#2795]

2018-04-11 Thread Gary Lee
---
 src/amf/amfd/role.cc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/amf/amfd/role.cc b/src/amf/amfd/role.cc
index c8aa9cf1f..790983ee7 100644
--- a/src/amf/amfd/role.cc
+++ b/src/amf/amfd/role.cc
@@ -1217,7 +1217,7 @@ uint32_t amfd_switch_stdby_actv(AVD_CL_CB *cb) {
   osaf_mutex_unlock_ordie(_reinit_mutex);
 
   Consensus consensus_service;
-  rc = consensus_service.PromoteThisNode();
+  rc = consensus_service.PromoteThisNode(false, 0);
   if (rc != SA_AIS_OK) {
 LOG_ER("Unable to set active controller in consensus service");
 osafassert(false);
-- 
2.14.1


--
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
___
Opensaf-devel mailing list
Opensaf-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-devel


[devel] [PATCH 4/5] fmd: adapt to new Consensus API [#2795]

2018-04-11 Thread Gary Lee
---
 src/fm/fmd/fm_cb.h|  2 +-
 src/fm/fmd/fm_main.cc | 26 +-
 src/fm/fmd/fm_mds.cc  |  2 ++
 src/fm/fmd/fm_rda.cc  | 27 ++-
 4 files changed, 30 insertions(+), 27 deletions(-)

diff --git a/src/fm/fmd/fm_cb.h b/src/fm/fmd/fm_cb.h
index cfa50d883..010ab735a 100644
--- a/src/fm/fmd/fm_cb.h
+++ b/src/fm/fmd/fm_cb.h
@@ -100,7 +100,7 @@ struct FM_CB {
 
   std::atomic peer_sc_up{false};
   bool well_connected{false};
-  uint64_t cluster_size{};
+  std::atomic cluster_size{};
   struct timespec last_well_connected{};
   struct timespec node_isolation_timeout{};
   SaClmHandleT clm_hdl{};
diff --git a/src/fm/fmd/fm_main.cc b/src/fm/fmd/fm_main.cc
index 73c9b9ccd..3371ec5e8 100644
--- a/src/fm/fmd/fm_main.cc
+++ b/src/fm/fmd/fm_main.cc
@@ -551,21 +551,12 @@ static void fm_mbx_msg_handler(FM_CB *fm_cb, FM_EVT 
*fm_mbx_evt) {
* trigerred quicker than the node_down event
* has been received.
*/
-  if (fm_cb->role == PCS_RDA_STANDBY) {
-const std::string current_active =
-consensus_service.CurrentActive();
-if (current_active.compare(osaf_extended_name_borrow(
-_cb->peer_clm_node_name)) == 0) {
-  // update consensus service, before fencing old active controller
-  consensus_service.DemoteCurrentActive();
-}
-  }
 
   if (fm_cb->use_remote_fencing) {
 if (fm_cb->peer_node_terminated == false) {
   // if peer_sc_up is true then
   // the node has come up already
-  if (fm_cb->peer_sc_up == false && fm_cb->immnd_down == true) {
+  if (consensus_service.IsEnabled() == false) {
 opensaf_reboot(fm_cb->peer_node_id,
(char *)fm_cb->peer_clm_node_name.value,
"Received Node Down for peer controller");
@@ -580,8 +571,7 @@ static void fm_mbx_msg_handler(FM_CB *fm_cb, FM_EVT 
*fm_mbx_evt) {
 fm_cb->mutex_.Lock();
 peer_node_name = fm_cb->peer_node_name;
 fm_cb->mutex_.Unlock();
-opensaf_reboot(fm_cb->peer_node_id,
-   peer_node_name.c_str(),
+opensaf_reboot(fm_cb->peer_node_id, peer_node_name.c_str(),
"Received Node Down for peer controller");
   }
   if (!((fm_cb->role == PCS_RDA_ACTIVE) &&
@@ -632,12 +622,6 @@ static void fm_mbx_msg_handler(FM_CB *fm_cb, FM_EVT 
*fm_mbx_evt) {
 }
 
 Consensus consensus_service;
-const std::string current_active = consensus_service.CurrentActive();
-if (current_active.compare(
-osaf_extended_name_borrow(_cb->peer_clm_node_name)) == 0) {
-  // update consensus service, before fencing old active controller
-  consensus_service.DemoteCurrentActive();
-}
 
 /* Now. Try resetting other blade */
 fm_cb->role = PCS_RDA_ACTIVE;
@@ -645,7 +629,8 @@ static void fm_mbx_msg_handler(FM_CB *fm_cb, FM_EVT 
*fm_mbx_evt) {
 LOG_NO("Reseting peer controller node id: %x",
unsigned(fm_cb->peer_node_id));
 if (fm_cb->use_remote_fencing) {
-  if (fm_cb->peer_node_terminated == false) {
+  if (fm_cb->peer_node_terminated == false &&
+  consensus_service.IsEnabled() == false) {
 opensaf_reboot(fm_cb->peer_node_id,
(char *)fm_cb->peer_clm_node_name.value,
"Received Node Down for peer controller");
@@ -658,8 +643,7 @@ static void fm_mbx_msg_handler(FM_CB *fm_cb, FM_EVT 
*fm_mbx_evt) {
   fm_cb->mutex_.Lock();
   peer_node_name = fm_cb->peer_node_name;
   fm_cb->mutex_.Unlock();
-  opensaf_reboot(fm_cb->peer_node_id,
- peer_node_name.c_str(),
+  opensaf_reboot(fm_cb->peer_node_id, peer_node_name.c_str(),
  "Received Node Down for Active peer");
 }
 fm_rda_set_role(fm_cb, PCS_RDA_ACTIVE);
diff --git a/src/fm/fmd/fm_mds.cc b/src/fm/fmd/fm_mds.cc
index 277a357d2..60db5dab1 100644
--- a/src/fm/fmd/fm_mds.cc
+++ b/src/fm/fmd/fm_mds.cc
@@ -373,6 +373,7 @@ static uint32_t fm_mds_node_evt(FM_CB *cb,
 case NCSMDS_NODE_DOWN:
   if (cb->cluster_size != 0) {
 --cb->cluster_size;
+TRACE("cluster_size %llu", (unsigned long long)cb->cluster_size);
 TRACE("Node down event for node id %x, cluster size is now: %llu",
   node_evt->node_id, (unsigned long long)cb->cluster_size);
 check_for_node_isolation(cb);
@@ -397,6 +398,7 @@ static uint32_t fm_mds_node_evt(FM_CB *cb,
 
 case NCSMDS_NODE_UP:
   ++cb->cluster_size;
+  TRACE("cluster_size %llu", (unsigned long long)cb->cluster_size);
   TRACE("Node up event for node id %x, cluster size is now: %llu",
 node_evt->node_id, (unsigned 

[devel] [PATCH 2/5] osaf: add lock takeover request fuction [#2795]

2018-04-11 Thread Gary Lee
- add create and set (if previous value matches) functions to KeyValue class
- add Consensus::MonitorTakeoverRequest() function for use by RDE to answer 
takeover requests
- add Consensus::CreateTakeoverRequest() - before a SC is promoted to active, 
it will
  create a takeover request in the KV store. An existing SC can reject the lock 
takeover
---
 src/osaf/consensus/consensus.cc | 435 ++--
 src/osaf/consensus/consensus.h  |  55 -
 src/osaf/consensus/key_value.cc | 105 +++---
 src/osaf/consensus/key_value.h  |  19 +-
 4 files changed, 511 insertions(+), 103 deletions(-)

diff --git a/src/osaf/consensus/consensus.cc b/src/osaf/consensus/consensus.cc
index cc04f3518..f2245dd01 100644
--- a/src/osaf/consensus/consensus.cc
+++ b/src/osaf/consensus/consensus.cc
@@ -15,13 +15,17 @@
 #include "osaf/consensus/consensus.h"
 #include 
 #include 
+#include 
 #include 
 #include "base/conf.h"
 #include "base/getenv.h"
 #include "base/logtrace.h"
 #include "base/ncssysf_def.h"
 
-SaAisErrorT Consensus::PromoteThisNode() {
+const std::string Consensus::kTakeoverRequestKeyname = "takeover_request";
+
+SaAisErrorT Consensus::PromoteThisNode(const bool graceful_takeover,
+const uint64_t cluster_size) {
   TRACE_ENTER();
   SaAisErrorT rc;
 
@@ -29,6 +33,10 @@ SaAisErrorT Consensus::PromoteThisNode() {
 return SA_AIS_OK;
   }
 
+  // check if there is an existing takeover requests, we cannot
+  // attempt to lock until that is complete
+  CheckForExistingTakeoverRequest();
+
   uint32_t retries = 0;
   rc = KeyValue::Lock(base::Conf::NodeName(), kLockTimeout);
   while (rc == SA_AIS_ERR_TRY_AGAIN && retries < kMaxRetry) {
@@ -39,33 +47,30 @@ SaAisErrorT Consensus::PromoteThisNode() {
   }
 
   if (rc == SA_AIS_ERR_EXIST) {
+// there's a chance the lock has been released since the lock attempt
 // get the current active controller
-std::string current_active("");
-retries = 0;
-rc = KeyValue::LockOwner(current_active);
-while (rc != SA_AIS_OK && retries < kMaxRetry) {
-  ++retries;
-  std::this_thread::sleep_for(kSleepInterval);
-  rc = KeyValue::LockOwner(current_active);
-}
-if (rc != SA_AIS_OK) {
-  LOG_ER("Failed to get current lock owner. Will attempt to lock anyway");
+std::string current_active = CurrentActive();
+
+if (current_active.empty() == true) {
+  LOG_WA("Failed to get current lock owner. Will attempt to lock anyway");
 }
 
+bool take_over_request_created = false;
 LOG_NO("Current active controller is %s", current_active.c_str());
 
-// there's a chance the lock has been released since the lock attempt
 if (current_active.empty() == false) {
-  // remove current active controller's lock and fence it
-  retries = 0;
-  rc = KeyValue::Unlock(current_active);
-  while (rc == SA_AIS_ERR_TRY_AGAIN && retries < kMaxRetry) {
-LOG_IN("Trying to unlock");
-++retries;
-std::this_thread::sleep_for(kSleepInterval);
-rc = KeyValue::Unlock(current_active);
+  if (graceful_takeover == true) {
+rc = CreateTakeoverRequest(current_active, base::Conf::NodeName(),
+   cluster_size);
+if (rc != SA_AIS_OK) {
+  LOG_WA("Takeover request failed (%d)", rc);
+  return rc;
+}
+take_over_request_created = true;
   }
 
+  // remove current active controller's lock and fence it
+  rc = Demote(current_active);
   if (rc == SA_AIS_OK) {
 FenceNode(current_active);
   } else {
@@ -82,6 +87,23 @@ SaAisErrorT Consensus::PromoteThisNode() {
   std::this_thread::sleep_for(kSleepInterval);
   rc = KeyValue::Lock(base::Conf::NodeName(), kLockTimeout);
 }
+
+if (take_over_request_created == true) {
+  SaAisErrorT rc1;
+
+  // remove takeover request
+  rc1 = KeyValue::Erase(kTakeoverRequestKeyname);
+  retries = 0;
+  while (rc1 != SA_AIS_OK && retries < kMaxRetry) {
+++retries;
+std::this_thread::sleep_for(kSleepInterval);
+rc1 = KeyValue::Erase(kTakeoverRequestKeyname);
+  }
+
+  if (rc1 != SA_AIS_OK) {
+LOG_WA("Could not remove takeover request");
+  }
+}
   }
 
   if (rc == SA_AIS_OK) {
@@ -93,43 +115,23 @@ SaAisErrorT Consensus::PromoteThisNode() {
   return rc;
 }
 
-SaAisErrorT Consensus::Demote(const std::string& node = "") {
+SaAisErrorT Consensus::Demote(const std::string& node) {
   TRACE_ENTER();
   if (use_consensus_ == false) {
 return SA_AIS_OK;
   }
 
-  SaAisErrorT rc = SA_AIS_ERR_FAILED_OPERATION;
-  uint32_t retries = 0;
-
-  // check current active node
-  std::string current_active;
-  rc = KeyValue::LockOwner(current_active);
-  while (rc != SA_AIS_OK && retries < kMaxRetry) {
-++retries;
-std::this_thread::sleep_for(kSleepInterval);
-rc = KeyValue::LockOwner(current_active);
-  }
-
-  if (rc != SA_AIS_OK) {
-LOG_ER("Failed to get