[devel] [PATCH 0/1] Review Request for rded: run controller promotion code in new thread V2 [#2857]

2018-05-23 Thread Gary Lee
Summary: rded: run controller promotion code in new thread V2 [#2857]
Review request for Ticket(s): 2857
Peer Reviewer(s): Hans, Ravi 
Pull request to: *** LIST THE PERSON WITH PUSH ACCESS HERE ***
Affected branch(es): develop
Development branch: ticket-2857
Base revision: e0bcf786e0b3417d31b767073bb789ef150eb2ad
Personal repository: git://git.code.sf.net/u/userid-2226215/review


Impacted area           Impact y/n

 Docs                        n
 Build system                n
 RPM/packaging               n
 Configuration files         n
 Startup scripts             n
 SAF services                n
 OpenSAF services            y
 Core libraries              n
 Samples                     n
 Tests                       n
 Other                       n


Comments (indicate scope for each "y" above):
-
V2: the only change is the addition of a single line:

+  election_end_time_ = base::kTimespecMax;

This ensures that Poll() does not launch another promotion thread
before SetRole() is called.
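The effect of that line can be illustrated with a minimal C++ sketch that assumes a simplified version of the deadline check in Role::Poll(). Once the election deadline has passed, the deadline is moved to a "never" sentinel before the promotion starts, so later polls only wait. ElectionTimer, promotions_started(), and the use of std::chrono::steady_clock::time_point::max() in place of timespec and base::kTimespecMax are hypothetical stand-ins, not rded code.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical stand-in for the deadline check in Role::Poll(); not rded code.
// steady_clock::time_point::max() plays the role of base::kTimespecMax.
class ElectionTimer {
 public:
  using Clock = std::chrono::steady_clock;

  explicit ElectionTimer(Clock::time_point end) : election_end_time_{end} {}

  // Returns true only on the poll that starts the promotion.
  bool Poll(Clock::time_point now) {
    if (election_end_time_ > now) return false;  // election still running
    // Deadline reached: move it to the "never" sentinel *before* starting
    // the promotion, so later polls keep waiting instead of starting another.
    election_end_time_ = Clock::time_point::max();
    ++promotions_started_;  // the real code would launch the worker thread here
    return true;
  }

  int promotions_started() const { return promotions_started_; }

 private:
  Clock::time_point election_end_time_;
  int promotions_started_ = 0;
};
```

Without the sentinel assignment, every poll after the deadline would start another promotion until something else reset election_end_time_.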

revision 047b0545824d8a6118a98b933ea8ed89ebab4a3a
Author: Gary Lee 
Date:   Thu, 24 May 2018 14:12:46 +1000

rded: run controller promotion code in new thread [#2857]

Currently, the consensus code relating to node promotion
is run from the main thread. We can improve rded's
responsiveness by moving this code into another thread.
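The hand-off this patch introduces can be sketched in plain C++. The blocking consensus call runs in a worker thread, the result is posted to the main thread's mailbox, and only the main thread changes the role. Mailbox, Msg, RoleState, and the std::function callback are hypothetical stand-ins for rded's NCS IPC mailbox, rde_msg, and Role. This is an illustration under those assumptions, not the rded implementation.

```cpp
#include <cassert>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>

// Hypothetical message type, loosely modelled on RDE_MSG_ACTIVE_PROMOTION_SUCCESS.
enum class Msg { kPromotionSuccess };

// Minimal thread-safe mailbox standing in for rded's NCS IPC mailbox (cb->mbx).
class Mailbox {
 public:
  void Send(Msg msg) {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      queue_.push(msg);
    }
    cv_.notify_one();
  }

  // Blocks until a message is available; kept simple for the sketch.
  Msg Receive() {
    std::unique_lock<std::mutex> lock(mutex_);
    cv_.wait(lock, [this] { return !queue_.empty(); });
    Msg msg = queue_.front();
    queue_.pop();
    return msg;
  }

 private:
  std::mutex mutex_;
  std::condition_variable cv_;
  std::queue<Msg> queue_;
};

// Worker thread: performs the slow, blocking consensus call so the main
// thread stays responsive, and only reports success back via the mailbox.
void PromoteNode(Mailbox* mbx, std::function<bool()> promote_this_node) {
  if (!promote_this_node()) return;  // e.g. another controller is already active
  mbx->Send(Msg::kPromotionSuccess);
}

// Main-thread state: changed only in response to a mailbox message.
struct RoleState {
  bool active = false;
  void NodePromoted() { active = true; }
};
```

Because only the main thread changes the role, the worker thread needs access to the mailbox alone, not to the role state. This mirrors how PromoteNode() sends RDE_MSG_ACTIVE_PROMOTION_SUCCESS and NodePromoted() does the rest.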



Complete diffstat:
--
 src/rde/rded/rde_cb.h    |  3 +-
 src/rde/rded/rde_main.cc |  6 +++-
 src/rde/rded/role.cc     | 83 +++-
 src/rde/rded/role.h      |  2 ++
 4 files changed, 62 insertions(+), 32 deletions(-)


Testing Commands:
-
Legacy tests

Testing, Expected Results:
--
Pass

Conditions of Submission:
-
Ack from any reviewer, or in 7 days


Arch      Built     Started    Linux distro
---
mips        n          n
mips64      n          n
x86         n          n
x86_64      y          y
powerpc     n          n
powerpc64   n          n


Reviewer Checklist:
---
[Submitters: make sure that your review doesn't trigger any checkmarks!]


Your checkin has not passed review because (see checked entries):

___ Your RR template is generally incomplete; it has too many blank entries
that need proper data filled in.

___ You have failed to nominate the proper persons for review and push.

___ Your patches do not have proper short+long header

___ You have grammar/spelling in your header that is unacceptable.

___ You have exceeded a sensible line length in your headers/comments/text.

___ You have failed to put in a proper Trac Ticket # into your commits.

___ You have incorrectly put/left internal data in your comments/files
(i.e. internal bug tracking tool IDs, product names etc)

___ You have not given any evidence of testing beyond basic build tests.
Demonstrate some level of runtime or other sanity testing.

___ You have ^M present in some of your files. These have to be removed.

___ You have needlessly changed whitespace or added whitespace crimes
like trailing spaces, or spaces before tabs.

___ You have mixed real technical changes with whitespace and other
cosmetic code cleanup changes. These have to be separate commits.

___ You need to refactor your submission into logical chunks; there is
too much content in a single commit.

___ You have extraneous garbage in your review (merge commits etc)

___ You have giant attachments which should never have been sent;
instead, you should place your content in a public tree to be pulled.

___ You have too many commits attached to an e-mail; resend as threaded
commits, or place in a public tree for a pull.

___ You have resent this content multiple times without a clear indication
of what has changed between each re-send.

___ You have failed to adequately and individually address all of the
comments and change requests that were proposed in the initial review.

___ You have a misconfigured ~/.gitconfig file (i.e. user.name, user.email etc)

___ Your computer has a badly configured date and time, confusing the
threaded patch review.

___ Your changes affect the IPC mechanism, and you don't present any results
for an in-service upgradability test.

___ Your changes affect the user manual and documentation, but your patch
series does not contain the patch that updates the Doxygen manual.


--
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
___
Opensaf-devel mailing list
Opensaf-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-devel


[devel] [PATCH 1/1] rded: run controller promotion code in new thread [#2857]

2018-05-23 Thread Gary Lee
Currently, the consensus code relating to node promotion
is run from the main thread. We can improve rded's
responsiveness by moving this code into another thread.
---
 src/rde/rded/rde_cb.h    |  3 +-
 src/rde/rded/rde_main.cc |  6 +++-
 src/rde/rded/role.cc     | 83 +++-
 src/rde/rded/role.h      |  2 ++
 4 files changed, 62 insertions(+), 32 deletions(-)

diff --git a/src/rde/rded/rde_cb.h b/src/rde/rded/rde_cb.h
index f5ad689c3..877687341 100644
--- a/src/rde/rded/rde_cb.h
+++ b/src/rde/rded/rde_cb.h
@@ -53,7 +53,8 @@ enum RDE_MSG_TYPE {
   RDE_MSG_NEW_ACTIVE_CALLBACK = 5,
   RDE_MSG_NODE_UP = 6,
   RDE_MSG_NODE_DOWN = 7,
-  RDE_MSG_TAKEOVER_REQUEST_CALLBACK = 8
+  RDE_MSG_TAKEOVER_REQUEST_CALLBACK = 8,
+  RDE_MSG_ACTIVE_PROMOTION_SUCCESS = 9
 };
 
 struct rde_peer_info {
diff --git a/src/rde/rded/rde_main.cc b/src/rde/rded/rde_main.cc
index c5b4b8283..c59aa4536 100644
--- a/src/rde/rded/rde_main.cc
+++ b/src/rde/rded/rde_main.cc
@@ -55,7 +55,8 @@ const char *rde_msg_name[] = {"-",
   "RDE_MSG_NEW_ACTIVE_CALLBACK(5)"
   "RDE_MSG_NODE_UP(6)",
   "RDE_MSG_NODE_DOWN(7)",
-  "RDE_MSG_TAKEOVER_REQUEST_CALLBACK(8)"};
+  "RDE_MSG_TAKEOVER_REQUEST_CALLBACK(8)",
+  "RDE_MSG_ACTIVE_PROMOTION_SUCCESS(9)"};
 
 static RDE_CONTROL_BLOCK _rde_cb;
 static RDE_CONTROL_BLOCK *rde_cb = &_rde_cb;
@@ -186,6 +187,9 @@ static void handle_mbx_event() {
 LOG_WA("Received takeover request when not active");
   }
 } break;
+case RDE_MSG_ACTIVE_PROMOTION_SUCCESS:
+  role->NodePromoted();
+  break;
 default:
  LOG_ER("%s: discarding unknown message type %u", __FUNCTION__, msg->type);
   break;
diff --git a/src/rde/rded/role.cc b/src/rde/rded/role.cc
index 1b5a6ae89..a03372413 100644
--- a/src/rde/rded/role.cc
+++ b/src/rde/rded/role.cc
@@ -22,6 +22,7 @@
 #include "rde/rded/role.h"
 #include 
 #include 
+#include 
 #include "base/getenv.h"
 #include "base/logtrace.h"
 #include "base/ncs_main_papi.h"
@@ -63,6 +64,55 @@ void Role::MonitorCallback(const std::string& key, const std::string& new_value,
   osafassert(status == NCSCC_RC_SUCCESS);
 }
 
+void Role::PromoteNode(const uint64_t cluster_size) {
+  TRACE_ENTER();
+  SaAisErrorT rc;
+
+  Consensus consensus_service;
+
+  rc = consensus_service.PromoteThisNode(true, cluster_size);
+  if (rc != SA_AIS_OK && rc != SA_AIS_ERR_EXIST) {
+    LOG_ER("Unable to set active controller in consensus service");
+    opensaf_reboot(0, nullptr,
+                   "Unable to set active controller in consensus service");
+  }
+
+  if (rc == SA_AIS_ERR_EXIST) {
+    LOG_WA("Another controller is already active");
+    return;
+  }
+
+  RDE_CONTROL_BLOCK* cb = rde_get_control_block();
+
+  // send msg to main thread
+  rde_msg* msg = static_cast<rde_msg*>(malloc(sizeof(rde_msg)));
+  msg->type = RDE_MSG_ACTIVE_PROMOTION_SUCCESS;
+  uint32_t status;
+  status = m_NCS_IPC_SEND(&cb->mbx, msg, NCS_IPC_PRIORITY_HIGH);
+  osafassert(status == NCSCC_RC_SUCCESS);
+}
+
+void Role::NodePromoted() {
+  ExecutePreActiveScript();
+  LOG_NO("Switched to ACTIVE from %s", to_string(role()));
+  role_ = PCS_RDA_ACTIVE;
+  rde_rda_send_role(role_);
+
+  Consensus consensus_service;
+  RDE_CONTROL_BLOCK* cb = rde_get_control_block();
+
+  // register for callback if active controller is changed
+  // in consensus service
+  if (cb->monitor_lock_thread_running == false) {
+    cb->monitor_lock_thread_running = true;
+    consensus_service.MonitorLock(MonitorCallback, cb->mbx);
+  }
+  if (cb->monitor_takeover_req_thread_running == false) {
+    cb->monitor_takeover_req_thread_running = true;
+    consensus_service.MonitorTakeoverRequest(MonitorCallback, cb->mbx);
+  }
+}
+
 Role::Role(NODE_ID own_node_id)
 : known_nodes_{},
   role_{PCS_RDA_QUIESCED},
@@ -82,37 +132,10 @@ timespec* Role::Poll(timespec* ts) {
   *ts = election_end_time_ - now;
   timeout = ts;
 } else {
+  election_end_time_ = base::kTimespecMax;
   RDE_CONTROL_BLOCK* cb = rde_get_control_block();
-  SaAisErrorT rc;
-  Consensus consensus_service;
-
-  rc = consensus_service.PromoteThisNode(true, cb->cluster_members.size());
-  if (rc != SA_AIS_OK && rc != SA_AIS_ERR_EXIST) {
-    LOG_ER("Unable to set active controller in consensus service");
-    opensaf_reboot(0, nullptr,
-                   "Unable to set active controller in consensus service");
-  }
-
-  if (rc == SA_AIS_ERR_EXIST) {
-    LOG_WA("Another controller is already active");
-    return timeout;
-  }
-
-  ExecutePreActiveScript();
-  LOG_NO("Switched to ACTIVE from %s", to_string(role()));
-  role_ = PCS_RDA_ACTIVE;
-  rde_rda_send_role(role_);
-
-  // register for callback if active controller is changed
-  // in consensus service
-  

Re: [devel] [PATCH 1/1] base: Destructor of TraceLog causes coredump V2 [#2860]

2018-05-23 Thread Hans Nordebäck
Hi Minh,


Yes, you are right about the possibility of a segv, but would using a
std::shared_ptr instead of the naked pointer be an option?


/Thanks Hans
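Here is a minimal sketch of the std::shared_ptr idea. It uses a hypothetical SharedMutexLog class rather than the real TraceLog. Each log call takes its own reference to the mutex before locking it, so resetting the member cannot free the mutex under a thread that is still using it. Reading and resetting the shared pointer itself must also be race-free, which is why the sketch uses the std::atomic_load/std::atomic_store overloads for std::shared_ptr (C++11; deprecated in C++20 in favour of std::atomic<std::shared_ptr<T>>).

```cpp
#include <cassert>
#include <memory>
#include <mutex>

// Hypothetical logger, not OpenSAF's TraceLog. The mutex is owned through a
// std::shared_ptr, and every access to that pointer goes through the atomic
// shared_ptr free functions.
class SharedMutexLog {
 public:
  SharedMutexLog() : mutex_{std::make_shared<std::mutex>()} {}

  // Returns false if the logger has already been shut down.
  bool Log() {
    // Take a private reference first: the mutex now stays alive until this
    // call returns, even if Shutdown() runs concurrently in another thread.
    std::shared_ptr<std::mutex> m = std::atomic_load(&mutex_);
    if (!m) return false;
    std::lock_guard<std::mutex> lock(*m);
    ++lines_written_;  // the real code would format and send a message here
    return true;
  }

  // Used instead of "delete mutex_" in a destructor: drops this object's
  // reference; the mutex is freed once the last in-flight Log() releases it.
  void Shutdown() { std::atomic_store(&mutex_, std::shared_ptr<std::mutex>{}); }

  // Read only when no other thread is logging.
  int lines_written() const { return lines_written_; }

 private:
  std::shared_ptr<std::mutex> mutex_;
  int lines_written_ = 0;
};
```

This only keeps the mutex alive. The rest of the object (lines_written_ here; the buffer and socket in TraceLog) must still outlive every thread that can call Log(). That is the same lifetime problem Minh raises about the destructor.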


From: Minh Hon Chau
Sent: 24 May 2018 02:34:13
To: Hans Nordebäck; Anders Widell; Gary Lee
Cc: opensaf-devel@lists.sourceforge.net
Subject: Re: [PATCH 1/1] base: Destructor of TraceLog causes coredump V2 [#2860]

Hi Hans,

It is good to give the Mutex class an option not to abort. We can avoid
the abort in mutex_unlock (as reported in the coredump), but I feel the
issue is still there.

We may hit a problem (segv?) with "mutex_->good()", since the other
thread is wiping out mutex_ in the destructor; I guess it is only a
matter of timing before that happens.

As we don't have (and don't want to have) any protection between the
two threads for the TraceLog, the better solution (I hope) is to make
sure one of those threads does not touch the TraceLog.

If you don't want to remove the destructor, another way would be to
allocate gl_trace/gl_log on the heap?

Thanks,

Minh






Re: [devel] [PATCH 1/1] base: Destructor of TraceLog causes coredump V2 [#2860]

2018-05-23 Thread Minh Hon Chau

Hi Hans,

It is good to give the Mutex class an option not to abort. We can avoid
the abort in mutex_unlock (as reported in the coredump), but I feel the
issue is still there.

We may hit a problem (segv?) with "mutex_->good()", since the other
thread is wiping out mutex_ in the destructor; I guess it is only a
matter of timing before that happens.

As we don't have (and don't want to have) any protection between the
two threads for the TraceLog, the better solution (I hope) is to make
sure one of those threads does not touch the TraceLog.

If you don't want to remove the destructor, another way would be to
allocate gl_trace/gl_log on the heap?


Thanks,

Minh
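One reading of the heap suggestion is the intentionally leaked singleton. The object is created with new on first use and never deleted, so no destructor is left to race with threads that are still logging at process exit. Below is a minimal sketch that assumes a hypothetical TraceSink class and GlobalTrace() accessor in place of gl_trace/gl_log:

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <string>
#include <vector>

// Hypothetical stand-in for TraceLog; not OpenSAF code.
class TraceSink {
 public:
  void Write(const std::string& line) {
    std::lock_guard<std::mutex> lock(mutex_);
    lines_.push_back(line);
  }
  std::size_t size() {
    std::lock_guard<std::mutex> lock(mutex_);
    return lines_.size();
  }

 private:
  std::mutex mutex_;
  std::vector<std::string> lines_;
};

// Allocated on the heap on first use and intentionally never deleted, so no
// destructor runs during static destruction at exit. A thread that is still
// logging while the process exits therefore never sees a destroyed object.
// (Initialization of a function-local static is thread-safe since C++11.)
TraceSink& GlobalTrace() {
  static TraceSink* const instance = new TraceSink();
  return *instance;
}
```

The trade-off is that the object is never cleaned up, so some leak checkers may report it as still reachable at exit.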







[devel] [PATCH 0/1] Review Request for base: Destructor of TraceLog causes coredump V2 [#2860]

2018-05-23 Thread Hans Nordeback
Summary: base: Destructor of TraceLog causes coredump V2 [#2860]
Review request for Ticket(s): 2860
Peer Reviewer(s): Minh, Gary, AndersW
Pull request to:
Affected branch(es): develop
Development branch: ticket-2860
Base revision: e0bcf786e0b3417d31b767073bb789ef150eb2ad
Personal repository: git://git.code.sf.net/u/hansnordeback/review


Impacted area           Impact y/n

 Docs                        n
 Build system                n
 RPM/packaging               n
 Configuration files         n
 Startup scripts             n
 SAF services                n
 OpenSAF services            n
 Core libraries              y
 Samples                     n
 Tests                       n
 Other                       n


Comments (indicate scope for each "y" above):
-
*** EXPLAIN/COMMENT THE PATCH SERIES HERE ***

revision c8e1aced8519c4f77819b1dbc92bdedfcf4734ea
Author: Hans Nordeback 
Date:   Wed, 23 May 2018 12:36:40 +0200

base: Destructor of TraceLog causes coredump V2 [#2860]

Change the Mutex class so that the caller can decide whether to abort on error
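Here is a stand-alone sketch of the idea: a mutex whose owner decides at construction time whether a pthread error aborts the process or is handed back to the caller. It assumes an error-checking mutex (PTHREAD_MUTEX_ERRORCHECK), so that misuse returns an error code; whether base::Mutex sets that type is not visible in this patch. CheckedMutex is a hypothetical name. Unlike the patch, the sketch returns the result from each call rather than storing it in a member, so one thread's result cannot be overwritten by another thread before it is checked.

```cpp
#include <cassert>
#include <cerrno>  // EPERM, EDEADLK
#include <cstdio>
#include <cstdlib>
#include <pthread.h>

// Hypothetical stand-alone sketch; not OpenSAF's base::Mutex.
class CheckedMutex {
 public:
  explicit CheckedMutex(bool abort_on_error = true) : abort_{abort_on_error} {
    pthread_mutexattr_t attr;
    pthread_mutexattr_init(&attr);
    // Error-checking type: misuse (relock by owner, unlock by non-owner)
    // returns an error code instead of deadlocking or being undefined.
    pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
    pthread_mutex_init(&mutex_, &attr);
    pthread_mutexattr_destroy(&attr);
  }
  ~CheckedMutex() { pthread_mutex_destroy(&mutex_); }
  CheckedMutex(const CheckedMutex&) = delete;
  CheckedMutex& operator=(const CheckedMutex&) = delete;

  // Each call returns 0 on success or the pthread error code; the caller
  // chose at construction time whether an error aborts the process instead.
  int Lock() { return Check(pthread_mutex_lock(&mutex_)); }
  int Unlock() { return Check(pthread_mutex_unlock(&mutex_)); }

 private:
  int Check(int result) {
    if (result != 0 && abort_) {
      std::fprintf(stderr, "mutex error: %d\n", result);
      std::abort();
    }
    return result;
  }

  bool abort_;
  pthread_mutex_t mutex_;
};
```

Constructed with abort_on_error = true, the same misuse would abort the process instead. This matches the default (abort = true) that the patch keeps for other callers.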



Complete diffstat:
--
 src/base/logtrace_client.cc |  5 -
 src/base/mutex.cc           |  2 +-
 src/base/mutex.h            | 22 +-
 3 files changed, 18 insertions(+), 11 deletions(-)


Testing Commands:
-
*** LIST THE COMMAND LINE TOOLS/STEPS TO TEST YOUR CHANGES ***


Testing, Expected Results:
--
*** PASTE COMMAND OUTPUTS / TEST RESULTS ***


Conditions of Submission:
-
*** HOW MANY DAYS BEFORE PUSHING, CONSENSUS ETC ***


Arch      Built     Started    Linux distro
---
mips        n          n
mips64      n          n
x86         n          n
x86_64      y          n
powerpc     n          n
powerpc64   n          n


Reviewer Checklist:
---
[Submitters: make sure that your review doesn't trigger any checkmarks!]


Your checkin has not passed review because (see checked entries):

___ Your RR template is generally incomplete; it has too many blank entries
that need proper data filled in.

___ You have failed to nominate the proper persons for review and push.

___ Your patches do not have proper short+long header

___ You have grammar/spelling in your header that is unacceptable.

___ You have exceeded a sensible line length in your headers/comments/text.

___ You have failed to put in a proper Trac Ticket # into your commits.

___ You have incorrectly put/left internal data in your comments/files
(i.e. internal bug tracking tool IDs, product names etc)

___ You have not given any evidence of testing beyond basic build tests.
Demonstrate some level of runtime or other sanity testing.

___ You have ^M present in some of your files. These have to be removed.

___ You have needlessly changed whitespace or added whitespace crimes
like trailing spaces, or spaces before tabs.

___ You have mixed real technical changes with whitespace and other
cosmetic code cleanup changes. These have to be separate commits.

___ You need to refactor your submission into logical chunks; there is
too much content in a single commit.

___ You have extraneous garbage in your review (merge commits etc)

___ You have giant attachments which should never have been sent;
instead, you should place your content in a public tree to be pulled.

___ You have too many commits attached to an e-mail; resend as threaded
commits, or place in a public tree for a pull.

___ You have resent this content multiple times without a clear indication
of what has changed between each re-send.

___ You have failed to adequately and individually address all of the
comments and change requests that were proposed in the initial review.

___ You have a misconfigured ~/.gitconfig file (i.e. user.name, user.email etc)

___ Your computer has a badly configured date and time, confusing the
threaded patch review.

___ Your changes affect the IPC mechanism, and you don't present any results
for an in-service upgradability test.

___ Your changes affect the user manual and documentation, but your patch
series does not contain the patch that updates the Doxygen manual.




[devel] [PATCH 1/1] base: Destructor of TraceLog causes coredump V2 [#2860]

2018-05-23 Thread Hans Nordeback
Change the Mutex class so that the caller can decide whether to abort on error
---
 src/base/logtrace_client.cc |  5 -
 src/base/mutex.cc           |  2 +-
 src/base/mutex.h            | 22 +-
 3 files changed, 18 insertions(+), 11 deletions(-)

diff --git a/src/base/logtrace_client.cc b/src/base/logtrace_client.cc
index 0dac6d389..f597c1ae3 100644
--- a/src/base/logtrace_client.cc
+++ b/src/base/logtrace_client.cc
@@ -76,7 +76,7 @@ bool TraceLog::Init(const char *msg_id, WriteMode mode) {
   msg_id_ = base::LogMessage::MsgId{msg_id};
   log_socket_ = new base::UnixClientSocket{Osaflog::kServerSocketPath,
 static_cast(mode)};
-  mutex_ = new base::Mutex{};
+  mutex_ = new base::Mutex{false};
 
   return true;
 }
@@ -91,6 +91,9 @@ void TraceLog::Log(base::LogMessage::Severity severity, const char *fmt,
 void TraceLog::LogInternal(base::LogMessage::Severity severity, const char *fmt,
                            va_list ap) {
   base::Lock lock(*mutex_);
+
+  if (!mutex_->good()) return;
+
   uint32_t id = sequence_id_;
   sequence_id_ = id < kMaxSequenceId ? id + 1 : 1;
   buffer_.clear();
diff --git a/src/base/mutex.cc b/src/base/mutex.cc
index 5fa6ac55a..1627ac20b 100644
--- a/src/base/mutex.cc
+++ b/src/base/mutex.cc
@@ -20,7 +20,7 @@
 
 namespace base {
 
-Mutex::Mutex() : mutex_{} {
+Mutex::Mutex(bool abort) : abort_{abort}, mutex_{}, result_{0} {
   pthread_mutexattr_t attr;
   int result = pthread_mutexattr_init(&attr);
   if (result != 0) osaf_abort(result);
diff --git a/src/base/mutex.h b/src/base/mutex.h
index 7b3cee187..e3c54a711 100644
--- a/src/base/mutex.h
+++ b/src/base/mutex.h
@@ -31,30 +31,34 @@ namespace base {
 class Mutex {
  public:
   using NativeHandleType = pthread_mutex_t*;
-  Mutex();
+  Mutex(bool abort = true);
   ~Mutex();
   void Lock() {
-    int result = pthread_mutex_lock(&mutex_);
-    if (result != 0) osaf_abort(result);
+    result_ = pthread_mutex_lock(&mutex_);
+    if (abort_ && result_ != 0) osaf_abort(result_);
   }
   bool TryLock() {
-    int result = pthread_mutex_trylock(&mutex_);
-    if (result == 0) {
+    result_ = pthread_mutex_trylock(&mutex_);
+    if (result_ == 0) {
       return true;
-    } else if (result == EBUSY) {
+    } else if (result_ == EBUSY) {
       return false;
     } else {
-      osaf_abort(result);
+      if (abort_) osaf_abort(result_);
+      return false;
     }
   }
   void Unlock() {
-    int result = pthread_mutex_unlock(&mutex_);
-    if (result != 0) osaf_abort(result);
+    result_ = pthread_mutex_unlock(&mutex_);
+    if (abort_ && result_ != 0) osaf_abort(result_);
   }
   NativeHandleType native_handle() { return &mutex_; }
 
+  bool good() const {return result_ == 0;};
  private:
+  bool abort_;
   pthread_mutex_t mutex_;
+  int result_;
   DELETE_COPY_AND_MOVE_OPERATORS(Mutex);
 };
 
-- 
2.17.0




Re: [devel] [PATCH 1/1] rded: run controller promotion code in new thread [#2857]

2018-05-23 Thread Ravi Sekhar Reddy Konda
Hi Gary,

Ack, code review only 

Regards,
Ravi
- Original Message -
From: gary@dektech.com.au
To: hans.nordeb...@ericsson.com, ravisekhar.ko...@oracle.com, 
anders.wid...@ericsson.com
Cc: opensaf-devel@lists.sourceforge.net, gary@dektech.com.au
Sent: Friday, May 18, 2018 11:20:34 AM GMT +05:30 Chennai, Kolkata, Mumbai, New Delhi
Subject: [PATCH 1/1] rded: run controller promotion code in new thread [#2857]

Currently, the consensus code relating to node promotion
is run from the main thread. We can improve rded's
responsiveness by moving this code into another thread.
---
 src/rde/rded/rde_cb.h    |  3 +-
 src/rde/rded/rde_main.cc |  6 +++-
 src/rde/rded/role.cc     | 82 ++--
 src/rde/rded/role.h      |  2 ++
 4 files changed, 61 insertions(+), 32 deletions(-)

diff --git a/src/rde/rded/rde_cb.h b/src/rde/rded/rde_cb.h
index f5ad689c3..877687341 100644
--- a/src/rde/rded/rde_cb.h
+++ b/src/rde/rded/rde_cb.h
@@ -53,7 +53,8 @@ enum RDE_MSG_TYPE {
   RDE_MSG_NEW_ACTIVE_CALLBACK = 5,
   RDE_MSG_NODE_UP = 6,
   RDE_MSG_NODE_DOWN = 7,
-  RDE_MSG_TAKEOVER_REQUEST_CALLBACK = 8
+  RDE_MSG_TAKEOVER_REQUEST_CALLBACK = 8,
+  RDE_MSG_ACTIVE_PROMOTION_SUCCESS = 9
 };
 
 struct rde_peer_info {
diff --git a/src/rde/rded/rde_main.cc b/src/rde/rded/rde_main.cc
index c5b4b8283..c59aa4536 100644
--- a/src/rde/rded/rde_main.cc
+++ b/src/rde/rded/rde_main.cc
@@ -55,7 +55,8 @@ const char *rde_msg_name[] = {"-",
   "RDE_MSG_NEW_ACTIVE_CALLBACK(5)"
   "RDE_MSG_NODE_UP(6)",
   "RDE_MSG_NODE_DOWN(7)",
-  "RDE_MSG_TAKEOVER_REQUEST_CALLBACK(8)"};
+  "RDE_MSG_TAKEOVER_REQUEST_CALLBACK(8)",
+  "RDE_MSG_ACTIVE_PROMOTION_SUCCESS(9)"};
 
 static RDE_CONTROL_BLOCK _rde_cb;
 static RDE_CONTROL_BLOCK *rde_cb = &_rde_cb;
@@ -186,6 +187,9 @@ static void handle_mbx_event() {
 LOG_WA("Received takeover request when not active");
   }
 } break;
+case RDE_MSG_ACTIVE_PROMOTION_SUCCESS:
+  role->NodePromoted();
+  break;
 default:
  LOG_ER("%s: discarding unknown message type %u", __FUNCTION__, msg->type);
   break;
diff --git a/src/rde/rded/role.cc b/src/rde/rded/role.cc
index 1b5a6ae89..b6a5df51a 100644
--- a/src/rde/rded/role.cc
+++ b/src/rde/rded/role.cc
@@ -22,6 +22,7 @@
 #include "rde/rded/role.h"
 #include 
 #include 
+#include 
 #include "base/getenv.h"
 #include "base/logtrace.h"
 #include "base/ncs_main_papi.h"
@@ -63,6 +64,55 @@ void Role::MonitorCallback(const std::string& key, const std::string& new_value,
   osafassert(status == NCSCC_RC_SUCCESS);
 }
 
+void Role::PromoteNode(const uint64_t cluster_size) {
+  TRACE_ENTER();
+  SaAisErrorT rc;
+
+  Consensus consensus_service;
+
+  rc = consensus_service.PromoteThisNode(true, cluster_size);
+  if (rc != SA_AIS_OK && rc != SA_AIS_ERR_EXIST) {
+    LOG_ER("Unable to set active controller in consensus service");
+    opensaf_reboot(0, nullptr,
+                   "Unable to set active controller in consensus service");
+  }
+
+  if (rc == SA_AIS_ERR_EXIST) {
+    LOG_WA("Another controller is already active");
+    return;
+  }
+
+  RDE_CONTROL_BLOCK* cb = rde_get_control_block();
+
+  // send msg to main thread
+  rde_msg* msg = static_cast<rde_msg*>(malloc(sizeof(rde_msg)));
+  msg->type = RDE_MSG_ACTIVE_PROMOTION_SUCCESS;
+  uint32_t status;
+  status = m_NCS_IPC_SEND(&cb->mbx, msg, NCS_IPC_PRIORITY_HIGH);
+  osafassert(status == NCSCC_RC_SUCCESS);
+}
+
+void Role::NodePromoted() {
+  ExecutePreActiveScript();
+  LOG_NO("Switched to ACTIVE from %s", to_string(role()));
+  role_ = PCS_RDA_ACTIVE;
+  rde_rda_send_role(role_);
+
+  Consensus consensus_service;
+  RDE_CONTROL_BLOCK* cb = rde_get_control_block();
+
+  // register for callback if active controller is changed
+  // in consensus service
+  if (cb->monitor_lock_thread_running == false) {
+    cb->monitor_lock_thread_running = true;
+    consensus_service.MonitorLock(MonitorCallback, cb->mbx);
+  }
+  if (cb->monitor_takeover_req_thread_running == false) {
+    cb->monitor_takeover_req_thread_running = true;
+    consensus_service.MonitorTakeoverRequest(MonitorCallback, cb->mbx);
+  }
+}
+
 Role::Role(NODE_ID own_node_id)
 : known_nodes_{},
   role_{PCS_RDA_QUIESCED},
@@ -83,36 +133,8 @@ timespec* Role::Poll(timespec* ts) {
   timeout = ts;
 } else {
   RDE_CONTROL_BLOCK* cb = rde_get_control_block();
-  SaAisErrorT rc;
-  Consensus consensus_service;
-
-  rc = consensus_service.PromoteThisNode(true, cb->cluster_members.size());
-  if (rc != SA_AIS_OK && rc != SA_AIS_ERR_EXIST) {
-    LOG_ER("Unable to set active controller in consensus service");
-    opensaf_reboot(0, nullptr,
-                   "Unable to set active controller in consensus service");
-  }
-
-  if (rc == SA_AIS_ERR_EXIST) {
-