[devel] [PATCH 0/1] Review Request for rded: run controller promotion code in new thread V2 [#2857]
Summary: rded: run controller promotion code in new thread V2 [#2857]
Review request for Ticket(s): 2857
Peer Reviewer(s): Hans, Ravi
Pull request to: *** LIST THE PERSON WITH PUSH ACCESS HERE ***
Affected branch(es): develop
Development branch: ticket-2857
Base revision: e0bcf786e0b3417d31b767073bb789ef150eb2ad
Personal repository: git://git.code.sf.net/u/userid-2226215/review

Impacted area          Impact y/n
---------------------------------
Docs                       n
Build system               n
RPM/packaging              n
Configuration files        n
Startup scripts            n
SAF services               n
OpenSAF services           y
Core libraries             n
Samples                    n
Tests                      n
Other                      n

Comments (indicate scope for each "y" above):
---------------------------------------------
- V2: the only change is the addition of a single line:
    + election_end_time_ = base::kTimespecMax;
  This ensures we don't launch another thread before SetRole.

revision 047b0545824d8a6118a98b933ea8ed89ebab4a3a
Author: Gary Lee
Date:   Thu, 24 May 2018 14:12:46 +1000

    rded: run controller promotion code in new thread [#2857]

    Currently, the consensus code relating to node promotion is run from
    the main thread. We can improve rded's responsiveness by moving this
    code into another thread.

Complete diffstat:
------------------
 src/rde/rded/rde_cb.h    |  3 +-
 src/rde/rded/rde_main.cc |  6 +++-
 src/rde/rded/role.cc     | 83 +++-
 src/rde/rded/role.h      |  2 ++
 4 files changed, 62 insertions(+), 32 deletions(-)

Testing Commands:
-----------------
- Legacy tests

Testing, Expected Results:
--------------------------
Pass

Conditions of Submission:
-------------------------
- Ack from any reviewer, or in 7 days

Arch        Built   Started   Linux distro
------------------------------------------
mips          n        n
mips64        n        n
x86           n        n
x86_64        y        y
powerpc       n        n
powerpc64     n        n

Reviewer Checklist:
-------------------
[Submitters: make sure that your review doesn't trigger any checkmarks!]

Your checkin has not passed review because (see checked entries):

___ Your RR template is generally incomplete; it has too many blank entries
    that need proper data filled in.
___ You have failed to nominate the proper persons for review and push.
___ Your patches do not have proper short+long headers.
___ You have grammar/spelling in your header that is unacceptable.
___ You have exceeded a sensible line length in your headers/comments/text.
___ You have failed to put a proper Trac ticket # in your commits.
___ You have incorrectly put/left internal data in your comments/files
    (e.g. internal bug tracking tool IDs, product names etc.)
___ You have not given any evidence of testing beyond basic build tests.
    Demonstrate some level of runtime or other sanity testing.
___ You have ^M present in some of your files. These have to be removed.
___ You have needlessly changed whitespace or added whitespace crimes
    like trailing spaces, or spaces before tabs.
___ You have mixed real technical changes with whitespace and other
    cosmetic code cleanup changes. These have to be separate commits.
___ You need to refactor your submission into logical chunks; there is
    too much content in a single commit.
___ You have extraneous garbage in your review (merge commits etc.)
___ You have giant attachments which should never have been sent;
    instead, you should place your content in a public tree to be pulled.
___ You have too many commits attached to an e-mail; resend as threaded
    commits, or place them in a public tree for a pull.
___ You have resent this content multiple times without a clear indication
    of what has changed between each re-send.
___ You have failed to adequately and individually address all of the
    comments and change requests that were proposed in the initial review.
___ You have a misconfigured ~/.gitconfig file (e.g. user.name, user.email etc.)
___ Your computer has a badly configured date and time, confusing the
    threaded patch review.
___ Your changes affect the IPC mechanism, and you don't present any
    results for an in-service upgradability test.
___ Your changes affect the user manual and documentation, but your patch
    series does not contain the patch that updates the Doxygen manual.

--
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
_______________________________________________
Opensaf-devel mailing list
Opensaf-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-devel
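The V2 change pushes the election deadline to "never" before the promotion thread is launched, so a later Poll() cannot start a second attempt while the first one is still running. A minimal C++ sketch of that guard follows. It assumes a hypothetical ElectionTimer class with Start()/Poll() and a launch_promotion callback, with std::chrono's time_point::max() standing in for base::kTimespecMax; it is not rded's actual Role::Poll().

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <utility>

// Hypothetical sketch: names are illustrative, not rded's API.
class ElectionTimer {
 public:
  using Clock = std::chrono::steady_clock;

  explicit ElectionTimer(std::function<void()> launch_promotion)
      : launch_promotion_(std::move(launch_promotion)) {}

  // Arm the timer; promotion may start once 'deadline' has passed.
  void Start(Clock::time_point deadline) { deadline_ = deadline; }

  // Called from the main loop. The deadline is reset to "never"
  // (cf. election_end_time_ = base::kTimespecMax) *before* launching,
  // so later polls cannot launch a second promotion attempt.
  void Poll(Clock::time_point now) {
    if (now < deadline_) return;  // election still running, or disarmed
    deadline_ = Clock::time_point::max();
    launch_promotion_();
  }

 private:
  std::function<void()> launch_promotion_;
  Clock::time_point deadline_ = Clock::time_point::max();  // disarmed
};
```

For example, with a deadline one second out, Poll(t0) does nothing, Poll(t0 + 2s) launches exactly once, and every later Poll() is a no-op until Start() re-arms the timer.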
[devel] [PATCH 1/1] rded: run controller promotion code in new thread [#2857]
Currently, the consensus code relating to node promotion is run from the
main thread. We can improve rded's responsiveness by moving this code
into another thread.
---
 src/rde/rded/rde_cb.h    |  3 +-
 src/rde/rded/rde_main.cc |  6 +++-
 src/rde/rded/role.cc     | 83 +++-
 src/rde/rded/role.h      |  2 ++
 4 files changed, 62 insertions(+), 32 deletions(-)

diff --git a/src/rde/rded/rde_cb.h b/src/rde/rded/rde_cb.h
index f5ad689c3..877687341 100644
--- a/src/rde/rded/rde_cb.h
+++ b/src/rde/rded/rde_cb.h
@@ -53,7 +53,8 @@ enum RDE_MSG_TYPE {
   RDE_MSG_NEW_ACTIVE_CALLBACK = 5,
   RDE_MSG_NODE_UP = 6,
   RDE_MSG_NODE_DOWN = 7,
-  RDE_MSG_TAKEOVER_REQUEST_CALLBACK = 8
+  RDE_MSG_TAKEOVER_REQUEST_CALLBACK = 8,
+  RDE_MSG_ACTIVE_PROMOTION_SUCCESS = 9
 };
 
 struct rde_peer_info {
diff --git a/src/rde/rded/rde_main.cc b/src/rde/rded/rde_main.cc
index c5b4b8283..c59aa4536 100644
--- a/src/rde/rded/rde_main.cc
+++ b/src/rde/rded/rde_main.cc
@@ -55,7 +55,8 @@ const char *rde_msg_name[] = {"-",
                               "RDE_MSG_NEW_ACTIVE_CALLBACK(5)"
                               "RDE_MSG_NODE_UP(6)",
                               "RDE_MSG_NODE_DOWN(7)",
-                              "RDE_MSG_TAKEOVER_REQUEST_CALLBACK(8)"};
+                              "RDE_MSG_TAKEOVER_REQUEST_CALLBACK(8)",
+                              "RDE_MSG_ACTIVE_PROMOTION_SUCCESS(9)"};
 
 static RDE_CONTROL_BLOCK _rde_cb;
 static RDE_CONTROL_BLOCK *rde_cb = &_rde_cb;
@@ -186,6 +187,9 @@ static void handle_mbx_event() {
         LOG_WA("Received takeover request when not active");
       }
     } break;
+    case RDE_MSG_ACTIVE_PROMOTION_SUCCESS:
+      role->NodePromoted();
+      break;
     default:
       LOG_ER("%s: discarding unknown message type %u", __FUNCTION__, msg->type);
       break;
diff --git a/src/rde/rded/role.cc b/src/rde/rded/role.cc
index 1b5a6ae89..a03372413 100644
--- a/src/rde/rded/role.cc
+++ b/src/rde/rded/role.cc
@@ -22,6 +22,7 @@
 #include "rde/rded/role.h"
 #include
 #include
+#include
 #include "base/getenv.h"
 #include "base/logtrace.h"
 #include "base/ncs_main_papi.h"
@@ -63,6 +64,55 @@ void Role::MonitorCallback(const std::string& key, const std::string& new_value,
   osafassert(status == NCSCC_RC_SUCCESS);
 }
 
+void Role::PromoteNode(const uint64_t cluster_size) {
+  TRACE_ENTER();
+  SaAisErrorT rc;
+
+  Consensus consensus_service;
+
+  rc = consensus_service.PromoteThisNode(true, cluster_size);
+  if (rc != SA_AIS_OK && rc != SA_AIS_ERR_EXIST) {
+    LOG_ER("Unable to set active controller in consensus service");
+    opensaf_reboot(0, nullptr,
+                   "Unable to set active controller in consensus service");
+  }
+
+  if (rc == SA_AIS_ERR_EXIST) {
+    LOG_WA("Another controller is already active");
+    return;
+  }
+
+  RDE_CONTROL_BLOCK* cb = rde_get_control_block();
+
+  // send msg to main thread
+  rde_msg* msg = static_cast<rde_msg*>(malloc(sizeof(rde_msg)));
+  msg->type = RDE_MSG_ACTIVE_PROMOTION_SUCCESS;
+  uint32_t status;
+  status = m_NCS_IPC_SEND(&cb->mbx, msg, NCS_IPC_PRIORITY_HIGH);
+  osafassert(status == NCSCC_RC_SUCCESS);
+}
+
+void Role::NodePromoted() {
+  ExecutePreActiveScript();
+  LOG_NO("Switched to ACTIVE from %s", to_string(role()));
+  role_ = PCS_RDA_ACTIVE;
+  rde_rda_send_role(role_);
+
+  Consensus consensus_service;
+  RDE_CONTROL_BLOCK* cb = rde_get_control_block();
+
+  // register for callback if active controller is changed
+  // in consensus service
+  if (cb->monitor_lock_thread_running == false) {
+    cb->monitor_lock_thread_running = true;
+    consensus_service.MonitorLock(MonitorCallback, cb->mbx);
+  }
+  if (cb->monitor_takeover_req_thread_running == false) {
+    cb->monitor_takeover_req_thread_running = true;
+    consensus_service.MonitorTakeoverRequest(MonitorCallback, cb->mbx);
+  }
+}
+
 Role::Role(NODE_ID own_node_id)
     : known_nodes_{},
       role_{PCS_RDA_QUIESCED},
@@ -82,37 +132,10 @@ timespec* Role::Poll(timespec* ts) {
       *ts = election_end_time_ - now;
       timeout = ts;
     } else {
+      election_end_time_ = base::kTimespecMax;
       RDE_CONTROL_BLOCK* cb = rde_get_control_block();
-      SaAisErrorT rc;
-      Consensus consensus_service;
-
-      rc = consensus_service.PromoteThisNode(true, cb->cluster_members.size());
-      if (rc != SA_AIS_OK && rc != SA_AIS_ERR_EXIST) {
-        LOG_ER("Unable to set active controller in consensus service");
-        opensaf_reboot(0, nullptr,
-                       "Unable to set active controller in consensus service");
-      }
-
-      if (rc == SA_AIS_ERR_EXIST) {
-        LOG_WA("Another controller is already active");
-        return timeout;
-      }
-
-      ExecutePreActiveScript();
-      LOG_NO("Switched to ACTIVE from %s", to_string(role()));
-      role_ = PCS_RDA_ACTIVE;
-      rde_rda_send_role(role_);
-
-      // register for callback if active controller is changed
-      // in consensus service
-
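The patch splits promotion into two halves: PromoteNode() makes the blocking consensus call on a separate thread and, on success, posts RDE_MSG_ACTIVE_PROMOTION_SUCCESS to the main thread's mailbox; NodePromoted() then changes the role on the main thread. The sketch below illustrates that hand-off pattern only. Mailbox is a hypothetical std::condition_variable queue standing in for the NCS IPC mailbox, and try_promote is a placeholder for Consensus::PromoteThisNode(); none of these names come from rded.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <queue>
#include <thread>

// Hypothetical message type; rded itself uses RDE_MSG_TYPE values.
enum class MsgType { kActivePromotionSuccess };

// Minimal thread-safe mailbox standing in for the NCS IPC mailbox.
class Mailbox {
 public:
  void Send(MsgType msg) {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      queue_.push(msg);
    }
    cv_.notify_one();
  }

  // Blocks until a message is available, then removes and returns it.
  MsgType Receive() {
    std::unique_lock<std::mutex> lock(mutex_);
    cv_.wait(lock, [this] { return !queue_.empty(); });
    MsgType msg = queue_.front();
    queue_.pop();
    return msg;
  }

 private:
  std::mutex mutex_;
  std::condition_variable cv_;
  std::queue<MsgType> queue_;
};

// Worker thread: runs the potentially slow consensus call off the main
// thread. 'try_promote' is a placeholder; only success is reported.
void PromoteNodeWorker(Mailbox* mbx, bool (*try_promote)()) {
  if (try_promote()) mbx->Send(MsgType::kActivePromotionSuccess);
}

// Main thread: the only place the role changes, so the role state
// itself needs no lock.
struct RoleState {
  bool active = false;
};

void HandleMessage(RoleState* role, MsgType msg) {
  switch (msg) {
    case MsgType::kActivePromotionSuccess:
      role->active = true;  // cf. Role::NodePromoted()
      break;
  }
}
```

Keeping every write to the role on the main thread is what lets the worker stay simple: it never touches shared role state, it only posts a message.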
Re: [devel] [PATCH 1/1] base: Destructor of TraceLog causes coredump V2 [#2860]
Hi Minh,

Yes, you are right about the possibility of a segv, but using a std::shared_ptr instead of the naked pointer may be an option?

/Thanks Hans

From: Minh Hon Chau
Sent: 24 May 2018 02:34:13
To: Hans Nordebäck; Anders Widell; Gary Lee
Cc: opensaf-devel@lists.sourceforge.net
Subject: Re: [PATCH 1/1] base: Destructor of TraceLog causes coredump V2 [#2860]

Hi Hans,

It is good to give the Mutex class an option not to abort. We can avoid the abort in mutex_unlock (as reported in the coredump), but I feel the issue is still there. We may hit a problem (segv?) in "mutex_->good()", since the other thread is wiping out mutex_ in the destructor; it is just a matter of timing before it happens, I guess. As we don't have (and don't want to have) any protection between the two threads for TraceLog, the better fix (I hope) is to make one of those threads not touch TraceLog. If you don't want to remove the destructor, another way is to allocate gl_trace/gl_log on the heap?

Thanks,
Minh

On 23/05/18 20:50, Hans Nordeback wrote:
> Change Mutex class to make it possible for caller to decide if abort
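A minimal sketch of Hans's std::shared_ptr suggestion, assuming a hypothetical SharedMutexLog class in place of TraceLog: the mutex is owned through a std::shared_ptr, and a logging thread takes its own copy before locking, so the owner dropping its reference (as the destructor would) does not free the mutex underneath it. One caveat: copying a shared_ptr member while another thread resets that same member is still a data race, so in real code the copy and the reset would need std::atomic_load/std::atomic_store on the shared_ptr (or std::atomic<std::shared_ptr> in C++20).

```cpp
#include <cassert>
#include <memory>
#include <mutex>

// Hypothetical stand-in for TraceLog, owning its mutex via std::shared_ptr
// instead of a raw pointer.
class SharedMutexLog {
 public:
  SharedMutexLog() : mutex_(std::make_shared<std::mutex>()) {}

  // A logging thread takes its own reference before locking; that copy keeps
  // the mutex alive even if the owner drops its reference afterwards.
  // (Not safe to call concurrently with Release() without atomic access.)
  std::shared_ptr<std::mutex> AcquireMutexRef() const { return mutex_; }

  // Stand-in for what the destructor would do: release the owner's reference.
  void Release() { mutex_.reset(); }

 private:
  std::shared_ptr<std::mutex> mutex_;
};
```

The mutex is destroyed only when the last reference goes away, whichever thread holds it.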
Re: [devel] [PATCH 1/1] base: Destructor of TraceLog causes coredump V2 [#2860]
Hi Hans,

It is good to give the Mutex class an option not to abort. We can avoid the abort in mutex_unlock (as reported in the coredump), but I feel the issue is still there. We may hit a problem (segv?) in "mutex_->good()", since the other thread is wiping out mutex_ in the destructor; it is just a matter of timing before it happens, I guess. As we don't have (and don't want to have) any protection between the two threads for TraceLog, the better fix (I hope) is to make one of those threads not touch TraceLog. If you don't want to remove the destructor, another way is to allocate gl_trace/gl_log on the heap?

Thanks,
Minh

On 23/05/18 20:50, Hans Nordeback wrote:
> Change Mutex class to make it possible for caller to decide if abort
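Minh's alternative is to place gl_trace/gl_log on the heap so that no destructor runs during process exit while another thread may still be logging. A common C++ form of this is an intentionally leaked singleton. The sketch below assumes a hypothetical Tracer class and GlobalTracer() accessor; it is not OpenSAF's TraceLog.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <string>
#include <vector>

// Hypothetical stand-in for TraceLog.
class Tracer {
 public:
  void Log(const std::string& line) {
    std::lock_guard<std::mutex> lock(mutex_);
    lines_.push_back(line);
  }
  std::size_t size() {
    std::lock_guard<std::mutex> lock(mutex_);
    return lines_.size();
  }

 private:
  std::mutex mutex_;
  std::vector<std::string> lines_;
};

// Allocated on the heap on first use and never deleted: no destructor runs
// at exit, so a thread that is still logging never sees a destroyed mutex.
// Initialization of a function-local static is thread-safe since C++11.
Tracer& GlobalTracer() {
  static Tracer* const instance = new Tracer();
  return *instance;
}
```

The trade-off is that anything the destructor would have done (flushing, closing sockets) no longer happens; the OS reclaims the memory when the process ends.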
[devel] [PATCH 0/1] Review Request for base: Destructor of TraceLog causes coredump V2 [#2860]
Summary: base: Destructor of TraceLog causes coredump V2 [#2860]
Review request for Ticket(s): 2860
Peer Reviewer(s): Minh, Gary, AndersW
Pull request to:
Affected branch(es): develop
Development branch: ticket-2860
Base revision: e0bcf786e0b3417d31b767073bb789ef150eb2ad
Personal repository: git://git.code.sf.net/u/hansnordeback/review

Impacted area          Impact y/n
---------------------------------
Docs                       n
Build system               n
RPM/packaging              n
Configuration files        n
Startup scripts            n
SAF services               n
OpenSAF services           n
Core libraries             y
Samples                    n
Tests                      n
Other                      n

Comments (indicate scope for each "y" above):
---------------------------------------------
*** EXPLAIN/COMMENT THE PATCH SERIES HERE ***

revision c8e1aced8519c4f77819b1dbc92bdedfcf4734ea
Author: Hans Nordeback
Date:   Wed, 23 May 2018 12:36:40 +0200

    base: Destructor of TraceLog causes coredump V2 [#2860]

    Change Mutex class to make it possible for caller to decide if abort

Complete diffstat:
------------------
 src/base/logtrace_client.cc |  5 -
 src/base/mutex.cc           |  2 +-
 src/base/mutex.h            | 22 +-
 3 files changed, 18 insertions(+), 11 deletions(-)

Testing Commands:
-----------------
*** LIST THE COMMAND LINE TOOLS/STEPS TO TEST YOUR CHANGES ***

Testing, Expected Results:
--------------------------
*** PASTE COMMAND OUTPUTS / TEST RESULTS ***

Conditions of Submission:
-------------------------
*** HOW MANY DAYS BEFORE PUSHING, CONSENSUS ETC ***

Arch        Built   Started   Linux distro
------------------------------------------
mips          n        n
mips64        n        n
x86           n        n
x86_64        y        n
powerpc       n        n
powerpc64     n        n
[devel] [PATCH 1/1] base: Destructor of TraceLog causes coredump V2 [#2860]
Change Mutex class to make it possible for caller to decide if abort
---
 src/base/logtrace_client.cc |  5 -
 src/base/mutex.cc           |  2 +-
 src/base/mutex.h            | 22 +-
 3 files changed, 18 insertions(+), 11 deletions(-)

diff --git a/src/base/logtrace_client.cc b/src/base/logtrace_client.cc
index 0dac6d389..f597c1ae3 100644
--- a/src/base/logtrace_client.cc
+++ b/src/base/logtrace_client.cc
@@ -76,7 +76,7 @@ bool TraceLog::Init(const char *msg_id, WriteMode mode) {
   msg_id_ = base::LogMessage::MsgId{msg_id};
   log_socket_ = new base::UnixClientSocket{Osaflog::kServerSocketPath,
                                            static_cast(mode)};
-  mutex_ = new base::Mutex{};
+  mutex_ = new base::Mutex{false};
 
   return true;
 }
@@ -91,6 +91,9 @@ void TraceLog::Log(base::LogMessage::Severity severity, const char *fmt,
 void TraceLog::LogInternal(base::LogMessage::Severity severity, const char *fmt,
                            va_list ap) {
   base::Lock lock(*mutex_);
+
+  if (!mutex_->good()) return;
+
   uint32_t id = sequence_id_;
   sequence_id_ = id < kMaxSequenceId ? id + 1 : 1;
   buffer_.clear();
diff --git a/src/base/mutex.cc b/src/base/mutex.cc
index 5fa6ac55a..1627ac20b 100644
--- a/src/base/mutex.cc
+++ b/src/base/mutex.cc
@@ -20,7 +20,7 @@
 
 namespace base {
 
-Mutex::Mutex() : mutex_{} {
+Mutex::Mutex(bool abort) : abort_{abort}, mutex_{}, result_{0} {
   pthread_mutexattr_t attr;
   int result = pthread_mutexattr_init(&attr);
   if (result != 0) osaf_abort(result);
diff --git a/src/base/mutex.h b/src/base/mutex.h
index 7b3cee187..e3c54a711 100644
--- a/src/base/mutex.h
+++ b/src/base/mutex.h
@@ -31,30 +31,34 @@ namespace base {
 class Mutex {
  public:
   using NativeHandleType = pthread_mutex_t*;
-  Mutex();
+  Mutex(bool abort = true);
   ~Mutex();
   void Lock() {
-    int result = pthread_mutex_lock(&mutex_);
-    if (result != 0) osaf_abort(result);
+    result_ = pthread_mutex_lock(&mutex_);
+    if (abort_ && result_ != 0) osaf_abort(result_);
   }
   bool TryLock() {
-    int result = pthread_mutex_trylock(&mutex_);
-    if (result == 0) {
+    result_ = pthread_mutex_trylock(&mutex_);
+    if (result_ == 0) {
       return true;
-    } else if (result == EBUSY) {
+    } else if (result_ == EBUSY) {
       return false;
     } else {
-      osaf_abort(result);
+      if (abort_) osaf_abort(result_);
+      return false;
     }
   }
   void Unlock() {
-    int result = pthread_mutex_unlock(&mutex_);
-    if (result != 0) osaf_abort(result);
+    result_ = pthread_mutex_unlock(&mutex_);
+    if (abort_ && result_ != 0) osaf_abort(result_);
   }
   NativeHandleType native_handle() { return &mutex_; }
 
+  bool good() const {return result_ == 0;};
  private:
+  bool abort_;
   pthread_mutex_t mutex_;
+  int result_;
   DELETE_COPY_AND_MOVE_OPERATORS(Mutex);
 };
-- 
2.17.0
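The patch lets the owner of a base::Mutex choose between aborting and continuing on a pthread error, recording the last result in result_ for good(). The sketch below is one possible variation on that idea, not the patch itself: a hypothetical CheckedMutex that returns the pthread error code from each call instead of storing it in a member. That avoids reading a value other threads can write concurrently (for instance, Unlock() in the patch assigns result_ after the mutex has already been released). It assumes an error-checking mutex type (PTHREAD_MUTEX_ERRORCHECK), for which POSIX specifies that unlocking a mutex the calling thread does not hold returns EPERM.

```cpp
#include <cassert>
#include <cerrno>
#include <cstdlib>
#include <pthread.h>

// Hypothetical sketch, not OpenSAF's base::Mutex.
class CheckedMutex {
 public:
  explicit CheckedMutex(bool abort_on_error = true)
      : abort_on_error_(abort_on_error) {
    // Return codes of the setup calls are ignored for brevity.
    pthread_mutexattr_t attr;
    pthread_mutexattr_init(&attr);
    // Error-checking type: misuse such as unlocking a mutex this thread does
    // not hold returns an error code instead of being undefined behaviour.
    pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
    pthread_mutex_init(&mutex_, &attr);
    pthread_mutexattr_destroy(&attr);
  }
  ~CheckedMutex() { pthread_mutex_destroy(&mutex_); }
  CheckedMutex(const CheckedMutex&) = delete;
  CheckedMutex& operator=(const CheckedMutex&) = delete;

  // Each call returns the pthread result (0 on success) to its own caller,
  // so no error state is shared between threads.
  int Lock() { return Check(pthread_mutex_lock(&mutex_)); }
  int Unlock() { return Check(pthread_mutex_unlock(&mutex_)); }

 private:
  int Check(int result) const {
    if (result != 0 && abort_on_error_) std::abort();
    return result;
  }

  bool abort_on_error_;
  pthread_mutex_t mutex_;
};
```

With abort_on_error set to false, a second Unlock() on an already unlocked mutex reports EPERM to the caller instead of terminating the process.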
Re: [devel] [PATCH 1/1] rded: run controller promotion code in new thread [#2857]
Hi Gary,

Ack, code review only.

Regards,
Ravi

----- Original Message -----
From: gary@dektech.com.au
To: hans.nordeb...@ericsson.com, ravisekhar.ko...@oracle.com, anders.wid...@ericsson.com
Cc: opensaf-devel@lists.sourceforge.net, gary@dektech.com.au
Sent: Friday, May 18, 2018 11:20:34 AM GMT +05:30 Chennai, Kolkata, Mumbai, New Delhi
Subject: [PATCH 1/1] rded: run controller promotion code in new thread [#2857]

Currently, the consensus code relating to node promotion is run from the
main thread. We can improve rded's responsiveness by moving this code
into another thread.
---
 src/rde/rded/rde_cb.h    |  3 +-
 src/rde/rded/rde_main.cc |  6 +++-
 src/rde/rded/role.cc     | 82 ++--
 src/rde/rded/role.h      |  2 ++
 4 files changed, 61 insertions(+), 32 deletions(-)

diff --git a/src/rde/rded/rde_cb.h b/src/rde/rded/rde_cb.h
index f5ad689c3..877687341 100644
--- a/src/rde/rded/rde_cb.h
+++ b/src/rde/rded/rde_cb.h
@@ -53,7 +53,8 @@ enum RDE_MSG_TYPE {
   RDE_MSG_NEW_ACTIVE_CALLBACK = 5,
   RDE_MSG_NODE_UP = 6,
   RDE_MSG_NODE_DOWN = 7,
-  RDE_MSG_TAKEOVER_REQUEST_CALLBACK = 8
+  RDE_MSG_TAKEOVER_REQUEST_CALLBACK = 8,
+  RDE_MSG_ACTIVE_PROMOTION_SUCCESS = 9
 };
 
 struct rde_peer_info {
diff --git a/src/rde/rded/rde_main.cc b/src/rde/rded/rde_main.cc
index c5b4b8283..c59aa4536 100644
--- a/src/rde/rded/rde_main.cc
+++ b/src/rde/rded/rde_main.cc
@@ -55,7 +55,8 @@ const char *rde_msg_name[] = {"-",
                               "RDE_MSG_NEW_ACTIVE_CALLBACK(5)"
                               "RDE_MSG_NODE_UP(6)",
                               "RDE_MSG_NODE_DOWN(7)",
-                              "RDE_MSG_TAKEOVER_REQUEST_CALLBACK(8)"};
+                              "RDE_MSG_TAKEOVER_REQUEST_CALLBACK(8)",
+                              "RDE_MSG_ACTIVE_PROMOTION_SUCCESS(9)"};
 
 static RDE_CONTROL_BLOCK _rde_cb;
 static RDE_CONTROL_BLOCK *rde_cb = &_rde_cb;
@@ -186,6 +187,9 @@ static void handle_mbx_event() {
         LOG_WA("Received takeover request when not active");
       }
     } break;
+    case RDE_MSG_ACTIVE_PROMOTION_SUCCESS:
+      role->NodePromoted();
+      break;
     default:
       LOG_ER("%s: discarding unknown message type %u", __FUNCTION__, msg->type);
       break;
diff --git a/src/rde/rded/role.cc b/src/rde/rded/role.cc
index 1b5a6ae89..b6a5df51a 100644
--- a/src/rde/rded/role.cc
+++ b/src/rde/rded/role.cc
@@ -22,6 +22,7 @@
 #include "rde/rded/role.h"
 #include
 #include
+#include
 #include "base/getenv.h"
 #include "base/logtrace.h"
 #include "base/ncs_main_papi.h"
@@ -63,6 +64,55 @@ void Role::MonitorCallback(const std::string& key, const std::string& new_value,
   osafassert(status == NCSCC_RC_SUCCESS);
 }
 
+void Role::PromoteNode(const uint64_t cluster_size) {
+  TRACE_ENTER();
+  SaAisErrorT rc;
+
+  Consensus consensus_service;
+
+  rc = consensus_service.PromoteThisNode(true, cluster_size);
+  if (rc != SA_AIS_OK && rc != SA_AIS_ERR_EXIST) {
+    LOG_ER("Unable to set active controller in consensus service");
+    opensaf_reboot(0, nullptr,
+                   "Unable to set active controller in consensus service");
+  }
+
+  if (rc == SA_AIS_ERR_EXIST) {
+    LOG_WA("Another controller is already active");
+    return;
+  }
+
+  RDE_CONTROL_BLOCK* cb = rde_get_control_block();
+
+  // send msg to main thread
+  rde_msg* msg = static_cast<rde_msg*>(malloc(sizeof(rde_msg)));
+  msg->type = RDE_MSG_ACTIVE_PROMOTION_SUCCESS;
+  uint32_t status;
+  status = m_NCS_IPC_SEND(&cb->mbx, msg, NCS_IPC_PRIORITY_HIGH);
+  osafassert(status == NCSCC_RC_SUCCESS);
+}
+
+void Role::NodePromoted() {
+  ExecutePreActiveScript();
+  LOG_NO("Switched to ACTIVE from %s", to_string(role()));
+  role_ = PCS_RDA_ACTIVE;
+  rde_rda_send_role(role_);
+
+  Consensus consensus_service;
+  RDE_CONTROL_BLOCK* cb = rde_get_control_block();
+
+  // register for callback if active controller is changed
+  // in consensus service
+  if (cb->monitor_lock_thread_running == false) {
+    cb->monitor_lock_thread_running = true;
+    consensus_service.MonitorLock(MonitorCallback, cb->mbx);
+  }
+  if (cb->monitor_takeover_req_thread_running == false) {
+    cb->monitor_takeover_req_thread_running = true;
+    consensus_service.MonitorTakeoverRequest(MonitorCallback, cb->mbx);
+  }
+}
+
 Role::Role(NODE_ID own_node_id)
     : known_nodes_{},
       role_{PCS_RDA_QUIESCED},
@@ -83,36 +133,8 @@ timespec* Role::Poll(timespec* ts) {
       timeout = ts;
     } else {
       RDE_CONTROL_BLOCK* cb = rde_get_control_block();
-      SaAisErrorT rc;
-      Consensus consensus_service;
-
-      rc = consensus_service.PromoteThisNode(true, cb->cluster_members.size());
-      if (rc != SA_AIS_OK && rc != SA_AIS_ERR_EXIST) {
-        LOG_ER("Unable to set active controller in consensus service");
-        opensaf_reboot(0, nullptr,
-                       "Unable to set active controller in consensus service");
-      }
-
-      if (rc == SA_AIS_ERR_EXIST) {
-