[devel] [PATCH 0/2] Review Request for base: Add support for OpenSAF local log file [#2306]
Summary: base: Add support to direct OpenSAF logging to local node file [#2306] Review request for Ticket(s): 2306 Peer Reviewer(s): Anders, Hans, Ravi Pull request to: *** LIST THE PERSON WITH PUSH ACCESS HERE *** Affected branch(es): develop Development branch: ticket-2306 Base revision: aff54ff091727f27830443332b830890668749cf Personal repository: git://git.code.sf.net/u/minh-chau/review Impacted area Impact y/n Docsn Build systemn RPM/packaging n Configuration files n Startup scripts n SAF servicesy OpenSAF servicesn Core libraries n Samples n Tests n Other n Comments (indicate scope for each "y" above): - *** EXPLAIN/COMMENT THE PATCH SERIES HERE *** revision f7f80c0372017bb3b5b94c927e08a86adf61e286 Author: Minh ChauDate: Thu, 12 Apr 2018 09:02:46 +1000 base: Example of OSAF_LOCAL_LOG_FILE environment variable in amfd.conf file [#2306] This commit is only an example to show where OSAF_LOCAL_LOG_FILE is configured. This commit will not be pushed. The introduction of OSAF_LOCAL_LOG_FILE for all services will be in another commit. revision 9ad63ddd3259950ab270aa9797742fbb801a6594 Author: Minh Chau Date: Thu, 12 Apr 2018 08:36:05 +1000 base: Add support to direct OpenSAF logging to local node file [#2306] Unify TraceLog and MdsLog class to one class (TraceLog) so it can be used as a common log client. Add new instance TraceLog for OpenSAF logging to local file, which can be enabled/disabled via environment variable OSAF_LOCAL_NODE_LOG Complete diffstat: -- src/amf/amfd/amfd.conf | 6 ++ src/base/logtrace.cc | 167 + src/base/logtrace.h| 50 +-- src/mds/mds_log.cc | 114 +++-- 4 files changed, 146 insertions(+), 191 deletions(-) Testing Commands: - *** LIST THE COMMAND LINE TOOLS/STEPS TO TEST YOUR CHANGES *** Testing, Expected Results: -- *** PASTE COMMAND OUTPUTS / TEST RESULTS *** Conditions of Submission: - ack from reviewers Arch Built StartedLinux distro --- mipsn n mips64 n n x86 n n x86_64 y y powerpc n n powerpc64 n n Reviewer Checklist: --- [Submitters: make sure that your review doesn't trigger any checkmarks!] Your checkin has not passed review because (see checked entries): ___ Your RR template is generally incomplete; it has too many blank entries that need proper data filled in. ___ You have failed to nominate the proper persons for review and push. ___ Your patches do not have proper short+long header ___ You have grammar/spelling in your header that is unacceptable. ___ You have exceeded a sensible line length in your headers/comments/text. ___ You have failed to put in a proper Trac Ticket # into your commits. ___ You have incorrectly put/left internal data in your comments/files (i.e. internal bug tracking tool IDs, product names etc) ___ You have not given any evidence of testing beyond basic build tests. Demonstrate some level of runtime or other sanity testing. ___ You have ^M present in some of your files. These have to be removed. ___ You have needlessly changed whitespace or added whitespace crimes like trailing spaces, or spaces before tabs. ___ You have mixed real technical changes with whitespace and other cosmetic code cleanup changes. These have to be separate commits. ___ You need to refactor your submission into logical chunks; there is too much content into a single commit. ___ You have extraneous garbage in your review (merge commits etc) ___ You have giant attachments which should never have been sent; Instead you should place your content in a public tree to be pulled. ___ You have too many commits attached to an e-mail; resend as threaded commits, or place in a public tree for a pull. ___ You have resent this content multiple times without a clear indication of what has changed between each re-send. ___ You have failed to adequately and individually address all of the comments and change requests that were proposed in the initial review. ___ You have a misconfigured ~/.gitconfig file (i.e. user.name, user.email etc) ___ Your computer have a badly configured date and time; confusing the the threaded patch review. ___ Your changes affect IPC mechanism, and you don't present any results for in-service upgradability test. ___ Your changes affect user manual and documentation, your patch series do not contain the patch that
[devel] [PATCH 2/2] base: Example of OSAF_LOCAL_LOG_FILE environment variable in amfd.conf file [#2306]
This commit is only an example to show where OSAF_LOCAL_LOG_FILE is configured. This commit will not be pushed. The introduction of OSAF_LOCAL_LOG_FILE for all services will be in another commit. --- src/amf/amfd/amfd.conf | 6 ++ 1 file changed, 6 insertions(+) diff --git a/src/amf/amfd/amfd.conf b/src/amf/amfd/amfd.conf index 6ee111b..a5e4c21 100644 --- a/src/amf/amfd/amfd.conf +++ b/src/amf/amfd/amfd.conf @@ -19,3 +19,9 @@ export AVSV_HB_PERIOD=100 # Uncomment the next line to enable info level logging #args="--loglevel=info" + +# Uncomment the next line to enable this service to log to OpenSAF local node log file +# Only logging priority with equal or higher than LOG_WARNING to system log file +# All logging will be recored in new local node log file +# The log file is $PKGLOGDIR/osaf.log +export OSAF_LOCAL_NODE_LOG=1 \ No newline at end of file -- 2.7.4 -- Check out the vibrant tech community on one of the world's most engaging tech sites, Slashdot.org! http://sdm.link/slashdot ___ Opensaf-devel mailing list Opensaf-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/opensaf-devel
[devel] [PATCH 1/2] base: Add support to direct OpenSAF logging to local node file [#2306]
Unify TraceLog and MdsLog class to one class (TraceLog) so it can be used as a common log client. Add new instance TraceLog for OpenSAF logging to local file, which can be enabled/disabled via environment variable OSAF_LOCAL_NODE_LOG --- src/base/logtrace.cc | 167 ++- src/base/logtrace.h | 50 +-- src/mds/mds_log.cc | 114 +++ 3 files changed, 140 insertions(+), 191 deletions(-) diff --git a/src/base/logtrace.cc b/src/base/logtrace.cc index b046fab..857e31c 100644 --- a/src/base/logtrace.cc +++ b/src/base/logtrace.cc @@ -36,15 +36,10 @@ #include #include #include -#include "base/buffer.h" -#include "base/conf.h" -#include "base/log_message.h" -#include "base/macros.h" -#include "base/mutex.h" +#include "base/getenv.h" #include "base/ncsgl_defs.h" #include "base/osaf_utility.h" #include "base/time.h" -#include "base/unix_client_socket.h" #include "dtm/common/osaflog_protocol.h" namespace global { @@ -55,65 +50,38 @@ const char *const prefix_name[] = {"EM", "AL", "CR", "ER", "WA", "NO", "IN", "T6", "T7", "T8", ">>", "<<"}; char *msg_id; int logmask; +const char* osaf_log_file = "osaf.log"; +bool enable_osaf_log = false; } // namespace global -class TraceLog { - public: - static bool Init(); - static void Log(base::LogMessage::Severity severity, const char *fmt, - va_list ap); - - private: - TraceLog(const std::string , const std::string _name, - uint32_t proc_id, const std::string _id, - const std::string _name); - void LogInternal(base::LogMessage::Severity severity, const char *fmt, - va_list ap); - static constexpr const uint32_t kMaxSequenceId = uint32_t{0x7fff}; - static TraceLog *instance_; - const base::LogMessage::HostName fqdn_; - const base::LogMessage::AppName app_name_; - const base::LogMessage::ProcId proc_id_; - const base::LogMessage::MsgId msg_id_; - uint32_t sequence_id_; - base::UnixClientSocket log_socket_; - base::Buffer<512> buffer_; - base::Mutex mutex_; - - DELETE_COPY_AND_MOVE_OPERATORS(TraceLog); -}; - -TraceLog *TraceLog::instance_ = nullptr; - -TraceLog::TraceLog(const std::string , const std::string _name, - uint32_t proc_id, const std::string _id, - const std::string _name) -: fqdn_{base::LogMessage::HostName{fqdn}}, - app_name_{base::LogMessage::AppName{app_name}}, - proc_id_{base::LogMessage::ProcId{std::to_string(proc_id)}}, - msg_id_{base::LogMessage::MsgId{msg_id}}, - sequence_id_{1}, - log_socket_{socket_name, base::UnixSocket::kBlocking}, - buffer_{}, - mutex_{} {} - -bool TraceLog::Init() { - if (instance_ != nullptr) return false; - char app_name[49]; - char pid_path[1024]; +static TraceLog gl_trace; +static TraceLog gl_osaflog; +TraceLog::TraceLog() +: sequence_id_{1}, buffer_{} { + mutex_ = nullptr; + log_socket_ = nullptr; +} + +TraceLog::~TraceLog() { + if (mutex_) delete mutex_; + if (log_socket_) delete log_socket_; +} +bool TraceLog::Init(const char *msg_id, WriteMode mode) { + char app_name[NAME_MAX]; + char pid_path[PATH_MAX]; uint32_t process_id = getpid(); char *token, *saveptr; char *pid_name = nullptr; - memset(app_name, 0, 49); - memset(pid_path, 0, 1024); + memset(app_name, 0, NAME_MAX); + memset(pid_path, 0, PATH_MAX); snprintf(pid_path, sizeof(pid_path), "/proc/%" PRIu32 "/cmdline", process_id); FILE *f = fopen(pid_path, "r"); if (f != nullptr) { size_t size; -size = fread(pid_path, sizeof(char), 1024, f); +size = fread(pid_path, sizeof(char), PATH_MAX, f); if (size > 0) { if ('\n' == pid_path[size - 1]) pid_path[size - 1] = '\0'; } @@ -131,19 +99,28 @@ bool TraceLog::Init() { } base::Conf::InitFullyQualifiedDomainName(); const std::string = base::Conf::FullyQualifiedDomainName(); - instance_ = new TraceLog{fqdn, app_name, process_id, global::msg_id, - Osaflog::kServerSocketPath}; - return instance_ != nullptr; + + fqdn_ = base::LogMessage::HostName(fqdn); + app_name_ = base::LogMessage::AppName(app_name); + proc_id_ = base::LogMessage::ProcId{std::to_string(process_id)}; + msg_id_ = base::LogMessage::MsgId{msg_id}; + log_socket_ = new base::UnixClientSocket{Osaflog::kServerSocketPath, +static_cast(mode)}; + mutex_ = new base::Mutex{}; + + return true; } void TraceLog::Log(base::LogMessage::Severity severity, const char *fmt, va_list ap) { - if (instance_ != nullptr) instance_->LogInternal(severity, fmt, ap); + if (log_socket_ != nullptr && mutex_ != nullptr) { +LogInternal(severity, fmt, ap); + } } void TraceLog::LogInternal(base::LogMessage::Severity severity, const char *fmt, va_list ap) { - base::Lock lock(mutex_); + base::Lock lock(*mutex_); uint32_t id =
Re: [devel] [PATCH 1/1] imm: fix memory leaked in immnd [#2825]
ack, review only. /Thanks HansN On 04/05/2018 04:39 AM, Vu Minh Nguyen wrote: The allocated memory is not freed before returning from the function ImmModel::setCcbErrorString(). --- src/imm/immnd/ImmModel.cc | 7 +-- 1 file changed, 5 insertions(+), 2 deletions(-) diff --git a/src/imm/immnd/ImmModel.cc b/src/imm/immnd/ImmModel.cc index f7c8fc0..87ded27 100644 --- a/src/imm/immnd/ImmModel.cc +++ b/src/imm/immnd/ImmModel.cc @@ -10910,7 +10910,6 @@ SaAisErrorT ImmModel::deleteObject(ObjectMap::iterator& oi, SaUint32T reqConn, void ImmModel::setCcbErrorString(CcbInfo* ccb, const char* errorString, va_list vl) { int errLen = strlen(errorString) + 1; - char* fmtError = (char*)malloc(errLen); int len; va_list args; int isValidationErrString = 0; @@ -10921,6 +10920,9 @@ void ImmModel::setCcbErrorString(CcbInfo* ccb, const char* errorString, return; } + char* fmtError = (char*)malloc(errLen); + osafassert(fmtError); + va_copy(args, vl); len = vsnprintf(fmtError, errLen, errorString, args); va_end(args); @@ -10930,7 +10932,8 @@ void ImmModel::setCcbErrorString(CcbInfo* ccb, const char* errorString, if (len > errLen) { char* newFmtError = (char*)realloc(fmtError, len); if (newFmtError == nullptr) { - TRACE_5("realloc error ,No memory "); + TRACE_5("realloc error, no memory"); + free(fmtError); return; } else { fmtError = newFmtError; -- Check out the vibrant tech community on one of the world's most engaging tech sites, Slashdot.org! http://sdm.link/slashdot ___ Opensaf-devel mailing list Opensaf-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/opensaf-devel
[devel] [PATCH 0/1] Review Request for imm: fix cppcheck issues in apitest [#2384]
Summary: imm: fix Cppcheck issues in imm/src/apitest [#2384] Review request for Ticket(s): 2384 Peer Reviewer(s): Vu Minh, Ravi Pull request to: *** LIST THE PERSON WITH PUSH ACCESS HERE *** Affected branch(es): develop Development branch: ticket-2384 Base revision: aff54ff091727f27830443332b830890668749cf Personal repository: git://git.code.sf.net/u/sri-mangipudy/review Impacted area Impact y/n Docsn Build systemn RPM/packaging n Configuration files n Startup scripts n SAF servicesn OpenSAF servicesn Core libraries n Samples n Tests y Other n NOTE: Patch(es) contain lines longer than 80 characers Comments (indicate scope for each "y" above): - fixed cppcheck issues revision 065a81b536830a9967bc5b65e8d7fcd950edb2ef Author: srinivasDate: Wed, 11 Apr 2018 16:51:10 +0530 imm: fix Cppcheck issues in imm/src/apitest [#2384] Complete diffstat: -- src/imm/apitest/implementer/applier.c | 4 +--- src/imm/apitest/implementer/test_SaImmOiCcb.c | 6 ++ src/imm/apitest/implementer/test_saImmOiAugmentCcbInitialize.c | 6 ++ src/imm/apitest/implementer/test_saImmOiImplementerSet.c | 3 +-- src/imm/apitest/implementer/test_saImmOiLongDn.c | 3 +-- src/imm/apitest/implementer/test_saImmOiSaStringT.c| 3 +-- src/imm/apitest/management/test_saImmOmLongDn.c| 6 ++ src/imm/apitest/management/test_saImmOmSaStringT.c | 3 +-- src/imm/apitest/management/test_saImmOmThreadInterference.c| 3 +-- 9 files changed, 12 insertions(+), 25 deletions(-) Testing Commands: - *** LIST THE COMMAND LINE TOOLS/STEPS TO TEST YOUR CHANGES *** Testing, Expected Results: -- *** PASTE COMMAND OUTPUTS / TEST RESULTS *** Conditions of Submission: - *** HOW MANY DAYS BEFORE PUSHING, CONSENSUS ETC *** Arch Built StartedLinux distro --- mipsn n mips64 n n x86 n n x86_64 n n powerpc n n powerpc64 n n Reviewer Checklist: --- [Submitters: make sure that your review doesn't trigger any checkmarks!] Your checkin has not passed review because (see checked entries): ___ Your RR template is generally incomplete; it has too many blank entries that need proper data filled in. ___ You have failed to nominate the proper persons for review and push. ___ Your patches do not have proper short+long header ___ You have grammar/spelling in your header that is unacceptable. ___ You have exceeded a sensible line length in your headers/comments/text. ___ You have failed to put in a proper Trac Ticket # into your commits. ___ You have incorrectly put/left internal data in your comments/files (i.e. internal bug tracking tool IDs, product names etc) ___ You have not given any evidence of testing beyond basic build tests. Demonstrate some level of runtime or other sanity testing. ___ You have ^M present in some of your files. These have to be removed. ___ You have needlessly changed whitespace or added whitespace crimes like trailing spaces, or spaces before tabs. ___ You have mixed real technical changes with whitespace and other cosmetic code cleanup changes. These have to be separate commits. ___ You need to refactor your submission into logical chunks; there is too much content into a single commit. ___ You have extraneous garbage in your review (merge commits etc) ___ You have giant attachments which should never have been sent; Instead you should place your content in a public tree to be pulled. ___ You have too many commits attached to an e-mail; resend as threaded commits, or place in a public tree for a pull. ___ You have resent this content multiple times without a clear indication of what has changed between each re-send. ___ You have failed to adequately and individually address all of the comments and change requests that were proposed in the initial review. ___ You have a misconfigured ~/.gitconfig file (i.e. user.name, user.email etc) ___ Your computer have a badly configured date and time; confusing the the threaded patch review. ___ Your changes affect IPC mechanism, and you don't present any results for in-service upgradability test. ___ Your changes affect user manual and documentation, your patch series do not contain the patch that updates the Doxygen manual. -- Check out the vibrant tech community on one of the world's most engaging tech sites,
[devel] [PATCH 1/1] imm: fix Cppcheck issues in imm/src/apitest [#2384]
--- src/imm/apitest/implementer/applier.c | 4 +--- src/imm/apitest/implementer/test_SaImmOiCcb.c | 6 ++ src/imm/apitest/implementer/test_saImmOiAugmentCcbInitialize.c | 6 ++ src/imm/apitest/implementer/test_saImmOiImplementerSet.c | 3 +-- src/imm/apitest/implementer/test_saImmOiLongDn.c | 3 +-- src/imm/apitest/implementer/test_saImmOiSaStringT.c| 3 +-- src/imm/apitest/management/test_saImmOmLongDn.c| 6 ++ src/imm/apitest/management/test_saImmOmSaStringT.c | 3 +-- src/imm/apitest/management/test_saImmOmThreadInterference.c| 3 +-- 9 files changed, 12 insertions(+), 25 deletions(-) diff --git a/src/imm/apitest/implementer/applier.c b/src/imm/apitest/implementer/applier.c index a8d5cd2..a16440e 100644 --- a/src/imm/apitest/implementer/applier.c +++ b/src/imm/apitest/implementer/applier.c @@ -220,7 +220,6 @@ static void saImmOiAdminOperationCallback( int main(int argc, char *argv[]) { - int c; struct option long_options[] = { {"parameter", required_argument, 0, 'p'}, {"operation-id", required_argument, 0, 'o'}, @@ -249,7 +248,7 @@ int main(int argc, char *argv[]) params[0] = NULL; while (1) { - c = getopt_long(argc, argv, "p:o:t:a:h", long_options, NULL); + int c = getopt_long(argc, argv, "p:o:t:a:h", long_options, NULL); if (c == -1) /* have all command-line options have been parsed? */ @@ -386,7 +385,6 @@ int main(int argc, char *argv[]) fprintf(stderr, "saImmOiDispatch returned %s\n", saf_error(error)); exit(EXIT_FAILURE); - break; } } } diff --git a/src/imm/apitest/implementer/test_SaImmOiCcb.c b/src/imm/apitest/implementer/test_SaImmOiCcb.c index 2ee5034..a768b4d 100644 --- a/src/imm/apitest/implementer/test_SaImmOiCcb.c +++ b/src/imm/apitest/implementer/test_SaImmOiCcb.c @@ -185,7 +185,6 @@ done: static void *objectImplementerThreadMain(void *arg) { struct pollfd fds[2]; - int ret; char buf[384]; const SaImmOiImplementerNameT implementerName = buf; SaSelectionObjectT selObj; @@ -210,7 +209,7 @@ static void *objectImplementerThreadMain(void *arg) /* We can receive five callbacks: create, delete, modify, completed & * apply */ while (1) { - ret = poll(fds, 2, -1); + int ret = poll(fds, 2, -1); if (ret == -1) fprintf(stderr, "poll error: %s\n", strerror(errno)); @@ -243,7 +242,6 @@ static void *objectImplementerThreadMain(void *arg) static void *classImplementerThreadMain(void *arg) { struct pollfd fds[2]; - int ret; const SaImmOiImplementerNameT implementerName = (SaImmOiImplementerNameT) __FUNCTION__; SaSelectionObjectT selObj; @@ -264,7 +262,7 @@ static void *classImplementerThreadMain(void *arg) fds[1].events = POLLIN; while (1) { - ret = poll(fds, 2, -1); + int ret = poll(fds, 2, -1); if (ret == -1) fprintf(stderr, "poll error: %s\n", strerror(errno)); diff --git a/src/imm/apitest/implementer/test_saImmOiAugmentCcbInitialize.c b/src/imm/apitest/implementer/test_saImmOiAugmentCcbInitialize.c index 64f0312..c623018 100644 --- a/src/imm/apitest/implementer/test_saImmOiAugmentCcbInitialize.c +++ b/src/imm/apitest/implementer/test_saImmOiAugmentCcbInitialize.c @@ -391,7 +391,6 @@ static void *immOiObjectDispatchThread(void *arg) SaImmOiHandleT handle; SaSelectionObjectT selObj; struct pollfd fds[2]; - int ret; TRACE_ENTER(); @@ -417,7 +416,7 @@ static void *immOiObjectDispatchThread(void *arg) objectDispatchThreadIsSet = 1; while (1) { - ret = poll(fds, 2, -1); + int ret = poll(fds, 2, -1); if (ret == -1) fprintf(stderr, "poll error: %s\n", strerror(errno)); @@ -454,7 +453,6 @@ static void *immOiClassDispatchThread(void *arg) SaImmOiHandleT handle; SaSelectionObjectT selObj; struct pollfd fds[2]; - int ret; TRACE_ENTER(); @@ -474,7 +472,7 @@ static void *immOiClassDispatchThread(void *arg) classDispatchThreadIsSet = 1; while (1) { - ret = poll(fds, 2, -1); + int ret = poll(fds, 2, -1); if (ret == -1) fprintf(stderr, "poll error: %s\n", strerror(errno)); diff --git a/src/imm/apitest/implementer/test_saImmOiImplementerSet.c b/src/imm/apitest/implementer/test_saImmOiImplementerSet.c index 549ab9f..11792e1 100644 ---
[devel] [PATCH 5/5] rded: adapt to new Consensus API [#2795]
- add 3 new internal message: RDE_MSG_NODE_UP RDE_MSG_NODE_DOWN RDE_MSG_TAKEOVER_REQUEST_CALLBACK - subscribe to AMFND service up events to keep track of the number of cluster members - listen for takeover requests in KV store --- src/rde/rded/rde_cb.h| 12 ++-- src/rde/rded/rde_main.cc | 75 ++-- src/rde/rded/rde_mds.cc | 39 + src/rde/rded/rde_rda.cc | 2 +- src/rde/rded/role.cc | 46 - src/rde/rded/role.h | 2 +- 6 files changed, 137 insertions(+), 39 deletions(-) diff --git a/src/rde/rded/rde_cb.h b/src/rde/rded/rde_cb.h index fc100849a..f5ad689c3 100644 --- a/src/rde/rded/rde_cb.h +++ b/src/rde/rded/rde_cb.h @@ -19,12 +19,13 @@ #define RDE_RDED_RDE_CB_H_ #include +#include #include "base/osaf_utility.h" #include "mds/mds_papi.h" #include "rde/agent/rda_papi.h" +#include "rde/common/rde_rda_common.h" #include "rde/rded/rde_amf.h" #include "rde/rded/rde_rda.h" -#include "rde/common/rde_rda_common.h" /* ** RDE_CONTROL_BLOCK @@ -39,7 +40,9 @@ struct RDE_CONTROL_BLOCK { bool task_terminate; RDE_RDA_CB rde_rda_cb; RDE_AMF_CB rde_amf_cb; - bool monitor_lock_thread_running; + bool monitor_lock_thread_running{false}; + bool monitor_takeover_req_thread_running{false}; + std::set cluster_members{}; }; enum RDE_MSG_TYPE { @@ -47,7 +50,10 @@ enum RDE_MSG_TYPE { RDE_MSG_PEER_DOWN = 2, RDE_MSG_PEER_INFO_REQ = 3, RDE_MSG_PEER_INFO_RESP = 4, - RDE_MSG_NEW_ACTIVE_CALLBACK = 5 + RDE_MSG_NEW_ACTIVE_CALLBACK = 5, + RDE_MSG_NODE_UP = 6, + RDE_MSG_NODE_DOWN = 7, + RDE_MSG_TAKEOVER_REQUEST_CALLBACK = 8 }; struct rde_peer_info { diff --git a/src/rde/rded/rde_main.cc b/src/rde/rded/rde_main.cc index 78e7256c1..b395312d6 100644 --- a/src/rde/rded/rde_main.cc +++ b/src/rde/rded/rde_main.cc @@ -17,6 +17,7 @@ */ #include +#include #include #include #include @@ -28,17 +29,16 @@ #include #include #include -#include "osaf/consensus/consensus.h" +#include "base/conf.h" #include "base/daemon.h" #include "base/logtrace.h" +#include "base/ncs_main_papi.h" #include "base/osaf_poll.h" #include "mds/mds_papi.h" -#include "base/ncs_main_papi.h" #include "nid/agent/nid_api.h" -#include +#include "osaf/consensus/consensus.h" #include "rde/rded/rde_cb.h" #include "rde/rded/role.h" -#include "base/conf.h" #define RDA_MAX_CLIENTS 32 @@ -47,13 +47,15 @@ enum { FD_TERM = 0, FD_AMF = 1, FD_MBX, FD_RDA_SERVER, FD_CLIENT_START }; static void SendPeerInfoResp(MDS_DEST mds_dest); static void CheckForSplitBrain(const rde_msg *msg); -const char *rde_msg_name[] = { -"-", -"RDE_MSG_PEER_UP(1)", -"RDE_MSG_PEER_DOWN(2)", -"RDE_MSG_PEER_INFO_REQ(3)", -"RDE_MSG_PEER_INFO_RESP(4)", -}; +const char *rde_msg_name[] = {"-", + "RDE_MSG_PEER_UP(1)", + "RDE_MSG_PEER_DOWN(2)", + "RDE_MSG_PEER_INFO_REQ(3)", + "RDE_MSG_PEER_INFO_RESP(4)", + "RDE_MSG_NEW_ACTIVE_CALLBACK(5)" + "RDE_MSG_NODE_UP(6)", + "RDE_MSG_NODE_DOWN(7)", + "RDE_MSG_TAKEOVER_REQUEST_CALLBACK(8)"}; static RDE_CONTROL_BLOCK _rde_cb; static RDE_CONTROL_BLOCK *rde_cb = &_rde_cb; @@ -127,14 +129,16 @@ static void handle_mbx_event() { LOG_NO("New active controller notification from consensus service"); if (role->role() == PCS_RDA_ACTIVE) { -if (my_node.compare(active_controller) != 0) { +if (my_node.compare(active_controller) != 0 && +active_controller.empty() == false) { // we are meant to be active, but consensus service doesn't think so LOG_WA("Role does not match consensus service. New controller: %s", -active_controller.c_str()); + active_controller.c_str()); if (consensus_service.IsRemoteFencingEnabled() == false) { LOG_ER("Probable split-brain. Rebooting this node"); + opensaf_reboot(0, nullptr, - "Split-brain detected by consensus service"); + "Split-brain detected by consensus service"); } } @@ -144,6 +148,44 @@ static void handle_mbx_event() { } break; } +case RDE_MSG_NODE_UP: + rde_cb->cluster_members.insert(msg->fr_node_id); + TRACE("cluster_size %zu", rde_cb->cluster_members.size()); + break; +case RDE_MSG_NODE_DOWN: + rde_cb->cluster_members.erase(msg->fr_node_id); + TRACE("cluster_size %zu", rde_cb->cluster_members.size()); + break; +case RDE_MSG_TAKEOVER_REQUEST_CALLBACK: { + rde_cb->monitor_takeover_req_thread_running = false; + + if (role->role() == PCS_RDA_ACTIVE) { +LOG_NO("Received takeover request. Our network size is %zu", +
[devel] [PATCH 1/5] osaf: extend API to include a create key and an enhanced set key function [#2795]
- add create_key function (fails if key already exists) - add setkey_match_prev function (set value if previous value matches) - add missing quotes - add etcd3.plugin --- src/osaf/consensus/plugins/etcd.plugin | 86 +++- src/osaf/consensus/plugins/etcd3.plugin | 366 +++ src/osaf/consensus/plugins/sample.plugin | 67 +- 3 files changed, 501 insertions(+), 18 deletions(-) create mode 100644 src/osaf/consensus/plugins/etcd3.plugin diff --git a/src/osaf/consensus/plugins/etcd.plugin b/src/osaf/consensus/plugins/etcd.plugin index 586059b32..6ed85ac92 100644 --- a/src/osaf/consensus/plugins/etcd.plugin +++ b/src/osaf/consensus/plugins/etcd.plugin @@ -29,7 +29,7 @@ readonly etcd_timeout="5s" # 0 - success, is echoed to stdout # non-zero - failure get() { - readonly key=$1 + readonly key="$1" if value=$(etcdctl $etcd_options --timeout $etcd_timeout get "$directory$key" 2>&1) then @@ -49,8 +49,8 @@ get() { # 0 - success # non-zero - failure setkey() { - readonly key=$1 - readonly value=$2 + readonly key="$1" + readonly value="$2" if etcdctl $etcd_options --timeout $etcd_timeout set "$directory$key" \ "$value" >/dev/null @@ -61,6 +61,58 @@ setkey() { fi } +# create +# create and set to in key-value store. Fails if the key +# already exists +# params: +# $1 - +# $2 - +# returns: +# 0 - success +# 1 - already exists +# 2 or above - other failure +create_key() { + readonly key="$1" + readonly value="$2" + + if output=$(etcdctl $etcd_options --timeout $etcd_timeout mk "$directory$key" \ +"$value" 2>&1) + then +return 0 + else +if echo $output | grep "already exists" +then + return 1 +fi + fi + + return 2 +} + +# set +# set to in key-value store, if the existing value matches +# +# params: +# $1 - +# $2 - +# $3 - +# returns: +# 0 - success +# non-zero - failure +setkey_match_prev() { + readonly key="$1" + readonly value="$2" + readonly prev="$3" + + if etcdctl $etcd_options --timeout $etcd_timeout set "$directory$key" \ +"$value" --swap-with-value "$prev" >/dev/null + then +return 0 + else +return 1 + fi +} + # erase # erase in key-value store # params: @@ -69,7 +121,7 @@ setkey() { # 0 - success # non-zero - failure erase() { - readonly key=$1 + readonly key="$1" if etcdctl $etcd_options --timeout $etcd_timeout \ rm "$directory$key" >/dev/null 2>&1 @@ -90,8 +142,8 @@ erase() { # 2 or above - other failure # NOTE: if lock is already acquired by , then timeout is extended lock() { - readonly owner=$1 - readonly timeout=$2 + readonly owner="$1" + readonly timeout="$2" if etcdctl $etcd_options --timeout $etcd_timeout \ mk "$directory$keyname" "$owner" \ @@ -145,7 +197,7 @@ lock_owner() { # 2 or above - other failure # unlock() { - readonly owner=$1 + readonly owner="$1" readonly forced=${2:-false} if [ "$forced" = false ]; then @@ -185,7 +237,7 @@ unlock() { # 0 - success, is echoed to stdout # non-zero - failure watch() { - readonly key=$1 + readonly key="$1" if value=$(etcdctl $etcd_options --timeout $etcd_timeout \ watch "$directory$key" 2>&1) @@ -216,6 +268,22 @@ case "$1" in setkey "$2" "$3" exit $? ;; + set_if_prev) +if [ "$#" -ne 4 ]; then + echo "Usage: $0 set " + exit 1 +fi +setkey_match_prev "$2" "$3" "$4" +exit $? +;; + create) +if [ "$#" -ne 3 ]; then + echo "Usage: $0 create " + exit 1 +fi +create_key "$2" "$3" +exit $? +;; erase) if [ "$#" -ne 2 ]; then echo "Usage: $0 erase " @@ -269,7 +337,7 @@ case "$1" in exit $? ;; *) -echo "Usage: $0 {get|set|erase|lock|unlock|lock_owner|watch|watch_lock}" +echo "Usage: $0 {get|set|create|set_if_prev|erase|lock|unlock|lock_owner|watch|watch_lock}" ;; esac diff --git a/src/osaf/consensus/plugins/etcd3.plugin b/src/osaf/consensus/plugins/etcd3.plugin new file mode 100644 index 0..451440567 --- /dev/null +++ b/src/osaf/consensus/plugins/etcd3.plugin @@ -0,0 +1,366 @@ +#!/usr/bin/env bash +# -*- OpenSAF -*- +# +# (C) Copyright 2018 Ericsson AB 2018 - All Rights Reserved. +# +# This program is distributed in the hope that it will be useful, but +# WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY +# or FITNESS FOR A PARTICULAR PURPOSE. This file and program are licensed +# under the GNU Lesser General Public License Version 2.1, February 1999. +# The complete license can be accessed from the following location: +# http://opensource.org/licenses/lgpl-license.php +# See the Copying file included with the OpenSAF distribution for full +# licensing terms. +# +# Please note: this API is subject to change and may be modified +# in a future version of OpenSAF. Future API versions may not be +# backward compatible. This plugin may need to be adapted. + +readonly
[devel] [PATCH 0/5] Review Request for split-brain: select active SC from largest network partition V3 [#2795]
Summary: split-brain: select active SC from largest network partition V3 [#2795] Review request for Ticket(s): 2795 Peer Reviewer(s): Anders, Ravi, Hans Pull request to: *** LIST THE PERSON WITH PUSH ACCESS HERE *** Affected branch(es): develop Development branch: ticket-2795 Base revision: 1c302a300e449e8a8527671fbd6c7f4e2b41e95d Personal repository: git://git.code.sf.net/u/userid-2226215/review Impacted area Impact y/n Docsn Build systemn RPM/packaging n Configuration files n Startup scripts n SAF servicesn OpenSAF servicesy Core libraries y Samples n Tests n Other n Comments (indicate scope for each "y" above): - *** Changes from V2: *** fmd: made cluster_size atomic fmd: wait 3 seconds before promoting to active, to allow topology events to be processed first osaf: add check for existing takeover request, before trying to lock etcdv3 plugin: reliablity improvements revision c7bc78656d5de11f6147727bd8612274fb6e438f Author: Gary LeeDate: Wed, 11 Apr 2018 17:16:46 +1000 rded: adapt to new Consensus API [#2795] - add 3 new internal message: RDE_MSG_NODE_UP RDE_MSG_NODE_DOWN RDE_MSG_TAKEOVER_REQUEST_CALLBACK - subscribe to AMFND service up events to keep track of the number of cluster members - listen for takeover requests in KV store revision 4899e5d0f5abdff8f15eca8ad17d3b13b6a00393 Author: Gary Lee Date: Wed, 11 Apr 2018 17:16:18 +1000 fmd: adapt to new Consensus API [#2795] revision 812a315af21df06b2f9fdcc3d8fd5b7bbad3e550 Author: Gary Lee Date: Wed, 11 Apr 2018 17:15:41 +1000 amfd: adapt to new Consensus API [#2795] revision b8a37c1b8965826e5faffbfebc44a84bdb6433a1 Author: Gary Lee Date: Wed, 11 Apr 2018 17:14:39 +1000 osaf: add lock takeover request fuction [#2795] - add create and set (if previous value matches) functions to KeyValue class - add Consensus::MonitorTakeoverRequest() function for use by RDE to answer takeover requests - add Consensus::CreateTakeoverRequest() - before a SC is promoted to active, it will create a takeover request in the KV store. An existing SC can reject the lock takeover revision 955be872ba5887b1b521eac9f7732dd3f6afc593 Author: Gary Lee Date: Wed, 11 Apr 2018 17:13:45 +1000 osaf: extend API to include a create key and an enhanced set key function [#2795] - add create_key function (fails if key already exists) - add setkey_match_prev function (set value if previous value matches) - add missing quotes - add etcd3.plugin Added Files: src/osaf/consensus/plugins/etcd3.plugin Complete diffstat: -- src/amf/amfd/role.cc | 2 +- src/fm/fmd/fm_cb.h | 2 +- src/fm/fmd/fm_main.cc| 26 +- src/fm/fmd/fm_mds.cc | 2 + src/fm/fmd/fm_rda.cc | 27 +- src/osaf/consensus/consensus.cc | 435 ++- src/osaf/consensus/consensus.h | 55 +++- src/osaf/consensus/key_value.cc | 105 +--- src/osaf/consensus/key_value.h | 19 +- src/osaf/consensus/plugins/etcd.plugin | 86 +- src/osaf/consensus/plugins/etcd3.plugin | 366 ++ src/osaf/consensus/plugins/sample.plugin | 67 - src/rde/rded/rde_cb.h| 12 +- src/rde/rded/rde_main.cc | 75 -- src/rde/rded/rde_mds.cc | 39 ++- src/rde/rded/rde_rda.cc | 2 +- src/rde/rded/role.cc | 46 +++- src/rde/rded/role.h | 2 +- 18 files changed, 1180 insertions(+), 188 deletions(-) Testing Commands: - 1) SI swap of safSi=SC-2N,safApp=OpenSAF 2) Isolate standby cluster (eg. use iptables to block port 6700 on a TCP system) 3) Isolate active cluster Testing, Expected Results: -- 1) No error 2) Standby will fail to be promoted as active as the takeover request is rejected 3) Standby will be promoted Conditions of Submission: - Ack from any reviewer Arch Built StartedLinux distro --- mipsn n mips64 n n x86 n n x86_64 y y powerpc n n powerpc64 n n Reviewer Checklist: --- [Submitters: make sure that your review doesn't trigger any checkmarks!] Your checkin has not passed review because (see checked entries): ___ Your RR template is generally incomplete; it has too many blank entries that need proper data filled in. ___ You have failed
[devel] [PATCH 3/5] amfd: adapt to new Consensus API [#2795]
--- src/amf/amfd/role.cc | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/amf/amfd/role.cc b/src/amf/amfd/role.cc index c8aa9cf1f..790983ee7 100644 --- a/src/amf/amfd/role.cc +++ b/src/amf/amfd/role.cc @@ -1217,7 +1217,7 @@ uint32_t amfd_switch_stdby_actv(AVD_CL_CB *cb) { osaf_mutex_unlock_ordie(_reinit_mutex); Consensus consensus_service; - rc = consensus_service.PromoteThisNode(); + rc = consensus_service.PromoteThisNode(false, 0); if (rc != SA_AIS_OK) { LOG_ER("Unable to set active controller in consensus service"); osafassert(false); -- 2.14.1 -- Check out the vibrant tech community on one of the world's most engaging tech sites, Slashdot.org! http://sdm.link/slashdot ___ Opensaf-devel mailing list Opensaf-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/opensaf-devel
[devel] [PATCH 4/5] fmd: adapt to new Consensus API [#2795]
--- src/fm/fmd/fm_cb.h| 2 +- src/fm/fmd/fm_main.cc | 26 +- src/fm/fmd/fm_mds.cc | 2 ++ src/fm/fmd/fm_rda.cc | 27 ++- 4 files changed, 30 insertions(+), 27 deletions(-) diff --git a/src/fm/fmd/fm_cb.h b/src/fm/fmd/fm_cb.h index cfa50d883..010ab735a 100644 --- a/src/fm/fmd/fm_cb.h +++ b/src/fm/fmd/fm_cb.h @@ -100,7 +100,7 @@ struct FM_CB { std::atomic peer_sc_up{false}; bool well_connected{false}; - uint64_t cluster_size{}; + std::atomic cluster_size{}; struct timespec last_well_connected{}; struct timespec node_isolation_timeout{}; SaClmHandleT clm_hdl{}; diff --git a/src/fm/fmd/fm_main.cc b/src/fm/fmd/fm_main.cc index 73c9b9ccd..3371ec5e8 100644 --- a/src/fm/fmd/fm_main.cc +++ b/src/fm/fmd/fm_main.cc @@ -551,21 +551,12 @@ static void fm_mbx_msg_handler(FM_CB *fm_cb, FM_EVT *fm_mbx_evt) { * trigerred quicker than the node_down event * has been received. */ - if (fm_cb->role == PCS_RDA_STANDBY) { -const std::string current_active = -consensus_service.CurrentActive(); -if (current_active.compare(osaf_extended_name_borrow( -_cb->peer_clm_node_name)) == 0) { - // update consensus service, before fencing old active controller - consensus_service.DemoteCurrentActive(); -} - } if (fm_cb->use_remote_fencing) { if (fm_cb->peer_node_terminated == false) { // if peer_sc_up is true then // the node has come up already - if (fm_cb->peer_sc_up == false && fm_cb->immnd_down == true) { + if (consensus_service.IsEnabled() == false) { opensaf_reboot(fm_cb->peer_node_id, (char *)fm_cb->peer_clm_node_name.value, "Received Node Down for peer controller"); @@ -580,8 +571,7 @@ static void fm_mbx_msg_handler(FM_CB *fm_cb, FM_EVT *fm_mbx_evt) { fm_cb->mutex_.Lock(); peer_node_name = fm_cb->peer_node_name; fm_cb->mutex_.Unlock(); -opensaf_reboot(fm_cb->peer_node_id, - peer_node_name.c_str(), +opensaf_reboot(fm_cb->peer_node_id, peer_node_name.c_str(), "Received Node Down for peer controller"); } if (!((fm_cb->role == PCS_RDA_ACTIVE) && @@ -632,12 +622,6 @@ static void fm_mbx_msg_handler(FM_CB *fm_cb, FM_EVT *fm_mbx_evt) { } Consensus consensus_service; -const std::string current_active = consensus_service.CurrentActive(); -if (current_active.compare( -osaf_extended_name_borrow(_cb->peer_clm_node_name)) == 0) { - // update consensus service, before fencing old active controller - consensus_service.DemoteCurrentActive(); -} /* Now. Try resetting other blade */ fm_cb->role = PCS_RDA_ACTIVE; @@ -645,7 +629,8 @@ static void fm_mbx_msg_handler(FM_CB *fm_cb, FM_EVT *fm_mbx_evt) { LOG_NO("Reseting peer controller node id: %x", unsigned(fm_cb->peer_node_id)); if (fm_cb->use_remote_fencing) { - if (fm_cb->peer_node_terminated == false) { + if (fm_cb->peer_node_terminated == false && + consensus_service.IsEnabled() == false) { opensaf_reboot(fm_cb->peer_node_id, (char *)fm_cb->peer_clm_node_name.value, "Received Node Down for peer controller"); @@ -658,8 +643,7 @@ static void fm_mbx_msg_handler(FM_CB *fm_cb, FM_EVT *fm_mbx_evt) { fm_cb->mutex_.Lock(); peer_node_name = fm_cb->peer_node_name; fm_cb->mutex_.Unlock(); - opensaf_reboot(fm_cb->peer_node_id, - peer_node_name.c_str(), + opensaf_reboot(fm_cb->peer_node_id, peer_node_name.c_str(), "Received Node Down for Active peer"); } fm_rda_set_role(fm_cb, PCS_RDA_ACTIVE); diff --git a/src/fm/fmd/fm_mds.cc b/src/fm/fmd/fm_mds.cc index 277a357d2..60db5dab1 100644 --- a/src/fm/fmd/fm_mds.cc +++ b/src/fm/fmd/fm_mds.cc @@ -373,6 +373,7 @@ static uint32_t fm_mds_node_evt(FM_CB *cb, case NCSMDS_NODE_DOWN: if (cb->cluster_size != 0) { --cb->cluster_size; +TRACE("cluster_size %llu", (unsigned long long)cb->cluster_size); TRACE("Node down event for node id %x, cluster size is now: %llu", node_evt->node_id, (unsigned long long)cb->cluster_size); check_for_node_isolation(cb); @@ -397,6 +398,7 @@ static uint32_t fm_mds_node_evt(FM_CB *cb, case NCSMDS_NODE_UP: ++cb->cluster_size; + TRACE("cluster_size %llu", (unsigned long long)cb->cluster_size); TRACE("Node up event for node id %x, cluster size is now: %llu", node_evt->node_id, (unsigned
[devel] [PATCH 2/5] osaf: add lock takeover request fuction [#2795]
- add create and set (if previous value matches) functions to KeyValue class - add Consensus::MonitorTakeoverRequest() function for use by RDE to answer takeover requests - add Consensus::CreateTakeoverRequest() - before a SC is promoted to active, it will create a takeover request in the KV store. An existing SC can reject the lock takeover --- src/osaf/consensus/consensus.cc | 435 ++-- src/osaf/consensus/consensus.h | 55 - src/osaf/consensus/key_value.cc | 105 +++--- src/osaf/consensus/key_value.h | 19 +- 4 files changed, 511 insertions(+), 103 deletions(-) diff --git a/src/osaf/consensus/consensus.cc b/src/osaf/consensus/consensus.cc index cc04f3518..f2245dd01 100644 --- a/src/osaf/consensus/consensus.cc +++ b/src/osaf/consensus/consensus.cc @@ -15,13 +15,17 @@ #include "osaf/consensus/consensus.h" #include #include +#include #include #include "base/conf.h" #include "base/getenv.h" #include "base/logtrace.h" #include "base/ncssysf_def.h" -SaAisErrorT Consensus::PromoteThisNode() { +const std::string Consensus::kTakeoverRequestKeyname = "takeover_request"; + +SaAisErrorT Consensus::PromoteThisNode(const bool graceful_takeover, +const uint64_t cluster_size) { TRACE_ENTER(); SaAisErrorT rc; @@ -29,6 +33,10 @@ SaAisErrorT Consensus::PromoteThisNode() { return SA_AIS_OK; } + // check if there is an existing takeover requests, we cannot + // attempt to lock until that is complete + CheckForExistingTakeoverRequest(); + uint32_t retries = 0; rc = KeyValue::Lock(base::Conf::NodeName(), kLockTimeout); while (rc == SA_AIS_ERR_TRY_AGAIN && retries < kMaxRetry) { @@ -39,33 +47,30 @@ SaAisErrorT Consensus::PromoteThisNode() { } if (rc == SA_AIS_ERR_EXIST) { +// there's a chance the lock has been released since the lock attempt // get the current active controller -std::string current_active(""); -retries = 0; -rc = KeyValue::LockOwner(current_active); -while (rc != SA_AIS_OK && retries < kMaxRetry) { - ++retries; - std::this_thread::sleep_for(kSleepInterval); - rc = KeyValue::LockOwner(current_active); -} -if (rc != SA_AIS_OK) { - LOG_ER("Failed to get current lock owner. Will attempt to lock anyway"); +std::string current_active = CurrentActive(); + +if (current_active.empty() == true) { + LOG_WA("Failed to get current lock owner. Will attempt to lock anyway"); } +bool take_over_request_created = false; LOG_NO("Current active controller is %s", current_active.c_str()); -// there's a chance the lock has been released since the lock attempt if (current_active.empty() == false) { - // remove current active controller's lock and fence it - retries = 0; - rc = KeyValue::Unlock(current_active); - while (rc == SA_AIS_ERR_TRY_AGAIN && retries < kMaxRetry) { -LOG_IN("Trying to unlock"); -++retries; -std::this_thread::sleep_for(kSleepInterval); -rc = KeyValue::Unlock(current_active); + if (graceful_takeover == true) { +rc = CreateTakeoverRequest(current_active, base::Conf::NodeName(), + cluster_size); +if (rc != SA_AIS_OK) { + LOG_WA("Takeover request failed (%d)", rc); + return rc; +} +take_over_request_created = true; } + // remove current active controller's lock and fence it + rc = Demote(current_active); if (rc == SA_AIS_OK) { FenceNode(current_active); } else { @@ -82,6 +87,23 @@ SaAisErrorT Consensus::PromoteThisNode() { std::this_thread::sleep_for(kSleepInterval); rc = KeyValue::Lock(base::Conf::NodeName(), kLockTimeout); } + +if (take_over_request_created == true) { + SaAisErrorT rc1; + + // remove takeover request + rc1 = KeyValue::Erase(kTakeoverRequestKeyname); + retries = 0; + while (rc1 != SA_AIS_OK && retries < kMaxRetry) { +++retries; +std::this_thread::sleep_for(kSleepInterval); +rc1 = KeyValue::Erase(kTakeoverRequestKeyname); + } + + if (rc1 != SA_AIS_OK) { +LOG_WA("Could not remove takeover request"); + } +} } if (rc == SA_AIS_OK) { @@ -93,43 +115,23 @@ SaAisErrorT Consensus::PromoteThisNode() { return rc; } -SaAisErrorT Consensus::Demote(const std::string& node = "") { +SaAisErrorT Consensus::Demote(const std::string& node) { TRACE_ENTER(); if (use_consensus_ == false) { return SA_AIS_OK; } - SaAisErrorT rc = SA_AIS_ERR_FAILED_OPERATION; - uint32_t retries = 0; - - // check current active node - std::string current_active; - rc = KeyValue::LockOwner(current_active); - while (rc != SA_AIS_OK && retries < kMaxRetry) { -++retries; -std::this_thread::sleep_for(kSleepInterval); -rc = KeyValue::LockOwner(current_active); - } - - if (rc != SA_AIS_OK) { -LOG_ER("Failed to get