[devel] [PATCH 1/1] base: fix creation of msg queues [#3107]
Message queues stop working correctly after queue file is removed from /tmp.
Message queue API uses "ftok" which relies on the file being permanent. The
behaviour is undefined if the file is removed. Many systems clean out /tmp
periodically, so this can break if the message queue is long lived. Create
the queue file in /var/run.
---
 src/base/os_defs.c | 19 +--
 1 file changed, 17 insertions(+), 2 deletions(-)

diff --git a/src/base/os_defs.c b/src/base/os_defs.c
index da38cd71c..83458c208 100644
--- a/src/base/os_defs.c
+++ b/src/base/os_defs.c
@@ -55,6 +55,8 @@
 #include "base/osaf_time.h"
 #include "base/logtrace.h"

+#include "osaf/configmake.h"
+
 NCS_OS_LOCK gl_ncs_atomic_mtx;
 #ifndef NDEBUG
 bool gl_ncs_atomic_mtx_initialise = false;
@@ -658,7 +660,7 @@ uint32_t ncs_os_posix_mq(NCS_OS_POSIX_MQ_REQ_INFO *req)

     memset(, 0, sizeof(struct msqid_ds));

-    sprintf(filename, "/tmp/%s%d", req->info.open.qname,
+    sprintf(filename, PKGPIDDIR "/%s%d", req->info.open.qname,
             req->info.open.node);

     if (req->info.open.iflags & O_CREAT) {
@@ -669,6 +671,13 @@ uint32_t ncs_os_posix_mq(NCS_OS_POSIX_MQ_REQ_INFO *req)
         return NCSCC_RC_FAILURE;

       key = ftok(filename, 1);
+
+      if (key < 0) {
+        LOG_ER("ftok failed for %s: %i", filename,
+               errno);
+        return NCSCC_RC_FAILURE;
+      }
+
       os_req.info.create.i_key =

       if (fclose(file) != 0)
@@ -678,6 +687,12 @@ uint32_t ncs_os_posix_mq(NCS_OS_POSIX_MQ_REQ_INFO *req)
       os_req.req = NCS_OS_MQ_REQ_OPEN;

       key = ftok(filename, 1);
+
+      if (key < 0) {
+        LOG_ER("ftok failed for %s: %i", filename,
+               errno);
+        return NCSCC_RC_FAILURE;
+      }
       os_req.info.open.i_key =
     }

@@ -721,7 +736,7 @@ uint32_t ncs_os_posix_mq(NCS_OS_POSIX_MQ_REQ_INFO *req)
     char filename[264];

     memset(filename, 0, sizeof(filename));
-    sprintf(filename, "/tmp/%s%d", req->info.unlink.qname,
+    sprintf(filename, PKGPIDDIR "/%s%d", req->info.unlink.qname,
             req->info.unlink.node);

     if (unlink(filename) != 0)
--
2.21.1

Notice: This e-mail together with any attachments may contain information of
Ribbon Communications Inc. that is confidential and/or proprietary for the
sole use of the intended recipient. Any review, disclosure, reliance or
distribution by others or forwarding without express permission is strictly
prohibited. If you are not the intended recipient, please notify the sender
immediately and then delete all copies, including any attachments.

Opensaf-devel mailing list
Opensaf-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-devel
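The dependence of ftok on the backing file can be sketched in a few lines of C. This is only an illustration under stated assumptions: the helper name queue_key_for is hypothetical and is not part of OpenSAF. It wraps the POSIX ftok call and checks for the documented (key_t)-1 failure value. Since ftok derives the key from the identity of the file (on Linux, its inode and device numbers), a file that is deleted and later recreated may well produce a different key, which is likely why a periodically cleaned /tmp is a poor place for it.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/ipc.h>
#include <sys/types.h>

/* Hypothetical helper, not OpenSAF code: derive a System V IPC key from an
 * existing file. Returns 0 and stores the key in *out on success, or -1 if
 * ftok() fails (for example because the file does not exist). */
int queue_key_for(const char *path, int proj_id, key_t *out)
{
	key_t key = ftok(path, proj_id);

	if (key == (key_t)-1) {
		fprintf(stderr, "ftok failed for %s: %s\n", path,
			strerror(errno));
		return -1;
	}
	*out = key;
	return 0;
}
```

Calling the helper twice on the same unchanged file yields the same key; calling it after the file has been unlinked fails, so a caller can report the error instead of silently attaching to the wrong queue.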
[devel] [PATCH 0/1] Review Request for base: fix creation of msg queues [#3107]
Summary: base: fix creation of msg queues [#3107]
Review request for Ticket(s): 3107
Peer Reviewer(s): Mathi
Pull request to:
Affected branch(es): develop
Development branch: ticket-3107
Base revision: b8ab2c8a180b5b1ba110a02ecd60a1001ebddbc6
Personal repository: git://git.code.sf.net/u/trguitar/review

Impacted area | Impact y/n
Docs | n
Build system | n
RPM/packaging | n
Configuration files | n
Startup scripts | n
SAF services | n
OpenSAF services | n
Core libraries | y
Samples | n
Tests | n
Other | n

Comments (indicate scope for each "y" above):
*** EXPLAIN/COMMENT THE PATCH SERIES HERE ***

revision 74da122b9d8b5536ed81e31da0c8164468f4d5f9
Author: Alex Jones
Date: Thu, 13 Feb 2020 08:39:46 -0500

base: fix creation of msg queues [#3107]

Message queues stop working correctly after queue file is removed from /tmp.
Message queue API uses "ftok" which relies on the file being permanent. The
behaviour is undefined if the file is removed. Many systems clean out /tmp
periodically, so this can break if the message queue is long lived. Create
the queue file in /var/run.

Complete diffstat:
 src/base/os_defs.c | 19 +--
 1 file changed, 17 insertions(+), 2 deletions(-)

Testing Commands:
1) create some message queues
2) remove the /tmp/ files
3) restart one of the message queues

Testing, Expected Results:
1) the restarted message queue should be using its own queue, and not another's

Conditions of Submission:
Feb 19 or ack from developer

Arch | Built | Started | Linux distro
mips | n | n |
mips64 | n | n |
x86 | n | n |
x86_64 | y | y |
powerpc | n | n |
powerpc64 | n | n |

Reviewer Checklist:
[Submitters: make sure that your review doesn't trigger any checkmarks!]

Your checkin has not passed review because (see checked entries):

___ Your RR template is generally incomplete; it has too many blank entries
    that need proper data filled in.
___ You have failed to nominate the proper persons for review and push.
___ Your patches do not have proper short+long header
___ You have grammar/spelling in your header that is unacceptable.
___ You have exceeded a sensible line length in your headers/comments/text.
___ You have failed to put in a proper Trac Ticket # into your commits.
___ You have incorrectly put/left internal data in your comments/files
    (i.e. internal bug tracking tool IDs, product names etc)
___ You have not given any evidence of testing beyond basic build tests.
    Demonstrate some level of runtime or other sanity testing.
___ You have ^M present in some of your files. These have to be removed.
___ You have needlessly changed whitespace or added whitespace crimes
    like trailing spaces, or spaces before tabs.
___ You have mixed real technical changes with whitespace and other
    cosmetic code cleanup changes. These have to be separate commits.
___ You need to refactor your submission into logical chunks; there is
    too much content in a single commit.
___ You have extraneous garbage in your review (merge commits etc)
___ You have giant attachments which should never have been sent;
    instead you should place your content in a public tree to be pulled.
___ You have too many commits attached to an e-mail; resend as threaded
    commits, or place in a public tree for a pull.
___ You have resent this content multiple times without a clear indication
    of what has changed between each re-send.
___ You have failed to adequately and individually address all of the
    comments and change requests that were proposed in the initial review.
___ You have a misconfigured ~/.gitconfig file (i.e. user.name, user.email etc)
___ Your computer has a badly configured date and time, confusing the
    threaded patch review.
___ Your changes affect the IPC mechanism, and you don't present any results
    for the in-service upgradability test.
___ Your changes affect the user manual and documentation, but your patch
    series does not contain the patch that updates the Doxygen manual.
[devel] [PATCH 0/1] Review Request for amfd: fix calculating standby rank for SIrankedSU with non-unique rank [#3149]
Summary: amfd: fix calculating standby rank for SIrankedSU with non-unique rank [#3149]
Review request for Ticket(s): 3149
Peer Reviewer(s): Gary, Thuan
Pull request to:
Affected branch(es): develop
Development branch: ticket-3149
Base revision: 7f9aadab289cf71ac5baa847b5b6559d6c0c9762
Personal repository: git://git.code.sf.net/u/trguitar/review

Impacted area | Impact y/n
Docs | n
Build system | n
RPM/packaging | n
Configuration files | n
Startup scripts | n
SAF services | y
OpenSAF services | n
Core libraries | n
Samples | n
Tests | n
Other | n

Comments (indicate scope for each "y" above):
*** EXPLAIN/COMMENT THE PATCH SERIES HERE ***

revision d21dd0c020e33fd8932481976571d3ed22580ef5
Author: Alex Jones
Date: Fri, 7 Feb 2020 13:52:12 -0500

amfd: fix calculating standby rank for SIrankedSU with non-unique rank [#3149]

Standby rank which is passed to CSI set and protection group callbacks may
not be accurate. If SIrankedSUs exist with non-unique ranks,
AVD_SI::get_sisu_rank() is not traversing all the SUs at that rank to
determine the standby rank. AVD_SI::get_sisu_rank() needs to traverse all
the SUs at the particular rank.

Complete diffstat:
 src/amf/amfd/si.cc | 24
 1 file changed, 20 insertions(+), 4 deletions(-)

Testing Commands:
1) create N-Way SG with 4 SUs, so you have 1 active and 3 standbys
2) create SaAmfRankedSUs with ranks at 1 for SU1 and SU2, and rank 2 for
   SU3 and SU4

Testing, Expected Results:
1) standby assignments should never have the same standby rank.

Conditions of Submission:
Feb 13 or ack from developer.

Arch | Built | Started | Linux distro
mips | n | n |
mips64 | n | n |
x86 | n | n |
x86_64 | y | y |
powerpc | n | n |
powerpc64 | n | n |

Reviewer Checklist:
[Submitters: make sure that your review doesn't trigger any checkmarks!]

Your checkin has not passed review because (see checked entries):

___ Your RR template is generally incomplete; it has too many blank entries
    that need proper data filled in.
___ You have failed to nominate the proper persons for review and push.
___ Your patches do not have proper short+long header
___ You have grammar/spelling in your header that is unacceptable.
___ You have exceeded a sensible line length in your headers/comments/text.
___ You have failed to put in a proper Trac Ticket # into your commits.
___ You have incorrectly put/left internal data in your comments/files
    (i.e. internal bug tracking tool IDs, product names etc)
___ You have not given any evidence of testing beyond basic build tests.
    Demonstrate some level of runtime or other sanity testing.
___ You have ^M present in some of your files. These have to be removed.
___ You have needlessly changed whitespace or added whitespace crimes
    like trailing spaces, or spaces before tabs.
___ You have mixed real technical changes with whitespace and other
    cosmetic code cleanup changes. These have to be separate commits.
___ You need to refactor your submission into logical chunks; there is
    too much content in a single commit.
___ You have extraneous garbage in your review (merge commits etc)
___ You have giant attachments which should never have been sent;
    instead you should place your content in a public tree to be pulled.
___ You have too many commits attached to an e-mail; resend as threaded
    commits, or place in a public tree for a pull.
___ You have resent this content multiple times without a clear indication
    of what has changed between each re-send.
___ You have failed to adequately and individually address all of the
    comments and change requests that were proposed in the initial review.
___ You have a misconfigured ~/.gitconfig file (i.e. user.name, user.email etc)
___ Your computer has a badly configured date and time, confusing the
    threaded patch review.
___ Your changes affect the IPC mechanism, and you don't present any results
    for the in-service upgradability test.
___ Your changes affect the user manual and documentation, but your patch
    series does not contain the patch that updates the Doxygen manual.
[devel] [PATCH 1/1] amfd: fix calculating standby rank for SIrankedSU with non-unique rank [#3149]
Standby rank which is passed to CSI set and protection group callbacks may
not be accurate. If SIrankedSUs exist with non-unique ranks,
AVD_SI::get_sisu_rank() is not traversing all the SUs at that rank to
determine the standby rank. AVD_SI::get_sisu_rank() needs to traverse all
the SUs at the particular rank.
---
 src/amf/amfd/si.cc | 24
 1 file changed, 20 insertions(+), 4 deletions(-)

diff --git a/src/amf/amfd/si.cc b/src/amf/amfd/si.cc
index cd8be9479..df9d511f3 100644
--- a/src/amf/amfd/si.cc
+++ b/src/amf/amfd/si.cc
@@ -339,7 +339,7 @@ void AVD_SI::update_sisu_rank(const std::string& suname, uint32_t newRank) {
 }

 uint32_t AVD_SI::get_sisu_rank(const std::string& suname) const {
-  uint32_t rank{};
+  uint32_t rank{}, currentRank{};

   TRACE_ENTER2("%s", suname.c_str());

@@ -348,11 +348,27 @@ uint32_t AVD_SI::get_sisu_rank(const std::string& suname) const {
           susi->su->name.c_str(), susi->si->name.c_str(),
           susi->state);

-    if (susi->state == SA_AMF_HA_STANDBY)
+    if (susi->state == SA_AMF_HA_STANDBY) {
+      // if there are SUs with the same rank we need to go through all of them
+      if (currentRank) {
+        const AVD_SIRANKEDSU *sirankedsu{get_si_ranked_su(susi->su->name)};
+        if (!sirankedsu ||
+            (sirankedsu && sirankedsu->get_sa_amf_rank() != currentRank)) {
+          break;
+        }
+      }
+
       rank++;
+    }

-    if (suname == susi->su->name)
-      break;
+    if (suname == susi->su->name) {
+      // see if there are any other SUs at this same rank
+      const AVD_SIRANKEDSU *sirankedsu{get_si_ranked_su(susi->su->name)};
+      if (sirankedsu)
+        currentRank = sirankedsu->get_sa_amf_rank();
+      else
+        break;
+    }
   }

   TRACE_LEAVE();
--
2.21.1
Re: [devel] [PATCH 5/5] build: fix compile errors with gcc 9.x [#3134]
Hi ThuanTr,

I will add fclose(). Good catch.

We can't leave the original code in SmfUtils.cc because it fails to compile
in gcc 9.x. The compiler complains that you are only copying the length of
the string, so the output is not null terminated (even though the next line
null terminates it).

We could change the code to use memcpy instead. That would make it clearer
that we are not intending to null terminate with the function call, and are
doing it ourselves in the next line.

Alex

On 2/3/20 9:28 PM, Tran Thuan wrote:

Hi Alex,

About test_ntf_imcn.cc, please update the following too. Since you added
"return", the static code check reports a leak of "f".

@@ -6202,6 +6202,7 @@ __attribute__((constructor)) static void ntf_imcn_constructor(void) {
   snprintf(cp_cmd, sizeof(cp_cmd), "cp ");
   if ((strlen(line) - 1) > (sizeof(cp_cmd) - sizeof("cp "))) {
     printf("line: %s too long", line);
+    fclose(f);
     return;
   }

About SmfUtils.cc:

- strncpy(*((SaStringT *)*i_value), i_str, len - 1);
+ strncpy(*((SaStringT *)*i_value), i_str, len + 1);
  (*((SaStringT *)*i_value))[len] = '\0';

=> strncpy with "len + 1", then later overwrite with '\0'.
I suggest strncpy with "len" as in the original code to avoid redundant changes.

Best Regards,
ThuanTr

From: Alex Jones <ajo...@rbbn.com>
Sent: Monday, February 3, 2020 10:39 PM
To: thuan.t...@dektech.com.au; vu.m.ngu...@dektech.com.au
Cc: opensaf-devel@lists.sourceforge.net; Alex Jones <ajo...@rbbn.com>
Subject: [PATCH 5/5] build: fix compile errors with gcc 9.x [#3134]

Rework fixes in NTF and SMF.
---
 src/ntf/apitest/test_ntf_imcn.cc | 2 +-
 src/smf/smfd/SmfUtils.cc | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/src/ntf/apitest/test_ntf_imcn.cc b/src/ntf/apitest/test_ntf_imcn.cc
index 51b9076c6..04f155074 100644
--- a/src/ntf/apitest/test_ntf_imcn.cc
+++ b/src/ntf/apitest/test_ntf_imcn.cc
@@ -1140,7 +1140,7 @@ static SaAisErrorT set_add_info(
       >additionalInfo[idx].infoValue);
   if (error == SA_AIS_OK) {
     strcpy(reinterpret_cast(temp), infoValue);
-    temp[strlen(infoValue) - 1] = '\0';
+    //temp[strlen(infoValue)] = '\0';
     nHeader->additionalInfo[idx].infoId = infoId;
     nHeader->additionalInfo[idx].infoType = SA_NTF_VALUE_STRING;
   }
diff --git a/src/smf/smfd/SmfUtils.cc b/src/smf/smfd/SmfUtils.cc
index 2d539e7c2..f1593b4cf 100644
--- a/src/smf/smfd/SmfUtils.cc
+++ b/src/smf/smfd/SmfUtils.cc
@@ -993,7 +993,7 @@ bool smf_stringToValue(SaImmValueTypeT i_type, SaImmAttrValueT *i_value,
     len = strlen(i_str);
     *i_value = malloc(sizeof(SaStringT));
     *((SaStringT *)*i_value) = (SaStringT)malloc(len + 1);
-    strncpy(*((SaStringT *)*i_value), i_str, len - 1);
+    strncpy(*((SaStringT *)*i_value), i_str, len + 1);
     (*((SaStringT *)*i_value))[len] = '\0';
     break;
   case SA_IMM_ATTR_SAANYT:
--
2.21.1
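The memcpy alternative mentioned in the reply could look roughly like the sketch below. It is not the SmfUtils.cc code: the helper name dup_c_string is hypothetical, and the sketch only shows the general pattern of copying a known number of bytes and writing the terminator explicitly, so that neither a reader nor the compiler's string-operation checks has to guess whether termination was intended.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not the SmfUtils.cc code: duplicate a C string using
 * an explicit-length copy. memcpy copies exactly len bytes and the
 * terminator is written separately on the following line. */
char *dup_c_string(const char *src)
{
	size_t len = strlen(src);
	char *dst = malloc(len + 1);

	if (dst == NULL)
		return NULL;
	memcpy(dst, src, len);
	dst[len] = '\0';
	return dst;
}
```

The caller owns the returned buffer and releases it with free(). Compared with strncpy bounded by the source length, the copy and the termination are two visibly separate steps.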
[devel] [PATCH 3/5] build: fix gcc-9.x compiler problems [#3134]
more fixes
---
 src/ntf/apitest/test_ntf_imcn.cc | 53 +++-
 src/plm/plmcd/plmc_read_config.c | 2 +-
 2 files changed, 40 insertions(+), 15 deletions(-)

diff --git a/src/ntf/apitest/test_ntf_imcn.cc b/src/ntf/apitest/test_ntf_imcn.cc
index b1a1e87b4..51b9076c6 100644
--- a/src/ntf/apitest/test_ntf_imcn.cc
+++ b/src/ntf/apitest/test_ntf_imcn.cc
@@ -47,6 +47,14 @@ static SaImmOiHandleT immOiHnd = 0;
 static char extended_name_string_01[DEFAULT_EXT_NAME_LENGTH];
 static char extended_name_string_02[DEFAULT_EXT_NAME_LENGTH];

+static char NAME1_STR[sizeof(NAME1) + 1] = { '\0' };
+static char NAME2_STR[sizeof(NAME2) + 1] = { '\0' };
+static char NAME3_STR[sizeof(NAME3) + 1] = { '\0' };
+
+static char BUF1_STR[sizeof(BUF1) + 1] = { '\0' };
+static char BUF2_STR[sizeof(BUF2) + 1] = { '\0' };
+static char BUF3_STR[sizeof(BUF3) + 1] = { '\0' };
+
 /**
  * Callback routine, called when subscribed notification arrives.
  */
@@ -1131,7 +1139,8 @@ static SaAisErrorT set_add_info(
       reinterpret_cast(),
       >additionalInfo[idx].infoValue);
   if (error == SA_AIS_OK) {
-    strncpy(reinterpret_cast(temp), infoValue, strlen(infoValue) + 1);
+    strcpy(reinterpret_cast(temp), infoValue);
+    temp[strlen(infoValue) - 1] = '\0';
     nHeader->additionalInfo[idx].infoId = infoId;
     nHeader->additionalInfo[idx].infoType = SA_NTF_VALUE_STRING;
   }
@@ -1154,7 +1163,7 @@ static SaAisErrorT set_attr_str(
       _exp->c_d_notif_ptr->objectAttributes[idx]
            .attributeValue);
   if (error == SA_AIS_OK) {
-    strncpy(reinterpret_cast(temp), attrValue, strlen(attrValue) + 1);
+    strcpy(reinterpret_cast(temp), attrValue);
     n_exp->c_d_notif_ptr->objectAttributes[idx]
         .attributeId = attrId;
     n_exp->c_d_notif_ptr->objectAttributes[idx]
@@ -1285,7 +1294,7 @@ static SaAisErrorT set_attr_change_str(
       _exp->a_c_notif_ptr->changedAttributes[idx]
            .newAttributeValue);
   if (error == SA_AIS_OK) {
-    strncpy(reinterpret_cast(temp), newValue, strlen(newValue) + 1);
+    strcpy(reinterpret_cast(temp), newValue);
     n_exp->a_c_notif_ptr->changedAttributes[idx]
         .attributeId = attrId;
     n_exp->a_c_notif_ptr->changedAttributes[idx]
@@ -3155,7 +3164,7 @@ void objectCreateTest_20(void) {
   /* create an object */
   snprintf(command, MAX_DATA, "immcfg -t 20 -c OsafNtfCmTestCFG %s"
            " -a testNameCfg=%s -a testStringCfg=%s -a testAnyCfg=%s",
-           DNTESTCFG, NAME1, STRINGVAR1, BUF1);
+           DNTESTCFG, NAME1_STR, STRINGVAR1, BUF1_STR);
   assert(system(command) != -1);

   /*
@@ -3364,7 +3373,7 @@ void objectModifyTest_22(void) {
   /* modify an object */
   snprintf(command, MAX_DATA, "immcfg -t 20 -a testNameCfg=%s "
-           "-a testAnyCfg=%s %s", NAME2, BUF2, DNTESTCFG);
+           "-a testAnyCfg=%s %s", NAME2_STR, BUF2_STR, DNTESTCFG);
   assert(system(command) != -1);

   /*
@@ -4044,7 +4053,9 @@ void objectModifyTest_31(void) {
   memcpy(oldvar.value, NAME2, sizeof(NAME2));
   SaNameT addvar = {.length = sizeof(NAME3)};
   memcpy(addvar.value, NAME3, sizeof(NAME3));
-  snprintf(command, MAX_DATA, "immcfg -a testNameCfg+=%s %s", NAME3, DNTESTCFG);
+
+  snprintf(command, MAX_DATA, "immcfg -a testNameCfg+=%s %s", NAME3_STR,
+           DNTESTCFG);
   assert(system(command) != -1);

   /*
@@ -4120,8 +4131,12 @@ void objectModifyTest_32(void) {
                   .bufferAddr = const_cast(BUF2)};
   SaAnyT addvar = {.bufferSize = sizeof(BUF3),
                   .bufferAddr = const_cast(BUF3)};
+
+  char buf3[SA_MAX_NAME_LENGTH] = { '\0' };
+  memcpy(buf3, BUF3, sizeof(BUF3));
+
   snprintf(command, MAX_DATA, "immcfg -t 20 -a testAnyCfg+=%s %s",
-           BUF3, DNTESTCFG);
+           BUF3_STR, DNTESTCFG);
   assert(system(command) != -1);

   /*
@@ -4546,7 +4561,7 @@ void objectModifyTest_37(void) {
            " -a testTimeCfg+=%lld -a testStringCfg+=%s"
            " -a testNameCfg+=%s -a testAnyCfg+=%s %s",
            i32var11, ui32var2, i64var333, ui64var444, fvar5, dvar66,
-           tvar77, svar8, NAME1, BUF1, DNTESTCFG);
+           tvar77, svar8, NAME1_STR, BUF1_STR, DNTESTCFG);
   assert(system(command) != -1);

   /*
@@ -5821,7 +5836,7 @@ void objectCreateTest_3505(void) {
   /* create an object */
   snprintf(command, MAX_DATA, "immcfg -t 20 -c OsafNtfCmTestCFG %s"
            " -a testNameCfg=%s -a testStringCfg=%s -a testAnyCfg=%s",
-           DNTESTCFG, extended_name_string_01, STRINGVAR1, BUF1);
+           DNTESTCFG, extended_name_string_01, STRINGVAR1, BUF1_STR);
   assert(system(command) != -1);

   /*
@@ -5955,7 +5970,7 @@ void objectModifyTest_3506(void) {
   /* modify an object */
   snprintf(command, MAX_DATA, "immcfg -t 20 -a testNameCfg=%s"
-           " -a testAnyCfg=%s %s", extended_name_string_02, BUF2, DNTESTCFG);
+           " -a testAnyCfg=%s %s", extended_name_string_02, BUF2_STR, DNTESTCFG);
   assert(system(command) != -1);

   /*
@@ -6185,10 +6200,12 @@ __attribute__((constructor)) static
[devel] [PATCH 4/5] build: fix compile errors from gcc-9.x [#3134]
more issues
---
 src/imm/immloadd/imm_pbe_load.cc | 7 ++-
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/src/imm/immloadd/imm_pbe_load.cc b/src/imm/immloadd/imm_pbe_load.cc
index 72b926383..5f5aefcec 100644
--- a/src/imm/immloadd/imm_pbe_load.cc
+++ b/src/imm/immloadd/imm_pbe_load.cc
@@ -449,7 +449,6 @@ bool loadObjectFromPbe(void *pbeHandle, SaImmHandleT immHandle,
   sqlite3 *dbHandle = (sqlite3 *)pbeHandle;
   sqlite3_stmt *stmt = NULL;
   int rc = 0;
-  char *zErr = NULL;
   int ncols = 0;
   int c;
   std::string sqlF("select \"");
@@ -506,9 +505,8 @@ bool loadObjectFromPbe(void *pbeHandle, SaImmHandleT immHandle,

   rc = sqlite3_step(stmt);
   if (rc != SQLITE_ROW && rc != SQLITE_DONE) {
-    LOG_IN("Could not access table '%s', error:%s",
-           class_info->className.c_str(), zErr);
-    sqlite3_free(zErr);
+    LOG_IN("Could not access table '%s'",
+           class_info->className.c_str());
     goto bailout;
   }

@@ -575,7 +573,6 @@ bool loadObjectFromPbe(void *pbeHandle, SaImmHandleT immHandle,
   rc = sqlite3_step(stmt);
   if (rc != SQLITE_DONE) {
     LOG_ER("Expected 1 row got more rows");
-    sqlite3_free(zErr);
     goto bailout;
   }

--
2.21.1
[devel] [PATCH 1/5] build: fix errors from gcc 9.x [#3134]
Mostly strncpy and strncat problems.
---
 src/base/daemon.c | 1 +
 src/ckpt/ckptd/cpd_imm.c | 4 ++--
 src/ckpt/ckptnd/cpnd_res.c | 2 +-
 src/clm/clmd/clms_imm.cc | 2 +-
 src/dtm/dtmnd/dtm_intra_svc.cc | 2 +-
 src/evt/evtd/eds_ll.c | 4 ++--
 src/imm/agent/imma_oi_api.cc | 3 +--
 src/imm/agent/imma_om_api.cc | 14 ---
 src/imm/apitest/management/populate.c | 2 +-
 .../management/test_saImmOmClassCreate_2.c | 16 ++---
 src/imm/immd/immd_amf.c | 2 +-
 src/imm/immloadd/imm_loader.cc | 10
 src/imm/immnd/immnd_amf.c | 2 +-
 src/imm/immnd/immnd_evt.c | 8 +++
 src/imm/tools/imm_cfg.c | 2 +-
 src/imm/tools/imm_import.cc | 16 +
 src/lck/lckd/gld_imm.c | 2 +-
 src/log/agent/lga_agent.cc | 2 +-
 src/log/apitest/logtest.c | 2 +-
 src/log/apitest/tet_LogOiOps.c | 4 ++--
 src/log/logd/lgs_dest.cc | 6 ++---
 src/log/logd/lgs_util.cc | 10
 src/mds/mds_c_api.c | 4 ++--
 src/msg/common/mqsv_common.c | 1 +
 src/msg/msgnd/mqnd_evt.c | 1 +
 src/msg/msgnd/mqnd_imm.c | 5 ++--
 src/msg/msgnd/mqnd_proc.c | 1 +
 src/plm/apitest/test_saPlmReadinessTrack.c | 24 +--
 src/plm/plmcd/plmc_read_config.c | 22 -
 src/plm/plmcd/plmcd.c | 2 +-
 src/plm/plmd/plms_imm.c | 5 ++--
 src/rde/rded/rde_rda.cc | 2 +-
 src/smf/smfd/SmfUtils.cc | 14 ---
 src/smf/smfd/smfd_amf.cc | 2 +-
 34 files changed, 95 insertions(+), 104 deletions(-)

diff --git a/src/base/daemon.c b/src/base/daemon.c
index e24eaaaf0..f8e284fa1 100644
--- a/src/base/daemon.c
+++ b/src/base/daemon.c
@@ -510,6 +510,7 @@ void daemonize(int argc, char *argv[])
 void daemonize_as_user(const char *username, int argc, char *argv[])
 {
   strncpy(__runas_username, username, sizeof(__runas_username));
+  __runas_username[sizeof(__runas_username) - 1] = '\0';
   daemonize(argc, argv);
 }

diff --git a/src/ckpt/ckptd/cpd_imm.c b/src/ckpt/ckptd/cpd_imm.c
index af5cc29ec..e2dee0c2b 100644
--- a/src/ckpt/ckptd/cpd_imm.c
+++ b/src/ckpt/ckptd/cpd_imm.c
@@ -138,8 +138,8 @@ cpd_saImmOiRtAttrUpdateCallback(SaImmOiHandleT immOiHandle,
     ckpt_name = strdup(object_name);
   }

-  TRACE_4("ckpt_name: %s", ckpt_name);
-  TRACE_4("node_name: %s", node_name);
+  TRACE_4("ckpt_name: %s", ckpt_name ? ckpt_name : "n/a");
+  TRACE_4("node_name: %s", node_name ? node_name : "n/a");

   cpd_ckpt_map_node_get(>ckpt_map_tree, ckpt_name, _info);

diff --git a/src/ckpt/ckptnd/cpnd_res.c b/src/ckpt/ckptnd/cpnd_res.c
index 3d69f3f3f..3e97495a9 100644
--- a/src/ckpt/ckptnd/cpnd_res.c
+++ b/src/ckpt/ckptnd/cpnd_res.c
@@ -422,7 +422,7 @@ void *cpnd_restart_shm_create(NCS_OS_POSIX_SHM_REQ_INFO *cpnd_open_req,
   cpnd_open_req->info.open.i_flags = O_CREAT | O_RDWR;
   rc = ncs_os_posix_shm(cpnd_open_req);
   if (NCSCC_RC_FAILURE == rc) {
-    LOG_ER("cpnd open request fail for RDWR mode %s", buf);
+    LOG_ER("cpnd open request fail for RDWR mode %s", buffer);
     m_MMGR_FREE_CPND_DEFAULT(buffer);
     return NULL;
   }
diff --git a/src/clm/clmd/clms_imm.cc b/src/clm/clmd/clms_imm.cc
index 017607d74..46b045faa 100644
--- a/src/clm/clmd/clms_imm.cc
+++ b/src/clm/clmd/clms_imm.cc
@@ -227,7 +227,7 @@ CLMS_CLUSTER_NODE *clms_node_new(SaNameT *name,
     } else if (!strcmp(attr->attrName, "saClmNodeAddress")) {
       node->node_addr.length = (SaUint16T)strlen(*((char **)value));
       strncpy((char *)node->node_addr.value, *((char **)value),
-              node->node_addr.length);
+              node->node_addr.length + 1);
     } else if (!strcmp(attr->attrName, "saClmNodeEE")) {
       SaNameT *name = (SaNameT *)value;
       size_t nameLen = osaf_extended_name_length(name);
diff --git a/src/dtm/dtmnd/dtm_intra_svc.cc b/src/dtm/dtmnd/dtm_intra_svc.cc
index 1affd65d3..cf38e4544 100644
--- a/src/dtm/dtmnd/dtm_intra_svc.cc
+++ b/src/dtm/dtmnd/dtm_intra_svc.cc
@@ -1523,7 +1523,7 @@ uint32_t dtm_intranode_process_node_up(NODE_ID node_id, char *node_name,
   uint8_t buffer[DTM_LIB_NODE_UP_MSG_SIZE_FULL];
   node_up_msg.node_id = node_id;
   node_up_msg.i_addr_family = i_addr_family;
-  strncpy(node_up_msg.node_ip, node_ip, INET6_ADDRSTRLEN);
+  strncpy(node_up_msg.node_ip, node_ip,
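The daemon.c hunk above follows a common pattern: strncpy does not write a terminator when the source is at least as long as the bound, so the last byte of the destination is set explicitly afterwards. A minimal sketch of that pattern as a reusable function is shown below. The name copy_bounded is hypothetical and does not exist in OpenSAF, and unlike the patch it copies at most dst_size - 1 bytes before terminating, which has the same end result.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not OpenSAF code: copy src into a fixed-size buffer
 * and always leave the buffer null terminated. Returns 1 if src did not fit
 * and was truncated, 0 otherwise. */
int copy_bounded(char *dst, size_t dst_size, const char *src)
{
	if (dst_size == 0)
		return 1;
	strncpy(dst, src, dst_size - 1);
	dst[dst_size - 1] = '\0';
	return strlen(src) >= dst_size;
}
```

Returning a truncation flag is a design choice of this sketch; it lets a caller decide whether a shortened name is acceptable or should be treated as an error.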
[devel] [PATCH 5/5] build: fix compile errors with gcc 9.x [#3134]
Rework fixes in NTF and SMF.
---
 src/ntf/apitest/test_ntf_imcn.cc | 2 +-
 src/smf/smfd/SmfUtils.cc | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/src/ntf/apitest/test_ntf_imcn.cc b/src/ntf/apitest/test_ntf_imcn.cc
index 51b9076c6..04f155074 100644
--- a/src/ntf/apitest/test_ntf_imcn.cc
+++ b/src/ntf/apitest/test_ntf_imcn.cc
@@ -1140,7 +1140,7 @@ static SaAisErrorT set_add_info(
       >additionalInfo[idx].infoValue);
   if (error == SA_AIS_OK) {
     strcpy(reinterpret_cast(temp), infoValue);
-    temp[strlen(infoValue) - 1] = '\0';
+    //temp[strlen(infoValue)] = '\0';
     nHeader->additionalInfo[idx].infoId = infoId;
     nHeader->additionalInfo[idx].infoType = SA_NTF_VALUE_STRING;
   }
diff --git a/src/smf/smfd/SmfUtils.cc b/src/smf/smfd/SmfUtils.cc
index 2d539e7c2..f1593b4cf 100644
--- a/src/smf/smfd/SmfUtils.cc
+++ b/src/smf/smfd/SmfUtils.cc
@@ -993,7 +993,7 @@ bool smf_stringToValue(SaImmValueTypeT i_type, SaImmAttrValueT *i_value,
     len = strlen(i_str);
     *i_value = malloc(sizeof(SaStringT));
     *((SaStringT *)*i_value) = (SaStringT)malloc(len + 1);
-    strncpy(*((SaStringT *)*i_value), i_str, len - 1);
+    strncpy(*((SaStringT *)*i_value), i_str, len + 1);
     (*((SaStringT *)*i_value))[len] = '\0';
     break;
   case SA_IMM_ATTR_SAANYT:
--
2.21.1
[devel] [PATCH 0/5] Review Request for build: fix errors from gcc 9.x [#3134]
Summary: build: fix errors from gcc 9.x [#3134]
Review request for Ticket(s): 3134
Peer Reviewer(s): Tran
Pull request to:
Affected branch(es): develop
Development branch: ticket-3134
Base revision: 876fbce762044d49da8edbd6bfcb059ee59e748e
Personal repository: git://git.code.sf.net/u/trguitar/review

Impacted area | Impact y/n
Docs | n
Build system | y
RPM/packaging | n
Configuration files | n
Startup scripts | n
SAF services | n
OpenSAF services | n
Core libraries | n
Samples | n
Tests | n
Other | n

NOTE: Patch(es) contain lines longer than 80 characters

Comments (indicate scope for each "y" above):
*** EXPLAIN/COMMENT THE PATCH SERIES HERE ***

revision 1c9c9c9aa23f95939597b0e29055c94c24e2815a
Author: Alex Jones
Date: Mon, 3 Feb 2020 10:32:17 -0500

build: fix compile errors with gcc 9.x [#3134]

Rework fixes in NTF and SMF.

revision 560b3243c3bcd821ca67839de8a4ee2825422966
Author: Alex Jones
Date: Mon, 3 Feb 2020 10:32:17 -0500

build: fix compile errors from gcc-9.x [#3134]

more issues

revision 2ccf53568405ea69bb5a1faf1a1eae9644702ab4
Author: Alex Jones
Date: Mon, 3 Feb 2020 10:32:17 -0500

build: fix gcc-9.x compiler problems [#3134]

more fixes

revision 2bec1a88c54b6de9a8d49f98f3d2c1d97cc537a4
Author: Alex Jones
Date: Mon, 3 Feb 2020 10:32:17 -0500

build: fix errors from gcc 9.x [#3134]

More compiler fixes

revision 17a27e953d743bb712f0c377091ee14c2e659b25
Author: Alex Jones
Date: Mon, 3 Feb 2020 10:32:17 -0500

build: fix errors from gcc 9.x [#3134]

Mostly strncpy and strncat problems.

Complete diffstat:
 src/base/daemon.c | 1 +
 src/ckpt/ckptd/cpd_imm.c | 4 +-
 src/ckpt/ckptnd/cpnd_res.c | 2 +-
 src/clm/clmd/clms_imm.cc | 2 +-
 src/dtm/dtmnd/dtm_intra_svc.cc | 2 +-
 src/evt/evtd/eds_ll.c | 4 +-
 src/imm/agent/imma_oi_api.cc | 3 +-
 src/imm/agent/imma_om_api.cc | 14 ++
 src/imm/apitest/management/populate.c | 2 +-
 .../apitest/management/test_saImmOmClassCreate_2.c | 16 +++
 src/imm/common/immpbe_dump.cc | 2 +-
 src/imm/immd/immd_amf.c | 2 +-
 src/imm/immloadd/imm_loader.cc | 10 ++--
 src/imm/immloadd/imm_pbe_load.cc | 7 +--
 src/imm/immnd/immnd_amf.c | 2 +-
 src/imm/immnd/immnd_evt.c | 8 ++--
 src/imm/tools/imm_cfg.c | 2 +-
 src/imm/tools/imm_import.cc | 16 +++
 src/lck/lckd/gld_imm.c | 2 +-
 src/log/agent/lga_agent.cc | 2 +-
 src/log/apitest/logtest.c | 2 +-
 src/log/apitest/tet_LogOiOps.c | 4 +-
 src/log/logd/lgs_dest.cc | 6 +--
 src/log/logd/lgs_util.cc | 10 ++--
 src/mds/mds_c_api.c | 4 +-
 src/msg/common/mqsv_common.c | 1 +
 src/msg/msgnd/mqnd_evt.c | 1 +
 src/msg/msgnd/mqnd_imm.c | 5 +-
 src/msg/msgnd/mqnd_proc.c | 1 +
 src/ntf/apitest/test_ntf_imcn.cc | 53 --
 src/plm/apitest/test_saPlmReadinessTrack.c | 24 +-
 src/plm/plmcd/plmc_read_config.c | 22 -
 src/plm/plmcd/plmcd.c | 2 +-
 src/plm/plmd/plms_imm.c | 5 +-
 src/rde/rded/rde_rda.cc | 2 +-
 src/smf/smfd/SmfUtils.cc | 14 ++
 src/smf/smfd/smfd_amf.cc | 2 +-
 37 files changed, 137 insertions(+), 124 deletions(-)

Testing Commands:
*** LIST THE COMMAND LINE TOOLS/STEPS TO TEST YOUR CHANGES ***

Testing, Expected Results:
*** PASTE COMMAND OUTPUTS / TEST RESULTS ***

Conditions of Submission:
*** HOW MANY DAYS BEFORE PUSHING, CONSENSUS ETC ***

Arch | Built | Started | Linux distro
mips | n | n |
mips64 | n | n |
x86 | n | n |
x86_64 | n | n |
powerpc | n | n |
powerpc64 | n | n |

Reviewer Checklist:
[Submitters: make sure that your review doesn't trigger any checkmarks!]

Your checkin has not passed review because (see checked entries):

___ Your RR template is generally incomplete; it has too many blank entries
    that need proper d
[devel] [PATCH 2/5] build: fix errors from gcc 9.x [#3134]
More compiler fixes --- src/imm/common/immpbe_dump.cc| 2 +- src/plm/plmcd/plmc_read_config.c | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/src/imm/common/immpbe_dump.cc b/src/imm/common/immpbe_dump.cc index 3bde78a3f..175bd0484 100644 --- a/src/imm/common/immpbe_dump.cc +++ b/src/imm/common/immpbe_dump.cc @@ -979,7 +979,7 @@ void *pbeRepositoryInit(const char *filePath, bool create, exit(1); } } - TRACE("TMP DIR:%s", localTmpDir); + TRACE("TMP DIR:%s", localTmpDir ? localTmpDir : "n/a"); if (localTmpDir) { TRACE("IMMSV_PBE_TMP_DIR:%s", localTmpDir); localTmpFilename.append(localTmpDir); diff --git a/src/plm/plmcd/plmc_read_config.c b/src/plm/plmcd/plmc_read_config.c index acda7c72e..30daa1815 100644 --- a/src/plm/plmcd/plmc_read_config.c +++ b/src/plm/plmcd/plmc_read_config.c @@ -42,7 +42,7 @@ static int checkfile(char *buf) int ii; char cmd[PLMC_MAX_TAG_LEN]; - strncpy(cmd, buf, PLMC_MAX_TAG_LEN - 1); + strncpy(cmd, buf, PLMC_MAX_TAG_LEN); for (ii = 0; ii < strlen(cmd); ii++) if (cmd[ii] == ' ') cmd[ii] = '\0'; -- 2.21.1 --- Notice: This e-mail together with any attachments may contain information of Ribbon Communications Inc. that is confidential and/or proprietary for the sole use of the intended recipient. Any review, disclosure, reliance or distribution by others or forwarding without express permission is strictly prohibited. If you are not the intended recipient, please notify the sender immediately and then delete all copies, including any attachments. --- ___ Opensaf-devel mailing list Opensaf-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/opensaf-devel
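A note on the plmc_read_config.c hunk above: strncpy() does not write a terminating NUL when the source is at least as long as the bound, so passing the full buffer size leaves the following strlen(cmd) dependent on the input being short enough. The sketch below shows one possible bounded copy that always terminates. It is not part of the patch: copy_tag and TAG_LEN are hypothetical names, and TAG_LEN only stands in for PLMC_MAX_TAG_LEN, whose real value is defined in the PLM headers.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for PLMC_MAX_TAG_LEN (assumption, not the real value). */
#define TAG_LEN 8

/*
 * Copy src into dst, which holds dst_size bytes, and always NUL-terminate.
 * Returns 1 if src had to be truncated, 0 otherwise.
 */
int copy_tag(char *dst, size_t dst_size, const char *src)
{
	assert(dst != NULL && src != NULL && dst_size > 0);

	size_t len = strlen(src);
	int truncated = len >= dst_size;

	if (truncated)
		len = dst_size - 1; /* leave room for the terminator */

	memcpy(dst, src, len);
	dst[len] = '\0';
	return truncated;
}
```

A caller could then write copy_tag(cmd, sizeof(cmd), buf) and decide for itself whether a truncated tag is an error.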
[devel] [PATCH 0/1] Review Request for amfnd: don't quiesce comp which is in TERMINATION_FAILED state [#3147]
Summary: amfnd: don't quiesce comp which is in TERMINATION_FAILED state [#3147] (untested)
Review request for Ticket(s): 3147
Peer Reviewer(s): Hans, Gary, Nagu
Pull request to:
Affected branch(es): develop
Development branch: ticket-3147
Base revision: 3d05bc1f2f46d9c855f001bc56c1fd2f9812f5f4
Personal repository: git://git.code.sf.net/u/trguitar/review

Impacted area        Impact y/n
-------------------------------
Docs                 n
Build system         n
RPM/packaging        n
Configuration files  n
Startup scripts      n
SAF services         y
OpenSAF services     n
Core libraries       n
Samples              n
Tests                n
Other                n

NOTE: Patch(es) are UNTESTED

Comments (indicate scope for each "y" above):
---------------------------------------------
*** EXPLAIN/COMMENT THE PATCH SERIES HERE ***

revision fd81f84a655def349896e175c4615023f1f99151
Author: Alex Jones
Date: Thu, 30 Jan 2020 10:58:28 -0500

amfnd: don't quiesce comp which is in TERMINATION_FAILED state [#3147]

When SU goes into TERMINATION_FAILED because one of its components went to
TERMINATION_FAILED, amfnd will still send QUIESCED to those components, even
though they are already terminating. This can cause the SG to go into unstable
state, and get stuck.

IsCompQualifiedAssignment does not check for TERMINATION_FAILED state, so it
allows the CSI assignment to go even though the comp is already terminating.

Check for TERMINATION_FAILED state in IsCompQualifiedAssignment, and return
false if so.

Complete diffstat:
------------------
 src/amf/amfnd/comp.cc | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

Testing Commands:
-----------------
1) Create an SU with many comps (at least 26), both PI and NPI
2) Make one of the PI comps fail health check, and then fail cleanup

Testing, Expected Results:
--------------------------
1) All PI comps should get terminated, but not get QUIESCED assignments
2) All NPI comps should get QUIESCED and terminated

Conditions of Submission:
-------------------------
Feb 5 or ack from developer

Arch       Built  Started  Linux distro
---------------------------------------
mips       n      n
mips64     n      n
x86        n      n
x86_64     y      y
powerpc    n      n
powerpc64  n      n

Reviewer Checklist:
-------------------
[Submitters: make sure that your review doesn't trigger any checkmarks!]
Your checkin has not passed review because (see checked entries): ___ Your RR template is generally incomplete; it has too many blank entries that need proper data filled in. ___ You have failed to nominate the proper persons for review and push. ___ Your patches do not have proper short+long header ___ You have grammar/spelling in your header that is unacceptable. ___ You have exceeded a sensible line length in your headers/comments/text. ___ You have failed to put in a proper Trac Ticket # into your commits. ___ You have incorrectly put/left internal data in your comments/files (i.e. internal bug tracking tool IDs, product names etc) ___ You have not given any evidence of testing beyond basic build tests. Demonstrate some level of runtime or other sanity testing. ___ You have ^M present in some of your files. These have to be removed. ___ You have needlessly changed whitespace or added whitespace crimes like trailing spaces, or spaces before tabs. ___ You have mixed real technical changes with whitespace and other cosmetic code cleanup changes. These have to be separate commits. ___ You need to refactor your submission into logical chunks; there is too much content into a single commit. ___ You have extraneous garbage in your review (merge commits etc) ___ You have giant attachments which should never have been sent; Instead you should place your content in a public tree to be pulled. ___ You have too many commits attached to an e-mail; resend as threaded commits, or place in a public tree for a pull. ___ You have resent this content multiple times without a clear indication of what has changed between each re-send. ___ You have failed to adequately and individually address all of the comments and change requests that were proposed in the initial review. ___ You have a misconfigured ~/.gitconfig file (i.e. user.name, user.email etc) ___ Your computer have a badly configured date and time; confusing the the threaded patch review. 
___ Your changes affect IPC mechanism, and you don't present any results for in-service upgradability test. ___ Your changes affect user manual and documentation, your patch series do not contain the patch that updates the Doxygen manual.
[devel] [PATCH 1/1] amfnd: don't quiesce comp which is in TERMINATION_FAILED state [#3147]
When SU goes into TERMINATION_FAILED because one of its components went to
TERMINATION_FAILED, amfnd will still send QUIESCED to those components, even
though they are already terminating. This can cause the SG to go into unstable
state, and get stuck.

IsCompQualifiedAssignment does not check for TERMINATION_FAILED state, so it
allows the CSI assignment to go even though the comp is already terminating.

Check for TERMINATION_FAILED state in IsCompQualifiedAssignment, and return
false if so.
---
 src/amf/amfnd/comp.cc | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/src/amf/amfnd/comp.cc b/src/amf/amfnd/comp.cc
index 10c77a462..8a11d75fb 100644
--- a/src/amf/amfnd/comp.cc
+++ b/src/amf/amfnd/comp.cc
@@ -1492,7 +1492,8 @@ bool IsCompQualifiedAssignment(const AVND_COMP *comp) {
     LOG_IN("Ignoring Unregistered comp:'%s'", comp->name.c_str());
     rc = false;
   } else if (!m_AVND_COMP_PRES_STATE_IS_INSTANTIATED(comp) &&
-             comp->su->pres == SA_AMF_PRESENCE_INSTANTIATION_FAILED &&
+             (comp->su->pres == SA_AMF_PRESENCE_INSTANTIATION_FAILED ||
+              comp->su->pres == SA_AMF_PRESENCE_TERMINATION_FAILED) &&
              !m_AVND_COMP_PRES_STATE_IS_ORPHANED(comp)) {
     LOG_IN(
         "Ignoring comp with invalid presence state:'%s', comp_flag %x, comp_pres=%u, su_pres=%u",
-- 
2.21.1
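To make the changed condition easier to reason about, here is a simplified C model of just the branch the patch touches. It is only a sketch: the real function is C++ and also handles unregistered components, and every name below (pres_state_t, comp_view_t, comp_qualifies_for_assignment) is hypothetical rather than taken from amfnd.

```c
#include <stdbool.h>

/* Hypothetical, reduced set of SU presence states (assumption). */
typedef enum {
	PRES_INSTANTIATED,
	PRES_INSTANTIATION_FAILED,
	PRES_TERMINATION_FAILED
} pres_state_t;

/* Hypothetical flattened view of the fields the condition reads. */
typedef struct {
	bool comp_instantiated; /* component presence state is INSTANTIATED */
	bool comp_orphaned;     /* component presence state is ORPHANED */
	pres_state_t su_pres;   /* presence state of the owning SU */
} comp_view_t;

/*
 * Models only the patched branch: a component that is neither instantiated
 * nor orphaned is skipped when its SU is in either failed state.
 */
bool comp_qualifies_for_assignment(const comp_view_t *c)
{
	bool su_failed = c->su_pres == PRES_INSTANTIATION_FAILED ||
			 c->su_pres == PRES_TERMINATION_FAILED;

	if (!c->comp_instantiated && su_failed && !c->comp_orphaned)
		return false;
	return true;
}
```

Before the patch, only the INSTANTIATION_FAILED half of su_failed existed, so a non-instantiated component in a TERMINATION_FAILED SU still qualified for the QUIESCED assignment.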
[devel] [PATCH 1/1] uml: add support for plm to run under uml [#2922]
Add support for plm to run under uml. --- src/plm/config/openhpi.conf| 18 tools/cluster_sim_uml/archive/scripts/40opensaf.rc | 30 +++ tools/cluster_sim_uml/build_uml| 95 -- 3 files changed, 138 insertions(+), 5 deletions(-) create mode 100644 src/plm/config/openhpi.conf diff --git a/src/plm/config/openhpi.conf b/src/plm/config/openhpi.conf new file mode 100644 index 0..b811de134 --- /dev/null +++ b/src/plm/config/openhpi.conf @@ -0,0 +1,18 @@ +OPENHPI_AUTOINSERT_TIMEOUT = 50 +OPENHPI_AUTOINSERT_TIMEOUT_READONLY = "NO" + +# Section for dynamic_simulator plugin +handler libdyn_simulator { +entity_root = "{ADVANCEDTCA_CHASSIS,2}" +# Location of the simulation data file +# Normally an example file is installed in the same directory as openhpi.conf. +# Please change the following entry if you have configured another install +# directory or will use your own simulation.data. +file = "/etc/openhpi/opensaf-plm-sim.txt" +# infos goes to logfile and stdout +# the logfile are log00.log, log01.log ... 
+#logflags = "file stdout" +#logfile = "dynsim" +# if #logfile_max reached replace the oldest one +#logfile_max = "5" +} diff --git a/tools/cluster_sim_uml/archive/scripts/40opensaf.rc b/tools/cluster_sim_uml/archive/scripts/40opensaf.rc index 7df4cfee6..9057d680b 100644 --- a/tools/cluster_sim_uml/archive/scripts/40opensaf.rc +++ b/tools/cluster_sim_uml/archive/scripts/40opensaf.rc @@ -76,4 +76,34 @@ echo "$node_name" > /etc/opensaf/node_name echo "/tmp/core_%t_%e_%p" > /proc/sys/kernel/core_pattern ulimit -c unlimited +if test -e /etc/plmcd.conf; then +sc_1_ip=$(grep "SC-1" /etc/hosts | cut -d' ' -f 1) +sc_2_ip=$(grep "SC-2" /etc/hosts | cut -d' ' -f 1) +if [ "$node_name" == "SC-1" ]; then + ee="Linux_os_hosting_clm_node,safHE=f120_slot_1" + path="my_entity = \"{ADVANCEDTCA_CHASSIS,2}{PHYSICAL_SLOT,1}{SWITCH_BLADE,0}\"" +elif [ "$node_name" == "SC-2" ]; then + ee="Linux_os_hosting_clm_node,safHE=f120_slot_16" + path="my_entity = \"{ADVANCEDTCA_CHASSIS,2}{PHYSICAL_SLOT,16}{SWITCH_BLADE,0}\"" +else + ee="$node_name" +fi +sed -i -e "s/10.105.1.3/$sc_1_ip/" \ +-e "s/10.105.1.6/$sc_2_ip/" \ +-e "s/0020f/safEE=$ee,safDomain=domain_1/" \ +-e "s/1;os;Fedora;2.6.31/1;os;SUSE;2.6/" \ +-e "/^\/etc\/init.d/s/^/#/" \ +/etc/plmcd.conf +cp /etc/openhpi/openhpi.conf /var/opt +chmod go-rwx /var/opt/openhpi.conf +echo "$path" > /etc/openhpi/openhpiclient.conf + +/usr/sbin/openhpid -c /var/opt/openhpi.conf + +# wait for hpi to read in hardware info +sleep 10 + +/usr/local/sbin/plmcd& +fi + /etc/init.d/opensafd start& diff --git a/tools/cluster_sim_uml/build_uml b/tools/cluster_sim_uml/build_uml index 16d49d03e..e54e45753 100755 --- a/tools/cluster_sim_uml/build_uml +++ b/tools/cluster_sim_uml/build_uml @@ -121,6 +121,73 @@ cmd_install_testprog() { cmd_mkcpio } +cmd_build_container_testprog() { +src=$opensaf_home/samples/amf/container +libd=$root/usr/local/$lib_dir +installd=$root/opt/amf_demo + +mkdir -p "$installd" +cp $src/amf_container_script $installd +gcc -g -O2 -Wall -fPIC 
-I$opensaf_home/src/amf/saf \ + -I$opensaf_home/src/ais/include \ + -DSA_EXTENDED_NAME_SOURCE \ + -o $installd/amf_container_demo $src/amf_container_demo.c \ + -Wl,--as-needed "-Wl,-rpath-link,$libd:$libd/opensaf" "-L$libd" -lSaAmf -lopensaf_core + +echo "Creating [$root/root.cpio] ..." +cmd_mkcpio +} + +## install_container_testprog +## Build and install the AMF container demo program. +## +cmd_install_container_testprog() { +src=$opensaf_home/samples/amf/container +libd=$root/usr/local/$lib_dir +installd=$root/opt/amf_demo +immxml=$root/etc/opensaf/imm.xml +containedXml=$src/AppConfig-contained-2N.xml +containerXml=$src/AppConfig-container.xml + +mkdir -p $installd +cp $src/amf_container_script $installd +gcc -g -O2 -Wall -fPIC -I$opensaf_home/src/amf/saf \ + -I$opensaf_home/src/ais/include \ + -DSA_EXTENDED_NAME_SOURCE \ + -o $installd/amf_container_demo $src/amf_container_demo.c \ + -Wl,--as-needed "-Wl,-rpath-link,$libd:$libd/opensaf" "-L$libd" -lSaAmf + +test -r $immxml.orig || cp $immxml $immxml.orig +$opensaf_home/src/imm/tools/immxml-merge \ + $immxml.orig $containedXml $containerXml > $immxml +$opensaf_home/src/imm/tools/immxml-validate $immxml +echo "Creating [$root/root.cpio] ..." +cmd_mkcpio +} + +## install_plmtests +## Install the PLM tests +## +cmd_install_plm_tests() { +src=$opensaf_home/src/plm/config +immxml=$root/etc/opensaf/imm.xml +plmXml=$src/plm-sim-imm.xml + +test -r $immxml.orig || cp $immxml $immxml.orig +$opensaf_home/src/imm/tools/immxml-merge \ + $immxml.orig
[devel] [PATCH 0/1] Review Request for uml: add support for plm to run under uml [#2922]
Summary: uml: add support for plm to run under uml [#2922] Review request for Ticket(s): 2922 Peer Reviewer(s): Hans Pull request to: Affected branch(es): develop Development branch: ticket-2922 Base revision: c1a5a9d9353fd45152ec7604d7133361dd243614 Personal repository: git://git.code.sf.net/u/trguitar/review Impacted area Impact y/n Docsn Build systemy RPM/packaging n Configuration files n Startup scripts n SAF servicesy OpenSAF servicesn Core libraries n Samples n Tests n Other n NOTE: Patch(es) contain lines longer than 80 characers Comments (indicate scope for each "y" above): - revision 84ddb28a1b5fd0b9b24795196c523b5b050effbe Author: Alex Jones Date: Mon, 17 Sep 2018 15:42:04 -0400 uml: add support for plm to run under uml [#2922] Add support for plm to run under uml. Added Files: src/plm/config/openhpi.conf Complete diffstat: -- src/plm/config/openhpi.conf| 18 tools/cluster_sim_uml/archive/scripts/40opensaf.rc | 30 +++ tools/cluster_sim_uml/build_uml| 95 -- 3 files changed, 138 insertions(+), 5 deletions(-) Testing Commands: - Run build_uml with PLM disabled. Testing, Expected Results: -- Make sure it still works. Conditions of Submission: - Sep 24, or ack from developer. Arch Built StartedLinux distro --- mipsn n mips64 n n x86 n n x86_64 y y powerpc n n powerpc64 n n Reviewer Checklist: --- [Submitters: make sure that your review doesn't trigger any checkmarks!] Your checkin has not passed review because (see checked entries): ___ Your RR template is generally incomplete; it has too many blank entries that need proper data filled in. ___ You have failed to nominate the proper persons for review and push. ___ Your patches do not have proper short+long header ___ You have grammar/spelling in your header that is unacceptable. ___ You have exceeded a sensible line length in your headers/comments/text. ___ You have failed to put in a proper Trac Ticket # into your commits. ___ You have incorrectly put/left internal data in your comments/files (i.e. 
internal bug tracking tool IDs, product names etc)
Re: [devel] [PATCH 0/2] Review Request for plma: align the function headers [#199]
Ack. I will push it. Alex On 09/11/2018 08:55 AM, Meenakshi TK wrote: __ NOTICE: This email was received from an EXTERNAL sender __ Summary: plma: align the function headers [#199] Review request for Ticket(s): 199 Peer Reviewer(s): Alex,Mathi Pull request to: Alex,Mathi Affected branch(es): develop Development branch: ticket-199 Base revision: 1315ade5d2223ecb22cc3076da00d4cee09ec7f7 Personal repository: [1]git://git.code.sf.net/u/meenatk-hasoln/review Impacted area Impact y/n Docs n Build system n RPM/packaging n Configuration files n Startup scripts n SAF services y OpenSAF services n Core libraries n Samples n Tests n Other n NOTE: Patch(es) contain lines longer than 80 characers Comments (indicate scope for each "y" above): - *** EXPLAIN/COMMENT THE PATCH SERIES HERE *** revision 891a217572193e9f199d523f4cd8b4e357aa2a16 Author: Meenakshi TK [2] Date: Tue, 11 Sep 2018 14:55:22 +0530 plma: add and modify traces [#199] revision c83896b2219c17b06b6549d1bbc100e8eac69e63 Author: Meenakshi TK [3] Date: Tue, 11 Sep 2018 12:47:53 +0530 plma: align the function headers [#199] Complete diffstat: -- src/plm/agent/plma_api.c | 1874 +++-- src/plm/agent/plma_comm.c | 68 +- src/plm/agent/plma_init.c | 301 ++-- src/plm/agent/plma_mds.c | 174 + 4 files changed, 371 insertions(+), 2046 deletions(-) Testing Commands: - Compiled Testing, Expected Results: -- Compiled Conditions of Submission: - Ack from maintainers Arch Built Started Linux distro --- mips n n mips64 n n x86 n n x86_64 y y powerpc n n powerpc64 n n Reviewer Checklist: --- [Submitters: make sure that your review doesn't trigger any checkmarks!] Your checkin has not passed review because (see checked entries): ___ Your RR template is generally incomplete; it has too many blank entries that need proper data filled in. ___ You have failed to nominate the proper persons for review and push. ___ Your patches do not have proper short+long header ___ You have grammar/spelling in your header that is unacceptable. 
References 1. https://protect-us.mimecast.com/s/WHyvCR6Lz6C4O9WtNDUON 2. mailto:meenak...@hasolutions.in 3. mailto:meenak...@hasolutions.in
Re: [devel] [PATCH 1/1] plm: fix return codes for saPlmReadinessTrackResponse [#200]
Hi Meenakshi, Good catch. Let's create a separate ticket for this. There are a lot of traces in the PLM code which use '\' in the messages (which translates to a lot of white space.) I think it would be nice to clean that up. Alex On 09/15/2018 05:23 AM, [1]meenak...@hasolutions.in wrote: __ NOTICE: This email was received from an EXTERNAL sender __ Hi Alex, Sorry for late response. I tested the following scenarios and it went well. 1. Passed 5 as response in saPlmReadinessTrackResponse and got invalid as return type in the start step. 2. Passed SA_PLM_CALLBACK_RESPONSE_REJECTED as a response in saPlmReadinessTrackResponse and got invalid as return type in the start step. 3. Passed SA_PLM_CALLBACK_RESPONSE_OK as a response in saPlmReadinessTrackResponse and got OK as return type in the start step. 4.Passed SA_PLM_CALLBACK_RESPONSE_ERROR as a response in saPlmReadinessTrackResponse and got OK as return type in the start step. The one minor problem which I see is the log that callback and other is printing together. ER Response can not be rejected for callbackother than VALIDATE. The reason is there is no space between callback and other. + LOG_ER("Response can not be rejected for callback" + "other than VALIDATE."); It should be like + LOG_ER("Response can not be rejected for callback " + "other than VALIDATE."); The same issue is at the following lines also: + LOG_ER("Response can not be processed as the group" + "corresponding to grp_handle %llu not found in plms" + "datebase.",res->grp_handle); + LOG_ER("Invocation id mentioned in the resp, is not" + "found in the grp->inocation_list. inv_id: %llu", res->track_cbk_res.invocation_id); + LOG_ER("Change step can not be anything other than" + "START/VALIDATE. change_step: %d", trk_info->change_step); One typo above is datebase which should be database. Thanks, Meenakshi High Availability Solutions Pvt. Ltd. 
[2]www.hasolutions.in - Original Message - Subject: [devel] [PATCH 1/1] plm: fix return codes for saPlmReadinessTrackResponse [#200] From: "Alex Jones" [3] Date: 9/7/18 7:13 pm To: [4]mathi.np@gmail.com, [5]ravisekhar.ko...@oracle.com Cc: "Alex Jones" [6], [7]opensaf-devel@lists.sourceforge.net saPlmReadinessTrackResponse sometimes returns SA_AIS_OK, when invalid parameters are passed. SaPlmReadinessTrackResponseT parameter is not checked for range. Also, the msg is sent asynchronously from the agent to plmd, so that errors from plmd cannot be passed back to the agent. Check the SaPlmReadinessTrackResponseT parameter when passed in, and change the message from asynch to sync, so that errors can be passed back. --- src/plm/agent/plma_api.c | 29 ++--- src/plm/common/plms_common_utils.c | 1 + src/plm/common/plms_edu.c | 1 + src/plm/common/plms_evt.h | 3 ++- src/plm/plmd/plms_adm_fsm.c | 52 -- 5 files changed, 62 insertions(+), 24 deletions(-) diff --git a/src/plm/agent/plma_api.c b/src/plm/agent/plma_api.c index 596175e51..3ca8a8c71 100644 --- a/src/plm/agent/plma_api.c +++ b/src/plm/agent/plma_api.c @@ -2974,6 +2974,7 @@ SaAisErrorT saPlmReadinessTrackResponse(SaPlmEntityGroupHandleT entityGrpHdl, { PLMA_CB *plma_cb = plma_ctrlblk; PLMS_EVT plm_in_evt; + PLMS_EVT *plm_out_res = NULL; SaAisErrorT rc = SA_AIS_OK; uint32_t proc_rc = NCSCC_RC_SUCCESS; PLMA_ENTITY_GROUP_INFO *group_info; @@ -2994,6 +2995,12 @@ SaAisErrorT saPlmReadinessTrackResponse(SaPlmEntityGroupHandleT entityGrpHdl, rc = SA_AIS_ERR_INVALID_PARAM; goto end; } + if (response < SA_PLM_CALLBACK_RESPONSE_OK || + response > SA_PLM_CALLBACK_RESPONSE_ERROR) { + TRACE("response parameter is invalid"); + rc = SA_AIS_ERR_INVALID_PARAM; + goto end; + } if (!plma_cb->plms_svc_up) { LOG_ER("PLMA : PLM SERVICE DOWN"); rc = SA_AIS_ERR_TRY_AGAIN; @@ -3027,10 +3034,10 @@ SaAisErrorT saPlmReadinessTrackResponse(SaPlmEntityGroupHandleT entityGrpHdl, plm_in_evt.req_evt.agent_track.track_cbk_res.invocation_id = invocation; 
plm_in_evt.req_evt.agent_track.track_cbk_res.response = response; - /* Send a mds async msg to PLMS to obtain group handle for this */ - proc_rc = plms_mds_normal_send(plma_cb->mds_hdl, NCSMDS_SVC_ID_PLMA, - _in_evt, plma_cb->plms_mdest_id, - NCSMDS_SVC_ID_PLMS); + proc_rc = plm_mds_msg_sync_send(plma_cb->mds_hdl, NCSMDS_SVC_ID_PLMA, +
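On the "callbackother" log text raised in the reply above: adjacent string literals in C are joined by the compiler with nothing inserted between them, so a message split across lines needs its own trailing (or leading) space. A minimal sketch of the effect follows; the two function names are made up for illustration and are not from the PLM code.

```c
#include <string.h>

/* Literals are concatenated as-is: no separator is inserted between them. */
const char *msg_missing_space(void)
{
	return "Response can not be rejected for callback"
	       "other than VALIDATE.";
}

/* A trailing space on the first literal keeps the words apart. */
const char *msg_fixed(void)
{
	return "Response can not be rejected for callback "
	       "other than VALIDATE.";
}
```

The same applies to the other LOG_ER strings quoted in the reply, which join "group" with "corresponding", "plms" with "datebase", "not" with "found", and "than" with "START/VALIDATE".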
[devel] [PATCH 0/1] Review Request for plmd: fix adding and removing of invocation id to list [#197]
Summary: plmd: fix adding and removing of invocation id to list [#197] Review request for Ticket(s): 197 Peer Reviewer(s): Mathi, Ravi Pull request to: Affected branch(es): develop Development branch: ticket-197 Base revision: 9310db55886092748469c6d3e09f6b3bb021886f Personal repository: git://git.code.sf.net/u/trguitar/review Impacted area Impact y/n Docsn Build systemn RPM/packaging n Configuration files n Startup scripts n SAF servicesy OpenSAF servicesn Core libraries n Samples n Tests n Other n Comments (indicate scope for each "y" above): - revision c0e8a1d9b6e1a8e53f8f0ffbff9b86c40ee0d6b6 Author: Alex Jones Date: Thu, 13 Sep 2018 15:31:17 -0400 plmd: fix adding and removing of invocation id to list [#197] Jan 22 11:09:03 localhost osafplmd[3988]: Invocation id mentioned in the resp, is not found in the grp->inocation_list. inv_id: 9 If multiple entities are part of the same entity group, and START or VALIDATE tracking is requested, if an admin operation is done on these entities, once one response is sent the other responses are ignored. But, the entities that didn't return a successful response all report "Admin operation can not be performed" because they failed to process the tracking response. This is because when the first invocation id is removed from the list, all the others are removed, too. Now those entities are stuck in this bad state. Fix the remove routines so that only the invocation in the response is removed from the list. Complete diffstat: -- src/plm/plmd/plms_utils.c | 54 --- 1 file changed, 28 insertions(+), 26 deletions(-) Testing Commands: - See ticket. Testing, Expected Results: -- All entities shutdown when tracking response is sent, and no errors show up in messages log. Conditions of Submission: - Sep 20, or ack from developer. Arch Built StartedLinux distro --- mipsn n mips64 n n x86 n n x86_64 y y powerpc n n powerpc64 n n Reviewer Checklist: --- [Submitters: make sure that your review doesn't trigger any checkmarks!] 
[devel] [PATCH 1/1] plmd: fix adding and removing of invocation id to list [#197]
Jan 22 11:09:03 localhost osafplmd[3988]: Invocation id mentioned in the resp, is not found in the grp->inocation_list. inv_id: 9 If multiple entities are part of the same entity group, and START or VALIDATE tracking is requested, if an admin operation is done on these entities, once one response is sent the other responses are ignored. But, the entities that didn't return a successful response all report "Admin operation can not be performed" because they failed to process the tracking response. This is because when the first invocation id is removed from the list, all the others are removed, too. Now those entities are stuck in this bad state. Fix the remove routines so that only the invocation in the response is removed from the list. --- src/plm/plmd/plms_utils.c | 54 --- 1 file changed, 28 insertions(+), 26 deletions(-) diff --git a/src/plm/plmd/plms_utils.c b/src/plm/plmd/plms_utils.c index 5637cdf08..5dbfdb28a 100644 --- a/src/plm/plmd/plms_utils.c +++ b/src/plm/plmd/plms_utils.c @@ -1516,21 +1516,22 @@ void plms_inv_to_trk_grp_add(PLMS_INVOCATION_TO_TRACK_INFO **list, void plms_inv_to_cbk_in_grp_trk_rmv(PLMS_ENTITY_GROUP_INFO *grp, PLMS_TRACK_INFO *trk_info) { - PLMS_INVOCATION_TO_TRACK_INFO **inv_list, **prev; - - inv_list = &(grp->invocation_list); - prev = &(grp->invocation_list); - while (*inv_list) { - if (trk_info == (*inv_list)->track_info) { - (*prev)->next = (*inv_list)->next; - (*inv_list)->track_info = NULL; - (*inv_list)->next = NULL; - free(*inv_list); - *inv_list = NULL; + PLMS_INVOCATION_TO_TRACK_INFO *inv_list, *prev; + + inv_list = grp->invocation_list; + prev = grp->invocation_list; + while (inv_list) { + if (trk_info == inv_list->track_info) { + if (prev == inv_list) { + /* this is the first entry */ + grp->invocation_list = inv_list->next; + } + prev->next = inv_list->next; + free(inv_list); return; } - *prev = *inv_list; - *inv_list = (*inv_list)->next; + prev = inv_list; + inv_list = inv_list->next; } return; @@ -1545,21 +1546,22 @@ 
void plms_inv_to_cbk_in_grp_trk_rmv(PLMS_ENTITY_GROUP_INFO *grp, void plms_inv_to_cbk_in_grp_inv_rmv(PLMS_ENTITY_GROUP_INFO *grp, SaInvocationT inv_id) { - PLMS_INVOCATION_TO_TRACK_INFO **inv_list, **prev; - - inv_list = &(grp->invocation_list); - prev = &(grp->invocation_list); - while (*inv_list) { - if (inv_id == (*inv_list)->invocation) { - (*prev)->next = (*inv_list)->next; - (*inv_list)->track_info = NULL; - (*inv_list)->next = NULL; - free(*inv_list); - *inv_list = NULL; + PLMS_INVOCATION_TO_TRACK_INFO *inv_list, *prev; + + inv_list = grp->invocation_list; + prev = grp->invocation_list; + while (inv_list) { + if (inv_id == inv_list->invocation) { + if (prev == inv_list) { + /* this is the first entry */ + grp->invocation_list = inv_list->next; + } + prev->next = inv_list->next; + free(inv_list); return; } - *prev = *inv_list; - *inv_list = (*inv_list)->next; + prev = inv_list; + inv_list = inv_list->next; } return; -- 2.14.4 ___ Opensaf-devel mailing list Opensaf-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/opensaf-devel
[devel] [PATCH 1/1] plm: fix return codes for saPlmReadinessTrackResponse [#200]
saPlmReadinessTrackResponse sometimes returns SA_AIS_OK, when invalid
parameters are passed.

SaPlmReadinessTrackResponseT parameter is not checked for range. Also, the
msg is sent asynchronously from the agent to plmd, so that errors from plmd
cannot be passed back to the agent.

Check the SaPlmReadinessTrackResponseT parameter when passed in, and change
the message from asynch to sync, so that errors can be passed back.
---
 src/plm/agent/plma_api.c           | 29 ++---
 src/plm/common/plms_common_utils.c |  1 +
 src/plm/common/plms_edu.c          |  1 +
 src/plm/common/plms_evt.h          |  3 ++-
 src/plm/plmd/plms_adm_fsm.c        | 52 --
 5 files changed, 62 insertions(+), 24 deletions(-)

diff --git a/src/plm/agent/plma_api.c b/src/plm/agent/plma_api.c
index 596175e51..3ca8a8c71 100644
--- a/src/plm/agent/plma_api.c
+++ b/src/plm/agent/plma_api.c
@@ -2974,6 +2974,7 @@ SaAisErrorT saPlmReadinessTrackResponse(SaPlmEntityGroupHandleT entityGrpHdl,
 {
 	PLMA_CB *plma_cb = plma_ctrlblk;
 	PLMS_EVT plm_in_evt;
+	PLMS_EVT *plm_out_res = NULL;
 	SaAisErrorT rc = SA_AIS_OK;
 	uint32_t proc_rc = NCSCC_RC_SUCCESS;
 	PLMA_ENTITY_GROUP_INFO *group_info;
@@ -2994,6 +2995,12 @@ SaAisErrorT saPlmReadinessTrackResponse(SaPlmEntityGroupHandleT entityGrpHdl,
 		rc = SA_AIS_ERR_INVALID_PARAM;
 		goto end;
 	}
+	if (response < SA_PLM_CALLBACK_RESPONSE_OK ||
+	    response > SA_PLM_CALLBACK_RESPONSE_ERROR) {
+		TRACE("response parameter is invalid");
+		rc = SA_AIS_ERR_INVALID_PARAM;
+		goto end;
+	}
 	if (!plma_cb->plms_svc_up) {
 		LOG_ER("PLMA : PLM SERVICE DOWN");
 		rc = SA_AIS_ERR_TRY_AGAIN;
@@ -3027,10 +3034,10 @@ SaAisErrorT saPlmReadinessTrackResponse(SaPlmEntityGroupHandleT entityGrpHdl,
 	plm_in_evt.req_evt.agent_track.track_cbk_res.invocation_id = invocation;
 	plm_in_evt.req_evt.agent_track.track_cbk_res.response = response;
 
-	/* Send a mds async msg to PLMS to obtain group handle for this */
-	proc_rc = plms_mds_normal_send(plma_cb->mds_hdl, NCSMDS_SVC_ID_PLMA,
-				       &plm_in_evt, plma_cb->plms_mdest_id,
-				       NCSMDS_SVC_ID_PLMS);
+	proc_rc = plm_mds_msg_sync_send(plma_cb->mds_hdl, NCSMDS_SVC_ID_PLMA,
+					NCSMDS_SVC_ID_PLMS,
+					plma_cb->plms_mdest_id, &plm_in_evt,
+					&plm_out_res, PLMS_MDS_SYNC_TIME);
 
 	if (NCSCC_RC_SUCCESS != proc_rc) {
 		LOG_ER(
@@ -3038,7 +3045,21 @@ SaAisErrorT saPlmReadinessTrackResponse(SaPlmEntityGroupHandleT entityGrpHdl,
 		rc = SA_AIS_ERR_TRY_AGAIN;
 		goto end;
 	}
+
+	/* Verify if the response if ok */
+	if (!plm_out_res) {
+		rc = SA_AIS_ERR_TRY_AGAIN;
+		goto end;
+	}
+	if (plm_out_res->res_evt.error != SA_AIS_OK) {
+		rc = plm_out_res->res_evt.error;
+		goto end;
+	}
+
 end:
+	if (plm_out_res)
+		plms_free_evt(plm_out_res);
+
 	TRACE_LEAVE();
 	return rc;
 }
diff --git a/src/plm/common/plms_common_utils.c b/src/plm/common/plms_common_utils.c
index c56093747..9837b8480 100644
--- a/src/plm/common/plms_common_utils.c
+++ b/src/plm/common/plms_common_utils.c
@@ -148,6 +148,7 @@ SaUint32T plms_free_evt(PLMS_EVT *evt)
 		case PLMS_AGENT_GRP_DEL_RES:
 		case PLMS_AGENT_TRACK_START_RES:
 		case PLMS_AGENT_TRACK_STOP_RES:
+		case PLMS_AGENT_TRACK_RESP_RES:
 			free(evt);
 			break;
 		case PLMS_AGENT_TRACK_READINESS_IMPACT_RES:
diff --git a/src/plm/common/plms_edu.c b/src/plm/common/plms_edu.c
index 1b0a2a8ec..d5f445cb4 100644
--- a/src/plm/common/plms_edu.c
+++ b/src/plm/common/plms_edu.c
@@ -717,6 +717,7 @@ uint32_t plms_evt_test_res_type(NCSCONTEXT arg)
 	case PLMS_AGENT_GRP_ADD_RES:
 	case PLMS_AGENT_GRP_DEL_RES:
 	case PLMS_AGENT_TRACK_STOP_RES:
+	case PLMS_AGENT_TRACK_RESP_RES:
 	case PLMS_AGENT_TRACK_READINESS_IMPACT_RES:
 		return PLMS_EDU_PLMS_COMMON_RESP;
 		;
diff --git a/src/plm/common/plms_evt.h b/src/plm/common/plms_evt.h
index e87c6325e..557579968 100644
--- a/src/plm/common/plms_evt.h
+++ b/src/plm/common/plms_evt.h
@@ -245,7 +245,8 @@ typedef enum {
 	PLMS_AGENT_GRP_DEL_RES,
 	PLMS_AGENT_TRACK_START_RES,
 	PLMS_AGENT_TRACK_STOP_RES,
-	PLMS_AGENT_TRACK_READINESS_IMPACT_RES
+	PLMS_AGENT_TRACK_READINESS_IMPACT_RES,
+	PLMS_AGENT_TRACK_RESP_RES
 } PLMS_EVT_RES_TYPE;
 
 
diff --git a/src/plm/plmd/plms_adm_fsm.c b/src/plm/plmd/plms_adm_fsm.c
index a29dc28e0..84b42efde 100644
--- a/src/plm/plmd/plms_adm_fsm.c
+++ b/src/plm/plmd/plms_adm_fsm.c
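The two ideas in the saPlmReadinessTrackResponse patch above, range-checking the response before anything is sent and switching to a synchronous exchange so the daemon's verdict can reach the caller, can be sketched in isolation. This is an illustration under assumptions, not the agent code: every `demo_*` name, the enum values, and the rule that only invocation 9 is known to the stub daemon are invented for the example, and the real MDS send routines are replaced by a function pointer.

```c
#include <stdlib.h>

/* Illustrative values only; the real constants live in the SAF headers. */
enum demo_response {
	DEMO_RESPONSE_OK = 1,
	DEMO_RESPONSE_REJECTED = 2,
	DEMO_RESPONSE_ERROR = 3
};

enum demo_err {
	DEMO_OK = 1,
	DEMO_ERR_TRY_AGAIN = 6,
	DEMO_ERR_INVALID_PARAM = 7,
	DEMO_ERR_NOT_EXIST = 12
};

/* Hypothetical reply event: just the error code sent back by the daemon. */
struct demo_reply {
	enum demo_err error;
};

/* Hypothetical synchronous transport: returns 0 and a heap-allocated
 * reply on success, non-zero if the message could not be delivered. */
typedef int (*demo_sync_send_fn)(unsigned long long invocation, int response,
				 struct demo_reply **reply_out);

enum demo_err demo_track_response(unsigned long long invocation, int response,
				  demo_sync_send_fn send_fn)
{
	struct demo_reply *reply = NULL;
	enum demo_err rc = DEMO_OK;

	/* Reject out-of-range values locally, before any message is sent. */
	if (response < DEMO_RESPONSE_OK || response > DEMO_RESPONSE_ERROR)
		return DEMO_ERR_INVALID_PARAM;

	/* Transport failure or a missing reply is reported as "try again". */
	if (send_fn(invocation, response, &reply) != 0 || reply == NULL) {
		rc = DEMO_ERR_TRY_AGAIN;
		goto end;
	}

	/* A synchronous reply lets the daemon's own verdict reach the caller. */
	rc = reply->error;

end:
	free(reply); /* free(NULL) is a no-op */
	return rc;
}

/* Stub daemon for experiments: only invocation 9 is known to it. */
int demo_daemon_stub(unsigned long long invocation, int response,
		     struct demo_reply **reply_out)
{
	struct demo_reply *reply = malloc(sizeof(*reply));

	(void)response;
	if (reply == NULL)
		return -1;
	reply->error = (invocation == 9) ? DEMO_OK : DEMO_ERR_NOT_EXIST;
	*reply_out = reply;
	return 0;
}

/* Stub transport that always fails, as if the service were unreachable. */
int demo_transport_down_stub(unsigned long long invocation, int response,
			     struct demo_reply **reply_out)
{
	(void)invocation;
	(void)response;
	(void)reply_out;
	return -1;
}
```

With an asynchronous send the caller could only ever learn whether the message left the agent. In the sketch, the `DEMO_ERR_NOT_EXIST` result for an unknown invocation is only observable because the call waits for the reply.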
[devel] [PATCH 0/1] Review Request for plm: fix return codes for saPlmReadinessTrackResponse [#200]
Summary: plm: fix return codes for saPlmReadinessTrackResponse [#200]
Review request for Ticket(s): 200
Peer Reviewer(s): Mathi, Ravi
Pull request to:
Affected branch(es): develop
Development branch: ticket-200
Base revision: 0178558257672a2c6cc589e7e6cfc2f36bc7e3c0
Personal repository: git://git.code.sf.net/u/trguitar/review

Impacted area        Impact y/n
-------------------------------
 Docs                    n
 Build system            n
 RPM/packaging           n
 Configuration files     n
 Startup scripts         n
 SAF services            y
 OpenSAF services        n
 Core libraries          n
 Samples                 n
 Tests                   n
 Other                   n

Comments (indicate scope for each "y" above):
---------------------------------------------

revision c87593a8180c59b4c3e7f0bd0b8789dac72b0415
Author: Alex Jones
Date:   Fri, 7 Sep 2018 09:34:15 -0400

plm: fix return codes for saPlmReadinessTrackResponse [#200]

saPlmReadinessTrackResponse sometimes returns SA_AIS_OK, when invalid
parameters are passed.

SaPlmReadinessTrackResponseT parameter is not checked for range. Also, the
msg is sent asynchronously from the agent to plmd, so that errors from plmd
cannot be passed back to the agent.

Check the SaPlmReadinessTrackResponseT parameter when passed in, and change
the message from asynch to sync, so that errors can be passed back.

Complete diffstat:
------------------
 src/plm/agent/plma_api.c           | 29 ++---
 src/plm/common/plms_common_utils.c |  1 +
 src/plm/common/plms_edu.c          |  1 +
 src/plm/common/plms_evt.h          |  3 ++-
 src/plm/plmd/plms_adm_fsm.c        | 52 --
 5 files changed, 62 insertions(+), 24 deletions(-)

Testing Commands:
-----------------
See ticket.

Testing, Expected Results:
--------------------------
See ticket.

Conditions of Submission:
-------------------------
Sep 13, or ack from developer.

Arch        Built   Started   Linux distro
-------------------------------------------
mips          n        n
mips64        n        n
x86           n        n
x86_64        y        y
powerpc       n        n
powerpc64     n        n

Reviewer Checklist:
-------------------
[Submitters: make sure that your review doesn't trigger any checkmarks!]

Your checkin has not passed review because (see checked entries):

___ Your RR template is generally incomplete; it has too many blank entries
    that need proper data filled in.

___ You have failed to nominate the proper persons for review and push.

___ Your patches do not have proper short+long header

___ You have grammar/spelling in your header that is unacceptable.

___ You have exceeded a sensible line length in your headers/comments/text.

___ You have failed to put in a proper Trac Ticket # into your commits.

___ You have incorrectly put/left internal data in your comments/files
    (i.e. internal bug tracking tool IDs, product names etc)

___ You have not given any evidence of testing beyond basic build tests.
    Demonstrate some level of runtime or other sanity testing.

___ You have ^M present in some of your files. These have to be removed.

___ You have needlessly changed whitespace or added whitespace crimes
    like trailing spaces, or spaces before tabs.

___ You have mixed real technical changes with whitespace and other
    cosmetic code cleanup changes. These have to be separate commits.

___ You need to refactor your submission into logical chunks; there is
    too much content into a single commit.

___ You have extraneous garbage in your review (merge commits etc)

___ You have giant attachments which should never have been sent;
    Instead you should place your content in a public tree to be pulled.

___ You have too many commits attached to an e-mail; resend as threaded
    commits, or place in a public tree for a pull.

___ You have resent this content multiple times without a clear indication
    of what has changed between each re-send.

___ You have failed to adequately and individually address all of the
    comments and change requests that were proposed in the initial review.

___ You have a misconfigured ~/.gitconfig file (i.e. user.name, user.email etc)

___ Your computer has a badly configured date and time; confusing the
    threaded patch review.

___ Your changes affect IPC mechanism, and you don't present any results
    for in-service upgradability test.

___ Your changes affect user manual and documentation, your patch series
    do not contain the patch that updates the Doxygen manual.
Re: [devel] [PATCH 1/1] amf: add support for container/contained [#70]
Hi Nagu,

Here's a patch that fixes your issue in test #1. For the other code review
issues, is it OK if I just add them when I push the final patch? Or do you
want to review them now?

Alex

On 08/30/2018 01:44 AM, [1]nagen...@hasolutions.in wrote:

Hi Alex,

Thanks for your response.

For Test #2, I had configured all SUs on the single node SC-1. So, 2
container SUs and 2 contained SUs are on the same node. In such cases, the
implementation could use only one SU on that node (perhaps the
higher-ranked SU) as the container for all the contained SUs of that node.

Thanks,
Nagendra, 91-9866424860
High Availability Solutions Pvt. Ltd. ([2]www.hasolutions.in)
- OpenSAF Support and Services

- Original Message -
Subject: Re: [PATCH 1/1] amf: add support for container/contained [#70]
From: "Alex Jones" [3]
Date: 8/29/18 9:29 pm
To: [4]nagen...@hasolutions.in, "Gary Lee" [5], [6]hans.nordeb...@ericsson.com, [7]ravisekhar.ko...@oracle.com
Cc: [8]opensaf-devel@lists.sourceforge.net

Hi Nagu,

I have a fix for your issue test #1. I will send out a patch along with
changes for code review #1 and #2.

For issue test #2, I think this needs to be handled in the configuration.
In this case because there is no explicit node set for the contained SUs,
su.cc:map_su_to_node will assign a node in the node group. The code is
assigning it to SC-2 in this case, because another SU has been assigned to
SC-1, even though there is no container on SC-2. I'm not sure how we can
get around this without explicitly setting the contained host node in the
configuration. Since the container csi has not yet been assigned, we can't
map it to a container, and so we can't figure out which container we should
be on the same node as. Am I right here?

Alex

On 08/28/2018 09:56 AM, [9]nagen...@hasolutions.in wrote:

Hi Alex,

Code review:
1. Headers for a few functions are missing.
2. Clc.cc: Need to add '0' in place of avnd_comp_clc_inst_try_again_hdler
   in other fsm states.

Testing:
1. Uploaded AppConfig-container.xml and AppConfig-contained-2N.xml
Performed:
amf-adm unlock-in safSu=SU1,safSg=Container,safApp=Container
amf-adm unlock safSu=SU1,safSg=Container,safApp=Container

Even if I don't perform the following, the contained components are
instantiated.
amf-adm unlock-in safSu=SU1,safSg=Contained_2N,safApp=Contained_2N
amf-adm unlock safSu=SU1,safSg=Contained_2N,safApp=Contained_2N

Aug 28 19:15:11 nags-VirtualBox osafamfnd[28278]: NO 'safSu=SU1,safSg=Contained_2N,safApp=Contained_2N' Presence State UNINSTANTIATED => INSTANTIATING

immlist safSu=SU1,safSg=Contained_2N,safApp=Contained_2N will show
saAmfSUPresenceState 3(instantiated) and saAmfSUAdminState 3(locked-in)

Now further admin operation on
safSu=SU1,safSg=Contained_2N,safApp=Contained_2N will fail:

root@nags-VirtualBox:/home/nags/views/ajones-review/samples/amf/container# amf-adm unlock-in safSu=SU1,safSg=Contained_2N,safApp=Contained_2N
error - saImmOmAdminOperationInvoke_2 admin-op RETURNED: SA_AIS_ERR_BAD_OPERATION (20)
error-string: Can't instantiate 'safSu=SU1,safSg=Contained_2N,safApp=Contained_2N', whose presence state is '3'

2. This is related to Specs 6.2.2 Assignment of the Container CSI:
"If there are multiple container components on a node which have the
active HA state for a particular container CSI, and one or more service
units on the same node whose contained components are configured with the
same container CSI, it is implementation-defined how the Availability
Management Framework selects container components to handle the life cycle
of the contained components of these service units. However, all contained
components of a service unit must have the same associated container
component."

Uploaded AppConfig-container.xml and AppConfig-contained-2N.xml with one
difference that all SUs of container and contained are configured on SC-1.

Perform the following operations, but
safSu=SU2,safSg=Contained_2N,safApp=Contained_2N will not get assignments.
amf-adm unlock-in safSu=SU1,safSg=Contained_2N,safApp=Contained_2N
amf-adm unlock safSu=SU1,safSg=Contained_2N,safApp=Contained_2N
amf-adm unlock-in safSu=SU2,safSg=Contained_2N,safApp=Contained_2N
amf-adm unlock safSu=SU2,safSg=Contained_2N,safA
Re: [devel] [PATCH 1/1] plm: remove unused function plms_hsm_finalize [#210]
Ack. I will push it.

Alex

On 09/06/2018 04:40 AM, Meenakshi TK wrote:

---
 src/plm/common/plms_hsm.c | 27 ---
 src/plm/common/plms_hsm.h |  1 -
 2 files changed, 28 deletions(-)

diff --git a/src/plm/common/plms_hsm.c b/src/plm/common/plms_hsm.c
index f3bc478..f8f7b62 100644
--- a/src/plm/common/plms_hsm.c
+++ b/src/plm/common/plms_hsm.c
@@ -88,7 +88,6 @@ PLMS_HSM_CB *hsm_cb = &_hsm_cb;
  * FUNCTION PROTOTYPES
  *** /
 SaUint32T plms_hsm_initialize(PLMS_HPI_CONFIG *hpi_cfg);
-SaUint32T plms_hsm_finalize(void);
 SaUint32T plms_get_hotswap_model(const SaHpiEntityPathT *,
 				 PLMS_HPI_STATE_MODEL *);
 static SaUint32T hsm_get_hotswap_model(SaHpiRptEntryT *rpt_entry,
@@ -266,32 +265,6 @@ SaUint32T plms_hsm_initialize(PLMS_HPI_CONFIG *hpi_cfg)
 	TRACE_LEAVE();
 	return NCSCC_RC_SUCCESS;
 }
-/* **
- * @brief Closes HPI session and terminates HSM thread
- *
- * @param[in]
- *
- * @return NCSCC_RC_SUCCESS/NCSCC_RC_FAILURE
- *** /
-SaUint32T plms_hsm_finalize(void)
-{
-	PLMS_HSM_CB *cb = hsm_cb;
-	SaErrorT rc;
-
-	/* Close the HPI session */
-	rc = saHpiSessionClose(cb->session_id);
-	if (SA_OK != rc)
-		LOG_ER("HSM:Close session return error: %d:\n", rc);
-	/* Close connection to NTF */
-	rc = saNtfFinalize(cb->plm_ntf_hdl);
-	if (SA_OK != rc)
-		LOG_ER("HSM: saNtfFinalize return error: %d:\n", rc);
-
-	/* Kill the HSM thread */
-	pthread_cancel(cb->threadid);
-
-	return NCSCC_RC_SUCCESS;
-}
 
 SaUint32T plms_get_hotswap_model(const SaHpiEntityPathT *epath_ptr,
 				 PLMS_HPI_STATE_MODEL *model)
diff --git a/src/plm/common/plms_hsm.h b/src/plm/common/plms_hsm.h
index 4b33327..db62330 100644
--- a/src/plm/common/plms_hsm.h
+++ b/src/plm/common/plms_hsm.h
@@ -50,7 +50,6 @@ extern HSM_HA_STATE hsm_ha_state;
 /* Function Declarations */
 SaUint32T plms_hsm_initialize(PLMS_HPI_CONFIG *hpi_cfg);
-SaUint32T plms_hsm_finalize(void);
 SaUint32T hsm_get_idr_info(SaHpiRptEntryT *rpt_entry, PLMS_INV_DATA *inv_data);
 SaUint32T convert_entitypath_to_string(const SaHpiEntityPathT *entity_path,
-- 
2.7.4
Re: [devel] [PATCH 0/1] Review Request for plm: correct first arguement of API saPlmEntityGroupAdd() in apitest [#1983]
Ack. I will push it. Alex On 09/03/2018 07:56 AM, [1]meenak...@hasolutions.in wrote: __ NOTICE: This email was received from an EXTERNAL sender __ Hi Alex, Thanks for your comment. I just now floated the patch with your comment, please review. Thanks, Meenakshi High Availability Solutions Pvt. Ltd. [2]www.hasolutions.in - Original Message - Subject: Re: [PATCH 0/1] Review Request for plm: correct first arguement of API saPlmEntityGroupAdd() in apitest [#1983] From: "Alex Jones" [3] Date: 8/27/18 10:42 pm To: "Meenakshi TK" [4], [5]nagen...@hasolutions.in Cc: [6]opensaf-devel@lists.sourceforge.net Hi, This test is currently not enabled in test_saPlmEntityGroupCreate.c. Can you please enable it as part of this ticket? Alex On 08/20/2018 07:37 AM, Meenakshi TK wrote: ___ NOTICE: This email was received from an EXTERNAL sender ___ Summary: plm: correct first arguement of API saPlmEntityGroupAdd() in apitest [#1983] Review request for Ticket(s): 1983 Peer Reviewer([7]s):ajo...@rbbn.com Pull request to: Alex Affected branch(es): all Development branch: ticket-1983 Base revision: 1c19eddc9f03ebd18ab85b67ab50e3e5037b449e Personal repository: [8]git://git.code.sf.net/u/meenatk-hasoln/review Impacted area Impact y/n Docs n Build system n RPM/packaging n Configuration files n Startup scripts n SAF services n OpenSAF services n Core libraries n Samples n Tests y Other n Comments (indicate scope for each "y" above): - *** EXPLAIN/COMMENT THE PATCH SERIES HERE *** revision 58a05affe898227fa96e1a08eaa37f4055077da2 Author: Meenakshi TK [9] Date: Mon, 20 Aug 2018 14:24:00 +0530 plm: correct first arguement of API saPlmEntityGroupAdd() in apitest [#1983] Complete diffstat: -- src/plm/apitest/test_saPlmEntityGroupAdd.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) Testing Commands: - Perform compilation on 32-bit machine Testing, Expected Results: -- All tests of apitest passed Conditions of Submission: - Ack from Alex Arch Built Started Linux distro --- mips n n mips64 n n 
x86 n n x86_64 y y powerpc n n powerpc64 n n Reviewer Checklist: --- [Submitters: make sure that your review doesn't trigger any checkmarks!] Your checkin has not passed review because (see checked entries): ___ Your RR template is generally incomplete; it has too many blank entries that need proper data filled in. ___ You have failed to nominate the proper persons for review and push. ___ Your patches do not have proper short+long header ___ You have grammar/spelling in your header that is unacceptable. ___ You have exceeded a sensible line length in your headers/comments/text. ___ You have failed to put in a proper Trac Ticket # into your commits. ___ You have incorrectly put/left internal data in your comments/files (i.e. internal bug tracking tool IDs, product names etc) ___ You have not given any evidence of testing beyond basic build tests. Demonstrate some level of runtime or other sanity testing. ___ You have ^M present in some of your files. These have to be removed. ___ You have needlessly changed whitespace or added whitespace crimes like trailing spaces, or spaces before tabs. ___ You have mixed real technical changes with whitespace and other cosmetic code cleanup changes. These have to be separate commits. ___ You need to refactor your submission into logical chunks; there is too much content into a single commit. ___ You have extraneous garbage in your review (merge commits etc) ___ You have giant attachments which should never have been sent; Instead you should place your content in a public tree to be pulled. ___ You have too many commits attached to an e-mail; resend as threaded commits, or place in a public tree for a pull. ___ You have resent this content multiple times without a clear indication of what has changed between each re-send. ___ You have failed to adequately and individually address
Re: [devel] [PATCH 1/1] ckpt: add the ckpt reference to the CPND node info [#2082]
Hi Mohan,

I am not able to reproduce the problem as described in the ticket. Can you
post your test code?

Alex

On 09/03/2018 03:32 AM, [1]mo...@hasolutions.in wrote:

Hi Vu/Gary/Alex,

Polite reminder for review.

Thanks
Mohan
High Availability Solutions Pvt Ltd
[2]www.hasolutions.in

- Original Message -
Subject: [PATCH 1/1] ckpt: add the ckpt reference to the CPND node info [#2082]
From: "Mohan Kanakam" [3]
Date: 8/29/18 8:30 pm
To: [4]vu.m.ngu...@dektech.com.au, [5]gary@dektech.com.au, [6]ajo...@rbbn.com
Cc: [7]opensaf-devel@lists.sourceforge.net, "Mohan Kanakam" [8]

---
 src/ckpt/ckptd/cpd_proc.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/src/ckpt/ckptd/cpd_proc.c b/src/ckpt/ckptd/cpd_proc.c
index 26614ba..f1763c2 100644
--- a/src/ckpt/ckptd/cpd_proc.c
+++ b/src/ckpt/ckptd/cpd_proc.c
@@ -444,6 +444,8 @@ uint32_t cpd_ckpt_db_entry_update(CPD_CB *cb, MDS_DEST *cpnd_dest,
 		/* Add the ckpt reference to the CPND node info */
 		cpd_ckpt_ref_info_add(node_info, ckpt_node);
 	}
+	else
+		cpd_ckpt_ref_info_add(node_info, ckpt_node);
 
 	TRACE_LEAVE();
 	return NCSCC_RC_SUCCESS;
-- 
2.7.4

References

1. mailto:mo...@hasolutions.in
2. https://protect-us.mimecast.com/s/ftxhCXDMJDUxw19c6ugVL?domain=hasolutions.in
3. mailto:mo...@hasolutions.in
4. mailto:vu.m.ngu...@dektech.com.au
5. mailto:gary@dektech.com.au
6. mailto:ajo...@rbbn.com
7. mailto:opensaf-devel@lists.sourceforge.net
8. mailto:mo...@hasolutions.in
Re: [devel] [PATCH 1/1] amf: add support for container/contained [#70]
The probation time is the default in the config: 4s.

Alex

On 08/28/2018 01:32 AM, Gary Lee wrote:

Hi Alex

No, I just ran kill 10 times to escalate restart to failover. Do you have a
really small probation time in your demo config?

Gary

On 28/8/18 4:09 am, Alex Jones wrote:

G'day Gary,

I can't reproduce this. Do you have a script or something that reproduces
it?

Alex

On 08/15/2018 11:52 PM, Gary Lee wrote:

Hi Alex

Thanks, it looks much better.

So I tried `killall amf_container_demo` 10 times really quickly:

2018-08-16 13:43:22.652 SC-1 osafamfnd[286]: NO 'safSu=SU1,safSg=Container,safApp=Container' restarts have reached configured limit of 10
2018-08-16 13:43:22.653 SC-1 osafamfnd[286]: NO 'safSu=SU1,safSg=Container,safApp=Container' SU restart probation timer stopped
2018-08-16 13:43:22.654 SC-1 osafamfnd[286]: NO SU failover probation timer started (timeout: 12000 ns)
2018-08-16 13:43:22.655 SC-1 osafamfnd[286]: NO Performing failover of 'safSu=SU1,safSg=Container,safApp=Container' (SU failover count: 1)
2018-08-16 13:43:22.655 SC-1 osafamfnd[286]: NO 'safComp=Container,safSu=SU1,safSg=Container,safApp=Container' recovery action escalated from 'componentRestart' to 'suFailover'
2018-08-16 13:43:22.656 SC-1 osafamfnd[286]: NO 'safComp=Container,safSu=SU1,safSg=Container,safApp=Container' faulted due to 'avaDown' : Recovery is 'suFailover'
2018-08-16 13:43:22.657 SC-1 osafamfnd[286]: NO Terminating components of 'safSu=SU1,safSg=Container,safApp=Container'(abruptly & unordered)
2018-08-16 13:43:22.658 SC-1 osafamfnd[286]: NO 'safSu=SU1,safSg=Container,safApp=Container' Presence State INSTANTIATED => TERMINATING
2018-08-16 13:43:22.659 SC-1 osafamfnd[286]: NO 'safSu=SU1,safSg=Container,safApp=Container' Presence State TERMINATING => TERMINATING
2018-08-16 13:43:22.667 SC-1 ubuntu: CONTAINED COMP NAME:safComp=Contained_1,safSu=SU1,safSg=Contained_2N,safApp=Contained_2N
2018-08-16 13:43:22.670 SC-1 osafamfnd[286]: NO 'safSu=SU1,safSg=Container,safApp=Container' Presence State TERMINATING => UNINSTANTIATED
2018-08-16 13:43:22.671 SC-1 osafamfnd[286]: NO Terminated all components in 'safSu=SU1,safSg=Container,safApp=Container'
2018-08-16 13:43:22.671 SC-1 osafamfnd[286]: NO Informing director of sufailover

amf-state su:

safSu=SU1,safSg=Contained_2N,safApp=Contained_2N
        saAmfSUAdminState=UNLOCKED(1)
        saAmfSUOperState=ENABLED(1)
        saAmfSUPresenceState=INSTANTIATED(3)
        saAmfSUReadinessState=IN-SERVICE(2)

amf-state si:

safSi=SC-2N,safApp=OpenSAF
        saAmfSIAdminState=UNLOCKED(1)
        saAmfSIAssignmentState=FULLY_ASSIGNED(2)
safSi=Contained_2N_1,safApp=Contained_2N
        saAmfSIAdminState=UNLOCKED(1)
        saAmfSIAssignmentState=FULLY_ASSIGNED(2)

Thanks
Gary

From: Alex Jones [1]
Organization: Ribbon
Date: Thursday, 16 August 2018 at 3:41 am
To: Gary Lee [2], [3], [4], [5]
Cc: [6]
Subject: Re: [PATCH 1/1] amf: add support for container/contained [#70]

G'day Gary,

I see you were adding the XML file dynamically with "immcfg -f". I hadn't
tried that. I hadn't tried killing the sample app, either.

Here is a patch that should fix both issues. Apply it on top of the latest
big one I sent.

Alex

On 08/13/2018 10:37 PM, Gary Lee wrote:

Hi Alex

I modified AppConfig-container.xml and changed saAmfSgtRedundancyModel from
4 (NwayAct) to 1 (2N). The xml still loads and I could unlock, resulting
in:

root@SC-1:/var/log# immlist safVersion=1,safSgType=Container
Name                       Type          Value(s)
safVersion                 SA_STRING_T   safVersion=1
saAmfSgtValidSuTypes       SA_NAME_T     safVersion=1,safSuType=Container (32)
saAmfSgtRedundancyModel    SA_UINT32_T   1 (0x1)

safSISU=safSu=SU2\,safSg=Container\,safApp=Container,safSi=Container,safApp=Container
Re: [devel] [PATCH 1/1] amf: add support for container/contained [#70]
rther testing. The documentation need to be done if you haven't tested : - Headless enabled - CSI Dep, SI Dep testimg - Etc. Thanks, Nagendra, 91-9866424860 High Availability Solutions Pvt. Ltd. ([2]www.hasolutions.in) - OpenSAF Support and Services - Original Message - Subject: Re: [PATCH 1/1] amf: add support for container/contained [#70] From: "Alex Jones" [3] Date: 8/15/18 11:10 pm To: "Gary Lee" [4], [5]hans.nordeb...@ericsson.com, [6]ravisekhar.ko...@oracle.com, [7]nagen...@hasolutions.in Cc: [8]opensaf-devel@lists.sourceforge.net G'day Gary, I see you were adding the XML file dynamically with "immcfg -f". I hadn't tried that. I hadn't tried killing the sample app, either. Here is a patch that should fix both issues. Apply it on top of the latest big one I sent. Alex On 08/13/2018 10:37 PM, Gary Lee wrote: ___ NOTICE: This email was received from an EXTERNAL sender ___ Hi Alex I modified AppConfig-container.xml and changed saAmfSgtRedundancyModel from 4 (NwayAct) to 1 (2N). The xml still loads and I could unlock, resulting in: root@SC-1:/var/log# immlist safVersion=1,safSgType=Container Name Type Value(s) safVersion SA_STRING_T safVersion=1 saAmfSgtValidSuTypes SA_NAME_T safVersion=1,safSuType=Container (32) saAmfSgtRedundancyModelSA_UINT32_T 1 (0x1) safSISU=safSu=SU2\,safSg=Container\,safApp=Container,safSi=Container ,safApp=Container saAmfSISUHAState=STANDBY(2) saAmfSISUHAReadinessState=READY_FOR_ASSIGNMENT(1) safSISU=safSu=SU1\,safSg=Container\,safApp=Container,safSi=Container ,safApp=Container saAmfSISUHAState=ACTIVE(1) saAmfSISUHAReadinessState=READY_FOR_ASSIGNMENT(1) Also, have you tried killing the amf_container_demo binary? Thanks Gary On 14/08/18 05:00, Alex Jones wrote: Hi Gary, I just resubmitted a new patch which breaks out the different components, and addresses the other comments here. But, #2 (rejecting all but NWay-active for container) should already be in there. Is there a specific test you ran that didn't work? 
Alex On 08/13/2018 02:43 AM, Gary Lee wrote: ___ NOTICE: This email was received from an EXTERNAL sender ___ Hi Alex Some initial comments: 0. Is it possible to split up the patch into amfd / amfnd / common / samples. Just makes it easier to reply inline. 1. Please compile the container demo by default, and make amf_container_script world executable. Eg. diff --git a/samples/amf/Makefile.am b/samples/amf/Makefile.am index 447dedd..7ebf9c3 100644 --- a/samples/amf/Makefile.am +++ b/samples/amf/Makefile.am @@ -19,5 +19,5 @@ include $(top_srcdir)/Makefile.common MAINTAINERCLEANFILES = Makefile.in -SUBDIRS = sa_aware non_sa_aware wrapper proxy api_demo +SUBDIRS = sa_aware non_sa_aware wrapper proxy api_demo container diff --git a/samples/amf/container/amf_container_script b/samples/amf/container/amf_container_script old mode 100644 new mode 100755 diff --git a/samples/configure.ac b/samples/configure.ac index 7cf803e..9765d54 100644 --- a/samples/configure.ac +++ b/samples/configure.ac @@ -67,6 +67,7 @@ AC_CONFIG_FILES([ \ amf/wrapper/Makefile \ amf/proxy/Makefile \ amf/api_demo/Makefile \ + amf/container/Makefile \ cpsv/Makefile \ cpsv/ckpt_demo/Makefile \ cpsv/ckpt_track_demo/Makefile \ 2. We should probably reject CCBs that set saAmfSgtRedundancyModel to anything other than NWayActive, for Containers. 3. Do we need to bump the msg format version to AVSV_AVD_AVND_MSG_FMT_VER_8? An old amfnd will assert if it gets an AVSV_D2N_CONTAINED_SU_MSG_INFO msg. Thanks Gary References 1. mailto:nagen...@hasolutions.in 2. https://protect-us.mimecast.com/s/8jY0CADmVDUYY3qhGuyFS?domain=hasolutions.in 3. mailto:ajo...@rbbn.com 4. mailto:gary@dektech.com.au 5. mailto:hans.nordeb...@ericsson.com 6. mailto:ravisekhar.ko...@oracle.com 7. mailto:nagen...@hasolutions.in 8. mailto:opensaf-devel@lists.sourceforge.net signature.asc Description: OpenPGP digital signature -- Check out the vibran
[devel] [PATCH 0/1] Review Request for plmd: fix crash when saPlmReadinessTrack is called in error [#2919]
Summary: plmd: fix crash when saPlmReadinessTrack is called in error [#2919]
Review request for Ticket(s): 2919
Peer Reviewer(s): mathi, ravi
Pull request to:
Affected branch(es): develop
Development branch: ticket-2919
Base revision: fb4890756ebd14fbe40906d37962b9261ed9a282
Personal repository: git://git.code.sf.net/u/trguitar/review

Impacted area        Impact y/n
-------------------------------
 Docs                    n
 Build system            n
 RPM/packaging           n
 Configuration files     n
 Startup scripts         n
 SAF services            y
 OpenSAF services        n
 Core libraries          n
 Samples                 n
 Tests                   n
 Other                   n

Comments (indicate scope for each "y" above):
---------------------------------------------

revision e18dabd0a8385ff61ba1ab0540eba4ee58b5cc4e
Author: Alex Jones
Date:   Mon, 27 Aug 2018 16:33:33 -0400

plmd: fix crash when saPlmReadinessTrack is called in error [#2919]

plmd crashes when saPlmReadinessTrack is called with entities pointer set,
but smaller than what plmd would return.

In this case plmd is returning ERR_NO_SPACE, which is correct, but it is
setting numberOfEntities without setting the entities pointer. This causes
the edu routines to crash.

It is not necessary to set numberOfEntities since we are returning an error
code.

Complete diffstat:
------------------
 src/plm/plmd/plms_proc.c | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

Testing Commands:
-----------------
plmtest 3 13

Testing, Expected Results:
--------------------------
plmtest should run without problems

Conditions of Submission:
-------------------------
Sep 4, or ack from developer

Arch        Built   Started   Linux distro
-------------------------------------------
mips          n        n
mips64        n        n
x86           n        n
x86_64        y        y
powerpc       n        n
powerpc64     n        n
[devel] [PATCH 1/1] plmd: fix crash when saPlmReadinessTrack is called in error [#2919]
plmd crashes when saPlmReadinessTrack is called with entities pointer set, but smaller than what plmd would return. In this case plmd is returning ERR_NO_SPACE, which is correct, but it is setting numberOfEntities without setting the entities pointer. This causes the edu routines to crash. It is not necessary to set numberOfEntities since we are returning an error code.
---
 src/plm/plmd/plms_proc.c | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/src/plm/plmd/plms_proc.c b/src/plm/plmd/plms_proc.c
index aa93e5942..2b4445394 100644
--- a/src/plm/plmd/plms_proc.c
+++ b/src/plm/plmd/plms_proc.c
@@ -879,6 +879,12 @@ void plms_process_trk_start_evt(PLMS_EVT *plm_evt)
 			no_of_ent_recd = no_of_ent_in_grp;
 		}
 
+		if (no_of_ent_in_grp != no_of_ent_recd) {
+			LOG_ER("PLMS: no of entities sent is != entities in grp");
+			rc = SA_AIS_ERR_NO_SPACE;
+			goto send_resp;
+		}
+
 		plm_resp.res_evt.entities = (SaPlmReadinessTrackedEntitiesT *)malloc(
 		    sizeof(SaPlmReadinessTrackedEntitiesT));
 
@@ -889,12 +895,6 @@ void plms_process_trk_start_evt(PLMS_EVT *plm_evt)
 			       strerror(errno));
 			goto send_resp;
 		}
-		if (no_of_ent_in_grp != no_of_ent_recd) {
-			LOG_ER("PLMS: no of entities sent is != entities in grp");
-			plm_resp.res_evt.entities->numberOfEntities = no_of_ent_in_grp;
-			rc = SA_AIS_ERR_NO_SPACE;
-			goto send_resp;
-		}
 
 		if (m_PLM_IS_SA_TRACK_CHANGES_SET(track_flags) ||
 		    m_PLM_IS_SA_TRACK_CHANGES_ONLY_SET(track_flags)) {
--
2.14.4
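
For readers who want the ordering issue in isolation, here is a minimal sketch of the idea behind the patch, not the plmd code itself: the size check runs before anything is allocated, so the NO_SPACE path can never leave a count that describes a missing array. Every name below (demo_rc, demo_entities, demo_resp, demo_build_resp, demo_free_resp) is hypothetical and only stands in for the real PLM types.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical return codes; the real code uses the SA_AIS_* values. */
enum demo_rc { DEMO_OK, DEMO_ERR_NO_MEMORY, DEMO_ERR_NO_SPACE };

/* Hypothetical stand-in for the tracked-entities payload. */
struct demo_entities {
	unsigned number_of_entities;
	int *entities; /* array of number_of_entities items */
};

struct demo_resp {
	enum demo_rc rc;
	struct demo_entities *payload; /* NULL on every error path */
};

/* Build a response for a group of in_group entities when the caller
 * supplied room for caller_capacity of them. */
static struct demo_resp demo_build_resp(unsigned in_group,
					unsigned caller_capacity)
{
	struct demo_resp resp = { DEMO_OK, NULL };

	/* Size check first: nothing is allocated yet, so the error path
	 * has no half-initialised payload for an encoder to walk. */
	if (caller_capacity < in_group) {
		resp.rc = DEMO_ERR_NO_SPACE;
		return resp;
	}

	resp.payload = calloc(1, sizeof(*resp.payload));
	if (resp.payload == NULL) {
		resp.rc = DEMO_ERR_NO_MEMORY;
		return resp;
	}

	resp.payload->entities = calloc(in_group ? in_group : 1, sizeof(int));
	if (resp.payload->entities == NULL) {
		free(resp.payload);
		resp.payload = NULL;
		resp.rc = DEMO_ERR_NO_MEMORY;
		return resp;
	}

	/* The count is set only once the array it describes exists. */
	resp.payload->number_of_entities = in_group;
	return resp;
}

/* Release whatever demo_build_resp allocated. */
static void demo_free_resp(struct demo_resp *resp)
{
	if (resp->payload != NULL) {
		free(resp->payload->entities);
		free(resp->payload);
		resp->payload = NULL;
	}
}
```

Calling demo_build_resp(5, 2) in this sketch yields DEMO_ERR_NO_SPACE with a NULL payload, which is the property the patch restores: an error reply carries nothing that a marshalling routine could dereference.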
Re: [devel] [PATCH 0/1] Review Request for plm: correct first arguement of API saPlmEntityGroupAdd() in apitest [#1983]
Hi,

This test is currently not enabled in test_saPlmEntityGroupCreate.c. Can you please enable it as part of this ticket?

Alex

On 08/20/2018 07:37 AM, Meenakshi TK wrote:

NOTICE: This email was received from an EXTERNAL sender

Summary: plm: correct first arguement of API saPlmEntityGroupAdd() in apitest [#1983]
Review request for Ticket(s): 1983
Peer Reviewer(s): ajo...@rbbn.com
Pull request to: Alex
Affected branch(es): all
Development branch: ticket-1983
Base revision: 1c19eddc9f03ebd18ab85b67ab50e3e5037b449e
Personal repository: git://git.code.sf.net/u/meenatk-hasoln/review

Impacted area         Impact y/n
--------------------------------
Docs                  n
Build system          n
RPM/packaging         n
Configuration files   n
Startup scripts       n
SAF services          n
OpenSAF services      n
Core libraries        n
Samples               n
Tests                 y
Other                 n

Comments (indicate scope for each "y" above):
---------------------------------------------
*** EXPLAIN/COMMENT THE PATCH SERIES HERE ***

revision 58a05affe898227fa96e1a08eaa37f4055077da2
Author: Meenakshi TK
Date:   Mon, 20 Aug 2018 14:24:00 +0530

plm: correct first arguement of API saPlmEntityGroupAdd() in apitest [#1983]

Complete diffstat:
------------------
 src/plm/apitest/test_saPlmEntityGroupAdd.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Testing Commands:
-----------------
Perform compilation on 32-bit machine

Testing, Expected Results:
--------------------------
All tests of apitest passed

Conditions of Submission:
-------------------------
Ack from Alex

Arch        Built   Started   Linux distro
------------------------------------------
mips        n       n
mips64      n       n
x86         n       n
x86_64      y       y
powerpc     n       n
powerpc64   n       n

Reviewer Checklist:
-------------------
[Submitters: make sure that your review doesn't trigger any checkmarks!]

Your checkin has not passed review because (see checked entries):

___ Your RR template is generally incomplete; it has too many blank entries that need proper data filled in.
___ You have failed to nominate the proper persons for review and push.
___ Your patches do not have proper short+long header
___ You have grammar/spelling in your header that is unacceptable.
___ You have exceeded a sensible line length in your headers/comments/text.
___ You have failed to put in a proper Trac Ticket # into your commits.
___ You have incorrectly put/left internal data in your comments/files (i.e. internal bug tracking tool IDs, product names etc)
___ You have not given any evidence of testing beyond basic build tests. Demonstrate some level of runtime or other sanity testing.
___ You have ^M present in some of your files. These have to be removed.
___ You have needlessly changed whitespace or added whitespace crimes like trailing spaces, or spaces before tabs.
___ You have mixed real technical changes with whitespace and other cosmetic code cleanup changes. These have to be separate commits.
___ You need to refactor your submission into logical chunks; there is too much content into a single commit.
___ You have extraneous garbage in your review (merge commits etc)
___ You have giant attachments which should never have been sent; Instead you should place your content in a public tree to be pulled.
___ You have too many commits attached to an e-mail; resend as threaded commits, or place in a public tree for a pull.
___ You have resent this content multiple times without a clear indication of what has changed between each re-send.
___ You have failed to adequately and individually address all of the comments and change requests that were proposed in the initial review.
___ You have a misconfigured ~/.gitconfig file (i.e. user.name, user.email etc)
___ Your computer have a badly configured date and time; confusing the the threaded patch review.
___ Your changes affect IPC mechanism, and you don't present any results for in-service upgradability test.
___ Your changes affect user manual and documentation, your patch series do not contain the patch that updates the Doxygen manual.
Re: [devel] [PATCH 1/1] ckpt: add new test case of API saCkptInitialize() of apitest [#2913]
Hi Mohan,

Ack from me.

Alex

On 08/21/2018 04:16 AM, mohan kanakam wrote:

---
 src/ckpt/apitest/test_cpa.c | 12
 1 file changed, 12 insertions(+)

diff --git a/src/ckpt/apitest/test_cpa.c b/src/ckpt/apitest/test_cpa.c
index 0cc38a4..51f3c99 100644
--- a/src/ckpt/apitest/test_cpa.c
+++ b/src/ckpt/apitest/test_cpa.c
@@ -748,6 +748,16 @@ void cpsv_it_init_10()
 	test_validate(result, TEST_PASS);
 }
 
+void cpsv_it_init_11()
+{
+	int result;
+	printHead("To verify saCkptInitialize with one sync clbk");
+	result = test_ckptInitialize(CKPT_INIT_SYNC_NULL_CBK_T, TEST_NONCONFIG_MODE);
+	test_cpsv_cleanup(CPSV_CLEAN_INIT_SYNC_NULL_CBK_T);
+	printResult(result);
+	test_validate(result, TEST_PASS);
+}
+
 /** saCkptSelectionObjectGet */
 
 void cpsv_it_sel_01()
@@ -7941,6 +7951,8 @@ __attribute__((constructor)) static void ckpt_cpa_test_constructor(void)
 		      "To verify saCkptInitialize with NULL handle");
 	test_case_add(1, cpsv_it_init_10,
 		      "To verify saCkptInitialize with one NULL clbk");
+	test_case_add(1, cpsv_it_init_11,
+		      "To verify saCkptInitialize with one SYNC clbk and NULL clbk");
 	test_suite_add(2, "CKPT API saCkptSelectObjectGet()");
 	test_case_add(
--
2.7.4
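
The patch relies on a constructor function to register the new case before main() runs. As a rough illustration of that pattern only, here is a self-contained sketch; it assumes GCC or Clang (for __attribute__((constructor))), and demo_case_add, demo_run_suite, and demo_init_11 are hypothetical names, not the OpenSAF test framework's API.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_MAX_CASES 16

typedef void (*demo_test_fn)(void);

/* One registered test case: suite number, function, description. */
struct demo_case {
	unsigned suite;
	demo_test_fn fn;
	const char *desc;
};

static struct demo_case demo_cases[DEMO_MAX_CASES];
static size_t demo_case_count;
static int demo_runs; /* incremented by the sample case below */

/* Append a case to the registry; returns -1 when the table is full. */
static int demo_case_add(unsigned suite, demo_test_fn fn, const char *desc)
{
	if (demo_case_count >= DEMO_MAX_CASES)
		return -1;
	demo_cases[demo_case_count].suite = suite;
	demo_cases[demo_case_count].fn = fn;
	demo_cases[demo_case_count].desc = desc;
	demo_case_count++;
	return 0;
}

/* Sample case; a real one would call the API under test. */
static void demo_init_11(void)
{
	demo_runs++;
}

/* Compiler extension (GCC/Clang): runs before main(), so the case is
 * registered without main() having to know about it. */
__attribute__((constructor)) static void demo_register(void)
{
	demo_case_add(1, demo_init_11, "initialize with one sync callback");
}

/* Run every case in the given suite; returns how many were run. */
static size_t demo_run_suite(unsigned suite)
{
	size_t ran = 0;
	for (size_t i = 0; i < demo_case_count; i++) {
		if (demo_cases[i].suite == suite) {
			demo_cases[i].fn();
			ran++;
		}
	}
	return ran;
}
```

The point of the design is that adding a case touches only the test file: the new function plus one registration line, which is exactly the shape of the two hunks above.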
Re: [devel] [PATCH 1/1] amf: add support for container/contained [#70]
G'day Gary,

I see you were adding the XML file dynamically with "immcfg -f". I hadn't tried that. I hadn't tried killing the sample app, either.

Here is a patch that should fix both issues. Apply it on top of the latest big one I sent.

Alex

On 08/13/2018 10:37 PM, Gary Lee wrote:

Hi Alex

I modified AppConfig-container.xml and changed saAmfSgtRedundancyModel from 4 (NwayAct) to 1 (2N). The xml still loads and I could unlock, resulting in:

root@SC-1:/var/log# immlist safVersion=1,safSgType=Container
Name                      Type          Value(s)
========================================================================
safVersion                SA_STRING_T   safVersion=1
saAmfSgtValidSuTypes      SA_NAME_T     safVersion=1,safSuType=Container (32)
saAmfSgtRedundancyModel   SA_UINT32_T   1 (0x1)

safSISU=safSu=SU2\,safSg=Container\,safApp=Container,safSi=Container,safApp=Container
    saAmfSISUHAState=STANDBY(2)
    saAmfSISUHAReadinessState=READY_FOR_ASSIGNMENT(1)
safSISU=safSu=SU1\,safSg=Container\,safApp=Container,safSi=Container,safApp=Container
    saAmfSISUHAState=ACTIVE(1)
    saAmfSISUHAReadinessState=READY_FOR_ASSIGNMENT(1)

Also, have you tried killing the amf_container_demo binary?

Thanks
Gary

On 14/08/18 05:00, Alex Jones wrote:

Hi Gary,

I just resubmitted a new patch which breaks out the different components, and addresses the other comments here. But, #2 (rejecting all but NWay-active for container) should already be in there. Is there a specific test you ran that didn't work?

Alex

On 08/13/2018 02:43 AM, Gary Lee wrote:

Hi Alex

Some initial comments:

0. Is it possible to split up the patch into amfd / amfnd / common / samples. Just makes it easier to reply inline.

1. Please compile the container demo by default, and make amf_container_script world executable. Eg.

diff --git a/samples/amf/Makefile.am b/samples/amf/Makefile.am
index 447dedd..7ebf9c3 100644
--- a/samples/amf/Makefile.am
+++ b/samples/amf/Makefile.am
@@ -19,5 +19,5 @@ include $(top_srcdir)/Makefile.common
 
 MAINTAINERCLEANFILES = Makefile.in
 
-SUBDIRS = sa_aware non_sa_aware wrapper proxy api_demo
+SUBDIRS = sa_aware non_sa_aware wrapper proxy api_demo container
 
diff --git a/samples/amf/container/amf_container_script b/samples/amf/container/amf_container_script
old mode 100644
new mode 100755
diff --git a/samples/configure.ac b/samples/configure.ac
index 7cf803e..9765d54 100644
--- a/samples/configure.ac
+++ b/samples/configure.ac
@@ -67,6 +67,7 @@ AC_CONFIG_FILES([ \
 amf/wrapper/Makefile \
 amf/proxy/Makefile \
 amf/api_demo/Makefile \
+ amf/container/Makefile \
 cpsv/Makefile \
 cpsv/ckpt_demo/Makefile \
 cpsv/ckpt_track_demo/Makefile \

2. We should probably reject CCBs that set saAmfSgtRedundancyModel to anything other than NWayActive, for Containers.

3. Do we need to bump the msg format version to AVSV_AVD_AVND_MSG_FMT_VER_8? An old amfnd will assert if it gets an AVSV_D2N_CONTAINED_SU_MSG_INFO msg.

Thanks
Gary

diff --git a/src/amf/amfd/comp.cc b/src/amf/amfd/comp.cc
index 571ac34fb..d8cbcf2ae 100644
--- a/src/amf/amfd/comp.cc
+++ b/src/amf/amfd/comp.cc
@@ -328,6 +328,31 @@ done:
   TRACE_LEAVE();
 }
 
+static bool get_container_redundancy_model_from_ccb(
+    CcbUtilOperationData_t *opdata,
+    const std::string& sg_name,
+    SaAmfRedundancyModelT& model) {
+  SaNameT aname, sgtypeName;
+  bool status(false);
+
+  osaf_extended_name_alloc(sg_name.c_str(), );
+  CcbUtilOperationData_t *ccbSgOpData(ccbutil_getCcbOpDataByDN(opdata->ccbId, )),
+    *ccbSgTypeOpData(nullptr);
+
+  if (ccbSgOpData && ccbSgOpData->operationType == CCBUTIL_CREATE &&
+      immutil_getAttr(const_cast("saAmfSGType"),
+                      ccbSgOpData->param.create.attrValues,
+                      0, ) == SA_AIS_OK &&
+      (ccbSgTypeOpData = ccbutil_getCcbOpDataByDN(opdata->ccbId, )) &&
+      immutil_getAttr(const_cast("saAmfSgtRedundancyModel"),
+                      ccbSgTypeOpData->param.create.attrValues,
+                      0, ) == SA_AIS_OK) {
+    status = true;
+  }
+
+  return status;
+}
+
 /**
  * Validate configuration attributes for an
Re: [devel] [PATCH 1/1] amf: add support for container/contained [#70]
Hi Gary,

I just resubmitted a new patch which breaks out the different components, and addresses the other comments here. But, #2 (rejecting all but NWay-active for container) should already be in there. Is there a specific test you ran that didn't work?

Alex

On 08/13/2018 02:43 AM, Gary Lee wrote:

Hi Alex

Some initial comments:

0. Is it possible to split up the patch into amfd / amfnd / common / samples. Just makes it easier to reply inline.

1. Please compile the container demo by default, and make amf_container_script world executable. Eg.

diff --git a/samples/amf/Makefile.am b/samples/amf/Makefile.am
index 447dedd..7ebf9c3 100644
--- a/samples/amf/Makefile.am
+++ b/samples/amf/Makefile.am
@@ -19,5 +19,5 @@ include $(top_srcdir)/Makefile.common
 
 MAINTAINERCLEANFILES = Makefile.in
 
-SUBDIRS = sa_aware non_sa_aware wrapper proxy api_demo
+SUBDIRS = sa_aware non_sa_aware wrapper proxy api_demo container
 
diff --git a/samples/amf/container/amf_container_script b/samples/amf/container/amf_container_script
old mode 100644
new mode 100755
diff --git a/samples/configure.ac b/samples/configure.ac
index 7cf803e..9765d54 100644
--- a/samples/configure.ac
+++ b/samples/configure.ac
@@ -67,6 +67,7 @@ AC_CONFIG_FILES([ \
 amf/wrapper/Makefile \
 amf/proxy/Makefile \
 amf/api_demo/Makefile \
+ amf/container/Makefile \
 cpsv/Makefile \
 cpsv/ckpt_demo/Makefile \
 cpsv/ckpt_track_demo/Makefile \

2. We should probably reject CCBs that set saAmfSgtRedundancyModel to anything other than NWayActive, for Containers.

3. Do we need to bump the msg format version to AVSV_AVD_AVND_MSG_FMT_VER_8? An old amfnd will assert if it gets an AVSV_D2N_CONTAINED_SU_MSG_INFO msg.

Thanks
Gary
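
On Gary's point 3, the concern is that a director must not send a new message type to a node director that predates it. A minimal sketch of that kind of gate follows, under the assumption that the sender knows the peer's supported format version; it does not show how OpenSAF actually negotiates versions, and all demo_* names and the DEMO_* constants are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical version numbers mirroring the 7 -> 8 bump discussed. */
#define DEMO_MSG_FMT_VER_1 1u
#define DEMO_MSG_FMT_VER_8 8u

/* Hypothetical message types: one old, one introduced with version 8. */
enum demo_msg_type {
	DEMO_MSG_PRESENCE_SU,
	DEMO_MSG_CONTAINED_SU
};

/* Lowest peer format version that can decode the given message type. */
static uint32_t demo_min_version(enum demo_msg_type type)
{
	switch (type) {
	case DEMO_MSG_CONTAINED_SU:
		return DEMO_MSG_FMT_VER_8;
	default:
		return DEMO_MSG_FMT_VER_1;
	}
}

/* True when the peer is new enough to receive this message type. */
static bool demo_can_send(enum demo_msg_type type, uint32_t peer_version)
{
	return peer_version >= demo_min_version(type);
}
```

With a gate like this, a peer at version 7 is simply never sent the contained-SU message, instead of asserting on an unknown type.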
[devel] [PATCH 1/5] amfd: add support for container/contained [#70]
This ticket adds support for container/contained in amfd. --- src/amf/amfd/comp.cc | 65 ++-- src/amf/amfd/comp.h | 4 +- src/amf/amfd/comptype.cc | 6 +- src/amf/amfd/csi.cc | 6 ++ src/amf/amfd/csi.h | 3 + src/amf/amfd/ndproc.cc | 14 + src/amf/amfd/node.cc | 29 + src/amf/amfd/node.h | 1 + src/amf/amfd/sg.cc | 29 + src/amf/amfd/sg.h| 4 ++ src/amf/amfd/sgproc.cc | 142 ++- src/amf/amfd/si.cc | 17 ++ src/amf/amfd/si.h| 1 + src/amf/amfd/su.cc | 155 ++- src/amf/amfd/su.h| 15 - src/amf/amfd/util.cc | 39 src/amf/amfd/util.h | 2 + 17 files changed, 517 insertions(+), 15 deletions(-) diff --git a/src/amf/amfd/comp.cc b/src/amf/amfd/comp.cc index 482322d2e..571ac34fb 100644 --- a/src/amf/amfd/comp.cc +++ b/src/amf/amfd/comp.cc @@ -73,7 +73,7 @@ void AVD_COMP::initialize() { curr_num_csi_actv = {}; curr_num_csi_stdby = {}; comp_proxy_csi = {}; - comp_container_csi = {}; + saAmfCompContainerCsi = {}; saAmfCompRestartCount = {}; saAmfCompCurrProxyName = {}; saAmfCompCurrProxiedNames = {}; @@ -357,7 +357,10 @@ static int is_config_valid(const std::string , 0, ); osafassert(rc == SA_AIS_OK); - if (comptype_db->find(Amf::to_string()) == nullptr) { + AVD_COMP_TYPE *comptype(comptype_db->find(Amf::to_string())); + CcbUtilOperationData_t *ccbCompTypeOpData(nullptr); + + if (comptype == nullptr) { /* Comp type does not exist in current model, check CCB */ if (opdata == nullptr) { report_ccb_validation_error(opdata, "'%s' does not exist in model", @@ -365,7 +368,8 @@ static int is_config_valid(const std::string , return 0; } -if (ccbutil_getCcbOpDataByDN(opdata->ccbId, ) == nullptr) { +ccbCompTypeOpData = ccbutil_getCcbOpDataByDN(opdata->ccbId, ); +if (ccbCompTypeOpData == nullptr) { report_ccb_validation_error( opdata, "'%s' does not exist in existing model or in CCB", osaf_extended_name_borrow()); @@ -399,6 +403,24 @@ static int is_config_valid(const std::string , return 0; } + if ((comptype && IS_COMP_CONTAINED(comptype->saAmfCtCompCategory)) || + (ccbCompTypeOpData && + 
ccbCompTypeOpData->operationType == CCBUTIL_CREATE && + immutil_getAttr(const_cast("saAmfCtCompCategory"), + ccbCompTypeOpData->param.create.attrValues, + 0, ) == SA_AIS_OK && + value & SA_AMF_COMP_CONTAINED)) { +rc = immutil_getAttr(const_cast("saAmfCompContainerCsi"), + attributes, 0, ); +if (rc != SA_AIS_OK) { + report_ccb_validation_error( + opdata, "Contained component '%s' must have saAmfCompContainerCsi " + "attribute set", dn.c_str()); + return 0; +} + } + + #if 0 if ((comp->comp_info.category == AVSV_COMP_TYPE_SA_AWARE) && (comp->comp_info.init_len == 0)) { LOG_ER("Sa Aware Component: instantiation command not configured"); @@ -716,6 +738,20 @@ static AVD_COMP *comp_create(const std::string , >comp_info.comp_restart) != SA_AIS_OK) comp->comp_info.comp_restart = comptype->saAmfCtDefDisableRestart; + if (comp->contained()) { +SaNameT container_csi; + +if (immutil_getAttr(const_cast("saAmfCompContainerCsi"), + attributes, 0, _csi) != SA_AIS_OK) { + LOG_ER("unable to get container csi for %s", dn.c_str()); + goto done; +} + +comp->saAmfCompContainerCsi = Amf::to_string(_csi); +//XXX TODO_70: verify db if container csi. DO we requre this. 
+container_csis.insert(comp->saAmfCompContainerCsi); + } + comp->max_num_csi_actv = -1; // TODO comp->max_num_csi_stdby = -1; // TODO @@ -770,6 +806,7 @@ SaAisErrorT avd_comp_config_get(const std::string _name, AVD_SU *su) { const_cast("saAmfCompQuiescingCompleteTimeout"), const_cast("saAmfCompRecoveryOnError"), const_cast("saAmfCompDisableRestart"), + const_cast("saAmfCompContainerCsi"), nullptr}; TRACE_ENTER(); @@ -1735,9 +1772,9 @@ static void comp_ccb_apply_modify_hdlr(struct CcbUtilOperationData *opdata) { comp->comp_proxy_csi = Amf::to_string((SaNameT *)value); } else if (!strcmp(attribute->attrName, "saAmfCompContainerCsi")) { if (value_is_deleted) -comp->comp_proxy_csi = ""; +comp->saAmfCompContainerCsi = ""; else -comp->comp_container_csi = Amf::to_string((SaNameT *)value); +comp->saAmfCompContainerCsi = Amf::to_string((SaNameT *)value); } else { osafassert(0); } @@ -1842,6 +1879,8 @@ void avd_comp_constructor(void) { bool AVD_COMP::is_preinstantiable() const { AVSV_COMP_TYPE_VAL category = comp_info.category; return ((category ==
[devel] [PATCH 3/5] amf: add support for container/contained [#70]
Add support for container/contained amf common. --- src/amf/common/amf_amfparam.h | 22 ++ src/amf/common/amf_d2nmsg.h | 11 +++ src/amf/common/amf_defs.h | 2 ++ src/amf/common/amf_util.h | 3 ++- src/amf/common/d2nedu.c | 22 +- src/amf/common/n2avaedu.c | 6 +- src/amf/common/n2avamsg.c | 13 + src/amf/common/util.c | 32 +--- 8 files changed, 105 insertions(+), 6 deletions(-) diff --git a/src/amf/common/amf_amfparam.h b/src/amf/common/amf_amfparam.h index ca3d7c869..2baa35fa8 100644 --- a/src/amf/common/amf_amfparam.h +++ b/src/amf/common/amf_amfparam.h @@ -67,6 +67,8 @@ typedef enum avsv_amf_cbk_type { AVSV_AMF_PXIED_COMP_CLEAN, AVSV_AMF_CSI_ATTR_CHANGE, AVSV_AMF_SC_STATUS_CHANGE, + AVSV_AMF_CONTAINED_COMP_INST, + AVSV_AMF_CONTAINED_COMP_CLEAN, AVSV_AMF_CBK_MAX } AVSV_AMF_CBK_TYPE; @@ -105,6 +107,14 @@ typedef struct avsv_amf_comp_reg_param_tag { SaNameT comp_name; /* comp name */ SaNameT proxy_comp_name; /* proxy comp name */ SaVersionT version; // SAF VERSION of component. +#define AVSV_AMF_CALLBACK_TERMINATE 0x01 +#define AVSV_AMF_CALLBACK_CSI_SET 0x02 +#define AVSV_AMF_CALLBACK_CSI_REMOVE 0x04 +#define AVSV_AMF_CALLBACK_CONTAINED_INST 0x08 +#define AVSV_AMF_CALLBACK_CONTAINED_CLEAN 0x10 +#define AVSV_AMF_CALLBACK_PROXIED_INST0x20 +#define AVSV_AMF_CALLBACK_PROXIED_CLEAN 0x40 + SaUint64T callbacks; } AVSV_AMF_COMP_REG_PARAM; /* component unregister */ @@ -284,6 +294,16 @@ typedef struct avsv_amf_pxied_comp_clean_param_tag { SaNameT comp_name; /* comp name */ } AVSV_AMF_PXIED_COMP_CLEAN_PARAM; +/* contained component instantiate */ +typedef struct avsv_amf_contained_comp_inst_param_tag { + SaNameT comp_name; /* comp name */ +} AVSV_AMF_CONTAINED_COMP_INST_PARAM; + +/* contained component cleanup */ +typedef struct avsv_amf_contained_comp_clean_param_tag { + SaNameT comp_name; /* comp name */ +} AVSV_AMF_CONTAINED_COMP_CLEAN_PARAM; + /* wrapper structure for all the callbacks */ typedef struct avsv_amf_cbk_info_tag { SaAmfHandleT hdl; /* AMF handle */ @@ -299,6 
+319,8 @@ typedef struct avsv_amf_cbk_info_tag { AVSV_AMF_PXIED_COMP_CLEAN_PARAM pxied_comp_clean; AVSV_AMF_CSI_ATTR_CHANGE_PARAM csi_attr_change; AVSV_AMF_SC_STATUS_CHANGE_PARAM sc_status_change; +AVSV_AMF_CONTAINED_COMP_INST_PARAM contained_inst; +AVSV_AMF_CONTAINED_COMP_CLEAN_PARAM contained_clean; } param; } AVSV_AMF_CBK_INFO; diff --git a/src/amf/common/amf_d2nmsg.h b/src/amf/common/amf_d2nmsg.h index e99c0399c..187279d2a 100644 --- a/src/amf/common/amf_d2nmsg.h +++ b/src/amf/common/amf_d2nmsg.h @@ -52,6 +52,7 @@ extern "C" { #define AVSV_AVD_AVND_MSG_FMT_VER_5 5 #define AVSV_AVD_AVND_MSG_FMT_VER_6 6 #define AVSV_AVD_AVND_MSG_FMT_VER_7 7 +#define AVSV_AVD_AVND_MSG_FMT_VER_8 8 /* Internode/External Components Validation result */ typedef enum { @@ -110,6 +111,7 @@ typedef enum { AVSV_N2D_ND_SISU_STATE_INFO_MSG, AVSV_N2D_ND_CSICOMP_STATE_INFO_MSG, AVSV_D2N_COMPCSI_ASSIGN_MSG, + AVSV_D2N_CONTAINED_SU_MSG, AVSV_DND_MSG_MAX } AVSV_DND_MSG_TYPE; @@ -603,6 +605,14 @@ typedef struct avsv_d2n_presence_su_msg_info_tag { bool term_state; } AVSV_D2N_PRESENCE_SU_MSG_INFO; +typedef struct avsv_d2n_contained_su_msg_info_tag { + uint32_t msg_id; + SaClmNodeIdT node_id; + SaNameT container_su_name; + SaNameT contained_su_name; + bool term_state; +} AVSV_D2N_CONTAINED_SU_MSG_INFO; + typedef struct avsv_d2n_data_verify_msg_info { uint32_t snd_id_cnt; uint32_t rcv_id_cnt; @@ -701,6 +711,7 @@ typedef struct avsv_dnd_msg { AVSV_D2N_HB_MSG_INFO d2n_hb_info; AVSV_D2N_REBOOT_MSG_INFO d2n_reboot_info; AVSV_D2N_COMPCSI_ASSIGN_MSG_INFO d2n_compcsi_assign_msg_info; +AVSV_D2N_CONTAINED_SU_MSG_INFO d2n_contained_su_msg_info; } msg_info; } AVSV_DND_MSG; diff --git a/src/amf/common/amf_defs.h b/src/amf/common/amf_defs.h index 24549b3af..3ee5a5aca 100644 --- a/src/amf/common/amf_defs.h +++ b/src/amf/common/amf_defs.h @@ -72,6 +72,8 @@ typedef enum { AVSV_COMP_TYPE_EXTERNAL_PRE_INSTANTIABLE, AVSV_COMP_TYPE_EXTERNAL_NON_PRE_INSTANTIABLE, AVSV_COMP_TYPE_NON_SAF, + AVSV_COMP_TYPE_CONTAINER, + 
AVSV_COMP_TYPE_CONTAINED } AVSV_COMP_TYPE_VAL; /* diff --git a/src/amf/common/amf_util.h b/src/amf/common/amf_util.h index ffb8b21c6..15ecbcaad 100644 --- a/src/amf/common/amf_util.h +++ b/src/amf/common/amf_util.h @@ -50,7 +50,8 @@ extern "C" { #define IS_COMP_PROXIED_NPI(category) (((category)_AMF_COMP_PROXIED_NPI)) #define IS_COMP_LOCAL(category) \ - (((category)_AMF_COMP_SA_AWARE) || ((category)_AMF_COMP_LOCAL)) + (((category)_AMF_COMP_SA_AWARE) || ((category)_AMF_COMP_LOCAL) || \ + ((category)_AMF_COMP_CONTAINER) || ((category)_AMF_COMP_CONTAINED)) #define IS_COMP_CONTAINER(category) (((category)_AMF_COMP_CONTAINER)) diff --git
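
The amf_amfparam.h hunk in this patch adds a callbacks bitmask to the component registration parameters. As a sketch of how such a mask could be filled in, assuming (this is not confirmed by the excerpt) that a bit is set for each callback the component registered, the flag values below are copied from the patch while struct demo_callbacks and demo_callback_mask are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Flag values as they appear in the amf_amfparam.h hunk. */
#define AVSV_AMF_CALLBACK_TERMINATE       0x01
#define AVSV_AMF_CALLBACK_CSI_SET         0x02
#define AVSV_AMF_CALLBACK_CSI_REMOVE      0x04
#define AVSV_AMF_CALLBACK_CONTAINED_INST  0x08
#define AVSV_AMF_CALLBACK_CONTAINED_CLEAN 0x10

typedef void (*demo_cbk)(void);

/* Hypothetical callback table; NULL means "not registered". */
struct demo_callbacks {
	demo_cbk terminate;
	demo_cbk csi_set;
	demo_cbk csi_remove;
	demo_cbk contained_inst;
	demo_cbk contained_clean;
};

/* Placeholder callback used only to make a table entry non-NULL. */
static void demo_noop(void)
{
}

/* Translate the non-NULL entries into the bitmask sent at registration. */
static uint64_t demo_callback_mask(const struct demo_callbacks *c)
{
	uint64_t mask = 0;

	if (c->terminate != NULL)
		mask |= AVSV_AMF_CALLBACK_TERMINATE;
	if (c->csi_set != NULL)
		mask |= AVSV_AMF_CALLBACK_CSI_SET;
	if (c->csi_remove != NULL)
		mask |= AVSV_AMF_CALLBACK_CSI_REMOVE;
	if (c->contained_inst != NULL)
		mask |= AVSV_AMF_CALLBACK_CONTAINED_INST;
	if (c->contained_clean != NULL)
		mask |= AVSV_AMF_CALLBACK_CONTAINED_CLEAN;
	return mask;
}
```

A receiver can then test a single bit, for example mask & AVSV_AMF_CALLBACK_CONTAINED_INST, to decide whether a component is able to act as a container before assigning contained components to it.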
[devel] [PATCH 5/5] amf: add support for container/contained [#70]
Add support for container/contained samples. --- samples/amf/Makefile.am | 2 +- samples/amf/container/AppConfig-contained-2N.xml | 327 + samples/amf/container/AppConfig-container.xml| 331 ++ samples/amf/container/Makefile.am| 45 ++ samples/amf/container/README | 36 + samples/amf/container/amf_container_demo.c | 803 +++ samples/amf/container/amf_container_script | 101 +++ samples/configure.ac | 1 + tools/cluster_sim_uml/build_uml | 45 ++ 9 files changed, 1690 insertions(+), 1 deletion(-) create mode 100644 samples/amf/container/AppConfig-contained-2N.xml create mode 100644 samples/amf/container/AppConfig-container.xml create mode 100644 samples/amf/container/Makefile.am create mode 100644 samples/amf/container/README create mode 100644 samples/amf/container/amf_container_demo.c create mode 100755 samples/amf/container/amf_container_script diff --git a/samples/amf/Makefile.am b/samples/amf/Makefile.am index 447dedd20..7ebf9c3a5 100644 --- a/samples/amf/Makefile.am +++ b/samples/amf/Makefile.am @@ -19,5 +19,5 @@ include $(top_srcdir)/Makefile.common MAINTAINERCLEANFILES = Makefile.in -SUBDIRS = sa_aware non_sa_aware wrapper proxy api_demo +SUBDIRS = sa_aware non_sa_aware wrapper proxy api_demo container diff --git a/samples/amf/container/AppConfig-contained-2N.xml b/samples/amf/container/AppConfig-contained-2N.xml new file mode 100644 index 0..b8f7c572d --- /dev/null +++ b/samples/amf/container/AppConfig-contained-2N.xml @@ -0,0 +1,327 @@ + + + +http://www.saforum.org/IMMSchema; xsi:noNamespaceSchemaLocation="SAI-AIS-IMM-XSD-A.01.01.xsd" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance;> + + safAppType=Contained1 + + + safSgType=Contained1 + + + safSuType=Contained1 + + + safCompType=Contained1 + + + safSvcType=Contained1 + + + safCSType=Contained1 + + + safVersion=1,safSvcType=Contained1 + + + safVersion=1,safAppType=Contained1 + + saAmfApptSGTypes + safVersion=1,safSgType=Contained1 + + + + safVersion=1,safSgType=Contained1 + + saAmfSgtRedundancyModel + 1 + + + 
saAmfSgtValidSuTypes + safVersion=1,safSuType=Contained1 + + + saAmfSgtDefAutoAdjustProb + 100 + + + saAmfSgtDefCompRestartProb + 40 + + + saAmfSgtDefCompRestartMax + 10 + + + saAmfSgtDefSuRestartProb + 40 + + + saAmfSgtDefSuRestartMax + 10 + + + + safVersion=1,safSuType=Contained1 + + saAmfSutIsExternal + 0 + + + saAmfSutDefSUFailover + 1 + + + saAmfSutProvidesSvcTypes + safVersion=1,safSvcType=Contained1 + + + + safVersion=1,safCompType=Contained1 + + saAmfCtCompCategory + 32 + + + saAmfCtSwBundle + safSmfBundle=Contained_2N + + + saAmfCtDefClcCliTimeout + 900 + + + saAmfCtDefCallbackTimeout + 900 + + + saAmfCtRelPathInstantiateCmd + amf_container_script + + + saAmfCtDefInstantiateCmdArgv + instantiate + + + saAmfCtRelPathCleanupCmd + amf_container_script + + + saAmfCtDefCleanupCmdArgv + cleanup_contained + + + saAmfCtDefQuiescingCompleteTimeout + 900 + + + saAmfCtDefRecoveryOnError + 2 + +
[devel] [PATCH 0/5] Review Request for amf: add support for container/contained [#70]
Summary: amfd: add support for container/contained [#70] Review request for Ticket(s): 70 Peer Reviewer(s): Gary, Ravi, Nagu, Hans Pull request to: Affected branch(es): develop Development branch: ticket-70 Base revision: e46d29e47ebf328f9bab041064070341ab94848f Personal repository: git://git.code.sf.net/u/trguitar/review Impacted area Impact y/n Docsn Build systemn RPM/packaging n Configuration files n Startup scripts n SAF servicesy OpenSAF servicesn Core libraries n Samples y Tests n Other n NOTE: Patch(es) contain lines longer than 80 characers Comments (indicate scope for each "y" above): - revision 9c9f7e04c39fca9030025b0a8394eabf328a4c70 Author: Alex Jones Date: Mon, 13 Aug 2018 14:48:14 -0400 amf: add support for container/contained [#70] Add support for container/contained samples. revision cf9d7565376059239c0902555c1c4811db6deff2 Author: Alex Jones Date: Mon, 13 Aug 2018 14:48:14 -0400 amf: add support for container/contained [#70] Add support for container/contained for amf agent. revision 199d81e0e479d6caf0bed10598008f1423261ecd Author: Alex Jones Date: Mon, 13 Aug 2018 14:48:14 -0400 amf: add support for container/contained [#70] Add support for container/contained amf common. revision d53a063d34c2c9b96a95033fca25cd5b4fdb7f5b Author: Alex Jones Date: Mon, 13 Aug 2018 14:48:14 -0400 amfnd: add support for container/contained [#70] This ticket adds support for container/contained. revision 308b8d7335380120d025bc3b10924fa45aff1402 Author: Alex Jones Date: Mon, 13 Aug 2018 14:48:14 -0400 amfd: add support for container/contained [#70] This ticket adds support for container/contained in amfd. 
Added Files: samples/amf/container/amf_container_demo.c samples/amf/container/amf_container_script samples/amf/container/AppConfig-contained-2N.xml samples/amf/container/AppConfig-container.xml samples/amf/container/Makefile.am samples/amf/container/README Complete diffstat: -- samples/amf/Makefile.am | 2 +- samples/amf/container/AppConfig-contained-2N.xml | 327 + samples/amf/container/AppConfig-container.xml| 331 ++ samples/amf/container/Makefile.am| 45 ++ samples/amf/container/README | 36 + samples/amf/container/amf_container_demo.c | 803 +++ samples/amf/container/amf_container_script | 101 +++ samples/configure.ac | 1 + src/amf/agent/amf_agent.cc | 73 ++- src/amf/agent/ava_cb.h | 1 + src/amf/agent/ava_hdl.cc | 31 + src/amf/agent/ava_mds.cc | 34 +- src/amf/agent/ava_mds.h | 3 +- src/amf/agent/ava_op.cc | 6 + src/amf/amfd/comp.cc | 65 +- src/amf/amfd/comp.h | 4 +- src/amf/amfd/comptype.cc | 6 +- src/amf/amfd/csi.cc | 6 + src/amf/amfd/csi.h | 3 + src/amf/amfd/ndproc.cc | 14 + src/amf/amfd/node.cc | 29 + src/amf/amfd/node.h | 1 + src/amf/amfd/sg.cc | 29 + src/amf/amfd/sg.h| 4 + src/amf/amfd/sgproc.cc | 142 +++- src/amf/amfd/si.cc | 17 + src/amf/amfd/si.h| 1 + src/amf/amfd/su.cc | 155 - src/amf/amfd/su.h| 15 +- src/amf/amfd/util.cc | 39 ++ src/amf/amfd/util.h | 2 + src/amf/amfnd/amfnd.cc | 5 +- src/amf/amfnd/avnd_cb.h | 2 + src/amf/amfnd/avnd_comp.h| 64 +- src/amf/amfnd/avnd_evt.h | 1 + src/amf/amfnd/avnd_mds.h | 4 +- src/amf/amfnd/avnd_proc.h| 2 + src/amf/amfnd/avnd_su.h | 4 + src/amf/amfnd/cbq.cc | 102 ++- src/amf/amfnd/chc.cc | 2 +- src/amf/amfnd/clc.cc | 95 ++- src/amf/amfnd/comp.cc| 90 ++- src/amf/amfnd/compdb.cc | 22 +- src/amf/amfnd/err.cc | 2 +- src/amf/amfnd/evt.cc | 2 + src/amf/amfnd/main.cc|
[devel] [PATCH 2/5] amfnd: add support for container/contained [#70]
This ticket adds support for container/contained. --- src/amf/amfnd/amfnd.cc| 5 ++- src/amf/amfnd/avnd_cb.h | 2 + src/amf/amfnd/avnd_comp.h | 64 + src/amf/amfnd/avnd_evt.h | 1 + src/amf/amfnd/avnd_mds.h | 4 +- src/amf/amfnd/avnd_proc.h | 2 + src/amf/amfnd/avnd_su.h | 4 ++ src/amf/amfnd/cbq.cc | 102 ++ src/amf/amfnd/chc.cc | 2 +- src/amf/amfnd/clc.cc | 95 -- src/amf/amfnd/comp.cc | 90 src/amf/amfnd/compdb.cc | 22 +- src/amf/amfnd/err.cc | 2 +- src/amf/amfnd/evt.cc | 2 + src/amf/amfnd/main.cc | 1 + src/amf/amfnd/mds.cc | 23 ++- src/amf/amfnd/proxy.cc| 2 +- src/amf/amfnd/su.cc | 8 src/amf/amfnd/susm.cc | 88 --- 19 files changed, 440 insertions(+), 79 deletions(-) diff --git a/src/amf/amfnd/amfnd.cc b/src/amf/amfnd/amfnd.cc index 3ac3f8fb0..9e8739bee 100644 --- a/src/amf/amfnd/amfnd.cc +++ b/src/amf/amfnd/amfnd.cc @@ -30,6 +30,7 @@ // Remember MDS install version of Agents. It can be used to send msg to Agent // based on their versions. std::map agent_mds_ver_db; +std::set container_csis; extern const AVND_EVT_HDLR g_avnd_func_list[AVND_EVT_MAX]; static uint32_t avnd_evt_avnd_avnd_api_msg_hdl(AVND_CB *cb, AVND_EVT *evt); @@ -78,7 +79,7 @@ uint32_t avnd_evt_avnd_avnd_evh(AVND_CB *cb, AVND_EVT *evt) { goto done; } -avnd_comp_cbq_rec_pop_and_del(cb, o_comp, cbk_rec, false); +avnd_comp_cbq_rec_pop_and_del(cb, o_comp, cbk_rec->opq_hdl, false); goto done; } @@ -373,7 +374,7 @@ uint32_t avnd_evt_avnd_avnd_cbk_msg_hdl(AVND_CB *cb, AVND_EVT *evt) { /* pop & delete */ uint32_t found; -m_AVND_COMP_CBQ_REC_POP(comp, rec, found); +rec = avnd_comp_cbq_rec_pop(comp, rec->opq_hdl, found); rec->cbk_info = 0; if (found) avnd_comp_cbq_rec_del(cb, comp, rec); } diff --git a/src/amf/amfnd/avnd_cb.h b/src/amf/amfnd/avnd_cb.h index ff21e3108..8b0cc2304 100644 --- a/src/amf/amfnd/avnd_cb.h +++ b/src/amf/amfnd/avnd_cb.h @@ -33,6 +33,7 @@ #ifndef AMF_AMFND_AVND_CB_H_ #define AMF_AMFND_AVND_CB_H_ #include +#include #include typedef struct avnd_cb_tag { @@ -151,5 +152,6 @@ void 
cb_increment_su_failover_count(AVND_CB , const AVND_SU ); extern AVND_CB *avnd_cb; extern std::map agent_mds_ver_db; +extern std::set container_csis; #endif // AMF_AMFND_AVND_CB_H_ diff --git a/src/amf/amfnd/avnd_comp.h b/src/amf/amfnd/avnd_comp.h index 611e90e11..b02e704a4 100644 --- a/src/amf/amfnd/avnd_comp.h +++ b/src/amf/amfnd/avnd_comp.h @@ -31,6 +31,7 @@ #define AMF_AMFND_AVND_COMP_H_ #include +#include struct avnd_cb_tag; struct avnd_su_si_rec; @@ -72,6 +73,7 @@ typedef enum avnd_comp_clc_pres_fsm_ev { AVND_COMP_CLC_PRES_FSM_EV_CLEANUP_FAIL, AVND_COMP_CLC_PRES_FSM_EV_RESTART, AVND_COMP_CLC_PRES_FSM_EV_ORPH, + AVND_COMP_CLC_PRES_FSM_EV_INST_TRY_AGAIN, AVND_COMP_CLC_PRES_FSM_EV_MAX } AVND_COMP_CLC_PRES_FSM_EV; @@ -324,6 +326,7 @@ typedef struct avnd_comp_tag { std::string name; /* comp name */ std::string saAmfCompType; + std::string saAmfCompContainerCsi; uint32_t numOfCompCmdEnv; /* number of comp command environment variables */ SaStringT *saAmfCompCmdEnv; /* comp command environment variables */ uint32_t inst_level;/* comp instantiation level */ @@ -384,6 +387,9 @@ typedef struct avnd_comp_tag { struct avnd_comp_tag *pxy_comp; /* ptr to the proxy comp (if any) */ + // list of associated contained sus. + std::vector list_of_contained_sus; + AVND_COMP_CLC_PRES_FSM_EV pend_evt; /* stores last fsm event got in orph state */ @@ -412,6 +418,9 @@ typedef struct avnd_comp_tag { SaInvocationT term_cbq_inv_value; /* invocation value for termination callback. */ SaVersionT version; // SAF version of comp. + + bool container(void) const; + bool contained(void) const; } AVND_COMP; #define AVND_COMP_NULL ((AVND_COMP *)0) @@ -457,6 +466,8 @@ typedef struct avnd_comp_tag { #define AVND_COMP_TYPE_PROXIED 0x0004 #define AVND_COMP_TYPE_PREINSTANTIABLE 0x0008 #define AVND_COMP_TYPE_SAAWARE 0x0010 +#define AVND_COMP_TYPE_CONTAINER 0x0020 +#define AVND_COMP_TYPE_CONTAINED 0x0040 /* component state (comp-reg, failed etc.) 
values */ #define AVND_COMP_FLAG_REG 0x0100 @@ -492,6 +503,8 @@ typedef struct avnd_comp_tag { #define m_AVND_COMP_TYPE_IS_PREINSTANTIABLE(x) \ (((x)->flag) & AVND_COMP_TYPE_PREINSTANTIABLE) #define m_AVND_COMP_TYPE_IS_SAAWARE(x) (((x)->flag) & AVND_COMP_TYPE_SAAWARE) +#define m_AVND_COMP_TYPE_IS_CONTAINER(x) (((x)->flag) & AVND_COMP_TYPE_CONTAINER) +#define m_AVND_COMP_TYPE_IS_CONTAINED(x) (((x)->flag) & AVND_COMP_TYPE_CONTAINED) /* macros for setting the comp types */ #define m_AVND_COMP_TYPE_SET(x, bitmap)
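The avnd_comp.h hunk above adds two component type bits (0x0020 and 0x0040), the matching m_AVND_COMP_TYPE_IS_* test macros, and container()/contained() members on AVND_COMP. A minimal sketch of that bit-flag pattern follows. The struct and function names are hypothetical stand-ins, and the assumption that the new members simply test these bits is not confirmed by the truncated diff.

```c
#include <stdbool.h>
#include <stdint.h>

/* Type bits with the values shown in the avnd_comp.h hunk. */
#define COMP_TYPE_SAAWARE   0x0010u
#define COMP_TYPE_CONTAINER 0x0020u
#define COMP_TYPE_CONTAINED 0x0040u

/* Hypothetical stand-in for AVND_COMP; only the flag word is modelled. */
struct comp {
	uint32_t flag;
};

/* Set one or more type bits without disturbing the others. */
static inline void comp_type_set(struct comp *c, uint32_t bitmap)
{
	c->flag |= bitmap;
}

/* Assumed behaviour of the new container() member: test one bit. */
static inline bool comp_is_container(const struct comp *c)
{
	return (c->flag & COMP_TYPE_CONTAINER) != 0;
}

/* Assumed behaviour of the new contained() member: test one bit. */
static inline bool comp_is_contained(const struct comp *c)
{
	return (c->flag & COMP_TYPE_CONTAINED) != 0;
}
```

Because each type owns a distinct bit, a component can carry several type bits at once (for example SA-aware and contained) and each predicate stays a single mask test.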
Re: [devel] [PATCH 0/1] Review Request for amf: add support for container/contained [#70]
Better, as in it's not happening anymore? :)

Alex

On 08/05/2018 08:46 PM, Gary Lee wrote:

Hi Alex

I can reproduce the coredump by doing "immcfg -f AppConfig-2N.xml" (the amf_demo sample). It looks better with the patch.

Thanks
Gary

From: Alex Jones
Organization: Ribbon
Date: Saturday, 4 August 2018 at 12:59 am
To: Gary Lee
Subject: Re: [devel] [PATCH 0/1] Review Request for amf: add support for container/contained [#70]

Hi Gary,

The check to make sure saAmfCompContainerCsi is defined for a contained component was not handling the case in which the comptype was being added dynamically in the same ccb. I assume that is what your tests are doing...

Try the attached patch on top of what I sent.

Alex

On 08/02/2018 09:50 PM, Gary Lee wrote:

Some more info from valgrind:

==274== Invalid read of size 1
==274==    at 0x14C080: is_config_valid(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, SaImmAttrValuesT_2 const**, CcbUtilOperationData*) [clone .constprop.98] (comp.cc:404)
==274==    by 0x14C4A0: comp_ccb_completed_cb(CcbUtilOperationData*) (comp.cc:1359)
==274==    by 0x16B4C7: ccb_completed_cb(unsigned long long, unsigned long long) (imm.cc:1177)
==274==    by 0x548D487: imma_process_callback_info(imma_cb*, imma_client_node*, imma_callback_info*, unsigned long long) (imma_proc.cc:2337)
==274==    by 0x548EED8: imma_hdl_callbk_dispatch_all(imma_cb*, unsigned long long) (imma_proc.cc:1832)
==274==    by 0x5482FA6: saImmOiDispatch (imma_oi_api.cc:642)
==274==    by 0x12866B: main_loop (main.cc:713)
==274==    by 0x12866B: main (main.cc:844)
==274==  Address 0x20 is not stack'd, malloc'd or (recently) free'd

On 3/8/18, 11:25 am, "Gary Lee" wrote:

Hi Alex

I haven't had a chance to look at it, but I did run our regression tests with the patch. amfd is segfaulting regularly, with backtraces like the attachment.

Thanks
Gary

From: Alex Jones
Organization: Ribbon
Date: Thursday, 2 August 2018 at 3:52 am
Subject: Re: [PATCH 0/1] Review Request for amf: add support for container/contained [#70]

Hi Guys,

I realized I forgot to put some notes in this review request...

75% of this code is from Praveen. I added some stuff that wasn't there like shutting down contained sus when the container goes down, allowing TRY_AGAIN in contained instantiation, and some more configuration checking.

Everything in the B.04.01 spec regarding container/contained should be implemented, but I have not tested everything. Everything in the samples/amf/container directory has been tested (container n-way, contained 2-n, TRY_AGAIN for contained instantiation), but I have not tested other service models for the contained SG. I have also tested locking the container SUs, and the container SG, to make sure the contained SUs go down.

Let me know if you see problems, or think something wasn't done right.

Alex

On 07/31/2018 04:22 PM, Alex Jones wrote:

Summary: amf: add support for container/contained [#70]
Review request for Ticket(s): 70
Peer Reviewer(s): Nagu, Hans, Ravi, Gary
Pull request to:
Affected branch(es): develop
Development branch: ticket-70
Base revision: 7f6f6c0531a0f5e4f2b0dc1abf4bab6962a3d1a9
Personal repository: git://git.code.sf.net/u/trguitar/review

Impacted area         Impact y/n
Docs                  n
Build system          n
RPM/packaging         n
Configuration files   n
Startup scripts       n
SAF services          y
OpenSAF services      n
Core libraries        n
Samples               n
Tests                 n
Other                 n

NOTE: Patch(es) contain lines longer than 80 characters

Comments (indicate scope for each "y" above):

revision d33e50eeb51ccf8808c24a445637d6f1472c396e
Author: Alex Jones
Date: Tue, 31 Jul 2018 16:06:47 -0400

amf: add support for container/contained [#70]

This ticket adds support for container/contained for AMF.
Added Files: samples/amf/container/amf_container_demo.c samples/amf/container/amf_container_script samples/amf/container/AppConfig-contained-2N.xml
Re: [devel] [PATCH 0/1] Review Request for amf: add support for container/contained [#70]
Hi Gary,

The check to make sure saAmfCompContainerCsi is defined for a contained component was not handling the case in which the comptype was being added dynamically in the same ccb. I assume that is what your tests are doing...

Try the attached patch on top of what I sent.

Alex

On 08/02/2018 09:50 PM, Gary Lee wrote:

Some more info from valgrind:

==274== Invalid read of size 1
==274==    at 0x14C080: is_config_valid(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, SaImmAttrValuesT_2 const**, CcbUtilOperationData*) [clone .constprop.98] (comp.cc:404)
==274==    by 0x14C4A0: comp_ccb_completed_cb(CcbUtilOperationData*) (comp.cc:1359)
==274==    by 0x16B4C7: ccb_completed_cb(unsigned long long, unsigned long long) (imm.cc:1177)
==274==    by 0x548D487: imma_process_callback_info(imma_cb*, imma_client_node*, imma_callback_info*, unsigned long long) (imma_proc.cc:2337)
==274==    by 0x548EED8: imma_hdl_callbk_dispatch_all(imma_cb*, unsigned long long) (imma_proc.cc:1832)
==274==    by 0x5482FA6: saImmOiDispatch (imma_oi_api.cc:642)
==274==    by 0x12866B: main_loop (main.cc:713)
==274==    by 0x12866B: main (main.cc:844)
==274==  Address 0x20 is not stack'd, malloc'd or (recently) free'd

On 3/8/18, 11:25 am, "Gary Lee" wrote:

Hi Alex

I haven't had a chance to look at it, but I did run our regression tests with the patch. amfd is segfaulting regularly, with backtraces like the attachment.

Thanks
Gary

From: Alex Jones
Organization: Ribbon
Date: Thursday, 2 August 2018 at 3:52 am
Subject: Re: [PATCH 0/1] Review Request for amf: add support for container/contained [#70]

Hi Guys,

I realized I forgot to put some notes in this review request...

75% of this code is from Praveen. I added some stuff that wasn't there like shutting down contained sus when the container goes down, allowing TRY_AGAIN in contained instantiation, and some more configuration checking.

Everything in the B.04.01 spec regarding container/contained should be implemented, but I have not tested everything. Everything in the samples/amf/container directory has been tested (container n-way, contained 2-n, TRY_AGAIN for contained instantiation), but I have not tested other service models for the contained SG. I have also tested locking the container SUs, and the container SG, to make sure the contained SUs go down.

Let me know if you see problems, or think something wasn't done right.

Alex

On 07/31/2018 04:22 PM, Alex Jones wrote:

Summary: amf: add support for container/contained [#70]
Review request for Ticket(s): 70
Peer Reviewer(s): Nagu, Hans, Ravi, Gary
Pull request to:
Affected branch(es): develop
Development branch: ticket-70
Base revision: 7f6f6c0531a0f5e4f2b0dc1abf4bab6962a3d1a9
Personal repository: git://git.code.sf.net/u/trguitar/review

Impacted area         Impact y/n
Docs                  n
Build system          n
RPM/packaging         n
Configuration files   n
Startup scripts       n
SAF services          y
OpenSAF services      n
Core libraries        n
Samples               n
Tests                 n
Other                 n

NOTE: Patch(es) contain lines longer than 80 characters

Comments (indicate scope for each "y" above):

revision d33e50eeb51ccf8808c24a445637d6f1472c396e
Author: Alex Jones
Date: Tue, 31 Jul 2018 16:06:47 -0400

amf: add support for container/contained [#70]

This ticket adds support for container/contained for AMF.
Added Files: samples/amf/container/amf_container_demo.c samples/amf/container/amf_container_script samples/amf/container/AppConfig-contained-2N.xml samples/amf/container/AppConfig-container.xml samples/amf/container/Makefile.am samples/amf/container/README Complete diffstat: -- samples/amf/container/AppConfig-contained-2N.xml | 327 + samples/amf/container/AppConfig-container.xml | 331 ++ samples/amf/container/Makefile.am | 45 ++ samples/amf/container/README | 36 + samples/amf/container/amf_container_demo.c | 803 +++ samples/amf/container/amf_container_script | 101 +++ src/amf/agent/amf_agent.cc | 73 ++- src/amf/agent/ava_cb.h | 1 + src/amf/agent/ava_hdl.cc | 31 + src/amf/agent/ava_mds.cc | 34 +- src/amf/agent/ava_mds.h | 3 +- src/amf/agent/ava_op.cc | 6 + src/amf/amfd/comp.cc | 55 +- src/amf/amfd/comp.h | 4 +- src/amf/amfd/comptype.cc | 6 +- src/amf/amfd/csi.cc | 6 + src/amf/amfd/csi.h | 3 + src/amf/amfd/ndproc.cc | 14 + src/amf/amfd/nod
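The failure discussed in this message (validation reading through a component type that exists only as a pending create in the same CCB, with valgrind reporting an access at address 0x20) can be illustrated with a two-stage lookup. This is only a sketch under stated assumptions: every type and function name below is hypothetical and none of it is taken from amfd's comp.cc or from the attached fix.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-ins; not the amfd data structures. */
struct comp_type {
	const char *dn;
	int is_contained;	/* non-zero for a contained component type */
};

struct ccb_op {			/* one pending "create" in the current CCB */
	const struct comp_type *pending;
	const struct ccb_op *next;
};

/* Search the committed types first, then the creates pending in this CCB. */
static const struct comp_type *
find_comp_type(const struct comp_type *committed, size_t n,
	       const struct ccb_op *ops, const char *dn)
{
	for (size_t i = 0; i < n; i++)
		if (strcmp(committed[i].dn, dn) == 0)
			return &committed[i];
	for (; ops != NULL; ops = ops->next)
		if (ops->pending != NULL && strcmp(ops->pending->dn, dn) == 0)
			return ops->pending;
	return NULL;
}

/*
 * Returns 1 if the configuration is acceptable, 0 otherwise.
 * A contained component must name a container CSI; an unknown type is
 * rejected instead of being dereferenced.
 */
static int comp_config_valid(const struct comp_type *committed, size_t n,
			     const struct ccb_op *ops, const char *type_dn,
			     const char *container_csi)
{
	const struct comp_type *t = find_comp_type(committed, n, ops, type_dn);

	if (t == NULL)
		return 0;
	if (t->is_contained && (container_csi == NULL || *container_csi == '\0'))
		return 0;
	return 1;
}
```

The point of the sketch is the order of checks: the lookup result is tested for NULL before any field is read, and the pending operations of the same CCB are consulted when the committed database has no entry.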
Re: [devel] [PATCH 0/1] Review Request for amf: add support for container/contained [#70]
Hi Guys,

I realized I forgot to put some notes in this review request...

75% of this code is from Praveen. I added some stuff that wasn't there like shutting down contained sus when the container goes down, allowing TRY_AGAIN in contained instantiation, and some more configuration checking.

Everything in the B.04.01 spec regarding container/contained should be implemented, but I have not tested everything. Everything in the samples/amf/container directory has been tested (container n-way, contained 2-n, TRY_AGAIN for contained instantiation), but I have not tested other service models for the contained SG. I have also tested locking the container SUs, and the container SG, to make sure the contained SUs go down.

Let me know if you see problems, or think something wasn't done right.

Alex

On 07/31/2018 04:22 PM, Alex Jones wrote:

Summary: amf: add support for container/contained [#70]
Review request for Ticket(s): 70
Peer Reviewer(s): Nagu, Hans, Ravi, Gary
Pull request to:
Affected branch(es): develop
Development branch: ticket-70
Base revision: 7f6f6c0531a0f5e4f2b0dc1abf4bab6962a3d1a9
Personal repository: git://git.code.sf.net/u/trguitar/review

Impacted area         Impact y/n
Docs                  n
Build system          n
RPM/packaging         n
Configuration files   n
Startup scripts       n
SAF services          y
OpenSAF services      n
Core libraries        n
Samples               n
Tests                 n
Other                 n

NOTE: Patch(es) contain lines longer than 80 characters

Comments (indicate scope for each "y" above):

revision d33e50eeb51ccf8808c24a445637d6f1472c396e
Author: Alex Jones
Date: Tue, 31 Jul 2018 16:06:47 -0400

amf: add support for container/contained [#70]

This ticket adds support for container/contained for AMF.
Added Files: samples/amf/container/amf_container_demo.c samples/amf/container/amf_container_script samples/amf/container/AppConfig-contained-2N.xml samples/amf/container/AppConfig-container.xml samples/amf/container/Makefile.am samples/amf/container/README Complete diffstat: -- samples/amf/container/AppConfig-contained-2N.xml | 327 + samples/amf/container/AppConfig-container.xml| 331 ++ samples/amf/container/Makefile.am| 45 ++ samples/amf/container/README | 36 + samples/amf/container/amf_container_demo.c | 803 +++ samples/amf/container/amf_container_script | 101 +++ src/amf/agent/amf_agent.cc | 73 ++- src/amf/agent/ava_cb.h | 1 + src/amf/agent/ava_hdl.cc | 31 + src/amf/agent/ava_mds.cc | 34 +- src/amf/agent/ava_mds.h | 3 +- src/amf/agent/ava_op.cc | 6 + src/amf/amfd/comp.cc | 55 +- src/amf/amfd/comp.h | 4 +- src/amf/amfd/comptype.cc | 6 +- src/amf/amfd/csi.cc | 6 + src/amf/amfd/csi.h | 3 + src/amf/amfd/ndproc.cc | 14 + src/amf/amfd/node.cc | 29 + src/amf/amfd/node.h | 1 + src/amf/amfd/sg.cc | 29 + src/amf/amfd/sg.h| 4 + src/amf/amfd/sgproc.cc | 142 +++- src/amf/amfd/si.cc | 17 + src/amf/amfd/si.h| 1 + src/amf/amfd/su.cc | 155 - src/amf/amfd/su.h| 15 +- src/amf/amfd/util.cc | 39 ++ src/amf/amfd/util.h | 2 + src/amf/amfnd/amfnd.cc | 5 +- src/amf/amfnd/avnd_cb.h | 2 + src/amf/amfnd/avnd_comp.h| 64 +- src/amf/amfnd/avnd_evt.h | 1 + src/amf/amfnd/avnd_proc.h| 2 + src/amf/amfnd/avnd_su.h | 4 + src/amf/amfnd/cbq.cc | 102 ++- src/amf/amfnd/chc.cc | 2 +- src/amf/amfnd/clc.cc | 95 ++- src/amf/amfnd/comp.cc| 90 ++- src/amf/amfnd/compdb.cc | 22 +- src/amf/amfnd/err.cc | 2 +- src/amf/amfnd/evt.cc | 2 + src/amf/amfnd/main.cc| 1 + src/amf/amfnd/mds.cc | 19 + src/amf/amfnd/proxy.cc
[devel] [PATCH 0/1] Review Request for amf: add support for container/contained [#70]
Summary: amf: add support for container/contained [#70]
Review request for Ticket(s): 70
Peer Reviewer(s): Nagu, Hans, Ravi, Gary
Pull request to:
Affected branch(es): develop
Development branch: ticket-70
Base revision: 7f6f6c0531a0f5e4f2b0dc1abf4bab6962a3d1a9
Personal repository: git://git.code.sf.net/u/trguitar/review

Impacted area         Impact y/n
Docs                  n
Build system          n
RPM/packaging         n
Configuration files   n
Startup scripts       n
SAF services          y
OpenSAF services      n
Core libraries        n
Samples               n
Tests                 n
Other                 n

NOTE: Patch(es) contain lines longer than 80 characters

Comments (indicate scope for each "y" above):

revision d33e50eeb51ccf8808c24a445637d6f1472c396e
Author: Alex Jones
Date: Tue, 31 Jul 2018 16:06:47 -0400

amf: add support for container/contained [#70]

This ticket adds support for container/contained for AMF.

Added Files:
 samples/amf/container/amf_container_demo.c
 samples/amf/container/amf_container_script
 samples/amf/container/AppConfig-contained-2N.xml
 samples/amf/container/AppConfig-container.xml
 samples/amf/container/Makefile.am
 samples/amf/container/README

Complete diffstat:
 samples/amf/container/AppConfig-contained-2N.xml | 327 +
 samples/amf/container/AppConfig-container.xml | 331 ++
 samples/amf/container/Makefile.am | 45 ++
 samples/amf/container/README | 36 +
 samples/amf/container/amf_container_demo.c | 803 +++
 samples/amf/container/amf_container_script | 101 +++
 src/amf/agent/amf_agent.cc | 73 ++-
 src/amf/agent/ava_cb.h | 1 +
 src/amf/agent/ava_hdl.cc | 31 +
 src/amf/agent/ava_mds.cc | 34 +-
 src/amf/agent/ava_mds.h | 3 +-
 src/amf/agent/ava_op.cc | 6 +
 src/amf/amfd/comp.cc | 55 +-
 src/amf/amfd/comp.h | 4 +-
 src/amf/amfd/comptype.cc | 6 +-
 src/amf/amfd/csi.cc | 6 +
 src/amf/amfd/csi.h | 3 +
 src/amf/amfd/ndproc.cc | 14 +
 src/amf/amfd/node.cc | 29 +
 src/amf/amfd/node.h | 1 +
 src/amf/amfd/sg.cc | 29 +
 src/amf/amfd/sg.h | 4 +
 src/amf/amfd/sgproc.cc | 142 +++-
 src/amf/amfd/si.cc | 17 +
 src/amf/amfd/si.h | 1 +
 src/amf/amfd/su.cc | 155 -
 src/amf/amfd/su.h | 15 +-
 src/amf/amfd/util.cc | 39 ++
 src/amf/amfd/util.h | 2 +
 src/amf/amfnd/amfnd.cc | 5 +-
 src/amf/amfnd/avnd_cb.h | 2 +
 src/amf/amfnd/avnd_comp.h | 64 +-
 src/amf/amfnd/avnd_evt.h | 1 +
 src/amf/amfnd/avnd_proc.h | 2 +
 src/amf/amfnd/avnd_su.h | 4 +
 src/amf/amfnd/cbq.cc | 102 ++-
 src/amf/amfnd/chc.cc | 2 +-
 src/amf/amfnd/clc.cc | 95 ++-
 src/amf/amfnd/comp.cc | 90 ++-
 src/amf/amfnd/compdb.cc | 22 +-
 src/amf/amfnd/err.cc | 2 +-
 src/amf/amfnd/evt.cc | 2 +
 src/amf/amfnd/main.cc | 1 +
 src/amf/amfnd/mds.cc | 19 +
 src/amf/amfnd/proxy.cc | 2 +-
 src/amf/amfnd/su.cc | 8 +
 src/amf/amfnd/susm.cc | 88 ++-
 src/amf/common/amf_amfparam.h | 22 +
 src/amf/common/amf_d2nmsg.h | 10 +
 src/amf/common/amf_defs.h | 2 +
 src/amf/common/amf_util.h | 3 +-
 src/amf/common/d2nedu.c | 22 +-
 src/amf/common/n2avaedu.c | 6 +-
 src/amf/common/n2avamsg.c | 13 +
 src/amf/common/util.c | 32 +-
 tools/cluster_sim_uml/build_uml | 45 ++
 56 files changed, 2852 insertions(+), 127 deletions(-)

Testing Commands:
*** LIST THE COMMAND LINE TOOLS/STEPS TO TEST YOUR CHANGES ***

Testing, Expected Results:
*** PASTE COMMAND OUTPUTS / TE
[devel] [PATCH 1/1] msg: update msg to use CLM B.04.01 [#2841]
Update msgd and msgnd to use CLM B.04.01.
---
 src/msg/Makefile.am | 2 --
 src/msg/common/mqsv_def.h | 5 +
 src/msg/msgd/mqd_api.c | 15 ---
 src/msg/msgd/mqd_clm.c | 17 +++--
 src/msg/msgd/mqd_clm.h | 10 --
 src/msg/msgnd/mqnd_init.c | 18 +-
 src/msg/msgnd/mqnd_proc.c | 17 +++--
 src/msg/msgnd/mqnd_proc.h | 10 --
 8 files changed, 68 insertions(+), 26 deletions(-)

diff --git a/src/msg/Makefile.am b/src/msg/Makefile.am
index dd282504e..d77251609 100644
--- a/src/msg/Makefile.am
+++ b/src/msg/Makefile.am
@@ -135,7 +135,6 @@ dist_pkgsysconf_DATA += \
 	src/msg/msgnd/msgnd.conf
 bin_osafmsgnd_CPPFLAGS = \
-	-DSA_CLM_B01=1 \
 	-DNCS_MQND=1 -DASAPi_DEBUG=1 \
 	$(AM_CPPFLAGS)
@@ -166,7 +165,6 @@ bin_osafmsgnd_LDADD = \
 	lib/libopensaf_core.la
 bin_osafmsgd_CPPFLAGS = \
-	-DSA_CLM_B01=1 \
 	-DNCS_MQD=1 -DASAPi_DEBUG=1 \
 	$(AM_CPPFLAGS)
diff --git a/src/msg/common/mqsv_def.h b/src/msg/common/mqsv_def.h
index bfeb8bc71..de805d6cf 100644
--- a/src/msg/common/mqsv_def.h
+++ b/src/msg/common/mqsv_def.h
@@ -80,6 +80,11 @@ typedef struct mqsv_dsend_info {
 	amf_ver.majorVersion = 0x01; \
 	amf_ver.minorVersion = 0x01;
+#define m_MQSV_GET_CLM_VER(clm_ver) \
+	clm_ver.releaseCode = 'B'; \
+	clm_ver.majorVersion = 0x04; \
+	clm_ver.minorVersion = 0x01;
+
 #define m_MQSV_IS_ACKFLAGS_NOT_VALID(ackFlags) \
 	((ackFlags) && ((ackFlags) != SA_MSG_MESSAGE_DELIVERED_ACK))
diff --git a/src/msg/msgd/mqd_api.c b/src/msg/msgd/mqd_api.c
index 83d5c2198..ee92f8375 100644
--- a/src/msg/msgd/mqd_api.c
+++ b/src/msg/msgd/mqd_api.c
@@ -113,17 +113,17 @@ static SaAisErrorT mqd_clm_init(MQD_CB *cb)
 	do {
 		SaVersionT clm_version;
-		SaClmCallbacksT mqd_clm_cbk;
+		SaClmCallbacksT_4 mqd_clm_cbk;
-		m_MQSV_GET_AMF_VER(clm_version);
+		m_MQSV_GET_CLM_VER(clm_version);
 		mqd_clm_cbk.saClmClusterNodeGetCallback = NULL;
 		mqd_clm_cbk.saClmClusterTrackCallback =
 		    mqd_clm_cluster_track_callback;
 		saErr =
-		    saClmInitialize(&cb->clm_hdl, &mqd_clm_cbk, &clm_version);
+		    saClmInitialize_4(&cb->clm_hdl, &mqd_clm_cbk, &clm_version);
 		if (saErr != SA_AIS_OK) {
-			LOG_ER("saClmInitialize failed with error %u",
+			LOG_ER("saClmInitialize_4 failed with error %u",
 			       (unsigned)saErr);
 			break;
 		}
@@ -137,10 +137,11 @@ static SaAisErrorT mqd_clm_init(MQD_CB *cb)
 		}
 		TRACE_1("saClmSelectionObjectGet success");
-		saErr =
-		    saClmClusterTrack(cb->clm_hdl, SA_TRACK_CHANGES_ONLY, NULL);
+		saErr = saClmClusterTrack_4(cb->clm_hdl,
+					    SA_TRACK_CHANGES_ONLY,
+					    NULL);
 		if (SA_AIS_OK != saErr) {
-			LOG_ER("saClmClusterTrack failed with error %u",
+			LOG_ER("saClmClusterTrack_4 failed with error %u",
 			       (unsigned)saErr);
 			break;
 		}
diff --git a/src/msg/msgd/mqd_clm.c b/src/msg/msgd/mqd_clm.c
index 41d9bcf15..ce285283c 100644
--- a/src/msg/msgd/mqd_clm.c
+++ b/src/msg/msgd/mqd_clm.c
@@ -39,8 +39,14 @@ extern MQDLIB_INFO gl_mqdinfo;
  * **/
 void mqd_clm_cluster_track_callback(
-    const SaClmClusterNotificationBufferT *notificationBuffer,
-    SaUint32T numberOfMembers, SaAisErrorT error)
+	const SaClmClusterNotificationBufferT_4 *notificationBuffer,
+	SaUint32T numberOfMembers,
+	SaInvocationT invocation,
+	const SaNameT *rootCauseEntity,
+	const SaNtfCorrelationIdsT *correlationIds,
+	SaClmChangeStepT step,
+	SaTimeT timeSupervision,
+	SaAisErrorT error)
 {
 	MQD_CB *pMqd = 0;
 	SaClmNodeIdT node_id;
@@ -49,6 +55,11 @@ void mqd_clm_cluster_track_callback(
 	TRACE_ENTER2("cluster change=%d",
 		     notificationBuffer->notification[counter].clusterChange);
+	if (error != SA_AIS_OK) {
+		LOG_ER("mqd_clm_cluster_track_callback error: %i", error);
+		goto done;
+	}
+
 	/* Get the Controll block */
 	pMqd = ncshm_take_hdl(NCS_SERVICE_ID_MQD, gl_mqdinfo.inst_hdl);
 	if (!pMqd) {
@@ -116,6 +127,8 @@ void mqd_clm_cluster_track_callback(
 		}
 	}
 	ncshm_give_hdl(pMqd->hdl);
+
+done:
 	TRACE_LEAVE();
 }
diff --git a/src/msg/msgd/mqd_clm.h b/src/msg/msgd/mqd_clm.h
index 0bb42dbc2..1c06dc641 100644
--- a/src/msg/msgd/mqd_clm.h
+++ b/src/msg/msgd/mqd_clm.h
@@ -33,8 +33,14 @@
 #include
 void mqd_clm_cluster_track_callback(
-
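In the mqd_clm.c hunk above, the TRACE_ENTER2 context line reads notificationBuffer->notification[...] before the newly added error check runs. A more defensive ordering is sketched here, assuming the buffer may not be usable when the callback reports an error. The types are hypothetical stand-ins, not the real saClm.h definitions, and this is not the code from the patch.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the SAF types; not the real saClm.h. */
enum { ERR_OK = 1 };

struct notification {
	int cluster_change;
	uint32_t node_id;
};

struct notification_buffer {
	uint32_t number_of_items;
	struct notification *notification;
};

/*
 * Returns the number of entries processed, or -1 if the callback was
 * invoked with an error or without a usable buffer.
 */
static int track_callback(const struct notification_buffer *buf, int error)
{
	/* Check the status and the pointers before reading the buffer. */
	if (error != ERR_OK)
		return -1;
	if (buf == NULL || buf->notification == NULL)
		return -1;

	int handled = 0;
	for (uint32_t i = 0; i < buf->number_of_items; i++) {
		/* A real handler would act on cluster_change here. */
		(void)buf->notification[i].cluster_change;
		handled++;
	}
	return handled;
}
```

Any tracing that formats fields of the buffer would then go after these guards rather than at function entry.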
[devel] [PATCH 1/1] msgd: put node down handling on thread [#2852]
If multiple nodes go down simultaneously which are hosting msg queues (e.g. multiple VMs on a host, and the host goes down), msgd can take a long time to process the node downs which blocks the main thread, and therefore the healthcheck doesn't get processed, so msgd dies, which restarts the controller. msgd needs to sit in a loop waiting for imm to release the implementers for each of the down nodes. For many nodes which went down simultaneously this can take up to 20 seconds when done serially. Node down logic needs to be put on a thread, so that we can continue to process other messages like healthcheck. This also allows us to parallelize the node down handling. --- src/msg/msgd/mqd_asapi.c | 17 +++-- src/msg/msgd/mqd_clm.c | 183 +++ src/msg/msgd/mqd_evt.c | 5 ++ src/msg/msgd/mqd_mbcsv.c | 16 +++-- src/msg/msgd/mqd_ntf.cc | 4 ++ 5 files changed, 151 insertions(+), 74 deletions(-) diff --git a/src/msg/msgd/mqd_asapi.c b/src/msg/msgd/mqd_asapi.c index c44df4d2a..eb760ca8e 100644 --- a/src/msg/msgd/mqd_asapi.c +++ b/src/msg/msgd/mqd_asapi.c @@ -1298,18 +1298,18 @@ static uint32_t mqd_asapi_queue_make(MQD_OBJ_INFO *pObjInfo, "%s:%u:ERR_MEMORY:Failed To Allocate Memory for QGroups", __FILE__, __LINE__); return SA_AIS_ERR_NO_MEMORY; - return SA_AIS_ERR_NO_MEMORY; } itr.state = 0; - for (idx = 0; idx < qcnt; idx++) { - pOelm = (MQD_OBJECT_ELEM *)ncs_walk_items( - >ilist, ); + idx = 0; + while ((pOelm = (MQD_OBJECT_ELEM *)ncs_walk_items( + >ilist, ))) { memcpy([idx].name, >pObject->name, sizeof(SaNameT)); mqd_qparam_fill(>pObject->info.q, [idx]); + idx++; } } } else { @@ -1632,6 +1632,8 @@ void mqd_nd_restart_update_dest_info(MQD_CB *pMqd, MDS_DEST dest) NCS_Q_ITR itr; uint32_t count = 0; + m_NCS_LOCK(>mqd_cb_lock, NCS_LOCK_WRITE); + pObjNode = (MQD_OBJ_NODE *)ncs_patricia_tree_getnext(>qdb, (uint8_t *)NULL); while (pObjNode) { @@ -1686,6 +1688,8 @@ void mqd_nd_restart_update_dest_info(MQD_CB *pMqd, MDS_DEST dest) pObjNode = (MQD_OBJ_NODE *)ncs_patricia_tree_getnext( 
>qdb, (uint8_t *)); } + + m_NCS_UNLOCK(>mqd_cb_lock, NCS_LOCK_WRITE); } /\ @@ -1707,6 +1711,8 @@ void mqd_nd_down_update_info(MQD_CB *pMqd, MDS_DEST dest) NCS_Q_ITR itr; uint32_t count = 0; + m_NCS_LOCK(>mqd_cb_lock, NCS_LOCK_WRITE); + pObjNode = (MQD_OBJ_NODE *)ncs_patricia_tree_getnext(>qdb, (uint8_t *)NULL); while (pObjNode) { @@ -1757,6 +1763,9 @@ void mqd_nd_down_update_info(MQD_CB *pMqd, MDS_DEST dest) pObjNode = (MQD_OBJ_NODE *)ncs_patricia_tree_getnext( >qdb, (uint8_t *)); } + + m_NCS_UNLOCK(>mqd_cb_lock, NCS_LOCK_WRITE); + return; } diff --git a/src/msg/msgd/mqd_clm.c b/src/msg/msgd/mqd_clm.c index 41d9bcf15..0dbb21b23 100644 --- a/src/msg/msgd/mqd_clm.c +++ b/src/msg/msgd/mqd_clm.c @@ -119,84 +119,141 @@ void mqd_clm_cluster_track_callback( TRACE_LEAVE(); } -void mqd_del_node_down_info(MQD_CB *pMqd, NODE_ID nodeid) +static void * _mqd_del_node_down_info(void *arg) { - MQD_OBJ_NODE *pNode = 0; - MQD_A2S_MSG msg; - SaImmOiHandleT immOiHandle; + NODE_ID nodeid = *(NODE_ID *) arg; SaAisErrorT rc = SA_AIS_OK; - SaImmOiImplementerNameT implementer_name; - int retries = 5; - char i_name[256] = {0}; - SaVersionT imm_version = {'A', 0x02, 0x01}; + SaImmOiHandleT immOiHandle = 0; + MQD_CB *pMqd = ncshm_take_hdl(NCS_SERVICE_ID_MQD, gl_mqdinfo.inst_hdl); + TRACE_ENTER2("nodeid=%u", nodeid); - rc = immutil_saImmOiInitialize_2(, NULL, _version); - if (rc != SA_AIS_OK) - LOG_ER("saImmOiInitialize_2 failed with return value=%d", rc); + free(arg); - snprintf(i_name, SA_MAX_NAME_LENGTH, "%s%u", "MsgQueueService", nodeid); - implementer_name = i_name; + do { + MQD_OBJ_NODE *pNode = 0; + MQD_A2S_MSG msg; + SaImmOiImplementerNameT implementer_name; + int retries = 5; + char i_name[256] = {0}; +
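The commit message above describes moving node-down handling onto threads, and the visible part of _mqd_del_node_down_info shows the worker copying the node id out of a heap-allocated argument and freeing it. A minimal pthread sketch of that hand-off follows. All names are hypothetical, the mutex stands in for the control-block lock added in mqd_asapi.c, and the sketch leaves joining to the caller because the truncated diff does not show how the real threads are created or reaped.

```c
#include <pthread.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical shared state, standing in for the msgd control block. */
static pthread_mutex_t cb_lock = PTHREAD_MUTEX_INITIALIZER;
static unsigned nodes_cleaned;

/* Worker: takes ownership of the heap-allocated node id and frees it. */
static void *node_down_worker(void *arg)
{
	uint32_t node_id = *(uint32_t *)arg;
	free(arg);

	/* Slow, blocking cleanup for node_id would run here, off the main thread. */
	(void)node_id;

	pthread_mutex_lock(&cb_lock);
	nodes_cleaned++;	/* shared state is touched only under the lock */
	pthread_mutex_unlock(&cb_lock);
	return NULL;
}

/* Start one worker for node_id; returns 0 on success, -1 on failure. */
static int start_node_down(uint32_t node_id, pthread_t *tid)
{
	uint32_t *arg = malloc(sizeof(*arg));

	if (arg == NULL)
		return -1;
	*arg = node_id;
	if (pthread_create(tid, NULL, node_down_worker, arg) != 0) {
		free(arg);	/* the thread never ran, so we still own arg */
		return -1;
	}
	return 0;
}
```

Passing a private heap copy of the id avoids handing the thread a pointer to a stack variable that may be gone by the time the thread runs, and one worker per node is what lets the waits overlap instead of running serially.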
[devel] [PATCH 0/1] Review Request for msgd: put node down handling on thread [#2852]
Summary: msgd: put node down handling on thread [#2852]
Review request for Ticket(s): 2852
Peer Reviewer(s): Srinivas
Pull request to:
Affected branch(es): develop
Development branch: ticket-2852
Base revision: 93e2808fb0bd3143a77e31dd2f0115a6596479ed
Personal repository: git://git.code.sf.net/u/trguitar/review

Impacted area         Impact y/n
Docs                  n
Build system          n
RPM/packaging         n
Configuration files   n
Startup scripts       n
SAF services          y
OpenSAF services      n
Core libraries        n
Samples               n
Tests                 n
Other                 n

Comments (indicate scope for each "y" above):

revision 9bb598f8390aaf41c1e0dcd458ee0d82fae58999
Author: Alex Jones <ajo...@rbbn.com>
Date: Fri, 11 May 2018 11:04:34 -0400

msgd: put node down handling on thread [#2852]

If multiple nodes go down simultaneously which are hosting msg queues (e.g. multiple VMs on a host, and the host goes down), msgd can take a long time to process the node downs which blocks the main thread, and therefore the healthcheck doesn't get processed, so msgd dies, which restarts the controller.

msgd needs to sit in a loop waiting for imm to release the implementers for each of the down nodes. For many nodes which went down simultaneously this can take up to 20 seconds when done serially.

Node down logic needs to be put on a thread, so that we can continue to process other messages like healthcheck. This also allows us to parallelize the node down handling.

Complete diffstat:
 src/msg/msgd/mqd_asapi.c | 17 +++--
 src/msg/msgd/mqd_clm.c | 183 +++
 src/msg/msgd/mqd_evt.c | 5 ++
 src/msg/msgd/mqd_mbcsv.c | 16 +++--
 src/msg/msgd/mqd_ntf.cc | 4 ++
 5 files changed, 151 insertions(+), 74 deletions(-)

Testing Commands:
1) have multiple nodes (in our test we have 17) which hold msg queues
2) take them all down at the same time, and bring them back up reopening the msg queues
3) do this repeatedly

Testing, Expected Results:
1) msgd should not fail healthcheck
2) msg queues should be successfully reopened on the nodes

Conditions of Submission:
May 17, or ack from developer

Arch        Built   Started   Linux distro
mips        n       n
mips64      n       n
x86         n       n
x86_64      y       y
powerpc     n       n
powerpc64   n       n

Reviewer Checklist:
[Submitters: make sure that your review doesn't trigger any checkmarks!]

Your checkin has not passed review because (see checked entries):

___ Your RR template is generally incomplete; it has too many blank entries that need proper data filled in.
___ You have failed to nominate the proper persons for review and push.
___ Your patches do not have proper short+long header
___ You have grammar/spelling in your header that is unacceptable.
___ You have exceeded a sensible line length in your headers/comments/text.
___ You have failed to put in a proper Trac Ticket # into your commits.
___ You have incorrectly put/left internal data in your comments/files (i.e. internal bug tracking tool IDs, product names etc)
___ You have not given any evidence of testing beyond basic build tests. Demonstrate some level of runtime or other sanity testing.
___ You have ^M present in some of your files. These have to be removed.
___ You have needlessly changed whitespace or added whitespace crimes like trailing spaces, or spaces before tabs.
___ You have mixed real technical changes with whitespace and other cosmetic code cleanup changes. These have to be separate commits.
___ You need to refactor your submission into logical chunks; there is too much content in a single commit.
___ You have extraneous garbage in your review (merge commits etc)
___ You have giant attachments which should never have been sent; instead you should place your content in a public tree to be pulled.
___ You have too many commits attached to an e-mail; resend as threaded commits, or place in a public tree for a pull.
___ You have resent this content multiple times without a clear indication of what has changed between each re-send.
___ You have failed to adequately and individually address all of the comments and change requests that were proposed in the initial review.
___ You have a misconfigured ~/.gitconfig file (i.e. user.name, user.email etc)
___ Your computer has a badly configured date and time, confusing the threaded patch review.
___ Your changes affect IPC mechanism, and you don't present any results for in-service upgradability test.
___ Your changes affect user manual and documenta
[devel] [PATCH 1/1] lck: fix errors when displaying SaLckResource class [#2070]
When getting IMM info for a lock resource, SaLckResource, the information is often not correct. Both lckd and lcknd are not updating IMM correctly when SaLckResource information changes at runtime. Write test cases which make sure these attributes are being updated correctly. And fix the issues. --- src/lck/Makefile.am | 5 ++- src/lck/apitest/test_saLckLimitGet.cc | 4 ++ src/lck/lckd/gld_evt.c| 17 +--- src/lck/lckd/gld_rsc.c| 26 ++-- src/lck/lckd/gld_standby.c| 2 +- src/lck/lcknd/glnd_client.c | 76 +-- src/lck/lcknd/glnd_client.h | 4 -- src/lck/lcknd/glnd_evt.c | 48 ++ src/lck/lcknd/glnd_res.c | 24 +++ 9 files changed, 121 insertions(+), 85 deletions(-) diff --git a/src/lck/Makefile.am b/src/lck/Makefile.am index db3e043e1..2aa64b4a5 100644 --- a/src/lck/Makefile.am +++ b/src/lck/Makefile.am @@ -200,7 +200,10 @@ bin_lcktest_SOURCES = \ src/lck/apitest/tet_glsv_util.c \ src/lck/apitest/tet_gla.c \ src/lck/apitest/tet_gla_conf.c \ - src/lck/apitest/tet_gld.c + src/lck/apitest/tet_gld.c \ + src/lck/apitest/test_ErrUnavailable.cc \ + src/lck/apitest/test_saLckLimitGet.cc \ + src/lck/apitest/test_saLckResourceClass.cc bin_lcktest_LDADD = \ lib/libSaLck.la \ diff --git a/src/lck/apitest/test_saLckLimitGet.cc b/src/lck/apitest/test_saLckLimitGet.cc index 74c9194d4..dbf804ac1 100644 --- a/src/lck/apitest/test_saLckLimitGet.cc +++ b/src/lck/apitest/test_saLckLimitGet.cc @@ -3,6 +3,7 @@ #include #include #include +#include #include "ais/include/saLck.h" #include "lck/apitest/lcktest.h" @@ -153,6 +154,9 @@ static void saLckLimitGet_08(void) rc = saLckFinalize(lckHandle); assert(rc == SA_AIS_OK); + + // wait for resources to clean up + sleep(2); } static void saLckLimitGet_09(void) diff --git a/src/lck/lckd/gld_evt.c b/src/lck/lckd/gld_evt.c index 6134093f1..c6a33282e 100644 --- a/src/lck/lckd/gld_evt.c +++ b/src/lck/lckd/gld_evt.c @@ -144,7 +144,7 @@ static uint32_t gld_rsc_open(GLSV_GLD_EVT *evt) NCSMDS_INFO snd_mds; uint32_t res = NCSCC_RC_FAILURE; ; - SaAisErrorT error; + 
SaAisErrorT error = SA_AIS_OK; uint32_t node_id; bool node_first_rsc_open = false; GLSV_GLD_GLND_RSC_REF *glnd_rsc = NULL; @@ -347,14 +347,14 @@ static uint32_t gld_rsc_close(GLSV_GLD_EVT *evt) glnd_rsc->rsc_info->saf_rsc_no_of_users = glnd_rsc->rsc_info->saf_rsc_no_of_users - 1; + if (evt->info.rsc_details.lcl_ref_cnt == 0) + gld_rsc_rmv_node_ref(gld_cb, glnd_rsc->rsc_info, glnd_rsc, +node_details, orphan_flag); + /*Checkkpoint resource close event */ glsv_gld_a2s_ckpt_rsc_details( gld_cb, evt->evt_type, evt->info.rsc_details, node_details->dest_id, evt->info.rsc_details.lcl_ref_cnt); - - if (evt->info.rsc_details.lcl_ref_cnt == 0) - gld_rsc_rmv_node_ref(gld_cb, glnd_rsc->rsc_info, glnd_rsc, -node_details, orphan_flag); end: TRACE_LEAVE2("Return value %u", rc); return rc; @@ -426,19 +426,24 @@ uint32_t gld_rsc_ref_set_orphan(GLSV_GLD_GLND_DETAILS *node_details, { GLSV_GLD_GLND_RSC_REF *glnd_rsc_ref; + TRACE_ENTER2("rsc_id: %i orphan: %i lck_mode: %i", rsc_id, orphan, + lck_mode); + /* Find the rsc_info based on resource id */ glnd_rsc_ref = (GLSV_GLD_GLND_RSC_REF *)ncs_patricia_tree_get( _details->rsc_info_tree, (uint8_t *)_id); if ((glnd_rsc_ref == NULL) || (glnd_rsc_ref->rsc_info == NULL)) { LOG_ER("Patricia tree get failed"); + TRACE_LEAVE(); return NCSCC_RC_FAILURE; } glnd_rsc_ref->rsc_info->can_orphan = orphan; glnd_rsc_ref->rsc_info->orphan_lck_mode = lck_mode; - if (orphan == true) + if (orphan == false) glnd_rsc_ref->rsc_info->saf_rsc_stripped_cnt++; + TRACE_LEAVE(); return NCSCC_RC_SUCCESS; } diff --git a/src/lck/lckd/gld_rsc.c b/src/lck/lckd/gld_rsc.c index ed2bd5a71..7a45cd716 100644 --- a/src/lck/lckd/gld_rsc.c +++ b/src/lck/lckd/gld_rsc.c @@ -297,12 +297,16 @@ void gld_free_rsc_info(GLSV_GLD_CB *gld_cb, GLSV_GLD_RSC_INFO *rsc_info) SaNameT lck_name; SaNameT immObj_name; + TRACE_ENTER(); + memset(_name, '\0', sizeof(SaNameT)); memset(_name, '\0', sizeof(SaNameT)); /* Some node is still referring to this resource, so backout */ - if (rsc_info->node_list 
!= NULL) + if (rsc_info->node_list != NULL) { + TRACE_LEAVE(); return; + } /* Free the node from the resource linked list */ if
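The test change in this patch adds a fixed sleep(2) after saLckFinalize so that resources can clean up before the next test runs. One possible alternative, sketched here purely as an illustration and not as what the patch does, is to poll a condition with a timeout. The helper name, the callback type, and the example condition are all hypothetical; a real test would query the state it is actually waiting for.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdbool.h>
#include <time.h>

/* Caller-supplied condition; returns true once the awaited state is reached. */
typedef bool (*cond_fn)(void *ctx);

/*
 * Poll cond every interval_ms until it returns true or timeout_ms elapses.
 * Returns true if the condition was met, false on timeout.
 */
static bool wait_until(cond_fn cond, void *ctx, long timeout_ms, long interval_ms)
{
	if (interval_ms <= 0)
		interval_ms = 1;	/* avoid a busy loop that never times out */

	struct timespec pause = {
		.tv_sec = interval_ms / 1000,
		.tv_nsec = (interval_ms % 1000) * 1000000L,
	};
	long waited = 0;

	for (;;) {
		if (cond(ctx))
			return true;
		if (waited >= timeout_ms)
			return false;
		nanosleep(&pause, NULL);
		waited += interval_ms;
	}
}

/* Example condition: becomes true on the n-th call (stands in for a real query). */
struct countdown {
	int calls_left;
};

static bool countdown_done(void *ctx)
{
	struct countdown *c = ctx;

	return --c->calls_left <= 0;
}
```

Compared with a fixed sleep, polling returns as soon as the condition holds and fails visibly on timeout, at the cost of needing a condition that can be checked from the test.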
[devel] [PATCH 0/1] Review Request for lck: fix errors when displaying SaLckResource class [#2070]
Summary: lck: fix errors when displaying SaLckResource class [#2070]
Review request for Ticket(s): 2070
Peer Reviewer(s): Ravi
Pull request to:
Affected branch(es): develop
Development branch: ticket-2070
Base revision: 1ca82324e733acd3a2fc9272253a65df7ed31baa
Personal repository: git://git.code.sf.net/u/trguitar/review

--------------------------------
Impacted area       Impact y/n
--------------------------------
 Docs                    n
 Build system            n
 RPM/packaging           n
 Configuration files     n
 Startup scripts         n
 SAF services            y
 OpenSAF services        n
 Core libraries          n
 Samples                 n
 Tests                   n
 Other                   n

Comments (indicate scope for each "y" above):
---------------------------------------------
This patch fixes the bugs and adds tests to check them.

revision 8fe4377c25259e1430717d3b67e2c4cc2fd3c66f
Author: Alex Jones <ajo...@rbbn.com>
Date:   Mon, 7 May 2018 10:04:42 -0400

lck: fix errors when displaying SaLckResource class [#2070]

When getting IMM info for a lock resource, SaLckResource, the information is often not correct.

Both lckd and lcknd are not updating IMM correctly when SaLckResource information changes at runtime.

Write test cases which make sure these attributes are being updated correctly. And fix the issues.

Complete diffstat:
------------------
 src/lck/Makefile.am                   | 5 ++-
 src/lck/apitest/test_saLckLimitGet.cc | 4 ++
 src/lck/lckd/gld_evt.c                | 17 +---
 src/lck/lckd/gld_rsc.c                | 26 ++--
 src/lck/lckd/gld_standby.c            | 2 +-
 src/lck/lcknd/glnd_client.c           | 76 +--
 src/lck/lcknd/glnd_client.h           | 4 --
 src/lck/lcknd/glnd_evt.c              | 48 ++
 src/lck/lcknd/glnd_res.c              | 24 +++
 9 files changed, 121 insertions(+), 85 deletions(-)

Testing Commands:
-----------------
1) run the lcktest executable

Testing, Expected Results:
--------------------------
1) all tests should pass

Conditions of Submission:
-------------------------
May 13, or ack from developer

Arch        Built    Started    Linux distro
--------------------------------------------
mips          n         n
mips64        n         n
x86           n         n
x86_64        y         y
powerpc       n         n
powerpc64     n         n

Reviewer Checklist:
-------------------
[Submitters: make sure that your review doesn't trigger any checkmarks!]

Your checkin has not passed review because (see checked entries):

___ Your RR template is generally incomplete; it has too many blank entries that need proper data filled in.
___ You have failed to nominate the proper persons for review and push.
___ Your patches do not have proper short+long header
___ You have grammar/spelling in your header that is unacceptable.
___ You have exceeded a sensible line length in your headers/comments/text.
___ You have failed to put in a proper Trac Ticket # into your commits.
___ You have incorrectly put/left internal data in your comments/files (i.e. internal bug tracking tool IDs, product names etc)
___ You have not given any evidence of testing beyond basic build tests. Demonstrate some level of runtime or other sanity testing.
___ You have ^M present in some of your files. These have to be removed.
___ You have needlessly changed whitespace or added whitespace crimes like trailing spaces, or spaces before tabs.
___ You have mixed real technical changes with whitespace and other cosmetic code cleanup changes. These have to be separate commits.
___ You need to refactor your submission into logical chunks; there is too much content into a single commit.
___ You have extraneous garbage in your review (merge commits etc)
___ You have giant attachments which should never have been sent; Instead you should place your content in a public tree to be pulled.
___ You have too many commits attached to an e-mail; resend as threaded commits, or place in a public tree for a pull.
___ You have resent this content multiple times without a clear indication of what has changed between each re-send.
___ You have failed to adequately and individually address all of the comments and change requests that were proposed in the initial review.
___ You have a misconfigured ~/.gitconfig file (i.e. user.name, user.email etc)
___ Your computer has a badly configured date and time; confusing the threaded patch review.
___ Your changes affect IPC mechanism, and you don't present any results for in-service upgradability test.
___ Your changes affect user manual and documentation, your patch series do not contain the patch that updates the Doxygen manual.

--
Check out the vibrant tech community on one of the world's most engaging tech sit
[devel] [PATCH 0/1] Review Request for plm: don't instantiate child EEs twice when unlocking parent EE [#2846]
Summary: plm: don't instantiate child EEs twice when unlocking parent EE [#2846]
Review request for Ticket(s): 2846
Peer Reviewer(s): Mathi, Ravi
Pull request to:
Affected branch(es): develop
Development branch: ticket-2846
Base revision: 46181161a4b4afbf1f269d601914951da97265ef
Personal repository: git://git.code.sf.net/u/trguitar/review

--------------------------------
Impacted area       Impact y/n
--------------------------------
 Docs                    n
 Build system            n
 RPM/packaging           n
 Configuration files     n
 Startup scripts         n
 SAF services            y
 OpenSAF services        n
 Core libraries          n
 Samples                 n
 Tests                   n
 Other                   n

Comments (indicate scope for each "y" above):
---------------------------------------------

revision 4efaccbcde991cd3ff848e43af6c6d007912af14
Author: Alex Jones <ajo...@rbbn.com>
Date:   Thu, 3 May 2018 10:53:15 -0400

plm: don't instantiate child EEs twice when unlocking parent EE [#2846]

Child EEs (VMs) can fail to boot up when unlocking the parent EE.

The current code resets the VM when unlocking the parent EE. This is done in plms_move_chld_ent_to_insvc(). Later in the unlock function, the child EEs are reset again. libvirt does not like these resets being done in less than 1 second, and often will not boot the VM.

Don't reset the child EEs twice when unlocking the parent EE.

Complete diffstat:
------------------
 src/plm/plmd/plms_adm_fsm.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

Testing Commands:
-----------------
1) Setup a parent EE with at least 17 child EEs
2) Lock the parent EE
3) Unlock the parent EE

Testing, Expected Results:
--------------------------
1) child EEs should not get instantiated (reset) twice

Conditions of Submission:
-------------------------
May 9, or ack from developer

Arch        Built    Started    Linux distro
--------------------------------------------
mips          n         n
mips64        n         n
x86           n         n
x86_64        y         y
powerpc       n         n
powerpc64     n         n
-- Check out the vibrant tech community on one of the world's most engaging tech sites, Slashdot.org! http://sdm.link/slashdot ___ Opensaf-devel mailing list Opensaf-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/opensaf-devel
[devel] [PATCH 1/1] plm: don't instantiate child EEs twice when unlocking parent EE [#2846]
Child EEs (VMs) can fail to boot up when unlocking the parent EE.

The current code resets the VM when unlocking the parent EE. This is done in plms_move_chld_ent_to_insvc(). Later in the unlock function, the child EEs are reset again. libvirt does not like these resets being done in less than 1 second, and often will not boot the VM.

Don't reset the child EEs twice when unlocking the parent EE.
---
 src/plm/plmd/plms_adm_fsm.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/src/plm/plmd/plms_adm_fsm.c b/src/plm/plmd/plms_adm_fsm.c
index 8f5725cd8..a29dc28e0 100644
--- a/src/plm/plmd/plms_adm_fsm.c
+++ b/src/plm/plmd/plms_adm_fsm.c
@@ -4520,7 +4520,10 @@ static SaUint32T plms_ent_unlock(PLMS_ENTITY *ent, PLMS_TRACK_INFO *trk_info,
 		if ((PLMS_EE_ENTITY == head->plm_entity->entity_type) &&
 		    (!plms_rdness_flag_is_set(head->plm_entity,
-					      SA_PLM_RF_DEPENDENCY))) {
+					      SA_PLM_RF_DEPENDENCY)) &&
+		    /* child EEs have already been instantiated above */
+		    head->plm_entity->parent->entity_type !=
+			PLMS_EE_ENTITY) {
 			ret_err = plms_ee_instantiate(head->plm_entity,
 						      false, true);
 			if (NCSCC_RC_SUCCESS != ret_err) {
--
2.13.6
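The guard added by this patch can be illustrated with a small standalone sketch. It is not the PLM code: the demo_* types and function are hypothetical stand-ins, the dependency flag is reduced to a plain bool, and the NULL check on the parent is an extra assumption (the patch dereferences the parent pointer directly).

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the PLM entity types; not the real definitions. */
typedef enum { DEMO_HE_ENTITY, DEMO_EE_ENTITY } demo_entity_type_t;

typedef struct demo_entity {
	demo_entity_type_t type;
	bool dependency_flag_set; /* stands in for the SA_PLM_RF_DEPENDENCY readiness flag */
	struct demo_entity *parent;
} demo_entity_t;

/*
 * Returns true when the later pass of the unlock function should still
 * instantiate this entity. An EE whose parent is itself an EE is skipped,
 * because it was already instantiated when it was moved in service.
 */
bool demo_needs_second_instantiate(const demo_entity_t *ent)
{
	if (ent->type != DEMO_EE_ENTITY)
		return false;
	if (ent->dependency_flag_set)
		return false;
	/* Assumption: tolerate a missing parent instead of dereferencing it. */
	if (ent->parent != NULL && ent->parent->type == DEMO_EE_ENTITY)
		return false;
	return true;
}
```

With this predicate, a VM hosted directly on a hardware element is still instantiated in the second pass, while a VM nested under another EE is left alone, which is the behaviour the commit message asks for.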
[devel] [PATCH 1/1] clmd: clear admin_op and stat_change for COMPLETED plm readiness cb [#2847]
Sometimes CLM will reboot a node which was locked with PLM admin command.

admin_op and stat_change are not being cleared in COMPLETED step in PLM readiness callback.

Clear admin_op and stat_change.
---
 src/clm/clmd/clms.h       | 2 +-
 src/clm/clmd/clms_plm.cc  | 7 +++
 src/clm/clmd/clms_util.cc | 12 ++--
 3 files changed, 10 insertions(+), 11 deletions(-)

diff --git a/src/clm/clmd/clms.h b/src/clm/clmd/clms.h
index 1d9c8daf1..f7384aad0 100644
--- a/src/clm/clmd/clms.h
+++ b/src/clm/clmd/clms.h
@@ -100,7 +100,7 @@ extern uint32_t clms_mds_msg_bcast(CLMS_CB *cb, CLMSV_MSG *bcast_msg);
 extern SaAisErrorT clms_imm_activate(CLMS_CB *cb);
 extern uint32_t clms_node_trackresplist_empty(CLMS_CLUSTER_NODE *op_node);
 extern uint32_t clms_send_cbk_start_sub(CLMS_CB *cb, CLMS_CLUSTER_NODE *node);
-extern void clms_clear_node_dep_list(CLMS_CLUSTER_NODE *node);
+extern void clms_clear_node_dep_list(CLMS_CLUSTER_NODE *node, bool checkpoint);
 extern uint32_t clms_client_del_trackresp(SaUint32T client_id);
 extern CLMS_CLUSTER_NODE *clms_node_get_by_name(const SaNameT *name);
 extern CLMS_CLUSTER_NODE *clms_node_getnext_by_name(const SaNameT *name);
diff --git a/src/clm/clmd/clms_plm.cc b/src/clm/clmd/clms_plm.cc
index 9c3076aa9..1ca1e1c66 100644
--- a/src/clm/clmd/clms_plm.cc
+++ b/src/clm/clmd/clms_plm.cc
@@ -79,7 +79,7 @@ static void clms_plm_readiness_track_callback(
            step completed will come and we need to clear node list as we
            dont no the order of entity coming from plm, better to remove
            dependency list on each node */
-        clms_clear_node_dep_list(node);
+        clms_clear_node_dep_list(node, true);

         if (node->nodeup &&
             trackedEntities->entities[i].expectedReadinessStatus.readinessState ==
@@ -278,9 +278,8 @@ static void clms_plm_readiness_track_callback(
      * Don't checkpoint if this is COMPLETED and nodeup is 0. Node
      * has already been removed from standby.
      */
-    if (step != SA_PLM_CHANGE_COMPLETED || node->nodeup) {
-      clms_clear_node_dep_list(node);
-    }
+    clms_clear_node_dep_list(node,
+      step != SA_PLM_CHANGE_COMPLETED || node->nodeup);
     if (step == SA_PLM_CHANGE_COMPLETED) {
       if (node->stat_change == SA_TRUE) {
         if ((node->disable_reboot == SA_FALSE) &&
diff --git a/src/clm/clmd/clms_util.cc b/src/clm/clmd/clms_util.cc
index dde88788e..4b2dd19e2 100644
--- a/src/clm/clmd/clms_util.cc
+++ b/src/clm/clmd/clms_util.cc
@@ -601,18 +601,18 @@ done:
 /**
  * Clear the node dependency list,made for multiple nodes in the plm callback
  */
-void clms_clear_node_dep_list(CLMS_CLUSTER_NODE *node) {
+void clms_clear_node_dep_list(CLMS_CLUSTER_NODE *node, bool checkpoint) {
   CLMS_CLUSTER_NODE *new_node = nullptr;
   node->admin_op = ADMIN_OP{};
   node->stat_change = SA_FALSE;
-  ckpt_node_rec(node);
+  if (checkpoint) ckpt_node_rec(node);
   while (node->dep_node_list != nullptr) {
     new_node = node->dep_node_list;
     new_node->stat_change = SA_FALSE;
     new_node->admin_op = ADMIN_OP{};
     new_node->change = SA_CLM_NODE_NO_CHANGE;
-    ckpt_node_rec(new_node);
+    if (checkpoint) ckpt_node_rec(new_node);
     node->dep_node_list = node->dep_node_list->next;
     new_node->next = nullptr;
   }
@@ -670,7 +670,7 @@ uint32_t clms_clmresp_rejected(CLMS_CB *cb, CLMS_CLUSTER_NODE *node,
   CLMS_CLIENT_INFO *client = nullptr;
   SaAisErrorT ais_er;

-  clms_clear_node_dep_list(node);
+  clms_clear_node_dep_list(node, true);
   client = clms_client_get_by_id(trk->client_id);
   if (client != nullptr) {
     if (client->track_flags & SA_TRACK_VALIDATE_STEP) {
@@ -775,7 +775,7 @@ uint32_t clms_clmresp_error(CLMS_CB *cb, CLMS_CLUSTER_NODE *node) {
 #ifdef ENABLE_AIS_PLM
   SaAisErrorT ais_er = SA_AIS_OK;

-  clms_clear_node_dep_list(node);
+  clms_clear_node_dep_list(node, true);
   ais_er = saPlmReadinessTrackResponse(cb->ent_group_hdl, node->plm_invid,
                                        SA_PLM_CALLBACK_RESPONSE_ERROR);
   if (ais_er != SA_AIS_OK) {
@@ -856,7 +856,7 @@ uint32_t clms_clmresp_ok(CLMS_CB *cb, CLMS_CLUSTER_NODE *op_node,
     if (ncs_patricia_tree_size(&op_node->trackresp) == 0) {
       /*Clear the node dependency list */
-      clms_clear_node_dep_list(op_node);
+      clms_clear_node_dep_list(op_node, true);
       ais_er = saPlmReadinessTrackResponse(
           cb->ent_group_hdl, op_node->plm_invid, SA_PLM_CALLBACK_RESPONSE_OK);
       if (ais_er != SA_AIS_OK) {
--
2.13.6
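The effect of the new checkpoint argument can be shown with a reduced C sketch. Everything named demo_* is hypothetical: the node structure is cut down to the fields the patch touches, admin_op is modelled as an int where 0 means no pending operation, and the checkpoint call is replaced by a counter so the behaviour can be observed.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, reduced stand-in for the cluster node record. */
typedef struct demo_node {
	int admin_op;               /* 0 means "no admin operation pending" */
	bool stat_change;
	bool nodeup;
	struct demo_node *next;     /* link inside a dependency list */
	struct demo_node *dep_list; /* head of this node's dependency list */
} demo_node_t;

/* Counts checkpoint requests instead of sending them to a standby. */
int demo_ckpt_count = 0;

void demo_ckpt(demo_node_t *node)
{
	(void)node;
	demo_ckpt_count++;
}

/* Mirrors the expression passed by the readiness callback in the patch. */
bool demo_should_checkpoint(bool step_completed, bool nodeup)
{
	return !step_completed || nodeup;
}

/* State is always cleared; checkpointing happens only when requested. */
void demo_clear_dep_list(demo_node_t *node, bool checkpoint)
{
	node->admin_op = 0;
	node->stat_change = false;
	if (checkpoint)
		demo_ckpt(node);

	while (node->dep_list != NULL) {
		demo_node_t *dep = node->dep_list;

		dep->admin_op = 0;
		dep->stat_change = false;
		if (checkpoint)
			demo_ckpt(dep);
		node->dep_list = dep->next;
		dep->next = NULL;
	}
}
```

The point of the design is that the clearing of admin_op and stat_change no longer depends on whether a checkpoint is wanted. Before the patch the whole call was skipped for a COMPLETED step on a node that was already down, so the stale fields could survive and later trigger a reboot.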
[devel] [PATCH 0/1] Review Request for clmd: clear admin_op and stat_change for COMPLETED plm readiness cb [#2847]
Summary: clmd: clear admin_op and stat_change for COMPLETED plm readiness cb [#2847]
Review request for Ticket(s): 2847
Peer Reviewer(s): Mathi, Hans
Pull request to:
Affected branch(es): develop
Development branch: ticket-2847
Base revision: 46181161a4b4afbf1f269d601914951da97265ef
Personal repository: git://git.code.sf.net/u/trguitar/review

--------------------------------
Impacted area       Impact y/n
--------------------------------
 Docs                    n
 Build system            n
 RPM/packaging           n
 Configuration files     n
 Startup scripts         n
 SAF services            y
 OpenSAF services        n
 Core libraries          n
 Samples                 n
 Tests                   n
 Other                   n

Comments (indicate scope for each "y" above):
---------------------------------------------

revision f566e34de691ace5bc7d2832bc1f06b481075db3
Author: Alex Jones <ajo...@rbbn.com>
Date:   Thu, 3 May 2018 11:13:38 -0400

clmd: clear admin_op and stat_change for COMPLETED plm readiness cb [#2847]

Sometimes CLM will reboot a node which was locked with PLM admin command.

admin_op and stat_change are not being cleared in COMPLETED step in PLM readiness callback.

Clear admin_op and stat_change.

Complete diffstat:
------------------
 src/clm/clmd/clms.h       | 2 +-
 src/clm/clmd/clms_plm.cc  | 7 +++
 src/clm/clmd/clms_util.cc | 12 ++--
 3 files changed, 10 insertions(+), 11 deletions(-)

Testing Commands:
-----------------
1) Use PLM lock command on EE

Testing, Expected Results:
--------------------------
1) EE should not get rebooted

Conditions of Submission:
-------------------------
May 9, or ack from developer

Arch        Built    Started    Linux distro
--------------------------------------------
mips          n         n
mips64        n         n
x86           n         n
x86_64        y         y
powerpc       n         n
powerpc64     n         n
Re: [devel] [PATCH 1/1] nid: restart opensafd on failure when systemd enabled [#2839]
Hi Hans,

I was finally able to get back to this.

Having "Restart=on-failure" set works with REBOOT_ON_FAIL_TIMEOUT as long as RestartSec=xxx is also set in the service file to something greater than REBOOT_ON_FAIL_TIMEOUT. Maybe we could put a comment in nid.conf that says if you use systemd you need to also set RestartSec to a value greater than REBOOT_ON_FAIL_TIMEOUT?

Regarding "systemctl start opensafd; sleep 1; pkill -ABRT immnd": in my setup it does not restart after the nid phase. If I increase the time to 3, it starts to work. Here is the backtrace. Nothing looks suspicious.

(gdb) thread apply all bt

Thread 4 (Thread 0x7fbf852e9b00 (LWP 5123)):
#0  0x7fbf839b906d in poll () from /lib64/libc.so.6
#1  0x7fbf8462a370 in poll (__timeout=2, __nfds=2, __fds=) at /usr/include/bits/poll2.h:46
#2  mdtm_process_recv_events_tcp () at src/mds/mds_dt_trans.c:986
#3  0x7fbf83c910db in start_thread () from /lib64/libpthread.so.0
#4  0x7fbf839c1e3d in clone () from /lib64/libc.so.6

Thread 3 (Thread 0x7fbf85309b00 (LWP 5122)):
#0  0x7fbf839b906d in poll () from /lib64/libc.so.6
#1  0x7fbf84601641 in poll (__timeout=4900, __nfds=1, __fds=0x7fbf85309260) at /usr/include/bits/poll2.h:46
#2  osaf_ppoll (io_fds=io_fds@entry=0x7fbf85309260, i_nfds=i_nfds@entry=1, i_timeout_ts=0x7fbf85309280, i_sigmask=i_sigmask@entry=0x0) at src/base/osaf_poll.c:108
#3  0x7fbf84608c2f in ncs_tmr_wait () at src/base/sysf_tmr.c:463
#4  0x7fbf83c910db in start_thread () from /lib64/libpthread.so.0
#5  0x7fbf839c1e3d in clone () from /lib64/libc.so.6

Thread 2 (Thread 0x7fbf82787700 (LWP 5121)):
#0  0x7fbf839b906d in poll () from /lib64/libc.so.6
#1  0x7fbf84601560 in poll (__timeout=-1, __nfds=1, __fds=0x7fbf82786e30) at /usr/include/bits/poll2.h:46
#2  osaf_poll_no_timeout (io_fds=0x7fbf82786e30, i_nfds=1) at src/base/osaf_poll.c:31
#3  0x7fbf846017e5 in osaf_poll (io_fds=io_fds@entry=0x7fbf82786e30, i_nfds=i_nfds@entry=1, i_timeout=i_timeout@entry=-1) at src/base/osaf_poll.c:44
#4  0x7fbf8460197c in auth_server_main (_fd=) at src/base/osaf_secutil.c:176
#5  0x7fbf83c910db in start_thread () from /lib64/libpthread.so.0
#6  0x7fbf839c1e3d in clone () from /lib64/libc.so.6

Thread 1 (Thread 0x7fbf85341740 (LWP 5120)):
#0  0x7fbf839b906d in poll () from /lib64/libc.so.6
#1  0x7fbf850cc3b8 in poll (__timeout=, __nfds=5, __fds=0x7ffdb1e02590) at /usr/include/bits/poll2.h:46
#2  main (argc=, argv=) at src/imm/immnd/immnd_main.c:358
(gdb)

Alex

On 04/26/2018 03:38 AM, Hans Nordeback wrote:

Hi Alex,

I tested this, immnd gets restarted and systemd reports opensafd.service as active (running), so it works as expected. In your case, immnd is never restarted after the nid phase, or does it work if you increase the sleep time? One thing you can check is to send an ABRT instead of the KILL and check the core dump at e.g. which address you receive the signal. Perhaps you have found a "window" where immnd is not monitored?

/Regards HansN

On 04/25/2018 03:23 PM, Alex Jones wrote:

Hi Hans,

I understand. But, what if it doesn't fail in the nid phase? If you run this command in your setup: "systemctl start opensafd; sleep 2; pkill -KILL immnd", does immnd get restarted? And does opensafd successfully come up according to systemd?

Alex

On 04/25/2018 09:19 AM, Hans Nordebäck wrote:

Hi Alex,

the reboot should only happen if REBOOT_ON_FAIL_TIMEOUT is set, (i.e. not 0). I checked the latest version, the reboot works fine if e.g. immnd fails in the nid phase and REBOOT_ON_FAIL_TIMEOUT is set.

/Thanks HansN

From: Alex Jones [mailto:ajo...@rbbn.com]
Sent: den 25 april 2018 15:05
To: Hans Nordebäck <hans.nordeb...@ericsson.com>; Anders Widell <anders.wid...@ericsson.com>
Cc: opensaf-devel@lists.sourceforge.net
Subject: Re: SV: [PATCH 1/1] nid: restart opensafd on failure when systemd enabled [#2839]

Hi Hans,

There must be a hole here, then. Because in our setup, if dtmd or immnd crashes early in the startup process, the node doesn't reboot, and the executables are not restarted. If I set "Restart=on-failure" it works fine. Can you test this in your setup to see if you see the same thin
[devel] [PATCH 1/1] fmd: fix regression interacting with PLM [#2844]
fmd does not pass the EE to opensaf_reboot when attempting to reset the peer.

The legacy code passed 0 to fm_mds_async_send. The new code passes NCSMDS_SCOPE_NONE, but doesn't update how bcast_scope is used.

Change fm_mds_async_send to check bcast_scope. If it is not NCSMDS_SCOPE_NONE, then use it. Otherwise, use the MDS dest.
---
 src/fm/fmd/fm_mds.cc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/fm/fmd/fm_mds.cc b/src/fm/fmd/fm_mds.cc
index 60db5dab1..c5b3581ee 100644
--- a/src/fm/fmd/fm_mds.cc
+++ b/src/fm/fmd/fm_mds.cc
@@ -763,7 +763,7 @@ uint32_t fm_mds_async_send(FM_CB *fm_cb, NCSCONTEXT msg, NCSMDS_SVC_ID svc_id,
   memset(&(info.info.svc_send.info.snd.i_to_dest), 0, sizeof(MDS_DEST));

-  if (bcast_scope) {
+  if (bcast_scope != NCSMDS_SCOPE_NONE) {
     info.info.svc_send.info.bcast.i_bcast_scope = bcast_scope;
   } else {
     info.info.svc_send.info.snd.i_to_dest = i_to_dest;
--
2.13.6
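A short standalone C sketch of why the old truthiness test misbehaves once callers pass a "none" constant instead of 0. The demo_* enum is hypothetical and its numeric values are assumptions chosen for illustration; the only property that matters is that the "none" value is non-zero, which is what the commit message implies for NCSMDS_SCOPE_NONE.

```c
#include <stdbool.h>

/* Hypothetical scope enum; values are illustrative assumptions only. */
typedef enum {
	DEMO_SCOPE_NONE = 1, /* non-zero "no broadcast" marker */
	DEMO_SCOPE_INTRANODE = 2,
	DEMO_SCOPE_INTRACHASSIS = 3
} demo_scope_t;

/* Old check: every non-zero scope, including "none", selects broadcast. */
bool demo_is_bcast_old(demo_scope_t scope)
{
	return scope != 0;
}

/* Patched check: only a scope other than "none" selects broadcast. */
bool demo_is_bcast_new(demo_scope_t scope)
{
	return scope != DEMO_SCOPE_NONE;
}
```

Under these assumptions the old check takes the broadcast branch for a unicast request, so the destination is never filled in. The patched comparison restores the unicast path without changing the behaviour for real broadcast scopes.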
[devel] [PATCH 0/1] Review Request for fmd: fix regression interacting with PLM [#2844]
Summary: fmd: fix regression interacting with PLM [#2844]
Review request for Ticket(s): 2844
Peer Reviewer(s): Gary, Mathi, Ravi
Pull request to:
Affected branch(es): develop
Development branch: ticket-2844
Base revision: fd827200ddd0336d8301fefed62d4afc40e5f10b
Personal repository: git://git.code.sf.net/u/trguitar/review

--------------------------------
Impacted area       Impact y/n
--------------------------------
 Docs                    n
 Build system            n
 RPM/packaging           n
 Configuration files     n
 Startup scripts         n
 SAF services            n
 OpenSAF services        y
 Core libraries          n
 Samples                 n
 Tests                   n
 Other                   n

Comments (indicate scope for each "y" above):
---------------------------------------------

revision 9ab40a006c71a27c140cea5a32ab71b33facdb25
Author: Alex Jones <ajo...@rbbn.com>
Date:   Mon, 30 Apr 2018 10:52:41 -0400

fmd: fix regression interacting with PLM [#2844]

fmd does not pass the EE to opensaf_reboot when attempting to reset the peer.

The legacy code passed 0 to fm_mds_async_send. The new code passes NCSMDS_SCOPE_NONE, but doesn't update how bcast_scope is used.

Change fm_mds_async_send to check bcast_scope. If it is not NCSMDS_SCOPE_NONE, then use it. Otherwise, use the MDS dest.

Complete diffstat:
------------------
 src/fm/fmd/fm_mds.cc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Testing Commands:
-----------------
1) bring down a controller in a PLM environment

Testing, Expected Results:
--------------------------
1) The remaining controller should attempt to use PLM to reset the controller which went down

Conditions of Submission:
-------------------------
May 6, or ack from developer

Arch        Built    Started    Linux distro
--------------------------------------------
mips          n         n
mips64        n         n
x86           n         n
x86_64        y         y
powerpc       n         n
powerpc64     n         n
[devel] [PATCH 0/4] Review Request for lck: resurrect apitests [#2437]
Summary: lck: resurrect apitests [#2437]
Review request for Ticket(s): 2437
Peer Reviewer(s): Ravi
Pull request to:
Affected branch(es): develop
Development branch: ticket-2437
Base revision: b05d3f7ab7b88662a89c3493767969f6c890dc95
Personal repository: git://git.code.sf.net/u/trguitar/review

--------------------------------
Impacted area       Impact y/n
--------------------------------
 Docs                    n
 Build system            n
 RPM/packaging           n
 Configuration files     n
 Startup scripts         n
 SAF services            n
 OpenSAF services        n
 Core libraries          n
 Samples                 n
 Tests                   y
 Other                   n

NOTE: Patch(es) contain lines longer than 80 characters

Comments (indicate scope for each "y" above):
---------------------------------------------
These checkins resurrect the apitest for LCK.

revision 494407c7d28526ac0d616f9be8c2484981bbbeda
Author: Alex Jones <ajo...@rbbn.com>
Date:   Fri, 27 Apr 2018 14:37:12 -0400

lck: resurrect apitest [#2437]

Resurrect apitest

revision 106200a751299a2adf20574809845098e055b874
Author: Alex Jones <ajo...@rbbn.com>
Date:   Fri, 27 Apr 2018 14:29:53 -0400

lck: resurrect apitest [#2437]

Resurrect apitest

revision 79ecd8f8dee2a66472df23eb99d9e6b8bdc72856
Author: Alex Jones <ajo...@rbbn.com>
Date:   Fri, 27 Apr 2018 14:29:53 -0400

lck: resurrect apitests [#2437]

Resurrect apitests

revision 602a7774266651c429d672a4a7d26d46ab989909
Author: Alex Jones <ajo...@rbbn.com>
Date:   Fri, 27 Apr 2018 14:29:53 -0400

lck: resurrect apitests [#2437]

Resurrect apitests

Added Files:
------------
 src/lck/apitest/lcktest.c
 src/lck/apitest/lcktest.h
 src/lck/apitest/Makefile
 src/lck/apitest/test_ErrUnavailable.cc
 src/lck/apitest/test_saLckLimitGet.cc
 src/lck/apitest/test_saLckResourceClass.cc

Complete diffstat:
------------------
 src/lck/Makefile.am                        | 22 +-
 src/lck/apitest/Makefile                   | 18 +
 src/lck/apitest/lcktest.c                  | 42 +
 src/lck/apitest/lcktest.h                  | 30 +
 src/lck/apitest/test_ErrUnavailable.cc     | 1265 +++
 src/lck/apitest/test_saLckLimitGet.cc      | 423 +++
 src/lck/apitest/test_saLckResourceClass.cc | 1892
 src/lck/apitest/tet_gla.c                  | 735 +++
 src/lck/apitest/tet_gla_conf.c             | 229 +---
 src/lck/apitest/tet_glsv.h                 | 39 +-
 src/lck/apitest/tet_glsv_util.c            | 576 -
 src/lck/lckd/gld_mds.c                     | 3 -
 12 files changed, 4411 insertions(+), 863 deletions(-)

Testing Commands:
-----------------
1) run lcktest

Testing, Expected Results:
--------------------------
1) all tests pass

Conditions of Submission:
-------------------------
May 3, or ack from developer

Arch        Built    Started    Linux distro
--------------------------------------------
mips          n         n
mips64        n         n
x86           n         n
x86_64        y         y
powerpc       n         n
powerpc64     n         n
[devel] [PATCH 2/4] lck: resurrect apitests [#2437]
Resurrect apitests --- src/lck/apitest/test_ErrUnavailable.cc | 2 +- src/lck/apitest/test_saLckLimitGet.cc | 2 +- src/lck/apitest/test_saLckResourceClass.cc | 10 +++--- 3 files changed, 9 insertions(+), 5 deletions(-) diff --git a/src/lck/apitest/test_ErrUnavailable.cc b/src/lck/apitest/test_ErrUnavailable.cc index ff1548296..db1e0b72f 100644 --- a/src/lck/apitest/test_ErrUnavailable.cc +++ b/src/lck/apitest/test_ErrUnavailable.cc @@ -6,8 +6,8 @@ #include #include #include +#include "ais/include/saLck.h" #include "lck/apitest/lcktest.h" -#include "lck/saf/saLck.h" static SaVersionT lck3_1 = { 'B', 3, 0 }; diff --git a/src/lck/apitest/test_saLckLimitGet.cc b/src/lck/apitest/test_saLckLimitGet.cc index e187c5885..1236b47bf 100644 --- a/src/lck/apitest/test_saLckLimitGet.cc +++ b/src/lck/apitest/test_saLckLimitGet.cc @@ -3,8 +3,8 @@ #include #include #include +#include "ais/include/saLck.h" #include "lck/apitest/lcktest.h" -#include "lck/saf/saLck.h" static SaVersionT lck3_1 = { 'B', 3, 0 }; diff --git a/src/lck/apitest/test_saLckResourceClass.cc b/src/lck/apitest/test_saLckResourceClass.cc index fada9e4fb..106ca87c4 100644 --- a/src/lck/apitest/test_saLckResourceClass.cc +++ b/src/lck/apitest/test_saLckResourceClass.cc @@ -1,12 +1,13 @@ #include #include #include +#include #include #include -#include "imm/saf/saImm.h" -#include "imm/saf/saImmOm.h" +#include "ais/include/saImm.h" +#include "ais/include/saImmOm.h" +#include "ais/include/saLck.h" #include "lck/apitest/lcktest.h" -#include "lck/saf/saLck.h" static SaVersionT lck3_1 = { 'B', 3, 0 }; @@ -69,6 +70,8 @@ static void verifyOutput(SaUint32T strippedCount, SaImmAttrValuesT_2 **attributes(0); SaAisErrorT rc(saImmOmAccessorGet_2(accessorHandle, , names, )); + if (rc != SA_AIS_OK) +std::cerr << "saImmOmAccessorGet_2 returned: " << rc << std::endl; assert(rc == SA_AIS_OK); int i(0); @@ -105,6 +108,7 @@ static void saLckResourceClass_01(void) ); assert(rc == SA_AIS_OK); + sleep(1); verifyOutput(0, 1, false); rc = 
saLckResourceClose(lockResourceHandle); -- 2.13.6 -- Check out the vibrant tech community on one of the world's most engaging tech sites, Slashdot.org! http://sdm.link/slashdot ___ Opensaf-devel mailing list Opensaf-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/opensaf-devel
[devel] [PATCH 4/4] lck: resurrect apitest [#2437]
Resurrect apitest --- src/lck/apitest/test_ErrUnavailable.cc | 2 +- src/lck/apitest/test_saLckLimitGet.cc | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/src/lck/apitest/test_ErrUnavailable.cc b/src/lck/apitest/test_ErrUnavailable.cc index db1e0b72f..715efe47c 100644 --- a/src/lck/apitest/test_ErrUnavailable.cc +++ b/src/lck/apitest/test_ErrUnavailable.cc @@ -45,7 +45,7 @@ static std::string getClmNodeName(void) static void lockUnlockNode(bool lock) { std::string command("immadm -o "); - + if (lock) command += '2'; else diff --git a/src/lck/apitest/test_saLckLimitGet.cc b/src/lck/apitest/test_saLckLimitGet.cc index 7d63d3ed2..74c9194d4 100644 --- a/src/lck/apitest/test_saLckLimitGet.cc +++ b/src/lck/apitest/test_saLckLimitGet.cc @@ -418,6 +418,6 @@ __attribute__((constructor)) static void saLckLimitGet_constructor(void) * Add test cases for: * x) HA tests * x) LockResource during failover of lckd (imm safLock never gets cleaned up) and kill application - * x) + * x) */ } -- 2.13.6
[devel] [PATCH 3/4] lck: resurrect apitest [#2437]
Resurrect apitest --- src/lck/Makefile.am | 5 + src/lck/apitest/test_saLckLimitGet.cc | 7 +-- 2 files changed, 2 insertions(+), 10 deletions(-) diff --git a/src/lck/Makefile.am b/src/lck/Makefile.am index 5b3102722..db3e043e1 100644 --- a/src/lck/Makefile.am +++ b/src/lck/Makefile.am @@ -200,10 +200,7 @@ bin_lcktest_SOURCES = \ src/lck/apitest/tet_glsv_util.c \ src/lck/apitest/tet_gla.c \ src/lck/apitest/tet_gla_conf.c \ - src/lck/apitest/tet_gld.c \ - src/lck/apitest/test_saLckLimitGet.cc \ - src/lck/apitest/test_ErrUnavailable.cc \ - src/lck/apitest/test_saLckResourceClass.cc + src/lck/apitest/tet_gld.c bin_lcktest_LDADD = \ lib/libSaLck.la \ diff --git a/src/lck/apitest/test_saLckLimitGet.cc b/src/lck/apitest/test_saLckLimitGet.cc index 1236b47bf..7d63d3ed2 100644 --- a/src/lck/apitest/test_saLckLimitGet.cc +++ b/src/lck/apitest/test_saLckLimitGet.cc @@ -140,11 +140,8 @@ static void saLckLimitGet_08(void) SA_TIME_ONE_SECOND * 5, [i]); -if (i != 1000) { - if (rc != SA_AIS_OK) -printf("rc: %i i: %i\n", rc, i); +if (i != 1000) assert(rc == SA_AIS_OK); -} } test_validate(rc, SA_AIS_ERR_NO_RESOURCES); @@ -312,8 +309,6 @@ static void saLckLimitGet_11(void) if (i != 1000) assert(lockStatus == SA_LCK_LOCK_GRANTED); -if (rc != SA_AIS_OK) - printf("rc: %i i: %i\n", rc, i); assert(rc == SA_AIS_OK); } -- 2.13.6
[devel] [PATCH 0/1] Review Request for msgd: handle abrupt restart of remote node [#2840]
Summary: msgd: handle abrupt restart of remote node [#2840]
Review request for Ticket(s): 2840
Peer Reviewer(s): Srinivas
Pull request to:
Affected branch(es): develop
Development branch: ticket-2840
Base revision: dd6a9bfe9d897fe9cc3a70e21d7e066b7a727d44
Personal repository: git://git.code.sf.net/u/trguitar/review

Impacted area       Impact y/n
------------------------------
 Docs                    n
 Build system            n
 RPM/packaging           n
 Configuration files     n
 Startup scripts         n
 SAF services            y
 OpenSAF services        n
 Core libraries          n
 Samples                 n
 Tests                   n
 Other                   n

NOTE: Patch(es) contain lines longer than 80 characters

Comments (indicate scope for each "y" above):
---------------------------------------------

revision e04d343ab46a7409772001c61624eb39c2eb50aa
Author: Alex Jones <ajo...@rbbn.com>
Date:   Wed, 25 Apr 2018 10:27:13 -0400

msgd: handle abrupt restart of remote node [#2840]

Sometimes when a remote node restarts abruptly, queues which were created
on that node, are unable to be opened again when that node comes up. There
is a race condition when the remote node goes down between msgd getting the
CLM and MDS events indicating node down, and immd removing the implementer
for that remote node. When msgd gets the CLM and MDS events indicating node
down it temporarily becomes the implementer for any queues on that node so
that it can remove the entries in IMM. If IMM has not yet removed the
implementer, msgd will fail to remove the IMM entries. When the remote node
comes back up, and the queues are opened, they will fail because the IMM
entries are still there. When msgd receives ERR_EXIST from implementer set
in this case, it should treat it as TRY_AGAIN.
Complete diffstat: -- src/msg/msgd/mqd_clm.c | 60 + src/msg/msgd/mqd_db.h | 2 +- src/msg/msgd/mqd_evt.c | 12 -- src/msg/msgd/mqd_util.c | 2 +- 4 files changed, 58 insertions(+), 18 deletions(-) Testing Commands: - 1) create 10 or so queues on node 2 2) reboot -f of node 2 (you may need to do this 10x to exhibit the problem) 3) when node comes back up it should reopen the queues Testing, Expected Results: -- 1) when node comes back up after abrupt reboot, it should successfully reopen the queues Conditions of Submission: - May 1, or ack from developer Arch Built StartedLinux distro --- mipsn n mips64 n n x86 n n x86_64 y y powerpc n n powerpc64 n n
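The retry behaviour described in the commit message above (treating ERR_EXIST from implementer set as if it were TRY_AGAIN) can be sketched in isolation roughly as follows. This is only an illustration under stated assumptions, not the code from the patch: `rc_t`, `set_fn`, `wait_fn`, `implementer_set_with_retry` and `fake_attempt` are hypothetical names, and the enum values are stand-ins rather than the real SaAisErrorT definitions. The diff itself introduces a `retries` counter initialised to 5; here the limit is passed in as a parameter.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the AIS return codes named in the commit
 * message; these are NOT the real SaAisErrorT values. */
typedef enum { RC_OK, RC_TRY_AGAIN, RC_EXIST, RC_FAILED } rc_t;

/* Hypothetical callbacks: one attempt at the "implementer set" step,
 * and an optional back-off hook called between attempts. */
typedef rc_t (*set_fn)(void *ctx);
typedef void (*wait_fn)(void *ctx);

/* Retry while the attempt reports TRY_AGAIN or EXIST. EXIST is treated
 * as transient because the old implementer may not have been cleared
 * yet. Returns the result of the last attempt. */
static rc_t implementer_set_with_retry(set_fn attempt, wait_fn backoff,
                                       void *ctx, int max_attempts)
{
    rc_t rc = RC_FAILED;

    assert(attempt != NULL && max_attempts > 0);

    for (int i = 0; i < max_attempts; i++) {
        rc = attempt(ctx);
        if (rc != RC_TRY_AGAIN && rc != RC_EXIST)
            break; /* success or a hard error: stop retrying */
        if (backoff != NULL && i + 1 < max_attempts)
            backoff(ctx);
    }
    return rc;
}

/* Demo stand-in: reports EXIST until the counter behind ctx reaches
 * zero, imitating a stale implementer that is eventually removed. */
static rc_t fake_attempt(void *ctx)
{
    int *remaining = ctx;

    if (*remaining > 0) {
        (*remaining)--;
        return RC_EXIST;
    }
    return RC_OK;
}
```

With a counter of 2 and a limit of 5 the call succeeds on the third attempt. If the stale state outlasts the limit, the caller gets the last transient code back and has to decide what to do with it.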
[devel] [PATCH 1/1] msgd: handle abrupt restart of remote node [#2840]
Sometimes when a remote node restarts abruptly, queues which were created on that node, are unable to be opened again when that node comes up. There is a race condition when the remote node goes down between msgd getting the CLM and MDS events indicating node down, and immd removing the implementer for that remote node. When msgd gets the CLM and MDS events indicating node down it temporarily becomes the implementer for any queues on that node so that it can remove the entries in IMM. If IMM has not yet removed the implementer, msgd will fail to remove the IMM entries. When the remote node comes back up, and the queues are opened, they will fail because the IMM entries are still there. When msgd recevies ERR_EXIST from implementer set in this case, it should treat it as TRY_AGAIN. --- src/msg/msgd/mqd_clm.c | 60 + src/msg/msgd/mqd_db.h | 2 +- src/msg/msgd/mqd_evt.c | 12 -- src/msg/msgd/mqd_util.c | 2 +- 4 files changed, 58 insertions(+), 18 deletions(-) diff --git a/src/msg/msgd/mqd_clm.c b/src/msg/msgd/mqd_clm.c index 912d5a3f5..41d9bcf15 100644 --- a/src/msg/msgd/mqd_clm.c +++ b/src/msg/msgd/mqd_clm.c @@ -26,6 +26,7 @@ the cluster track */ +#include "base/osaf_time.h" #include "msg/msgd/mqd.h" #include "mqd_imm.h" extern MQDLIB_INFO gl_mqdinfo; @@ -56,11 +57,10 @@ void mqd_clm_cluster_track_callback( } else { for (counter = 0; counter < notificationBuffer->numberOfItems; counter++) { + node_id = notificationBuffer->notification[counter]. 
+ clusterNode.nodeId; if (notificationBuffer->notification[counter] .clusterChange == SA_CLM_NODE_LEFT) { - node_id = - notificationBuffer->notification[counter] - .clusterNode.nodeId; pNdNode = (MQD_ND_DB_NODE *)ncs_patricia_tree_get( >node_db, (uint8_t *)_id); @@ -78,6 +78,8 @@ void mqd_clm_cluster_track_callback( true; } } else { + SaTimeT timeout = +m_NCS_CONVERT_SATIME_TO_TEN_MILLI_SEC(MQD_ND_EXPIRY_TIME_STANDBY); TRACE_2( "%s:%u: CLM Event is coming first for Node down", __FILE__, __LINE__); @@ -93,9 +95,22 @@ void mqd_clm_cluster_track_callback( pNdNode->info.nodeid = node_id; pNdNode->info.is_clm_down = true; mqd_red_db_node_add(pMqd, pNdNode); - if (pMqd->ha_state == SA_AMF_HA_ACTIVE) - mqd_del_node_down_info(pMqd, - node_id); + mqd_tmr_start(>info.timer, + timeout); + } + } else if (notificationBuffer->notification[counter]. + clusterChange == SA_CLM_NODE_JOINED) { + pNdNode = + (MQD_ND_DB_NODE *)ncs_patricia_tree_get( + >node_db, (uint8_t *)_id); + if (pNdNode) { + mqd_tmr_stop(>info.timer); + + if (pMqd->ha_state == + SA_AMF_HA_ACTIVE) { + mqd_red_db_node_del(pMqd, + pNdNode); + } } } } @@ -111,21 +126,38 @@ void mqd_del_node_down_info(MQD_CB *pMqd, NODE_ID nodeid) SaImmOiHandleT immOiHandle; SaAisErrorT rc = SA_AIS_OK; SaImmOiImplementerNameT implementer_name; + int retries = 5; char i_name[256] = {0}; SaVersionT imm_version = {'A', 0x02, 0x01}; TRACE_ENTER2("nodeid=%u", nodeid); rc = immutil_saImmOiInitialize_2(, NULL, _version); if (rc != SA_AIS_OK) - TRACE_4("saImmOiInitialize_2 failed with return value=%d", rc); + LOG_ER("saImmOiInitialize_2 failed with return value=%d", rc); snprintf(i_name, SA_MAX_NAME_LENGTH,
Re: [devel] [PATCH 1/1] nid: restart opensafd on failure when systemd enabled [#2839]
Hi Hans, I understand. But, what if it doesn't fail in the nid phase? If you run this command in your setup: "systemctl start opensafd; sleep 2; pkill -KILL immnd", does immnd get restarted? And does opensafd successfully come up according to systemd? Alex On 04/25/2018 09:19 AM, Hans Nordebäck wrote: __ NOTICE: This email was received from an EXTERNAL sender __ Hi Alex, the reboot should only happen if REBOOT_ON_FAIL_TIMEOUT is set, (i.e. not 0). I checked the latest version, the reboot works fine if e.g. immnd fails in the nid phase and REBOOT_ON_FAIL_TIMEOUT is set. /Thanks HansN From: Alex Jones [[1]mailto:ajo...@rbbn.com] Sent: den 25 april 2018 15:05 To: Hans Nordebäck [2]<hans.nordeb...@ericsson.com>; Anders Widell [3]<anders.wid...@ericsson.com> Cc: [4]opensaf-devel@lists.sourceforge.net Subject: Re: SV: [PATCH 1/1] nid: restart opensafd on failure when systemd enabled [#2839] Hi Hans, There must be a hole here, then. Because in our setup, if dtmd or immnd crashes early in the startup process, the node doesn't reboot, and the executables are not restarted. If I set "Restart=on-failure" it works fine. Can you test this in your setup to see if you see the same thing? Alex On 04/24/2018 05:04 AM, Hans Nordeback wrote: ___ NOTICE: This email was received from an EXTERNAL sender ___ Hi Alex, please see comment below. /Thanks HansN On 04/23/2018 03:56 PM, Alex Jones wrote: Hi Hans, I just did some tests. Maybe there is a bug in nid, but when I do not have "Restart=on-failure", the node does not reboot when I run the command "systemctl start opensafd; sleep 3; pkill -KILL immnd", and opensafd times out and fails, with REBOOT_ON_FAIL_TIMEOUT=30. [HansN] isn't the nid phase finished before the sleep 3 command? It is only during the nid phase that the REBOOT_ON_FAIL_TIMEOUT is used, After the nid phase opensaf enters "normal" operation, no reboot will be performed as immnd is restartable. 
Instead of the sleep 3, you can edit the nodeinit.conf.controller file and change the immnd line to e.g. "/usr/local/lib/opensaf/clc-cli/osaf-immndx:IMMND ... " then nid should fail to start and REBOOT_ON_FAIL_TIMEOUT should work. But, opensafd restarts every time when I run that command with "Restart=on-failure" set. Alex On 04/19/2018 04:02 PM, Hans Nordebäck wrote: ___ NOTICE: This email was received from an EXTERNAL sender ___ Hi Alex, a question, if opensafd fails, (assert or exit code ne 0) a reboot of the node will be performed if REBOOT_ON_FAIL_TIMEOUT is configured, I have not checked, but how do systemd handle the reboot request if Restart=on-failure is set? /BR HansN _____ From: Alex Jones [5]<ajo...@rbbn.com> Sent: 19 April 2018 17:27:27 To: Hans Nordebäck; Anders Widell Cc: [6]opensaf-devel@lists.sourceforge.net; Alex Jones Subject: [PATCH 1/1] nid: restart opensafd on failure when systemd enabled [#2839] Under certain circumstances opensafd fails to start (immnd or dtmd crashes, etc). Apr 19 15:07:31 ams-idsp-46-novnfm osafdtmd[3315]: src/dtm/dtmnd/dtm_intra_svc.cc:1778: dtm_process_internode_service_up_msg: Assertion '0' failed. We can tell systemd to restart opensafd if it fails to start. --- src/nid/opensafd.service.in | 2 ++ 1 file changed, 2 insertions(+) diff --git a/src/nid/opensafd.service.in b/src/nid/opensafd.service.in index 7f4d75ee3..6050f5e88 100644 --- a/src/nid/opensafd.service.in +++ b/src/nid/opensafd.service.in @@ -12,5 +12,7 @@ ControlGroup=cpu:/ TimeoutStartSec=3hours KillMode=none @systemdtasksmax@ +Restart=on-failure + [Install] WantedBy=multi-user.target -- 2.13.6 References 1. mailto:ajo...@rbbn.com 2. mailto:hans.nordeb...@ericsson.com 3. mailto:anders.wid...@ericsson.com 4. mailto:opensaf-devel@lists.sourceforge.net 5. mailto:ajo...@rbbn.com 6. mailto:opensaf-devel@lists.sourceforge.net
Re: [devel] [PATCH 1/1] nid: restart opensafd on failure when systemd enabled [#2839]
Hi Hans, There must be a hole here, then. Because in our setup, if dtmd or immnd crashes early in the startup process, the node doesn't reboot, and the executables are not restarted. If I set "Restart=on-failure" it works fine. Can you test this in your setup to see if you see the same thing? Alex On 04/24/2018 05:04 AM, Hans Nordeback wrote: __ NOTICE: This email was received from an EXTERNAL sender __ Hi Alex, please see comment below. /Thanks HansN On 04/23/2018 03:56 PM, Alex Jones wrote: Hi Hans, I just did some tests. Maybe there is a bug in nid, but when I do not have "Restart=on-failure", the node does not reboot when I run the command "systemctl start opensafd; sleep 3; pkill -KILL immnd", and opensafd times out and fails, with REBOOT_ON_FAIL_TIMEOUT=30. [HansN] isn't the nid phase finished before the sleep 3 command? It is only during the nid phase that the REBOOT_ON_FAIL_TIMEOUT is used, After the nid phase opensaf enters "normal" operation, no reboot will be performed as immnd is restartable. Instead of the sleep 3, you can edit the nodeinit.conf.controller file and change the immnd line to e.g. "/usr/local/lib/opensaf/clc-cli/osaf-immndx:IMMND ... " then nid should fail to start and REBOOT_ON_FAIL_TIMEOUT should work. But, opensafd restarts every time when I run that command with "Restart=on-failure" set. Alex On 04/19/2018 04:02 PM, Hans Nordebäck wrote: ___ NOTICE: This email was received from an EXTERNAL sender ___ Hi Alex, a question, if opensafd fails, (assert or exit code ne 0) a reboot of the node will be performed if REBOOT_ON_FAIL_TIMEOUT is configured, I have not checked, but how do systemd handle the reboot request if Restart=on-failure is set? 
/BR HansN _______ From: Alex Jones [1]<ajo...@rbbn.com> Sent: 19 April 2018 17:27:27 To: Hans Nordebäck; Anders Widell Cc: [2]opensaf-devel@lists.sourceforge.net; Alex Jones Subject: [PATCH 1/1] nid: restart opensafd on failure when systemd enabled [#2839] Under certain circumstances opensafd fails to start (immnd or dtmd crashes, etc). Apr 19 15:07:31 ams-idsp-46-novnfm osafdtmd[3315]: src/dtm/dtmnd/dtm_intra_svc.cc:1778: dtm_process_internode_service_up_msg: Assertion '0' failed. We can tell systemd to restart opensafd if it fails to start. --- src/nid/opensafd.service.in | 2 ++ 1 file changed, 2 insertions(+) diff --git a/src/nid/opensafd.service.in b/src/nid/opensafd.service.in index 7f4d75ee3..6050f5e88 100644 --- a/src/nid/opensafd.service.in +++ b/src/nid/opensafd.service.in @@ -12,5 +12,7 @@ ControlGroup=cpu:/ TimeoutStartSec=3hours KillMode=none @systemdtasksmax@ +Restart=on-failure + [Install] WantedBy=multi-user.target -- 2.13.6 References 1. mailto:ajo...@rbbn.com 2. mailto:opensaf-devel@lists.sourceforge.net
Re: [devel] [PATCH 1/1] amfd: if rootCauseEntity is PLM entity don't engage lock/lock-in [#2835]
My comments inline: Alex On 04/20/2018 04:00 AM, Hans Nordeback wrote: __ NOTICE: This email was received from an EXTERNAL sender __ Hi Alex, please see below for some comments/questions. /Regards HansN On 04/18/2018 03:41 PM, Alex Jones wrote: When using PLM an AMF node mapped to a CLM node mapped to a PLM EE, can get stuck in locked state when rebooting, or going through a PLM EE lock/unlock. When amfd receives a START step from CLM tracking it attempts to gracefully shutdown the AMF node using AMF admin operations lock/lock-in. When PLM is involved this doesn't always work correctly because PLM is also shutting down the node by calling "opensafd stop". There is a race condition between PLM using "opensafd stop", and amfd using the admin operations to bring down the node, so that sometimes the AMF node gets stuck in locked state. If the rootCauseEntity in the CLM tracking is a PLM entity then don't do anything, as "opensafd stop" is already being called. --- src/amf/amfd/clm.cc | 25 - 1 file changed, 24 insertions(+), 1 deletion(-) diff --git a/src/amf/amfd/clm.cc b/src/amf/amfd/clm.cc index 2bcea2db0..7f675d8e9 100644 --- a/src/amf/amfd/clm.cc +++ b/src/amf/amfd/clm.cc @@ -274,6 +274,27 @@ static void clm_track_cb( TRACE_3("Already got callback for start of this change."); continue; } + + if (strncmp(osaf_extended_name_borrow(rootCauseEntity), + "safEE=", + sizeof("safEE=") - 1) == 0 || + strncmp(osaf_extended_name_borrow(rootCauseEntity), + "safHE=", + sizeof("safHE=") - 1) == 0) { +// PLM will take care of calling opensafd stop +TRACE("rootCause: %s from PLM operation so skipping %u", + osaf_extended_name_borrow(rootCauseEntity), + notifItem->clusterNode.nodeId); + +SaAisErrorT rc(saClmResponse_4(avd_cb->clmHandle, + invocation, + SA_CLM_CALLBACK_RESPONSE_OK)); [HansN] perhaps use: SaAisErrorT rc = saClmResponse_4 or SaAisErrorT rc{saClmResponse_4 instead? [Alex] I'm not sure what you are asking here. Do you not like the function syntax? And what is '{'? 
I don't understand your second suggestion. +if (rc != SA_AIS_OK) + LOG_ER("saClmResponse_4 failed: %i", rc); + [HansN] I think the amf operational state has to be checked and set to disabled? And should break be used instead of continue? [Alex] Setting operational state to disabled is taken care of when COMPLETED is received in the track callback. My code change is only when receiving START. I used "continue" to explicitly mean that we are done processing this node, and we need to move to the next node in the for loop. The same thing is done in legacy code above when checking for "clm_change_start_preceded." +continue; + } + /* invocation to be used by pending clm response */ node->clm_pend_inv = invocation; clm_node_exit_start(node, notifItem->clusterChange); @@ -304,7 +325,9 @@ static void clm_track_cb( osaf_extended_name_borrow(rootCauseEntity), notifItem->clusterNode.nodeId); if (strncmp(osaf_extended_name_borrow(rootCauseEntity), - "safEE=", 6) == 0 || + strncmp(osaf_extended_name_borrow(rootCauseEntity), + "safHE=", 6) == 0) { [HansN] sizeof("safHE=") as above [Alex] Agreed. I will make this change. And change the older code to conform. /* This callback is because of operation on PLM, so we need to mark the node absent, because PLCD will anyway call opensafd stop.*/ AVD_AVND *node =
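The prefix test discussed in this thread, and the agreed change from the literal 6 to sizeof("safHE=") - 1, can be sketched as a small standalone helper. This is an illustration only, not the code that was pushed: `has_prefix`, `HAS_LITERAL_PREFIX` and `root_cause_is_plm_entity` are hypothetical names, and the helper takes a plain C string where the patch passes the result of osaf_extended_name_borrow().

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* True if the first prefix_len bytes of s equal those of prefix. */
static bool has_prefix(const char *s, const char *prefix, size_t prefix_len)
{
    return strncmp(s, prefix, prefix_len) == 0;
}

/* sizeof on a string literal counts the terminating NUL, hence the
 * "- 1". Only valid when lit is a string literal, not a pointer. */
#define HAS_LITERAL_PREFIX(s, lit) has_prefix((s), (lit), sizeof(lit) - 1)

/* Hypothetical helper: does the root-cause DN start with the execution
 * environment (safEE=) or hardware element (safHE=) prefix? */
static bool root_cause_is_plm_entity(const char *dn)
{
    assert(dn != NULL);
    return HAS_LITERAL_PREFIX(dn, "safEE=") ||
           HAS_LITERAL_PREFIX(dn, "safHE=");
}
```

Deriving the length from the literal keeps the count and the string in step if the prefix is ever edited, which a hard-coded 6 does not. Putting the check in one helper would also keep the START and COMPLETED branches of the callback consistent.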
[devel] [PATCH 0/1] Review Request for nid: restart opensafd on failure when systemd enabled [#2839]
Summary: nid: restart opensafd on failure when systemd enabled [#2839]
Review request for Ticket(s): 2839
Peer Reviewer(s): Hans, Anders
Pull request to:
Affected branch(es): develop
Development branch: ticket-2839
Base revision: 72b6ed1fdd6851d8af6bb3dcd2fea25d8095ad1e
Personal repository: git://git.code.sf.net/u/trguitar/review

Impacted area       Impact y/n
------------------------------
 Docs                    n
 Build system            n
 RPM/packaging           n
 Configuration files     n
 Startup scripts         n
 SAF services            n
 OpenSAF services        n
 Core libraries          n
 Samples                 n
 Tests                   n
 Other                   n

Comments (indicate scope for each "y" above):
---------------------------------------------

revision c67596599b7728ea45e2d449d5ba3c3103bf8452
Author: Alex Jones <ajo...@rbbn.com>
Date:   Thu, 19 Apr 2018 11:17:06 -0400

nid: restart opensafd on failure when systemd enabled [#2839]

Under certain circumstances opensafd fails to start (immnd or dtmd crashes,
etc).

Apr 19 15:07:31 ams-idsp-46-novnfm osafdtmd[3315]: src/dtm/dtmnd/dtm_intra_svc.cc:1778: dtm_process_internode_service_up_msg: Assertion '0' failed.

We can tell systemd to restart opensafd if it fails to start.

Complete diffstat:
------------------
 src/nid/opensafd.service.in | 2 ++
 1 file changed, 2 insertions(+)

Testing Commands:
-----------------
systemctl start opensafd

Testing, Expected Results:
--------------------------
opensafd should restart if it fails to come up

Conditions of Submission:
-------------------------
Apr 25, or ack from developer

Arch      Built     Started    Linux distro
-------------------------------------------
mips        n          n
mips64      n          n
x86         n          n
x86_64      y          y
powerpc     n          n
powerpc64   n          n

Reviewer Checklist:
-------------------
[Submitters: make sure that your review doesn't trigger any checkmarks!]

Your checkin has not passed review because (see checked entries):

___ Your RR template is generally incomplete; it has too many blank entries that need proper data filled in.
___ You have failed to nominate the proper persons for review and push.
___ Your patches do not have proper short+long header
___ You have grammar/spelling in your header that is unacceptable.
___ You have exceeded a sensible line length in your headers/comments/text.
___ You have failed to put in a proper Trac Ticket # into your commits. ___ You have incorrectly put/left internal data in your comments/files (i.e. internal bug tracking tool IDs, product names etc) ___ You have not given any evidence of testing beyond basic build tests. Demonstrate some level of runtime or other sanity testing. ___ You have ^M present in some of your files. These have to be removed. ___ You have needlessly changed whitespace or added whitespace crimes like trailing spaces, or spaces before tabs. ___ You have mixed real technical changes with whitespace and other cosmetic code cleanup changes. These have to be separate commits. ___ You need to refactor your submission into logical chunks; there is too much content into a single commit. ___ You have extraneous garbage in your review (merge commits etc) ___ You have giant attachments which should never have been sent; Instead you should place your content in a public tree to be pulled. ___ You have too many commits attached to an e-mail; resend as threaded commits, or place in a public tree for a pull. ___ You have resent this content multiple times without a clear indication of what has changed between each re-send. ___ You have failed to adequately and individually address all of the comments and change requests that were proposed in the initial review. ___ You have a misconfigured ~/.gitconfig file (i.e. user.name, user.email etc) ___ Your computer have a badly configured date and time; confusing the the threaded patch review. ___ Your changes affect IPC mechanism, and you don't present any results for in-service upgradability test. ___ Your changes affect user manual and documentation, your patch series do not contain the patch that updates the Doxygen manual.
[devel] [PATCH 1/1] nid: restart opensafd on failure when systemd enabled [#2839]
Under certain circumstances opensafd fails to start (immnd or dtmd crashes, etc). Apr 19 15:07:31 ams-idsp-46-novnfm osafdtmd[3315]: src/dtm/dtmnd/dtm_intra_svc.cc:1778: dtm_process_internode_service_up_msg: Assertion '0' failed. We can tell systemd to restart opensafd if it fails to start. --- src/nid/opensafd.service.in | 2 ++ 1 file changed, 2 insertions(+) diff --git a/src/nid/opensafd.service.in b/src/nid/opensafd.service.in index 7f4d75ee3..6050f5e88 100644 --- a/src/nid/opensafd.service.in +++ b/src/nid/opensafd.service.in @@ -12,5 +12,7 @@ ControlGroup=cpu:/ TimeoutStartSec=3hours KillMode=none @systemdtasksmax@ +Restart=on-failure + [Install] WantedBy=multi-user.target -- 2.13.6
[devel] [PATCH 0/1] Review Request for amfd: if rootCauseEntity is PLM entity don't engage lock/lock-in [#2835]
Summary: amfd: if rootCauseEntity is PLM entity don't engage lock/lock-in [#2835]
Review request for Ticket(s): 2835
Peer Reviewer(s): Hans, Gary, Ravi
Pull request to:
Affected branch(es): develop
Development branch: ticket-2835
Base revision: d6d899c39d15a91614ce2a350010c8634134ba0c
Personal repository: git://git.code.sf.net/u/trguitar/review

Impacted area       Impact y/n
------------------------------
 Docs                    n
 Build system            n
 RPM/packaging           n
 Configuration files     n
 Startup scripts         n
 SAF services            y
 OpenSAF services        n
 Core libraries          n
 Samples                 n
 Tests                   n
 Other                   n

Comments (indicate scope for each "y" above):
---------------------------------------------
This patch should be reviewed/tested with the patch from ticket 2834.

revision 9e09af922cf88a56ee4984abe46b01f363117e30
Author: Alex Jones <ajo...@rbbn.com>
Date:   Wed, 18 Apr 2018 09:08:41 -0400

amfd: if rootCauseEntity is PLM entity don't engage lock/lock-in [#2835]

When using PLM an AMF node mapped to a CLM node mapped to a PLM EE, can get
stuck in locked state when rebooting, or going through a PLM EE lock/unlock.

When amfd receives a START step from CLM tracking it attempts to gracefully
shutdown the AMF node using AMF admin operations lock/lock-in. When PLM is
involved this doesn't always work correctly because PLM is also shutting
down the node by calling "opensafd stop". There is a race condition between
PLM using "opensafd stop", and amfd using the admin operations to bring down
the node, so that sometimes the AMF node gets stuck in locked state.

If the rootCauseEntity in the CLM tracking is a PLM entity then don't do
anything, as "opensafd stop" is already being called.
Complete diffstat: -- src/amf/amfd/clm.cc | 25 - 1 file changed, 24 insertions(+), 1 deletion(-) Testing Commands: - 1) lock a PLM EE Testing, Expected Results: -- 2) amfd should not engage lock/lock-in for the AMF node, when START step is received from CLM tracking Conditions of Submission: - Apr 24, or ack from developer Arch Built StartedLinux distro --- mipsn n mips64 n n x86 n n x86_64 y y powerpc n n powerpc64 n n
[devel] [PATCH 1/1] amfd: if rootCauseEntity is PLM entity don't engage lock/lock-in [#2835]
When using PLM an AMF node mapped to a CLM node mapped to a PLM EE, can get
stuck in locked state when rebooting, or going through a PLM EE lock/unlock.

When amfd receives a START step from CLM tracking it attempts to gracefully
shutdown the AMF node using AMF admin operations lock/lock-in. When PLM is
involved this doesn't always work correctly because PLM is also shutting down
the node by calling "opensafd stop". There is a race condition between PLM
using "opensafd stop", and amfd using the admin operations to bring down the
node, so that sometimes the AMF node gets stuck in locked state.

If the rootCauseEntity in the CLM tracking is a PLM entity then don't do
anything, as "opensafd stop" is already being called.
---
 src/amf/amfd/clm.cc | 25 -
 1 file changed, 24 insertions(+), 1 deletion(-)

diff --git a/src/amf/amfd/clm.cc b/src/amf/amfd/clm.cc
index 2bcea2db0..7f675d8e9 100644
--- a/src/amf/amfd/clm.cc
+++ b/src/amf/amfd/clm.cc
@@ -274,6 +274,27 @@ static void clm_track_cb(
             TRACE_3("Already got callback for start of this change.");
             continue;
           }
+
+          if (strncmp(osaf_extended_name_borrow(rootCauseEntity),
+                      "safEE=",
+                      sizeof("safEE=") - 1) == 0 ||
+              strncmp(osaf_extended_name_borrow(rootCauseEntity),
+                      "safHE=",
+                      sizeof("safHE=") - 1) == 0) {
+            // PLM will take care of calling opensafd stop
+            TRACE("rootCause: %s from PLM operation so skipping %u",
+                  osaf_extended_name_borrow(rootCauseEntity),
+                  notifItem->clusterNode.nodeId);
+
+            SaAisErrorT rc(saClmResponse_4(avd_cb->clmHandle,
+                                           invocation,
+                                           SA_CLM_CALLBACK_RESPONSE_OK));
+            if (rc != SA_AIS_OK)
+              LOG_ER("saClmResponse_4 failed: %i", rc);
+
+            continue;
+          }
+
           /* invocation to be used by pending clm response */
           node->clm_pend_inv = invocation;
           clm_node_exit_start(node, notifItem->clusterChange);
@@ -304,7 +325,9 @@ static void clm_track_cb(
                 osaf_extended_name_borrow(rootCauseEntity),
                 notifItem->clusterNode.nodeId);
           if (strncmp(osaf_extended_name_borrow(rootCauseEntity),
-                      "safEE=", 6) == 0) {
+                      "safEE=", 6) == 0 ||
+              strncmp(osaf_extended_name_borrow(rootCauseEntity),
+                      "safHE=", 6) == 0) {
             /* This callback is because of operation on PLM, so we need to
                mark the node absent, because PLCD will anyway call opensafd stop.*/
             AVD_AVND *node =
--
2.13.6

--
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
_______________________________________________
Opensaf-devel mailing list
Opensaf-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-devel
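For readers skimming the patch above, the check it adds boils down to a DN prefix test on the root cause entity. The following is only a minimal sketch of that idea in plain C; the helper names dn_has_prefix and root_cause_is_plm_entity are hypothetical and do not exist in OpenSAF, and the real code borrows the string from an SaNameT via osaf_extended_name_borrow() first.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: true when dn starts with the literal prefix. */
static bool dn_has_prefix(const char *dn, const char *prefix)
{
    assert(dn != NULL && prefix != NULL);
    return strncmp(dn, prefix, strlen(prefix)) == 0;
}

/* True when the root cause DN names a PLM execution environment (safEE=)
 * or a PLM hardware element (safHE=), the two prefixes the patch tests. */
static bool root_cause_is_plm_entity(const char *root_cause_dn)
{
    return dn_has_prefix(root_cause_dn, "safEE=") ||
           dn_has_prefix(root_cause_dn, "safHE=");
}
```

Using strlen(prefix) (or sizeof(literal) - 1, as the first hunk does) keeps the compared length tied to the prefix itself, which is less fragile than the hard-coded 6 that the second hunk keeps.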
[devel] [PATCH 1/1] plmd: use virDomainDestroy and virDomainCreate to reset VM [#2836]
Abrupt restart or unlock-in of child EE does not always work.

virDomainReset() does not always work.

Use virDomainDestroy() and virDomainCreate() instead.
---
 src/plm/plmd/plms_virt.cc | 16 ++--
 1 file changed, 14 insertions(+), 2 deletions(-)

diff --git a/src/plm/plmd/plms_virt.cc b/src/plm/plmd/plms_virt.cc
index 2fd735ac0..0bf11e5a8 100644
--- a/src/plm/plmd/plms_virt.cc
+++ b/src/plm/plmd/plms_virt.cc
@@ -922,8 +922,20 @@ int PlmsVm::instantiate(virDomainPtr domain) {
 }
 
 int PlmsVm::restart(virDomainPtr domain) {
-  TRACE("calling virDomainReset to restart vm");
-  return virDomainReset(domain, 0);
+  TRACE("calling virDomainDestroy and virDomainCreate to restart vm");
+  int rc(-1);
+
+  do {
+    rc = virDomainDestroy(domain);
+
+    if (rc < 0) break;
+
+    rc = virDomainCreate(domain);
+
+    if (rc < 0) break;
+  } while (false);
+
+  return rc;
 }
 
 int PlmsVm::isolate(virDomainPtr domain) {
--
2.13.6
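The control flow of the new restart() can be illustrated without libvirt. The sketch below is an assumption-laden stand-in, not the plmd code: vm_op_fn, vm_hard_restart, op_ok and op_fail are hypothetical names, and the two function pointers merely play the role of virDomainDestroy() and virDomainCreate(), which return 0 on success and a negative value on failure.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a hypervisor call: 0 on success, negative on
 * failure, mirroring the return convention of the libvirt calls. */
typedef int (*vm_op_fn)(void *domain);

/* Hard-restart a VM: power it off, then boot it again. The boot is only
 * attempted if the power-off succeeded; the first failing return code is
 * handed back to the caller. */
static int vm_hard_restart(void *domain, vm_op_fn destroy, vm_op_fn create)
{
    assert(destroy != NULL && create != NULL);

    int rc = destroy(domain);
    if (rc < 0)
        return rc;

    return create(domain);
}

/* Test doubles: the "domain" is just a call counter in this sketch. */
static int op_ok(void *domain) { ++*(int *)domain; return 0; }
static int op_fail(void *domain) { ++*(int *)domain; return -1; }
```

Early returns give the same behaviour as the do { ... } while (false) construct in the patch; which form reads better is a matter of taste.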
[devel] [PATCH 0/1] Review Request for plmd: use virDomainDestroy and virDomainCreate to reset VM [#2836]
Summary: plmd: use virDomainDestroy and virDomainCreate to reset VM [#2836]
Review request for Ticket(s): 2836
Peer Reviewer(s): Mathi, Ravi
Pull request to:
Affected branch(es): develop
Development branch: ticket-2836
Base revision: b13a65123bfddcc6f5105fe340131e3bd8a5ac70
Personal repository: git://git.code.sf.net/u/trguitar/review

--------------------------------
Impacted area       Impact y/n
--------------------------------
 Docs                    n
 Build system            n
 RPM/packaging           n
 Configuration files     n
 Startup scripts         n
 SAF services            y
 OpenSAF services        n
 Core libraries          n
 Samples                 n
 Tests                   n
 Other                   n

Comments (indicate scope for each "y" above):
---------------------------------------------

revision 56a0e35daf04083c5fb76270dbf0163b03500d58
Author: Alex Jones <ajo...@rbbn.com>
Date:   Thu, 12 Apr 2018 13:05:19 -0400

plmd: use virDomainDestroy and virDomainCreate to reset VM [#2836]

Abrupt restart or unlock-in of child EE does not always work.

virDomainReset() does not always work.

Use virDomainDestroy() and virDomainCreate() instead.

Complete diffstat:
------------------
 src/plm/plmd/plms_virt.cc | 16 ++--
 1 file changed, 14 insertions(+), 2 deletions(-)

Testing Commands:
-----------------
1) Do lock/lock-in/unlock-in/unlock of child EE 50 times.

Testing, Expected Results:
--------------------------
1) The commands should never fail.

Conditions of Submission:
-------------------------
Apr 18 or ack from developer

Arch        Built   Started   Linux distro
-------------------------------------------
mips          n        n
mips64        n        n
x86           n        n
x86_64        y        y
powerpc       n        n
powerpc64     n        n
[devel] [PATCH 0/1] Review Request for clmd: pass rootCauseEntity from PLM tracking to CLM tracking clients [#2834]
Summary: clmd: pass rootCauseEntity from PLM tracking to CLM tracking clients [#2834]
Review request for Ticket(s): 2834
Peer Reviewer(s): Anders, Mathi
Pull request to:
Affected branch(es): develop
Development branch: ticket-2834
Base revision: aff54ff091727f27830443332b830890668749cf
Personal repository: git://git.code.sf.net/u/trguitar/review

--------------------------------
Impacted area       Impact y/n
--------------------------------
 Docs                    n
 Build system            n
 RPM/packaging           n
 Configuration files     n
 Startup scripts         n
 SAF services            y
 OpenSAF services        n
 Core libraries          n
 Samples                 n
 Tests                   n
 Other                   n

Comments (indicate scope for each "y" above):
---------------------------------------------

revision 62589757a43679a24bc4c1f863a68346a23b5a37
Author: Alex Jones <ajo...@rbbn.com>
Date:   Thu, 12 Apr 2018 10:53:19 -0400

clmd: pass rootCauseEntity from PLM tracking to CLM tracking clients [#2834]

CLM tracking clients have no context for the tracking callback.

PLM rootCauseEntity is not passed by CLM to its own tracking clients.

When CLM tracking is invoked because of PLM tracking, pass on the
rootCauseEntity.

Complete diffstat:
------------------
 src/clm/clmd/clms_evt.cc  |  4 +--
 src/clm/clmd/clms_imm.cc  | 80 ++-
 src/clm/clmd/clms_imm.h   |  9 --
 src/clm/clmd/clms_plm.cc  |  3 +-
 src/clm/clmd/clms_util.cc | 13
 5 files changed, 69 insertions(+), 40 deletions(-)

Testing Commands:
-----------------
1) Create a CLM tracking client.
2) Using PLM, lock a parent (host) EE, that also has child EEs.

Testing, Expected Results:
--------------------------
1) rootCauseEntity of host should be passed in the tracking callback
2) all EEs (child and parent) should be present in the notification

Conditions of Submission:
-------------------------
Apr 18 or ack from developer

Arch        Built   Started   Linux distro
-------------------------------------------
mips          n        n
mips64        n        n
x86           n        n
x86_64        y        y
powerpc       n        n
powerpc64     n        n
Re: [devel] [PATCH 1/1] msg: updated the assert condition , to avoid core [#2802]
Ack.

Alex

On 04/03/2018 06:46 AM, srinivas wrote:

NOTICE: This email was received from an EXTERNAL sender

---
 src/msg/apitest/test_MetaDataSize.cc | 13 -
 1 file changed, 8 insertions(+), 5 deletions(-)

diff --git a/src/msg/apitest/test_MetaDataSize.cc b/src/msg/apitest/test_MetaDataSize.cc
index f99b02b..16efe69 100644
--- a/src/msg/apitest/test_MetaDataSize.cc
+++ b/src/msg/apitest/test_MetaDataSize.cc
@@ -6,6 +6,7 @@
 #include
 #include
 #include
+#include "msg/agent/mqa.h"
 #include "msg/apitest/msgtest.h"
 #include "msg/apitest/tet_mqsv.h"
 #include
@@ -65,12 +66,14 @@ static void metaDataSize_05(void) {
   SaUint32T metaDataSize;
 
   rc = saMsgMetadataSizeGet(msgHandle, );
+  if(rc == SA_AIS_OK){
+    if (metaDataSize != sizeof(MQSV_MESSAGE) +
+        sizeof(NCS_OS_MQ_MSG_LL_HDR))
+      rc = SA_AIS_ERR_MESSAGE_ERROR;
+  }
+  if (rc == SA_AIS_OK)
+    rc = saMsgFinalize(msgHandle);
   aisrc_validate(rc, SA_AIS_OK);
-
-  assert(metaDataSize == 344);
-
-  rc = saMsgFinalize(msgHandle);
-  assert(rc == SA_AIS_OK);
 }
 
 __attribute__((constructor)) static void metaDataSize_constructor(void) {
--
2.7.4
[devel] [PATCH 0/1] Review Request for plmd: handle admin-operation-pending for EE unlock [#2819]
Summary: plmd: handle admin-operation-pending for EE unlock [#2819]
Review request for Ticket(s): 2819
Peer Reviewer(s): Mathi, Ravi
Pull request to:
Affected branch(es): develop
Development branch: ticket-2819
Base revision: 9c846d28a5dac616b2619d1fe274105d463d0d20
Personal repository: git://git.code.sf.net/u/trguitar/review

--------------------------------
Impacted area       Impact y/n
--------------------------------
 Docs                    n
 Build system            n
 RPM/packaging           n
 Configuration files     n
 Startup scripts         n
 SAF services            y
 OpenSAF services        n
 Core libraries          n
 Samples                 n
 Tests                   n
 Other                   n

Comments (indicate scope for each "y" above):
---------------------------------------------

revision ae59ca0e4d33b97d3fbc28d531452e391afe488a
Author: Alex Jones <ajo...@rbbn.com>
Date:   Mon, 26 Mar 2018 11:10:51 -0400

plmd: handle admin-operation-pending for EE unlock [#2819]

If EE unlock fails, it is never retried when management is regained. The EE
just sits in LOCKED admin state.

If EE unlock fails, the code continues as if it did succeed, setting readiness
state to in-service, etc.

If EE unlock fails, just return ERR_DEPLOYMENT immediately, and don't set
anything else.

Complete diffstat:
------------------
 src/plm/plmd/plms_adm_fsm.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

Testing Commands:
-----------------
1) lock, lock-in, unlock-in, unlock of an EE in a loop waiting for the SUs to
   come online before starting again

Testing, Expected Results:
--------------------------
1) if unlock returns ERR_DEPLOYMENT, the EE should unlock when plmd receives
   the connection from plmcd

Conditions of Submission:
-------------------------
Apr 1 or ack from developer

Arch        Built   Started   Linux distro
-------------------------------------------
mips          n        n
mips64        n        n
x86           n        n
x86_64        y        y
powerpc       n        n
powerpc64     n        n
[devel] [PATCH 1/1] plmd: handle admin-operation-pending for EE unlock [#2819]
If EE unlock fails, it is never retried when management is regained. The EE
just sits in LOCKED admin state.

If EE unlock fails, the code continues as if it did succeed, setting readiness
state to in-service, etc.

If EE unlock fails, just return ERR_DEPLOYMENT immediately, and don't set
anything else.
---
 src/plm/plmd/plms_adm_fsm.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/src/plm/plmd/plms_adm_fsm.c b/src/plm/plmd/plms_adm_fsm.c
index 370c30f36..fdcd6ea05 100644
--- a/src/plm/plmd/plms_adm_fsm.c
+++ b/src/plm/plmd/plms_adm_fsm.c
@@ -4437,10 +4437,9 @@ static SaUint32T plms_ent_unlock(PLMS_ENTITY *ent, PLMS_TRACK_INFO *trk_info,
 		/* Unlock the EE.*/
 		unlck_err = plms_ee_unlock(ent, true, 1 /*mngt_cbk*/);
 		if (NCSCC_RC_SUCCESS != unlck_err) {
-			/* TODO: Should I return from here, sending failure to
-			   IMM and calling management lost callback.*/
 			LOG_ER("EE unlock operation failed. Ent: %s",
 			       ent->dn_name_str);
+			goto send_rsp;
 		}
 	}
 
@@ -4548,6 +4547,8 @@ static SaUint32T plms_ent_unlock(PLMS_ENTITY *ent, PLMS_TRACK_INFO *trk_info,
 
 	plms_ent_exp_rdness_status_clear(ent);
 	plms_aff_ent_exp_rdness_status_clear(trk_info->aff_ent_list);
+
+send_rsp:
 	/* Respnd to IMM.*/
 	if (NCSCC_RC_SUCCESS == unlck_err) {
 		ret_err = saImmOiAdminOperationResult(cb->oi_hdl, adm_op.inv_id,
--
2.13.6
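The shape of this fix, jumping past the success-only state updates to a single response step, can be sketched in isolation. Everything named below (struct entity, enum op_result, entity_unlock, the unlock_ok flag) is hypothetical and only mimics the structure of plms_ent_unlock(); it is not the plmd implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum op_result { OP_OK = 0, OP_ERR_DEPLOYMENT = 1 };

/* Hypothetical entity state, standing in for the readiness bookkeeping. */
struct entity {
    bool in_service;
    enum op_result last_response;
};

/* Unlock an entity. unlock_ok stands in for the outcome of the real unlock
 * call. On failure, control jumps straight to the response step so that no
 * state is updated as if the unlock had succeeded. */
static enum op_result entity_unlock(struct entity *ent, bool unlock_ok)
{
    enum op_result rc = OP_OK;

    assert(ent != NULL);

    if (!unlock_ok) {
        rc = OP_ERR_DEPLOYMENT;
        goto send_rsp;
    }

    /* Success path only: mark the entity in service. */
    ent->in_service = true;

send_rsp:
    /* Single place where the result is reported back. */
    ent->last_response = rc;
    return rc;
}
```

Keeping one response label means the failure path and the success path cannot drift apart in how they answer the caller, which appears to be the point of the send_rsp label in the patch.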
Re: [devel] [PATCH 1/1] msg: updated the assert condition , to avoid core [#2802]
Hi Srinivas,

Two comments:

1. Put the new include file before the above "msg/..." files, so it is in
   alphabetical order.
2. Change the test so there is only one aisrc_validate call in it. Otherwise,
   2 PASSED show up for the test.

Alex

On 03/26/2018 07:23 AM, srinivas wrote:

NOTICE: This email was received from an EXTERNAL sender

---
 src/msg/apitest/test_MetaDataSize.cc | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/src/msg/apitest/test_MetaDataSize.cc b/src/msg/apitest/test_MetaDataSize.cc
index f99b02b..6d7375a 100644
--- a/src/msg/apitest/test_MetaDataSize.cc
+++ b/src/msg/apitest/test_MetaDataSize.cc
@@ -10,6 +10,7 @@
 #include "msg/apitest/tet_mqsv.h"
 #include
 #include
+#include "msg/agent/mqa.h"
 
 static SaVersionT msg3_1 = {'B', 3, 0};
 
@@ -67,7 +68,10 @@ static void metaDataSize_05(void) {
   rc = saMsgMetadataSizeGet(msgHandle, );
   aisrc_validate(rc, SA_AIS_OK);
 
-  assert(metaDataSize == 344);
+  if (metaDataSize != sizeof(MQSV_MESSAGE) +
+      sizeof(NCS_OS_MQ_MSG_LL_HDR))
+    rc = SA_AIS_ERR_MESSAGE_ERROR;
+  aisrc_validate(rc, SA_AIS_OK);
 
   rc = saMsgFinalize(msgHandle);
   assert(rc == SA_AIS_OK);
--
2.7.4
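The second review comment (one verdict per test case) combined with the patch's switch from a magic number to sizeof can be sketched as follows. All names here are hypothetical: the two structs are placeholders rather than the real MQSV_MESSAGE and NCS_OS_MQ_MSG_LL_HDR layouts, and report_verdict only counts calls in place of the test framework's aisrc_validate.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the two headers whose sizes add up to the
 * metadata size; the real layouts differ. */
struct msg_body_hdr { unsigned long type; char reserved[64]; };
struct msg_ll_hdr { long mtype; unsigned int len; };

enum check_rc { CHECK_OK = 0, CHECK_ERR_API = 1, CHECK_ERR_SIZE = 2 };

/* Hypothetical reporter: counts how many verdicts a test run emits. */
struct reporter { int verdicts; int failures; };

/* Derive the expectation from the types instead of hard-coding a number
 * that changes with the platform ABI. */
static size_t expected_metadata_size(void)
{
    return sizeof(struct msg_body_hdr) + sizeof(struct msg_ll_hdr);
}

static void report_verdict(struct reporter *r, enum check_rc rc)
{
    assert(r != NULL);
    r->verdicts++;
    if (rc != CHECK_OK)
        r->failures++;
}

/* Fold the API result and the size comparison into one return code, then
 * report exactly once. */
static void run_metadata_size_case(struct reporter *r, int api_ok,
                                   size_t reported)
{
    enum check_rc rc = api_ok ? CHECK_OK : CHECK_ERR_API;

    if (rc == CHECK_OK && reported != expected_metadata_size())
        rc = CHECK_ERR_SIZE;

    report_verdict(r, rc); /* the only verdict for this case */
}
```

With this shape each case contributes exactly one PASSED or FAILED line, whichever check trips first.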
[devel] [PATCH 0/1] Review Request for plmd: connect to hypervisor after middleware switchover [#2817]
Summary: plmd: connect to hypervisor after middleware switchover [#2817]
Review request for Ticket(s): 2817
Peer Reviewer(s): Mathi, Ravi
Pull request to:
Affected branch(es): develop
Development branch: ticket-2817
Base revision: 97a895449e41b65da5d32c15aedc7a004cbd74b5
Personal repository: git://git.code.sf.net/u/trguitar/review

--------------------------------
Impacted area       Impact y/n
--------------------------------
 Docs                    n
 Build system            n
 RPM/packaging           n
 Configuration files     n
 Startup scripts         n
 SAF services            y
 OpenSAF services        n
 Core libraries          n
 Samples                 n
 Tests                   n
 Other                   n

Comments (indicate scope for each "y" above):
---------------------------------------------
This patch replaces the previous one, as the previous patch did not handle
admin-operation-pending for child EEs while the parent EE was not available.

revision 28094fa2491d458478491d6343f0be4fb5ecdbd7
Author: Alex Jones <ajo...@rbbn.com>
Date:   Thu, 22 Mar 2018 20:46:14 -0400

plmd: connect to hypervisor after middleware switchover [#2817]

Any PLM admin operation which requires hypervisor assistance (e.g. unlock-in,
abrupt restart) will fail after middleware switchover.

When plmcds are reconnecting to the new active plmd, the plmd does not attempt
to connect to the hypervisor if the EE is a virtual machine monitor.

Connect to the hypervisor when the virtual machine monitor EE reconnects, and
perform any admin-pending-operations that occurred while the hypervisor was
out of contact.

Complete diffstat:
------------------
 src/plm/common/plms.h       |   4 +
 src/plm/plmd/plms_adm_fsm.c | 213
 src/plm/plmd/plms_plmc.c    |  36
 3 files changed, 155 insertions(+), 98 deletions(-)

Testing Commands:
-----------------
1) do a middleware switchover
2) do a "unlock-in" or "abrupt restart" on a VM EE

Testing, Expected Results:
--------------------------
1) operation should succeed

Conditions of Submission:
-------------------------
Mar 28, or ack from developer

Arch        Built   Started   Linux distro
-------------------------------------------
mips          n        n
mips64        n        n
x86           n        n
x86_64        y        y
powerpc       n        n
powerpc64     n        n
[devel] [PATCH 1/1] plmd: connect to hypervisor after middleware switchover [#2817]
Any PLM admin operation which requires hypervisor assistance (e.g. unlock-in,
abrupt restart) will fail after middleware switchover.

When plmcds are reconnecting to the new active plmd, the plmd does not attempt
to connect to the hypervisor if the EE is a virtual machine monitor.

Connect to the hypervisor when the virtual machine monitor EE reconnects, and
perform any admin-pending-operations that occurred while the hypervisor was
out of contact.
---
 src/plm/common/plms.h       |   4 +
 src/plm/plmd/plms_adm_fsm.c | 213
 src/plm/plmd/plms_plmc.c    |  36
 3 files changed, 155 insertions(+), 98 deletions(-)

diff --git a/src/plm/common/plms.h b/src/plm/common/plms.h
index 57c7e374d..5041663c5 100644
--- a/src/plm/common/plms.h
+++ b/src/plm/common/plms.h
@@ -409,6 +409,7 @@ typedef enum {
 	PLMS_MNGT_EE_UNLOCK,
 	PLMS_MNGT_EE_TERM,
 	PLMS_MNGT_EE_RESTART,
+	PLMS_MNGT_EE_RESTART_ABRUPT,
 	PLMS_MNGT_EE_GET_OS_INFO,
 	PLMS_MNGT_EE_INST,
 	PLMS_MNGT_EE_ISOLATE,
@@ -547,6 +548,9 @@ SaUint32T plms_imm_adm_op_req_process(PLMS_EVT *);
 SaUint32T plms_cbk_response_process(PLMS_EVT *);
 void plms_deact_completed_cbk_call(PLMS_ENTITY *, PLMS_TRACK_INFO *);
 void plms_deact_start_cbk_call(PLMS_ENTITY *, PLMS_TRACK_INFO *);
+void plms_post_abrupt_restart(PLMS_ENTITY *,
+			      PLMS_EVT *,
+			      PLMS_GROUP_ENTITY *aff_ent_list);
 
 /* Function declaration from plms_utils.c*/
 SaUint32T plms_readiness_impact_process(PLMS_EVT *);
diff --git a/src/plm/plmd/plms_adm_fsm.c b/src/plm/plmd/plms_adm_fsm.c
index 370c30f36..5ab8db65d 100644
--- a/src/plm/plmd/plms_adm_fsm.c
+++ b/src/plm/plmd/plms_adm_fsm.c
@@ -5510,6 +5510,116 @@ void plms_deact_completed_cbk_call(PLMS_ENTITY *ent, PLMS_TRACK_INFO *trk_info)
 	return;
 }
 
+void plms_post_abrupt_restart(PLMS_ENTITY *ent,
+			      PLMS_EVT *evt,
+			      PLMS_GROUP_ENTITY *aff_ent_list) {
+	SaUint32T count = 0;
+	PLMS_GROUP_ENTITY *head = 0;
+	PLMS_ENTITY_GROUP_INFO_LIST *log_head_grp = 0;
+	PLMS_TRACK_INFO *trk_info = 0;
+
+	TRACE_ENTER();
+
+	/* Admin operation started. */
+	ent->adm_op_in_progress = SA_PLM_CAUSE_EE_RESTART;
+	ent->am_i_aff_ent = true;
+	plms_aff_ent_flag_mark_unmark(aff_ent_list, true);
+
+	/* Take care of target EE. */
+	plms_presence_state_set(ent, SA_PLM_EE_PRESENCE_INSTANTIATING, NULL,
+				SA_NTF_MANAGEMENT_OPERATION,
+				SA_PLM_NTFID_STATE_CHANGE_ROOT);
+
+	plms_readiness_state_set(ent, SA_PLM_READINESS_OUT_OF_SERVICE, NULL,
+				 SA_NTF_MANAGEMENT_OPERATION,
+				 SA_PLM_NTFID_STATE_CHANGE_ROOT);
+	count++;
+
+	/* Get the trk_info ready.*/
+	trk_info = (PLMS_TRACK_INFO *)calloc(1, sizeof(PLMS_TRACK_INFO));
+	trk_info->root_entity = ent;
+	ent->trk_info = trk_info;
+
+	/* Reset all the dependent EEs.*/
+	head = aff_ent_list;
+	while (head) {
+		SaUint32T ret_err =
+		    plms_ee_reboot(head->plm_entity, false, true);
+
+		if (NCSCC_RC_SUCCESS == ret_err) {
+			plms_presence_state_set(
+			    head->plm_entity, SA_PLM_EE_PRESENCE_UNINSTANTIATED,
+			    ent, SA_NTF_MANAGEMENT_OPERATION,
+			    SA_PLM_NTFID_STATE_CHANGE_DEP);
+			head->plm_entity->trk_info = trk_info;
+			count++;
+		} else {
+			LOG_ER("EE reset failed. Ent: %s",
+			       head->plm_entity->dn_name_str);
+		}
+		plms_readiness_state_set(
+		    head->plm_entity, SA_PLM_READINESS_OUT_OF_SERVICE, ent,
+		    SA_NTF_MANAGEMENT_OPERATION, SA_PLM_NTFID_STATE_CHANGE_DEP);
+		plms_readiness_flag_mark_unmark(
+		    head->plm_entity, SA_PLM_RF_DEPENDENCY, 1 /* mark */, ent,
+		    SA_NTF_MANAGEMENT_OPERATION, SA_PLM_NTFID_STATE_CHANGE_DEP);
+		head = head->next;
+	}
+
+	plms_aff_ent_exp_rdness_state_ow(aff_ent_list);
+	plms_ent_exp_rdness_state_ow(ent);
+
+	trk_info->aff_ent_list = aff_ent_list;
+
+	/* Add the groups, root entity(ent) belong to.*/
+	plms_ent_grp_list_add(ent, &(trk_info->group_info_list));
+
+	/* Find out all the groups, all affected entities belong to and add
+	   the groups to trk_info->group_info_list.*/
+	plms_ent_list_grp_list_add(aff_ent_list, &(trk_info->group_info_list));
+
+	TRACE("Affected groups for ent %s: ", ent->dn_name_str);
+	log_head_grp = trk_info->group_info_list;
+	while (log_head_grp) {
+		TRACE("%llu,", log_head_grp->ent_grp_inf->entity_grp_hdl);
+		log_head_grp = log_head_grp->next;
[devel] [PATCH 1/1] plmd: connect to hypervisor after middleware switchover [#2817]
After a middleware switchover, EE admin commands that need hypervisor support
do not work (e.g. unlock-in, abrupt restart).

After the switchover, the plmcds on the different nodes reconnect to the new
plmd. But, the new plmd does not make any contact with the hypervisors. So,
the commands fail.

When a parent EE reconnects to the new plmd after a middleware switchover,
connect to the hypervisor.
---
 src/plm/plmd/plms_plmc.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/src/plm/plmd/plms_plmc.c b/src/plm/plmd/plms_plmc.c
index 1f0cef609..2fa1f5a45 100644
--- a/src/plm/plmd/plms_plmc.c
+++ b/src/plm/plmd/plms_plmc.c
@@ -402,6 +402,11 @@ SaUint32T plms_plmc_tcp_connect_process(PLMS_ENTITY *ent)
 
 	if (plms_is_rdness_state_set(ent, SA_PLM_READINESS_IN_SERVICE)) {
 		TRACE("Ent %s is already in insvc.", ent->dn_name_str);
+
+		/* if this is a parent EE, connect to the hypervisor */
+		if (ent->leftmost_child)
+			plms_ee_hypervisor_instantiated(ent);
+
 		return NCSCC_RC_SUCCESS;
 	}
 	/*If previous state is not instantiating/intantiated, then get os info
--
2.13.6
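The decision this patch adds on reconnect can be shown with a stripped-down model. The struct and function names below (struct ee, connect_hypervisor, on_agent_reconnect) are hypothetical; only the leftmost_child test mirrors the patch, and the real plms_ee_hypervisor_instantiated() does considerably more than set a flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, trimmed-down entity: only the fields this sketch needs. */
struct ee {
    struct ee *leftmost_child;   /* first child EE, NULL if none */
    bool in_service;
    bool hypervisor_connected;
};

/* Stand-in for the real hypervisor connect step. */
static void connect_hypervisor(struct ee *ent)
{
    ent->hypervisor_connected = true;
}

/* Handle a reconnect from an EE's agent. Returns true when the EE was
 * already in service, in which case the only extra work is to reconnect
 * to the hypervisor if this EE is a parent (it hosts child EEs). */
static bool on_agent_reconnect(struct ee *ent)
{
    assert(ent != NULL);

    if (!ent->in_service)
        return false; /* caller continues with the normal bring-up path */

    if (ent->leftmost_child != NULL)
        connect_hypervisor(ent);

    return true;
}
```

Treating "has at least one child" as the marker for a hypervisor host is the same shortcut the patch takes; an entity model with an explicit hypervisor attribute would need a different test.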
[devel] [PATCH 0/1] Review Request for plmd: connect to hypervisor after middleware switchover [#2817]
Summary: plmd: connect to hypervisor after middleware switchover [#2817]
Review request for Ticket(s): 2817
Peer Reviewer(s): Mathi, Ravi
Pull request to:
Affected branch(es): develop
Development branch: ticket-2817
Base revision: dc467e7e143d113bc11445c909bd8520aed6dfd7
Personal repository: git://git.code.sf.net/u/trguitar/review

--------------------------------
Impacted area       Impact y/n
--------------------------------
 Docs                    n
 Build system            n
 RPM/packaging           n
 Configuration files     n
 Startup scripts         n
 SAF services            y
 OpenSAF services        n
 Core libraries          n
 Samples                 n
 Tests                   n
 Other                   n

Comments (indicate scope for each "y" above):
---------------------------------------------
*** EXPLAIN/COMMENT THE PATCH SERIES HERE ***

revision 6042af1f311dc6b6ec270bd0aaa8e570e6477842
Author: Alex Jones <ajo...@rbbn.com>
Date:   Wed, 21 Mar 2018 11:49:55 -0400

plmd: connect to hypervisor after middleware switchover [#2817]

After a middleware switchover, EE admin commands that need hypervisor support
do not work (e.g. unlock-in, abrupt restart).

After the switchover, the plmcds on the different nodes reconnect to the new
plmd. But, the new plmd does not make any contact with the hypervisors. So,
the commands fail.

When a parent EE reconnects to the new plmd after a middleware switchover,
connect to the hypervisor.

Complete diffstat:
------------------
 src/plm/plmd/plms_plmc.c | 5 +
 1 file changed, 5 insertions(+)

Testing Commands:
-----------------
1) do a si-swap of the middleware
2) do abrupt restart of a VM EE

Testing, Expected Results:
--------------------------
1) VM EE should abruptly restart

Conditions of Submission:
-------------------------
Mar 27, or ack from developer

Arch        Built   Started   Linux distro
-------------------------------------------
mips          n        n
mips64        n        n
x86           n        n
x86_64        y        y
powerpc       n        n
powerpc64     n        n
-- Check out the vibrant tech community on one of the world's most engaging tech sites, Slashdot.org! http://sdm.link/slashdot ___ Opensaf-devel mailing list Opensaf-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/opensaf-devel
[devel] [PATCH 0/1] Review Request for msgnd: prevent race condition during q transfer [#2816]
Summary: msgnd: prevent race condition during q transfer [#2816]
Review request for Ticket(s): 2816
Peer Reviewer(s): Srinivas
Pull request to:
Affected branch(es): develop
Development branch: ticket-2816
Base revision: dc467e7e143d113bc11445c909bd8520aed6dfd7
Personal repository: git://git.code.sf.net/u/trguitar/review

--------------------------------
Impacted area       Impact y/n
--------------------------------
 Docs                    n
 Build system            n
 RPM/packaging           n
 Configuration files     n
 Startup scripts         n
 SAF services            y
 OpenSAF services        n
 Core libraries          n
 Samples                 n
 Tests                   n
 Other                   n

Comments (indicate scope for each "y" above):
---------------------------------------------

revision 36e5a1d4fb123862cc442301140f70e8ce10a7c4
Author: Alex Jones <ajo...@rbbn.com>
Date:   Tue, 20 Mar 2018 11:46:42 -0400

msgnd: prevent race condition during q transfer [#2816]

During q transfer when new node is opening the q, msgnd fails to create the
runtime IMM object for the queue, and the open fails.

When the transfer is done, the old side and owner of the runtime object
doesn't delete the IMM object until after the q transfer response is sent.
This is a race condition. If the new side tries to create the runtime object
before the old side has deleted it, the opening of the queue on the new side
fails.

Delete the runtime object before sending the q transfer response.

Complete diffstat:
------------------
 src/msg/msgnd/mqnd_proc.c | 38 +-
 1 file changed, 29 insertions(+), 9 deletions(-)

Testing Commands:
-----------------
See ticket for how to reproduce

Testing, Expected Results:
--------------------------
After at least 100 iterations of q transfer from one node to another, q is
successfully opened all the time.

Conditions of Submission:
-------------------------
Mar 26, or ack from developer

Arch        Built   Started   Linux distro
-------------------------------------------
mips          n        n
mips64        n        n
x86           n        n
x86_64        y        y
powerpc       n        n
powerpc64     n        n
[devel] [PATCH 1/1] msgnd: prevent race condition during q transfer [#2816]
During q transfer when new node is opening the q, msgnd fails to create the
runtime IMM object for the queue, and the open fails.

When the transfer is done, the old side and owner of the runtime object
doesn't delete the IMM object until after the q transfer response is sent.
This is a race condition. If the new side tries to create the runtime object
before the old side has deleted it, the opening of the queue on the new side
fails.

Delete the runtime object before sending the q transfer response.
---
 src/msg/msgnd/mqnd_proc.c | 38 +-
 1 file changed, 29 insertions(+), 9 deletions(-)

diff --git a/src/msg/msgnd/mqnd_proc.c b/src/msg/msgnd/mqnd_proc.c
index ca7dfc8ff..3205d714b 100644
--- a/src/msg/msgnd/mqnd_proc.c
+++ b/src/msg/msgnd/mqnd_proc.c
@@ -468,6 +468,21 @@ reg_req:
 	asapi_msg_free();
 
 send_rsp:
+	/*
+	 * Delete the runtime object before responding, otherwise the other side
+	 * might create it before we have removed it
+	 */
+	rc = immutil_saImmOiRtObjectDelete(cb->immOiHandle,
+					   &qnode->qinfo.queueName);
+
+	if (rc != SA_AIS_OK) {
+		LOG_ER("immutil_saImmOiRtObjectDelete: Deletion of MsgQueue "
+		       "object %s failed: %i",
+		       qnode->qinfo.queueName.value,
+		       rc);
+		return NCSCC_RC_FAILURE;
+	}
+
 	/* Send the response */
 	transfer_rsp.type = MQSV_EVT_MQP_RSP;
 	transfer_rsp.msg.mqp_rsp.type = MQP_EVT_TRANSFER_QUEUE_RSP;
@@ -485,18 +500,23 @@ send_rsp:
 	transfer_rsp.msg.mqp_rsp.error = err;
 
 	rc = mqnd_mds_send_rsp(cb, >sinfo, &transfer_rsp);
-	if (rc != NCSCC_RC_SUCCESS)
+	if (rc != NCSCC_RC_SUCCESS) {
 		TRACE_2(
 		    "Queue Attribute get :Mds Send Response Failed %" PRIx64,
 		    cb->my_dest);
-	else
-		/* delete Message Queue Objetc at IMMSV */
-		if (immutil_saImmOiRtObjectDelete(
-			cb->immOiHandle, &qnode->qinfo.queueName) != SA_AIS_OK) {
-		LOG_ER(
-		    "immutil_saImmOiRtObjectDelete: Deletion of MsgQueue object %s",
-		    qnode->qinfo.queueName.value);
-		return NCSCC_RC_FAILURE;
+
+		/* readd the runtime object which was deleted above */
+		err = mqnd_create_runtime_MsgQobject(
+			(char *)qnode->qinfo.queueName.value,
+			qnode->qinfo.creationTime,
+			qnode,
+			cb->immOiHandle);
+
+		if (err != SA_AIS_OK) {
+			LOG_ER("failed to recreate IMM q object for %s: %i",
+			       qnode->qinfo.queueName.value,
+			       err);
+		}
 	}
 
 	if (mqsv_message_cpy)
--
2.13.6
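The ordering fix in this patch can be illustrated with a small, self-contained sketch. It is not the msgnd code: `struct registry`, `reg_find`, `reg_create`, `reg_delete`, `old_side_finish_transfer`, `send_ok` and `send_fail` are hypothetical stand-ins for the IMM runtime-object store and the MDS send, used only to show why the delete has to precede the response and why the object is re-added when the send fails.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MAX_OBJS 8
#define NAME_LEN 64

/* Hypothetical stand-in for the store that holds runtime objects. */
struct registry {
	char names[MAX_OBJS][NAME_LEN];
	bool used[MAX_OBJS];
};

/* Return the slot holding `name`, or -1 if it is not registered. */
int reg_find(const struct registry *r, const char *name)
{
	for (int i = 0; i < MAX_OBJS; i++)
		if (r->used[i] && strcmp(r->names[i], name) == 0)
			return i;
	return -1;
}

/* Creation fails if the object already exists, as in the race described. */
bool reg_create(struct registry *r, const char *name)
{
	assert(strlen(name) < NAME_LEN);
	if (reg_find(r, name) >= 0)
		return false;
	for (int i = 0; i < MAX_OBJS; i++) {
		if (!r->used[i]) {
			strcpy(r->names[i], name);
			r->used[i] = true;
			return true;
		}
	}
	return false; /* registry full */
}

bool reg_delete(struct registry *r, const char *name)
{
	int i = reg_find(r, name);

	if (i < 0)
		return false;
	r->used[i] = false;
	return true;
}

/* Hypothetical transports: one delivers the response, one fails. */
bool send_ok(void) { return true; }
bool send_fail(void) { return false; }

/*
 * Old owner's side of the transfer: delete first, then respond.
 * If the response cannot be sent, the object is put back so the old
 * side still owns a consistent runtime object.
 */
bool old_side_finish_transfer(struct registry *r, const char *q,
			      bool (*send_rsp)(void))
{
	if (!reg_delete(r, q))
		return false;
	if (!send_rsp()) {
		(void)reg_create(r, q); /* re-add the object deleted above */
		return false;
	}
	return true;
}
```

With this ordering, by the time the new side sees the response and calls `reg_create`, the name is guaranteed to be free; with the old ordering the create could run while the object still existed.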
[devel] [PATCH 1/1] plmd: enable dynamic tracing [#2796]
Dynamic tracing does not work with plmd.

plmd overrides the USR2 signal with its own dump routine.

Remove the signal handler code for USR2 in plmd.
---
 src/plm/plmd/plms_main.c | 20
 1 file changed, 20 deletions(-)

diff --git a/src/plm/plmd/plms_main.c b/src/plm/plmd/plms_main.c
index 23b019444..5de1f461e 100644
--- a/src/plm/plmd/plms_main.c
+++ b/src/plm/plmd/plms_main.c
@@ -70,20 +70,6 @@ static void sigusr1_handler(int sig)
 	ncs_sel_obj_ind(&plms_cb->usr1_sel_obj);
 }
 
-static void usr2_sig_handler(int sig)
-{
-	PLMS_CB *cb = plms_cb;
-	PLMS_EVT *evt;
-	evt = (PLMS_EVT *)malloc(sizeof(PLMS_EVT));
-	memset(evt, 0, sizeof(PLMS_EVT));
-	evt->req_res = PLMS_REQ;
-	evt->req_evt.req_type = PLMS_DUMP_CB_EVT_T;
-	(void)sig;
-	/* Put it in PLMS's Event Queue */
-	m_NCS_IPC_SEND(&cb->mbx, (NCSCONTEXT)evt, NCS_IPC_PRIORITY_HIGH);
-	signal(SIGUSR2, usr2_sig_handler);
-}
-
 /
  * Name : plms_db_init
  *
@@ -327,12 +313,6 @@ static uint32_t plms_init()
 		rc = NCSCC_RC_FAILURE;
 		goto done;
 	}
-	/* Initialize a signal handler for debugging purpose */
-	if ((signal(SIGUSR2, usr2_sig_handler)) == SIG_ERR) {
-		LOG_ER("signal USR2 failed: %s", strerror(errno));
-		rc = NCSCC_RC_FAILURE;
-		goto done;
-	}
 
 	if (!cb->nid_started && plms_amf_register() != NCSCC_RC_SUCCESS) {
 		LOG_ER("AMF Initialization failed");
--
2.13.6
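Why a daemon-specific USR2 handler disables signal-driven tracing can be shown with a minimal sketch. It assumes only POSIX `sigaction` and `raise`; the names `trace_toggle_handler`, `dump_handler`, `install_handler`, `trace_is_enabled` and `dump_count` are hypothetical and do not reproduce the OpenSAF logtrace implementation. A signal has a single disposition per process, so whichever handler is installed last wins.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif

#include <signal.h>
#include <stddef.h>
#include <string.h>

static volatile sig_atomic_t trace_enabled;
static volatile sig_atomic_t dump_requests;

/* Stand-in for a tracing library's handler: each SIGUSR2 toggles tracing. */
void trace_toggle_handler(int sig)
{
	(void)sig;
	trace_enabled = !trace_enabled;
}

/* Stand-in for a daemon's own "dump state" handler on the same signal. */
void dump_handler(int sig)
{
	(void)sig;
	dump_requests++;
}

/* Install `fn` as the disposition for `signo`, replacing any earlier one. */
int install_handler(int signo, void (*fn)(int))
{
	struct sigaction sa;

	memset(&sa, 0, sizeof(sa));
	sa.sa_handler = fn;
	sigemptyset(&sa.sa_mask);
	return sigaction(signo, &sa, NULL);
}

int trace_is_enabled(void) { return trace_enabled; }
int dump_count(void) { return dump_requests; }
```

If `install_handler(SIGUSR2, dump_handler)` runs after the tracing handler was installed, later USR2 signals only reach `dump_handler`, which matches the symptom the patch removes by deleting plmd's own handler.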
[devel] [PATCH 0/1] Review Request for plmd: enable dynamic tracing [#2796]
Summary: plmd: enable dynamic tracing [#2796]
Review request for Ticket(s): 2796
Peer Reviewer(s): Ravi
Pull request to:
Affected branch(es): develop
Development branch: ticket-2796
Base revision: 3587648509bf14d692852d0ce4882377bc0831b5
Personal repository: git://git.code.sf.net/u/trguitar/review

--------------------------------
Impacted area       Impact y/n
--------------------------------
 Docs                    n
 Build system            n
 RPM/packaging           n
 Configuration files     n
 Startup scripts         n
 SAF services            y
 OpenSAF services        n
 Core libraries          n
 Samples                 n
 Tests                   n
 Other                   n

Comments (indicate scope for each "y" above):
---------------------------------------------

revision c75a7990a32d4d0d05bad0ba69e920dd42d780e8
Author: Alex Jones <ajo...@rbbn.com>
Date:   Wed, 7 Mar 2018 15:31:24 -0500

plmd: enable dynamic tracing [#2796]

Dynamic tracing does not work with plmd.

plmd overrides the USR2 signal with its own dump routine.

Remove the signal handler code for USR2 in plmd.

Complete diffstat:
------------------
 src/plm/plmd/plms_main.c | 20
 1 file changed, 20 deletions(-)

Testing Commands:
-----------------
1) send USR2 signal to plmd to enable tracing
2) send another USR2 signal to plmd to disable tracing

Testing, Expected Results:
--------------------------
1) osafplmd file is generated and tracing can be enabled and disabled

Conditions of Submission:
-------------------------
Mar 13 or ack from developer

Arch        Built   Started   Linux distro
-------------------------------------------
mips          n        n
mips64        n        n
x86           n        n
x86_64        y        y
powerpc       n        n
powerpc64     n        n
[devel] [PATCH 0/1] Review Request for msgd: during cold sync don't add tracking entries which already exist [#2793]
Summary: msgd: during cold sync don't add tracking entries which already exist [#2793]
Review request for Ticket(s): 2793
Peer Reviewer(s): Srinivas
Pull request to:
Affected branch(es): develop
Development branch: ticket-2793
Base revision: 5d0175a756c4d7fe47dc8b815725332ca7ca4291
Personal repository: git://git.code.sf.net/u/trguitar/review

--------------------------------
Impacted area       Impact y/n
--------------------------------
 Docs                    n
 Build system            n
 RPM/packaging           n
 Configuration files     n
 Startup scripts         n
 SAF services            y
 OpenSAF services        n
 Core libraries          n
 Samples                 n
 Tests                   n
 Other                   n

Comments (indicate scope for each "y" above):
---------------------------------------------

revision 916b838764c03891c5e35b18626d89aadbb5caf6
Author: Alex Jones <ajo...@rbbn.com>
Date:   Tue, 6 Mar 2018 18:48:49 -0500

msgd: during cold sync don't add tracking entries which already exist [#2793]

Opening of an existing msg q using saMsgQueueOpen (for q failover) may take a
long time.

When cold sync is done, sometimes two MDS cold sync requests are sent by the
standby, so the standby can receive 2 cold syncs. The standby code to process
the cold sync response blindly adds the tracking entries for message queue
groups. If two cold syncs are done, the tracking list can have duplicate
entries. When controllers are rebooted back and forth, this list can get
large (1000s of entries), and if another cluster node is rebooted and a q
needs to move from there, 1000s of duplicate tracking messages are sent by
msgd, which slows down the failover, and saMsgQueueOpen can take a long time.

Fix is to not blindly add tracking entries during cold sync, but only add
them if they are not already there.

Complete diffstat:
------------------
 src/msg/msgd/mqd_mbcsv.c | 17 +++--
 1 file changed, 3 insertions(+), 14 deletions(-)

Testing Commands:
-----------------
1) create a msg q group
2) create 4 msg qs on different nodes and add them to the group
3) send some messages to the group (to enable tracking)
4) open another message q on a different node
5) reboot the controllers back and forth about 20 or 30 times
6) reboot the node with the message q from (4)
7) open the msg q on another node

Testing, Expected Results:
--------------------------
1) step 7 should not take seconds
2) there should not be 1000s of entries in syslog saying "unable to send
   tracking message"

Conditions of Submission:
-------------------------
Mar 12 or ack from developer

Arch        Built   Started   Linux distro
-------------------------------------------
mips          n        n
mips64        n        n
x86           n        n
x86_64        y        y
powerpc       n        n
powerpc64     n        n
[devel] [PATCH 1/1] msgd: during cold sync don't add tracking entries which already exist [#2793]
Opening of an existing msg q using saMsgQueueOpen (for q failover) may take a
long time.

When cold sync is done, sometimes two MDS cold sync requests are sent by the
standby, so the standby can receive 2 cold syncs. The standby code to process
the cold sync response blindly adds the tracking entries for message queue
groups. If two cold syncs are done, the tracking list can have duplicate
entries. When controllers are rebooted back and forth, this list can get
large (1000s of entries), and if another cluster node is rebooted and a q
needs to move from there, 1000s of duplicate tracking messages are sent by
msgd, which slows down the failover, and saMsgQueueOpen can take a long time.

Fix is to not blindly add tracking entries during cold sync, but only add
them if they are not already there.
---
 src/msg/msgd/mqd_mbcsv.c | 17 +++--
 1 file changed, 3 insertions(+), 14 deletions(-)

diff --git a/src/msg/msgd/mqd_mbcsv.c b/src/msg/msgd/mqd_mbcsv.c
index b87b038d9..5b0de15c8 100644
--- a/src/msg/msgd/mqd_mbcsv.c
+++ b/src/msg/msgd/mqd_mbcsv.c
@@ -1057,7 +1057,6 @@ static uint32_t mqd_ckpt_encode_cold_sync_data(MQD_CB *pMqd,
 	MQD_OBJ_NODE *queue_record = 0;
 	MQD_OBJ_INFO queue_obj_info;
 	MQD_A2S_MSG cold_sync_data;
-	SaNameT queue_name;
 	SaNameT queue_index_name;
 	NCS_PATRICIA_NODE *q_node = 0;
 	NCS_LOCK *q_rec_lock = &pMqd->mqd_cb_lock;
@@ -1075,7 +1074,6 @@ static uint32_t mqd_ckpt_encode_cold_sync_data(MQD_CB *pMqd,
 	}
 	memset(&queue_obj_info, 0, sizeof(MQD_OBJ_INFO));
 	memset(&cold_sync_data, 0, sizeof(MQD_A2S_MSG));
-	memset(&queue_name, 0, sizeof(SaNameT));
 	memset(&queue_index_name, 0, sizeof(SaNameT));
 
 	/*First reserve space to store the number of checkpoints that will be
@@ -1388,7 +1386,6 @@ static uint32_t mqd_a2s_make_record_from_coldsync(MQD_CB *pMqd,
 	uint32_t rc = NCSCC_RC_SUCCESS;
 	MQD_OBJ_NODE *q_obj_node = 0, *q_node = 0;
-	MQD_TRACK_OBJ *q_track_obj = 0;
 	uint32_t index = 0;
 	SaNameT record_qindex_name;
 	MQD_OBJECT_ELEM *pOelm = 0;
@@ -1458,17 +1455,9 @@ static uint32_t mqd_a2s_make_record_from_coldsync(MQD_CB *pMqd,
 	/* Filling the track info to the queue database */
 	for (index = 0; index < q_data_msg.track_cnt; index++) {
-		q_track_obj = m_MMGR_ALLOC_MQD_TRACK_OBJ;
-		if (q_track_obj == NULL) {
-			LOG_CR("%s:%u: ERR_MEMORY: Failed To Allocate Memory",
-			       __FILE__, __LINE__);
-			rc = NCSCC_RC_FAILURE;
-			return NCSCC_RC_FAILURE;
-		}
-		memset(q_track_obj, 0, sizeof(MQD_TRACK_OBJ));
-		q_track_obj->dest = q_data_msg.track_info[index].dest;
-		q_track_obj->to_svc = q_data_msg.track_info[index].to_svc;
-		ncs_enqueue(&q_obj_node->oinfo.tlist, q_track_obj);
+		mqd_track_add(&q_obj_node->oinfo.tlist,
+			      &q_data_msg.track_info[index].dest,
+			      q_data_msg.track_info[index].to_svc);
 	}
 	if (new_record)
 		rc = mqd_db_node_add(pMqd, q_obj_node);
--
2.13.6
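The "only add if not already there" idea behind this fix can be sketched with a plain linked list. This is an illustration under assumptions, not the body of `mqd_track_add`, which the patch does not show: `struct track_entry`, `struct track_list`, `track_add_unique` and `track_free` are hypothetical names, and treating the pair (`dest`, `to_svc`) as the identity of an entry is also an assumption.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical tracking entry: destination address plus service id. */
struct track_entry {
	uint64_t dest;
	uint32_t to_svc;
	struct track_entry *next;
};

struct track_list {
	struct track_entry *head;
	size_t count;
};

/*
 * Add (dest, to_svc) only if no equal entry exists.
 * Returns 1 if added, 0 if it was already present, -1 on allocation failure.
 */
int track_add_unique(struct track_list *list, uint64_t dest, uint32_t to_svc)
{
	struct track_entry *e;

	assert(list != NULL);
	for (e = list->head; e != NULL; e = e->next)
		if (e->dest == dest && e->to_svc == to_svc)
			return 0; /* duplicate from a repeated cold sync */

	e = calloc(1, sizeof(*e));
	if (e == NULL)
		return -1;
	e->dest = dest;
	e->to_svc = to_svc;
	e->next = list->head;
	list->head = e;
	list->count++;
	return 1;
}

/* Release every entry and reset the list to empty. */
void track_free(struct track_list *list)
{
	while (list->head != NULL) {
		struct track_entry *e = list->head;

		list->head = e->next;
		free(e);
	}
	list->count = 0;
}
```

Processing the same cold sync data twice then leaves the list unchanged the second time, so the number of tracking messages sent on failover stays bounded by the number of distinct trackers.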
Re: [devel] [PATCH 1/1] cpnd: Correct duration of cpnd_tmr_start in cpnd_proc_update_remote [#2787]
signature.asc Description: OpenPGP digital signature
Re: [devel] [PATCH 0/1] Review Request for cpnd: Correct duration of cpnd_tmr_start in cpnd_proc_update_remote [#2787]
signature.asc Description: OpenPGP digital signature
Re: [devel] [PATCH 0/1] Review Request for msg: implement metadata size and limit fetch operations [#2626]
signature.asc Description: OpenPGP digital signature
[devel] [PATCH 0/1] Review Request for plm: handle race condition for EE instantiation [#2514]
Summary: plm: handle race condition for EE instantiation [#2514]
Review request for Ticket(s): 2514
Peer Reviewer(s): Ravi, Mathi
Pull request to:
Affected branch(es): develop
Development branch: ticket-2514
Base revision: 52de0283e7ae33d948f26f37981f1c141ca0f448
Personal repository: git://git.code.sf.net/u/trguitar/review

--------------------------------
Impacted area       Impact y/n
--------------------------------
 Docs                    n
 Build system            n
 RPM/packaging           n
 Configuration files     n
 Startup scripts         n
 SAF services            y
 OpenSAF services        n
 Core libraries          n
 Samples                 n
 Tests                   n
 Other                   n

NOTE: Patch(es) contain lines longer than 80 characters

Comments (indicate scope for each "y" above):
---------------------------------------------

revision 14dfc8f3e86559585b072a9c18025cb562caaeff
Author: Alex Jones <alex.jo...@genband.com>
Date:   Tue, 2 Jan 2018 10:45:31 -0500

plm: handle race condition for EE instantiation [#2514]

A child EE which is a controller can get shut down because its parent EE
(host) has not connected to PLM yet.

If the controller is a VM, and the host is a payload, there is a race
condition when instantiating the EEs. If the host doesn't connect to PLM
first, then when the controller EE (child of host EE) connects to PLM, it
sees that the host isn't instantiated, and shuts itself down.

If the controller child EE instantiates before the host has connected to PLM,
set a 20 second timer. If the host doesn't instantiate within this time, then
all child EEs will be shut down.

Complete diffstat:
------------------
 src/plm/common/plms_evt.h |  3 +-
 src/plm/plmd/plms_plmc.c  | 79 +++
 src/plm/plmd/plms_utils.c | 11 ++-
 3 files changed, 91 insertions(+), 2 deletions(-)

Testing Commands:
-----------------
1) Create some VMs and run plmc on all of them including the host
2) Make one of the VMs the controller
3) Boot them all up.

Testing, Expected Results:
--------------------------
1) If controller VM EE connects to plmd before host does, make sure the VM
   doesn't shut itself off

Conditions of Submission:
-------------------------
Jan 8 or ack from developer

Arch        Built   Started   Linux distro
-------------------------------------------
mips          n        n
mips64        n        n
x86           n        n
x86_64        y        y
powerpc       n        n
powerpc64     n        n
[devel] [PATCH 1/1] plm: handle race condition for EE instantiation [#2514]
A child EE which is a controller can get shut down because its parent EE
(host) has not connected to PLM yet.

If the controller is a VM, and the host is a payload, there is a race
condition when instantiating the EEs. If the host doesn't connect to PLM
first, then when the controller EE (child of host EE) connects to PLM, it
sees that the host isn't instantiated, and shuts itself down.

If the controller child EE instantiates before the host has connected to PLM,
set a 20 second timer. If the host doesn't instantiate within this time, then
all child EEs will be shut down.
---
 src/plm/common/plms_evt.h |  3 +-
 src/plm/plmd/plms_plmc.c  | 79 +++
 src/plm/plmd/plms_utils.c | 11 ++-
 3 files changed, 91 insertions(+), 2 deletions(-)

diff --git a/src/plm/common/plms_evt.h b/src/plm/common/plms_evt.h
index 43f4748..e87c632 100644
--- a/src/plm/common/plms_evt.h
+++ b/src/plm/common/plms_evt.h
@@ -98,7 +98,8 @@ typedef enum {
 typedef enum {
 	PLMS_TMR_NONE,
 	PLMS_TMR_EE_INSTANTIATING,
-	PLMS_TMR_EE_TERMINATING
+	PLMS_TMR_EE_TERMINATING,
+	PLMS_TMR_EE_HOST_INSTANTIATED
 } PLMS_TMR_EVT_TYPE;
 
 typedef struct plms_imm_admin_op {
diff --git a/src/plm/plmd/plms_plmc.c b/src/plm/plmd/plms_plmc.c
index 06c8d4b..c310a86 100644
--- a/src/plm/plmd/plms_plmc.c
+++ b/src/plm/plmd/plms_plmc.c
@@ -50,6 +50,8 @@ static SaUint32T plms_os_info_resp_mngt_flag_clear(PLMS_ENTITY *);
 static void plms_insted_dep_immi_failure_cbk_call(PLMS_ENTITY *,
 						  PLMS_GROUP_ENTITY *);
 static void plms_is_dep_set_cbk_call(PLMS_ENTITY *);
+
+static void plms_ee_stop_host_timer(PLMS_ENTITY *);
 /**
 @brief : Process instantiating event from PLMC.
 	1. Do the OS verification irrespective of previous state.
@@ -346,6 +348,7 @@ SaUint32T plms_plmc_tcp_connect_process(PLMS_ENTITY *ent)
 	if ((SA_PLM_EE_ADMIN_LOCKED_INSTANTIATION ==
 	     ent->entity.ee_entity.saPlmEEAdminState) ||
 	    ((NULL != ent->parent) &&
+	     ent->parent->entity_type != PLMS_EE_ENTITY &&
 	     (plms_is_rdness_state_set(ent->parent,
 				       SA_PLM_READINESS_OUT_OF_SERVICE))) ||
 	    (!plms_min_dep_is_ok(ent))) {
@@ -379,6 +382,19 @@ SaUint32T plms_plmc_tcp_connect_process(PLMS_ENTITY *ent)
 		return NCSCC_RC_FAILURE;
 	}
 
+	if (ent->parent && ent->parent->entity_type == PLMS_EE_ENTITY &&
+	    plms_is_rdness_state_set(ent->parent, SA_PLM_READINESS_OUT_OF_SERVICE)) {
+		LOG_IN("host EE not instantiated yet: starting timer");
+		ent->tmr.tmr_type = PLMS_TMR_EE_HOST_INSTANTIATED;
+		ret_err = plms_timer_start(&ent->tmr.timer_id,
+					   ent,
+					   SA_TIME_ONE_SECOND * 20);
+		if (ret_err != NCSCC_RC_SUCCESS) {
+			LOG_ER("failed to start host EE instantiated timer");
+			return ret_err;
+		}
+	}
+
 	if (plms_is_rdness_state_set(ent, SA_PLM_READINESS_IN_SERVICE)) {
 		TRACE("Ent %s is already in insvc.", ent->dn_name_str);
 		return NCSCC_RC_SUCCESS;
@@ -532,6 +548,13 @@ SaUint32T plms_plmc_tcp_connect_process(PLMS_ENTITY *ent)
 		ret_err = plms_plmc_unlck_insvc(ent, trk_info,
 						aff_ent_list_flag, is_set);
 	}
+
+	/* If this is a host EE, stop timer for all child EEs */
+if (ret_err == NCSCC_RC_SUCCESS && ent->entity_type == PLMS_EE_ENTITY &&
+	ent->leftmost_child) {
+	plms_ee_stop_host_timer(ent->leftmost_child);
+}
+
 	TRACE_LEAVE2("Return Val: %d", ret_err);
 	return ret_err;
 }
@@ -1052,6 +1075,12 @@ SaUint32T plms_plmc_get_os_info_response(PLMS_ENTITY *ent,
 					to insvc.*/
 					ret_err = plms_plmc_unlck_insvc(
 					    ent, trk_info, aff_ent_list_flag, is_set);
+
+					/* If this is a host EE, stop timer for all child EEs */
+					if (ret_err == NCSCC_RC_SUCCESS && ent->entity_type == PLMS_EE_ENTITY &&
+					    ent->leftmost_child) {
+						plms_ee_stop_host_timer(ent->leftmost_child);
+					}
 				}
 			}
 		} else {
@@ -2658,6 +2687,28 @@ SaUint32T plms_ee_term_failed_tmr_exp(PLMS_ENTITY *ent)
 	TRACE_LEAVE2("Return Val: %d", ret_err);
 	return ret_err;
 }
+
+SaUint32T plms_ee_host_instantiate_tmr_exp(PLMS_ENTITY *ent)
+{
+	SaUint32T ret_err = NCSCC_RC_SUCCESS;
+
+	TRACE_ENTER2("Entity: %s",ent->dn_name_str);
+
+	if
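The body of `plms_ee_stop_host_timer` is cut off in this archived message, so the following is only a sketch of the idea stated in the commit message: once the host EE connects, the grace timers started by its child EEs are cancelled. `struct ee`, `right_sibling`, `host_timer_running`, `ee_start_host_timer`, `ee_stop_host_timers` and `ee_host_connected` are hypothetical; only `leftmost_child` appears in the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, reduced view of an execution environment (EE) entity. */
struct ee {
	const char *name;
	bool host_timer_running;   /* grace timer waiting for the host */
	struct ee *leftmost_child; /* first child, as in the patch */
	struct ee *right_sibling;  /* assumed link to the next child */
};

/* Child connected before its host: start the grace timer. */
void ee_start_host_timer(struct ee *child)
{
	assert(child != NULL);
	child->host_timer_running = true;
}

/* Stop the grace timer on every sibling starting at `first_child`. */
int ee_stop_host_timers(struct ee *first_child)
{
	int stopped = 0;

	for (struct ee *c = first_child; c != NULL; c = c->right_sibling) {
		if (c->host_timer_running) {
			c->host_timer_running = false;
			stopped++;
		}
	}
	return stopped;
}

/* Host connected: none of its children should be shut down any more. */
int ee_host_connected(struct ee *host)
{
	assert(host != NULL);
	return ee_stop_host_timers(host->leftmost_child);
}
```

A child whose timer is still running when it expires would be shut down, which preserves the old behaviour for hosts that never connect while giving slow hosts a bounded window.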
[devel] [PATCH 0/1] Review Request for plm: don't set readiness state to in-service if EE is terminating [#2734]
Summary: plm: don't set readiness state to in-service if EE is terminating [#2734]
Review request for Ticket(s): 2734
Peer Reviewer(s): Mathi, Ravi
Pull request to:
Affected branch(es): develop
Development branch: ticket-2734
Base revision: 3c636068409de2fcb21ffeda839125809c5d1a0c
Personal repository: git://git.code.sf.net/u/trguitar/review

--------------------------------
Impacted area       Impact y/n
--------------------------------
 Docs                    n
 Build system            n
 RPM/packaging           n
 Configuration files     n
 Startup scripts         n
 SAF services            y
 OpenSAF services        n
 Core libraries          n
 Samples                 n
 Tests                   n
 Other                   n

Comments (indicate scope for each "y" above):
---------------------------------------------

revision 1a3ad81467d91b4f98b76657821e256645a3e5ab
Author: Alex Jones <alex.jo...@genband.com>
Date:   Wed, 13 Dec 2017 08:39:08 -0500

plm: don't set readiness state to in-service if EE is terminating [#2734]

If an EE goes down during a controller switchover the TERMINATED message sent
by plmc to plmd may not be received because of the switch over. In this case
the EE will be stuck in terminating presence state. If any parent of the EE is
in OOS, then we can definitely set the presence state to UNINSTANTIATED after
the switchover. If not, then we can just set the management-lost flag because
we don't know whether or not the EE terminated.

Complete diffstat:
------------------
 src/plm/plmd/plms_stdby.c | 72 +++
 src/plm/plmd/plms_utils.c | 53 --
 2 files changed, 110 insertions(+), 15 deletions(-)

Testing Commands:
-----------------
1) Reboot a bunch of VMs including the controller.
2) After the controller failover, using immlist, check the presence state of
   the EEs that were rebooted

Testing, Expected Results:
--------------------------
EE presence state for rebooted VMs should not be stuck in TERMINATING

Conditions of Submission:
-------------------------
Dec. 19 or ack from developer.

Arch        Built     Started    Linux distro
-------------------------------------------
mips          n          n
mips64        n          n
x86           n          n
x86_64        y          y
powerpc       n          n
powerpc64     n          n

Reviewer Checklist:
-------------------
[Submitters: make sure that your review doesn't trigger any checkmarks!]

Your checkin has not passed review because (see checked entries):

___ Your RR template is generally incomplete; it has too many blank entries
    that need proper data filled in.
___ You have failed to nominate the proper persons for review and push.
___ Your patches do not have proper short+long header
___ You have grammar/spelling in your header that is unacceptable.
___ You have exceeded a sensible line length in your headers/comments/text.
___ You have failed to put in a proper Trac Ticket # into your commits.
___ You have incorrectly put/left internal data in your comments/files
    (i.e. internal bug tracking tool IDs, product names etc)
___ You have not given any evidence of testing beyond basic build tests.
    Demonstrate some level of runtime or other sanity testing.
___ You have ^M present in some of your files. These have to be removed.
___ You have needlessly changed whitespace or added whitespace crimes
    like trailing spaces, or spaces before tabs.
___ You have mixed real technical changes with whitespace and other
    cosmetic code cleanup changes. These have to be separate commits.
___ You need to refactor your submission into logical chunks; there is
    too much content in a single commit.
___ You have extraneous garbage in your review (merge commits etc)
___ You have giant attachments which should never have been sent;
    Instead you should place your content in a public tree to be pulled.
___ You have too many commits attached to an e-mail; resend as threaded
    commits, or place in a public tree for a pull.
___ You have resent this content multiple times without a clear indication
    of what has changed between each re-send.
___ You have failed to adequately and individually address all of the
    comments and change requests that were proposed in the initial review.
___ You have a misconfigured ~/.gitconfig file (i.e. user.name, user.email etc)
___ Your computer has a badly configured date and time; confusing the
    threaded patch review.
___ Your changes affect IPC mechanism, and you don't present any results
    for in-service upgradability test.
___ Your changes affect user manual and documentation, your patch series
    do not contain the patch that updates the Doxygen manual.
[devel] [PATCH 1/1] plm: don't set readiness state to in-service if EE is terminating [#2734]
If an EE goes down during a controller switchover the TERMINATED message sent
by plmc to plmd may not be received because of the switch over. In this case
the EE will be stuck in terminating presence state. If any parent of the EE is
in OOS, then we can definitely set the presence state to UNINSTANTIATED after
the switchover. If not, then we can just set the management-lost flag because
we don't know whether or not the EE terminated.
---
 src/plm/plmd/plms_stdby.c | 72 +++
 src/plm/plmd/plms_utils.c | 53 --
 2 files changed, 110 insertions(+), 15 deletions(-)

diff --git a/src/plm/plmd/plms_stdby.c b/src/plm/plmd/plms_stdby.c
index d1a0d27..52f5701 100644
--- a/src/plm/plmd/plms_stdby.c
+++ b/src/plm/plmd/plms_stdby.c
@@ -56,6 +56,76 @@ plms_perform_pending_admin_clbk(PLMS_ENTITY_GROUP_INFO_LIST *grp_list,
 				PLMS_CKPT_TRACK_STEP_INFO *track_step);
 void plms_process_client_down_list();
 
+static void modify_presence_state(PLMS_ENTITY *ee)
+{
+	bool set = false;
+	PLMS_ENTITY *parent = ee->parent;
+
+	TRACE_ENTER();
+
+	while (parent) {
+		if ((parent->entity_type == PLMS_EE_ENTITY &&
+		     parent->entity.ee_entity.saPlmEEReadinessState ==
+			 SA_PLM_READINESS_OUT_OF_SERVICE) ||
+		    (parent->entity_type == PLMS_HE_ENTITY &&
+		     parent->entity.he_entity.saPlmHEReadinessState ==
+			 SA_PLM_READINESS_OUT_OF_SERVICE)) {
+			plms_presence_state_set(ee,
+				SA_PLM_EE_PRESENCE_UNINSTANTIATED,
+				parent,
+				SA_NTF_OBJECT_OPERATION,
+				SA_PLM_NTFID_STATE_CHANGE_ROOT);
+			set = true;
+			break;
+		}
+
+		parent = parent->parent;
+	}
+
+	if (!set) {
+		plms_readiness_flag_mark_unmark(ee,
+			SA_PLM_RF_MANAGEMENT_LOST,
+			true,
+			ee,
+			SA_NTF_OBJECT_OPERATION,
+			SA_PLM_NTFID_STATE_CHANGE_ROOT);
+	}
+
+	TRACE_LEAVE();
+}
+
+static void check_presence_state(void)
+{
+	/*
+	 * If an EE was in the middle of terminating, and we switchover, we may
+	 * not get the notification from the EE that it has terminated. It's
+	 * probably not a good idea to restart the termination-failed timer
+	 * because we don't know if the EE already terminated. If there is a
+	 * parent, and it is OOS, then we can definitely set the state to
+	 * UNINSTANTIATED. Otherwise, let's just set the readiness flag to
+	 * management-lost.
+	 */
+	PLMS_CB *cb = plms_cb;
+	PLMS_ENTITY *plm_ent = (PLMS_ENTITY *)ncs_patricia_tree_getnext(
+		&cb->entity_info, 0);
+
+	TRACE_ENTER();
+
+	while (plm_ent) {
+		if (plm_ent->entity_type == PLMS_EE_ENTITY) {
+			if (plm_ent->entity.ee_entity.saPlmEEPresenceState ==
+			    SA_PLM_EE_PRESENCE_TERMINATING) {
+				modify_presence_state(plm_ent);
+			}
+		}
+
+		plm_ent = (PLMS_ENTITY *)ncs_patricia_tree_getnext(
+			&cb->entity_info, (SaUint8T *)&plm_ent->dn_name);
+	}
+
+	TRACE_LEAVE();
+}
+
 /***
  * Name :plms_proc_standby_active_role_change
  *
@@ -89,6 +159,8 @@ SaUint32T plms_proc_standby_active_role_change()
 
 	plms_process_client_down_list();
 
+	check_presence_state();
+
 	cb->is_initialized = true;
 
 	TRACE_LEAVE();
diff --git a/src/plm/plmd/plms_utils.c b/src/plm/plmd/plms_utils.c
index d09d94e..d3479e4 100644
--- a/src/plm/plmd/plms_utils.c
+++ b/src/plm/plmd/plms_utils.c
@@ -3009,9 +3009,14 @@ void plms_move_chld_ent_to_insvc(PLMS_ENTITY *chld_ent,
 				 SaUint8T inst_chld_ee, SaUint8T inst_dep_ee)
 {
 	SaUint32T ret_err;
+
+	TRACE_ENTER();
+
 	/* Terminating condition. */
-	if (NULL == chld_ent)
+	if (NULL == chld_ent) {
+		TRACE_LEAVE();
 		return;
+	}
 
 	/* If chld_ent is already insvc then return.*/
 	if (plms_is_rdness_state_set(chld_ent, SA_PLM_READINESS_IN_SERVICE)) {
@@ -3040,6 +3045,7 @@ void plms_move_chld_ent_to_insvc(PLMS_ENTITY *chld_ent,
 		LOG_ER("Entity %s can not be moved to insvc, as parent is \
 				not in service",
 		       chld_ent->dn_name_str);
+		TRACE_LEAVE();
Re: [devel] [PATCH 0/1] Review Request for plm: handle plmc clients which abruptly terminated [#2529]
signature.asc
Description: OpenPGP digital signature
-- 
Check out the vibrant tech community on one of the world's most engaging tech sites, Slashdot.org! http://sdm.link/slashdot
_______________________________________________
Opensaf-devel mailing list
Opensaf-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-devel
Re: [devel] [PATCH 1/1] clmd: add dynamically created EEs to PLM entity group on standby [#2730]
[devel] [PATCH 1/1] plm: handle plmc clients which abruptly terminated [#2529]
In virtual environments nodes can reboot very quickly (less than 1 minute). If
the reboot is abrupt, plmd may not be aware that the EE went down until after
it has already come back up because plmd relies on the TCP connection to plmcd
on the node. In this case, plmd will set the readiness state to OOS after the
EE is already back up. This causes CLM to evict the node from the cluster.

plmd should use TCP_USER_TIMEOUT to notice that plmcd has exited abruptly.

This enhancement also refactors the threading involved with handling the plm
clients, to support a large number of them.
---
 src/plm/plmcd/plmc.h              |    3 +
 src/plm/plmcd/plmc_lib.c          |   94 +--
 src/plm/plmcd/plmc_lib_internal.c | 1313 +++--
 src/plm/plmcd/plmc_lib_internal.h |   13 +-
 src/plm/plmcd/plmc_read_config.c  |   16 +
 src/plm/plmcd/plmcd.conf          |    8 +
 src/plm/plmd/plms_plmc.c          |    4 +
 7 files changed, 618 insertions(+), 833 deletions(-)

diff --git a/src/plm/plmcd/plmc.h b/src/plm/plmcd/plmc.h
index e02523f..6145f89 100644
--- a/src/plm/plmcd/plmc.h
+++ b/src/plm/plmcd/plmc.h
@@ -37,6 +37,7 @@
 #define KEEPIDLE_TIME 7200
 #define KEEPALIVE_INTVL 75
 #define KEEPALIVE_PROBES 9
+#define USER_TIMEOUT 5000
 
 /* Tag value and message data lengths. */
 #define PLMC_MAX_TAG_LEN 256
@@ -92,6 +93,7 @@ typedef enum {
 	TCP_KEEPIDLE_TIME,
 	TCP_KEEPALIVE_INTVL,
 	TCP_KEEPALIVE_PROBES,
+	TCP_USER_TIMEOUT_VALUE
 } PLMC_config_tags;
 
 /* This struct holds the contents of the plmcd.conf configuration file. */
@@ -113,6 +115,7 @@ typedef struct {
 	int tcp_keepidle_time;
 	int tcp_keepalive_intvl;
 	int tcp_keepalive_probes;
+	int tcp_user_timeout;
 } PLMC_config_data;
 
 /* The PLMC daemon command numerical index. */
diff --git a/src/plm/plmcd/plmc_lib.c b/src/plm/plmcd/plmc_lib.c
index 5b3f11a..99574ea 100644
--- a/src/plm/plmcd/plmc_lib.c
+++ b/src/plm/plmcd/plmc_lib.c
@@ -22,6 +22,8 @@
 #include
 #include
 #include
+#include
+#include "base/logtrace.h"
 
 #include "plm/plmcd/plmc_lib_internal.h"
 #include "plm/plmcd/plmc_cmds.h"
@@ -44,8 +46,6 @@ int do_command(char *ee_id, int (*cb)(tcp_msg *), char *cmd,
 	       PLMC_cmd_idx cmd_enum)
 {
 	thread_entry *tentry;
-	pthread_attr_t client_mgr_attr;
-	pthread_t plmc_client_mgr_id;
 
 	tentry = find_thread_entry(ee_id);
 	if (tentry == NULL) {
@@ -57,15 +57,6 @@ int do_command(char *ee_id, int (*cb)(tcp_msg *), char *cmd,
 				"lock for a client");
 		return (PLMC_API_LOCK_FAILED);
 	}
-	/* Check if there are pending work */
-	if (tentry->thread_d.done == 0) {
-		if (pthread_mutex_unlock(&tentry->thread_d.td_lock) != 0) {
-			syslog(LOG_ERR, "plmc_lib: encountered an error "
-					"unlocking a mutex for a client");
-			return (PLMC_API_UNLOCK_FAILED);
-		}
-		return (PLMC_API_CLIENT_BUSY);
-	}
 
 	/* Check if there is valid socket */
 	if (tentry->thread_d.socketfd == 0) {
@@ -80,27 +71,8 @@ int do_command(char *ee_id, int (*cb)(tcp_msg *), char *cmd,
 	strncpy(tentry->thread_d.command, cmd, PLMC_CMD_NAME_MAX_LENGTH);
 	tentry->thread_d.command[PLMC_CMD_NAME_MAX_LENGTH - 1] = '\0';
 	tentry->thread_d.callback = cb;
-	tentry->thread_d.done = 0;
-
-	/* Initialize and start the client_mgr_thread */
-	pthread_attr_init(&client_mgr_attr);
-	pthread_attr_setdetachstate(&client_mgr_attr, PTHREAD_CREATE_DETACHED);
-	if (pthread_create(&(plmc_client_mgr_id), &client_mgr_attr,
-			   plmc_client_mgr, (void *)tentry) != 0) {
-		syslog(LOG_ERR, "plmc_lib: Could not create a "
-				"new client mgr thread for connection");
-		send_error(PLMC_LIBERR_SYSTEM_RESOURCES,
-			   PLMC_LIBACT_CLOSE_SOCKET, ee_id, cmd_enum);
-		/* Unlock mutex */
-		if (pthread_mutex_unlock(&tentry->thread_d.td_lock) != 0) {
-			syslog(LOG_ERR, "plmc_lib: encountered an error "
-					"unlocking when updated "
-					"thread_id");
-		}
-		return (PLMC_API_FAILURE);
-	}
-	/* Update the thread_entry with the thread ID */
-	tentry->thread_d.td_id = plmc_client_mgr_id;
+
+	plmc_client_mgr(tentry);
 
 	/* Unlock */
 	if (pthread_mutex_unlock(&tentry->thread_d.td_lock) != 0) {
@@ -164,23 +136,27 @@ int plmc_initialize(int (*connect_cb)(char *, char *), int (*udp_cb)(udp_msg *),
 	callbacks.udp_cb = udp_cb;
 	callbacks.err_cb = err_cb;
 
-	/* Set these threads detached as we don't want to join them */
 	pthread_attr_init();
-	pthread_attr_setdetachstate(, PTHREAD_CREATE_DETACHED);
+
+	tcp_listener_stop_fd = eventfd(0, EFD_NONBLOCK);
+
+
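As a rough illustration of the TCP_USER_TIMEOUT mechanism the commit message
refers to, here is a minimal sketch of setting the option on a socket. It is
not taken from the patch: the helper names set_tcp_user_timeout and
get_tcp_user_timeout are hypothetical, and the 5000 ms value used in the usage
note is assumed to correspond to the USER_TIMEOUT define added to plmc.h.
TCP_USER_TIMEOUT is a Linux-specific socket option whose value is given in
milliseconds.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/*
 * Hypothetical helper (not from the patch): ask the kernel to give up on
 * the connection once transmitted data has stayed unacknowledged for
 * timeout_ms milliseconds. Returns 0 on success, -1 on error (errno set).
 */
int set_tcp_user_timeout(int sockfd, unsigned int timeout_ms)
{
	return setsockopt(sockfd, IPPROTO_TCP, TCP_USER_TIMEOUT,
			  &timeout_ms, sizeof(timeout_ms));
}

/*
 * Hypothetical helper: read the option back. Returns the timeout in
 * milliseconds, or -1 on error (errno set).
 */
long get_tcp_user_timeout(int sockfd)
{
	unsigned int timeout_ms = 0;
	socklen_t len = sizeof(timeout_ms);

	if (getsockopt(sockfd, IPPROTO_TCP, TCP_USER_TIMEOUT,
		       &timeout_ms, &len) != 0)
		return -1;

	return (long)timeout_ms;
}
```

A caller would presumably apply this to each connection to plmcd, for example
set_tcp_user_timeout(fd, 5000), so that a peer which disappears without closing
the connection is noticed in roughly that time rather than after the much
longer default keepalive interval.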
[devel] [PATCH 0/1] Review Request for plm: handle plmc clients which abruptly terminated [#2529]
Summary: plm: handle plmc clients which abruptly terminated [#2529]
Review request for Ticket(s): 2529
Peer Reviewer(s): Mathi, Ravi
Pull request to:
Affected branch(es): develop
Development branch: ticket-2529
Base revision: 37983760835c40056c0a2d404e47f17f2a50b102
Personal repository: git://git.code.sf.net/u/trguitar/review

--------------------------------
Impacted area       Impact y/n
--------------------------------
 Docs                    n
 Build system            n
 RPM/packaging           n
 Configuration files     n
 Startup scripts         n
 SAF services            y
 OpenSAF services        n
 Core libraries          n
 Samples                 n
 Tests                   n
 Other                   n

NOTE: Patch(es) contain lines longer than 80 characters

Comments (indicate scope for each "y" above):
---------------------------------------------

revision caa9f9f93e507748ec6fb43c97d83967f4c6045b
Author: Alex Jones <alex.jo...@genband.com>
Date:   Thu, 7 Dec 2017 11:31:46 -0500

plm: handle plmc clients which abruptly terminated [#2529]

In virtual environments nodes can reboot very quickly (less than 1 minute). If
the reboot is abrupt, plmd may not be aware that the EE went down until after
it has already come back up because plmd relies on the TCP connection to plmcd
on the node. In this case, plmd will set the readiness state to OOS after the
EE is already back up. This causes CLM to evict the node from the cluster.

plmd should use TCP_USER_TIMEOUT to notice that plmcd has exited abruptly.

This enhancement also refactors the threading involved with handling the plm
clients, to support a large number of them.

Complete diffstat:
------------------
 src/plm/plmcd/plmc.h              |    3 +
 src/plm/plmcd/plmc_lib.c          |   94 +--
 src/plm/plmcd/plmc_lib_internal.c | 1313 +++--
 src/plm/plmcd/plmc_lib_internal.h |   13 +-
 src/plm/plmcd/plmc_read_config.c  |   16 +
 src/plm/plmcd/plmcd.conf          |    8 +
 src/plm/plmd/plms_plmc.c          |    4 +
 7 files changed, 618 insertions(+), 833 deletions(-)

Testing Commands:
-----------------
In a virtualized environment, abruptly reboot a payload node (e.g. using
reboot -f)

Testing, Expected Results:
--------------------------
The EE presence state should be UNINSTANTIATED within 5 seconds, and the node
should come back into the cluster

Conditions of Submission:
-------------------------
Dec 13 or ack from developer

Arch        Built     Started    Linux distro
-------------------------------------------
mips          n          n
mips64        n          n
x86           n          n
x86_64        y          y
powerpc       n          n
powerpc64     n          n
[devel] [PATCH 0/1] Review Request for clmd: add dynamically created EEs to PLM entity group on standby [#2730]
Summary: clmd: add dynamically created EEs to PLM entity group on standby [#2730]
Review request for Ticket(s): 2730
Peer Reviewer(s): Anders, Hans, Mathi, Ravi
Pull request to:
Affected branch(es): develop
Development branch: ticket-2730
Base revision: 9ab54933456632260be87c2c763bd36b1ab7e5d2
Personal repository: git://git.code.sf.net/u/trguitar/review

--------------------------------
Impacted area       Impact y/n
--------------------------------
 Docs                    n
 Build system            n
 RPM/packaging           n
 Configuration files     n
 Startup scripts         n
 SAF services            y
 OpenSAF services        n
 Core libraries          n
 Samples                 n
 Tests                   n
 Other                   n

NOTE: Patch(es) contain lines longer than 80 characters

Comments (indicate scope for each "y" above):
---------------------------------------------

revision be1f5e166884737b1786b4eab3a47f82c54e47f8
Author: Alex Jones <alex.jo...@genband.com>
Date:   Wed, 6 Dec 2017 11:58:33 -0500

clmd: add dynamically created EEs to PLM entity group on standby [#2730]

If EEs and corresponding CLM nodes are dynamically created, after a middleware
si-swap when the former standby has become active, then one of those EEs is
rebooted, clmd has not enabled PLM readiness state tracking on the EE and will
not know when it comes back. Thus, the node will not be allowed back into the
cluster because it thinks it is not a member.

The dynamically created EE is not being added to the PLM entity group on the
standby.

Add the dynamically created EE to the PLM entity group on the standby.

Complete diffstat:
------------------
 src/clm/clmd/clms_evt.c   |  2 +-
 src/clm/clmd/clms_imm.c   |  2 +-
 src/clm/clmd/clms_mbcsv.c | 20 +++-
 3 files changed, 21 insertions(+), 3 deletions(-)

Testing Commands:
-----------------
1) dynamically create a CLM node with an EE
2) middleware si-swap
3) reboot the EE node

Testing, Expected Results:
--------------------------
It should come back into the cluster

Conditions of Submission:
-------------------------
Dec 12, or ack from developer

Arch        Built     Started    Linux distro
-------------------------------------------
mips          n          n
mips64        n          n
x86           n          n
x86_64        y          y
powerpc       n          n
powerpc64     n          n
[devel] [PATCH 1/1] clmd: add dynamically created EEs to PLM entity group on standby [#2730]
If EEs and corresponding CLM nodes are dynamically created, after a middleware
si-swap when the former standby has become active, then one of those EEs is
rebooted, clmd has not enabled PLM readiness state tracking on the EE and will
not know when it comes back. Thus, the node will not be allowed back into the
cluster because it thinks it is not a member.

The dynamically created EE is not being added to the PLM entity group on the
standby.

Add the dynamically created EE to the PLM entity group on the standby.
---
 src/clm/clmd/clms_evt.c   |  2 +-
 src/clm/clmd/clms_imm.c   |  2 +-
 src/clm/clmd/clms_mbcsv.c | 20 +++-
 3 files changed, 21 insertions(+), 3 deletions(-)

diff --git a/src/clm/clmd/clms_evt.c b/src/clm/clmd/clms_evt.c
index 4d7010d..b65036d 100644
--- a/src/clm/clmd/clms_evt.c
+++ b/src/clm/clmd/clms_evt.c
@@ -992,7 +992,7 @@ static uint32_t proc_mds_node_evt(CLMSV_CLMS_EVT *evt)
 		if (delete_existing_nodedown_records(node_id) == true) {
 			TRACE_LEAVE();
 			return rc;
-		} else if (node->member == SA_FALSE) {
+		} else if (node->member == SA_FALSE && node->admin_state != SA_CLM_ADMIN_UNLOCKED) {
 			/* One possibility is that an admin operation has made
 			 * this a non-member */
 			TRACE_LEAVE();
diff --git a/src/clm/clmd/clms_imm.c b/src/clm/clmd/clms_imm.c
index 6809ce8..c245f67 100644
--- a/src/clm/clmd/clms_imm.c
+++ b/src/clm/clmd/clms_imm.c
@@ -2291,7 +2291,7 @@ SaAisErrorT clms_node_ccb_apply_cb(CcbUtilOperationData_t *opdata)
 			rc = saPlmEntityGroupRemove(clms_cb->ent_group_hdl,
 						    entityNames, 1);
 			if (rc != SA_AIS_OK) {
-				LOG_ER("saPlmEntityGroupAdd FAILED rc = %d",
+				LOG_ER("saPlmEntityGroupRemove FAILED rc = %d",
 				       rc);
 				return rc;
 			}
diff --git a/src/clm/clmd/clms_mbcsv.c b/src/clm/clmd/clms_mbcsv.c
index 47e4494..6976b03 100644
--- a/src/clm/clmd/clms_mbcsv.c
+++ b/src/clm/clmd/clms_mbcsv.c
@@ -282,6 +282,9 @@ static uint32_t ckpt_proc_node_csync_rec(CLMS_CB *cb, CLMS_CKPT_REC *data)
 	CLMSV_CKPT_NODE *param = &data->param.node_csync_rec;
 	CLMS_CLUSTER_NODE *node = NULL, *tmp_node = NULL;
 	uint32_t rc = NCSCC_RC_SUCCESS;
+#ifdef ENABLE_AIS_PLM
+	SaNameT *entityNames = NULL;
+#endif
 
 	TRACE_ENTER2("node_name:%s", param->node_name.value);
 
@@ -315,6 +318,21 @@ static uint32_t ckpt_proc_node_csync_rec(CLMS_CB *cb, CLMS_CKPT_REC *data)
 				LOG_ER("Patricia add failed");
 			}
 		}
+#ifdef ENABLE_AIS_PLM
+		/* Add it to the plm entity group */
+		entityNames = &node->ee_name;
+		if (clms_cb->reg_with_plm == SA_TRUE) {
+			SaAisErrorT aisrc = saPlmEntityGroupAdd(
+				clms_cb->ent_group_hdl,
+				entityNames,
+				1,
+				SA_PLM_GROUP_SINGLE_ENTITY);
+			if (aisrc != SA_AIS_OK) {
+				LOG_ER("saPlmEntityGroupAdd FAILED rc = %d",
+					aisrc);
+			}
+		}
+#endif
 	}
 	TRACE_LEAVE();
 	return NCSCC_RC_SUCCESS;
@@ -357,7 +375,7 @@ static uint32_t ckpt_proc_node_del_rec(CLMS_CB *cb, CLMS_CKPT_REC *data)
 		rc = saPlmEntityGroupRemove(clms_cb->ent_group_hdl, entityNames,
 					    1);
 		if (rc != SA_AIS_OK) {
-			LOG_ER("saPlmEntityGroupAdd FAILED rc = %d", rc);
+			LOG_ER("saPlmEntityGroupRemove FAILED rc = %d", rc);
 			return rc;
 		}
 	}
-- 
2.9.5
Re: [devel] [PATCH 0/1] Review Request for plm: remove child EE info when given standby role [#2710]
[devel] [PATCH 1/1] plmd: fix mbc in PLM [#2724]
MBC isn't working in PLM, so no info is being checkpointed to the standby plmd.

When the code to handle more than 2 SCs was put in to PLM, the MBC selection
object was gotten at a later time -- after the while loop containing the "poll"
system call. Thus, the mbc file descriptor was never being set in the poll call.

Move the setting of the mbc file descriptor to inside the while loop, so it gets
set.
---
 src/plm/plmd/plms_main.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/src/plm/plmd/plms_main.c b/src/plm/plmd/plms_main.c
index b512741..23b0194 100644
--- a/src/plm/plmd/plms_main.c
+++ b/src/plm/plmd/plms_main.c
@@ -482,12 +482,13 @@ int main(int argc, char *argv[])
 	fds[FD_AMF].fd = plms_cb->nid_started ? plms_cb->usr1_sel_obj.rmv_obj
 					      : plms_cb->amf_sel_obj;
 	fds[FD_AMF].events = POLLIN;
-	fds[FD_MBCSV].fd = plms_cb->mbcsv_sel_obj;
-	fds[FD_MBCSV].events = POLLIN;
 	fds[FD_MBX].fd = mbx_fd.rmv_obj;
 	fds[FD_MBX].events = POLLIN;
 
 	while (1) {
+		fds[FD_MBCSV].fd = plms_cb->mbcsv_sel_obj;
+		fds[FD_MBCSV].events = POLLIN;
+
 		if (plms_cb->oi_hdl != 0) {
 			fds[FD_IMM].fd = plms_cb->imm_sel_obj;
 			fds[FD_IMM].events = POLLIN;
-- 
2.9.5
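The effect of re-reading the descriptor on every loop iteration can be sketched
in isolation as follows. This is a simplified illustration, not the plmd code:
struct demo_cb and poll_mbcsv_once are hypothetical names, and the sketch
assumes the descriptor is -1 until the selection object has been obtained
(poll ignores negative descriptors); the value plmd actually holds before that
point is not shown in the patch.

```c
#include <poll.h>

/* Hypothetical stand-in for the control block; only the field used here. */
struct demo_cb {
	int mbcsv_sel_obj; /* assumed to be -1 until the object is obtained */
};

/*
 * One iteration of the main loop: copy the descriptor out of the control
 * block immediately before polling, so that a descriptor obtained after
 * startup is still picked up.
 * Returns 1 if the descriptor is readable, 0 on timeout, -1 on error.
 */
int poll_mbcsv_once(const struct demo_cb *cb, int timeout_ms)
{
	struct pollfd fds[1];
	int rc;

	fds[0].fd = cb->mbcsv_sel_obj;
	fds[0].events = POLLIN;
	fds[0].revents = 0;

	rc = poll(fds, 1, timeout_ms);
	if (rc < 0)
		return -1;

	return (rc > 0 && (fds[0].revents & POLLIN)) ? 1 : 0;
}
```

If the descriptor were copied into fds only once before the loop, a value
assigned to mbcsv_sel_obj later would never reach poll, which is the symptom
the commit message describes.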
[devel] [PATCH 0/1] Review Request for plmd: fix mbc in PLM [#2724]
Summary: plmd: fix mbc in PLM [#2724]
Review request for Ticket(s): 2724
Peer Reviewer(s): Mathi, Ravi
Pull request to:
Affected branch(es): develop
Development branch: ticket-2724
Base revision: d40172a1afb2f95afdb6b6b5cf4804d559ac6c50
Personal repository: git://git.code.sf.net/u/trguitar/review

--------------------------------
Impacted area       Impact y/n
--------------------------------
 Docs                    n
 Build system            n
 RPM/packaging           n
 Configuration files     n
 Startup scripts         n
 SAF services            y
 OpenSAF services        n
 Core libraries          n
 Samples                 n
 Tests                   n
 Other                   n

Comments (indicate scope for each "y" above):
---------------------------------------------

revision 10e87432f563f5a4c30e584e12c6ce82662ba8c1
Author: Alex Jones <alex.jo...@genband.com>
Date:   Fri, 1 Dec 2017 14:37:20 -0500

plmd: fix mbc in PLM [#2724]

MBC isn't working in PLM, so no info is being checkpointed to the standby plmd.

When the code to handle more than 2 SCs was put in to PLM, the MBC selection
object was gotten at a later time -- after the while loop containing the "poll"
system call. Thus, the mbc file descriptor was never being set in the poll call.

Move the setting of the mbc file descriptor to inside the while loop, so it gets
set.

Complete diffstat:
------------------
 src/plm/plmd/plms_main.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

Testing Commands:
-----------------
Bring up 2 controllers with tracing on.

Testing, Expected Results:
--------------------------
Make sure MBC sync is done when standby plmd comes online.

Conditions of Submission:
-------------------------
Dec 7, or developer ack.

Arch        Built     Started    Linux distro
-------------------------------------------
mips          n          n
mips64        n          n
x86           n          n
x86_64        y          y
powerpc       n          n
powerpc64     n          n